


Indian Journal of Engineering & Materials Sciences Vol. 7, June 2000, pp. 107-121

Design of soft computing models for data mining applications

S Sumathi, S N Sivanandam & Jagadeeswari

Department of Electrical and Electronics Engineering, PSG College of Technology, Coimbatore 641 004, India

Received 18 May 1999; revised received 27 March 2000

Although modern technologies enable the storage of large streams of data, there is no technology that can help to understand, analyze and visualize the hidden information in the data. Data mining, also called data or knowledge discovery, is the process of analyzing data from different perspectives and summarizing it into useful information. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Pattern classification is one particular category of data mining, which enables the discovery of knowledge from very large databases (VLDB). Data mining can be applied to a wide range of applications such as business forecasting, decision support systems, SONAR, RADAR, seismic analysis and medical diagnosis.

Artificial neural networks, which have better noise immunity and a smaller training time, are used to mine the database. A self-organizing neural network architecture called predictive ART, or ARTMAP, is introduced that is capable of fast, stable learning and hypothesis testing in response to an arbitrary stream of input patterns. A generalization of binary ARTMAP is the fuzzy ARTMAP, which learns to classify inputs by a pattern of fuzzy membership values between 0 and 1, indicating the extent to which each feature is present. A generalization of fuzzy ARTMAP is the cascade ARTMAP, in which pre-existing symbolic rules are used to initialize the network before learning so that the network efficiency is increased. This rule insertion also provides knowledge to the network that cannot be captured by training examples. Interpretation of the knowledge learned by this neural network leads to compact and simpler rules compared to the back propagation approach. Another self-organizing algorithm is proposed using the Kohonen architecture, which also requires less time and gives higher prediction accuracy compared to BPN. Moreover, the rules extracted from this network are very simple compared to the BPN approach. Finally, the extracted rules have been validated for their correctness. This approach is most widely used in the medical industry for correct prediction when the database is large in size; manual mining of such voluminous data is very difficult and time consuming, and may sometimes lead to incorrect predictions. Hence, the data mining software is developed. The performance evaluation of all three networks, namely cascade ARTMAP, fuzzy ARTMAP and Kohonen, has been done and compared with conventional methods. Simulation is carried out using medical databases taken from the UCI repository of machine learning databases. The developed data mining software can also be used for other applications like the web, communications, and pattern recognition.

With the wide use of advanced database technology developed during the past decades, it is not difficult to efficiently store huge volumes of data in computers and retrieve them whenever needed. Although the stored data are a valuable asset of any application, more and more people face the problem of being data rich but knowledge poor. This situation aroused the recent surge of research interest in the area of data mining. One of the data mining problems is classification [1,2]. Classification is a process of finding the common properties among different entities and classifying them into classes. The results are often expressed in the form of rules.

Of the two approaches widely used by researchers in the artificial intelligence (AI) field, the symbolic approach based on decision trees and the connectionist approach based mainly on neural networks, the latter has been used in this paper for pattern classification. The connectionist approach is widely used, as it is well suited for classification problems. This paper utilizes artificial neural networks (ANN) for classification as they outperform the symbolic learning methods. Computer processing and storage technologies have advanced to a great extent, with the ability to keep hundreds of gigabytes or even terabytes of data online. Even though this development in technology has provided the facility of having historical data for decision support applications, the available analysis methods are not capable enough to discover knowledge from them. Information discovery through data mining algorithms like neural networks provides the solution to this problem [2].

Artificial neural networks are predictive models that are extensively used for pattern classification and decision making. Artificial neural networks provide the rapid learning and reliable decision making needed for real-life applications. The network's intelligence is



obtained through encoding of domain knowledge, which enables the network to have fast learning and better classification capability.

Ever since the advent of the internet and its widespread acceptance, databases have been assuming gargantuan proportions. Though these databases are storehouses of knowledge, effective tools for tapping these resources are the need of the hour. This serves as the catalyst to spur research in data mining. Pattern classification and decision making based on the available data are the key areas of interest among the research communities.

The emerging field of knowledge discovery in databases (KDD) has grown significantly in the past few years. In recent years, computing and storage technology has enabled people to collect and store information to a much larger extent. Modern database technology stores large streams of data which have to be analyzed and visualized, and this can be done using artificial neural networks. An artificial neural network approach is used here as a data mining tool for analyzing the data. The basic approach is the back propagation network (BPN), which has the following limitations:

(i) Neural network symbolic rules quickly lose their original meanings.

(ii) It is neither able to create nodes during learning nor able to reduce nodes during pruning.

(iii) Learning time is large.

For these reasons, a new neural network architecture called ARTMAP, which overcomes the above-mentioned limitations, forms an efficient network for classification. The use of neural networks in classification gives a lower classification error rate and is robust to noise [3,4].

This paper concentrates on reducing the learning time by using self-organizing neural networks, which increases the classification accuracy. With the rule extraction procedure, the symbolic rules extracted from the network are much simpler and easier to understand. Moreover, rules can be extracted at any stage of the learning process, which is impossible in the case of BPN.

A supervised learning system is built up in ARTMAP from a pair of Adaptive Resonance Theory modules (ARTa and ARTb) that are capable of self-organizing stable recognition categories in response to arbitrary sequences of input patterns. During training trials, the ARTa module receives a stream of input patterns a[p] and ARTb receives a stream of input patterns b[p], where b[p] is the correct prediction given a[p]. These ART modules are linked together by an internal controller that ensures autonomous system operation in real time. During test trials, the remaining patterns a[p] are presented without b[p], and their predictions at ARTb are compared with b[p]. The internal controller increases the vigilance parameter of ARTa by the minimal amount needed to correct a predictive error at ARTb. The generalization of the binary ARTMAP system, called the fuzzy ARTMAP system, learns to classify inputs by a pattern of fuzzy membership values between 0 and 1 indicating the extent to which each feature is present. This generalization is achieved by replacing the ART modules in the binary ARTMAP system with fuzzy ART modules. The parameter $\rho_a$ calibrates the minimum confidence that ARTa must have in a recognition category, or hypothesis, activated by an input a[p] in order for ARTa to accept that category, rather than search for a better one through an automatically controlled process of hypothesis testing. Hypothesis testing leads to the creation of a new ARTa category, which focuses attention on a new cluster of a[p] input features that is better able to predict b[p]. Another algorithm and architecture suggested for the artificial neural network is the self-organizing map, which has the special property of effectively creating spatially organized internal representations of the various features of input signals and their abstractions. After training the network, the rule extraction process is carried out, and finally the extracted rules are checked for validity and correctness.

Cascade ARTMAP Architecture [3]

A new hybrid system termed cascade adaptive resonance theory mapping (cascade ARTMAP) incorporates symbolic knowledge into the neural network. This is done by a rule insertion algorithm that translates if-then symbolic rules into the cascade ARTMAP architecture. The inserted symbolic knowledge is refined and enhanced by the cascade ARTMAP learning algorithm. During learning, the symbolic rule forms are preserved, so rules extracted from cascade ARTMAP can be compared with the originally inserted rules. The rule insertion, refinement and extraction paradigm for symbolic knowledge refinement and evaluation is shown in Fig. 1.

During learning, new recognition categories (rules) can be created dynamically to cover deficiencies of the domain theory. Also, each extracted rule is



Fig. 1 - Cascade ARTMAP overall processing: rule insertion, refinement, extraction

associated with a confidence factor that indicates its importance or usefulness. This allows ranking and evaluation of the extracted knowledge.

The cascade ARTMAP can be explained in five stages. Stage 1: rule cascade representation. Stage 2: rule insertion. Stage 3: rule chaining and inferencing. Stage 4: rule refinement and learning. Stage 5: rule extraction.

Cascade ARTMAP Algorithm

The different steps in the cascade ARTMAP algorithm are listed below. $A$ and $B$ denote the $F_1^a$ and $F_1^b$ input vectors, respectively.

$x^a = (x_i^a, x_h^a, x_o^a, x_i^{ac}, x_h^{ac}, x_o^{ac})$ and $x^b = (x_i^b, x_h^b, x_o^b, x_i^{bc}, x_h^{bc}, x_o^{bc})$ denote the 2M-dimensional $F_1^a$ and $F_1^b$ activity vectors, respectively. $x_i^a$ and $x_i^b$ denote $M_i$-dimensional input attribute vectors, $x_h^a$ and $x_h^b$ denote $M_h$-dimensional intermediate attribute vectors, and $x_o^a$ and $x_o^b$ denote $M_o$-dimensional output attribute vectors. $x_i^{ac}, x_h^{ac}, x_o^{ac}, x_i^{bc}, x_h^{bc}, x_o^{bc}$ are the complement attribute vectors. $y^a$ and $y^b$ denote the $F_2^a$ and $F_2^b$ activity vectors, and $x^{ab}$ denotes the map field $F^{ab}$ activity vector.

Initially, all the weights of the jth category nodes in $F_2^a$ and $F_2^b$, i.e., $w_j^a$ and $w_j^b$, and the map field weight vectors $w_j^{ab}$ from the jth $F_2^a$ node to $F^{ab}$, contain all 1s, i.e., all the nodes are uncommitted.

Step 1 (Input Presentation): Input is presented to ARTa with initial values of the choice parameters $\alpha_a > 0$ and $\alpha_b > 0$, learning rates $\beta_a \in [0,1]$ and $\beta_b \in [0,1]$, and vigilance parameters $\rho_a \in [0,1]$ and $\rho_b \in [0,1]$. During learning $\rho_a = \rho_b = 1$ and $x^a = A$.

Step 2 (Rule Selection): Given $x^a$, the choice function $T_j^a$ for each $F_2^a$ node j is calculated as

$$T_j^a = \frac{|x^a \wedge w_j^a|}{\alpha_a + |w_j^a|}$$

where the fuzzy AND operator $\wedge$ is defined by $(p \wedge q)_i = \min(p_i, q_i)$. The category choice is indexed at J, where $T_J^a = \max\{T_j^a : \text{for all } F_2^a \text{ nodes } j\}$. If more than one $T_j^a$ is maximal, the $F_2^a$ category with the smallest index is selected. When the Jth category is chosen, $y_J^a = 1$ and $y_j^a = 0$ for $j \neq J$.

Resonance then occurs if the match function $m_J^a$ of node J meets the vigilance criterion

$$m_J^a = \frac{|x^a \wedge w_J^a \wedge s_J|}{|x^a \wedge s_J|} \geq \rho_a$$

where $s_J$ is the scope vector with

$$s_{Ji} = \begin{cases} 1 & \text{if } i \text{ indexes an input attribute} \\ 0 & \text{otherwise} \end{cases}$$

Otherwise, if $m_J^a < \rho_a$, a mismatch reset occurs: $T_J^a$ is set to zero and step 2 is repeated for another index J.

Step 3 (No Prediction): If the selected $F_2^a$ node J has no prediction, i.e., $w_{Jk}^{ab} = 1$ for all $F^{ab}$ nodes k, each node j in the precursor set $\psi(J)$ learns

$$w_j^{a(\text{new})} = (1-\beta_a)\, w_j^{a(\text{old})} + \beta_a (x^a \wedge w_j^{a(\text{old})})$$

When the input vector $B$ is presented, the category node K is selected as in step 2 and the $F_2^b$ node K learns

$$w_K^{b(\text{new})} = (1-\beta_b)\, w_K^{b(\text{old})} + \beta_b (x^b \wedge w_K^{b(\text{old})})$$

J is then associated with K:

$$w_{Jk}^{ab} = \begin{cases} 1 & \text{if } k = K \\ 0 & \text{otherwise} \end{cases}$$

and the system halts.



Step 4 (Inferencing): If $F_2^a$ node J has learned to make a prediction, then $w_J^{ab}$ activates $F^{ab}$:

$$x^{ab} = w_J^{ab}$$

Once the map field is active, $F_2^b$ is activated through the one-to-one pathways between $F^{ab}$ and $F_2^b$, i.e., $T_k^b = x_k^{ab}$. The category choice K is then made as $T_K^b = \max\{T_k^b : \text{for all } F_2^b \text{ nodes } k\}$. When the Kth category is chosen, $y_K^b = 1$ and $y_k^b = 0$ for $k \neq K$; then $x^b = w_K^b$ (top-down priming occurs). Termination is checked by calculating a goal signal g:

$$g = \sum_{i=1}^{M_o} (x_{oi} + x_{oi}^c)$$

A conclusion is reached if $g > 0$ (an output attribute is known).

Step 5 (Update Memory State): If $g = 0$, a conclusion is not reached and $x^a$ is updated as

$$x^{a(\text{new})} = x^{a(\text{old})} \vee x^{b(\text{old})}$$

where the fuzzy OR $\vee$ is defined as $(p \vee q)_i = \max(p_i, q_i)$; then step 2 is repeated.

Step 6 (Prediction Matching): If a conclusion is reached (i.e., $g > 0$), the match function $m_K^b$ of the prediction $x^b$ is calculated as

$$m_K^b = \frac{|B \wedge x^b|}{|B|}$$

Step 7 (Resonance): Resonance occurs if $m_K^b \geq \rho_b$. Then $F_2^a$ and $F_2^b$ learn their weights and the system halts.

Step 8 (Match Tracking): If $m_K^b < \rho_b$, a prediction mismatch occurs. The ARTa vigilance $\rho_a$ is then raised slightly:

$$\rho_a^{(\text{new})} = \max\{\rho_a^{(\text{old})}, \min\{m_j^a : j \in \psi(J)\} + \varepsilon\}$$

and the process is repeated from step 2.

ARTMAP thus handles a class of if-then rules (symbolic rule-based knowledge) which often involves rule cascades and intermediate attributes. A set of rules is said to form a rule cascade when a consequent of one rule also serves as an antecedent of another rule. Attributes that play such dual roles are called intermediate attributes. Input attributes serve as antecedents and output attributes serve as consequents. The rule cascade representation is shown in Fig. 2.

From Fig. 2, the rule cascade is represented in rule form as:

Rule 1: IF A and B THEN C.
Rule 2: IF C and D THEN E.

Fig. 2 - Rule cascade representation



A and B are input attributes in Rule 1 and C is the output attribute. In Rule 2, C and D are the input attributes and E is the output attribute. Hence, C is called an intermediate attribute. The rule insertion process proceeds in two phases. The first phase parses all rules for attribute names to set up a symbol table in which each attribute has a unique entry. The second phase translates each rule into 2M-dimensional vectors A and B, where M is the total number of attributes in the symbol table. The process of rule insertion into the network gives additional knowledge that cannot be gained during the training process. These pre-existing symbolic rules can be used to initialize the network before learning. Complement coding is used only if the table contains negative attributes. After rule insertion, the network is trained with examples from the input data set. If training is sufficient, the network is tested for a specified test ratio. The network is then refined by a pruning process, which removes unused nodes, and rules are extracted from the refined network. The overall processing in the cascade ARTMAP system is shown in Fig. 3.

Rule cascade representation

A set of rules based on the database is formed such that a consequent of one rule also serves as the antecedent of another rule, i.e., a rule cascade. Attributes that play a dual role, both as input and as output, are called intermediate attributes. The input, intermediate and output attributes are all presented to both ART networks.

For example, let a database contain four input attributes and one output attribute. The rule form representation is given below.

Rule 1: If (attribute 1) and (attribute 2) then attribute 3.
Rule 2: If (attribute 3) and (attribute 4) then attribute 5.

where attributes 1, 2 and 4 are input attributes, attribute 3 is an intermediate attribute and attribute 5 is the actual output attribute.

Rule insertion

The rules formed above are inserted into the cascade ARTMAP network: if-then rules are translated into the recognition categories of the system. Each attribute is coded using the thermometer coding principle [1]. Each rule is translated into 2M-dimensional vectors, where M is the total number of attributes, as the inputs to the ARTa and ARTb modules. The algorithm derives a pair of vectors, attribute 1 = $(a, a^c)$ and attribute 2 = $(b, b^c)$, where $a^c$ and $b^c$ are the complement vectors corresponding to the respective attributes. If the inserted rules contain no negative attributes, the complement vectors may be eliminated. These vector pairs are used as training patterns to initialize the network.

Rule chaining and inferencing

In cascade ARTMAP, the attribute fields $F_1^a$ of ARTa and $F_1^b$ of ARTb are identified as working memory. $F_1^a$ maintains the current memory state $x^a$ and provides antecedents for matching and rule firing. $F_1^b$ stores the next memory state $x^b$ derived through rule firing. Each inferencing cycle has three phases (Fig. 3):

1. Match phase: A choice function $T_j^a$ for each $F_2^a$ node (rule) is calculated based on the memory state vector $x^a$.
2. Select phase: The winner of all $F_2^a$ nodes, i.e., the $F_2^a$ node with the largest choice function, is selected.
3. Execute phase: The results of the selected rule are read out into $F_1^b$. At the end of the cycle, the new memory state $x^b$ is used to update $x^a$ in $F_1^a$ to prepare for the next inferencing cycle.

Fig. 3 - Cascade ARTMAP overall processing

Learning and rule refinement

In cascade ARTMAP, a chain of rule firings is involved in making a prediction. If the prediction made by the $F_2^a$ node is correct, the weight vector $w_J^a$ is reduced towards its fuzzy intersection with the $F_1^a$ activity vector $x^a$. As the input is presented as a binary pattern, a fired rule ignores those features that are absent in the current input. This results in generalization by reducing the number of features.

When a prediction error occurs, a minimal match tracking process raises the ARTa vigilance $\rho_a$ by slightly more than the minimum match achieved by the fired rules. The rule with the worst match is most likely to be the one that causes the prediction error.

Rule extraction process [5,6]

Rule ex tracti on from the data base forms a part of knowl edge di scovery research. Neural Network is first trained with the coded attributes fro m the data base. Thi s tra ined network is then used to identi fy those attributes that are irre levant for the class ification decis ion .

Rule extracti on ex trac ts rules fro m the training database. Some inputs also get de leted due to network pr!1ning. Thi s may result in duplicati on of some rows in the database. These rows are removed be fore rules are ex tracted , as they do not contain any new information about the output class. Rule pruning procedure selects small sets of rules fo r cascade ARTMAP. Rul e ex tracti on proceeds in two stages:

Rule pruning

Pruning removes those recognition nodes whose confidence factor falls below a selected threshold. The threshold can be set according to the database accuracy obtained; the value generally used is 0.5.

The rule pruning algorithm derives a confidence factor for each $F_2^a$ category node in terms of its usage, i.e., its frequency in a training set, and its predictive accuracy on a predicting set. To calculate usage and accuracy, each $F_2^a$ category node j maintains three counters: (i) an encoding counter $C_j$ that records the number of training patterns encoded by node j; (ii) a predicting counter $P_j$ that records the number of predicting set patterns predicted by node j; and (iii) a success counter $S_j$ that records the number of predicting set patterns predicted correctly by node j.

For each training pattern, the encoding counter $C_j$ of each $F_2^a$ node j in the precursor set $\psi(J)$, where J is the last $F_2^a$ node (rule) fired that makes the prediction, is incremented by one. For each predicting set pattern, the predicting counter $P_j$ of each $F_2^a$ node j in the precursor set $\psi(J)$ is incremented by one. If the prediction is correct, the success counter $S_j$ of each $F_2^a$ node j in the precursor set $\psi(J)$ is also incremented by one. Based on the values of these three counters, the usage $U_j$ and

accuracy $A_j$ of an $F_2^a$ node j are computed. The usage $U_j$ equals the number of training set patterns coded by node j, $C_j$, divided by a normalizing factor:

$$U_j = C_j \,/\, \max\{C_j : \text{node } j \text{ predicts outcome } K\}$$

The accuracy $A_j$ equals the per cent of predicting set patterns predicted correctly by node j, here written $\hat{P}_j$, divided by a normalizing factor:

$$A_j = \hat{P}_j \,/\, \max\{\hat{P}_j : \text{node } j \text{ predicts outcome } K\}$$

where $\hat{P}_j$ is computed by

$$\hat{P}_j = S_j / P_j$$

$U_j$ and $A_j$ are then used to compute the confidence factor for each node j using

$$CF_j = \gamma U_j + (1-\gamma) A_j$$

where $CF_j$ is the confidence factor for node j and $\gamma$ is a weighting factor in [0, 1], normally set to 0.5.
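This counter bookkeeping translates directly into code. The sketch below, assuming plain Python lists of per-node counters and a hypothetical `outcome` map from nodes to the classes they predict, computes $U_j$, $A_j$ and $CF_j$ as defined above.

```python
def confidence_factors(C, P, S, outcome, gamma=0.5):
    # C[j]: encoding counter, P[j]: predicting counter,
    # S[j]: success counter, outcome[j]: class predicted by node j.
    n = len(C)
    pct = [S[j] / P[j] if P[j] else 0.0 for j in range(n)]  # S_j / P_j
    CF = []
    for j in range(n):
        # Normalize over the nodes predicting the same outcome.
        peers = [k for k in range(n) if outcome[k] == outcome[j]]
        c_max = max(C[k] for k in peers)
        p_max = max(pct[k] for k in peers)
        U = C[j] / c_max if c_max else 0.0     # usage U_j
        A = pct[j] / p_max if p_max else 0.0   # accuracy A_j
        CF.append(gamma * U + (1.0 - gamma) * A)
    return CF
```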

After the confidence factor for each node is determined, recognition categories can be pruned from the network using the following strategies.

Threshold pruning

This is the simplest type of pruning: the $F_2^a$ nodes with confidence factors below a given threshold $\tau$ are removed from the network. The value of $\tau$ is normally selected as 0.5. This is the fastest pruning method and provides a first-cut elimination of unwanted nodes. To avoid over-pruning, it is



sometimes useful to specify the minimum number of recognition categories to be preserved in the system.

Local pruning

Local pruning removes recognition categories one at a time from an ARTMAP network. The baseline system performance on the training and predicting sets is first determined. The algorithm then deletes the recognition categories with the lowest confidence factors, i.e., the hybrid system is pruned using threshold pruning first, and local pruning is then applied to the remaining smaller set of rules. A category is replaced, however, if its removal degrades system performance on the training and predicting sets.

Simulation Results

Simulation was carried out on a set of medical databases [7], i.e., the echocardiogram database.

Echocardiogram database

The database gives a data set of 113 persons who have suffered a heart attack at some point in the past. Some are still alive and some are not. The two input attributes "survival months" and "still alive", when taken together, indicate whether a patient survived for at least one year following the heart attack. This data set is used to predict whether a patient will survive (ALIVE) or not (DEAD) for at least one year after the heart attack.

The database contains 9 input attributes and 2 output attributes. The initial 8-rule base knowledge distinguishes between persons alive and dead. Input attributes (9): survival months, still alive, age, pericardial fluid, fractional shortening, Epss, Lvdd, wall-motion score, wall-motion index. Output attributes (2): alive, dead. The initial rule base set is shown in Table 1.

The first stage of operation is the rule insertion stage. This stage sets up a symbol table which has a unique entry for each attribute. Initially, the rule base set is inserted into the network by coding the attributes as shown in Table 2.

Thermometer-type coding is used for input presentation [1,8]. After inserting the rules, the network is trained using different train-to-test ratios. The simulation results showed that, with all rules inserted or fewer rules inserted, the network attains 100% test set prediction for the maximum training set in one iteration, as indicated in Tables 3a and 3b.

From the simulation results shown in Table 3b, it is seen that with training samples fewer than 50% the network attains more than 80% test set prediction. With a small further increase in the training samples, the network reaches 100% test set prediction.

The simulated results for the echocardiogram samples for two different rule insertions are plotted in Graph 1 and Graph 2.

With $\rho_a$ set to 0.5, a limited number of nodes is created and the network attains 100% accuracy. When $\rho_a$ is selected smaller, many output classes combine into a single class due to the smaller number of nodes; as a result, the test set prediction is lower. If $\rho_a$ is increased, identical output attributes select different nodes, and the accuracy of the test set prediction thereby increases. The simulation is carried out with 50 training samples as shown in Table 4. When the value

Table 1 - Sample of rule base inserted into the network for echocardiogram database

R1: If 0≤Sm<22 and Still alive is 1 and age<40 and Fluid is 0 and 0≤Fs<0.23 and 11≤Epss<31 and 3≤Lvdd<5 and 8≤Wms<15 and 1≤Wmi<2 then alive
R2: If 0≤Sm<22 and Still alive is 0 and age<40 and Fluid is 0 and 0.23≤Fs≤0.6 and 0≤Epss<11 and 3≤Lvdd<5 and 8≤Wms<15 and 1≤Wmi<2 then dead
R3: If 0≤Sm<22 and Still alive is 1 and 40≤age<60 and Fluid is 1 and 0≤Fs<0.23 and 11≤Epss<31 and 5≤Lvdd≤7 and 15≤Wms≤40 and 2≤Wmi≤3 then alive
R4: If 0≤Sm<22 and Still alive is 1 and 40≤age<60 and Fluid is 0 and 0≤Fs<0.23 and 11≤Epss<31 and 3≤Lvdd<5 and 8≤Wms<15 and 1≤Wmi<2 then alive
R5: If 22≤Sm<58 and Still alive is 0 and age<40 and Fluid is 0 and 0.23≤Fs≤0.6 and 0≤Epss<11 and 3≤Lvdd<5 and 8≤Wms<15 and 1≤Wmi<2 then dead
R6: If 22≤Sm<58 and Still alive is 1 and age<40 and Fluid is 0 and 0.23≤Fs≤0.6 and 0≤Epss<11 and 5≤Lvdd≤7 and 15≤Wms≤40 and 2≤Wmi≤3 then dead
R7: If 22≤Sm<58 and Still alive is 0 and 60≤age≤90 and Fluid is 0 and 0.23≤Fs≤0.6 and 0≤Epss<11 and 3≤Lvdd<5 and 8≤Wms<15 and 1≤Wmi<2 then dead
R8: If 0≤Sm<22 and Still alive is 0 and 40≤age<60 and Fluid is 1 and 0.23≤Fs≤0.6 and 0≤Epss<11 and 5≤Lvdd≤7 and 15≤Wms≤40 and 2≤Wmi≤3 then dead



of $\rho_a$ is low, the number of nodes is small, so output classes combine together, reducing the test set accuracy. With a high value of $\rho_a$, the number of nodes increases and the same classes are associated with different nodes, thereby increasing the efficiency of the test set prediction. The plots of accuracy and of nodes created versus vigilance are shown in Graph 3 and Graph 4.

After testing, pruning is done to reduce the unused nodes in the $F_2^a$ layer, which does not affect the classification accuracy. Pruning thus gives a limited number of nodes, removing unwanted nodes from the network and thereby reducing the complex network to a simple one. The rules extracted from the pruned network are listed in Table 5.

From Table 5, it is seen that Rule 1 is a generalization over the missing rule R6, as the

Table 2 - The coding of input attributes for echocardiogram data samples

S.No.  Attribute name           Number of bits  Interval
1.     Survival months          2               (0-22), (22-57)
2.     Still alive              2               0, 1
3.     Age                      3               <40, 40-60, >60
4.     Pericardial fluid        2               0, 1
5.     Fractional shortening    2               (0-0.23), (0.23-0.6)
6.     EPSS                     2               (0-11), >11
7.     LVDD                     2               (3-5), (5-7)
8.     Wall-motion score        2               (8-15), (15-40)
9.     Wall-motion index        2               (1-2), (2-3)

Graph 1 - Training samples versus test set accuracy for echocardiogram data with 8 rules inserted

Graph 2 - Training samples versus test set accuracy for echocardiogram data with 4 rules inserted

Table 3a - Simulated results for echocardiogram data set with 8 rules inserted (ρa = 0.5, one training iteration)

S.No.  Train/Test set  % Test set prediction  Nodes/Rules
1.     0/113           30.9725                5/2
2.     25/88           76.1364                6/3
3.     50/63           80.9524                8/4
4.     75/38           84.2105                8/4
5.     100/13          100                    10/5

Table 3b - Simulated results for echocardiogram data set with 4 rules inserted (ρa = 0.5, one training iteration)

S.No.  Train/Test set  % Test set prediction  Nodes/Rules
1.     0/113           56.6372                5/2
2.     25/88           68.2540                5/3
3.     50/63           76.1364                7/3
4.     75/38           76.3158                9/3
5.     100/13          100                    11/4

Table 4 - Variation of test set prediction and nodes with vigilance parameter for echo samples

S.No.  Training samples/ρa  Test set prediction (%)  Nodes created
1.     50/0.5               79.3651                  8
2.     50/0.7               93.6308                  13

Graph 3 - Variation of accuracy with increase in vigilance parameter for echocardiogram dataset

Graph 4 - Creation of nodes in the F2a layer on increase in vigilance parameter for echocardiogram data samples



attributes in the rule are sufficient to identify the output class. From the simulation carried out using the cascade ARTMAP architecture, the network training time is low, as it attains 100% prediction within a single iteration. The accuracy also reaches 100 per cent, which the other neural network architectures considered here do not match.

The simulation results for the cascade ARTMAP architecture with rule insertion, with the input data converted to fuzzy membership values between 0 and 1 for the heart disease training samples, are shown in Table 6.

According to row 1 in Table 6, the interpretation of the rule is shown below:

If age is very low to very high and sex is very low to very high and CP is very low and TBPS is very low to very high and Chol is very low to medium and BS is very low and Recg is very low to very high and Thalac is very low to medium and exang is very low and Old pk is very low to high and slope is very low to high and Ca is very low to very high and Thal is very low, then the heart may be healthy.

Table 5 - Extracted rules from echocardiogram data set

Rule 1: If 0<SM<22 and Still Alive is 0 and AGE<90 and Fluid is 0 and 0≤Short<0.23 and 3<LVDD<7 and 8≤WS<40 and 1<WI<3 then Dead
Rule 2: If 0<SM<22 and Still Alive is 1 and 40≤AGE<90 and Fluid is 0 and 0≤Short<0.23 and 3<LVDD<7 and 8≤WS<40 and 1<WI<3 then Alive
Rule 3: If 0<SM<22 and Still Alive is 1 and AGE<90 and Fluid is 1 and 0≤Short<0.23 and 3<LVDD<7 and 8≤WS<40 and 1<WI<3 then Alive
Rule 4: If 22≤SM<57 and Still Alive is 0 and 40≤AGE<90 and Fluid is 0 and 0.23≤Short<0.6 and 0≤EPSS<11 and 3<LVDD<7 and 8≤WS<40 and 1<WI<3 then Dead

Fuzzy ARTMAP Architecture [9,10]

Fuzzy ARTMAP, a generalization of binary ARTMAP, learns to classify inputs by a pattern of fuzzy membership values between zero and one, indicating the extent to which each feature is present. The fuzzy ARTMAP system incorporates two fuzzy ART modules, ARTa and ARTb, that are linked together via an inter-ART module $F^{ab}$ called the map field. The map field forms predictive associations between ARTa and ARTb categories. In classification tasks, each node in the ARTa field $F_2^a$ codes a recognition category of ARTa input patterns. During training, each such node learns to predict an ARTb category, as shown in Fig. 4. The interactions mediated by the map field $F^{ab}$ are characterized as follows.

Inputs to ARTa and ARTb are in complement-coded form: for ARTa, $I = A = (a, a^c)$, and for ARTb, $I = B = (b, b^c)$. This process of complement coding is a form of normalization. Normalization of fuzzy ART inputs prevents category proliferation; it uses on-cells and off-cells to represent the input pattern and preserves individual feature amplitudes while normalizing the total on-cell/off-cell vector. The complement-coded $F_0$ to $F_1$ input I is the 2M-dimensional vector

$$I = A = (a, a^c) = (a_1, a_2, \ldots, a_M, a_1^c, \ldots, a_M^c), \qquad a_i^c = 1 - a_i$$

A complement-coded input is automatically normalized because

$$|I| = |(a, a^c)| = \sum_{i=1}^{M} a_i + \left(M - \sum_{i=1}^{M} a_i\right) = M$$
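This normalization property is easy to verify numerically; a short check, assuming NumPy and features already scaled to [0, 1], is shown below.

```python
import numpy as np

a = np.array([0.2, 0.7, 0.5])        # M = 3 features in [0, 1]
I = np.concatenate([a, 1.0 - a])     # complement coding: I = (a, a^c)
assert np.isclose(np.abs(I).sum(), a.size)   # |I| = M for any a
```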

For ARTa, $x^a$ denotes the $F_1^a$ output vector, $y^a$ denotes the $F_2^a$ output vector, and $w_j^a$ denotes the jth ARTa weight vector. For ARTb, $x^b$ denotes the $F_1^b$ output vector, $y^b$ denotes the $F_2^b$ output vector, and $w_k^b$ denotes the kth ARTb weight vector. For the map field, $x^{ab}$ denotes the $F^{ab}$ output vector and $w_j^{ab}$ denotes the weight vector from the jth $F_2^a$ node to $F^{ab}$.

Table 6 - Sample set of 5 rules extracted from heart disease samples
[Interpretation of quantized weight values: 1 - very low, 2 - low, 3 - medium, 4 - high, 5 - very high]

Pred.  Age  Sex  CP   TBPS  Chol  BS   Recg  Thalac  Exang  Old pk  Slope  Ca   Thal
+      1-5  1-5  1-1  1-5   1-3   1-1  1-5   1-3     1-1    1-4     1-4    1-5  1-1
+      1-4  1-1  1-1  1-4   1-4   1-5  1-1   1-3     1-5    1-5     1-4    1-5  1-1
-      1-4  1-1  1-5  1-5   1-1   1-5  1-1   1-4     1-1    1-5     1-4    1-5  1-1
-      1-1  1-1  1-1  1-5   1-5   1-5  1-5   1-4     1-4    1-5     1-4    1-1  1-1
-      1-4  1-1  1-1  1-4   1-4   1-5  1-1   1-3     1-5    1-5     1-4    1-5  1-



Fig. 4 - Fuzzy ARTMAP processing

ART Field Activity Vectors

Each ART system includes a field $F_0$ of nodes that represents a current input vector, and a field $F_1$ that receives both bottom-up input from $F_0$ and top-down input from a field $F_2$ that represents the active code or category. Vector I denotes the $F_0$ activity, vector x the $F_1$ activity, and vector y the $F_2$ activity. Associated with each $F_2$ category node j is a vector $w_j$ of adaptive weights, or long-term memory (LTM) traces.

Initially, $w_{j1}(0) = \ldots = w_{jM}(0) = 1$, which means that each category is uncommitted. After a category codes its first input, it becomes committed. The system has a choice parameter $\alpha > 0$, a learning rate parameter $\beta \in [0,1]$ and a vigilance parameter $\rho \in [0,1]$. For each input I and $F_2$ node j, the choice function $T_j$ is defined by

$$T_j(I) = \frac{|I \wedge w_j|}{\alpha + |w_j|}$$

where the fuzzy intersection $\wedge$ is defined by $(p \wedge q)_i = \min(p_i, q_i)$ and the norm $|\cdot|$ is defined by $|p| = \sum_{i=1}^{M} |p_i|$.

The system makes a category choice when at most one $F_2$ node can become active at a given time. The index J denotes the chosen category, where $T_J = \max\{T_j : \text{for all } F_2 \text{ nodes } j\}$. If more than one $T_j$ is maximal, the $F_2$ category with the smallest index is selected. When the Jth category is chosen, $y_J = 1$ and $y_j = 0$ for $j \neq J$.

Resonance then occurs if the match function $m_J$ of node J meets the vigilance criterion

$$m_J = \frac{|I \wedge w_J|}{|I|} \geq \rho$$

If this criterion is not satisfied, a mismatch reset occurs, and the value of the choice function $T_J$ is set to 0 for the duration of the input presentation. The search process continues until a chosen category J satisfies the matching criterion. Once the search ends, the weight vector $w_J$ learns according to the equation

$$w_J^{(\text{new})} = (1-\beta)\, w_J^{(\text{old})} + \beta (I \wedge w_J^{(\text{old})})$$

Map field activation

The map field is activated whenever one of the ARTa or ARTb categories is active. The $F^{ab}$ output vector $x^{ab}$ obeys

$$x^{ab} = \begin{cases} y^b \wedge w_J^{ab} & \text{if the } J\text{th } F_2^a \text{ node is active and } F_2^b \text{ is active} \\ w_J^{ab} & \text{if the } J\text{th } F_2^a \text{ node is active and } F_2^b \text{ is inactive} \\ y^b & \text{if } F_2^a \text{ is inactive and } F_2^b \text{ is active} \\ 0 & \text{if } F_2^a \text{ is inactive and } F_2^b \text{ is inactive} \end{cases}$$

Thus, if the prediction made by $w_J^{ab}$ is disconfirmed by $y^b$, then $x^{ab} = 0$. This mismatch triggers an ARTa search for a better category through the match tracking process listed below.
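The four cases can be written as a small dispatch function; a sketch assuming NumPy follows, with argument names of our own choosing.

```python
import numpy as np

def map_field_output(y_b, w_J_ab, a_active, b_active):
    # x^ab for the four activation cases described above.
    if a_active and b_active:
        return np.minimum(y_b, w_J_ab)   # y^b ^ w_J^ab
    if a_active:
        return w_J_ab                    # prediction read out alone
    if b_active:
        return y_b                       # target category alone
    return np.zeros_like(y_b)            # neither module active

# A disconfirmed prediction makes the minimum all-zero, so
# |x^ab| = 0 falls below rho_ab * |y^b| and match tracking starts.
```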

Match tracking

At the beginning of each input presentation, $\rho_a$ is set to a baseline vigilance $\bar{\rho}_a$. Let $\rho_{ab}$ denote the map field vigilance. If

$$|x^{ab}| < \rho_{ab}\, |y^b|$$

then $\rho_a$ is increased to slightly greater than

$$\frac{|A \wedge w_J^a|}{|A|}$$

where A is the input to $F_1^a$ in complement-coded form $(a, a^c)$. This search leads to the activation of another $F_2^a$ node J with

$$\frac{|A \wedge w_J^a|}{|A|} \geq \rho_a \quad \text{and} \quad \frac{|y^b \wedge w_J^{ab}|}{|y^b|} \geq \rho_{ab}$$

If no such node exists, $F_2^a$ is shut down for the rest of the input presentation.

Map field learning

Initially, the weights $w_{jk}^{ab}$ from $F_2^a$ to $F^{ab}$ satisfy $w_{jk}^{ab}(0) = 1$. During resonance with ARTa category J active, $w_J^{ab}$ approaches the map field vector $x^{ab}$. With fast learning, once J learns to predict the ARTb category K, that association is permanent, i.e., $w_{JK}^{ab} = 1$ for all time.

Rule Extraction Procedure [5,6]

Rule extraction in fuzzy ARTMAP proceeds in two phases:



Quantization of weight values

Quantization is used to describe the rules in words rather than real numbers; for this, the feature values represented by the weights $w_{ij}$ are quantized.

A quantization level Q is defined as the number of feature values used in the extracted fuzzy rules. For example, with Q = 3 the feature values are described as low, medium and high in the fuzzy rules. There are two methods of quantization.

(i) Quantization by truncation: Divide the range [0, 1] into Q intervals and assign a quantization point to the lower bound of each interval, i.e., for q = 1, ..., Q, let

$$V_q = (q-1)/Q$$

When a weight w falls in interval q, the value of w is reduced to $V_q$, as shown in Fig. 5a.

(ii) Quantization by round-off: In this method, the Q quantization points are evenly distributed in the range [0, 1], with one at each end point, i.e., for q = 1, ..., Q,

$$V_q = (q-1)/(Q-1)$$

When a weight w falls in interval q, the weight value w is rounded off to the nearest value $V_q$, as shown in Fig. 5b.
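Both quantization rules are one-liners; a sketch for a single weight, with Q and the example value chosen by us, is given below.

```python
def quantize_truncate(w, Q):
    # V_q = (q-1)/Q: lower a weight to the lower bound of its interval.
    q = min(int(w * Q), Q - 1)     # interval index 0 .. Q-1
    return q / Q

def quantize_round(w, Q):
    # V_q = (q-1)/(Q-1): move a weight to the nearest of Q points
    # spread evenly over [0, 1], one at each end.
    step = 1.0 / (Q - 1)
    return round(w / step) * step

print(quantize_truncate(0.52, 5))  # 0.4
print(quantize_round(0.52, 5))     # 0.5
```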

In earlier simulations, the results obtained by the two methods were similar; hence, for the simulations reported here, quantization by truncation is used for the two data sets taken.

Fuzzy ARTMAP results

Echocardiogram data

This data set contains 113 samples, of which 30 samples indicate that a person was alive for at least one year following the heart attack and 83 samples indicate that a person was not alive one year following the attack. The data set partitions used are 72/40 and 40/40/33.

Fig. 5a - Truncation process

Table 7 shows the increase in the accuracy of the test set prediction as the number of training epochs is increased. The value of $\alpha$ is set to 0.01, a small value so that learning is fast. $\beta$ is set to 1. Initially, the vigilance parameter is set to the low value 0; it is then automatically increased during match tracking. If the vigilance is initially set to a large value, the number of nodes created for smaller training sets is high. The input data are converted into fuzzy membership values between 0 and 1 based on the pi function [11].

Table 8 indicates that pruning yields 72.5% training set prediction, 72.5% predicting set prediction and 62.5% test set prediction; with quantization (Q=5) the performance degrades slightly. The performance is tolerable with pruning alone, which gives better accuracy than the additional process of quantization.

Fuzzy ARTMAP thus takes a larger training time, about 3-5 epochs, to attain its maximum accuracy of 91.66%, which is lower than that of cascade ARTMAP, and its training time is also higher. But compared with the BP approach, the training time is much smaller, the overall execution time for the samples is also much smaller, and the rules extracted from the network are much simpler.

The performance of fuzzy ARTMAP for an increasing number of epochs is shown in Graph 5.

Table 9 shows the quantized weights for the echocardiogram samples taken. According to this table, row 3 can be translated into rule form as below:

If SM is very low to very high and SA is very high and age is very low to very high and fluid is very low to very high and short is low to very high and EPSS is very low to medium and LVDD is low to very high and WMS is low to high, then the person is likely to be dead.

Fig. 5b - Round-off process. Arrows indicate the direction of quantization.



Kohonen Self-Organizing Maps [12-14]

The architecture and algorithm described below are used to cluster a set of p n-valued vectors $x = (x_1, \ldots, x_i, \ldots, x_n)$ into m clusters. The connection weights do not multiply the signal sent from the input units to the cluster units. The architecture of the Kohonen self-organizing map is shown in Fig. 6.

The architecture shown has n input nodes and m output nodes. Each of the n input nodes has an individual weight vector connecting it to the m output nodes, as in Fig. 6. Neighborhoods of radius R = 2, 1 and 0 are shown in Fig. 7. The winning unit is indicated by '#' and the other units are denoted by '*'.

Kohonen Algorithm [13,14]

Unsupervised Kohonen algorithm

Kohonen's algorithm is based upon an unsupervised learning technique. Once the network is trained, application of an input vector from a given class will produce excitation levels in each output neuron; the neuron with the maximum excitation represents the classification. As the training is performed without a target vector, there is no way to predict, prior to training, which neuron will be associated with a given class of input vectors. However, this mapping is easily found by testing the network after it is trained.

Table 7 - Variation of test set prediction on increasing the number of epochs

S.No.  No. of epochs  % Test prediction
1.     1              81.994
2.     2              87.5
3.     3              91.6667
4.     5              91.6667

Table 8 - Results obtained for the partitioned data sets

S.No.  System          Data set    Training      Prediction    Test
                       partition   accuracy (%)  accuracy (%)  accuracy (%)
1.     Fuzzy ARTMAP    72/40       100           -             81.994
2.     Pruning         40/40/33    72.5          72.5          62.5
3.     Pruning + Q=5   40/40/33    62.5          62.5          59.38

The steps used in the Kohonen SOM architecture are described below; a compact code sketch follows the list.

Step 1: (i) Initialize the weights $w_{ij}$. (ii) Set the topological neighborhood parameters. (iii) Set the learning rate parameters.
Step 2: While the stopping condition is false, do the steps below.
Step 3: For each input vector x, do steps 4-6.
Step 4: For each j, compute $D(j) = \sum_i (w_{ij} - x_i)^2$.
Step 5: Find the index J such that the distance D(J) is minimum.
Step 6: For all units j within a specified neighborhood of J, and for all i, update the weights as
$$w_{ij}^{(\text{new})} = w_{ij}^{(\text{old})} + \alpha (x_i - w_{ij}^{(\text{old})})$$
Step 7: Update the learning rate.
Step 8: Reduce the radius of the topological neighborhood at specified times.
Step 9: Test for the stopping condition.

The learning rate $\alpha$ is a slowly decreasing function of time. As the network is trained, the values of D and $\alpha$ are gradually reduced. Kohonen recommends that $\alpha$ should start near 1 and go down to 0.1, while D starts out as large as the greatest distance between weights and ends up so small that only one neuron is trained.

Graph 5 - Variation of test set accuracy with number of epochs

Fig. 6 - Kohonen architecture

Table 9 - Sample set of 4 rules extracted by pruning and quantization (Q=5) of echocardiogram samples

Pred.  SM   SA  Age  Cardial  Fractional  EPSS  LVDD  WMS  Usage  Accuracy  CF
                     fluid    short
+      1-4  5   1-2  1-5      2-5         2-5   1-4   3-5  0.125  1         0.5625
+      1-5  5   1-2  1-5      1-5         1-5   2-5   1-5  1      0.8       0.525
-      1-5  5   1-5  1-5      2-5         1-3   2-5   2-4  0.5    0.3       0.5875
-      1-2  5   1-5  1-5      1-3         1-5   1-5   2-4  0.5    0.3       0.65




Up to a point, the classification accuracy improves with additional training. The training algorithm adjusts the weight vectors in the vicinity of the winning neuron to be more like the input vector, moving the cluster of nearby weight points closer to the input vector point. It is assumed that the input vectors are actually clustered into classes of similar vectors. A specific class will tend to control a specific neuron, rotating its weight vector towards the center of the class and making it more likely to be the winner when any member of that class is applied to the input.

After training, classification is performed by applying an arbitrary vector, calculating the excitation produced in each neuron, and then selecting the neuron with the highest excitation as the indicator of the correct classification.

Supervised Kohonen algorithm

The unsupervised Kohonen algorithm is modified into a supervised one by using learning vector quantization (LVQ). The motivation behind the LVQ algorithm is to find the output unit that is closest to the input vector: if x and w belong to the same class, the weights are moved towards the new input vector; if x and w belong to different classes, the weights are moved away from the input vector. The steps used in the algorithm are given below.

The nomenclature is as follows: x is a training vector $(x_1, x_2, \ldots, x_n)$; T is the correct class for the training vector; $C_j$ is the class represented by the jth output unit.

Step 1: (i) Initialize the weights $w_{ij}$. (ii) Set the topological neighborhood parameters. (iii) Set the learning rate parameters.
Step 2: While the stopping condition is false, do the steps below.
Step 3: For each input vector x, do steps 4-6.
Step 4: For each j, compute $D(j) = \sum_i (w_{ij} - x_i)^2$.
Step 5: Find the index J such that the distance D(J) is minimum.
Step 6: For all units j within a specified neighborhood of J, and for all i, update the weights as follows (see the sketch after this list):
if $T = C_J$ then
$$w_{ij}^{(\text{new})} = w_{ij}^{(\text{old})} + \alpha (x_i - w_{ij}^{(\text{old})})$$
if $T \neq C_J$ then
$$w_{ij}^{(\text{new})} = w_{ij}^{(\text{old})} - \alpha (x_i - w_{ij}^{(\text{old})})$$
Step 7: Update the learning rate.
Step 8: Reduce the radius of the topological neighborhood at specified times.
Step 9: Test for the stopping condition.

The simplest method of initializing the weights is to take the first m training vectors and use them as weight vectors; the remaining vectors are then used for training. This method supervises the classes and sets the best choice for the specified output class.
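The LVQ update differs from the unsupervised rule only in the sign of the correction; a minimal sketch assuming NumPy is shown below, with the epoch-wise halving of the learning rate as reported in Table 12. All names are illustrative.

```python
import numpy as np

def train_lvq(X, T, W, classes, epochs=4, alpha=0.45):
    # X: training vectors; T: their class labels; W: initial weight
    # vectors (e.g. the first m training vectors); classes[j]: class
    # label of output unit j.
    for _ in range(epochs):
        for x, t in zip(X, T):
            D = ((W - x) ** 2).sum(axis=1)   # squared distances
            J = int(D.argmin())              # closest output unit
            if classes[J] == t:
                W[J] += alpha * (x - W[J])   # same class: attract
            else:
                W[J] -= alpha * (x - W[J])   # different class: repel
        alpha *= 0.5                         # halve rate per epoch
    return W
```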

Table 10 - Test set prediction for echocardiogram data samples

Training samples  % Test set prediction  Max. iterations
113               100                    4

Table 11 - Rules extracted from the network for echocardiogram data set

Rule 1: If 0<SM<22 and Still Alive is 1 and 60≤AGE<90 and Fluid is 0 and 0≤Short<0.23 and 11≤EPSS<31 and 5<LVDD<7 and 15≤WS<40 and 2<WI<3 then Alive
Rule 2: If 0<SM<22 and Still Alive is 0 and 60≤AGE<90 and Fluid is 1 and 0.23≤Short<0.6 and 11≤EPSS<31 and 5<LVDD<7 and 15≤WS<40 and 1<WI<2 then Dead
Rule 3: If 0<SM<22 and Still Alive is 1 and 60≤AGE<90 and Fluid is 0 and 0.23≤Short<0.6 and 0≤EPSS<11 and 3<LVDD<5 and 15≤WS<40 and 1<WI<2 then Alive
Rule 4: If 0≤SM<22 and Still Alive is 0 and 40≤AGE<60 and Fluid is 0 and 0≤Short<0.23 and 11≤EPSS<31 and 3<LVDD<5 and 15≤WS<40 and 1<WI<2 then Dead
Rule 5: If 22≤SM<57 and Still Alive is 0 and 40≤AGE<60 and Fluid is 0 and 0.23≤Short<0.6 and 0≤EPSS<11 and 3<LVDD<5 and 8≤WS<15 and 1<WI<2 then Dead

Table 12 - Performance of LVQ algorithm for echocardiogram data samples

Training epochs  % Test set prediction  Learning rate
1                56.6372                0.45
2                95.5752                0.225
3                100                    0.1125
4                100                    0.05625



Graph 6 - Variation of test set accuracy with increase in number of epochs in supervised Kohonen architecture for echocardiogram samples

Kohonen SOM results: Echocardiogram data

Unsupervised Kohonen architecture

The performance of the unsupervised Kohonen architecture for the echocardiogram samples is given in Table 10, with the learning rate halved after every epoch, $\alpha(t+1) = 0.5\,\alpha(t)$, decaying towards 0. With all the data samples trained, with a maximum of 4 iterations, the test set prediction is about 100%.

The rules extracted are in IF-THEN form and are extracted after pruning. The rules extracted from the network are shown in Table 11.

Supervised Kohonen architecture

With the supervised architecture, the network reaches 100% accuracy within 2-3 epochs. The neighborhood radius R is initially set to 3 and is gradually reduced, while the learning rate is reduced from 0.9 to 0.1 in steps of half its previous value. If the learning rate is reduced more slowly, the network takes longer to reach its maximum prediction accuracy. The performance of the supervised Kohonen architecture using learning vector quantization (LVQ) is shown in Table 12; a sketch of this decay schedule follows.
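The following small Python sketch reproduces the decay schedule just described: the radius starts at 3 and the learning rate is halved from 0.9 until it falls below 0.1. The exact timing of the radius reduction is an assumption on our part, since the algorithm only says it is reduced "at specified times"; note that the printed rates match the learning-rate column of Table 12.

```python
# Decay schedule sketch: alpha halves each epoch (0.45, 0.225, 0.1125, 0.05625),
# matching Table 12; shrinking R by 1 per epoch is an assumed choice.
alpha, R = 0.9, 3
epoch = 0
while alpha >= 0.1:
    epoch += 1
    # ... one LVQ training pass over the samples would go here ...
    alpha *= 0.5               # halve the learning rate after each epoch
    R = max(0, R - 1)          # reduce the neighborhood radius
    print(f"epoch {epoch}: alpha={alpha:.5f}, R={R}")
```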

The variation of accuracy with increase in the number of epochs is shown in Graph 6. As the number of epochs increases, the test set prediction accuracy of the network also increases.

The rules extracted by the LVQ method for the echocardiogram samples are shown in Table 13.

Table 14 shows the performance comparison of the three methods used for simulation.

As seen from Table 14, it is clear that Cascade ARTMAP performs comparatively better than the other systems with respect to training time and accuracy. When intermediate attributes are present in the dataset, the Cascade ARTMAP architecture performs well. The rules extracted from the Cascade ARTMAP network are also very clear and easier to visualize. The graphs were simulated using MATLAB for various parameter values.

Table 13 — The rules extracted from the supervised Kohonen architecture

Rule 1: If 22 <= SM < 57 and Still Alive is 0 and 40 <= AGE < 60 and Fluid is 0 and 0.23 <= Short < 0.6 and 11 <= EPSS < 31 and 3 < LVDD < 5 and 15 <= WS < 40 and 1 < WI < 2, then Dead.

Rule 2: If 0 < SM < 22 and Still Alive is 1 and 40 <= AGE < 60 and Fluid is 0 and 0 <= Short < 0.23 and 11 <= EPSS < 31 and 5 < LVDD < 7 and 15 <= WS < 40 and 2 < WI < 3, then Alive.

Rule 3: If 0 < SM < 22 and Still Alive is 0 and 60 <= AGE < 90 and Fluid is 0 and 0 <= Short < 0.23 and 0 <= EPSS < 11 and 5 < LVDD < 7 and 15 <= WS < 40 and 1 < WI < 2, then Dead.

Rule 4: If 0 < SM < 22 and Still Alive is 1 and 60 <= AGE < 90 and Fluid is 0 and 0 <= Short < 0.23 and 11 <= EPSS < 31 and 3 < LVDD < 5 and 15 <= WS < 40 and 2 < WI < 3, then Alive.

Table 14 — Performance comparison for echocardiogram data samples

Systems                 Max. training    Max. % test set    Max. rules extracted
                        iterations       prediction         after pruning
Cascade ARTMAP          1                100                5
Fuzzy ARTMAP            3                91.6667            5
Kohonen architecture    3                100                2

Conclusion

Data mining is defined as the process of discovering meaningful new correlations, patterns, and trends by digging into large amounts of data stored in warehouses, using artificial intelligence, statistical and mathematical techniques. Industries that are already taking advantage of data mining include the medical, manufacturing, web, communications, aerospace and chemical industries. With the wide use of data mining software, the need for prediction of ECG data is clearly emphasized. The ARTMAP system is designed to conjointly maximize generalization and minimize predictive error under fast learning conditions, in response to an arbitrary ordering of input patterns. This system learns to make accurate predictions quickly, in the sense of using relatively little computing time; efficiently, in the sense of using relatively few training trials; and flexibly, in the sense that its stable learning permits continuous new learning on one or more databases, without eroding prior knowledge, until the full memory capacity of the network is exhausted.

In general, BP neural networks learn the classification rules by many passes over the training data set, so the learning time of a neural network is long, and the available domain knowledge is rather difficult to incorporate into the network. The percentage of test set prediction in Cascade ARTMAP is higher, owing to the fact that the network is initialized by inserting prior knowledge (the rule insertion process). Training time is also reduced in comparison with Fuzzy ARTMAP and the Kohonen architecture. The rules extracted from the Cascade ARTMAP network are also much simpler and cleaner to understand, whereas rule extraction in BPN is a tedious process. Finally, the extracted rules were validated. As the simulations were done on medical diagnosis data, the rules can be used by medical experts for classification.

An in-depth analysis of the database has been done with variation of the vigilance parameter, the number of nodes and the total number of training samples, compared to the conventional approaches (refs 3, 4, 7, 14).

Comparing the three systems, Cascade ARTMAP works well with a training time of only one epoch. Its overall execution time is also comparatively less than the time taken by the Fuzzy ARTMAP system and the Kohonen architecture. Whenever the database is such that intermediate attributes are present, the Cascade ARTMAP structure works well. Thus, among the three systems used, Cascade ARTMAP works best on medical databases, and its results can be accepted by medical experts.

Based on the simulations done, the Cascade ARTMAP system works comparatively better with respect to training time and accuracy. For better performance in Fuzzy ARTMAP, different membership functions may be used for the calculation of fuzzy membership values, so that the predictive accuracy may be increased. For better results in the Kohonen architecture, the extensions of LVQ, namely LVQ2 and LVQ3, can be used. Furthermore, to make the unsupervised ART1 system supervised, it can be combined with a standard back propagation network to improve efficiency and training time; the Kohonen architecture can likewise be combined with BPN to get better results. The selection of the vigilance parameter can also be done by a genetic algorithm. This implementation work is currently in progress.

References

1 Hongjun Lu, Rudy Setiono & Huan Liu, IEEE Trans Knowledge Data Eng, 8 (1996).
2 Agrawal R, Imielinski T & Swami A, IEEE Trans Knowledge Data Eng, 5 (1993) 914.
3 Tan Ah Hwee, IEEE Trans Neural Networks, 8 (1997) 237.
4 Carpenter Gail A, Grossberg Stephen & Reynolds John H, Neural Networks, 4 (1991) 565.
5 Carpenter Gail A & Tan A H, in Proc World Congress on Neural Networks, 1 (1993) 501.
6 Hayashi Yoichi, Adv Neural Inform Process Syst, 3 (1990) 578.
7 UCI Repository of Machine Learning Databases (machine-readable data repository), ftp://ftp.ics.uci.edu/pub/machine-learning-databases.
8 Lu H, Setiono R & Liu H, NeuroRule: A connectionist approach to data mining, in Proc VLDB '95 (1995) 478.
9 Carpenter Gail A, Grossberg S, Markuzon N, Reynolds J H & Rosen D B, IEEE Trans Neural Networks, 3 (1992) 698.
10 Carpenter Gail A, Grossberg S & Rosen D B, Neural Networks, 4 (1991) 759.
11 Mitra Sushmita, De Rajat K & Pal Sankar K, IEEE Trans Neural Networks, 8 (1997) 1338.
12 Fausett Laurene, Fundamentals of Neural Networks (Prentice Hall Inc, New York), 1994.
13 Kohonen T, The self-organising map, Proc IEEE, 78 (1990) 1464.
14 Kangas J A, Kohonen T & Laaksonen J M, IEEE Trans Neural Networks, 1 (1990) 93.
15 Sumathi S & Priyadarshini Balachandar, Self-organised neural network schemes: As a data mining tool, Project Report, PSG College of Technology, Coimbatore, 1999.
16 Sumathi S & Sivanandam S N, An improved self-organised map for data mining of heart disease problem, in 30th Mid Term Symp on Data Warehousing and Data Mining Applications, CEERI, Pilani (Rajasthan), 1999.
17 Sumathi S, Sivanandam S N & Rajesh S P, Neural techniques for data mining, in Int Conf on Advances in Computing, ADCOMP '98, Pune, 1998.
18 Sumathi S, Sivanandam S N & Jagadeeswari, Data mining of heart disease data using neural computations, in 5th Int Mendel Conf on Soft Computing, Brno, Czech Republic, 1999.
19 Sumathi S & Jagadeeswari, Self-organised neural network schemes for data mining, Project Report, PSG College of Technology, Coimbatore, 1999.
20 Sumathi S & Sivanandam S N, Design and development of SOM models for data mining applications, IETE J Res (accepted).