Research Article

Hybrid Modified K-Means with C4.5 for Intrusion Detection Systems in Multiagent Systems

Wathiq Laftah Al-Yaseen,1,2 Zulaiha Ali Othman,1 and Mohd Zakree Ahmad Nazri1

1 Data Mining and Optimization Research Group (DMO), Centre for Artificial Intelligence Technology (CAIT), School of Computer Science, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia (UKM), 43600 Bandar Baru Bangi, Malaysia
2 Al-Furat Al-Awsat Technical University, Iraq

Correspondence should be addressed to Wathiq Laftah Al-Yaseen banenwathiqyahoocom

Received 21 April 2015; Accepted 2 June 2015

Academic Editor: Nirupam Chakraborti

Copyright © 2015 Wathiq Laftah Al-Yaseen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Presently, the processing time and performance of intrusion detection systems are of great importance due to the increasing speed of network traffic and a growing number of attacks on networks and computers. Several approaches have been proposed to address this issue, including hybridizing several algorithms. This paper proposes a hybrid of modified K-means with C4.5 as an intrusion detection system in a multiagent system (MAS-IDS). The MAS-IDS consists of three agents, namely, coordinator, analysis, and communication agents. The basic concept underpinning the MAS is to divide the large captured network dataset into a number of subsets and distribute these to a number of agents depending on the network data size and the availability of CPU cores. The KDD Cup 1999 dataset is used for evaluation. The proposed hybrid of modified K-means with C4.5 classification in MAS is developed on the JADE platform. The results show that, compared to current methods, the MAS-IDS reduces the IDS processing time by up to 70%, while improving the detection accuracy.

1. Introduction

With the growing demand for the services provided by networks, the availability, confidentiality, and integrity of critical information are increasingly at risk from misuse [1–3]. Firewall systems alone provide insufficient protection against unwanted access to this information because they cannot protect networks from intruders who use open ports [4–6]. An intrusion detection system (IDS) is a security infrastructure that attempts to detect malicious activities, such as denial of service attacks and port scans, by monitoring and analyzing events occurring on networks and computers [1, 7]. In terms of intrusion detection, an IDS can be classified as either host-based or network-based. A host-based IDS (HIDS) observes the behavior and state of computer activities and detects the programs that can gain access to resources. A network-based IDS (NIDS), on the other hand, monitors network traffic (traffic volume, service ports, IP addresses, and protocol usage) and analyzes it to identify suspicious activities [8–10].

In general, an IDS can be implemented using two approaches: rule-based detection and anomaly-based detection [1, 10]. Rule-based detection (also known as misuse or signature-based detection) searches for specific signature patterns previously stored in a rules database. Snort is a popular tool that detects intrusions based on such rules [11]. The disadvantage of rule-based detection is its inability to detect new attacks, as these have no signatures in the database [4]; thus, rule-based detection increases the percentage of false negatives. The anomaly-based approach, on the other hand, constructs a model of normal activity from the observed data and then raises an alert for any behavior or activity that deviates from this model [12]. The main advantage of anomaly-based detection is its capability to detect novel attacks that differ from the already learned attacks. However, its drawback is a higher likelihood of classifying normal behavior as an attack, which increases the false positive rate [13].

Hindawi Publishing Corporation
The Scientific World Journal
Volume 2015, Article ID 294761, 14 pages
http://dx.doi.org/10.1155/2015/294761


Most researchers studying these issues have focused on attack detection accuracy using different methods, such as neural networks [14, 15], fuzzy logic [16, 17], and machine learning [18–20], thus dedicating less attention to the processing time required to detect attacks. However, processing speed is becoming more important because of increasing network traffic and the need for intrusion detection systems to operate in real time.

One of the popular methods for reducing processing time is distributed artificial intelligence (DAI), which merges artificial intelligence with distributed computing [21]. A multiagent system (MAS) is one of the branches of DAI. Several authors have proposed agents aimed at improving IDS performance. For example, JAM [22] used meta-learning with distributed data mining to build classifiers at different nodes for data analysis. This system has a manager responsible for coordinating the simultaneous execution of classifiers by agents; it subsequently combines the results of all classifiers using one of the meta-learning techniques. DIDMA [23], on the other hand, uses two types of agents: (1) a static agent responsible for collecting information about attacks from its host and (2) a mobile agent responsible for gathering from the static agents the information pertaining to new attacks on the system. However, in these works, agents are applied in IDS as a conceptual idea without demonstrating their performance.

Over the years, various machine learning techniques have been proposed, with their authors claiming that they are best suited for IDS [10]. K-means and decision trees are among the techniques widely used in designing IDS. K-means is used to cluster data to find meaningful structures or patterns in a collection of unlabeled data, so that instances in the same cluster are similar while instances from different clusters differ from each other. Several extant studies [24–29] presented K-means as a single algorithm for clustering IDS data into a set of clusters representing normal processes and attacks. Furthermore, different authors [30–34] have presented combined methods based on K-means and other techniques to build IDS models. C4.5, on the other hand, is used to build a tree structure of attack signatures as well as a tree structure of normal behaviors. This approach relies on maximum information gain as the feature selection criterion and on minimal split information when building the tree structure. In some approaches, C4.5 is used as a single model to construct misuse detection only [35, 36], while other researchers combined C4.5 with other techniques to build IDS as a layered model [37]. Finally, in some studies [38–41], C4.5 was evaluated and compared with other techniques in order to demonstrate its performance. Combining K-means with decision trees has also yielded good IDS accuracy. For example, in a recent study [30], the C4.5 technique was used with K-means to design a supervised anomaly detection system, while another research group [42] combined ID3 with K-means in a novel supervised anomaly detection approach.

This study proposes a hybrid of modified K-means with C4.5 to build an IDS in a multiagent system environment. The aim of this approach is to improve anomaly-based detection accuracy, while the MAS is used to reduce the IDS processing time. In the proposed design, the MAS utilizes three types of agents: coordinator, analysis, and communication agents. The coordinator agent is responsible for building trees from the training dataset: it uses modified K-means to cluster the data into a number of clusters and then uses the C4.5 technique to build a tree for each cluster. The resulting trees classify the testing data with high efficiency because each tree is built from instances with similar attributes, which makes it possible to discriminate between classes with high accuracy. Moreover, these trees reduce the processing time, as each search is performed on a smaller tree. The second task of the coordinator agent is to divide the testing dataset into a number of subsets and send each subset, together with the training trees, to one of the analysis agents for analysis. Lastly, the coordinator agent combines the results yielded by the analysis agents to obtain the final results. The analysis agent, on the other hand, is responsible for analyzing the data received from the coordinator agent; it uses the decision tree of the closest cluster to classify each instance of the dataset into the appropriate class. Finally, the communication agent is responsible for transferring data and results between the coordinator agent and the analysis agents. The modification of K-means lies in the method adopted for choosing the initial cluster centroids. This design significantly reduces the processing time, thus increasing IDS efficiency. The KDD Cup 1999 dataset is used to evaluate the performance of the proposed system, and the JADE platform is used for its implementation.

The remainder of this paper is organized as follows. Section 2 provides a brief review of the related work on K-means, C4.5, and multiagent systems with IDS. The proposed system is described in Section 3, while Section 4 presents experimental results in order to demonstrate the performance of the proposed system. The concluding remarks are given in Section 5.

2. Related Work

This section provides a detailed description of the role of multiagent systems in IDS and discusses the role of the K-means and C4.5 algorithms in building IDS models. Extant studies have confirmed that the C4.5 technique can achieve good classification performance. In addition, K-means has a high ability to group data into clusters in which instances in the same cluster are highly similar.

2.1. Multiagent Systems (MAS). Various multiagent systems (MASs) for IDS have been proposed [43–47]. Dasgupta et al. [43], for example, developed the Cougaar framework and presented a hierarchical architecture consisting of four different agents (manager, monitor, decision, and action agents). The authors used intelligent decision support modules, such as a fuzzy inference system, to detect anomalies at the packet, process, user, and system levels. This work, however, fails to explain how multiple security nodes of CIDS should be organized when large numbers are needed to protect many hosts in larger networks. Moreover, the authors do not discuss how and what information the manager agents should share. The architecture of a multiagent flow-based IDS was developed in a different study [44], where the concept of a reputation system was used to permit agents to find the nodes that are most effective at classifying malicious network activity. Zhu et al. [45] presented MAIIDS, which uses more than one technique, such as neural networks and association rules, for learning agents and generates rules for decision agents, which detect the audit data according to these rules and respond to them. Their experimental results indicate that the system has very high self-adaptability, intelligence, and expansibility. El Ajjouri et al. [46] presented an architecture based on adding a learning feature whereby abnormal behaviors correspond to unknown malicious patterns. This architecture first detects new attacks using the agent responsible for detecting new behavior, after which it updates the basic attack patterns; this agent uses case-based reasoning (CBR) as the attack detection method. Yang et al. [47] presented a distributed agent model based on artificial immune systems (AIS) for building IDS. This system takes on the features of AIS, such as self-adaptation, self-learning, self-organization, parallel processing, and distributed coordination. Although this work includes a section on empirical findings, the authors do not provide any details about the experiments they conducted; therefore, it is not possible to ascertain the significance of their results. As can be seen from the above, none of the authors of extant works on agent-based IDS clearly discussed or presented processing-time results. This shortcoming is addressed in the present study, where one of the measurements used to evaluate the performance of the MAS-based IDS is the IDS processing time.

Input: Dataset, k
Output: Clusters
(1) Select k initial centroids of clusters randomly
(2) Assign every instance ω_i ∈ Dataset to the closest centroid to make k clusters C_1, C_2, …, C_k
(3) Calculate cluster centroids ω̄_i = (1/k_i) Σ_{j=1}^{k_i} ω_ij, i = 1, …, k
    (k_i is the number of instances in cluster C_i and ω_ij is its jth instance)
(4) For every instance ω_i ∈ Dataset Do
    (4.1) Reassign ω_i to the closest cluster centroid: ω_i ∈ C_s is moved from C_s to C_t
          If ‖ω_i − ω̄_t‖ ≤ ‖ω_i − ω̄_j‖ for all j = 1, …, k, j ≠ s
    (4.2) Recalculate centroids for clusters C_s and C_t
(5) If cluster instances are stabilized Then stop Else go to Step (4)

Pseudocode 1: Pseudocode of the standard K-means algorithm.

2.2. K-Means Algorithm-Based IDS. The K-means algorithm clusters a group of objects into k disjoint clusters based on their attributes [48]. Objects in the same cluster are similar, while those from different clusters differ from one another. The algorithm uses a similarity measure to compute the distance between two objects; the measure most commonly used by K-means is the Euclidean distance. The advantage of K-means is its flexibility in dealing with large datasets [25], with time complexity O(tkn), where t represents the number of iterations, k denotes the number of clusters, and n is the number of dataset instances. However, the main disadvantage of the K-means algorithm is the need to find the best number of clusters k. In addition, it is sensitive to isolated dataset instances [25], and the algorithm converges finitely to local minima. Consequently, the initial cluster centroids significantly affect the output of the K-means algorithm. The pseudocode of standard K-means is shown in Pseudocode 1 [49], where the distance between two instances x and y is

$$\lVert x - y \rVert = \sqrt{\sum_{i=1}^{\text{no. attributes}} (x_i - y_i)^2}. \qquad (1)$$
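To make the standard algorithm concrete, the following is a minimal Python sketch of Pseudocode 1 using the distance of Eq. (1). It assumes purely numeric attributes (categorical KDD Cup fields would first need a numeric encoding) and random seeding via Python's `random` module; the names `euclidean`, `mean`, and `standard_kmeans` are hypothetical. As a small assumption beyond the pseudocode, an instance is moved only on a strict distance improvement, so ties cannot make it oscillate between clusters.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean(x: Point, y: Point) -> float:
    """Eq. (1): square root of the summed squared attribute differences."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def mean(points: List[Point]) -> List[float]:
    """Attribute-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def standard_kmeans(data: List[Point], k: int, seed: int = 0,
                    max_passes: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Sketch of Pseudocode 1: random seeding plus per-instance reassignment."""
    rng = random.Random(seed)
    # Step (1): pick k instances at random as the initial centroids.
    centroids = [list(p) for p in rng.sample(data, k)]
    # Step (2): assign every instance to its closest centroid.
    labels = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in data]

    def recompute(c: int) -> None:
        # Steps (3) and (4.2): centroid = mean of the cluster's members.
        members = [p for p, lab in zip(data, labels) if lab == c]
        if members:  # keep the old centroid if the cluster became empty
            centroids[c] = mean(members)

    for c in range(k):
        recompute(c)

    # Step (4): move instances one at a time; Step (5): stop when nothing moves.
    for _ in range(max_passes):
        moved = False
        for i, p in enumerate(data):
            s = labels[i]
            t = min(range(k), key=lambda c: euclidean(p, centroids[c]))
            if t != s and euclidean(p, centroids[t]) < euclidean(p, centroids[s]):
                labels[i] = t
                recompute(s)
                recompute(t)
                moved = True
        if not moved:
            break
    return labels, centroids
```

The per-instance update (recomputing only the two affected centroids after each move) follows the structure of Pseudocode 1; a batch update of all centroids per pass would be an equally common alternative.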

In the past, many ideas have been proposed to improve the performance of K-means, most of them aimed at improving the method used for selecting the initial cluster centroids. For example, Ball and Hall [50] adopted the centroid of the dataset, $X' = \frac{1}{N}\sum_{j=1}^{N} x_j$, as the first centroid and then chose the remaining centroids in arbitrary fashion, accepting an instance if its distance from the previously selected centroids is greater than a threshold, until k centroids are obtained. The Maximin method developed by Katsavounidis et al. [51], on the other hand, chooses the first centroid arbitrarily, while the subsequent (k − 1) centroids are chosen as the instances with the greatest minimum distance to the previously selected centroids. Al-Daoud's variance-based method [52] sorts the data instances according to the variance of the attributes and then partitions them into k groups along that same dimension (the medians of these groups are chosen as centroids). The k-means++ method [53] combines MacQueen's second method with the Maximin method: it selects the first centroid randomly, and the ith centroid (i ∈ {2, 3, …, k}) is chosen as an instance x' with probability

$$\frac{md(x')^2}{\sum_{j=1}^{N} md(x_j)^2},$$

where md(x) denotes the minimum distance from x to the previously selected centroids. Erisoglu et al. [54] proposed a method that first chooses two main vectors representing the best dataset distribution and then computes the dataset centroid as the mean of these two vectors. The first cluster centroid is the instance with the longest Euclidean distance from the dataset centroid, while the ith cluster centroid is the instance with the maximum combined distance from the previous (i − 1) cluster centroids.
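The two seeding rules above that are fully specified, Maximin and k-means++, can be sketched in a few lines of Python. This is an illustrative sketch, not the cited authors' code: the function names are hypothetical, `math.dist` (Python 3.8+) supplies the Euclidean distance, and the data are assumed to contain at least k distinct instances (otherwise k-means++ has no positive weights left to sample from).

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def min_dist(p: Point, chosen: List[Point]) -> float:
    """md(p): Euclidean distance from p to its nearest already-selected centroid."""
    return min(math.dist(p, c) for c in chosen)


def maximin_seeds(data: List[Point], k: int, first: int = 0) -> List[Point]:
    """Maximin: start from an arbitrary instance (index `first`), then repeatedly
    add the instance whose minimum distance to the chosen centroids is largest."""
    chosen = [data[first]]
    while len(chosen) < k:
        chosen.append(max(data, key=lambda p: min_dist(p, chosen)))
    return chosen


def kmeanspp_seeds(data: List[Point], k: int, seed: int = 0) -> List[Point]:
    """k-means++: first centroid uniformly at random; each later centroid x' is
    drawn with probability md(x')^2 / sum_j md(x_j)^2."""
    rng = random.Random(seed)
    chosen = [rng.choice(data)]
    while len(chosen) < k:
        weights = [min_dist(p, chosen) ** 2 for p in data]
        # Already-chosen instances have weight 0, so they cannot be drawn again.
        chosen.append(rng.choices(data, weights=weights, k=1)[0])
    return chosen
```

Maximin is deterministic once the first centroid is fixed, whereas k-means++ randomizes the choice but still favors instances far from the existing centroids.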

2.3. Hybrid K-Means-Based IDS. High detection rates and low false alarm rates can be achieved by using hybrid approaches for IDS, and hybrid K-means combined with other techniques has played an important role in this field. Xiao et al. [31] proposed a K-means algorithm based on particle swarm optimization (PSO) for network anomaly detection. The authors used PSO to overcome the tendency of K-means to converge to a local minimum, capitalizing on PSO's global search ability. Experimental results with a KDD dataset demonstrate that the proposed method is effective in dealing with large datasets and achieves a satisfactory detection rate. Yongzhong et al. [55] also proposed a similar PSO-K-means hybrid system. Muda et al. [32] proposed a hybrid learning approach combining K-means clustering and Naïve Bayes classification: the authors clustered all data into the corresponding groups before applying a classifier. Their results show that the proposed approach achieved a reasonable false alarm rate. A clustering algorithm that uses SOM and K-means for intrusion detection was proposed by Wang et al. [33]: when the SOM finishes its training process, K-means is adopted to refine the weights obtained by training, and once the SOM completes cluster formation, K-means is applied to refine the final clustering results. Chandrasekhar and Raghuveer [34] proposed a new approach based on a fuzzy neural network and a support vector machine to improve the IDS detection rate; here, K-means clustering was first applied to generate different training data subsets.

Muniyandi et al. [30] proposed an anomaly detection method using K-means combined with C4.5 for classifying anomalous and normal activities. In this approach, K-means clustering is first used to partition the training dataset into k clusters using Euclidean distance. A decision tree is then built for each cluster using the C4.5 technique, and the rules created by the decision trees are used to detect intrusion events. The testing phase is implemented in two steps. In the first step, the Euclidean distance between each testing instance and the cluster centroids is computed to find the closest cluster; in the second step, the decision tree corresponding to that closest cluster is used to detect the class of the instance. In this work, K-means still has some shortcomings, as the clustering output mostly depends on the selection of the initial cluster centroids. In addition, the number of clusters k needs to be given in advance. Moreover, the resulting clusters do not cover all the possibilities of class instances.

3. Proposed Hybrid Modified K-Means with C4.5 in MAS-IDS

The proposed MAS-IDS uses three agents that are responsible for achieving the IDS goals: coordinator, analysis, and communication agents. The MAS-IDS system is shown in Figure 1. The details of the proposed system are elaborated in the next sections.

3.1. Multiagent System-Based Intrusion Detection System (MAS-IDS)

Figure 1: The MAS-IDS architecture (Host 1, Host 2, …, Host N, each with a coordinator agent, a communication agent, and a number of analysis agents; the legend distinguishes communication between agents, information sharing between agents, and capture of network traffic).

3.1.1. Coordinator Agent. The deliberative coordinator agent uses a training dataset to train the hybrid system through clusters constructed by modified K-means. It subsequently applies the C4.5 technique to each cluster to build the decision trees that will be used in the testing phase. In addition, the coordinator agent receives the gathered network traffic data and divides it into a number of subsets by applying (2). It then sends these subsets, together with the trees and the cluster centroids, to the analysis agents on the other hosts via the communication agents. At the same time, the coordinator agent holds information about all hosts in the system environment, as each host periodically sends the coordinator agent the number of its cores that are presently not busy. The scenario in which the coordinator agent operates can be summarized in the following steps:

(1) Read the training dataset.
(2) Call modified K-means (Pseudocode 2) to cluster the training dataset into a set of clusters.
(3) Build the tree for each cluster by using the C4.5 technique.
(4) Send the trees and centroids of clusters to static agents in the other hosts.
(5) Capture network traffic data packets.
(6) Specify the number of CPU cores in the hosts of the system that are presently not busy (denote this number n).
(7) Divide the captured data into n subsets; refer to (2).
(8) Create n analysis agents in the other hosts; refer to (3).
(9) Send the n data subsets to the analysis agents by using the communication agent.
(10) Wait until all analysis agents finish analyzing the data.


Input: Dataset
Output: Clusters, Centroids
(1) Set k = 1, c_1 = first instance ω_1
(2) For every instance ω_i ∈ Dataset, i ≠ 1, Do
    (2.1) If ‖ω_i − c_s‖ > threshold for all s = 1, …, k Then
    (2.2) k = k + 1, c_k = ω_i
(3) Assign every instance ω_i ∈ Dataset to the closest centroid to make k clusters C_1, C_2, …, C_k
(4) Calculate cluster centroids ω̄_i = (1/k_i) Σ_{j=1}^{k_i} ω_ij, i = 1, …, k
(5) For every instance ω_i ∈ Dataset Do
    (5.1) Reassign ω_i to the closest cluster centroid: ω_i ∈ C_s is moved from C_s to C_t
          If ‖ω_i − ω̄_t‖ ≤ ‖ω_i − ω̄_j‖ for all j = 1, …, k, j ≠ s
    (5.2) Recalculate centroids for clusters C_s and C_t
(6) If cluster instances are stabilized Then stop Else go to Step (4)

Pseudocode 2: Pseudocode of the modified K-means algorithm.

(11) Combine the results yielded by the analysis agents using (4).

$$\text{Data set} = \{S_1, S_2, \ldots, S_j, \ldots, S_n\} \qquad (2)$$

subject to

$$S_j = \{\text{instances} \in \text{Data set} \mid \text{each instance} \notin S_i,\ \forall i = 1, \ldots, n,\ i \neq j\}, \qquad |S_j| = \frac{|\text{Data set}|}{n}, \quad j = 1, \ldots, n,$$

where n ≤ the number of CPU cores available in the system.
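Equation (2) can be illustrated with a short Python sketch. The function name `split_dataset` is hypothetical, and because Eq. (2) only states |S_j| = |Data set|/n, this sketch assumes that when the division is not exact, the first few subsets receive one extra instance; the subsets are contiguous, pairwise disjoint, and together cover the whole dataset.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_dataset(data: Sequence[T], n: int) -> List[List[T]]:
    """Split data into n disjoint, contiguous subsets S_1..S_n of near-equal size.

    When len(data) is not a multiple of n, the first len(data) % n subsets
    get one extra instance (an assumption; Eq. (2) assumes exact division).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    base, extra = divmod(len(data), n)
    subsets, start = [], 0
    for j in range(n):
        size = base + (1 if j < extra else 0)
        subsets.append(list(data[start:start + size]))
        start += size
    return subsets
```

In the MAS-IDS scenario, n would be the number of idle cores reported by the hosts (step (6) of the coordinator scenario), which keeps n within the core limit stated under Eq. (2).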

3.1.2. Analysis Agent. A set of reactive analysis agents is created in the other hosts within the system environment by using (3), where the number of analysis agents equals the number of subsets resulting from the splitting process. Each analysis agent receives one subset of testing data, along with the cluster centroids and decision trees created by the coordinator agent in the training phase. In fact, the coordinator agent sends a message about the number of agents needed to the deliberative agent resident in each host, and this resident agent creates the analysis agents. This indirection is needed because, in JADE, if the coordinator agent creates the analysis agents directly in the other hosts, the analysis agents are created in these hosts only logically; physically, they are created in the host of the coordinator agent. Consequently, the analysis agents would use the same CPU cores and memory as the coordinator agent's host. Each analysis agent runs as a thread on one of the CPU cores of its host [56]. Hence, a host with four cores can simultaneously run four agent threads in parallel.

$$\forall S_j:\ \text{create } AA_j \in \text{Host}_i \longrightarrow \text{analysis}(S_j, AA_j), \qquad j = 1, \ldots, n, \quad 1 \le i \le m \qquad (3)$$

subject to

$$\text{number of AA in Host}_i \le \text{number of CPU cores in Host}_i,$$

where AA_j represents analysis agent j, analysis(S_j, AA_j) represents the analysis function used to analyze subset S_j by analysis agent AA_j, and m represents the number of hosts available in the system environment.

In the analysis agent, each instance is first compared with the cluster centroids to find the closest one, after which the decision tree corresponding to this centroid is used to determine the type of the instance. If the instance attributes do not match any class in the decision tree, the instance is treated as an attack, and the decision trees are updated with the data pertaining to this attack to assist with future detection. The scenario in which the analysis agent operates is as follows:

(1) Receive the data subset, cluster centroids, and trees from the communication agent.

(2) Call Pseudocodes 3 and 4 to analyze the data.

(3) Return the results to the coordinator agent by using the communication agent.
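The capacity constraint of Eq. (3) can be illustrated with a small placement routine. This is a hypothetical sketch rather than JADE code: `assign_agents` and the host names are invented for illustration, and the round-robin policy is an assumption, since the text only requires that no host run more analysis agents than it has idle cores.

```python
from typing import Dict


def assign_agents(n_subsets: int, idle_cores: Dict[str, int]) -> Dict[int, str]:
    """Map analysis agent AA_j (j = 1..n) to a host so that no host runs more
    agents than it has idle cores (the constraint under Eq. (3))."""
    if n_subsets > sum(idle_cores.values()):
        raise ValueError("more subsets than idle cores in the system")
    remaining = dict(idle_cores)
    hosts = list(idle_cores)  # fixed host order, so the placement is deterministic
    placement: Dict[int, str] = {}
    i = 0
    for j in range(1, n_subsets + 1):
        # Round-robin over hosts, skipping any host whose idle cores are used up.
        while remaining[hosts[i % len(hosts)]] == 0:
            i += 1
        host = hosts[i % len(hosts)]
        placement[j] = host
        remaining[host] -= 1
        i += 1
    return placement
```

For example, with two idle cores on one host and four on another, five analysis agents are spread as two and three, and a request for seven agents is rejected.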

Finally, the coordinator agent combines all the results produced by the analysis agents by using (4) and provides the final results to the system administrator, who then raises an alert to deal with the situation.

$$\text{Normal instances} = \bigcup_{i=1}^{n} \text{Normal}(AA_i), \qquad \text{Attack instances} = \bigcup_{i=1}^{n} \text{Attack}(AA_i) \qquad (4)$$
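The analyze-then-combine flow can be sketched with Python's `concurrent.futures`. Assumptions: `classify` stands in for the per-instance decision of Pseudocodes 3 and 4, anything not labeled "normal" counts as an attack (following the rule that unmatched instances are treated as attacks), and the combination in Eq. (4) is read as a union of the per-agent result sets. Threads mirror the one-thread-per-agent design, but under CPython's global interpreter lock, CPU-bound work would need `ProcessPoolExecutor` for a real parallel speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Hashable, List, Set, Tuple

Result = Tuple[Set[Hashable], Set[Hashable]]  # (normal instances, attack instances)


def analyze_subset(subset: List[Hashable], classify: Callable[[Hashable], str]) -> Result:
    """Hypothetical analysis agent: split one subset into normal and attack instances."""
    normal: Set[Hashable] = set()
    attack: Set[Hashable] = set()
    for inst in subset:
        (normal if classify(inst) == "normal" else attack).add(inst)
    return normal, attack


def run_and_combine(subsets: List[List[Hashable]],
                    classify: Callable[[Hashable], str]) -> Result:
    """Analyze all subsets concurrently, then union the per-agent results (Eq. (4))."""
    with ThreadPoolExecutor(max_workers=max(1, len(subsets))) as pool:
        results = list(pool.map(lambda s: analyze_subset(s, classify), subsets))
    normal = set().union(*(r[0] for r in results))
    attack = set().union(*(r[1] for r in results))
    return normal, attack
```

Because the subsets of Eq. (2) are disjoint, the union simply collects every classified instance exactly once.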

3.1.3. Communication Agent. The communication agent is responsible for transferring data and results between agents. The scenario in which the communication agent operates is presented in the following steps:

(1) Receive the datasets, cluster centroids, and decision trees from the coordinator agent.

(2) Move the above from the coordinator agent host to the analysis agent host.


Input: Testing dataset, centroids, trees
Output: Prediction P for the instances of the testing dataset
(1) For every instance ω_i ∈ testing dataset Do
    (1.1) Choose centroid c_j such that ‖ω_i − c_j‖ < ‖ω_i − c_s‖ for all s = 1, …, k, s ≠ j
    (1.2) Add MatchTree(ω_i, root_j) (Pseudocode 4) to P

Pseudocode 3: Pseudocode for determining the closest centroid for an instance.

Input: instance ω, root
Output: Class
(1) If root.branch = 0 (root is a leaf) Then return root.value
(2) Choose attribute a_i ∈ ω corresponding to the index of root
(3) For all branchValue_j ∈ root Do
    (3.1) If branchValue_j = a_i Then return MatchTree(ω, root.branch_j)
(4) Return "unknown" (no branch value matches a_i)

Pseudocode 4: Pseudocode of MatchTree.

(3) Give the datasets, cluster centroids, and decision trees to the analysis agent.

(4) Receive the results from the analysis agent.

(5) Move the results from the analysis agent host to the coordinator agent host.

(6) Provide the results to the coordinator agent

3.2. Modified K-Means Algorithm. The main advantage that distinguishes modified K-means from other adjusted K-means variants in the extant literature is its ability to consider all possible eventualities: it treats every divergent point in the dataset (an instance farther than a threshold from all previously selected centroids) as an initial cluster centroid, rather than selecting a specific set of initial centroids randomly, as is typically done. In other words, modified K-means constructs clusters for all cases characterized by significant differences among instances, so it distributes the dataset instances to suitable clusters with the best accuracy. Moreover, unlike other adjusted K-means variants, the modified approach does not require the number of clusters k to be specified, as it is determined dynamically. The main difference between the modified and the standard K-means lies in the selection of the initial cluster centroids, as shown in the following steps:

(1) Select the first instance of the dataset as the first centroid.

(2) Select as the next centroid any instance whose distance from every previously selected centroid is greater than the specified threshold (the best threshold, 4000, is determined in the first experiment).

(3) Repeat Step (2) until the end of the dataset is reached.

(4) Apply the remaining steps of standard K-means to the selected initial centroids of clusters.
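Steps (1) to (4) can be sketched in Python as follows. This is a minimal sketch, assuming Euclidean distance on the numerically encoded attributes and a plain Lloyd-style K-means loop for Step (4); the function names and the iteration cap are illustrative and do not come from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def select_initial_centroids(data: Sequence[Point], threshold: float) -> List[Point]:
    """Steps (1)-(3): threshold-based choice of initial centroids."""
    centroids = [tuple(data[0])]                  # step (1): first instance
    for x in data[1:]:                            # step (3): scan to the end
        # step (2): keep x only if it is farther than `threshold` from every centroid
        if all(math.dist(x, c) > threshold for c in centroids):
            centroids.append(tuple(x))
    return centroids


def modified_kmeans(data: Sequence[Point], threshold: float, max_iter: int = 100):
    """Step (4): standard K-means iterations starting from the selected centroids."""
    centroids = select_initial_centroids(data, threshold)
    k = len(centroids)                            # k is determined by the data
    labels = [0] * len(data)
    for _ in range(max_iter):
        # assign every instance to its nearest centroid
        labels = [min(range(k), key=lambda j: math.dist(x, centroids[j])) for x in data]
        new_centroids = []
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:                           # recompute the cluster mean
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:                                 # keep an empty cluster's centroid
                new_centroids.append(centroids[j])
        if new_centroids == centroids:            # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, labels


data = [(0, 0), (1, 0), (10, 10), (11, 10), (0, 1)]
print(select_initial_centroids(data, threshold=5))  # [(0, 0), (10, 10)]
```

Note that k is simply the number of instances that pass the threshold test in Step (2), so a larger threshold yields fewer clusters.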

Pseudocode 2 shows the pseudocode of modified K-means. Our modification of K-means is evident in Steps (1) and (2) of the pseudocode.

3.3. C4.5 Algorithm. After the training dataset instances are distributed among the clusters using modified K-means, the standard C4.5 technique developed by Quinlan [57] is used to build one tree for each cluster. More details and the pseudocode of C4.5 can be found elsewhere [58, 59].
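A full C4.5 implementation (continuous-attribute thresholds, pruning, missing-value handling) is too long for a short example. The sketch below shows only the gain-ratio criterion that C4.5 uses to rank candidate splits on a discrete attribute, one of its main differences from ID3's plain information gain. It is an illustrative helper, not code from the paper; the function names are assumptions.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """C4.5 split criterion for a discrete attribute: information gain / split info."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # expected entropy of the class labels after splitting on the attribute
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # split information penalizes attributes with many distinct values
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


print(gain_ratio(["tcp", "tcp", "udp", "udp"], ["dos", "dos", "normal", "normal"]))  # 1.0
```

In the hybrid scheme, a tree grown with this criterion would be built separately on the instances of each cluster.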

3.4. Testing Phase. This phase is implemented by the analysis agent and is executed in two stages to test the network traffic data. In the first stage, the centroid closest to the testing instance is chosen (the pseudocode of this stage is shown in Pseudocode 3). In the second stage, the subtree corresponding to the centroid chosen in the first stage is used to test the instance and identify the appropriate class for it. The pseudocode of the second stage is shown in Pseudocode 4.

4. Experimental Setup and Analysis

We used the benchmark KDD Cup 1999 dataset [60] to evaluate the MAS-IDS performance. In most previous works in this field, the authors used cross-validation (e.g., 10-fold) for evaluation. Cross-validation uses the same classes as the training data without adding new classes in the testing stage, so these works could achieve high performance in terms of accuracy and detection rate. However, the strength of an IDS stems from its ability to detect unknown (new) attacks. The KDD Cup 1999 dataset consists of two datasets: the 10% KDD Cup dataset (used for training) and the Corrected dataset (used for testing). More details about KDD Cup 1999 can be found in the literature [61]. Among the available performance measures, accuracy (Acc), detection rate (DR), and false alarm rate (FAR) are the most popular for evaluating the MAS-IDS performance:

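$$
\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\mathrm{DR} = \frac{TP}{TP + FN}, \qquad
\mathrm{FAR} = \frac{FP}{TN + FP}, \tag{5}
$$

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.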
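The measures in (5), together with the precision, specificity, and F-measure reported later in Table 3, can be computed from binary confusion counts as sketched below. This sketch assumes that all attack categories are collapsed into one positive class, with normal traffic as the negative class; the paper does not state how its five-class output is binarized. The class and method names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # attacks classified as attacks
    tn: int  # normal traffic classified as normal
    fp: int  # normal traffic flagged as an attack (false alarm)
    fn: int  # attacks missed

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.tn + self.fp + self.fn)

    def detection_rate(self) -> float:      # also called recall or sensitivity
        return self.tp / (self.tp + self.fn)

    def false_alarm_rate(self) -> float:
        return self.fp / (self.tn + self.fp)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def specificity(self) -> float:         # equals 1 - FAR
        return self.tn / (self.tn + self.fp)

    def f_measure(self) -> float:           # harmonic mean of precision and DR
        p, r = self.precision(), self.detection_rate()
        return 2 * p * r / (p + r)


c = Confusion(tp=80, tn=90, fp=10, fn=20)
print(c.accuracy(), c.detection_rate(), c.false_alarm_rate())
```

Because specificity equals 1 − FAR, the two columns of Table 3 are complementary; for MAS-IDS, 1 − 0.0299 = 0.9701.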

The computers used for the experiments are each equipped with an Intel Core i7 3.40 GHz CPU with 8 cores and 6 GB RAM, running Windows 7 Professional 64-bit. The experiments were conducted on the JADE platform and implemented in Java.

Table 1 shows the details of the datasets used to evaluate the performance of MAS-IDS, the conventional method (hybrid standard K-means with C4.5), and the other techniques. It should be noted that the training datasets (trainDS1, trainDS2, trainDS3, and trainDS4) were generated randomly from the 10% KDD Cup dataset, while the testing datasets (testDS1, testDS2, testDS3, and testDS4) were generated randomly from the Corrected dataset.

The symbolic attributes are preprocessed as follows. The three symbolic attributes (protocol, service, and flag) are converted to numeric values. For example, the three values of the protocol attribute, tcp, udp, and icmp, are converted to 1, 2, and 3, respectively, and the same approach is adopted for the remaining symbolic attributes.
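A sketch of this symbolic-to-numeric conversion is given below. Codes are assigned in order of first appearance, which reproduces the tcp/udp/icmp to 1/2/3 example only when the values are encountered in that order; the ordering rule used for service and flag is not stated in the paper, so it is an assumption here, as are the function names.

```python
from typing import Dict, List, Sequence


def build_codebook(values: Sequence[str]) -> Dict[str, int]:
    """Map each distinct symbol to 1, 2, 3, ... in order of first appearance."""
    codebook: Dict[str, int] = {}
    for v in values:
        if v not in codebook:
            codebook[v] = len(codebook) + 1
    return codebook


def encode(values: Sequence[str], codebook: Dict[str, int]) -> List[int]:
    """Replace each symbol by its code (raises KeyError for an unseen symbol)."""
    return [codebook[v] for v in values]


protocol_codes = build_codebook(["tcp", "udp", "icmp", "tcp"])
print(protocol_codes)                            # {'tcp': 1, 'udp': 2, 'icmp': 3}
print(encode(["icmp", "tcp"], protocol_codes))   # [3, 1]
```

How symbols that appear only in the testing data should be encoded is not described in the paper; this sketch simply fails on them.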

In this study, three experiments were carried out. In the first experiment, the best value of the threshold was computed. In the second experiment, the MAS-IDS performance was evaluated by comparing its results with those obtained by the conventional method and by other techniques available in Weka and Matlab. In the third experiment, we compared the processing time required by MAS-IDS with that of hybrid modified K-means with C4.5 in a nonagent environment.

4.1. Identifying the Best Threshold for Modified K-Means. The modified K-means requires a predetermined threshold value to select the initial centroids of clusters. In this experiment, all training datasets in Table 1 are used with testDS1 to compute the average accuracy for different threshold values (1000 to 10000). The value that yields the highest accuracy is chosen as the threshold for modified K-means. As can be seen in Figure 2, the best threshold value is 4000, which yields an average accuracy of 0.90155. We used all the training datasets with only one testing dataset to choose the threshold value because the modified K-means approach is applied only to the training dataset to construct the clusters. In all subsequent experiments, the chosen threshold (4000) is employed with hybrid modified K-means and C4.5.
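The threshold search is a grid search that averages accuracy over the training datasets. In the sketch below, `evaluate` is a hypothetical callback that would train hybrid modified K-means and C4.5 on one training set with threshold t and return its accuracy on the testing set; the toy scorer in the usage line merely stands in for it and is not experimental data.

```python
from statistics import mean
from typing import Callable, Iterable, Sequence


def pick_threshold(
    candidates: Iterable[float],
    train_sets: Sequence[object],
    test_set: object,
    evaluate: Callable[[object, object, float], float],
) -> float:
    """Return the candidate threshold with the highest accuracy averaged over train_sets."""
    best_t, best_acc = None, float("-inf")
    for t in candidates:
        avg = mean(evaluate(train, test_set, t) for train in train_sets)
        if avg > best_acc:                 # keep the first best value on ties
            best_t, best_acc = t, avg
    return best_t


# Toy stand-in for `evaluate` whose accuracy peaks at t = 4000 (illustration only).
def toy_evaluate(train, test, t):
    return 0.90 - abs(t - 4000) / 1e6


print(pick_threshold(range(1000, 10001, 1000), ["trainDS1", "trainDS2"], "testDS1", toy_evaluate))  # 4000
```

The same loop applies when the candidates are values of k for standard K-means, as in Section 4.2.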

4.2. MAS-IDS Performance. In order to compare MAS-IDS with the hybrid standard K-means and C4.5 [30], the best value of k for K-means is first identified. Typically, K-means is run independently for different values of k, and the partition that appears most meaningful to the domain expert is selected [62]. Figure 3 shows the performance of hybrid standard K-means with C4.5 for different k values (k = 10, 20, 30, …, 100). The best k value is 10 because it yields the highest accuracy (90.67%) and detection rate (84.80%). Only its false alarm rate (3.46%) is not optimal, as a lower rate of 2.1% is achieved when k = 100. Thus, in all subsequent experiments we adopt k = 10 as the best number of clusters.

Table 1: The details of evaluation datasets.

| Dataset  | Normal | DoS   | Probe | R2L  | U2R  | Total |
|----------|--------|-------|-------|------|------|-------|
| trainDS1 | 900    | 1000  | 300   | 500  | 300  | 3000  |
| trainDS2 | 1100   | 1300  | 300   | 800  | 500  | 4000  |
| trainDS3 | 1500   | 1800  | 400   | 1000 | 300  | 5000  |
| trainDS4 | 1800   | 1800  | 500   | 1100 | 800  | 6000  |
| testDS1  | 5000   | 3000  | 700   | 900  | 400  | 10000 |
| testDS2  | 10000  | 7000  | 1000  | 1500 | 500  | 20000 |
| testDS3  | 15000  | 10000 | 1500  | 2500 | 1000 | 30000 |
| testDS4  | 20000  | 14000 | 1500  | 3000 | 1500 | 40000 |

Figure 2: Computing the best threshold value for modified K-means (x-axis: threshold, 0 to 10000; y-axis: accuracy).

Figure 3: The performance of hybrid standard K-means and C4.5 (x-axis: k; y-axis: performance; series: accuracy, DR, FAR).


Figure 4: ROC curve for testDS1 (x-axis: false positive rate; y-axis: true positive rate; curves: hybrid modified K-means and C4.5, hybrid K-means and C4.5).

Figure 5: ROC curve for testDS2 (x-axis: false positive rate; y-axis: true positive rate; curves: hybrid modified K-means and C4.5, hybrid K-means and C4.5).

Figure 6: ROC curve for testDS3 (x-axis: false positive rate; y-axis: true positive rate; curves: hybrid modified K-means and C4.5, hybrid K-means and C4.5).

The ROC curves in Figures 4, 5, 6, and 7 show the performance of the proposed method in comparison with hybrid K-means with C4.5 [30].

According to the ROC curves, the proposed hybrid modified K-means with C4.5 in MAS-IDS achieved better results than the conventional method. The t-test shows that MAS-IDS significantly improved accuracy, with p value < 0.05 (p = 0.00000028). For this comparison, MAS-IDS was tested by computing the classification results for each training dataset using all the testing datasets presented in Table 1. Table 2 compares the accuracy of MAS-IDS and hybrid K-means with C4.5. Table 3 shows the average results of the MAS-IDS evaluation, along with a comparison with the conventional method and other methods from Weka and Matlab.

Figure 7: ROC curve for testDS4 (x-axis: false positive rate; y-axis: true positive rate; curves: hybrid modified K-means and C4.5, hybrid K-means and C4.5).

Table 2: The accuracy results of the MAS-IDS versus hybrid K-means and C4.5 [30].

| Training dataset | Testing dataset | MAS-IDS  | Hybrid K-means and C4.5 |
|------------------|-----------------|----------|-------------------------|
| trainDS1         | testDS1         | 0.9031   | 0.8993                  |
| trainDS1         | testDS2         | 0.91745  | 0.916021                |
| trainDS1         | testDS3         | 0.9126   | 0.909633                |
| trainDS1         | testDS4         | 0.9147   | 0.911625                |
| trainDS2         | testDS1         | 0.8874   | 0.8742                  |
| trainDS2         | testDS2         | 0.9107   | 0.9008                  |
| trainDS2         | testDS3         | 0.904367 | 0.894967                |
| trainDS2         | testDS4         | 0.90545  | 0.89475                 |
| trainDS3         | testDS1         | 0.9058   | 0.8932                  |
| trainDS3         | testDS2         | 0.92225  | 0.9135                  |
| trainDS3         | testDS3         | 0.91667  | 0.9049                  |
| trainDS3         | testDS4         | 0.916125 | 0.90805                 |
| trainDS4         | testDS1         | 0.9099   | 0.8963                  |
| trainDS4         | testDS2         | 0.9215   | 0.90815                 |
| trainDS4         | testDS3         | 0.9155   | 0.9016                  |
| trainDS4         | testDS4         | 0.918025 | 0.90625                 |

p value: 0.00000028
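The significance test can be reproduced from the 16 matched accuracy pairs in Table 2. The paper does not say which t-test variant was used; the sketch below assumes a paired test on the per-configuration differences and computes only the statistic and degrees of freedom with the standard library.

```python
import math
from statistics import mean, stdev

# Table 2 accuracies in row order (trainDS1..trainDS4, each with testDS1..testDS4)
mas_ids = [0.9031, 0.91745, 0.9126, 0.9147,
           0.8874, 0.9107, 0.904367, 0.90545,
           0.9058, 0.92225, 0.91667, 0.916125,
           0.9099, 0.9215, 0.9155, 0.918025]
hybrid = [0.8993, 0.916021, 0.909633, 0.911625,
          0.8742, 0.9008, 0.894967, 0.89475,
          0.8932, 0.9135, 0.9049, 0.90805,
          0.8963, 0.90815, 0.9016, 0.90625]


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


t_stat, dof = paired_t(mas_ids, hybrid)
print(f"mean MAS-IDS = {mean(mas_ids):.4f}, mean hybrid = {mean(hybrid):.4f}")
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

The column means reproduce the averages in Table 3 (0.9113 and 0.9021), and the statistic comes out at roughly 8.7 on 15 degrees of freedom. That corresponds to a two-sided p value on the order of 1e-7, in line with the reported 0.00000028, which suggests (without confirming) that a paired test was used. If SciPy is available, `scipy.stats.ttest_rel(mas_ids, hybrid)` returns the statistic together with its two-sided p value.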

As can be seen from Table 3, the MAS-IDS approach achieves the highest accuracy, detection rate, and F-measure. However, its false alarm rate, precision, and specificity are not superior to those achieved by the other methods, especially the Decision Table, which produces the best values for these three measures. In state-of-the-art work, IDS performance is usually reported in terms of accuracy, since accuracy and error rate are complementary (error rate = 1 − accuracy). Thus, when comparing the various methods, we adopt accuracy as the primary measure. On this basis, MAS-IDS outperforms the other methods, as shown in Table 3. More specifically, the average MAS-IDS accuracy computed over all training and testing datasets is 0.9113, which is higher than that achieved by any other method. Figure 8 shows the performance of all methods using the data given in Table 3.

Table 3: Comparison of the MAS-IDS performance with other methods using different measures.

| Method                         | Accuracy | DR     | FAR    | Precision | Specificity | F-measure |
|--------------------------------|----------|--------|--------|-----------|-------------|-----------|
| MAS-IDS                        | 0.9113   | 0.8526 | 0.0299 | 0.9665    | 0.9701      | 0.9056    |
| Hybrid K-means and C4.5 (2012) | 0.9021   | 0.8394 | 0.0353 | 0.9600    | 0.9647      | 0.8954    |
| Bayes Net                      | 0.9017   | 0.8177 | 0.0142 | 0.9829    | 0.9858      | 0.8926    |
| Naïve Bayes                    | 0.8150   | 0.7727 | 0.1427 | 0.8578    | 0.8573      | 0.8076    |
| SMO                            | 0.8805   | 0.7785 | 0.0174 | 0.9781    | 0.9826      | 0.8664    |
| IBk                            | 0.8886   | 0.7962 | 0.0190 | 0.9766    | 0.9810      | 0.8771    |
| J48                            | 0.8513   | 0.7298 | 0.0273 | 0.9638    | 0.9727      | 0.8299    |
| NBTree                         | 0.9007   | 0.8096 | 0.0081 | 0.9900    | 0.9919      | 0.8903    |
| Decision Table                 | 0.8306   | 0.6631 | 0.0020 | 0.9970    | 0.9980      | 0.7956    |
| JRip                           | 0.8377   | 0.6983 | 0.0229 | 0.9682    | 0.9771      | 0.8111    |
| LibSVM                         | 0.7964   | 0.8120 | 0.2191 | 0.8169    | 0.7809      | 0.8068    |

Figure 8: Comparison of the performance of MAS-IDS with other methods (accuracy, DR, and FAR for each method in Table 3).

4.3. MAS-IDS Processing Time. The last experiment demonstrates the strength of MAS-IDS in reducing the data classification processing time by using a multiagent system. In this experiment, five of the previously specified computers were used. In addition, we used the fourth training dataset (trainDS4) from Table 1 with four new large testing datasets to evaluate the ability of MAS-IDS to process large datasets in less time. Table 4 shows the characteristics of the new testing datasets.

To show the ability of MAS-IDS to reduce the processing time, the MAS-IDS approach is compared with the nonagent hybrid modified K-means and C4.5. Here, MAS-IDS is rerun every time a new computer is added. In other words, MAS-IDS initially runs on one computer; when a second computer is added, it runs on both, and so on until all five computers are used. Table 5 shows the processing time of this experiment. The maximum number of agents that can run on each computer is eight, because each computer has eight CPU cores and each core can run only one agent at a time. The training time of this experiment is 8.814 s. It should be noted that the coordinator agent runs on the first computer of the system environment.

Table 4: Characteristics of new testing datasets used to evaluate the MAS-IDS processing time.

| Dataset    | Normal | DoS    | Probe | R2L   | U2R   | Total  |
|------------|--------|--------|-------|-------|-------|--------|
| newTestDS1 | 35000  | 35000  | 10000 | 10000 | 10000 | 100000 |
| newTestDS2 | 70000  | 70000  | 20000 | 25000 | 15000 | 200000 |
| newTestDS3 | 100000 | 100000 | 30000 | 50000 | 20000 | 300000 |
| newTestDS4 | 150000 | 150000 | 30000 | 50000 | 20000 | 400000 |
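The division of the testing data among the analysis agents can be illustrated on a single machine. The sketch below splits the data into computers × agents_per_computer near-equal subsets, classifies them concurrently with a thread pool, and merges the results in their original order. It is a hypothetical stand-in for the JADE agents, which run as separate processes on separate computers and also pay network-transfer costs; the function names and the even-split policy are assumptions. For CPU-bound Python code, `concurrent.futures.ProcessPoolExecutor` would be the more realistic executor.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_into_subsets(data: Sequence[T], parts: int) -> List[Sequence[T]]:
    """Divide data into `parts` contiguous subsets whose sizes differ by at most one."""
    base, extra = divmod(len(data), parts)
    subsets, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        subsets.append(data[start:end])
        start = end
    return subsets


def analyse_in_parallel(
    data: Sequence[T],
    classify_subset: Callable[[Sequence[T]], List[R]],
    computers: int,
    agents_per_computer: int = 8,   # one analysis agent per CPU core
) -> List[R]:
    """Give each (hypothetical) analysis agent one subset, then merge results in order."""
    subsets = split_into_subsets(data, computers * agents_per_computer)
    with ThreadPoolExecutor(max_workers=agents_per_computer) as pool:
        results = pool.map(classify_subset, subsets)   # map preserves subset order
    return [label for part in results for label in part]


print([len(s) for s in split_into_subsets(list(range(10)), 4)])  # [3, 3, 2, 2]
```

With more agents, each subset shrinks, which mirrors the paper's observation that per-agent analysis time drops as agents are added.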

The results presented in Table 5 depend on the number of computers and the number of agents. The results in the upper left corner of Table 5 pertain to the case of one computer with one agent; this is the worst case and should be compared with the nonagent hybrid modified K-means with C4.5. On the other hand, the results in the lower right corner of Table 5 represent the best case of MAS-IDS (maximum number of computers and agents). Furthermore, as can be seen from the data, when two computers are used, MAS-IDS achieves no consistent improvement, owing to the cost of transferring data through the network, which increases the processing time. This problem is mitigated by introducing additional computers. Nonetheless, the MAS-IDS processing time on a large dataset such as newTestDS4 remains inadequate, because the dataset subsets are still large and require a long time to be transferred to other computers through the network. This problem is eliminated when a large number of computers is employed, because the dataset is then divided into smaller subsets. Finally, the MAS-IDS processing time decreases with the addition of each new computer as the number of agents also increases. Network specifications such as bandwidth and speed play an important role in reducing the MAS-IDS processing time. Figures 9 and 10 show the effect of increasing the number of agents and the number of computers, respectively, on the MAS-IDS processing time. Figure 9 uses five computers, while Figure 10 uses only one agent, as shown in Table 5.

Table 5: Comparison of processing time (in seconds) required by MAS-IDS and the nonagent hybrid modified K-means and C4.5.

| Number of agents | Testing dataset | 1 computer | 2 computers | 3 computers | 4 computers | 5 computers |
|------------------|-----------------|------------|-------------|-------------|-------------|-------------|
| 1                | newTestDS1      | 15.349     | 13.57       | 13.219      | 7.841       | 6.972       |
| 1                | newTestDS2      | 31.607     | 26.554      | 26.228      | 15.309      | 13.424      |
| 1                | newTestDS3      | 45.893     | 41.587      | 35.910      | 23.462      | 20.162      |
| 1                | newTestDS4      | 64.172     | 76.688      | 72.101      | 44.509      | 37.723      |
| 2                | newTestDS1      | 8.852      | 8.304       | 7.102       | 5.843       | 4.630       |
| 2                | newTestDS2      | 17.339     | 17.883      | 16.310      | 13.160      | 10.110      |
| 2                | newTestDS3      | 26.184     | 27.500      | 22.502      | 16.931      | 15.313      |
| 2                | newTestDS4      | 40.226     | 64.332      | 49.853      | 37.263      | 36.832      |
| 3                | newTestDS1      | 8.121      | 8.242       | 6.940       | 5.34        | 4.575       |
| 3                | newTestDS2      | 13.472     | 12.767      | 11.743      | 9.957       | 8.374       |
| 3                | newTestDS3      | 20.735     | 22.689      | 20.186      | 15.788      | 14.421      |
| 3                | newTestDS4      | 32.604     | 52.819      | 48.814      | 36.589      | 35.536      |
| 4                | newTestDS1      | 6.699      | 6.818       | 5.891       | 4.631       | 3.854       |
| 4                | newTestDS2      | 11.776     | 14.209      | 11.104      | 9.375       | 8.116       |
| 4                | newTestDS3      | 18.613     | 20.927      | 17.613      | 14.490      | 13.413      |
| 4                | newTestDS4      | 29.399     | 51.852      | 46.811      | 35.564      | 34.90       |
| 5                | newTestDS1      | 6.146      | 6.659       | 5.579       | 4.555       | 3.715       |
| 5                | newTestDS2      | 11.534     | 13.685      | 10.580      | 9.198       | 7.924       |
| 5                | newTestDS3      | 17.568     | 20.662      | 17.497      | 14.44       | 13.62       |
| 5                | newTestDS4      | 29.349     | 51.527      | 43.956      | 34.521      | 32.349      |
| 6                | newTestDS1      | 6.68       | 6.601       | 5.393       | 4.465       | 3.224       |
| 6                | newTestDS2      | 11.318     | 12.922      | 10.531      | 8.980       | 7.567       |
| 6                | newTestDS3      | 17.419     | 20.203      | 16.938      | 15.788      | 12.656      |
| 6                | newTestDS4      | 28.660     | 50.67       | 41.475      | 32.390      | 31.989      |
| 7                | newTestDS1      | 5.871      | 6.443       | 5.272       | 4.258       | 3.150       |
| 7                | newTestDS2      | 11.134     | 12.787      | 10.494      | 8.680       | 7.685       |
| 7                | newTestDS3      | 17.223     | 20.683      | 16.556      | 15.178      | 12.362      |
| 7                | newTestDS4      | 28.438     | 45.539      | 39.859      | 30.447      | 30.20       |
| 8                | newTestDS1      | 5.481      | 6.204       | 4.289       | 4.84        | 3.89        |
| 8                | newTestDS2      | 11.011     | 13.24       | 9.913       | 8.369       | 7.399       |
| 8                | newTestDS3      | 17.150     | 19.713      | 16.483      | 14.81       | 11.181      |
| 8                | newTestDS4      | 27.303     | 40.39       | 38.156      | 30.196      | 29.851      |

The best case of MAS-IDS processing time, in comparison with the nonagent hybrid modified K-means and C4.5, is shown in Figure 11.

Finally, since the proposed system uses each CPU core to run one analysis agent, the cost of system resources is positively correlated with the number of agents. At the same time, as the number of analysis agents increases, each data subset to be analyzed becomes very small, so the analysis needs only one or two seconds of processing time. Consequently, the proposed system balances the physical components (the number of CPU cores) against the number of agents that can be created, as in (2). Figure 12 compares the average cost of system resources (CPU consumption) when MAS-IDS uses 5 computers with 8 analysis agents on each computer (40 agents in total) with the cost when it uses one analysis agent on each computer (5 agents in total), on the same datasets.

From Figure 12, the highest peak of CPU utilization lasts longer when one agent is used (6 s) than when 8 agents are used (only 1 s). As a consequence, when the number of agents is small, the processing time is long and the cost of system resources is low, whereas as the number of agents increases, the processing time becomes short and the cost of system resources becomes high. The memory cost does not exceed 10% in any of the experiments.

Figure 9: Time required to process the testing datasets in relation to the number of agents (x-axis: number of agents; y-axis: processing time (s); series: newTestDS1, newTestDS2, newTestDS3, newTestDS4).

Figure 10: Time required to process the testing datasets in relation to the number of computers (x-axis: number of computers; y-axis: processing time (s); series: newTestDS1, newTestDS2, newTestDS3, newTestDS4).

This experiment demonstrates that MAS-IDS has great potential to reduce the IDS processing time relative to methods that do not employ agents. The reduction in processing time achieved by MAS-IDS reaches up to 70% relative to the nonagent approach. In this experiment, we used only five computers; with a greater number of computers, a larger percentage reduction in processing time could likely be achieved.
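The percentage reduction is 100 × (before − after) / before. The nonagent times behind Figure 11 are not tabulated, so the sketch below illustrates the arithmetic on Table 5's own worst case (1 agent, 1 computer) and best case (8 agents, 5 computers) instead. Because the baseline differs, these numbers are not directly comparable with the 70% figure, which is measured against the nonagent implementation.

```python
# Worst case (1 agent, 1 computer) and best case (8 agents, 5 computers) from Table 5, in seconds
worst = {"newTestDS1": 15.349, "newTestDS2": 31.607, "newTestDS3": 45.893, "newTestDS4": 64.172}
best = {"newTestDS1": 3.89, "newTestDS2": 7.399, "newTestDS3": 11.181, "newTestDS4": 29.851}


def reduction_percent(before: float, after: float) -> float:
    """Percentage reduction in processing time relative to `before`."""
    return 100.0 * (before - after) / before


for name in worst:
    print(f"{name}: {reduction_percent(worst[name], best[name]):.1f}% reduction")
```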

5. Conclusion

In this work, we have proposed hybrid modified K-means with C4.5 for IDS in a MAS environment. Hybrid modified K-means with C4.5 is used to improve the classification accuracy, while MAS is used to reduce the processing time of IDS.

Figure 11: Comparison of MAS-IDS processing time with that of the nonagent hybrid modified K-means and C4.5 (y-axis: processing time (s) for newTestDS1 to newTestDS4; series: nonagent hybrid modified K-means and C4.5, best case of MAS-IDS).

Figure 12: The cost of system resources (CPUs) (x-axis: processing time (s); y-axis: CPU utilization (%); series: 8 agents, 1 agent).

The modification of K-means is based on choosing initial centroids of clusters that represent all cases of the dataset, which allows the number of clusters k to be determined automatically. Three types of agents (coordinator, analysis, and communication agents) are used. The KDD Cup 1999 dataset is employed, and the JADE platform with five computers is used to implement the proposed method.

MAS-IDS demonstrated that a multiagent system has significant potential for reducing the IDS processing time: a reduction of up to 70% in processing time was achieved. Moreover, the hybrid modified K-means with C4.5 approach achieved higher accuracy than the hybrid K-means and C4.5, as well as the other techniques available in Weka and Matlab. The t-test of accuracy comparing MAS-IDS with the conventional K-means and C4.5 method confirmed that the former was superior (p value of 0.00000028). This indicates that MAS-IDS has high potential to improve the performance of intrusion detection systems.

In future work, we will attempt to improve the IDS accuracy further by combining the proposed method with other techniques. We will also try to apply our method to other datasets and to a real data network to make the system more suitable for real environments. We will use the new attacks detected by the system as unknown attacks to retrain the proposed method as feedback. In addition, we expect to reduce the IDS processing time further by using a greater number of computers.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work is supported by National University of Malaysia (UKM) Grant no. AP2013-007.

References

[1] H.-J. Liao, C.-H. Richard Lin, Y.-C. Lin, and K.-Y. Tung, “Intrusion detection system: a comprehensive review,” Journal of Network and Computer Applications, vol. 36, no. 1, pp. 16–24, 2013.

[2] N. Sengupta, J. Sen, J. Sil, and M. Saha, “Designing of on line intrusion detection system using rough set theory and Q-learning algorithm,” Neurocomputing, vol. 111, pp. 161–168, 2013.

[3] L. Koc, T. A. Mazzuchi, and S. Sarkani, “A network intrusion detection system based on a Hidden Naïve Bayes multiclass classifier,” Expert Systems with Applications, vol. 39, no. 18, pp. 13492–13500, 2012.

[4] M. Uddin, A. A. Rehman, N. Uddin, J. Memon, R. Alsaqour, and S. Kazi, “Signature-based multi-layer distributed intrusion detection system using mobile agents,” International Journal of Network Security, vol. 15, no. 2, pp. 97–105, 2013.

[5] C. N. Modi, D. R. Patel, A. Patel, and M. Rajarajan, “Integrating signature apriori based network intrusion detection system (NIDS) in cloud computing,” Procedia Technology, vol. 6, pp. 905–912, 2012.

[6] H. Mohamed, L. Adil, T. Saida et al., “A collaborative intrusion detection and prevention system in cloud computing,” in Proceedings of the IEEE (AFRICON ’13), pp. 1–5, IEEE, September 2013.

[7] S.-J. Horng, M.-Y. Su, Y.-H. Chen et al., “A novel intrusion detection system based on hierarchical clustering and support vector machines,” Expert Systems with Applications, vol. 38, no. 1, pp. 306–313, 2011.

[8] M. Chowdhary, S. Suri, and M. Bhutani, “Comparative study of intrusion detection system,” International Journal of Computer Sciences and Engineering, vol. 2, no. 4, pp. 197–200, 2014.

[9] I. Corona, G. Giacinto, and F. Roli, “Adversarial attacks against intrusion detection systems: taxonomy, solutions and open issues,” Information Sciences, vol. 239, pp. 201–225, 2013.

[10] S. Shamshirband, N. B. Anuar, M. L. M. Kiah, and A. Patel, “An appraisal and design of a multi-agent system based cooperative wireless intrusion detection computational intelligence technique,” Engineering Applications of Artificial Intelligence, vol. 26, no. 9, pp. 2105–2127, 2013.

[11] M. Roesch, “Snort: lightweight intrusion detection for networks,” in Proceedings of the 13th USENIX Conference on System Administration (LISA ’99), pp. 229–238, 1999.

[12] D. Barbara and S. Jajodia, Applications of Data Mining in Computer Security, Springer, 2002.

[13] P. Natesan, P. Balasubramanie, and G. Gowrison, “Improving the attack detection rate in network intrusion detection using adaboost algorithm,” Journal of Computer Science, vol. 8, no. 7, pp. 1041–1048, 2012.

[14] A. Bivens, C. Palagiri, R. Smith, B. Szymanski, and M. Embrechts, “Network-based intrusion detection using neural networks,” in Proceedings of the Intelligent Engineering Systems through Artificial Neural Networks, vol. 12, pp. 579–584, November 2002.

[15] Y. Li and W. Jie, “The method of network intrusion detection based on the neural network GCBP algorithm,” in Proceedings of the International Conference on Computer Science and Information Processing (CSIP ’12), pp. 1082–1086, IEEE, August 2012.

[16] J. Lin, T. Huang, and B. Zhao, “A fast fuzzy set intrusion detection model,” in International Symposium on Knowledge Acquisition and Modeling (KAM ’08), pp. 601–605, December 2008.

[17] A. Abraham, R. Jain, J. Thomas, and S. Y. Han, “D-SCIDS: distributed soft computing intrusion detection system,” Journal of Network and Computer Applications, vol. 30, no. 1, pp. 81–98, 2007.

[18] V. V. Kumari, S. Pamidi, and A. Govardhan, “Integrated Bayes network and hidden Markov model for host based IDS,” International Journal of Computer Applications, vol. 41, no. 20, pp. 45–49, 2012.

[19] M. A. Hasan, M. Nasser, B. Pal, and S. Ahmad, “Support vector machine and random forest modeling for intrusion detection system (IDS),” Journal of Intelligent Learning Systems and Applications, vol. 6, no. 1, pp. 45–52, 2014.

[20] C. Xiang, P. C. Yong, and L. S. Meng, “Design of multiple-level hybrid classifier for intrusion detection system using Bayesian clustering and decision trees,” Pattern Recognition Letters, vol. 29, no. 7, pp. 918–924, 2008.

[21] M. N. Huhns, Distributed Artificial Intelligence, Elsevier, 2012.

[22] S. J. Stolfo, A. L. Prodromidis, S. Tselepis et al., “JAM: java agents for meta-learning over distributed databases,” in Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD ’97), pp. 74–81, 1997.

[23] P. Kannadiga and M. Zulkernine, “DIDMA: a distributed intrusion detection system using mobile agents,” in Proceedings of the 6th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing and 1st ACIS International Workshop on Self-Assembling Wireless Networks (SNPD/SAWN ’05), pp. 238–245, IEEE, May 2005.

[24] L. Portnoy, Intrusion Detection with Unlabeled Data Using Clustering, 2000.

[25] M. Jianliang, S. Haikun, and B. Ling, “The application on intrusion detection based on K-means cluster algorithm,” in Proceedings of the International Forum on Information Technology and Applications (IFITA ’09), vol. 1, pp. 150–152, IEEE, Chengdu, China, May 2009.

[26] M. Sabhnani and G. Serpen, “Application of machine learning algorithms to KDD intrusion detection dataset within misuse detection context,” in Proceedings of the International Conference on Machine Learning: Models, Technologies and Applications (MLMTA ’03), pp. 209–215, June 2003.

[27] G. Munz, S. Li, and G. Carle, “Traffic anomaly detection using k-means clustering,” in Proceedings of the GI/ITG Workshop MMBnet, 2007.

[28] V. Kumar, H. Chauhan, and D. Panwar, “K-means clustering approach to analyze NSL-KDD intrusion detection dataset,” International Journal of Soft Computing and Engineering, vol. 3, no. 4, pp. 1–4, 2013.

[29] S. Chawla and A. Gionis, “k-means--: a unified approach to clustering and outlier detection,” in Proceedings of the SIAM International Conference on Data Mining (SDM ’13), pp. 189–197, SIAM, 2013.

[30] A. P. Muniyandi, R. Rajeswari, and R. Rajaram, “Network anomaly detection by cascading K-means clustering and C4.5 decision tree algorithm,” Procedia Engineering, vol. 30, pp. 174–182, 2012.

[31] L. Xiao, Z. Shao, and G. Liu, “K-means algorithm based on particle swarm optimization algorithm for anomaly intrusion detection,” in Proceedings of the 6th World Congress on Intelligent Control and Automation (WCICA ’06), pp. 5854–5858, IEEE, June 2006.

[32] Z. Muda, W. Yassin, M. N. Sulaiman, and N. I. Udzir, “Intrusion detection based on K-Means clustering and Naïve Bayes classification,” in Proceedings of the 7th International Conference on Information Technology in Asia (CITA ’11), pp. 1–6, IEEE, July 2011.

[33] H.-B. Wang, H.-L. Yang, Z.-J. Xu, and Z. Yuan, “A clustering algorithm use SOM and K-means in intrusion detection,” in Proceedings of the 1st International Conference on E-Business and E-Government (ICEE ’10), pp. 1281–1284, May 2010.

[34] A. M. Chandrasekhar and K. Raghuveer, “Intrusion detection technique by using k-means, fuzzy neural network and SVM classifiers,” in Proceedings of the 3rd International Conference on Computer Communication and Informatics (ICCCI ’13), pp. 1–3, January 2013.

[35] R. Goel, A. Sardana, and R. C. Joshi, “Parallel misuse and anomaly detection model,” International Journal of Network Security, vol. 14, no. 4, pp. 211–222, 2012.

[36] O. Depren, M. Topallar, E. Anarim, and M. K. Ciliz, “An intelligent intrusion detection system (IDS) for anomaly and misuse detection in computer networks,” Expert Systems with Applications, vol. 29, no. 4, pp. 713–722, 2005.

[37] A. S. A. Aziz, A. E. Hassanien, S. E.-O. Hanaf, and M. Tolba, “Multi-layer hybrid machine learning techniques for anomalies detection and classification approach,” in Proceedings of the 13th International Conference on Hybrid Intelligent Systems (HIS ’13), pp. 215–220, IEEE, Gammarth, Tunisia, December 2013.

[38] M. Ektefa, S. Memar, F. Sidi, and L. S. Affendey, “Intrusion detection using data mining techniques,” in Proceedings of the International Conference on Information Retrieval and Knowledge Management: Exploring the Invisible World (CAMP ’10), pp. 200–203, IEEE, March 2010.

[39] G. MeeraGandhi, K. Appavoo, and S. Srivasta, “Effective network intrusion detection using classifiers decision trees and decision rules,” International Journal of Advanced Networking and Applications, vol. 2, no. 3, pp. 686–692, 2010.

[40] H. Chauhan, V. Kumar, S. Pundir, and E. S. Pilli, “A comparative study of classification techniques for intrusion detection,” in Proceedings of the International Symposium on Computational and Business Intelligence (ISCBI ’13), pp. 40–43, IEEE, August 2013.

[41] C. Katar, “Combining multiple techniques for intrusion detection,” International Journal of Computer Science and Network Security, vol. 6, no. 2B, pp. 208–218, 2006.

[42] S. R. Gaddam, V. V. Phoha, and K. S. Balagani, “K-means+id3: a novel method for supervised anomaly detection by cascading k-means clustering and id3 decision tree learning methods,” IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 3, pp. 345–354, 2007.

[43] D. Dasgupta, F. Gonzalez, K. Yallapu, J. Gomez, and R. Yarramsettii, “CIDS: an agent-based intrusion detection system,” Computers & Security, vol. 24, no. 5, pp. 387–398, 2005.

[44] D. L. Hancock and G. B. Lamont, “Multi agent system for network attack classification using flow-based intrusion detection,” in IEEE Congress of Evolutionary Computation (CEC ’11), pp. 1535–1542, June 2011.

[45] X. Zhu, Z. Huang, and H. Zhou, “Design of a multi-agent based intelligent intrusion detection system,” in Proceedings of the 1st International Symposium on Pervasive Computing and Applications (SPCA ’06), pp. 290–295, August 2006.

[46] M. El Ajjouri, S. Benhadou, and H. Medromi, “Intelligent architecture based on MAS and CBR for intrusion detection,” in Proceedings of the 4th Edition of National Security Days (JNS4), pp. 1–4, IEEE, May 2014.

[47] J. Yang, X. Liu, T. Li, G. Liang, and S. Liu, “Distributed agents model for intrusion detection based on AIS,” Knowledge-Based Systems, vol. 22, no. 2, pp. 115–119, 2009.

[48] J. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297, Berkeley, Calif, USA, 1967.

[49] J. M. Pena, J. A. Lozano, and P. Larranaga, “An empirical comparison of four initialization methods for the K-Means algorithm,” Pattern Recognition Letters, vol. 20, no. 10, pp. 1027–1040, 1999.

[50] G. H. Ball and D. J. Hall, “A clustering technique for summarizing multivariate data,” Behavioral Science, vol. 12, no. 2, pp. 153–155, 1967.

[51] I. Katsavounidis, C.-C. J. Kuo, and Z. Zhang, “New initialization technique for generalized Lloyd iteration,” IEEE Signal Processing Letters, vol. 1, no. 10, pp. 144–146, 1994.

[52] M. D. B. Al-Daoud, “A new algorithm for cluster initialization,” in Proceedings of the WEC’05: The 2nd World Enformatika Conference, 2007.

[53] D. Arthur and S. Vassilvitskii, “k-means++: the advantages of careful seeding,” in Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035, Society for Industrial and Applied Mathematics, New Orleans, La, USA, January 2007.

[54] M. Erisoglu, N. Calis, and S. Sakallioglu, “A new algorithm for initial cluster centers in k-means algorithm,” Pattern Recognition Letters, vol. 32, no. 14, pp. 1701–1705, 2011.

[55] L. Yongzhong, Y. Ge, X. Jing et al., “Anomaly detection for clustering algorithm based on particle swarm optimization,” Journal of Jiangsu University of Science and Technology (Natural Science Edition), vol. 23, no. 1, pp. 51–55, 2009.

[56] W. Cong, J. Morris, and W. Xiaojun, “High performance deep packet inspection on multi-core platform,” in Proceedings of the 2nd IEEE International Conference on Broadband Network and Multimedia Technology (IC-BNMT ’09), pp. 619–622, October 2009.

[57] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.

[58] S. Ruggieri, “Efficient C4.5 [classification algorithm],” IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 2, pp. 438–444, 2002.

[59] X. Wu and V. Kumar, The Top Ten Algorithms in Data Mining, CRC Press, New York, NY, USA, 2010.

[60] KDD Cup 1999, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.

[61] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, “A detailed analysis of the KDD CUP 99 data set,” in Proceedings of the 2nd IEEE Symposium on Computational Intelligence for Security and Defence Applications, pp. 1–6, IEEE, July 2009.

[62] A. K. Jain, “Data clustering: 50 years beyond K-means,” Pattern Recognition Letters, vol. 31, no. 8, pp. 651–666, 2010.




Most researchers studying these issues have focused on the accuracy of attack detection using different methods, such as neural networks [14, 15], fuzzy logic [16, 17], and machine learning [18–20], and have thus paid less attention to the processing time required to detect attacks. However, processing speed is becoming more important due to increasing network traffic and the need for intrusion detection systems to operate as real-time systems.

One of the popular methods for reducing processing time is distributed artificial intelligence (DAI), which merges artificial intelligence with distributed computing [21]. Multiagent systems (MASs) are one of the branches of DAI. Several authors have proposed agents aimed at improving IDS performance. For example, JAM [22] used metalearning and distributed data mining to build classifiers at different nodes for data analysis. This system has a manager responsible for coordinating the simultaneous execution of classifiers by agents; it subsequently combines the results of all classifiers using one of the metalearning techniques. DIDMA [23], on the other hand, uses two types of agents: (1) a static agent responsible for collecting information about attacks from its host and (2) a mobile agent responsible for gathering, from the static agents, the information pertaining to new attacks on the system. However, in these works, agents are applied in IDS as a conceptual idea without demonstrating their performance.

Over the years, various machine learning techniques have been proposed, with their authors claiming that they are best suited for IDS [10]. K-means and decision trees are among the techniques widely used in designing IDS. K-means is used to cluster data in order to find meaningful structures or patterns in a collection of unlabeled data, so that instances in the same cluster are similar while instances from different clusters differ from each other. Several extant studies [24–29] presented K-means as a single algorithm for clustering IDS data into a set of clusters representing normal processes and attacks. Furthermore, different authors [30–34] have presented combined methods based on K-means and other techniques to build IDS models. C4.5, on the other hand, is used to build a tree structure of attack signatures as well as a tree structure of normal behaviors. This approach relies on maximum information gain as the feature selection criterion and minimal information split in building the tree structure. In some approaches, C4.5 is used as a single model to construct misuse detection only [35, 36], while other researchers combined C4.5 with other techniques to build IDS as a layered model [37]. Finally, in some studies [38–41], C4.5 was evaluated and compared with other techniques in order to demonstrate its performance. Combining K-means with decision trees has resulted in good IDS accuracy. For example, in a recent study [30], the C4.5 technique was used with K-means to design a supervised anomaly detection system, while another research group [42] used ID3 combined with K-means in a novel supervised anomaly detection approach.

This study proposes a hybrid of modified K-means with the C4.5 algorithm to build an IDS in a multiagent system environment. The aim of this approach is to improve anomaly-based detection accuracy, while the MAS is used to reduce the IDS processing time. In the proposed design, the MAS utilizes three types of agents: coordinator, analysis, and communication agents. The coordinator agent is responsible for building trees from the training dataset: it first uses modified K-means to cluster the data into a number of clusters and then uses the C4.5 technique to build a tree for each cluster. The resulting trees classify the testing data efficiently because each tree is built from instances with similar attributes, which makes it possible to discriminate between classes with high accuracy. Moreover, these trees reduce the processing time, as the search is performed on a smaller tree. The second task of the coordinator agent is dividing the testing dataset into a number of subsets and sending each subset of testing data, together with the training trees, to one of the analysis agents responsible for analyzing it. Lastly, the coordinator agent combines the results yielded by the analysis agents in order to obtain the final results. The analysis agent, on the other hand, is responsible for analyzing the data received from the coordinator agent; it uses the closest decision tree to classify each instance of the dataset into the appropriate class. Finally, the communication agent is responsible for transferring the data and results between coordinator agents and analysis agents. The modification of K-means concerns the method adopted for choosing the initial centroids of clusters. The proposed design significantly reduces the processing time, thus increasing IDS efficiency. The KDD Cup 1999 dataset is used to evaluate the performance of the proposed system, and the JADE platform is used for its implementation.

The remainder of this paper is organized as follows. Section 2 provides a brief review of related work on K-means, C4.5, and multiagent systems with IDS. The proposed system is described in Section 3, while Section 4 presents experimental results in order to demonstrate the performance of the proposed system. Concluding remarks are given in Section 5.

2. Related Work

This section describes the role of multiagent systems in IDS and discusses the role of the K-means and C4.5 algorithms in building IDS models. Extant studies have confirmed that the C4.5 technique can achieve strong classification performance. In addition, K-means is well able to group data into clusters in which instances in the same cluster have high similarity.

2.1. Multiagent Systems (MAS). Various multiagent systems (MASs) for IDS have been proposed [43–47]. Dasgupta et al. [43], for example, developed the Cougaar framework and presented a hierarchical architecture consisting of four different agents (manager, monitor, decision, and action agents). The authors used intelligent decision support modules, such as a fuzzy inference system, to detect anomalies at the packet, process, user, and system levels. This work, however, fails to explain how multiple security nodes of CIDS should be organized when large numbers are needed to protect many hosts in larger networks. Moreover, the authors do not discuss how and what information the manager agents should share. An architecture for a multiagent flow-based IDS was developed in a different study [44], where the concept of a reputation system was used to permit agents to find the nodes that are most effective for classifying malicious network activity. Zhu et al. [45] presented MAIIDS, which uses more than one technique, such as neural networks and association rules, for learning agents that generate rules, which decision agents then use to examine the audit data and respond accordingly. The experimental results indicate that their system has very high self-adapting ability, intelligence, and expansibility. El Ajjouri et al. [46] presented an architecture based on adding a learning feature whereby abnormal behaviors correspond to unknown malicious patterns. This architecture first detects new attacks using the agent responsible for detecting new behavior, after which it updates the basic attack patterns. This agent uses the case-based reasoning (CBR) technique as the attack detection method. Yang et al. [47] presented a distributed agent model based on artificial immune systems (AIS) for building IDS. This system adopts features of AIS such as self-adaptation, self-learning, self-organization, parallel processing, and distributed coordination. Although this work includes a section on empirical findings, the authors do not provide any details about the experiments they conducted; therefore, it is not possible to ascertain the significance of their results. As can be seen from the above, none of the authors of extant works on agent-based IDS clearly discussed or presented processing time results. This shortcoming is addressed in the present study, where one of the measurements used to evaluate the performance of the MAS-based IDS is the IDS processing time.

Input: Dataset, $k$
Output: Clusters
(1) Select $k$ initial centroids of clusters randomly.
(2) Assign every instance $\omega_i \in$ Dataset to the closest centroid to make $k$ clusters $C_1, C_2, \ldots, C_k$.
(3) Calculate the cluster centroids $\bar{\omega}_i = \frac{1}{k_i}\sum_{j=1}^{k_i}\omega_{ij}$, $i = 1, \ldots, k$, where $k_i$ is the number of instances in $C_i$.
(4) For every instance $\omega_i \in$ Dataset Do
(4.1) Reassign $\omega_i$ to the closest cluster centroid: $\omega_i \in C_s$ is moved from $C_s$ to $C_t$ if $\|\omega_i - \bar{\omega}_t\| \le \|\omega_i - \bar{\omega}_j\|$ for all $j = 1, \ldots, k$, $j \ne s$.
(4.2) Recalculate the centroids of clusters $C_s$ and $C_t$.
(5) If the cluster instances have stabilized, Then stop; Else go to Step (4).

Pseudocode 1: Pseudocode of the standard K-means algorithm.

2.2. K-Means Algorithm-Based IDS. The K-means algorithm clusters objects into $k$ disjoint clusters based on their attributes [48]. Objects in the same cluster are similar, while those in different clusters differ from one another. The algorithm uses a similarity measure to compute the distance between two objects; the measure most commonly used by K-means is the Euclidean distance. The advantage of K-means is its flexibility in dealing with large datasets [25], with time complexity $O(tkn)$, where $t$ represents the number of iterations, $k$ denotes the number of clusters, and $n$ is the number of dataset instances. However, the main disadvantage of the K-means algorithm is the need to find the best number of clusters $k$. In addition, it is sensitive to isolated dataset instances [25], and the algorithm converges finitely to local minima. Consequently, the initial centroids of clusters significantly affect the output of K-means. The pseudocode of standard K-means is shown in Pseudocode 1 [49], where

$$\|x - y\| = \sqrt{\sum_{i=1}^{\text{no. of attributes}} (x_i - y_i)^2}. \qquad (1)$$
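Pseudocode 1 together with equation (1) can be sketched in Python as follows. This is a minimal illustration, not the authors' JADE/Java implementation: the function names are hypothetical, instances are plain lists of numeric attributes, and the refinement uses the common batch (Lloyd) update, which recomputes all centroids after each full reassignment pass instead of updating $C_s$ and $C_t$ after every individual move as in Steps (4.1) and (4.2).

```python
import math
import random
from typing import List, Sequence, Tuple


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Equation (1): square root of the summed squared attribute differences."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def centroid(points: List[Sequence[float]]) -> List[float]:
    """Attribute-wise mean of a non-empty list of instances."""
    return [sum(column) / len(points) for column in zip(*points)]


def refine(data: List[Sequence[float]], centroids: List[List[float]],
           max_iter: int = 100) -> List[int]:
    """Batch K-means refinement; returns the cluster index of each instance."""
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign every instance to its closest centroid.
        new_labels = [min(range(len(centroids)), key=lambda c: euclidean(x, centroids[c]))
                      for x in data]
        if new_labels == labels:  # assignments stabilized: stop
            break
        labels = new_labels
        # Recompute the centroid of every non-empty cluster.
        for c in range(len(centroids)):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:
                centroids[c] = centroid(members)
    return labels


def standard_kmeans(data: List[Sequence[float]], k: int,
                    seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Step (1) of Pseudocode 1: k random instances become the initial centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(data, k)]
    labels = refine(data, centroids)
    return labels, centroids
```

Because `refine` takes the initial centroids as an argument, any of the initialization rules discussed below can be plugged in without changing the refinement loop.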

In the past, many ideas were proposed to improve the performance of K-means, most of them aimed at improving the method used to select the initial centroids of clusters. For example, Ball and Hall [50] adopted the centroid of the dataset, $\bar{X} = \frac{1}{N}\sum_{j=1}^{N} x_j$, as the first centroid before choosing the remaining centroids in an arbitrary fashion, accepting an instance only if its distance from the previously selected centroids is greater than a threshold, until $k$ centroids are obtained. The Maximin method developed by Katsavounidis et al. [51], on the other hand, chooses the first centroid arbitrarily, while the subsequent $k-1$ centroids are chosen as the instances that have the greatest minimum distance with respect to the previously selected centroids. Al-Daoud's variance-based method [52] sorts the data instances according to the variance of the attributes and then partitions them into $k$ groups of the same size (the medians of these groups are chosen as centroids). The k-means++ method [53] combines MacQueen's second method with the Maximin method: the first centroid is selected randomly, and the $i$th centroid ($i \in \{2, 3, \ldots, k\}$) is chosen as instance $x'$ with probability $\mathrm{md}(x')^2 / \sum_{j=1}^{N}\mathrm{md}(x_j)^2$, where $\mathrm{md}(x)$ denotes the minimum distance from $x$ to the previously selected centroids. Erisoglu et al. [54] proposed a method that first chooses two main vectors representing the best dataset distribution before computing the centroid of the dataset as the mean of these two vectors. The first cluster centroid is the instance with the longest Euclidean distance from the centroid of the dataset, while the $i$th cluster centroid is the instance with the maximum combined distance from the previous $(i-1)$ cluster centroids.
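Two of the seeding rules above can be written compactly. The sketch below is illustrative only (hypothetical function names, instances as numeric lists): `maximin_seeds` follows the Maximin rule of Katsavounidis et al. [51], and `kmeanspp_probabilities` evaluates the k-means++ selection probability $\mathrm{md}(x')^2/\sum_j \mathrm{md}(x_j)^2$ [53]. Drawing the actual random sample from these probabilities is left to the caller.

```python
import math
from typing import List, Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def min_distances(data: List[Sequence[float]], chosen_idx: List[int]) -> List[float]:
    """md(x): distance from each instance to its nearest already-chosen centroid."""
    return [min(euclidean(x, data[c]) for c in chosen_idx) for x in data]


def maximin_seeds(data: List[Sequence[float]], k: int, first: int = 0) -> List[int]:
    """Maximin: start from an arbitrary instance (index `first`), then repeatedly
    add the instance with the greatest minimum distance to the chosen centroids."""
    chosen = [first]
    while len(chosen) < k:
        md = min_distances(data, chosen)
        chosen.append(max(range(len(data)), key=lambda i: md[i]))
    return chosen


def kmeanspp_probabilities(data: List[Sequence[float]], chosen_idx: List[int]) -> List[float]:
    """k-means++: probability of picking each instance as the next centroid."""
    squared = [d * d for d in min_distances(data, chosen_idx)]
    total = sum(squared)
    return [s / total for s in squared]
```

A k-means++ draw could then be made with `random.choices(range(len(data)), weights=probs)[0]`; already-chosen instances have probability zero.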

2.3. Hybrid K-Means-Based IDS. The best possible combination of high detection rate and low false alarm rate can be achieved by using hybrid approaches for IDS, and hybrid K-means combined with other techniques has played an important role in this field. Xiao et al. [31] proposed a K-means algorithm based on particle swarm optimization (PSO) for network anomaly detection. The authors used PSO to overcome the tendency of K-means to converge to a local minimum, capitalizing on the global search ability of PSO. Experimental results with a KDD dataset demonstrate that the proposed method is effective in dealing with large datasets and achieves a satisfactory detection rate. Yongzhong et al. [55] also proposed a similar PSO/K-means hybrid system. Muda et al. [32] proposed a hybrid learning approach combining K-means clustering and Naïve Bayes classification. The authors clustered all data into the corresponding groups before applying a classifier for classification. Their results show that the proposed approach achieved a reasonable false alarm rate. A clustering algorithm that uses SOM and K-means for intrusion detection was proposed by Wang et al. [33]. When the SOM finishes its training process, K-means is adopted to refine the weights obtained by training. In addition, once the SOM completes cluster formation, K-means is applied to refine the final clustering results. Chandrasekhar and Raghuveer [34] proposed a new approach based on a fuzzy neural network and a support vector machine to improve the IDS detection rate. Here, K-means clustering was first applied to generate different training data subsets.

Muniyandi et al. [30] proposed an anomaly detection method using K-means combined with C4.5 for classifying anomalous and normal activities. In this approach, K-means clustering is first used to partition the training dataset into $k$ clusters using the Euclidean distance. Then, a decision tree is built for each cluster using the C4.5 technique, and the rules created by the decision tree are used to detect intrusion events. The testing phase is implemented in two steps. In the first step, the Euclidean distance is computed for every testing instance to find the closest cluster. In the second step, the decision tree corresponding to the closest cluster is selected to detect the class of the instance. In this work, K-means still has some shortcomings, as the clustering output largely depends on the selection of the initial centroids of clusters. In addition, the number of clusters $k$ needs to be given in advance. Moreover, the resulting clusters do not include all the possibilities of class instances.

3. Proposed Hybrid Modified K-Means with C4.5 in MAS-IDS

The proposed MAS-IDS uses three agents responsible for achieving the IDS goals: the coordinator, analysis, and communication agents. The MAS-IDS is shown in Figure 1. The details of the proposed system are elaborated in the following sections.

3.1. Multiagent System-Based Intrusion Detection System (MAS-IDS)

Figure 1: The MAS-IDS architecture. (Diagram: Host 1, Host 2, ..., Host N, each running a communication agent, a coordinator agent, and several analysis agents; the legend distinguishes network traffic, communication between agents, information sharing between agents, and network traffic capture.)

3.1.1. Coordinator Agent. The deliberative coordinator agent uses a training dataset to train the hybrid system through clusters constructed by modified K-means. It subsequently applies the C4.5 technique to each cluster to build the decision trees that will be used in the testing phase. In addition, the coordinator agent receives the gathered network traffic data and divides it into a number of subsets by applying (2). It then sends these subsets, together with the trees and centroids of clusters, to the analysis agents in the other hosts by using communication agents. The coordinator agent also holds information about all the hosts of the system environment: each host periodically sends the coordinator agent the number of its cores that are presently not busy. The scenario in which the coordinator agent operates can be summarized in the following steps:

(1) Read the training dataset.
(2) Call modified K-means (Pseudocode 2) to cluster the training dataset into a set of clusters.
(3) Build a tree for each cluster by using the C4.5 technique.
(4) Send the trees and centroids of clusters to the static agents in the other hosts.
(5) Capture network traffic data packets.
(6) Determine the number of CPU cores in the hosts of the system that are presently not busy (denoted $n$).
(7) Divide the captured data into $n$ subsets (see (2)).
(8) Create $n$ analysis agents in the other hosts (see (3)).
(9) Send the $n$ data subsets to the analysis agents by using the communication agent.
(10) Wait until all analysis agents finish analyzing the data.
(11) Combine the results yielded by the analysis agents using (4).

Input: Dataset
Output: Clusters, Centroids
(1) Set $k = 1$, $c_1$ = first instance $\omega_1$.
(2) For every instance $\omega_i \in$ Dataset, $i \ne 1$, Do
(2.1) If $\|\omega_i - c_s\| > \mathit{threshold}$ for all $s = 1, \ldots, k$, Then
(2.2) $k = k + 1$, $c_k = \omega_i$.
(3) Assign every instance $\omega_i \in$ Dataset to the closest centroid to make $k$ clusters $C_1, C_2, \ldots, C_k$.
(4) Calculate the cluster centroids $\bar{\omega}_i = \frac{1}{k_i}\sum_{j=1}^{k_i}\omega_{ij}$, $i = 1, \ldots, k$, where $k_i$ is the number of instances in $C_i$.
(5) For every instance $\omega_i \in$ Dataset Do
(5.1) Reassign $\omega_i$ to the closest cluster centroid: $\omega_i \in C_s$ is moved from $C_s$ to $C_t$ if $\|\omega_i - \bar{\omega}_t\| \le \|\omega_i - \bar{\omega}_j\|$ for all $j = 1, \ldots, k$, $j \ne s$.
(5.2) Recalculate the centroids of clusters $C_s$ and $C_t$.
(6) If the cluster instances have stabilized, Then stop; Else go to Step (4).

Pseudocode 2: Pseudocode of the modified K-means algorithm.

$$\text{Data set} = S_1 \cup S_2 \cup \cdots \cup S_j \cup \cdots \cup S_n \qquad (2)$$

subject to

$$S_j = \{\text{instance} \in \text{Data set} \mid \text{instance} \notin S_i\ \ \forall i = 1, \ldots, n,\ i \ne j\}, \qquad |S_j| = \frac{|\text{Data set}|}{n}, \quad j = 1, \ldots, n,$$

where $n \le$ the number of CPU cores available in the system.
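Equation (2) asks for $n$ disjoint subsets of equal size. When |Data set| is not divisible by $n$, exact equality is impossible, so the sketch below (hypothetical helpers, not the authors' code) assumes subsets whose sizes differ by at most one, with $n$ taken as the total number of currently free cores reported by the hosts.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def split_dataset(data: Sequence[T], n: int) -> List[List[T]]:
    """Partition `data` into n disjoint contiguous subsets S_1..S_n whose union is
    the whole dataset and whose sizes differ by at most one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    q, r = divmod(len(data), n)
    subsets: List[List[T]] = []
    start = 0
    for j in range(n):
        size = q + (1 if j < r else 0)  # the first r subsets take one extra instance
        subsets.append(list(data[start:start + size]))
        start += size
    return subsets


def free_core_count(free_cores: Dict[str, int]) -> int:
    """n: total number of idle cores reported by the hosts (host names are hypothetical)."""
    return sum(free_cores.values())
```

For example, `split_dataset(captured, free_core_count({"host1": 3, "host2": 5}))` would produce eight subsets.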

3.1.2. Analysis Agent. A set of reactive analysis agents is created in the other hosts within the system environment by using (3), where the number of analysis agents equals the number of subsets resulting from the splitting process. Each analysis agent receives one subset of the testing data along with the centroids of clusters and the decision trees created by the coordinator agent in the training phase. In fact, the coordinator agent sends a message about the number of agents needed to the deliberative agent resident in each host, and the resident agent then creates these analysis agents. This indirection is needed because, in JADE, if the coordinator agent creates the analysis agents directly in the other hosts, the analysis agents are created in these hosts only logically; physically, they are created in the host of the coordinator agent. Consequently, the analysis agents would use the same CPU cores and memory as the coordinator agent's host. Each analysis agent runs as a thread on one of the CPU cores of its host [56]; hence, a host with four cores can run four agent threads simultaneously in parallel.

$$\forall S_j:\ \text{create } AA_j \in \text{Host}_i \longrightarrow \text{analysis}(S_j, AA_j), \qquad j = 1, \ldots, n,\ \ 1 \le i \le m \qquad (3)$$

subject to

$$\text{number of AAs in Host}_i \le \text{number of CPU cores in Host}_i,$$

where $AA_j$ represents analysis agent $j$, $\text{analysis}(S_j, AA_j)$ represents the analysis function used to analyze subset $S_j$ by analysis agent $AA_j$, and $m$ represents the number of hosts available in the system environment.

In the analysis agent, each instance is first compared with the centroids of clusters to find the closest one, after which the decision tree corresponding to this centroid is used to determine the type of the instance. If the instance attributes do not match any class in the decision tree, the instance is treated as an attack, and the decision trees are updated with the data pertaining to this attack to assist with future detection. The scenario in which the analysis agent operates is as follows:
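One simple way to satisfy the constraint of (3) is to fill the hosts in turn up to their number of free cores. The greedy policy below is an assumption for illustration (the text does not specify how agents are distributed among hosts), and the host names are hypothetical.

```python
from typing import Dict, List


def place_agents(n_subsets: int, free_cores: Dict[str, int]) -> Dict[str, List[int]]:
    """Map each host to the indices j of the analysis agents AA_j it should create,
    never exceeding that host's free cores (the constraint of equation (3))."""
    if n_subsets > sum(free_cores.values()):
        raise ValueError("more subsets than free cores; choose a smaller n")
    placement: Dict[str, List[int]] = {host: [] for host in free_cores}
    j = 0
    for host, cores in free_cores.items():
        take = min(cores, n_subsets - j)  # agents this host can still accept
        placement[host].extend(range(j, j + take))
        j += take
        if j == n_subsets:
            break
    return placement
```

A round-robin or load-aware policy would satisfy the same constraint; greedy filling simply keeps the agents on as few hosts as possible.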

(1) Receive the data subset, the centroids of clusters, and the trees from the communication agent.

(2) Run Pseudocodes 3 and 4 to analyze the data.

(3) Return the results to the coordinator agent by using the communication agent.

Finally, the coordinator agent combines all the results produced by the analysis agents by using (4) to provide the final results to the system administrator, who then raises an alert to deal with the situation.

$$\text{Normal instances} = \bigcup_{i=1}^{n} \text{Normal}(AA_i), \qquad \text{Attack instances} = \bigcup_{i=1}^{n} \text{Attack}(AA_i). \qquad (4)$$
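The analyze-then-combine flow of the analysis agents and (4) can be mimicked on a single machine with a thread pool, one worker per subset. This is a sketch under stated assumptions: each subset holds hypothetical (instance_id, instance) pairs, `classify` stands in for Pseudocodes 3 and 4, and any label other than "normal" (including "unknown") is counted as an attack, following the rule described for the analysis agent.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Hashable, List, Sequence, Set, Tuple

Pair = Tuple[Hashable, Sequence[float]]


def analyze_subset(subset: List[Pair],
                   classify: Callable[[Sequence[float]], str]) -> Tuple[Set, Set]:
    """Work done by one analysis agent: split its subset into normal/attack ids."""
    normal, attack = set(), set()
    for instance_id, instance in subset:
        (normal if classify(instance) == "normal" else attack).add(instance_id)
    return normal, attack


def run_and_combine(subsets: List[List[Pair]],
                    classify: Callable[[Sequence[float]], str]) -> Tuple[Set, Set]:
    """Run one worker per subset, then take the unions of equation (4)."""
    with ThreadPoolExecutor(max_workers=max(1, len(subsets))) as pool:
        results = list(pool.map(lambda s: analyze_subset(s, classify), subsets))
    normal_all = set().union(*(normal for normal, _ in results))
    attack_all = set().union(*(attack for _, attack in results))
    return normal_all, attack_all
```

In CPython, threads do not execute CPU-bound Python code in parallel because of the global interpreter lock, so `ProcessPoolExecutor` (with picklable, module-level functions) would be the closer analogue of one agent per core, at the cost of serializing the subsets, trees, and centroids.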

3.1.3. Communication Agent. The communication agent is responsible for transferring data and results between agents. The scenario in which the communication agent operates is presented in the following steps:

(1) Receive the datasets, centroids of clusters, and decision trees from the coordinator agent.

(2) Move the above from the coordinator agent host to the analysis agent host.

(3) Give the dataset, centroids of clusters, and decision trees to the analysis agent.

(4) Receive the results from the analysis agent.

(5) Move the results from the analysis agent host to the coordinator agent host.

(6) Provide the results to the coordinator agent.

Input: Testing dataset, centroids, trees
Output: Predictions P for the instances of the testing dataset
(1) For every instance $\omega_i \in$ testing dataset Do
(1.1) Choose the centroid $c_j$ such that $\|\omega_i - c_j\| < \|\omega_i - c_s\|$ for all $s = 1, \ldots, k$, $s \ne j$.
(1.2) Add the class returned by MatchTree($\omega_i$, $root_j$) to P.

Pseudocode 3: Pseudocode for determining the closest centroid for an instance.

Input: instance $\omega$, root
Output: Class
(1) If root.branch = 0 (root has no branches), Then return root.value.
(2) Choose the attribute $a_i \in \omega$ corresponding to the index of root.
(3) For all branchValues $\in$ root Do
(3.1) If $branchValue_j = a_i$ exists, Then return MatchTree($\omega$, root.branch$_j$).
(3.2) If no branch value matches $a_i$, return "unknown".

Pseudocode 4: Pseudocode of MatchTree.

3.2. Modified K-Means Algorithm. The main advantage of modified K-means that distinguishes it from other adjusted K-means variants in the extant literature is its ability to consider all possible eventualities by treating all divergent points in the dataset as initial centroids of clusters, rather than randomly selecting a specific set of initial centroids, as is typically done. In other words, modified K-means constructs clusters covering all cases characterized by significant differences among instances. Thus, modified K-means distributes the dataset instances among suitable clusters, which improves accuracy. Moreover, unlike other adjusted K-means variants, modified K-means does not require the number of clusters $k$ to be specified in advance, as it is determined dynamically. The main difference between the modified and the standard K-means lies in the selection of the initial centroids of clusters, as shown in the following steps:

(1) Select the first instance of the dataset as the first cluster centroid.

(2) Select as the next centroid an instance whose distance from all previously selected centroids is greater than the specified threshold (the best threshold, 4000, is determined in the first experiment; see Section 4.1).

(3) Repeat Step (2) until the end of the dataset is reached.

(4) Apply the remaining steps of standard K-means to the selected initial centroids of clusters.

Pseudocode 2 shows the pseudocode of modified K-means. Our modification of K-means is evident in Steps (1) and (2) of the pseudocode.
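The threshold-based seeding of Pseudocode 2 followed by the usual refinement can be sketched as below. This is a minimal illustration assuming Euclidean distance on already-numeric instances and a batch (Lloyd) refinement; the function names are hypothetical and this is not the authors' Java code. Note that $k$ is an output here, fixed by the order of the data and the threshold.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def modified_initial_centroids(data: List[Sequence[float]],
                               threshold: float) -> List[List[float]]:
    """Steps (1)-(2) of Pseudocode 2: the first instance is the first centroid; a later
    instance becomes a new centroid if it is farther than `threshold` from ALL
    centroids selected so far."""
    centroids = [list(data[0])]
    for x in data[1:]:
        if all(euclidean(x, c) > threshold for c in centroids):
            centroids.append(list(x))
    return centroids


def modified_kmeans(data: List[Sequence[float]], threshold: float,
                    max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    centroids = modified_initial_centroids(data, threshold)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign every instance to its closest centroid.
        new_labels = [min(range(len(centroids)), key=lambda c: euclidean(x, centroids[c]))
                      for x in data]
        if new_labels == labels:  # clusters stabilized
            break
        labels = new_labels
        # Recompute the centroids of non-empty clusters.
        for c in range(len(centroids)):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

A smaller threshold yields more initial centroids, and therefore more and smaller clusters, for the C4.5 stage.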

3.3. C4.5 Algorithm. After the training dataset instances are distributed among the clusters by modified K-means, the standard C4.5 technique developed by Quinlan [57] is used to build one tree for each cluster. More details and the pseudocode of C4.5 can be found elsewhere [58, 59].
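For reference, the criterion C4.5 uses to choose a splitting attribute, the gain ratio (information gain divided by split information), can be computed as below for a discrete attribute. This is only the attribute-scoring step, not a full C4.5 implementation: threshold splits on continuous attributes, pruning, and missing-value handling are omitted, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Gain ratio of splitting `labels` by the discrete attribute `values`."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder  # information gain
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0
```

Dividing by the split information penalizes attributes with many distinct values, which plain information gain would otherwise favor.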

3.4. Testing Phase. This phase is implemented by the analysis agent and is executed in two stages to test the network traffic data. In the first stage, the centroid closest to the testing instance is chosen (the pseudocode of this stage is shown in Pseudocode 3). In the second stage, the tree corresponding to the centroid chosen in the first stage is applied in order to test the instance and identify the appropriate class for it. The pseudocode of the second stage is shown in Pseudocode 4.
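The two testing stages can be sketched as follows. The tree encoding is an assumption made for illustration: an internal node is a dict with an attribute index `attr` and a `branches` mapping from attribute value to child node, and a leaf is a dict with a `label`. Exact value matching mirrors Pseudocode 4; a real C4.5 tree over the continuous KDD attributes would instead compare values against split thresholds.

```python
import math
from typing import Any, Dict, List, Sequence

Node = Dict[str, Any]


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def match_tree(instance: Sequence[Any], node: Node) -> str:
    """Pseudocode 4 (MatchTree): descend along branches whose value equals the
    instance's attribute value."""
    if "label" in node:  # leaf: no branches
        return node["label"]
    child = node["branches"].get(instance[node["attr"]])
    if child is None:  # no branch value matches
        return "unknown"
    return match_tree(instance, child)


def classify(instance: Sequence[float], centroids: List[Sequence[float]],
             trees: List[Node]) -> str:
    """Pseudocode 3: choose the closest centroid, then apply that cluster's tree."""
    j = min(range(len(centroids)), key=lambda c: euclidean(instance, centroids[c]))
    return match_tree(instance, trees[j])
```

An "unknown" result corresponds to the case, described for the analysis agent, in which the instance is treated as an attack.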

4. Experimental Setup and Analysis

We used the benchmark KDD Cup 1999 dataset [60] to evaluate the MAS-IDS performance. In most previous works in this field, the authors used cross-validation, such as 10-fold cross-validation, for evaluation. Cross-validation uses the same classes as the training data without adding new classes in the testing stage; thus, these works could achieve high performance in terms of accuracy and detection rate. However, the strength of an IDS stems from its ability to detect unknown (new) attacks. The KDD Cup 1999 dataset consists of two datasets: the 10% KDD Cup dataset (used for training) and the Corrected dataset (used for testing). More details about KDD Cup 1999 can be found in the literature [61]. Among the available performance measures, accuracy (Acc), detection rate (DR), and false alarm rate (FAR) are the most popular, and they are used here to evaluate the MAS-IDS performance:

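$$\text{Acc} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{DR} = \frac{TP}{TP + FN}, \qquad \text{FAR} = \frac{FP}{TN + FP}, \qquad (5)$$

where TP and TN are the numbers of attack and normal instances that are correctly classified, and FP and FN are the numbers of normal instances misclassified as attacks and of attack instances misclassified as normal, respectively.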
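The three measures in (5) can be computed from true and predicted labels as below. Treating "attack" as the positive class, and counting any non-"normal" label (including "unknown") as an attack, are assumptions consistent with the analysis agent's rule; the function names are hypothetical.

```python
from typing import Sequence, Tuple


def confusion(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) with 'attack' (anything not 'normal') as positive."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        actual_attack = t != "normal"
        predicted_attack = p != "normal"
        if actual_attack and predicted_attack:
            tp += 1
        elif not actual_attack and not predicted_attack:
            tn += 1
        elif predicted_attack:  # normal instance flagged as attack
            fp += 1
        else:                   # attack instance labeled normal
            fn += 1
    return tp, tn, fp, fn


def ids_metrics(tp: int, tn: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Equation (5): accuracy, detection rate, and false alarm rate."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    dr = tp / (tp + fn)
    far = fp / (tn + fp)
    return acc, dr, far
```

For example, 80 true positives, 90 true negatives, 10 false positives, and 20 false negatives give Acc = 0.85, DR = 0.8, and FAR = 0.1.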

The computers used to implement the experiments are equipped with a Core i7 processor at 3.40 GHz with 8 CPU cores and 6 GB of RAM, running Windows 7 Professional 64-bit. The experiments were conducted on the JADE platform and implemented in Java.

Table 1 shows the details of the datasets used to evaluate the MAS-IDS performance against the conventional method (hybrid standard K-means with C4.5) and other techniques. The training datasets (trainDS1, trainDS2, trainDS3, and trainDS4) were randomly sampled from the 10% KDD Cup dataset, while the testing datasets (testDS1, testDS2, testDS3, and testDS4) were randomly sampled from the Corrected dataset.

Preprocessing is applied to the three symbolic attributes, protocol, service, and flag, which are converted to numeric values. For example, the three values of the protocol attribute, tcp, udp, and icmp, are converted to 1, 2, and 3, respectively, and the same approach is adopted for the remaining symbolic attributes.
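A minimal encoding sketch follows. Only the protocol mapping (tcp → 1, udp → 2, icmp → 3) comes from the text; numbering service and flag values in order of first appearance is an assumption, as is the record layout (protocol, service, and flag as the second to fourth fields of a KDD Cup 1999 record, all other fields numeric, and the class label already removed).

```python
from typing import Dict, List, Sequence, Tuple

PROTOCOL_CODES = {"tcp": 1, "udp": 2, "icmp": 3}  # mapping stated in the text
SYMBOLIC_COLUMNS = (1, 2, 3)                      # protocol, service, flag (0-based)


def build_codebook(values) -> Dict[str, int]:
    """Number distinct symbols 1, 2, 3, ... in order of first appearance (assumption)."""
    codes: Dict[str, int] = {}
    for v in values:
        codes.setdefault(v, len(codes) + 1)
    return codes


def encode(records: List[Sequence[str]]) -> Tuple[List[List[float]], Dict[int, Dict[str, int]]]:
    """Convert symbolic fields to integer codes and all other fields to floats."""
    codebooks = {i: build_codebook(r[i] for r in records) for i in SYMBOLIC_COLUMNS}
    codebooks[1] = dict(PROTOCOL_CODES)  # fixed protocol codes from the text
    encoded = [[float(codebooks[i][v]) if i in codebooks else float(v)
                for i, v in enumerate(r)] for r in records]
    return encoded, codebooks
```

The codebooks should be built on the training data and reused unchanged for the testing data, so that the same symbol always receives the same code.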

Three experiments were carried out in this study. In the first experiment, the best threshold value was computed. In the second experiment, the MAS-IDS performance was evaluated by comparing its results with those obtained by the conventional method and by other techniques available in Weka and Matlab. In the third experiment, we compared the processing time required by MAS-IDS with that of hybrid modified K-means with C4.5 in a nonagent environment.

4.1. Identifying the Best Threshold for Modified K-Means. Modified K-means requires a predetermined threshold value to select the initial centroids of clusters. In this experiment, all training datasets in Table 1 are used with testDS1 to compute the average accuracy for different threshold values (1000–10000), and the value that yields the highest accuracy is chosen as the threshold for modified K-means. As can be seen in Figure 2, this value is 4000, which results in an average accuracy of 0.90155. We used all the training datasets with only one testing dataset to choose the threshold value because modified K-means is applied only to the training dataset to construct the clusters. In all subsequent experiments, the chosen threshold (4000) is used with hybrid modified K-means and C4.5.
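The selection procedure amounts to an argmax over averaged accuracies. In the sketch below, `evaluate(threshold, train_set)` is a hypothetical callback that would train hybrid modified K-means with C4.5 on one training set and return its accuracy on testDS1; the step of 1000 between candidate thresholds is also an assumption, since only the range is given.

```python
from statistics import mean
from typing import Callable, Iterable, Sequence, Tuple, TypeVar

D = TypeVar("D")


def select_threshold(thresholds: Iterable[float], train_sets: Sequence[D],
                     evaluate: Callable[[float, D], float]) -> Tuple[float, float]:
    """Return (best threshold, its average accuracy over all training sets)."""
    best = None
    for t in thresholds:
        avg = mean(evaluate(t, train) for train in train_sets)
        if best is None or avg > best[1]:  # keep the first threshold on ties
            best = (t, avg)
    if best is None:
        raise ValueError("no thresholds given")
    return best


CANDIDATES = range(1000, 10001, 1000)  # 1000, 2000, ..., 10000 (assumed step)
```

The same loop, with $k$ in place of the threshold, describes the search over $k$ for hybrid standard K-means with C4.5.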

Table 1: The details of evaluation datasets.

| Dataset | Normal | DoS | Probe | R2L | U2R | Total |
|---|---|---|---|---|---|---|
| trainDS1 | 900 | 1000 | 300 | 500 | 300 | 3000 |
| trainDS2 | 1100 | 1300 | 300 | 800 | 500 | 4000 |
| trainDS3 | 1500 | 1800 | 400 | 1000 | 300 | 5000 |
| trainDS4 | 1800 | 1800 | 500 | 1100 | 800 | 6000 |
| testDS1 | 5000 | 3000 | 700 | 900 | 400 | 10000 |
| testDS2 | 10000 | 7000 | 1000 | 1500 | 500 | 20000 |
| testDS3 | 15000 | 10000 | 1500 | 2500 | 1000 | 30000 |
| testDS4 | 20000 | 14000 | 1500 | 3000 | 1500 | 40000 |

Figure 2: Computing the best threshold value for modified K-means. (Line plot of accuracy, from about 0.884 to 0.904, versus threshold, from 0 to 10000.)

4.2. MAS-IDS Performance. In order to compare MAS-IDS with the hybrid standard K-means and C4.5 [30], the best value of $k$ for K-means is first identified. Typically, K-means is run independently for different values of $k$, and the partition that appears most meaningful to the domain expert is selected [62]. Figure 3 shows the performance of hybrid standard K-means with C4.5 for different $k$ values ($k = 10, 20, 30, \ldots, 100$). The best $k$ value is 10 because it yields the highest accuracy (90.67%) and detection rate (84.80%). Only the false alarm rate (3.46%) is not optimal, as a rate of 2.1% is achieved when $k = 100$. Thus, in all subsequent experiments, we adopt $k = 10$ as the best number of clusters.

Figure 3: The performance of hybrid standard K-means and C4.5. (Plot of accuracy, DR, and FAR, from 0 to 1, versus $k$, from 0 to 110.)

8 The Scientific World Journal

Figure 4: ROC curve for testDS1 (true positive rate versus false positive rate; hybrid modified K-means and C4.5 versus hybrid K-means and C4.5).

Figure 5: ROC curve for testDS2 (true positive rate versus false positive rate; hybrid modified K-means and C4.5 versus hybrid K-means and C4.5).

Figure 6: ROC curve for testDS3 (true positive rate versus false positive rate; hybrid modified K-means and C4.5 versus hybrid K-means and C4.5).

The ROC curves in Figures 4, 5, 6, and 7 compare the performance of the proposed method with that of hybrid K-means with C4.5 [30].
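The text does not state how the points of these ROC curves were obtained. A common construction, assumed here purely for illustration, sweeps a decision threshold over per-instance attack scores and records the false and true positive rates at each step; the area under the resulting curve (AUC) summarizes it in one number. The function names `roc_points` and `auc` are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by lowering the threshold over each distinct score.

    labels: 1 = attack (positive), 0 = normal; a higher score means "more likely
    an attack". Both classes must be present, otherwise a rate is undefined.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every instance tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(pts, auc(pts))
```

A curve that rises closer to the top-left corner (larger AUC) indicates a better trade-off between detection and false alarms, which is how Figures 4 to 7 are meant to be read.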

According to the ROC curves, the proposed hybrid modified K-means with C4.5 in MAS-IDS achieved better

Figure 7: ROC curve for testDS4 (true positive rate versus false positive rate; hybrid modified K-means and C4.5 versus hybrid K-means and C4.5).

Table 2: The accuracy results of the MAS-IDS versus hybrid K-means and C4.5 [30].

Training dataset  Testing dataset  MAS-IDS   Hybrid K-means and C4.5
trainDS1          testDS1          0.9031    0.8993
trainDS1          testDS2          0.91745   0.916021
trainDS1          testDS3          0.9126    0.909633
trainDS1          testDS4          0.9147    0.911625
trainDS2          testDS1          0.8874    0.8742
trainDS2          testDS2          0.9107    0.9008
trainDS2          testDS3          0.904367  0.894967
trainDS2          testDS4          0.90545   0.89475
trainDS3          testDS1          0.9058    0.8932
trainDS3          testDS2          0.92225   0.9135
trainDS3          testDS3          0.91667   0.9049
trainDS3          testDS4          0.916125  0.90805
trainDS4          testDS1          0.9099    0.8963
trainDS4          testDS2          0.9215    0.90815
trainDS4          testDS3          0.9155    0.9016
trainDS4          testDS4          0.918025  0.90625

p value: 0.00000028

results in comparison with the conventional method. The t-test shows that MAS-IDS significantly improved accuracy, with p value < 0.05 (0.00000028). To obtain these results, MAS-IDS was tested by computing the classification results for each training dataset using all testing datasets presented in Table 1. Table 2 shows the accuracy comparison between MAS-IDS and hybrid K-means with C4.5, and Table 3 shows the average results of the MAS-IDS evaluation together with the comparison with the conventional method and other methods from Weka and Matlab.
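The significance test can be checked against the sixteen accuracy pairs in Table 2. The sketch below assumes a paired t-test over the training/testing combinations (the text does not name the variant) and computes only the t statistic, using the standard library. A p value would then come from the t distribution with 15 degrees of freedom, for example via `scipy.stats.ttest_rel`, which is deliberately not used here.

```python
import math
from statistics import mean, stdev

# Accuracy pairs (MAS-IDS, hybrid K-means and C4.5) copied from Table 2,
# ordered trainDS1..trainDS4, each against testDS1..testDS4.
PAIRS = [
    (0.9031, 0.8993), (0.91745, 0.916021), (0.9126, 0.909633), (0.9147, 0.911625),
    (0.8874, 0.8742), (0.9107, 0.9008), (0.904367, 0.894967), (0.90545, 0.89475),
    (0.9058, 0.8932), (0.92225, 0.9135), (0.91667, 0.9049), (0.916125, 0.90805),
    (0.9099, 0.8963), (0.9215, 0.90815), (0.9155, 0.9016), (0.918025, 0.90625),
]


def paired_t(pairs):
    """Return (t statistic, degrees of freedom) for a paired t-test on the differences."""
    diffs = [a - b for a, b in pairs]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


t_stat, dof = paired_t(PAIRS)
print(f"mean accuracy: MAS-IDS {mean(a for a, _ in PAIRS):.4f}, "
      f"hybrid {mean(b for _, b in PAIRS):.4f}")
print(f"t = {t_stat:.2f}, df = {dof}")
```

The two averages printed this way match the accuracy column of Table 3 for MAS-IDS and the hybrid K-means and C4.5, and every pairwise difference is positive, which is why the test is so decisive.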

As can be seen from Table 3, the MAS-IDS approach achieves higher accuracy, detection rate, and F-measure. However, its false alarm rate, precision, and specificity are not superior to those achieved by the other methods, especially the Decision Table, which produces


Table 3: Comparison of the MAS-IDS performance with other methods using different measures.

Method                          Accuracy  DR      FAR     Precision  Specificity  F-measure
MAS-IDS                         0.9113    0.8526  0.0299  0.9665     0.9701       0.9056
Hybrid K-means and C4.5 (2012)  0.9021    0.8394  0.0353  0.9600     0.9647       0.8954
Bayes Net                       0.9017    0.8177  0.0142  0.9829     0.9858       0.8926
Naïve Bayes                     0.8150    0.7727  0.1427  0.8578     0.8573       0.8076
SMO                             0.8805    0.7785  0.0174  0.9781     0.9826       0.8664
IBk                             0.8886    0.7962  0.0190  0.9766     0.9810       0.8771
J48                             0.8513    0.7298  0.0273  0.9638     0.9727       0.8299
NBTree                          0.9007    0.8096  0.0081  0.9900     0.9919       0.8903
Decision Table                  0.8306    0.6631  0.0020  0.9970     0.9980       0.7956
JRip                            0.8377    0.6983  0.0229  0.9682     0.9771       0.8111
LibSVM                          0.7964    0.8120  0.2191  0.8169     0.7809       0.8068

Figure 8: Comparison of the performance of MAS-IDS with other methods (accuracy, DR, and FAR for MAS-IDS, hybrid K-means and C4.5, Bayes Net, Naïve Bayes, SMO, IBk, J48, NBTree, Decision Table, JRip, and LibSVM).

the best ratios. In state-of-the-art work, IDS performance is usually reported as accuracy, since accuracy accounts for both correct and erroneous classifications. Thus, when comparing various methods, we adopt accuracy as the principal measure. On this basis, the performance of our MAS-IDS is superior to that of the other methods, as shown in Table 3. More specifically, the average MAS-IDS accuracy, computed over all training and testing datasets, is 0.9113, which is higher than that achieved by any other method. Figure 8 shows the performance of all methods using the data given in Table 3.
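The six measures in Table 3 are not redefined in this section. The sketch below assumes the standard binary definitions, with attacks as the positive class; it is consistent with Table 3 in that specificity equals 1 − FAR in every row, but whether the paper pools counts or averages per dataset is not stated, so it is not guaranteed to reproduce the table's numbers exactly. The `Confusion` class is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    """Binary confusion counts, treating 'attack' as the positive class."""
    tp: int  # attacks flagged as attacks
    fn: int  # attacks missed (classified as normal)
    fp: int  # normal traffic flagged as attacks (false alarms)
    tn: int  # normal traffic classified as normal

    def metrics(self) -> Dict[str, float]:
        total = self.tp + self.fn + self.fp + self.tn
        dr = self.tp / (self.tp + self.fn)           # detection rate (recall)
        far = self.fp / (self.fp + self.tn)          # false alarm rate
        precision = self.tp / (self.tp + self.fp)
        specificity = self.tn / (self.tn + self.fp)  # equals 1 - FAR
        f_measure = 2 * precision * dr / (precision + dr)
        return {
            "accuracy": (self.tp + self.tn) / total,
            "DR": dr,
            "FAR": far,
            "precision": precision,
            "specificity": specificity,
            "F-measure": f_measure,
        }


print(Confusion(tp=80, fn=20, fp=10, tn=90).metrics())
```

The definitions also show why a method such as Decision Table can lead on FAR, precision, and specificity while trailing on accuracy: those three measures ignore missed attacks (fn) entirely.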

4.3. MAS-IDS Processing Time. The last experiment demonstrates the strength of MAS-IDS in reducing data classification processing time by using a multiagent system. In this experiment, five of the previously specified computers were used. In addition, we used the fourth training dataset (trainDS4) from Table 1 with four new large testing datasets to evaluate the ability of MAS-IDS to process large datasets in less time. Table 4 shows the characteristics of the new testing datasets.

To show the ability of MAS-IDS to reduce the processing time, the MAS-IDS approach is compared with

Table 4: Characteristics of new testing datasets used to evaluate the MAS-IDS processing time.

Dataset     Normal  DoS     Probe  R2L    U2R    Total
newTestDS1  35000   35000   10000  10000  10000  100000
newTestDS2  70000   70000   20000  25000  15000  200000
newTestDS3  100000  100000  30000  50000  20000  300000
newTestDS4  150000  150000  30000  50000  20000  400000

the nonagent hybrid modified K-means and C4.5. Here, MAS-IDS is run again every time a new computer is added. In other words, MAS-IDS initially runs on one computer; when a second computer is added, it runs on both, and so on until all five computers are used. Table 5 shows the processing time of this experiment. The maximum number of agents that can run on each computer is eight, because each computer has eight CPU cores and each core can run only one agent at a time. The training time of this experiment is 8814 s. It should be noted that the coordinator agent runs on the first computer of the system environment.
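The per-computer agent limit and the division of a testing dataset into per-agent subsets can be made concrete with a small sketch. It is hypothetical and does not reproduce the JADE implementation: the function names and the contiguous, near-equal split are assumptions; only the limit of eight agents per eight-core computer comes from the text.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# From the text: each computer has eight CPU cores and each core runs at most one agent.
CORES_PER_COMPUTER = 8


def total_agents(n_computers: int, agents_per_computer: int) -> int:
    """Number of analysis agents when every computer runs the same number of agents."""
    if n_computers < 1 or not 1 <= agents_per_computer <= CORES_PER_COMPUTER:
        raise ValueError("need at least 1 computer and 1..8 agents per computer")
    return n_computers * agents_per_computer


def split_dataset(records: Sequence[T], n_agents: int) -> List[Sequence[T]]:
    """Cut records into n_agents contiguous subsets whose sizes differ by at most one."""
    base, extra = divmod(len(records), n_agents)
    subsets: List[Sequence[T]] = []
    start = 0
    for i in range(n_agents):
        size = base + (1 if i < extra else 0)
        subsets.append(records[start:start + size])
        start += size
    return subsets


# newTestDS4 (400000 records) on five computers with eight agents each.
agents = total_agents(5, 8)
parts = split_dataset(range(400000), agents)
print(agents, {len(p) for p in parts})  # 40 {10000}
```

Smaller subsets shorten both the per-agent analysis and the network transfer of each subset, which is the effect discussed for Table 5 below.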

The results presented in Table 5 depend on the number of computers and the number of agents. The results in the upper-left corner of Table 5 pertain to the case of one computer with one agent; this is the worst case and should be compared with the nonagent hybrid modified K-means with C4.5. On the other hand, the results in the lower-right corner of Table 5 represent the best case of MAS-IDS (maximum number of computers and agents). Furthermore, as can be seen from the data, when two computers are used, the cost of transferring data through the network increases the processing time, so MAS-IDS achieves no improvement relative to other approaches. However, this problem is mitigated by introducing additional computers. Nonetheless, the MAS-IDS processing time on a large dataset such as newTestDS4 remains inadequate, because the dataset subsets are still large and take a long time to transfer to other computers through the network. This problem is eliminated when a large number of computers is employed, since the dataset is then divided into smaller subsets. Finally, the MAS-IDS processing time decreases with


Table 5: Comparison of processing time required by MAS-IDS and other nonagent hybrid modified K-means and C4.5.

Processing time in seconds, by number of computers (1 to 5).

Agents  Testing dataset  1       2       3       4       5
1       newTestDS1       15.349  13.57   13.219  7.841   6.972
1       newTestDS2       31.607  26.554  26.228  15.309  13.424
1       newTestDS3       45.893  41.587  35.910  23.462  20.162
1       newTestDS4       64.172  76.688  72.101  44.509  37.723
2       newTestDS1       8.852   8.304   7.102   5.843   4.630
2       newTestDS2       17.339  17.883  16.310  13.160  10.110
2       newTestDS3       26.184  27.500  22.502  16.931  15.313
2       newTestDS4       40.226  64.332  49.853  37.263  36.832
3       newTestDS1       8.121   8.242   6.940   5.34    4.575
3       newTestDS2       13.472  12.767  11.743  9.957   8.374
3       newTestDS3       20.735  22.689  20.186  15.788  14.421
3       newTestDS4       32.604  52.819  48.814  36.589  35.536
4       newTestDS1       6.699   6.818   5.891   4.631   3.854
4       newTestDS2       11.776  14.209  11.104  9.375   8.116
4       newTestDS3       18.613  20.927  17.613  14.490  13.413
4       newTestDS4       29.399  51.852  46.811  35.564  34.90
5       newTestDS1       6.146   6.659   5.579   4.555   3.715
5       newTestDS2       11.534  13.685  10.580  9.198   7.924
5       newTestDS3       17.568  20.662  17.497  14.44   13.62
5       newTestDS4       29.349  51.527  43.956  34.521  32.349
6       newTestDS1       6.68    6.601   5.393   4.465   3.224
6       newTestDS2       11.318  12.922  10.531  8.980   7.567
6       newTestDS3       17.419  20.203  16.938  15.788  12.656
6       newTestDS4       28.660  50.67   41.475  32.390  31.989
7       newTestDS1       5.871   6.443   5.272   4.258   3.150
7       newTestDS2       11.134  12.787  10.494  8.680   7.685
7       newTestDS3       17.223  20.683  16.556  15.178  12.362
7       newTestDS4       28.438  45.539  39.859  30.447  30.20
8       newTestDS1       5.481   6.204   4.289   4.84    3.89
8       newTestDS2       11.011  13.24   9.913   8.369   7.399
8       newTestDS3       17.150  19.713  16.483  14.81   11.181
8       newTestDS4       27.303  40.39   38.156  30.196  29.851

the addition of each new computer, as the number of agents also increases. Network specifications such as bandwidth and speed play an important role in reducing the MAS-IDS processing time. Figures 9 and 10 show the effect on the MAS-IDS processing time of increasing the number of agents and the number of computers, respectively. In Figure 9, five computers are used, while in Figure 10 only one agent is used, as shown in Table 5.

Figure 11 compares the best-case MAS-IDS processing time with that of the nonagent hybrid modified K-means and C4.5.

Finally, since the proposed system uses each CPU core to run one analysis agent, the cost of system resources is positively correlated with the number of agents. At the same time, as the number of analysis agents increases, the data subset analyzed by each agent becomes very small, so the analysis needs only one or two seconds of processing time. Consequently, the proposed system balances the physical components (number of CPU cores) against the number of agents that can be created, as in (2). Figure 12 compares the average cost of system resources (CPU consumption) when MAS-IDS uses 5 computers with 8 analysis agents on each computer (40 agents in total) and when it uses one analysis agent on each computer (5 agents in total), on the same datasets.

As Figure 12 shows, the processing time at the highest peak of CPU utilization when one agent is used (6 s) is greater than the processing time at the highest peak of utilization


Figure 9: Time required to process the testing datasets in relation to the number of agents (processing time in seconds for newTestDS1 to newTestDS4).

Figure 10: Time required to process the testing datasets in relation to the number of computers (processing time in seconds for newTestDS1 to newTestDS4).

of CPU when 8 agents are used, which takes only about one second. Consequently, when the number of agents is small, the processing time is long and the system cost is low, whereas as the number of agents increases, the processing time becomes short and the system cost high. The cost of system resources with respect to memory does not exceed 10% in any experiment.

This experiment demonstrates that MAS-IDS has great potential to reduce IDS processing time relative to methods that do not employ agents. The percentage reduction in processing time achieved by MAS-IDS reaches up to 70% relative to other approaches. In this experiment, we used only five computers; with a greater number of computers, a higher percentage reduction in processing time could likely be achieved.
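The percentage reduction is simple arithmetic on two processing times. The sketch below applies it to two cells of Table 5 (one agent on one computer versus eight agents on five computers, newTestDS4). This is a within-MAS-IDS comparison chosen only for illustration; the 70% figure above is relative to the nonagent baseline shown in Figure 11, whose exact times are not tabulated here, so the numbers differ. The helper names are hypothetical.

```python
def percent_reduction(baseline_s: float, new_s: float) -> float:
    """Reduction of new_s relative to baseline_s, in percent (times in seconds)."""
    if baseline_s <= 0:
        raise ValueError("baseline time must be positive")
    return 100.0 * (baseline_s - new_s) / baseline_s


def speedup(baseline_s: float, new_s: float) -> float:
    """How many times faster new_s is than baseline_s."""
    return baseline_s / new_s


# Table 5, newTestDS4: one agent on one computer vs. eight agents on five computers.
base, best = 64.172, 29.851
print(f"reduction {percent_reduction(base, best):.1f}%, speedup {speedup(base, best):.2f}x")
```

Note that a 70% reduction corresponds to a speedup of about 3.3, since the remaining time is 30% of the baseline.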

5. Conclusion

In this work, we have proposed hybrid modified K-means with C4.5 for IDS in a MAS environment. Hybrid modified K-means with C4.5 is used to improve classification accuracy, while MAS is used to reduce the processing time of the IDS.

Figure 11: Comparison of MAS-IDS processing time with that of the nonagent hybrid modified K-means and C4.5 (processing time in seconds for newTestDS1 to newTestDS4; nonagent method versus best case of MAS-IDS).

Figure 12: The cost of system resources (CPUs) (CPU utilization in % versus processing time in seconds, for 8 agents and 1 agent).

The modification of K-means is based on choosing initial cluster centroids that represent all cases of the dataset, which allows the number of clusters k to be determined. Three types of agents (coordinator, analysis, and communication agent) are used. The KDD Cup 1999 dataset is employed, and the JADE platform with five computers is used to implement the proposed method.

MAS-IDS demonstrated that a multiagent system has significant potential for reducing IDS processing time: MAS-IDS achieved a reduction in processing time of up to 70%. In addition, the hybrid modified K-means with C4.5 approach performed better than the hybrid K-means and C4.5 as well as other techniques available in Weka and Matlab. The t-test of accuracy comparing MAS-IDS with the conventional K-means and C4.5 method confirmed that the former was superior (p value of 0.00000028). This indicates that MAS-IDS has high potential to improve the performance of intrusion detection systems.

In future work, we will attempt to further improve IDS accuracy by combining the proposed method with other techniques. We will also try to apply our method to other datasets and to a real data network to make the system


more suitable for real environments. We will use the new attacks that the system detects as unknown attacks to retrain the proposed method as feedback. In addition, we expect to reduce the IDS processing time by using a greater number of computers.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work is supported by National University of Malaysia (UKM) Grant no. AP2013-007.

References

[1] H.-J. Liao, C.-H. Richard Lin, Y.-C. Lin, and K.-Y. Tung, "Intrusion detection system: a comprehensive review," Journal of Network and Computer Applications, vol. 36, no. 1, pp. 16–24, 2013.

[2] N. Sengupta, J. Sen, J. Sil, and M. Saha, "Designing of on line intrusion detection system using rough set theory and Q-learning algorithm," Neurocomputing, vol. 111, pp. 161–168, 2013.

[3] L. Koc, T. A. Mazzuchi, and S. Sarkani, "A network intrusion detection system based on a Hidden Naïve Bayes multiclass classifier," Expert Systems with Applications, vol. 39, no. 18, pp. 13492–13500, 2012.

[4] M. Uddin, A. A. Rehman, N. Uddin, J. Memon, R. Alsaqour, and S. Kazi, "Signature-based multi-layer distributed intrusion detection system using mobile agents," International Journal of Network Security, vol. 15, no. 2, pp. 97–105, 2013.

[5] C. N. Modi, D. R. Patel, A. Patel, and M. Rajarajan, "Integrating signature apriori based network intrusion detection system (NIDS) in cloud computing," Procedia Technology, vol. 6, pp. 905–912, 2012.

[6] H. Mohamed, L. Adil, T. Saida et al., "A collaborative intrusion detection and prevention system in cloud computing," in Proceedings of the IEEE (AFRICON '13), pp. 1–5, IEEE, September 2013.

[7] S.-J. Horng, M.-Y. Su, Y.-H. Chen et al., "A novel intrusion detection system based on hierarchical clustering and support vector machines," Expert Systems with Applications, vol. 38, no. 1, pp. 306–313, 2011.

[8] M. Chowdhary, S. Suri, and M. Bhutani, "Comparative study of intrusion detection system," International Journal of Computer Sciences and Engineering, vol. 2, no. 4, pp. 197–200, 2014.

[9] I. Corona, G. Giacinto, and F. Roli, "Adversarial attacks against intrusion detection systems: taxonomy, solutions and open issues," Information Sciences, vol. 239, pp. 201–225, 2013.

[10] S. Shamshirband, N. B. Anuar, M. L. M. Kiah, and A. Patel, "An appraisal and design of a multi-agent system based cooperative wireless intrusion detection computational intelligence technique," Engineering Applications of Artificial Intelligence, vol. 26, no. 9, pp. 2105–2127, 2013.

[11] M. Roesch, "Snort: lightweight intrusion detection for networks," in Proceedings of the 13th USENIX Conference on System Administration (LISA '99), pp. 229–238, 1999.

[12] D. Barbara and S. Jajodia, Applications of Data Mining in Computer Security, Springer, 2002.

[13] P. Natesan, P. Balasubramanie, and G. Gowrison, "Improving the attack detection rate in network intrusion detection using adaboost algorithm," Journal of Computer Science, vol. 8, no. 7, pp. 1041–1048, 2012.

[14] A. Bivens, C. Palagiri, R. Smith, B. Szymanski, and M. Embrechts, "Network-based intrusion detection using neural networks," in Proceedings of the Intelligent Engineering Systems through Artificial Neural Networks, vol. 12, pp. 579–584, November 2002.

[15] Y. Li and W. Jie, "The method of network intrusion detection based on the neural network GCBP algorithm," in Proceedings of the International Conference on Computer Science and Information Processing (CSIP '12), pp. 1082–1086, IEEE, August 2012.

[16] J. Lin, T. Huang, and B. Zhao, "A fast fuzzy set intrusion detection model," in International Symposium on Knowledge Acquisition and Modeling (KAM '08), pp. 601–605, December 2008.

[17] A. Abraham, R. Jain, J. Thomas, and S. Y. Han, "D-SCIDS: distributed soft computing intrusion detection system," Journal of Network and Computer Applications, vol. 30, no. 1, pp. 81–98, 2007.

[18] V. V. Kumari, S. Pamidi, and A. Govardhan, "Integrated Bayes network and hidden Markov model for host based IDS," International Journal of Computer Applications, vol. 41, no. 20, pp. 45–49, 2012.

[19] M. A. Hasan, M. Nasser, B. Pal, and S. Ahmad, "Support vector machine and random forest modeling for intrusion detection system (IDS)," Journal of Intelligent Learning Systems and Applications, vol. 6, no. 1, pp. 45–52, 2014.

[20] C. Xiang, P. C. Yong, and L. S. Meng, "Design of multiple-level hybrid classifier for intrusion detection system using Bayesian clustering and decision trees," Pattern Recognition Letters, vol. 29, no. 7, pp. 918–924, 2008.

[21] M. N. Huhns, Distributed Artificial Intelligence, Elsevier, 2012.

[22] S. J. Stolfo, A. L. Prodromidis, S. Tselepis et al., "JAM: java agents for meta-learning over distributed databases," in Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD '97), pp. 74–81, 1997.

[23] P. Kannadiga and M. Zulkernine, "DIDMA: a distributed intrusion detection system using mobile agents," in Proceedings of the 6th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing and 1st ACIS International Workshop on Self-Assembling Wireless Networks (SNPD/SAWN '05), pp. 238–245, IEEE, May 2005.

[24] L. Portnoy, Intrusion Detection with Unlabeled Data Using Clustering, 2000.

[25] M. Jianliang, S. Haikun, and B. Ling, "The application on intrusion detection based on K-means cluster algorithm," in Proceedings of the International Forum on Information Technology and Applications (IFITA '09), vol. 1, pp. 150–152, IEEE, Chengdu, China, May 2009.

[26] M. Sabhnani and G. Serpen, "Application of machine learning algorithms to KDD intrusion detection dataset within misuse detection context," in Proceedings of the International Conference on Machine Learning, Models, Technologies and Applications (MLMTA '03), pp. 209–215, June 2003.

[27] G. Munz, S. Li, and G. Carle, "Traffic anomaly detection using k-means clustering," in Proceedings of the GI/ITG Workshop MMBnet, 2007.

[28] V. Kumar, H. Chauhan, and D. Panwar, "K-means clustering approach to analyze NSL-KDD intrusion detection dataset," International Journal of Soft Computing and Engineering, vol. 3, no. 4, pp. 1–4, 2013.

[29] S. Chawla and A. Gionis, "k-means--: a unified approach to clustering and outlier detection," in Proceedings of the SIAM International Conference on Data Mining (SDM '13), pp. 189–197, SIAM, 2013.

[30] A. P. Muniyandi, R. Rajeswari, and R. Rajaram, "Network anomaly detection by cascading K-means clustering and C4.5 decision tree algorithm," Procedia Engineering, vol. 30, pp. 174–182, 2012.

[31] L. Xiao, Z. Shao, and G. Liu, "K-means algorithm based on particle swarm optimization algorithm for anomaly intrusion detection," in Proceedings of the 6th World Congress on Intelligent Control and Automation (WCICA '06), pp. 5854–5858, IEEE, June 2006.

[32] Z. Muda, W. Yassin, M. N. Sulaiman, and N. I. Udzir, "Intrusion detection based on K-Means clustering and Naïve Bayes classification," in Proceedings of the 7th International Conference on Information Technology in Asia (CITA '11), pp. 1–6, IEEE, July 2011.

[33] H.-B. Wang, H.-L. Yang, Z.-J. Xu, and Z. Yuan, "A clustering algorithm use SOM and K-means in intrusion detection," in Proceedings of the 1st International Conference on E-Business and E-Government (ICEE '10), pp. 1281–1284, May 2010.

[34] A M Chandrasekhar and K Raghuveer ldquoIntrusion detectiontechnique by using k-means fuzzy neural network and SVMclassifiersrdquo in Proceedings of the 3rd International Conference onComputer Communication and Informatics (ICCCIrsquo 13) pp 1ndash3January 2013

[35] R. Goel, A. Sardana, and R. C. Joshi, "Parallel misuse and anomaly detection model," International Journal of Network Security, vol. 14, no. 4, pp. 211–222, 2012.

[36] O. Depren, M. Topallar, E. Anarim, and M. K. Ciliz, "An intelligent intrusion detection system (IDS) for anomaly and misuse detection in computer networks," Expert Systems with Applications, vol. 29, no. 4, pp. 713–722, 2005.

[37] A. S. A. Aziz, A. E. Hassanien, S. E.-O. Hanaf, and M. Tolba, "Multi-layer hybrid machine learning techniques for anomalies detection and classification approach," in Proceedings of the 13th International Conference on Hybrid Intelligent Systems (HIS '13), pp. 215–220, IEEE, Gammarth, Tunisia, December 2013.

[38] M. Ektefa, S. Memar, F. Sidi, and L. S. Affendey, "Intrusion detection using data mining techniques," in Proceedings of the International Conference on Information Retrieval and Knowledge Management: Exploring the Invisible World (CAMP '10), pp. 200–203, IEEE, March 2010.

[39] G. MeeraGandhi, K. Appavoo, and S. Srivasta, "Effective network intrusion detection using classifiers decision trees and decision rules," International Journal of Advanced Networking and Applications, vol. 2, no. 3, pp. 686–692, 2010.

[40] H. Chauhan, V. Kumar, S. Pundir, and E. S. Pilli, "A comparative study of classification techniques for intrusion detection," in Proceedings of the International Symposium on Computational and Business Intelligence (ISCBI '13), pp. 40–43, IEEE, August 2013.

[41] C. Katar, "Combining multiple techniques for intrusion detection," International Journal of Computer Science and Network Security, vol. 6, no. 2B, pp. 208–218, 2006.

[42] S. R. Gaddam, V. V. Phoha, and K. S. Balagani, "K-means+id3: a novel method for supervised anomaly detection by cascading k-means clustering and id3 decision tree learning methods," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 3, pp. 345–354, 2007.

[43] D. Dasgupta, F. Gonzalez, K. Yallapu, J. Gomez, and R. Yarramsettii, "CIDS: an agent-based intrusion detection system," Computers & Security, vol. 24, no. 5, pp. 387–398, 2005.

[44] D. L. Hancock and G. B. Lamont, "Multi agent system for network attack classification using flow-based intrusion detection," in IEEE Congress of Evolutionary Computation (CEC '11), pp. 1535–1542, June 2011.

[45] X. Zhu, Z. Huang, and H. Zhou, "Design of a multi-agent based intelligent intrusion detection system," in Proceedings of the 1st International Symposium on Pervasive Computing and Applications (SPCA '06), pp. 290–295, August 2006.

[46] M. El Ajjouri, S. Benhadou, and H. Medromi, "Intelligent architecture based on MAS and CBR for intrusion detection," in Proceedings of the 4th Edition of National Security Days (JNS4), pp. 1–4, IEEE, May 2014.

[47] J. Yang, X. Liu, T. Li, G. Liang, and S. Liu, "Distributed agents model for intrusion detection based on AIS," Knowledge-Based Systems, vol. 22, no. 2, pp. 115–119, 2009.

[48] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297, Berkeley, Calif, USA, 1967.

[49] J. M. Pena, J. A. Lozano, and P. Larranaga, "An empirical comparison of four initialization methods for the K-Means algorithm," Pattern Recognition Letters, vol. 20, no. 10, pp. 1027–1040, 1999.

[50] G. H. Ball and D. J. Hall, "A clustering technique for summarizing multivariate data," Behavioral Science, vol. 12, no. 2, pp. 153–155, 1967.

[51] I. Katsavounidis, C.-C. J. Kuo, and Z. Zhang, "New initialization technique for generalized Lloyd iteration," IEEE Signal Processing Letters, vol. 1, no. 10, pp. 144–146, 1994.

[52] M. D. B. Al-Daoud, "A new algorithm for cluster initialization," in Proceedings of the WEC'05: The 2nd World Enformatika Conference, 2007.

[53] D. Arthur and S. Vassilvitskii, "k-means++: the advantages of careful seeding," in Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035, Society for Industrial and Applied Mathematics, New Orleans, La, USA, January 2007.

[54] M. Erisoglu, N. Calis, and S. Sakallioglu, "A new algorithm for initial cluster centers in k-means algorithm," Pattern Recognition Letters, vol. 32, no. 14, pp. 1701–1705, 2011.

[55] L. Yongzhong, Y. Ge, X. Jing et al., "Anomaly detection for clustering algorithm based on particle swarm optimization," Journal of Jiangsu University of Science and Technology (Natural Science Edition), vol. 23, no. 1, pp. 51–55, 2009.

[56] W. Cong, J. Morris, and W. Xiaojun, "High performance deep packet inspection on multi-core platform," in Proceedings of the 2nd IEEE International Conference on Broadband Network and Multimedia Technology (IC-BNMT '09), pp. 619–622, October 2009.

[57] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.

[58] S. Ruggieri, "Efficient C4.5 [classification algorithm]," IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 2, pp. 438–444, 2002.

[59] X. Wu and V. Kumar, The Top Ten Algorithms in Data Mining, CRC Press, New York, NY, USA, 2010.

[60] KDD Cup 1999, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.

[61] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in Proceedings of the 2nd IEEE Symposium on Computational Intelligence for Security and Defence Applications, pp. 1–6, IEEE, July 2009.

[62] A. K. Jain, "Data clustering: 50 years beyond K-means," Pattern Recognition Letters, vol. 31, no. 8, pp. 651–666, 2010.


Input: Dataset, $k$
Output: Clusters
(1) Select $k$ initial centroids of clusters randomly.
(2) Assign every instance $\omega_i \in \text{Dataset}$ to the closest centroid to make $k$ clusters $C_1, C_2, \ldots, C_k$.
(3) Calculate the cluster centroids $\bar{\omega}_i = \frac{1}{k_i} \sum_{j=1}^{k_i} \omega_{ij}$, $i = 1, \ldots, k$, where $k_i$ is the number of instances in $C_i$ and $\omega_{ij}$ is the $j$th instance of $C_i$.
(4) For every instance $\omega_i \in \text{Dataset}$ do
  (4.1) Reassign $\omega_i$ to the closest cluster centroid: $\omega_i \in C_s$ is moved from $C_s$ to $C_t$ if $\|\omega_i - \bar{\omega}_t\| \le \|\omega_i - \bar{\omega}_j\|$ for all $j = 1, \ldots, k$, $j \ne s$.
  (4.2) Recalculate the centroids of clusters $C_s$ and $C_t$.
(5) If the cluster instances have stabilized, then stop; else go to Step (4).

Pseudocode 1: Pseudocode of the standard K-means algorithm.
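Pseudocode 1 can be turned into a small runnable sketch in pure Python. It follows the steps literally: random initial centroids, one assignment pass, then instance-by-instance reassignment with recomputation of the two affected centroids. Two details are assumptions not stated in the pseudocode: an instance moves only if its new centroid is strictly closer than its current one (so ties cannot cause endless swaps), and a `max_passes` cap bounds the loop. The sketch favors clarity over speed, since it recomputes centroids from scratch rather than updating them incrementally.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean(x: Point, y: Point) -> float:
    """Euclidean distance of equation (1)."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def mean_point(points: List[Point]) -> List[float]:
    """Attribute-wise mean of a non-empty list of points."""
    return [sum(column) / len(points) for column in zip(*points)]


def kmeans(data: List[Point], k: int, seed: int = 0,
           max_passes: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Return (cluster label per instance, centroids) following Pseudocode 1."""
    rng = random.Random(seed)

    def nearest(p: Point, centroids: List[List[float]]) -> int:
        return min(range(k), key=lambda c: euclidean(p, centroids[c]))

    def members(c: int) -> List[Point]:
        return [p for p, label in zip(data, labels) if label == c]

    # Step 1: k distinct instances chosen at random as initial centroids.
    centroids = [list(p) for p in rng.sample(data, k)]
    # Step 2: assign every instance to its closest centroid.
    labels = [nearest(p, centroids) for p in data]
    # Step 3: recompute each non-empty cluster's centroid.
    for c in range(k):
        if members(c):
            centroids[c] = mean_point(members(c))
    # Steps 4 and 5: reassign instance by instance until no instance moves.
    for _ in range(max_passes):
        moved = False
        for i, p in enumerate(data):
            s, t = labels[i], nearest(p, centroids)
            if t != s and euclidean(p, centroids[t]) < euclidean(p, centroids[s]):
                labels[i] = t
                moved = True
                # Step 4.2: recompute only the two affected centroids.
                for c in (s, t):
                    if members(c):
                        centroids[c] = mean_point(members(c))
        if not moved:
            break
    return labels, centroids


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(kmeans(points, 2))
```

Because Step 1 is random, different seeds can yield different local minima on harder data, which is exactly the sensitivity to initial centroids discussed next.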

not discuss how and what information the manager agents should share. The architecture of a multiagent flow-based IDS was developed in a different study [44], where the concept of a reputation system was used to permit agents to find the nodes that are most effective at classifying malicious network activity. Zhu et al. [45] presented MAIIDS, which uses more than one technique, such as neural networks and association rules, for learning agents and for generating rules for decision agents, which analyze the audit data according to these rules and respond to them. Their experimental results indicate that the system has very high self-adaptability, intelligence, and expansibility. El Ajjouri et al. [46] presented an architecture based on adding a learning feature whereby abnormal behaviors correspond to unknown malicious patterns. This architecture first detects new attacks using the agent responsible for detecting new behavior, after which it updates the basic attack patterns; this agent uses the case-based reasoning (CBR) technique as its attack detection method. Yang et al. [47] presented a distributed agent model based on artificial immune systems (AIS) for building IDS. This system takes on features of AIS such as self-adaptation, self-learning, self-organization, parallel processing, and distributed coordination. Although this work includes a section on empirical findings, the authors do not provide any details about the experiments they conducted; therefore, it is not possible to ascertain the significance of their results. As can be seen from the above, none of the authors of existing works on agent-based IDS clearly discussed or presented processing-time results. This shortcoming is addressed in the present study, where the IDS processing time is one of the measures used to evaluate the performance of the MAS-based IDS.

2.2. K-Means Algorithm-Based IDS. The K-means algorithm clusters objects into k disjoint clusters based on their attributes [48]. Objects in the same cluster are similar, while those in different clusters differ from one another. The algorithm uses a similarity measure to compute the distance between two objects; the measure most commonly used by K-means is the Euclidean distance. The advantage of K-means is its flexibility in dealing with large datasets [25], with time complexity O(tkn), where t is the number of iterations, k is the number of clusters, and n is the number of dataset instances. However, the main disadvantage of the K-means

algorithm is the need to find the best number of clusters 119896In addition it is sensitive to the isolated dataset instances[25] and the algorithm converges finitely to local minimaConsequently the initial centroids of clusters significantlyaffect the 119870-means algorithm output The pseudocode ofstandard119870-means is shown in Pseudocode 1 [49] where

$$\lVert x - y \rVert = \sqrt{\sum_{i=1}^{\text{no. Attributes}} (x_i - y_i)^2} \qquad (1)$$
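To make the procedure concrete, the following minimal Python sketch implements standard K-means (batch Lloyd iterations) with the Euclidean distance of (1). It is an illustration only, not the code used in this study: instances are assumed to be equal-length numeric lists, initial centroids are drawn at random (the source of the sensitivity noted above), and the function names are our own.

```python
import math
import random


def euclidean(x, y):
    """Equation (1): square root of the summed squared attribute differences."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def kmeans(data, k, max_iter=100, seed=0):
    """Standard K-means: random initial centroids, then alternate the
    assignment and update steps until no instance changes cluster."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(data, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each instance joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                      for p in data]
        if new_labels == labels:
            break  # converged (to a local minimum in general)
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids, labels = kmeans(data, k=2)
```

Because the outcome depends on which instances are sampled as initial centroids, different seeds can converge to different local minima on less well-separated data.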

In the past, many ideas were proposed to improve the performance of K-means. Most of these methods aimed at improving the way the initial cluster centroids are selected. For example, Ball and Hall [50] adopted the centroid of the dataset as the first centroid, that is, $X' = \frac{1}{N}\sum_{j=1}^{N} x_j$, before choosing the remaining centroids in arbitrary fashion, accepting an instance only if its distance from the previously selected centroids is greater than a threshold, until k centroids are obtained. The Maximin method developed by Katsavounidis et al. [51], on the other hand, chooses the first centroid arbitrarily, while the subsequent k − 1 centroids are chosen as the instances with the greatest minimum distance to the previously selected centroids. Al-Daoud's variance-based method [52] sorts the data instances according to the variance of the attributes and then partitions them into k groups of the same size, choosing the medians of these groups as centroids. The k-means++ method [53] combines MacQueen's second method with the Maximin method: it selects the first centroid randomly, and the ith centroid ($i \in \{2, 3, \ldots, k\}$) is chosen as an instance $x$ with probability $\mathrm{md}(x)^2 / \sum_{j=1}^{N} \mathrm{md}(x_j)^2$, where $\mathrm{md}(x)$ denotes the minimum distance from $x$ to the previously selected centroids. Erisoglu et al. [54] proposed a method that first chooses two main vectors representing the best dataset distribution and then computes the centroid of the dataset as the mean of these two vectors. The first cluster centroid is the instance with the longest Euclidean distance from the dataset centroid, while the ith cluster centroid is the instance with the maximum combined distance from the previous (i − 1) cluster centroids.
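Two of the seeding rules above, Maximin and k-means++, can be sketched as follows. This is a minimal illustration written for this discussion (the function names are hypothetical), using a random choice for the "arbitrary" first centroid of Maximin; it is not taken from the cited implementations.

```python
import math
import random


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def md(x, centroids):
    """md(x): distance from x to its nearest already-selected centroid."""
    return min(euclidean(x, c) for c in centroids)


def maximin_init(data, k, seed=0):
    """Maximin seeding: an arbitrary (here random) first centroid, then each
    next centroid is the instance farthest from its nearest chosen centroid."""
    rng = random.Random(seed)
    centroids = [rng.choice(data)]
    while len(centroids) < k:
        centroids.append(max(data, key=lambda x: md(x, centroids)))
    return centroids


def kmeans_pp_init(data, k, seed=0):
    """k-means++ seeding: a random first centroid, then each next centroid is
    drawn with probability md(x)^2 / sum_j md(x_j)^2."""
    rng = random.Random(seed)
    centroids = [rng.choice(data)]
    while len(centroids) < k:
        weights = [md(x, centroids) ** 2 for x in data]
        # Already-chosen instances have weight 0, so they are never redrawn.
        centroids.append(rng.choices(data, weights=weights, k=1)[0])
    return centroids


data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
maximin_seeds = maximin_init(data, 2)
kmeanspp_seeds = kmeans_pp_init(data, 2)
```

The k-means++ variant needs at least k distinct instances; otherwise every sampling weight eventually becomes zero and no further centroid can be drawn.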

2.3. Hybrid K-Means-Based IDS

The best possible combination of a high detection rate and a low false alarm rate can be achieved by

4 The Scientific World Journal

using hybrid approaches for IDS. Hybrid K-means combined with other techniques has played an important role in this field. Xiao et al. [31] proposed a K-means algorithm based on PSO for network anomaly detection. The authors used PSO to solve the local-minimum convergence problem of K-means, capitalizing on PSO's global search ability. Experimental results with a KDD dataset demonstrate that the proposed method is effective in dealing with large datasets and achieves a satisfactory detection rate. Yongzhong et al. [55] also proposed a similar PSO-K-means hybrid system. Muda et al. [32] proposed a hybrid learning approach combining K-means clustering and Naïve Bayes classification. The authors clustered all data into the corresponding groups before applying a classifier for classification. Their results show that the proposed approach achieved a reasonable false alarm rate. A clustering algorithm that uses SOM and K-means for intrusion detection was proposed by Wang et al. [33]. When the SOM finishes its training process, K-means is adopted to refine the weights obtained by training. In addition, once the SOM completes cluster formation, K-means is applied to refine the final clustering results. Chandrasekhar and Raghuveer [34] proposed a new approach based on a fuzzy neural network and a support vector machine to improve the IDS detection rate. Here, K-means clustering was first applied to generate different training data subsets.

Muniyandi et al. [30] proposed an anomaly detection method using K-means combined with C4.5 for classifying anomalous and normal activities. In this approach, K-means clustering is first used to partition the training dataset into k clusters using the Euclidean distance. Then a decision tree is built for each cluster using the C4.5 technique, and the rules created by the decision trees are used to detect intrusion events. The testing phase is implemented in two steps. In the first step, the Euclidean distance is computed for every testing instance to find the closest cluster. In the second step, the decision tree corresponding to the closest cluster is selected to detect the class of the instance. In this work, K-means still has some shortcomings, as the clustering output mostly depends on the selection of the initial cluster centroids. In addition, the number of clusters k needs to be given in advance. Moreover, the resulting clusters do not cover all the possibilities of class instances.

3. Proposed Hybrid Modified K-Means with C4.5 in MAS-IDS

The proposed MAS-IDS uses three agents that are responsible for achieving the IDS goals: the coordinator, analysis, and communication agents. The MAS-IDS system is shown in Figure 1. The details of the proposed system are elaborated on in the next sections.

3.1. Multiagent System-Based Intrusion Detection System (MAS-IDS)

Figure 1: The MAS-IDS architecture (Hosts 1 to N, each running a communication agent, a coordinator agent, and several analysis agents; the legend distinguishes network traffic, communication between agents, sharing of information between agents, and capture of network traffic).

3.1.1. Coordinator Agent

The deliberative coordinator agent uses a training dataset to train the hybrid system: it constructs clusters by using modified K-means and subsequently applies the C4.5 technique to each cluster to build the decision trees that will be used in the testing phase. In addition, the coordinator agent receives the gathered network traffic data and divides it into a number of subsets by applying (2). It then sends these subsets, together with the trees and cluster centroids, to the analysis agents in the other hosts by using the communication agents. At the same time, the coordinator agent holds information about all the hosts in the system environment, since each host periodically sends the coordinator agent the number of its cores that are presently not busy. The scenario in which the coordinator agent operates can be summarized in the following steps:

(1) Read the training dataset.
(2) Call modified K-means (Pseudocode 2) to cluster the training dataset into a set of clusters.
(3) Build the tree for each cluster by using the C4.5 technique.
(4) Send the trees and cluster centroids to the static agents in the other hosts.
(5) Capture the network traffic data packets.
(6) Determine the number of CPU cores in the hosts of the system that are presently not busy (denote it n).
(7) Divide the captured data into n subsets; refer to (2).
(8) Create n analysis agents in the other hosts; refer to (3).
(9) Send the n data subsets to the analysis agents by using the communication agent.
(10) Wait until all analysis agents finish analyzing the data.
(11) Combine the results yielded by the analysis agents using (4).

Input: Dataset
Output: Clusters, Centroids
(1) Set $k = 1$, $c_1 = \omega_1$ (the first instance)
(2) For every instance $\omega_i \in$ Dataset, $i = 1, 2, \ldots$ Do
(2.1) If $\lVert \omega_i - c_s \rVert > threshold$ for all $s = 1, \ldots, k$ Then
(2.2) $k = k + 1$, $c_k = \omega_i$
(3) Assign every instance $\omega_i \in$ Dataset to the closest centroid to make $k$ clusters $C_1, C_2, \ldots, C_k$
(4) Calculate the cluster centroids $\bar{\omega}_i = \frac{1}{k_i} \sum_{j=1}^{k_i} \omega_{ij}$, $i = 1, \ldots, k$, where $k_i$ is the number of instances in $C_i$
(5) For every instance $\omega_i \in$ Dataset Do
(5.1) Reassign $\omega_i$ to the closest cluster centroid: $\omega_i \in C_s$ is moved from $C_s$ to $C_t$ if $\lVert \omega_i - \bar{\omega}_t \rVert \le \lVert \omega_i - \bar{\omega}_j \rVert$ for all $j = 1, \ldots, k$, $j \neq s$
(5.2) Recalculate the centroids of clusters $C_s$ and $C_t$
(6) If the cluster instances have stabilized, then stop; else go to Step (4)

Pseudocode 2: Pseudocode of the modified K-means algorithm.

$$\text{Data set} = \{S_1, S_2, \ldots, S_j, \ldots, S_n\} \qquad (2)$$

subject to

$$S_j = \{\text{Instance} \in \text{Data set} \mid \text{Instance} \notin S_i\ \ \forall i = 1, \ldots, n,\ i \neq j\}, \qquad |S_j| = \frac{|\text{Data set}|}{n}, \quad j = 1, \ldots, n,$$

where $n \le$ the number of CPU cores available in the system.
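The coordinator's splitting step can be sketched as below, under two assumptions not stated in the text: the subsets are contiguous slices of the captured data, and when n does not divide the data size, the first subsets take one extra instance. The placement helper enforces the core limit of (3) by giving each host at most as many analysis agents as it has idle cores; the core counts in the example are hypothetical.

```python
def split_dataset(instances, n):
    """Equation (2): n pairwise-disjoint subsets that together cover the data.
    Each has |Data set| / n instances when n divides the size; otherwise the
    first len % n subsets take one extra instance (an assumption)."""
    size, extra = divmod(len(instances), n)
    subsets, start = [], 0
    for j in range(n):
        end = start + size + (1 if j < extra else 0)
        subsets.append(instances[start:end])
        start = end
    return subsets


def place_agents(free_cores):
    """Constraint of equation (3): host i receives at most free_cores[i]
    analysis agents. Returns the host index for each agent/subset."""
    placement = []
    for host, cores in enumerate(free_cores):
        placement.extend([host] * cores)
    return placement


free_cores = [3, 2, 4]              # hypothetical idle cores reported by 3 hosts
placement = place_agents(free_cores)
n = len(placement)                  # n equals the total number of idle cores
subsets = split_dataset(list(range(20)), n)
```

Using every idle core keeps n at its upper bound in (2); a real deployment might reserve a core on the coordinator's host, which this sketch does not do.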

3.1.2. Analysis Agent

A set of reactive analysis agents is created in the other hosts within the system environment by using (3), where the number of analysis agents equals the number of subsets resulting from the splitting process. Each analysis agent receives one subset of the testing data along with the cluster centroids and decision trees created by the coordinator agent in the training phase. In fact, the coordinator agent sends a message about the number of agents needed to the deliberative agent resident in each host, and the resident agent then creates these analysis agents. This indirection is needed because, in JADE, if the coordinator agent creates the analysis agents directly in the other hosts, the analysis agents are created there only logically; physically, they are created in the coordinator agent's host. Consequently, the analysis agents would use the same CPU cores and memory as the coordinator agent's host. Each analysis agent runs as a thread on one of the CPU cores of its host [56]. Hence, if a host has four cores, it can run four agent threads in parallel.

$$\forall S_j:\ \text{create } AA_j \in \text{Host}_i \longrightarrow \text{analysis}(S_j, AA_j), \qquad j = 1, \ldots, n,\quad 1 \le i \le m \qquad (3)$$

subject to

$$\text{number of AA in Host}_i \le \text{number of CPU cores in Host}_i,$$

where $AA_j$ represents analysis agent $j$, $\text{analysis}(S_j, AA_j)$ represents the analysis function used to analyze subset $S_j$ by analysis agent $AA_j$, and $m$ represents the number of hosts available in the system environment.

In the analysis agent, each instance is first compared with the cluster centroids to find the closest one, after which the decision tree corresponding to this centroid is used to determine the type of the instance. If the instance attributes do not match any class in the decision tree, the instance is treated as an attack, and the decision trees are updated with the data pertaining to this attack to assist with future detection. The scenario in which the analysis agent operates is as follows:

(1) Receive the data subset, cluster centroids, and trees from the communication agent.

(2) Call the pseudocode (Pseudocodes 3 and 4) to analyze the data.

(3) Return the results to the coordinator agent by using the communication agent.

Finally, the coordinator agent combines all the results produced by the analysis agents by using (4) and provides the final results to the system administrator, who can then raise an alert to deal with the situation.

$$\text{Normal instances} = \sum_{i=1}^{n} \text{Normal}(AA_i), \qquad \text{Attack instances} = \sum_{i=1}^{n} \text{Attack}(AA_i) \qquad (4)$$
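A minimal sketch of the analysis stage and of (4) follows, with threads standing in for analysis agents. The classifier passed in is a hypothetical placeholder for the centroid-plus-tree procedure of Section 3.4, and any label other than "normal" (including "unknown") is counted as an attack, following the description above.

```python
from concurrent.futures import ThreadPoolExecutor


def analysis_agent(subset, classify):
    """One analysis agent: label every instance of its subset and report
    (Normal(AA_i), Attack(AA_i)). Anything not labelled 'normal' counts as
    an attack."""
    normal = sum(1 for x in subset if classify(x) == "normal")
    return normal, len(subset) - normal


def run_analysis(subsets, classify):
    """Run one worker per subset and combine the counts as in equation (4)."""
    with ThreadPoolExecutor(max_workers=len(subsets)) as pool:
        results = list(pool.map(lambda s: analysis_agent(s, classify), subsets))
    normal_instances = sum(normal for normal, _ in results)
    attack_instances = sum(attack for _, attack in results)
    return normal_instances, attack_instances


# Toy stand-in classifier (hypothetical): even numbers are "normal".
counts = run_analysis([[1, 2, 3], [4, 5], [6]],
                      lambda x: "normal" if x % 2 == 0 else "attack")
```

Python threads share one interpreter lock, so CPU-bound classification would need a process pool (with module-level, picklable functions) to obtain the per-core parallelism that the analysis-agent threads provide in the JADE setting.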

3.1.3. Communication Agent

The communication agent is responsible for transferring data and results between agents. The scenario in which the communication agent operates is presented in the following steps:

(1) Receive the datasets, cluster centroids, and decision trees from the coordinator agent.
(2) Move them from the coordinator agent's host to the analysis agent's host.
(3) Give the dataset, cluster centroids, and decision trees to the analysis agent.
(4) Receive the results from the analysis agent.
(5) Move the results from the analysis agent's host to the coordinator agent's host.
(6) Provide the results to the coordinator agent.

Input: Testing dataset, centroids, trees
Output: Predictions P for the instances of the testing dataset
(1) For every instance $\omega_i \in$ testing dataset Do
(1.1) Choose centroid $c_j$ such that $\lVert \omega_i - c_j \rVert < \lVert \omega_i - c_s \rVert$ for all $s = 1, \ldots, k$, $s \neq j$
(1.2) $P$ = add(Call pseudocode MatchTree($\omega_i$, $root_j$))

Pseudocode 3: Pseudocode for determining the closest centroid for an instance.

Input: instance $\omega$, root
Output: Class
(1) If root.branch = 0 Then return root.value
(2) Choose the attribute $a_i \in \omega$ corresponding to the index of root
(3) For all branchValues $\in$ root Do
(3.1) If $branchValue_j = a_i$ exists Then return MatchTree($\omega$, $root.branch_j$)
(3.2) If no branch value matches, return "unknown"

Pseudocode 4: Pseudocode of MatchTree.

3.2. Modified K-Means Algorithm

The main advantage that distinguishes modified K-means from other adjusted K-means variants in the extant literature is its ability to consider all possible eventualities by treating all the divergent points in the dataset as initial cluster centroids, rather than randomly selecting a specific set of initial centroids, as is typically done. In other words, modified K-means constructs clusters covering all the cases characterized by significant differences among instances. Thus, modified K-means distributes the dataset instances to suitable clusters with the best accuracy. Moreover, unlike other adjusted K-means variants, the modified K-means approach does not require the number of clusters k to be specified, as it is determined dynamically. The main difference between the modified and the standard K-means lies in the selection of the initial cluster centroids, as shown in the following steps:

(1) Select the first instance of the dataset as the first cluster centroid.

(2) Select as the next centroid any instance whose distance from all the previously selected centroids is greater than the specified threshold (the best threshold, 4000, was found in the first experiment).

(3) Repeat Step (2) until the end of the dataset is reached.

(4) Apply the remaining steps of standard K-means to the selected initial cluster centroids.

Pseudocode 2 shows the pseudocode of modified K-means. Our modification of K-means is evident in Steps (1) and (2) of the pseudocode.
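The following Python sketch mirrors Pseudocode 2 as an illustration under stated assumptions; it is not the authors' JADE/Java implementation. Steps (1) and (2) derive the seeds from the threshold, and Steps (3) to (6) refine them with one-at-a-time reassignments. Two choices are ours: an instance that is equally close to its current centroid and to another one stays where it is (Pseudocode 2 uses ≤, which could make an instance oscillate), and a `max_iter` cap guarantees termination.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]


def nearest(x, centroids):
    return min(range(len(centroids)), key=lambda c: euclidean(x, centroids[c]))


def modified_kmeans(data, threshold, max_iter=100):
    # Steps (1)-(2): the first instance is the first centroid; every later
    # instance farther than `threshold` from all centroids chosen so far
    # becomes a new centroid, so k is derived from the data.
    centroids = [list(data[0])]
    for x in data[1:]:
        if all(euclidean(x, c) > threshold for c in centroids):
            centroids.append(list(x))
    k = len(centroids)
    # Step (3): initial assignment to the closest centroid.
    labels = [nearest(x, centroids) for x in data]
    for _ in range(max_iter):
        # Step (4): recompute each centroid as the mean of its members.
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
        # Step (5): move instances one at a time and update the two affected
        # centroids after each move (ties keep the current cluster).
        moved = False
        for i, x in enumerate(data):
            s, t = labels[i], nearest(x, centroids)
            if t != s and euclidean(x, centroids[t]) < euclidean(x, centroids[s]):
                labels[i], moved = t, True
                for c in (s, t):
                    members = [y for y, lab in zip(data, labels) if lab == c]
                    if members:
                        centroids[c] = mean(members)
        # Step (6): stop once no instance changes cluster.
        if not moved:
            break
    return centroids, labels


data = [[0, 0], [1, 0], [100, 100], [101, 100], [0, 1]]
centroids, labels = modified_kmeans(data, threshold=10)
```

Raising the threshold yields fewer seeds (with `threshold=1000` the example data collapses into a single cluster), which is why the threshold is tuned in Section 4.1.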

3.3. C4.5 Algorithm

After the training dataset instances are distributed among the clusters by modified K-means, the standard C4.5 technique developed by Quinlan [57] is used to build one decision tree for each cluster. More details and the pseudocode of C4.5 can be found elsewhere [58, 59].
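C4.5 selects each split by the gain ratio, that is, the information gain of an attribute divided by its split information. The sketch below computes this criterion for a discrete attribute; it is illustrative only and omits the rest of C4.5 (thresholds for continuous attributes, missing values, and pruning).

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (bits) of the empirical distribution of `items`."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """C4.5 split criterion for a discrete attribute: information gain of
    splitting `labels` by `values`, divided by the split information."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the attribute's own values
    return gain / split_info if split_info > 0 else 0.0


# Example: the protocol values perfectly separate the two classes.
ratio = gain_ratio(["tcp", "tcp", "udp", "udp"],
                   ["attack", "attack", "normal", "normal"])
```

Dividing by the split information penalizes attributes with many distinct values, which plain information gain would otherwise favor.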

3.4. Testing Phase

This phase is implemented by the analysis agent and is executed in two stages to test the network traffic data. In the first stage, the centroid closest to the testing instance is chosen (the pseudocode of this stage is shown in Pseudocode 3). In the second stage, the decision tree corresponding to the centroid chosen in the first stage is applied to the instance to identify its appropriate class. The pseudocode of the second stage is shown in Pseudocode 4.
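The two stages can be sketched as follows, assuming a simple tree representation (the `Node` class is hypothetical): a leaf stores a class label, and an inner node stores the index of the attribute it tests and a mapping from attribute values to child nodes, matching the exact-value matching of Pseudocode 4. An "unknown" result is reported as an attack, as described for the analysis agent; updating the trees with newly detected attacks is not shown.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """Hypothetical tree node: a leaf holds `value`; an inner node tests
    attribute index `attribute` and maps each attribute value to a child."""
    value: Optional[str] = None
    attribute: Optional[int] = None
    branches: Dict[object, "Node"] = field(default_factory=dict)


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def match_tree(instance, root):
    """Pseudocode 4: descend along the branch whose value equals the
    instance's attribute value; 'unknown' if no branch matches."""
    if not root.branches:  # leaf node (no outgoing branches)
        return root.value
    child = root.branches.get(instance[root.attribute])
    if child is None:
        return "unknown"
    return match_tree(instance, child)


def classify(instance, centroids, trees):
    """Pseudocode 3 for one instance: choose the closest centroid, then query
    the tree built for that cluster. 'unknown' is reported as an attack."""
    j = min(range(len(centroids)), key=lambda c: euclidean(instance, centroids[c]))
    label = match_tree(instance, trees[j])
    return "attack" if label == "unknown" else label


centroids = [[0, 0], [10, 10]]
trees = [
    Node(attribute=0, branches={0: Node(value="normal"), 1: Node(value="normal")}),
    Node(attribute=1, branches={10: Node(value="dos")}),
]
predictions = [classify(x, centroids, trees) for x in ([0, 0], [10, 10], [9, 11])]
```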

4. Experimental Setup and Analysis

We used the benchmark KDD Cup 1999 dataset [60] to evaluate the MAS-IDS performance. In most previous works in this field, the authors used cross-validation, such as 10-fold cross-validation, for evaluation. Cross-validation relies on the same classes as the training data, without adding new classes in the testing stage; thus, these works could achieve high performance in terms of accuracy and detection rate. However, the strength of an IDS stems from its ability to detect unknown (new) attacks. The KDD Cup 1999 dataset consists of two datasets: the 10% KDD Cup dataset (used for training) and the Corrected dataset (used for testing). More details about KDD Cup 1999 can be found in the extant literature [61]. Among the available performance measures, accuracy (Acc), detection rate (DR), and false alarm rate (FAR) are the most popular, and they are used here to evaluate the MAS-IDS performance:

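$$\text{Acc} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{DR} = \frac{TP}{TP + FN}, \qquad \text{FAR} = \frac{FP}{TN + FP} \qquad (5)$$

where TP and TN are the numbers of attack and normal instances that are correctly classified, FN is the number of attack instances classified as normal, and FP is the number of normal instances classified as attacks.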
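The measures in (5), together with the precision, specificity, and F-measure reported later in Table 3, can be computed from the confusion counts as in this short sketch; attacks are taken as the positive class, and the counts in the example are made up for illustration.

```python
def ids_metrics(tp, tn, fp, fn):
    """Equation (5) plus the extra measures reported in Table 3."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    dr = tp / (tp + fn)                # detection rate (recall)
    far = fp / (tn + fp)               # false alarm rate
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)       # equals 1 - FAR
    f_measure = 2 * precision * dr / (precision + dr)
    return {"Acc": acc, "DR": dr, "FAR": far, "Precision": precision,
            "Specificity": specificity, "F-measure": f_measure}


# Illustrative counts only (not from the experiments).
m = ids_metrics(tp=80, tn=90, fp=10, fn=20)
```

Since specificity is 1 − FAR, each row of Table 3 has Specificity and FAR summing to one.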

The computers used to implement the experiments are each equipped with a Core i7 3.40 GHz processor with 8 CPU cores and 6 GB RAM, running Windows 7 Professional 64-bit. The experiments were conducted on the JADE platform and implemented in Java.

Table 1 shows the details of the datasets used to evaluate the MAS-IDS performance against the conventional method (hybrid standard K-means with C4.5) and other techniques. Note that the training datasets (trainDS1, trainDS2, trainDS3, and trainDS4) were generated randomly from the 10% KDD Cup dataset, while the testing datasets (testDS1, testDS2, testDS3, and testDS4) were generated randomly from the Corrected dataset.

The symbolic attributes are preprocessed by converting them to numeric values. There are three symbolic attributes: protocol, service, and flag. For example, the three values of the protocol attribute, tcp, udp, and icmp, are converted to 1, 2, and 3, respectively, and the same approach is adopted for the remaining symbolic attributes.
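The conversion can be sketched as follows. Only the protocol mapping (tcp, udp, icmp to 1, 2, 3) is given in the text; for the other symbolic attributes, the sketch assumes codes are assigned in order of first appearance, and the helper names are hypothetical.

```python
PROTOCOL_CODES = {"tcp": 1, "udp": 2, "icmp": 3}  # mapping stated in the text


def encode_column(values, codes=None):
    """Replace symbolic values with integers. Known symbols use `codes`;
    unseen symbols get the next unused integer in order of first appearance
    (an assumption, since the paper does not list the other mappings)."""
    codes = dict(codes or {})
    encoded = []
    for v in values:
        if v not in codes:
            codes[v] = max(codes.values(), default=0) + 1
        encoded.append(codes[v])
    return encoded, codes


protocols, _ = encode_column(["udp", "tcp", "icmp", "tcp"], PROTOCOL_CODES)
services, service_codes = encode_column(["http", "ftp", "http", "smtp"])
```

The same mapping must be reused for the testing data; otherwise, a symbol could receive different codes in training and testing.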

In this study, three experiments were carried out. In the first experiment, the best threshold value was computed. In the second experiment, the MAS-IDS performance was evaluated by comparing the results yielded by MAS-IDS with those obtained by the conventional method and by other techniques available in Weka and Matlab. In the third experiment, we compared the processing time required by MAS-IDS with that of hybrid modified K-means with C4.5 in a nonagent environment.

4.1. Identifying the Best Threshold for Modified K-Means

The modified K-means requires a predetermined threshold value to select the initial cluster centroids. In this experiment, all the training datasets in Table 1 are used with testDS1 to compute the average accuracy for different threshold values (1000–10000). The value that yields the highest accuracy is chosen as the threshold for modified K-means. As can be seen in Figure 2, the threshold value is 4000, as it results in an average accuracy of 0.90155. We used all the training datasets with only one testing dataset to choose the threshold value because the modified K-means approach is applied only to the training dataset to construct the clusters. In all subsequent experiments, the chosen threshold (4000) is employed with hybrid modified K-means and C4.5.
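The threshold search amounts to a one-dimensional grid search, sketched below. The `evaluate` callback is a hypothetical placeholder for training hybrid modified K-means with C4.5 on one training set and measuring its accuracy on testDS1, and the step of 1000 between candidates is an assumption, since the exact grid is not stated.

```python
def select_threshold(train_sets, test_set, evaluate,
                     candidates=range(1000, 10001, 1000)):
    """Average the accuracy over all training sets for each candidate
    threshold and return the best threshold plus all averages.
    `evaluate(train, test, threshold)` must return an accuracy."""
    averages = {}
    for t in candidates:
        scores = [evaluate(train, test_set, t) for train in train_sets]
        averages[t] = sum(scores) / len(scores)
    best = max(averages, key=averages.get)
    return best, averages


# Toy evaluator (hypothetical) whose accuracy peaks at 4000.
best, averages = select_threshold(
    ["trainDS1", "trainDS2", "trainDS3", "trainDS4"], "testDS1",
    lambda train, test, t: 0.9 - abs(t - 4000) / 1e6)
```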

Table 1: The details of the evaluation datasets.

| Dataset | Normal | DoS | Probe | R2L | U2R | Total |
|---|---|---|---|---|---|---|
| trainDS1 | 900 | 1000 | 300 | 500 | 300 | 3000 |
| trainDS2 | 1100 | 1300 | 300 | 800 | 500 | 4000 |
| trainDS3 | 1500 | 1800 | 400 | 1000 | 300 | 5000 |
| trainDS4 | 1800 | 1800 | 500 | 1100 | 800 | 6000 |
| testDS1 | 5000 | 3000 | 700 | 900 | 400 | 10000 |
| testDS2 | 10000 | 7000 | 1000 | 1500 | 500 | 20000 |
| testDS3 | 15000 | 10000 | 1500 | 2500 | 1000 | 30000 |
| testDS4 | 20000 | 14000 | 1500 | 3000 | 1500 | 40000 |

Figure 2: Computing the best threshold value for modified K-means (accuracy versus threshold, from 0 to 10000).

4.2. MAS-IDS Performance

In order to compare MAS-IDS with hybrid standard K-means and C4.5 [30], the best value of k for K-means is first identified. Typically, K-means is run independently for different values of k, and the partition that appears most meaningful to the domain expert is selected [62]. Figure 3 shows the performance of hybrid standard K-means with C4.5 for different values of k (k = 10, 20, 30, …, 100). The best value is k = 10 because it yields the highest accuracy (90.67%) and detection rate (84.80%). Only the false alarm rate (3.46%) is not optimal at this value, as 2.1% is achieved when k = 100. Thus, in all subsequent experiments, we adopt k = 10 as the best number of clusters.

Figure 3: The performance of hybrid standard K-means and C4.5 (accuracy, DR, and FAR versus k).


Figure 4: ROC curve for testDS1 (true positive rate versus false positive rate for hybrid modified K-means and C4.5 and for hybrid K-means and C4.5).

Figure 5: ROC curve for testDS2 (true positive rate versus false positive rate for hybrid modified K-means and C4.5 and for hybrid K-means and C4.5).

Figure 6: ROC curve for testDS3 (true positive rate versus false positive rate for hybrid modified K-means and C4.5 and for hybrid K-means and C4.5).

The ROC curves in Figures 4, 5, 6, and 7 compare the performance of the proposed method with that of hybrid K-means with C4.5 [30].

Figure 7: ROC curve for testDS4 (true positive rate versus false positive rate for hybrid modified K-means and C4.5 and for hybrid K-means and C4.5).

According to the ROC curves, the proposed hybrid modified K-means with C4.5 in MAS-IDS achieved better results than the conventional method. The t-test shows that MAS-IDS significantly improved accuracy, with p value < 0.05 (0.00000028). The MAS-IDS was evaluated by computing the classification results for each training dataset using all the testing datasets presented in Table 1. Table 2 compares the accuracy of MAS-IDS and of hybrid K-means with C4.5, and Table 3 shows the average results of the MAS-IDS evaluation, along with a comparison with the conventional method and with other methods from Weka and Matlab.

Table 2: The accuracy results of MAS-IDS versus hybrid K-means and C4.5 [30].

| Training dataset | Testing dataset | MAS-IDS | Hybrid K-means and C4.5 |
|---|---|---|---|
| trainDS1 | testDS1 | 0.9031 | 0.8993 |
| trainDS1 | testDS2 | 0.91745 | 0.916021 |
| trainDS1 | testDS3 | 0.9126 | 0.909633 |
| trainDS1 | testDS4 | 0.9147 | 0.911625 |
| trainDS2 | testDS1 | 0.8874 | 0.8742 |
| trainDS2 | testDS2 | 0.9107 | 0.9008 |
| trainDS2 | testDS3 | 0.904367 | 0.894967 |
| trainDS2 | testDS4 | 0.90545 | 0.89475 |
| trainDS3 | testDS1 | 0.9058 | 0.8932 |
| trainDS3 | testDS2 | 0.92225 | 0.9135 |
| trainDS3 | testDS3 | 0.91667 | 0.9049 |
| trainDS3 | testDS4 | 0.916125 | 0.90805 |
| trainDS4 | testDS1 | 0.9099 | 0.8963 |
| trainDS4 | testDS2 | 0.9215 | 0.90815 |
| trainDS4 | testDS3 | 0.9155 | 0.9016 |
| trainDS4 | testDS4 | 0.918025 | 0.90625 |

p value: 0.00000028
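The text does not state which variant of the t-test was used. Assuming a paired test over the 16 train/test combinations of Table 2, the t statistic can be computed as in the sketch below (about 8.7 with 15 degrees of freedom by our arithmetic). A two-sided p value would then come from the t distribution, for example via `scipy.stats.ttest_rel`, which is not used here to keep the sketch dependency-free.

```python
import math
import statistics

# Accuracy pairs from Table 2 in row order (trainDS1..trainDS4, each with
# testDS1..testDS4): MAS-IDS versus hybrid K-means and C4.5 [30].
mas_ids = [0.9031, 0.91745, 0.9126, 0.9147,
           0.8874, 0.9107, 0.904367, 0.90545,
           0.9058, 0.92225, 0.91667, 0.916125,
           0.9099, 0.9215, 0.9155, 0.918025]
hybrid = [0.8993, 0.916021, 0.909633, 0.911625,
          0.8742, 0.9008, 0.894967, 0.89475,
          0.8932, 0.9135, 0.9049, 0.90805,
          0.8963, 0.90815, 0.9016, 0.90625]


def paired_t(a, b):
    """Paired t statistic: mean difference divided by its standard error
    (statistics.stdev is the sample standard deviation, n - 1 denominator)."""
    d = [x - y for x, y in zip(a, b)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


t = paired_t(mas_ids, hybrid)
df = len(mas_ids) - 1  # 15 degrees of freedom
```

All 16 differences are positive, which is why the statistic is large even though the mean gain is under one percentage point.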

Table 3: Comparison of the MAS-IDS performance with other methods using different measures.

| Method | Accuracy | DR | FAR | Precision | Specificity | F-measure |
|---|---|---|---|---|---|---|
| MAS-IDS | 0.9113 | 0.8526 | 0.0299 | 0.9665 | 0.9701 | 0.9056 |
| Hybrid K-means and C4.5 (2012) | 0.9021 | 0.8394 | 0.0353 | 0.9600 | 0.9647 | 0.8954 |
| Bayes Net | 0.9017 | 0.8177 | 0.0142 | 0.9829 | 0.9858 | 0.8926 |
| Naïve Bayes | 0.8150 | 0.7727 | 0.1427 | 0.8578 | 0.8573 | 0.8076 |
| SMO | 0.8805 | 0.7785 | 0.0174 | 0.9781 | 0.9826 | 0.8664 |
| IBk | 0.8886 | 0.7962 | 0.0190 | 0.9766 | 0.9810 | 0.8771 |
| J48 | 0.8513 | 0.7298 | 0.0273 | 0.9638 | 0.9727 | 0.8299 |
| NBTree | 0.9007 | 0.8096 | 0.0081 | 0.9900 | 0.9919 | 0.8903 |
| Decision Table | 0.8306 | 0.6631 | 0.0020 | 0.9970 | 0.9980 | 0.7956 |
| JRip | 0.8377 | 0.6983 | 0.0229 | 0.9682 | 0.9771 | 0.8111 |
| LibSVM | 0.7964 | 0.8120 | 0.2191 | 0.8169 | 0.7809 | 0.8068 |

As can be seen from Table 3, the MAS-IDS approach achieves a higher accuracy, detection rate, and F-measure. However, its false alarm rate, precision, and specificity are not superior to those achieved by the other methods, especially the Decision Table, which produces the best values for these measures. In state-of-the-art methods, IDS accuracy is usually measured because of the equivalence between the error and correct rates. Thus, when comparing various methods, we adopt accuracy as the main measure. On this basis, the performance of our MAS-IDS is superior to that of the other methods, as shown in Table 3. More specifically, the average MAS-IDS accuracy computed over all the training and testing datasets is 0.9113, which is greater than those achieved by the other methods. Figure 8 shows the performance of all methods using the data given in Table 3.

Figure 8: Comparison of the performance of MAS-IDS with other methods (accuracy, DR, and FAR for MAS-IDS, hybrid K-means and C4.5, Bayes Net, Naïve Bayes, SMO, IBk, J48, NBTree, Decision Table, JRip, and LibSVM).

4.3. MAS-IDS Processing Time

The last experiment demonstrates the strength of MAS-IDS in reducing the data classification processing time by using a multiagent system. In this experiment, five of the previously specified computers were used. In addition, we used the fourth training dataset (trainDS4) from Table 1 with four new large testing datasets to evaluate the ability of MAS-IDS to process large datasets in less time. Table 4 shows the characteristics of the new testing datasets.

Table 4: Characteristics of the new testing datasets used to evaluate the MAS-IDS processing time.

| Dataset | Normal | DoS | Probe | R2L | U2R | Total |
|---|---|---|---|---|---|---|
| newTestDS1 | 35000 | 35000 | 10000 | 10000 | 10000 | 100000 |
| newTestDS2 | 70000 | 70000 | 20000 | 25000 | 15000 | 200000 |
| newTestDS3 | 100000 | 100000 | 30000 | 50000 | 20000 | 300000 |
| newTestDS4 | 150000 | 150000 | 30000 | 50000 | 20000 | 400000 |

To show the ability of MAS-IDS to reduce the processing time, the MAS-IDS approach is compared with the nonagent hybrid modified K-means and C4.5. Here, MAS-IDS is run again every time a new computer is added. In other words, MAS-IDS initially runs on one computer; when a second computer is added, it runs on both, and so on until all five computers are used. Table 5 shows the processing time of this experiment. The maximum number of agents that can be run on each computer is eight, because each computer has eight CPU cores and each core can run only one agent at a time. The training time of this experiment is 8814 s. Note that the coordinator agent runs on the first computer of the system environment.

Table 5: Comparison of the processing time (in seconds) required by MAS-IDS and by the nonagent hybrid modified K-means and C4.5.

| Number of agents (per computer) | Testing dataset | 1 computer | 2 computers | 3 computers | 4 computers | 5 computers |
|---|---|---|---|---|---|---|
| 1 | newTestDS1 | 15.349 | 13.57 | 13.219 | 7.841 | 6.972 |
| 1 | newTestDS2 | 31.607 | 26.554 | 26.228 | 15.309 | 13.424 |
| 1 | newTestDS3 | 45.893 | 41.587 | 35.910 | 23.462 | 20.162 |
| 1 | newTestDS4 | 64.172 | 76.688 | 72.101 | 44.509 | 37.723 |
| 2 | newTestDS1 | 8.852 | 8.304 | 7.102 | 5.843 | 4.630 |
| 2 | newTestDS2 | 17.339 | 17.883 | 16.310 | 13.160 | 10.110 |
| 2 | newTestDS3 | 26.184 | 27.500 | 22.502 | 16.931 | 15.313 |
| 2 | newTestDS4 | 40.226 | 64.332 | 49.853 | 37.263 | 36.832 |
| 3 | newTestDS1 | 8.121 | 8.242 | 6.940 | 5.34 | 4.575 |
| 3 | newTestDS2 | 13.472 | 12.767 | 11.743 | 9.957 | 8.374 |
| 3 | newTestDS3 | 20.735 | 22.689 | 20.186 | 15.788 | 14.421 |
| 3 | newTestDS4 | 32.604 | 52.819 | 48.814 | 36.589 | 35.536 |
| 4 | newTestDS1 | 6.699 | 6.818 | 5.891 | 4.631 | 3.854 |
| 4 | newTestDS2 | 11.776 | 14.209 | 11.104 | 9.375 | 8.116 |
| 4 | newTestDS3 | 18.613 | 20.927 | 17.613 | 14.490 | 13.413 |
| 4 | newTestDS4 | 29.399 | 51.852 | 46.811 | 35.564 | 34.90 |
| 5 | newTestDS1 | 6.146 | 6.659 | 5.579 | 4.555 | 3.715 |
| 5 | newTestDS2 | 11.534 | 13.685 | 10.580 | 9.198 | 7.924 |
| 5 | newTestDS3 | 17.568 | 20.662 | 17.497 | 14.44 | 13.62 |
| 5 | newTestDS4 | 29.349 | 51.527 | 43.956 | 34.521 | 32.349 |
| 6 | newTestDS1 | 6.68 | 6.601 | 5.393 | 4.465 | 3.224 |
| 6 | newTestDS2 | 11.318 | 12.922 | 10.531 | 8.980 | 7.567 |
| 6 | newTestDS3 | 17.419 | 20.203 | 16.938 | 15.788 | 12.656 |
| 6 | newTestDS4 | 28.660 | 50.67 | 41.475 | 32.390 | 31.989 |
| 7 | newTestDS1 | 5.871 | 6.443 | 5.272 | 4.258 | 3.150 |
| 7 | newTestDS2 | 11.134 | 12.787 | 10.494 | 8.680 | 7.685 |
| 7 | newTestDS3 | 17.223 | 20.683 | 16.556 | 15.178 | 12.362 |
| 7 | newTestDS4 | 28.438 | 45.539 | 39.859 | 30.447 | 30.20 |
| 8 | newTestDS1 | 5.481 | 6.204 | 4.289 | 4.84 | 3.89 |
| 8 | newTestDS2 | 11.011 | 13.24 | 9.913 | 8.369 | 7.399 |
| 8 | newTestDS3 | 17.150 | 19.713 | 16.483 | 14.81 | 11.181 |
| 8 | newTestDS4 | 27.303 | 40.39 | 38.156 | 30.196 | 29.851 |

The results presented in Table 5 depend on the number of computers and the number of agents. The result in the upper left corner of Table 5 pertains to the case of one computer with one agent; this is the worst case and should be compared with the nonagent hybrid modified K-means with C4.5. On the other hand, the result in the lower right corner of Table 5 represents the best case of MAS-IDS (maximum number of computers and agents). Furthermore, as can be seen from the data, when two computers are used, no improvement is achieved by MAS-IDS relative to other approaches, because the cost of transferring data through the network increases the processing time. However, this problem is mitigated by introducing additional computers. Nonetheless, the MAS-IDS processing time for a large dataset such as newTestDS4 is inadequate, because the dataset subsets are still large and take a long time to be transferred to the other computers through the network. This problem is eliminated when a large number of computers is employed, since the dataset is then divided into smaller subsets. Finally, the MAS-IDS processing time decreases with the addition of each new computer, as the number of agents also increases. Network specifications such as bandwidth and speed play an important role in reducing the MAS-IDS processing time. Figures 9 and 10 show the effect on the MAS-IDS processing time of increasing the number of agents and the number of computers, respectively. In Figure 9, five computers are used, while in Figure 10 only one agent per computer is used, as shown in Table 5.

The best-case MAS-IDS processing time is compared with that of the nonagent hybrid modified K-means and C4.5 in Figure 11.

Finally, since the proposed system uses each CPU core to run one analysis agent, the cost in system resources is positively correlated with the number of agents. At the same time, as the number of analysis agents increases, each data subset to be analyzed becomes very small, so the analysis needs only one or two seconds of processing time. Consequently, the proposed system balances the physical components (the number of CPU cores) against the number of agents that can be created, as in (2). Figure 12 compares the average cost in system resources (CPU consumption) when MAS-IDS uses 5 computers with 8 analysis agents on each computer (40 agents in total) and when it uses one analysis agent on each computer (5 agents in total), on the same datasets.

From Figure 12, the duration of the highest peak of CPU utilization when one agent is used (6 s) is greater than that of the highest peak when 8 agents are used, which lasts only one second. As a consequence, when the number of agents is small, the processing time is long and the system cost is low, whereas when the number of agents increases, the processing time is short and the system cost is high. The cost in system resources with respect to memory does not exceed 10% in any experiment.

Figure 9: Time required to process the testing datasets in relation to the number of agents (processing time in seconds versus number of agents, for newTestDS1 to newTestDS4).

Figure 10: Time required to process the testing datasets in relation to the number of computers (processing time in seconds versus number of computers, for newTestDS1 to newTestDS4).

This experiment demonstrates that MAS-IDS has great potential to reduce the IDS processing time relative to methods that do not employ agents. The reduction in processing time achieved by MAS-IDS reaches up to 70% relative to the other approaches. In this experiment, we used only five computers; a greater number of computers is expected to yield a higher percentage reduction in processing time.

5. Conclusion

In this work, we have proposed hybrid modified K-means with C4.5 for IDS in a MAS environment. Hybrid modified K-means with C4.5 is used to improve the classification accuracy, while the MAS is used to reduce the processing time of the IDS.

Figure 11: Comparison of the MAS-IDS processing time with that of the nonagent hybrid modified K-means and C4.5 (processing time in seconds for newTestDS1 to newTestDS4; nonagent hybrid modified K-means and C4.5 versus the best case of MAS-IDS).

Figure 12: The cost of system resources (CPUs) (CPU utilization in percent versus processing time in seconds, for 8 agents and 1 agent).

The modification of K-means is based on choosing initial cluster centroids that represent all cases of the dataset, which allows the number of clusters k to be determined automatically. Three types of agents (coordinator, analysis, and communication agents) are used. The KDD Cup 1999 dataset is employed, and the JADE platform with five computers is used to implement the proposed method.

MAS-IDS demonstrated that a multiagent system has significant potential for reducing the IDS processing time: a reduction in processing time of up to 70% was achieved. Moreover, the hybrid modified K-means with C4.5 approach performed better than the hybrid K-means and C4.5 as well as other techniques available in Weka and Matlab. The t-test of accuracy comparing MAS-IDS with the conventional K-means and C4.5 method showed that the former was superior (p value of 0.00000028). This indicates that MAS-IDS has high potential to improve the performance of intrusion detection systems.

In future work, we will attempt to improve the IDS accuracy further by combining the proposed method with other techniques. We will also try to apply our method to other datasets and to a real data network to make the system more suitable for real environments. We will use the new attacks detected by the system as unknown attacks to retrain the proposed method as feedback. In addition, we expect to reduce the IDS processing time further when using a greater number of computers.

12 The Scientific World Journal

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work is supported by the National University of Malaysia (UKM), Grant no. AP2013-007.

References

[1] H.-J. Liao, C.-H. Richard Lin, Y.-C. Lin, and K.-Y. Tung, "Intrusion detection system: a comprehensive review," Journal of Network and Computer Applications, vol. 36, no. 1, pp. 16–24, 2013.

[2] N. Sengupta, J. Sen, J. Sil, and M. Saha, "Designing of on line intrusion detection system using rough set theory and Q-learning algorithm," Neurocomputing, vol. 111, pp. 161–168, 2013.

[3] L. Koc, T. A. Mazzuchi, and S. Sarkani, "A network intrusion detection system based on a Hidden Naïve Bayes multiclass classifier," Expert Systems with Applications, vol. 39, no. 18, pp. 13492–13500, 2012.

[4] M. Uddin, A. A. Rehman, N. Uddin, J. Memon, R. Alsaqour, and S. Kazi, "Signature-based multi-layer distributed intrusion detection system using mobile agents," International Journal of Network Security, vol. 15, no. 2, pp. 97–105, 2013.

[5] C. N. Modi, D. R. Patel, A. Patel, and M. Rajarajan, "Integrating signature apriori based network intrusion detection system (NIDS) in cloud computing," Procedia Technology, vol. 6, pp. 905–912, 2012.

[6] H. Mohamed, L. Adil, T. Saida et al., "A collaborative intrusion detection and prevention system in cloud computing," in Proceedings of the IEEE (AFRICON '13), pp. 1–5, IEEE, September 2013.

[7] S.-J. Horng, M.-Y. Su, Y.-H. Chen et al., "A novel intrusion detection system based on hierarchical clustering and support vector machines," Expert Systems with Applications, vol. 38, no. 1, pp. 306–313, 2011.

[8] M. Chowdhary, S. Suri, and M. Bhutani, "Comparative study of intrusion detection system," International Journal of Computer Sciences and Engineering, vol. 2, no. 4, pp. 197–200, 2014.

[9] I. Corona, G. Giacinto, and F. Roli, "Adversarial attacks against intrusion detection systems: taxonomy, solutions and open issues," Information Sciences, vol. 239, pp. 201–225, 2013.

[10] S. Shamshirband, N. B. Anuar, M. L. M. Kiah, and A. Patel, "An appraisal and design of a multi-agent system based cooperative wireless intrusion detection computational intelligence technique," Engineering Applications of Artificial Intelligence, vol. 26, no. 9, pp. 2105–2127, 2013.

[11] M. Roesch, "Snort - lightweight intrusion detection for networks," in Proceedings of the 13th USENIX Conference on System Administration (LISA '99), pp. 229–238, 1999.

[12] D. Barbara and S. Jajodia, Applications of Data Mining in Computer Security, Springer, 2002.

[13] P. Natesan, P. Balasubramanie, and G. Gowrison, "Improving the attack detection rate in network intrusion detection using adaboost algorithm," Journal of Computer Science, vol. 8, no. 7, pp. 1041–1048, 2012.

[14] A. Bivens, C. Palagiri, R. Smith, B. Szymanski, and M. Embrechts, "Network-based intrusion detection using neural networks," in Proceedings of the Intelligent Engineering Systems through Artificial Neural Networks, vol. 12, pp. 579–584, November 2002.

[15] Y. Li and W. Jie, "The method of network intrusion detection based on the neural network GCBP algorithm," in Proceedings of the International Conference on Computer Science and Information Processing (CSIP '12), pp. 1082–1086, IEEE, August 2012.

[16] J. Lin, T. Huang, and B. Zhao, "A fast fuzzy set intrusion detection model," in International Symposium on Knowledge Acquisition and Modeling (KAM '08), pp. 601–605, December 2008.

[17] A. Abraham, R. Jain, J. Thomas, and S. Y. Han, "D-SCIDS: distributed soft computing intrusion detection system," Journal of Network and Computer Applications, vol. 30, no. 1, pp. 81–98, 2007.

[18] V. V. Kumari, S. Pamidi, and A. Govardhan, "Integrated Bayes network and hidden Markov model for host based IDS," International Journal of Computer Applications, vol. 41, no. 20, pp. 45–49, 2012.

[19] M. A. Hasan, M. Nasser, B. Pal, and S. Ahmad, "Support vector machine and random forest modeling for intrusion detection system (IDS)," Journal of Intelligent Learning Systems and Applications, vol. 6, no. 1, pp. 45–52, 2014.

[20] C. Xiang, P. C. Yong, and L. S. Meng, "Design of multiple-level hybrid classifier for intrusion detection system using Bayesian clustering and decision trees," Pattern Recognition Letters, vol. 29, no. 7, pp. 918–924, 2008.

[21] M. N. Huhns, Distributed Artificial Intelligence, Elsevier, 2012.

[22] S. J. Stolfo, A. L. Prodromidis, S. Tselepis et al., "JAM: java agents for meta-learning over distributed databases," in Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD '97), pp. 74–81, 1997.

[23] P. Kannadiga and M. Zulkernine, "DIDMA: a distributed intrusion detection system using mobile agents," in Proceedings of the 6th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing and 1st ACIS International Workshop on Self-Assembling Wireless Networks (SNPD/SAWN '05), pp. 238–245, IEEE, May 2005.

[24] L. Portnoy, Intrusion Detection with Unlabeled Data Using Clustering, 2000.

[25] M. Jianliang, S. Haikun, and B. Ling, "The application on intrusion detection based on K-means cluster algorithm," in Proceedings of the International Forum on Information Technology and Applications (IFITA '09), vol. 1, pp. 150–152, IEEE, Chengdu, China, May 2009.

[26] M. Sabhnani and G. Serpen, "Application of machine learning algorithms to KDD intrusion detection dataset within misuse detection context," in Proceedings of the International Conference on Machine Learning: Models, Technologies and Applications (MLMTA '03), pp. 209–215, June 2003.

[27] G. Munz, S. Li, and G. Carle, "Traffic anomaly detection using k-means clustering," in Proceedings of the GI/ITG Workshop MMBnet, 2007.

[28] V. Kumar, H. Chauhan, and D. Panwar, "K-means clustering approach to analyze NSL-KDD intrusion detection dataset," International Journal of Soft Computing and Engineering, vol. 3, no. 4, pp. 1–4, 2013.

[29] S. Chawla and A. Gionis, "k-means--: a unified approach to clustering and outlier detection," in Proceedings of the SIAM International Conference on Data Mining (SDM '13), pp. 189–197, SIAM, 2013.

[30] A. P. Muniyandi, R. Rajeswari, and R. Rajaram, "Network anomaly detection by cascading K-means clustering and C4.5 decision tree algorithm," Procedia Engineering, vol. 30, pp. 174–182, 2012.

[31] L. Xiao, Z. Shao, and G. Liu, "K-means algorithm based on particle swarm optimization algorithm for anomaly intrusion detection," in Proceedings of the 6th World Congress on Intelligent Control and Automation (WCICA '06), pp. 5854–5858, IEEE, June 2006.

[32] Z. Muda, W. Yassin, M. N. Sulaiman, and N. I. Udzir, "Intrusion detection based on K-Means clustering and Naïve Bayes classification," in Proceedings of the 7th International Conference on Information Technology in Asia (CITA '11), pp. 1–6, IEEE, July 2011.

[33] H.-B. Wang, H.-L. Yang, Z.-J. Xu, and Z. Yuan, "A clustering algorithm use SOM and K-means in intrusion detection," in Proceedings of the 1st International Conference on E-Business and E-Government (ICEE '10), pp. 1281–1284, May 2010.

[34] A. M. Chandrasekhar and K. Raghuveer, "Intrusion detection technique by using k-means, fuzzy neural network and SVM classifiers," in Proceedings of the 3rd International Conference on Computer Communication and Informatics (ICCCI '13), pp. 1–7, January 2013.

[35] R. Goel, A. Sardana, and R. C. Joshi, "Parallel misuse and anomaly detection model," International Journal of Network Security, vol. 14, no. 4, pp. 211–222, 2012.

[36] O. Depren, M. Topallar, E. Anarim, and M. K. Ciliz, "An intelligent intrusion detection system (IDS) for anomaly and misuse detection in computer networks," Expert Systems with Applications, vol. 29, no. 4, pp. 713–722, 2005.

[37] A. S. A. Aziz, A. E. Hassanien, S. E.-O. Hanaf, and M. Tolba, "Multi-layer hybrid machine learning techniques for anomalies detection and classification approach," in Proceedings of the 13th International Conference on Hybrid Intelligent Systems (HIS '13), pp. 215–220, IEEE, Gammarth, Tunisia, December 2013.

[38] M. Ektefa, S. Memar, F. Sidi, and L. S. Affendey, "Intrusion detection using data mining techniques," in Proceedings of the International Conference on Information Retrieval and Knowledge Management: Exploring the Invisible World (CAMP '10), pp. 200–203, IEEE, March 2010.

[39] G. MeeraGandhi, K. Appavoo, and S. Srivasta, "Effective network intrusion detection using classifiers decision trees and decision rules," International Journal of Advanced Networking and Applications, vol. 2, no. 3, pp. 686–692, 2010.

[40] H. Chauhan, V. Kumar, S. Pundir, and E. S. Pilli, "A comparative study of classification techniques for intrusion detection," in Proceedings of the International Symposium on Computational and Business Intelligence (ISCBI '13), pp. 40–43, IEEE, August 2013.

[41] C. Katar, "Combining multiple techniques for intrusion detection," International Journal of Computer Science and Network Security, vol. 6, no. 2B, pp. 208–218, 2006.

[42] S. R. Gaddam, V. V. Phoha, and K. S. Balagani, "K-means+ID3: a novel method for supervised anomaly detection by cascading k-means clustering and ID3 decision tree learning methods," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 3, pp. 345–354, 2007.

[43] D. Dasgupta, F. Gonzalez, K. Yallapu, J. Gomez, and R. Yarramsettii, "CIDS: an agent-based intrusion detection system," Computers & Security, vol. 24, no. 5, pp. 387–398, 2005.

[44] D. L. Hancock and G. B. Lamont, "Multi agent system for network attack classification using flow-based intrusion detection," in IEEE Congress of Evolutionary Computation (CEC '11), pp. 1535–1542, June 2011.

[45] X. Zhu, Z. Huang, and H. Zhou, "Design of a multi-agent based intelligent intrusion detection system," in Proceedings of the 1st International Symposium on Pervasive Computing and Applications (SPCA '06), pp. 290–295, August 2006.

[46] M. El Ajjouri, S. Benhadou, and H. Medromi, "Intelligent architecture based on MAS and CBR for intrusion detection," in Proceedings of the 4th Edition of National Security Days (JNS4), pp. 1–4, IEEE, May 2014.

[47] J. Yang, X. Liu, T. Li, G. Liang, and S. Liu, "Distributed agents model for intrusion detection based on AIS," Knowledge-Based Systems, vol. 22, no. 2, pp. 115–119, 2009.

[48] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297, Berkeley, Calif, USA, 1967.

[49] J. M. Pena, J. A. Lozano, and P. Larranaga, "An empirical comparison of four initialization methods for the K-Means algorithm," Pattern Recognition Letters, vol. 20, no. 10, pp. 1027–1040, 1999.

[50] G. H. Ball and D. J. Hall, "A clustering technique for summarizing multivariate data," Behavioral Science, vol. 12, no. 2, pp. 153–155, 1967.

[51] I. Katsavounidis, C.-C. J. Kuo, and Z. Zhang, "New initialization technique for generalized Lloyd iteration," IEEE Signal Processing Letters, vol. 1, no. 10, pp. 144–146, 1994.

[52] M. D. B. Al-Daoud, "A new algorithm for cluster initialization," in Proceedings of the WEC'05: The 2nd World Enformatika Conference, 2007.

[53] D. Arthur and S. Vassilvitskii, "k-means++: the advantages of careful seeding," in Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035, Society for Industrial and Applied Mathematics, New Orleans, La, USA, January 2007.

[54] M. Erisoglu, N. Calis, and S. Sakallioglu, "A new algorithm for initial cluster centers in k-means algorithm," Pattern Recognition Letters, vol. 32, no. 14, pp. 1701–1705, 2011.

[55] L. Yongzhong, Y. Ge, X. Jing et al., "Anomaly detection for clustering algorithm based on particle swarm optimization," Journal of Jiangsu University of Science and Technology (Natural Science Edition), vol. 23, no. 1, pp. 51–55, 2009.

[56] W. Cong, J. Morris, and W. Xiaojun, "High performance deep packet inspection on multi-core platform," in Proceedings of the 2nd IEEE International Conference on Broadband Network and Multimedia Technology (IC-BNMT '09), pp. 619–622, October 2009.

[57] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.

[58] S. Ruggieri, "Efficient C4.5 [classification algorithm]," IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 2, pp. 438–444, 2002.

[59] X. Wu and V. Kumar, The Top Ten Algorithms in Data Mining, CRC Press, New York, NY, USA, 2010.

[60] KDD Cup 1999, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.

[61] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in Proceedings of the 2nd IEEE Symposium on Computational Intelligence for Security and Defence Applications, pp. 1–6, IEEE, July 2009.

[62] A. K. Jain, "Data clustering: 50 years beyond K-means," Pattern Recognition Letters, vol. 31, no. 8, pp. 651–666, 2010.


using hybrid approaches for IDS. Hybrid K-means combined with other techniques has played an important role in this field. Xiao et al. [31] proposed a K-means algorithm based on PSO for network anomaly detection. The authors used PSO to overcome the tendency of K-means to converge to a local minimum, capitalizing on PSO's global search ability. Experimental results with a KDD dataset demonstrate that the proposed method is effective in dealing with large datasets and achieves a satisfactory detection rate. Yongzhong et al. [55] also proposed a similar PSO-K-means hybrid system. Muda et al. [32] proposed a hybrid learning approach combining K-means clustering and Naïve Bayes classification. The authors clustered all data into the corresponding groups before applying a classifier for classification. Their results show that the proposed approach achieved a reasonable false alarm rate. A clustering algorithm that uses SOM and K-means for intrusion detection was proposed by Wang et al. [33]. When the SOM finishes its training process, K-means is adopted to refine the weights obtained by training. In addition, once SOM completes cluster formation, K-means is applied to refine the final clustering results. Chandrasekhar and Raghuveer [34] proposed a new approach based on a fuzzy neural network and a support vector machine to improve the IDS detection rate. Here, K-means clustering was first applied to generate different training data subsets.

In particular, Muniyandi et al. [30] proposed an anomaly detection method using K-means combined with C4.5 for classifying anomalous and normal activities. In this approach, K-means clustering is first used to partition the training dataset into k clusters using Euclidean distance. Then, a decision tree is built for each cluster using the C4.5 technique, and the rules created by the decision trees are used to detect intrusion events. The testing phase is implemented in two steps. In the first step, the Euclidean distance between each testing instance and every cluster centroid is computed to find the closest cluster. In the second step, the decision tree corresponding to the closest cluster is selected to detect the class of the instance. In this work, K-means still has some shortcomings: the clustering output depends heavily on the selection of the initial centroids of clusters, and the number of clusters k needs to be given in advance. Moreover, the resulting clusters do not include all the possibilities of class instances.

3. Proposed Hybrid Modified K-Means with C4.5 in MAS-IDS

The proposed MAS-IDS uses three agents that are responsible for achieving the IDS goals: the coordinator, analysis, and communication agents. The MAS-IDS architecture is shown in Figure 1. The details of the proposed system are elaborated on in the next sections.

3.1. Multiagent System-Based Intrusion Detection System (MAS-IDS)

Figure 1: The MAS-IDS architecture (Hosts 1, 2, ..., N, each running a communication agent, a coordinator agent, and several analysis agents; legend: network traffic, communication between agents, share information between agents, capture network traffic).

3.1.1. Coordinator Agent. The deliberative coordinator agent uses a training dataset to train the hybrid system: it constructs clusters by using modified K-means and subsequently applies the C4.5 technique to each cluster to build the decision trees that will be used in the testing phase. In addition, the coordinator agent receives the gathered network traffic data and divides it into a number of subsets by applying (2). It then sends these subsets, together with the trees and the centroids of clusters, to the analysis agents in the other hosts by using communication agents. At the same time, the coordinator agent holds information about all the hosts of the system environment, as each host periodically sends the coordinator agent the number of its cores that are currently not busy. The scenario in which the coordinator agent operates can be summarized in the following steps:

(1) Read the training dataset.
(2) Call modified K-means (Pseudocode 2) to cluster the training dataset into a set of clusters.
(3) Build the tree for each cluster by using the C4.5 technique.
(4) Send the trees and centroids of clusters to the static agents in the other hosts.
(5) Capture network traffic data packets.
(6) Determine the number of core CPUs in the hosts of the system that are currently not busy (assume n).
(7) Divide the captured data into n subsets; refer to (2).
(8) Create n analysis agents in the other hosts; refer to (3).
(9) Send the n data subsets to the analysis agents by using the communication agent.
(10) Wait until all analysis agents finish analyzing the data.


Input: Dataset
Output: Clusters, Centroids
(1) Set $k = 1$, $c_1 =$ first instance $\omega_1$
(2) For every instance $\omega_i \in$ Dataset, $i = 1, 2, \ldots$, Do
    (2.1) If $\lVert \omega_i - c_s \rVert > \text{threshold}$ for all $s = 1, \ldots, k$ Then
    (2.2) $k = k + 1$, $c_k = \omega_i$
(3) Assign every instance $\omega_i \in$ Dataset to the closest centroid to make $k$ clusters $C_1, C_2, \ldots, C_k$
(4) Calculate cluster centroids $\bar{\omega}_i = \frac{1}{k_i} \sum_{j=1}^{k_i} \omega_{ij}$, $i = 1, \ldots, k$, where $k_i = |C_i|$ and $\omega_{ij}$ is the $j$th instance of $C_i$
(5) For every instance $\omega_i \in$ Dataset Do
    (5.1) Reassign $\omega_i$ to the closest cluster centroid: $\omega_i \in C_s$ is moved from $C_s$ to $C_t$ if $\lVert \omega_i - \bar{\omega}_t \rVert \le \lVert \omega_i - \bar{\omega}_j \rVert$ for all $j = 1, \ldots, k$, $j \neq s$
    (5.2) Recalculate centroids for clusters $C_s$ and $C_t$
(6) If cluster instances are stabilized, then stop; else go to Step (4)

Pseudocode 2: Pseudocode of the modified K-means algorithm.

(11) Combine the results yielded by the analysis agents using (4).

$$\text{Data set} = \{S_1, S_2, \ldots, S_j, \ldots, S_n\} \tag{2}$$

subject to

$$S_j = \{\text{Instance} \in \text{Data set} \mid \text{Instance} \notin S_i,\ \forall i = 1, \ldots, n,\ i \neq j\}, \qquad |S_j| = \frac{|\text{Data set}|}{n}, \quad j = 1, \ldots, n,$$

where $n \le$ the number of core CPUs available in the system.
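Equation (2) asks for n pairwise disjoint subsets of (near-)equal size. A minimal sketch, assuming contiguous chunks are acceptable (the paper does not say how instances are assigned to subsets) and that n has already been bounded by the number of idle cores:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_dataset(data: Sequence[T], n: int) -> List[List[T]]:
    """Partition `data` into n disjoint, contiguous subsets S_1..S_n.

    Sizes differ by at most one, approximating |S_j| = |Data set| / n
    when the dataset size is not divisible by n.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    base, extra = divmod(len(data), n)
    subsets: List[List[T]] = []
    start = 0
    for j in range(n):
        size = base + (1 if j < extra else 0)  # first `extra` subsets get one more
        subsets.append(list(data[start:start + size]))
        start += size
    return subsets


# Example: 10 instances over n = 4 idle cores.
parts = split_dataset(list(range(10)), 4)
print([len(p) for p in parts])  # [3, 3, 2, 2]
```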

3.1.2. Analysis Agent. A set of reactive analysis agents is created in the other hosts within the system environment by using (3), where the number of analysis agents is equal to the number of subsets resulting from the splitting process. Each analysis agent receives one subset of testing data along with the centroids of clusters and the decision trees created in the training phase by the coordinator agent. Specifically, the coordinator agent sends a message stating the number of agents needed to the deliberative agent resident in each host, and this resident agent then creates the analysis agents. This indirection is needed because, in JADE, if the coordinator agent creates the analysis agents directly in the other hosts, the analysis agents are created in these hosts only logically; physically, they are created in the host of the coordinator agent. Consequently, the analysis agents would use the same CPU cores and memory as the coordinator agent host. Each analysis agent runs as a thread on one of the CPU cores of its host [56]. Hence, if a host has four cores, it can run four agent threads simultaneously in parallel.

$$\forall S_j:\ \text{create } AA_j \in \text{Host}_i \longrightarrow \text{analysis}(S_j, AA_j), \qquad j = 1, \ldots, n, \quad 1 \le i \le m \tag{3}$$

subject to

$$\text{number of } AA \text{ in Host}_i \le \text{number of core CPUs in Host}_i,$$

where $AA_j$ represents analysis agent $j$, $\text{analysis}(S_j, AA_j)$ represents the analysis function used to analyze subset $S_j$ by analysis agent $AA_j$, and $m$ represents the number of hosts available in the system environment.

In the analysis agent, each instance is first compared with the centroids of clusters to find the closest one, after which the decision tree corresponding to this centroid is used to determine the type of the instance. If the instance attributes do not match any class in the decision tree, the instance is treated as an attack, and the decision trees are updated with the data pertaining to this attack to assist with future detection. The scenario in which the analysis agent operates is as follows:

(1) Receive the data subset, the centroids of clusters, and the trees from the communication agent.

(2) Call the pseudocode (Pseudocodes 3 and 4) to analyze the data.

(3) Return the results to the coordinator agent by using the communication agent.
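Constraint (3) caps the number of analysis agents on each host at its number of idle cores. The paper does not say in which order hosts are filled, so the sketch below assumes a simple greedy fill in host order; the host names and core counts are hypothetical, and in the described system it is the resident agent on each host, not a function like this, that actually creates the JADE agents.

```python
from typing import Dict, List


def allocate_agents(n_subsets: int, idle_cores: Dict[str, int]) -> Dict[str, List[int]]:
    """Assign subset indices 0..n_subsets-1 to hosts, one analysis agent per subset,
    so that no host receives more agents than it has idle cores (constraint of (3))."""
    if n_subsets > sum(idle_cores.values()):
        raise ValueError("more subsets than idle cores in the system")
    allocation: Dict[str, List[int]] = {}
    j = 0
    for host, cores in idle_cores.items():  # greedy fill in host order (assumption)
        take = min(cores, n_subsets - j)
        allocation[host] = list(range(j, j + take))
        j += take
    return allocation


print(allocate_agents(6, {"host1": 4, "host2": 4}))
# {'host1': [0, 1, 2, 3], 'host2': [4, 5]}
```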

Finally, the coordinator agent combines all the results produced by the analysis agents by using (4) and provides the final results to the system administrator. At this point, the system administrator will raise an alert to deal with the situation.

$$\text{Normal instances} = \bigcup_{i=1}^{n} \text{Normal}(AA_i), \qquad \text{Attack instances} = \bigcup_{i=1}^{n} \text{Attack}(AA_i) \tag{4}$$
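Equation (4) merges the outputs of the analysis agents. The sketch below imitates the whole analysis step with Python threads standing in for JADE analysis agents (one thread per subset, echoing the one-agent-per-core design); `analyze_subset`, `toy_classifier`, and the toy records are hypothetical, and the merge reads the combination in (4) as a union of the per-agent result lists.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Instance = Tuple[int, str]            # (instance id, toy feature); hypothetical
Result = Tuple[List[int], List[int]]  # (ids labelled normal, ids labelled attack)


def analyze_subset(subset: Sequence[Instance],
                   classify: Callable[[str], str]) -> Result:
    """Stand-in for one analysis agent AA_j labelling every instance of S_j."""
    normal: List[int] = []
    attack: List[int] = []
    for inst_id, features in subset:
        if classify(features) == "normal":
            normal.append(inst_id)
        else:
            attack.append(inst_id)
    return normal, attack


def combine(results: List[Result]) -> Result:
    """Equation (4): merge the Normal and Attack outputs of all agents."""
    normal = [i for res in results for i in res[0]]
    attack = [i for res in results for i in res[1]]
    return normal, attack


def toy_classifier(features: str) -> str:
    # Hypothetical rule standing in for the centroid + decision-tree analysis.
    return "normal" if features == "ok" else "attack"


subsets = [[(0, "ok"), (1, "bad")], [(2, "ok"), (3, "ok")]]
with ThreadPoolExecutor(max_workers=len(subsets)) as pool:
    # map() returns results in subset order, so the merge is deterministic.
    results = list(pool.map(lambda s: analyze_subset(s, toy_classifier), subsets))
print(combine(results))  # ([0, 2, 3], [1])
```

Because of CPython's global interpreter lock, these threads show the structure rather than real CPU parallelism; a process pool would be needed for that.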

3.1.3. Communication Agent. The communication agent is responsible for transferring data and results between agents. The scenario in which the communication agent operates is presented in the following steps:

(1) Receive the datasets, centroids of clusters, and decision trees from the coordinator agent.

(2) Move the above from the coordinator agent host to the analysis agent host.


Input: Testing dataset, centroids, trees
Output: Prediction $P$ for the instances of the testing dataset
(1) For every instance $\omega_i \in$ testing dataset Do
    (1.1) Choose centroid $c_j$ such that $\lVert \omega_i - c_j \rVert < \lVert \omega_i - c_s \rVert$ for all $s = 1, \ldots, k$, $s \neq j$
    (1.2) Add the result of MatchTree($\omega_i$, $root_j$) (Pseudocode 4) to $P$

Pseudocode 3: Pseudocode for determining the closest centroid for an instance.

Input: instance $\omega$, root
Output: Class
(1) If root.branch = 0 Then return root.value
(2) Choose attribute $a_i \in \omega$ corresponding to the index of root
(3) For all branchValues $\in$ root Do
    (3.1) If $branchValue_j = a_i$ exists Then return MatchTree($\omega$, $root.branch_j$)
(4) Return "unknown" (no branch value matches $a_i$)

Pseudocode 4: Pseudocode of MatchTree.

(3) Give the datasets, centroids of clusters, and decision trees to the analysis agent.

(4) Receive the results from the analysis agent.

(5) Move the results from the analysis agent host to the coordinator agent host.

(6) Provide the results to the coordinator agent.

3.2. Modified K-Means Algorithm. The main advantage that distinguishes modified K-means from other adjusted K-means variants in the extant literature is its ability to consider all possible eventualities by treating all the divergent points in the dataset as initial centroids of clusters, rather than selecting a specific set of initial centroids randomly, as is typically done. In other words, modified K-means constructs clusters covering all the cases characterized by significant differences among instances. Modified K-means thus aims to distribute the dataset instances to suitable clusters in a way that supports the best accuracy. Moreover, unlike other adjusted K-means variants, the modified K-means approach does not require the number of clusters k to be specified in advance, as it is determined dynamically. The main difference between the modified and the standard K-means lies in the selection of the initial centroids of clusters, as shown in the following steps:

(1) Select the first instance of the dataset as the first cluster centroid.

(2) Select as the next centroid any instance whose distance from all the previously selected centroids is greater than the specified threshold (the best threshold, 4000, was found in the first experiment).

(3) Repeat Step (2) until the end of the dataset is reached.

(4) Apply the remaining steps of standard K-means to the selected initial centroids of clusters.

Pseudocode 2 shows the pseudocode of modified K-means. Our modification of K-means is evident in Steps (1) and (2) of the pseudocode.
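The following is a minimal NumPy sketch of the modified K-means, written for illustration rather than taken from the authors' Java implementation. Steps (1) to (3) of the list above become `select_initial_centroids`; for brevity, the refinement uses batch Lloyd updates (reassign all instances, then recompute all centroids) instead of the per-instance moves of Step (5) in Pseudocode 2, which is a simplification. The example data and the threshold of 5.0 are toy values; the paper's threshold of 4000 applies to the raw KDD features.

```python
from typing import Tuple

import numpy as np


def select_initial_centroids(X: np.ndarray, threshold: float) -> np.ndarray:
    """Steps (1)-(3): the first instance is the first centroid; every later instance
    farther than `threshold` from all centroids chosen so far becomes a new centroid."""
    centroids = [X[0]]
    for x in X[1:]:
        dists = np.linalg.norm(np.asarray(centroids) - x, axis=1)
        if np.all(dists > threshold):
            centroids.append(x)
    return np.asarray(centroids, dtype=float)


def modified_kmeans(X: np.ndarray, threshold: float,
                    max_iter: int = 100) -> Tuple[np.ndarray, np.ndarray]:
    """Step (4): K-means refinement from the threshold-selected centroids.
    Simplification: batch Lloyd updates instead of per-instance moves."""
    X = np.asarray(X, dtype=float)
    centroids = select_initial_centroids(X, threshold)  # k is determined here
    labels = None
    for _ in range(max_iter):
        # Distance of every instance to every centroid, shape (n_instances, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # cluster memberships have stabilized
        labels = new_labels
        for c in range(len(centroids)):
            members = X[labels == c]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[c] = members.mean(axis=0)
    return labels, centroids


X = np.array([[0, 0], [1, 0], [10, 10], [11, 10], [0, 1]])
labels, centroids = modified_kmeans(X, threshold=5.0)
print(labels.tolist(), len(centroids))  # [0, 0, 1, 1, 0] 2
```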

3.3. C4.5 Algorithm. After the training dataset instances are distributed among the clusters by modified K-means, the standard C4.5 technique developed by Quinlan [57] is used to build one decision tree for each cluster. More details and the pseudocode of C4.5 can be found elsewhere [58, 59].
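C4.5 selects the attribute to split on by its gain ratio (information gain divided by split information). As an illustration of that criterion only, assuming discrete attribute values such as the encoded protocol codes, a sketch follows. A complete C4.5 additionally handles continuous attributes with threshold splits, missing values, and pruning; in the proposed system, one such tree would be grown from each cluster's instances.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a non-empty class-label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """C4.5 split criterion for one discrete attribute: gain / split information."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)  # partition labels by attribute value
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


# Toy data: attribute values perfectly separate the two classes.
print(gain_ratio([1, 1, 2, 2], ["normal", "normal", "attack", "attack"]))  # 1.0
```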

3.4. Testing Phase. This phase is implemented by the analysis agent and is executed in two stages to test the network traffic data. In the first stage, the centroid closest to the testing instance is chosen (the pseudocode of this stage is shown in Pseudocode 3). In the second stage, the tree corresponding to the centroid chosen in the first stage is applied to the instance to identify its class. The pseudocode of the second stage is shown in Pseudocode 4.
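The two testing stages can be sketched in Python as follows. The nested-dictionary tree encoding, the toy centroids, and the toy trees are hypothetical; branches are matched by exact attribute value, as in Pseudocode 4, which suits discretized attributes but not the threshold tests that C4.5 uses for continuous ones.

```python
import math
from typing import Any, Dict, List, Sequence

# Hypothetical tree encoding (not the paper's data structure):
#   leaf:       {"value": class_label}
#   inner node: {"attr": attribute_index, "branches": {attribute_value: child}}
Node = Dict[str, Any]


def match_tree(instance: Sequence[float], node: Node) -> str:
    """Pseudocode 4 (MatchTree): descend along the branch equal to the instance value."""
    if not node.get("branches"):  # leaf: no branches left
        return node["value"]
    a_i = instance[node["attr"]]  # attribute tested at this node
    child = node["branches"].get(a_i)
    if child is None:
        return "unknown"  # no branch value matches a_i
    return match_tree(instance, child)


def closest_centroid(instance: Sequence[float],
                     centroids: Sequence[Sequence[float]]) -> int:
    """Pseudocode 3, step (1.1): index of the nearest centroid (Euclidean)."""
    return min(range(len(centroids)),
               key=lambda j: math.dist(instance, centroids[j]))


def predict(testing: Sequence[Sequence[float]],
            centroids: Sequence[Sequence[float]],
            trees: Sequence[Node]) -> List[str]:
    """Pseudocode 3: route each instance to the tree of its closest cluster."""
    return [match_tree(w, trees[closest_centroid(w, centroids)]) for w in testing]


centroids = [[0, 0], [10, 10]]
trees = [
    {"attr": 0, "branches": {0: {"value": "normal"}, 1: {"value": "normal"}}},
    {"attr": 1, "branches": {10: {"value": "DoS"}}},
]
testing = [[0, 0], [1, 0], [10, 10], [10, 11]]
print(predict(testing, centroids, trees))
# ['normal', 'normal', 'DoS', 'unknown']
```

Following Section 3.1.2, the analysis agent would then treat an "unknown" result as an attack.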

4. Experimental Setup and Analysis

We used the benchmark KDD Cup 1999 dataset [60] to evaluate the MAS-IDS performance. In most previous works in this field, the authors used cross-validation, such as 10-fold cross-validation, for evaluation. Cross-validation relies on the same classes in training and testing, with no new classes added in the testing stage; thus, these works could achieve high accuracy and detection rates. However, the strength of an IDS stems from its ability to detect unknown (new) attacks. The KDD Cup 1999 dataset consists of two datasets: the 10% KDD Cup dataset (used for training) and the Corrected dataset (used for testing). More details about KDD Cup 1999 can be found in the extant literature [61]. Among the available performance measures, accuracy (Acc), detection rate (DR), and false alarm rate (FAR) are the most popular, and they are used here to evaluate the MAS-IDS performance:

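$$\text{Acc} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{DR} = \frac{TP}{TP + FN}, \qquad \text{FAR} = \frac{FP}{TN + FP}, \tag{5}$$

where TP and TN are the numbers of correctly classified attack and normal instances, respectively, FP is the number of normal instances misclassified as attacks, and FN is the number of attack instances misclassified as normal.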
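The three measures in (5) can be computed directly from the confusion counts, with attacks as the positive class. The counts below are hypothetical and serve only to show the arithmetic.

```python
from typing import Dict


def ids_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, detection rate, and false alarm rate as defined in (5)."""
    return {
        "Acc": (tp + tn) / (tp + tn + fp + fn),
        "DR": tp / (tp + fn),   # share of attacks that were detected
        "FAR": fp / (tn + fp),  # share of normal traffic flagged as attack
    }


# Hypothetical confusion counts, not results from this paper.
print(ids_metrics(tp=80, tn=90, fp=10, fn=20))
# {'Acc': 0.85, 'DR': 0.8, 'FAR': 0.1}
```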

Each computer used in the experiments is equipped with a Core i7 3.40 GHz processor with 8 CPU cores and 6 GB RAM, running Windows 7 Professional 64-bit. The experiments were conducted on the JADE platform and implemented in Java.

Table 1 shows the details of the datasets used to evaluate the MAS-IDS performance, along with the conventional method (hybrid standard K-means with C4.5) and other techniques. The training datasets (trainDS1, trainDS2, trainDS3, and trainDS4) were generated randomly from the 10% KDD Cup dataset, while the testing datasets (testDS1, testDS2, testDS3, and testDS4) were generated randomly from the Corrected dataset.

Preprocessing is applied to the symbolic attributes. The three symbolic attributes (protocol, service, and flag) are converted to numeric values. For example, the three values of the protocol attribute, tcp, udp, and icmp, are converted to 1, 2, and 3, respectively, and the same approach is adopted for the remaining symbolic attributes.
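A minimal sketch of this encoding, assuming codes are assigned in order of first appearance in the training data (so tcp, udp, and icmp become 1, 2, and 3 only if they first appear in that order) and that a symbolic value never seen in training is encoded as 0; both choices are assumptions, not details given in the paper. The rows are truncated toy records rather than full 41-feature KDD Cup 1999 records, but column positions 1 to 3 match protocol_type, service, and flag in that dataset.

```python
from typing import Dict, List, Sequence

# Column indices of protocol_type, service, and flag in a KDD Cup 1999 record.
SYMBOLIC_COLUMNS = (1, 2, 3)


def build_encoders(rows: Sequence[Sequence[str]],
                   columns: Sequence[int] = SYMBOLIC_COLUMNS) -> Dict[int, Dict[str, int]]:
    """Give each distinct symbolic value a code 1, 2, 3, ... in order of first appearance."""
    encoders: Dict[int, Dict[str, int]] = {c: {} for c in columns}
    for row in rows:
        for c in columns:
            codes = encoders[c]
            codes.setdefault(row[c], len(codes) + 1)  # keeps existing codes unchanged
    return encoders


def encode(row: Sequence[str], encoders: Dict[int, Dict[str, int]]) -> List[float]:
    """Convert one record to numbers; unseen symbolic values map to 0 (an assumption)."""
    return [float(encoders[c].get(v, 0)) if c in encoders else float(v)
            for c, v in enumerate(row)]


train = [
    ["0", "tcp", "http", "SF", "181"],
    ["0", "udp", "private", "SF", "105"],
    ["0", "icmp", "ecr_i", "SF", "1032"],
]
enc = build_encoders(train)
print(enc[1])                 # {'tcp': 1, 'udp': 2, 'icmp': 3}
print(encode(train[2], enc))  # [0.0, 3.0, 3.0, 1.0, 1032.0]
```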

In this study, three experiments were carried out. In the first experiment, the best value of the threshold was computed. In the second experiment, the MAS-IDS performance was evaluated by comparing the results yielded by MAS-IDS with those obtained through the conventional method and other techniques available in Weka and Matlab. In the third experiment, we compared the processing time required by MAS-IDS with that of hybrid modified K-means with C4.5 in a nonagent environment.

4.1. Identifying the Best Threshold for Modified K-Means. The modified K-means requires a predetermined threshold value to select the initial centroids of clusters. In this experiment, all training datasets in Table 1 are used with testDS1 to compute the average accuracy for different threshold values (1000–10000). The value that yields the highest accuracy is chosen as the threshold for modified K-means. As can be seen in Figure 2, this value is 4000, which results in an average accuracy of 0.90155. We used all the training datasets with only one testing dataset to choose the threshold value because the modified K-means approach is applied only to the training dataset to construct the clusters. In all subsequent experiments, the chosen threshold (4000) is employed with hybrid modified K-means and C4.5.
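The threshold search is a plain grid search. In the sketch below, `avg_accuracy` is a hypothetical callback that would train the hybrid model on each training dataset with threshold t, test it on testDS1, and return the mean accuracy; `toy_accuracy` is a made-up curve used only to exercise the loop.

```python
from typing import Callable, Iterable, Optional, Tuple


def best_threshold(candidates: Iterable[float],
                   avg_accuracy: Callable[[float], float]) -> Tuple[Optional[float], float]:
    """Return the candidate with the highest average accuracy (first one wins ties)."""
    best_t: Optional[float] = None
    best_acc = float("-inf")
    for t in candidates:
        acc = avg_accuracy(t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


def toy_accuracy(t: float) -> float:
    # Made-up curve for illustration only; NOT the measurements behind Figure 2.
    return 0.9 - abs(t - 5000) / 1e6


print(best_threshold(range(1000, 10001, 1000), toy_accuracy))  # (5000, 0.9)
```

The same loop could pick k for the standard K-means baseline in Section 4.2 by passing candidate k values instead of thresholds.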

4.2. MAS-IDS Performance. In order to compare MAS-IDS with the hybrid standard K-means and C4.5 [30], the best value of k for K-means is first identified. Typically, K-means is run independently for different values of k, and the partition that appears the most meaningful to the domain expert is selected [62]. Figure 3 shows the performance of the hybrid standard K-means with C4.5 for different values of k (k = 10, 20, 30, ..., 100). The best value is k = 10 because it yields the highest accuracy (90.67%) and detection rate (84.80%). Only the false alarm rate (3.46%) is not optimal at k = 10, since a lower value (2.1%) is achieved at k = 100. Thus, in all subsequent experiments, we adopt k = 10 as the best number of clusters.

Table 1: The details of evaluation datasets.

Dataset    Normal  DoS    Probe  R2L   U2R   Total
trainDS1   900     1000   300    500   300   3000
trainDS2   1100    1300   300    800   500   4000
trainDS3   1500    1800   400    1000  300   5000
trainDS4   1800    1800   500    1100  800   6000
testDS1    5000    3000   700    900   400   10000
testDS2    10000   7000   1000   1500  500   20000
testDS3    15000   10000  1500   2500  1000  30000
testDS4    20000   14000  1500   3000  1500  40000

Figure 2: Computing the best threshold value for modified K-means (accuracy versus threshold, 0 to 10000).

Figure 3: The performance of hybrid standard K-means and C4.5 (accuracy, DR, and FAR versus k).

8 The Scientific World Journal

Figure 4: ROC curve for testDS1 (true positive rate versus false positive rate; hybrid modified K-means and C4.5 versus hybrid K-means and C4.5).

Figure 5: ROC curve for testDS2 (true positive rate versus false positive rate; hybrid modified K-means and C4.5 versus hybrid K-means and C4.5).

Figure 6: ROC curve for testDS3 (true positive rate versus false positive rate; hybrid modified K-means and C4.5 versus hybrid K-means and C4.5).

The ROC curves in Figures 4, 5, 6, and 7 compare the performance of the proposed method with that of the hybrid K-means with C4.5 [30].

According to the ROC curves, the proposed hybrid modified K-means with C4.5 in MAS-IDS achieved better results than the conventional method. The t-test shows that MAS-IDS significantly improved accuracy, with a p value < 0.05 (0.00000028). MAS-IDS was tested by computing the classification results for each training dataset using all the testing datasets presented in Table 1. Table 2 compares the accuracy of MAS-IDS with that of the hybrid K-means with C4.5, and Table 3 shows the average results of the MAS-IDS evaluation along with a comparison with the conventional method and other methods from Weka and Matlab.

Figure 7: ROC curve for testDS4 (true positive rate versus false positive rate; hybrid modified K-means and C4.5 versus hybrid K-means and C4.5).

Table 2: The accuracy results of the MAS-IDS versus hybrid K-means and C4.5 [30].

Training dataset  Testing dataset  MAS-IDS   Hybrid K-means and C4.5
trainDS1          testDS1          0.9031    0.8993
                  testDS2          0.91745   0.916021
                  testDS3          0.9126    0.909633
                  testDS4          0.9147    0.911625
trainDS2          testDS1          0.8874    0.8742
                  testDS2          0.9107    0.9008
                  testDS3          0.904367  0.894967
                  testDS4          0.90545   0.89475
trainDS3          testDS1          0.9058    0.8932
                  testDS2          0.92225   0.9135
                  testDS3          0.91667   0.9049
                  testDS4          0.916125  0.90805
trainDS4          testDS1          0.9099    0.8963
                  testDS2          0.9215    0.90815
                  testDS3          0.9155    0.9016
                  testDS4          0.918025  0.90625
p value: 0.00000028
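As a check on Table 2, the t statistic can be recomputed from its sixteen accuracy pairs. The paper does not say which t-test variant was used. This sketch assumes a paired (dependent-samples) test, because both methods are evaluated on the same train/test combinations, and it uses only the standard library. If SciPy is available, `scipy.stats.ttest_rel` returns the corresponding p value.

```python
import math
from statistics import mean, stdev

# Accuracy values from Table 2, ordered trainDS1..trainDS4 x testDS1..testDS4.
MAS_IDS = [0.9031, 0.91745, 0.9126, 0.9147,
           0.8874, 0.9107, 0.904367, 0.90545,
           0.9058, 0.92225, 0.91667, 0.916125,
           0.9099, 0.9215, 0.9155, 0.918025]
HYBRID = [0.8993, 0.916021, 0.909633, 0.911625,
          0.8742, 0.9008, 0.894967, 0.89475,
          0.8932, 0.9135, 0.9049, 0.90805,
          0.8963, 0.90815, 0.9016, 0.90625]


def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for a paired t-test of a versus b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


t_stat, dof = paired_t(MAS_IDS, HYBRID)
print(f"mean MAS-IDS = {mean(MAS_IDS):.4f}, mean hybrid = {mean(HYBRID):.4f}")
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

The two printed means, 0.9113 and 0.9021, match the average accuracies reported for these methods in Table 3.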

As can be seen from Table 3, the MAS-IDS approach achieves the highest accuracy, detection rate, and F-measure. However, its false alarm rate, precision, and specificity are not superior to those achieved by some of the other methods, especially the Decision Table, which produces the best values for these three measures. In state-of-the-art work, IDS performance is usually reported as accuracy, which accounts for both the correct and the erroneous classifications. Thus, when comparing the various methods, we adopt accuracy as the primary measure. On this basis, the performance of MAS-IDS is superior to that of the other methods, as shown in Table 3. More specifically, the average MAS-IDS accuracy computed over all testing datasets and all training datasets is 0.9113, which is higher than that achieved by any other method. Figure 8 shows the performance of all methods using the data given in Table 3.

Table 3: Comparison of the MAS-IDS performance with other methods using different measures.

Method                          Accuracy  DR      FAR     Precision  Specificity  F-measure
MAS-IDS                         0.9113    0.8526  0.0299  0.9665     0.9701       0.9056
Hybrid K-means and C4.5 (2012)  0.9021    0.8394  0.0353  0.9600     0.9647       0.8954
Bayes Net                       0.9017    0.8177  0.0142  0.9829     0.9858       0.8926
Naïve Bayes                     0.8150    0.7727  0.1427  0.8578     0.8573       0.8076
SMO                             0.8805    0.7785  0.0174  0.9781     0.9826       0.8664
IBk                             0.8886    0.7962  0.0190  0.9766     0.9810       0.8771
J48                             0.8513    0.7298  0.0273  0.9638     0.9727       0.8299
NBTree                          0.9007    0.8096  0.0081  0.9900     0.9919       0.8903
Decision Table                  0.8306    0.6631  0.0020  0.9970     0.9980       0.7956
JRip                            0.8377    0.6983  0.0229  0.9682     0.9771       0.8111
LibSVM                          0.7964    0.8120  0.2191  0.8169     0.7809       0.8068

Figure 8: Comparison performance of MAS-IDS with other methods (accuracy, DR, and FAR for each method in Table 3).
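The six measures in Table 3 can all be derived from the four counts of a binary (attack versus normal) confusion matrix. The sketch below uses the standard definitions with attacks as the positive class. We assume these match the definitions given earlier in the paper. The example counts are illustrative only and do not come from the experiments.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # attacks detected as attacks
    tn: int  # normal records classified as normal
    fp: int  # normal records flagged as attacks (false alarms)
    fn: int  # attacks missed


def ids_metrics(c: Confusion) -> dict:
    precision = c.tp / (c.tp + c.fp)
    detection_rate = c.tp / (c.tp + c.fn)  # recall / true positive rate
    return {
        "accuracy": (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn),
        "DR": detection_rate,
        "FAR": c.fp / (c.fp + c.tn),  # false positive rate
        "precision": precision,
        "specificity": c.tn / (c.tn + c.fp),  # equals 1 - FAR
        "F-measure": 2 * precision * detection_rate / (precision + detection_rate),
    }


print(ids_metrics(Confusion(tp=80, tn=90, fp=10, fn=20)))
```

Because specificity equals 1 - FAR under these definitions, the two columns of Table 3 are consistent row by row (for example, 0.9701 = 1 - 0.0299 for MAS-IDS).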

4.3. MAS-IDS Processing Time. The last experiment demonstrates the strength of MAS-IDS in reducing the data classification processing time by using a multiagent system. In this experiment, five of the previously specified computers were used. In addition, we used the fourth training dataset (trainDS4) from Table 1 with four new large testing datasets to evaluate the ability of MAS-IDS to process large datasets in less time. Table 4 shows the characteristics of the new testing datasets.

To show the ability of MAS-IDS to reduce the processing time, the MAS-IDS approach is compared with the nonagent hybrid modified K-means and C4.5. Here, MAS-IDS is rerun every time a new computer is added. In other words, MAS-IDS initially runs on one computer; when a second computer is added, it runs on both, and so on until all five computers are used. Table 5 shows the processing time of this experiment. The maximum number of agents that can run on each computer is eight, because each computer has eight CPU cores and each core can run only one agent at a time in parallel. The training time of this experiment is 8814 s. It should be noted that the coordinator agent runs on the first computer of the system environment.

Table 4: Characteristics of new testing datasets used to evaluate the MAS-IDS processing time.

Dataset     Normal  DoS     Probe  R2L    U2R    Total
newTestDS1  35000   35000   10000  10000  10000  100000
newTestDS2  70000   70000   20000  25000  15000  200000
newTestDS3  100000  100000  30000  50000  20000  300000
newTestDS4  150000  150000  30000  50000  20000  400000

The results presented in Table 5 depend on the number of computers and the number of agents. The result in the upper left corner of Table 5 pertains to the case of one computer with one agent. This is the worst case and should be compared with the nonagent hybrid modified K-means with C4.5. On the other hand, the results in the lower right corner of Table 5 represent the best case of MAS-IDS (maximum number of computers and agents). Furthermore, the data show that when two computers are used, MAS-IDS achieves no improvement relative to the other approaches, because the cost of transferring data through the network increases the processing time. However, this problem is mitigated as additional computers are introduced. Nonetheless, the MAS-IDS processing time for a large dataset such as newTestDS4 remains inadequate with few computers, because the dataset subsets are still large and take a long time to be transferred to other computers through the network. This problem is eliminated when a larger number of computers is employed, because the dataset is then divided into smaller subsets. Finally, the MAS-IDS processing time decreases with the addition of each new computer as the number of agents also increases. The network specifications, such as bandwidth and speed, play an important role in reducing the MAS-IDS processing time. Figures 9 and 10 show the effect of increasing the number of agents and the number of computers, respectively, on the MAS-IDS processing time. In Figure 9, five computers are used, while in Figure 10 only one agent is used, as shown in Table 5.

Table 5: Comparison of processing time required by MAS-IDS and the nonagent hybrid modified K-means and C4.5.

Number of  Testing      Number of computers (processing time in seconds)
agents     dataset      1       2       3       4       5
1          newTestDS1   15.349  13.57   13.219  7.841   6.972
           newTestDS2   31.607  26.554  26.228  15.309  13.424
           newTestDS3   45.893  41.587  35.910  23.462  20.162
           newTestDS4   64.172  76.688  72.101  44.509  37.723
2          newTestDS1   8.852   8.304   7.102   5.843   4.630
           newTestDS2   17.339  17.883  16.310  13.160  10.110
           newTestDS3   26.184  27.500  22.502  16.931  15.313
           newTestDS4   40.226  64.332  49.853  37.263  36.832
3          newTestDS1   8.121   8.242   6.940   5.34    4.575
           newTestDS2   13.472  12.767  11.743  9.957   8.374
           newTestDS3   20.735  22.689  20.186  15.788  14.421
           newTestDS4   32.604  52.819  48.814  36.589  35.536
4          newTestDS1   6.699   6.818   5.891   4.631   3.854
           newTestDS2   11.776  14.209  11.104  9.375   8.116
           newTestDS3   18.613  20.927  17.613  14.490  13.413
           newTestDS4   29.399  51.852  46.811  35.564  34.90
5          newTestDS1   6.146   6.659   5.579   4.555   3.715
           newTestDS2   11.534  13.685  10.580  9.198   7.924
           newTestDS3   17.568  20.662  17.497  14.44   13.62
           newTestDS4   29.349  51.527  43.956  34.521  32.349
6          newTestDS1   6.68    6.601   5.393   4.465   3.224
           newTestDS2   11.318  12.922  10.531  8.980   7.567
           newTestDS3   17.419  20.203  16.938  15.788  12.656
           newTestDS4   28.660  50.67   41.475  32.390  31.989
7          newTestDS1   5.871   6.443   5.272   4.258   3.150
           newTestDS2   11.134  12.787  10.494  8.680   7.685
           newTestDS3   17.223  20.683  16.556  15.178  12.362
           newTestDS4   28.438  45.539  39.859  30.447  30.20
8          newTestDS1   5.481   6.204   4.289   4.84    3.89
           newTestDS2   11.011  13.24   9.913   8.369   7.399
           newTestDS3   17.150  19.713  16.483  14.81   11.181
           newTestDS4   27.303  40.39   38.156  30.196  29.851
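The size of the speed-up can be read directly from Table 5. The sketch below computes the relative reduction between the upper-left entry (one computer, one agent) and the lower-right entry (five computers, eight agents). The text describes the upper-left entry as the worst case, to be compared with the nonagent method. It reads the Table 5 entries as seconds with the decimal point restored (for example, 15349 as 15.349 s). Using the one-computer, one-agent time as the baseline is our simplification, because the nonagent times themselves appear only in Figure 11.

```python
# Processing times in seconds, read from Table 5.
WORST_CASE = {"newTestDS1": 15.349, "newTestDS2": 31.607,
              "newTestDS3": 45.893, "newTestDS4": 64.172}  # 1 computer, 1 agent
BEST_CASE = {"newTestDS1": 3.89, "newTestDS2": 7.399,
             "newTestDS3": 11.181, "newTestDS4": 29.851}   # 5 computers, 8 agents


def reduction(baseline: float, improved: float) -> float:
    """Fractional reduction in processing time relative to the baseline."""
    return (baseline - improved) / baseline


for name in WORST_CASE:
    r = reduction(WORST_CASE[name], BEST_CASE[name])
    print(f"{name}: {r:.1%} reduction")
```

On this baseline the reduction ranges from roughly 53% (newTestDS4) to roughly 77% (newTestDS2). The largest dataset benefits least, which is consistent with the data-transfer cost discussed above.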

The best case of MAS-IDS processing time is compared with that of the nonagent hybrid modified K-means and C4.5 in Figure 11.

Finally, since the proposed system uses each CPU core to run one analysis agent, the cost of system resources is positively correlated with the number of agents. At the same time, as the number of analysis agents increases, each data subset becomes very small, and the analysis process needs only one or two seconds to complete. Consequently, the proposed system balances the physical components (the number of CPU cores) against the number of agents that can be created, as defined in (2). Figure 12 compares the average cost of system resources (CPU consumption) in two cases on the same datasets: MAS-IDS using 5 computers with 8 analysis agents on each computer (40 agents in total), and MAS-IDS using one analysis agent on each computer (5 agents in total).

From Figure 12, the highest peak of CPU utilization lasts 6 s when one agent is used, compared with only 1 s when 8 agents are used. As a consequence, when the number of agents is small, the processing time is long and the system cost is low, whereas as the number of agents increases, the processing time becomes short and the system cost becomes high. The cost of system resources with respect to memory does not exceed 10% in any of the experiments.

Figure 9: Time required to process the testing datasets in relation to the number of agents (processing time in seconds for newTestDS1 to newTestDS4).

Figure 10: Time required to process the testing datasets in relation to the number of computers (processing time in seconds for newTestDS1 to newTestDS4).

This experiment demonstrates that MAS-IDS has great potential to reduce the IDS processing time relative to methods that do not employ agents. The reduction in processing time achieved by MAS-IDS reaches up to 70% relative to the other approaches. In this experiment, we used only five computers; with a greater number of computers, a larger reduction in processing time could presumably be achieved.

5. Conclusion

In this work, we have proposed a hybrid modified K-means with C4.5 for IDS in an MAS environment. The hybrid modified K-means with C4.5 is used to improve the classification accuracy, while MAS is used to reduce the processing time of IDS.

Figure 11: Comparison of MAS-IDS processing time with that of the nonagent hybrid modified K-means and C4.5 (processing time in seconds for newTestDS1 to newTestDS4; best case of MAS-IDS versus nonagent method).

Figure 12: The cost of system resources (CPUs): CPU utilization (%) versus processing time (s) for 8 agents and 1 agent.

The modification of K-means is based on choosing initial cluster centroids that represent all cases of the dataset, which allows the number of clusters k to be determined. Three types of agents (coordinator, analysis, and communication agents) are used. The KDD Cup 1999 dataset is employed, and the JADE platform with five computers is used to implement the proposed method.

MAS-IDS demonstrated that a multiagent system has significant potential for reducing the IDS processing time: a reduction of up to 70% was achieved. In addition, the hybrid modified K-means with C4.5 approach performed better than the hybrid K-means and C4.5 as well as other techniques available in Weka and Matlab. The t-test of accuracy comparing MAS-IDS with the conventional K-means and C4.5 method confirmed that the former was superior (p value of 0.00000028). This indicates that MAS-IDS has high potential to improve the performance of intrusion detection systems.

In future work, we will attempt to improve the IDS accuracy further by combining the proposed method with other techniques. We will also try to apply our method to other datasets and to a real data network to make the system more suitable for real environments. We will use the new attacks detected by the system as unknown attacks to retrain the proposed method as feedback. In addition, we expect to reduce the IDS processing time further by using a greater number of computers.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work is supported by National University of Malaysia (UKM) Grant no. AP2013-007.

References

[1] H.-J. Liao, C.-H. Richard Lin, Y.-C. Lin, and K.-Y. Tung, "Intrusion detection system: a comprehensive review," Journal of Network and Computer Applications, vol. 36, no. 1, pp. 16–24, 2013.

[2] N. Sengupta, J. Sen, J. Sil, and M. Saha, "Designing of on line intrusion detection system using rough set theory and Q-learning algorithm," Neurocomputing, vol. 111, pp. 161–168, 2013.

[3] L. Koc, T. A. Mazzuchi, and S. Sarkani, "A network intrusion detection system based on a Hidden Naïve Bayes multiclass classifier," Expert Systems with Applications, vol. 39, no. 18, pp. 13492–13500, 2012.

[4] M. Uddin, A. A. Rehman, N. Uddin, J. Memon, R. Alsaqour, and S. Kazi, "Signature-based multi-layer distributed intrusion detection system using mobile agents," International Journal of Network Security, vol. 15, no. 2, pp. 97–105, 2013.

[5] C. N. Modi, D. R. Patel, A. Patel, and M. Rajarajan, "Integrating signature apriori based network intrusion detection system (NIDS) in cloud computing," Procedia Technology, vol. 6, pp. 905–912, 2012.

[6] H. Mohamed, L. Adil, T. Saida et al., "A collaborative intrusion detection and prevention system in cloud computing," in Proceedings of the IEEE (AFRICON '13), pp. 1–5, IEEE, September 2013.

[7] S.-J. Horng, M.-Y. Su, Y.-H. Chen et al., "A novel intrusion detection system based on hierarchical clustering and support vector machines," Expert Systems with Applications, vol. 38, no. 1, pp. 306–313, 2011.

[8] M. Chowdhary, S. Suri, and M. Bhutani, "Comparative study of intrusion detection system," International Journal of Computer Sciences and Engineering, vol. 2, no. 4, pp. 197–200, 2014.

[9] I. Corona, G. Giacinto, and F. Roli, "Adversarial attacks against intrusion detection systems: taxonomy, solutions and open issues," Information Sciences, vol. 239, pp. 201–225, 2013.

[10] S. Shamshirband, N. B. Anuar, M. L. M. Kiah, and A. Patel, "An appraisal and design of a multi-agent system based cooperative wireless intrusion detection computational intelligence technique," Engineering Applications of Artificial Intelligence, vol. 26, no. 9, pp. 2105–2127, 2013.

[11] M. Roesch, "Snort - lightweight intrusion detection for networks," in Proceedings of the 13th USENIX Conference on System Administration (LISA '99), pp. 229–238, 1999.

[12] D. Barbara and S. Jajodia, Applications of Data Mining in Computer Security, Springer, 2002.

[13] P. Natesan, P. Balasubramanie, and G. Gowrison, "Improving the attack detection rate in network intrusion detection using adaboost algorithm," Journal of Computer Science, vol. 8, no. 7, pp. 1041–1048, 2012.

[14] A. Bivens, C. Palagiri, R. Smith, B. Szymanski, and M. Embrechts, "Network-based intrusion detection using neural networks," in Proceedings of the Intelligent Engineering Systems through Artificial Neural Networks, vol. 12, pp. 579–584, November 2002.

[15] Y. Li and W. Jie, "The method of network intrusion detection based on the neural network GCBP algorithm," in Proceedings of the International Conference on Computer Science and Information Processing (CSIP '12), pp. 1082–1086, IEEE, August 2012.

[16] J. Lin, T. Huang, and B. Zhao, "A fast fuzzy set intrusion detection model," in International Symposium on Knowledge Acquisition and Modeling (KAM '08), pp. 601–605, December 2008.

[17] A. Abraham, R. Jain, J. Thomas, and S. Y. Han, "D-SCIDS: distributed soft computing intrusion detection system," Journal of Network and Computer Applications, vol. 30, no. 1, pp. 81–98, 2007.

[18] V. V. Kumari, S. Pamidi, and A. Govardhan, "Integrated Bayes network and hidden Markov model for host based IDS," International Journal of Computer Applications, vol. 41, no. 20, pp. 45–49, 2012.

[19] M. A. Hasan, M. Nasser, B. Pal, and S. Ahmad, "Support vector machine and random forest modeling for intrusion detection system (IDS)," Journal of Intelligent Learning Systems and Applications, vol. 6, no. 1, pp. 45–52, 2014.

[20] C. Xiang, P. C. Yong, and L. S. Meng, "Design of multiple-level hybrid classifier for intrusion detection system using Bayesian clustering and decision trees," Pattern Recognition Letters, vol. 29, no. 7, pp. 918–924, 2008.

[21] M. N. Huhns, Distributed Artificial Intelligence, Elsevier, 2012.

[22] S. J. Stolfo, A. L. Prodromidis, S. Tselepis et al., "JAM: Java agents for meta-learning over distributed databases," in Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD '97), pp. 74–81, 1997.

[23] P. Kannadiga and M. Zulkernine, "DIDMA: a distributed intrusion detection system using mobile agents," in Proceedings of the 6th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing and 1st ACIS International Workshop on Self-Assembling Wireless Networks (SNPD/SAWN '05), pp. 238–245, IEEE, May 2005.

[24] L. Portnoy, Intrusion Detection with Unlabeled Data Using Clustering, 2000.

[25] M. Jianliang, S. Haikun, and B. Ling, "The application on intrusion detection based on K-means cluster algorithm," in Proceedings of the International Forum on Information Technology and Applications (IFITA '09), vol. 1, pp. 150–152, IEEE, Chengdu, China, May 2009.

[26] M. Sabhnani and G. Serpen, "Application of machine learning algorithms to KDD intrusion detection dataset within misuse detection context," in Proceedings of the International Conference on Machine Learning: Models, Technologies and Applications (MLMTA '03), pp. 209–215, June 2003.

[27] G. Munz, S. Li, and G. Carle, "Traffic anomaly detection using k-means clustering," in Proceedings of the GI/ITG Workshop MMBnet, 2007.

[28] V. Kumar, H. Chauhan, and D. Panwar, "K-means clustering approach to analyze NSL-KDD intrusion detection dataset," International Journal of Soft Computing and Engineering, vol. 3, no. 4, pp. 1–4, 2013.

[29] S. Chawla and A. Gionis, "k-means--: a unified approach to clustering and outlier detection," in Proceedings of the SIAM International Conference on Data Mining (SDM '13), pp. 189–197, SIAM, 2013.

[30] A. P. Muniyandi, R. Rajeswari, and R. Rajaram, "Network anomaly detection by cascading K-means clustering and C4.5 decision tree algorithm," Procedia Engineering, vol. 30, pp. 174–182, 2012.

[31] L. Xiao, Z. Shao, and G. Liu, "K-means algorithm based on particle swarm optimization algorithm for anomaly intrusion detection," in Proceedings of the 6th World Congress on Intelligent Control and Automation (WCICA '06), pp. 5854–5858, IEEE, June 2006.

[32] Z. Muda, W. Yassin, M. N. Sulaiman, and N. I. Udzir, "Intrusion detection based on K-Means clustering and Naïve Bayes classification," in Proceedings of the 7th International Conference on Information Technology in Asia (CITA '11), pp. 1–6, IEEE, July 2011.

[33] H.-B. Wang, H.-L. Yang, Z.-J. Xu, and Z. Yuan, "A clustering algorithm use SOM and K-means in intrusion detection," in Proceedings of the 1st International Conference on E-Business and E-Government (ICEE '10), pp. 1281–1284, May 2010.

[34] A M Chandrasekhar and K Raghuveer ldquoIntrusion detectiontechnique by using k-means fuzzy neural network and SVMclassifiersrdquo in Proceedings of the 3rd International Conference onComputer Communication and Informatics (ICCCIrsquo 13) pp 1ndash3January 2013

[35] R. Goel, A. Sardana, and R. C. Joshi, "Parallel misuse and anomaly detection model," International Journal of Network Security, vol. 14, no. 4, pp. 211–222, 2012.

[36] O. Depren, M. Topallar, E. Anarim, and M. K. Ciliz, "An intelligent intrusion detection system (IDS) for anomaly and misuse detection in computer networks," Expert Systems with Applications, vol. 29, no. 4, pp. 713–722, 2005.

[37] A. S. A. Aziz, A. E. Hassanien, S. E.-O. Hanaf, and M. Tolba, "Multi-layer hybrid machine learning techniques for anomalies detection and classification approach," in Proceedings of the 13th International Conference on Hybrid Intelligent Systems (HIS '13), pp. 215–220, IEEE, Gammarth, Tunisia, December 2013.

[38] M. Ektefa, S. Memar, F. Sidi, and L. S. Affendey, "Intrusion detection using data mining techniques," in Proceedings of the International Conference on Information Retrieval and Knowledge Management: Exploring the Invisible World (CAMP '10), pp. 200–203, IEEE, March 2010.

[39] G. MeeraGandhi, K. Appavoo, and S. Srivasta, "Effective network intrusion detection using classifiers decision trees and decision rules," International Journal of Advanced Networking and Applications, vol. 2, no. 3, pp. 686–692, 2010.

[40] H. Chauhan, V. Kumar, S. Pundir, and E. S. Pilli, "A comparative study of classification techniques for intrusion detection," in Proceedings of the International Symposium on Computational and Business Intelligence (ISCBI '13), pp. 40–43, IEEE, August 2013.

[41] C. Katar, "Combining multiple techniques for intrusion detection," International Journal of Computer Science and Network Security, vol. 6, no. 2B, pp. 208–218, 2006.

[42] S. R. Gaddam, V. V. Phoha, and K. S. Balagani, "K-means+id3: a novel method for supervised anomaly detection by cascading k-means clustering and id3 decision tree learning methods," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 3, pp. 345–354, 2007.

[43] D. Dasgupta, F. Gonzalez, K. Yallapu, J. Gomez, and R. Yarramsettii, "CIDS: an agent-based intrusion detection system," Computers & Security, vol. 24, no. 5, pp. 387–398, 2005.

[44] D. L. Hancock and G. B. Lamont, "Multi agent system for network attack classification using flow-based intrusion detection," in IEEE Congress of Evolutionary Computation (CEC '11), pp. 1535–1542, June 2011.

[45] X. Zhu, Z. Huang, and H. Zhou, "Design of a multi-agent based intelligent intrusion detection system," in Proceedings of the 1st International Symposium on Pervasive Computing and Applications (SPCA '06), pp. 290–295, August 2006.

[46] M. El Ajjouri, S. Benhadou, and H. Medromi, "Intelligent architecture based on MAS and CBR for intrusion detection," in Proceedings of the 4th Edition of National Security Days (JNS4), pp. 1–4, IEEE, May 2014.

[47] J. Yang, X. Liu, T. Li, G. Liang, and S. Liu, "Distributed agents model for intrusion detection based on AIS," Knowledge-Based Systems, vol. 22, no. 2, pp. 115–119, 2009.

[48] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297, Berkeley, Calif, USA, 1967.

[49] J. M. Pena, J. A. Lozano, and P. Larranaga, "An empirical comparison of four initialization methods for the K-Means algorithm," Pattern Recognition Letters, vol. 20, no. 10, pp. 1027–1040, 1999.

[50] G. H. Ball and D. J. Hall, "A clustering technique for summarizing multivariate data," Behavioral Science, vol. 12, no. 2, pp. 153–155, 1967.

[51] I. Katsavounidis, C.-C. J. Kuo, and Z. Zhang, "New initialization technique for generalized Lloyd iteration," IEEE Signal Processing Letters, vol. 1, no. 10, pp. 144–146, 1994.

[52] M. D. B. Al-Daoud, "A new algorithm for cluster initialization," in Proceedings of the WEC'05: The 2nd World Enformatika Conference, 2007.

[53] D. Arthur and S. Vassilvitskii, "k-means++: the advantages of careful seeding," in Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035, Society for Industrial and Applied Mathematics, New Orleans, La, USA, January 2007.

[54] M. Erisoglu, N. Calis, and S. Sakallioglu, "A new algorithm for initial cluster centers in k-means algorithm," Pattern Recognition Letters, vol. 32, no. 14, pp. 1701–1705, 2011.

[55] L. Yongzhong, Y. Ge, X. Jing et al., "Anomaly detection for clustering algorithm based on particle swarm optimization," Journal of Jiangsu University of Science and Technology (Natural Science Edition), vol. 23, no. 1, pp. 51–55, 2009.

[56] W. Cong, J. Morris, and W. Xiaojun, "High performance deep packet inspection on multi-core platform," in Proceedings of the 2nd IEEE International Conference on Broadband Network and Multimedia Technology (IC-BNMT '09), pp. 619–622, October 2009.

[57] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.

[58] S. Ruggieri, "Efficient C4.5 [classification algorithm]," IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 2, pp. 438–444, 2002.

[59] X. Wu and V. Kumar, The Top Ten Algorithms in Data Mining, CRC Press, New York, NY, USA, 2010.


[60] KDD Cup 1999, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.

[61] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in Proceedings of the 2nd IEEE Symposium on Computational Intelligence for Security and Defence Applications, pp. 1–6, IEEE, July 2009.

[62] A. K. Jain, "Data clustering: 50 years beyond K-means," Pattern Recognition Letters, vol. 31, no. 8, pp. 651–666, 2010.


Input: Dataset
Output: Clusters, Centroids
(1) Set $k = 1$, $c_1 = \omega_1$ (the first instance).
(2) For every instance $\omega_i \in \text{Dataset}$, $i = 1, 2, \dots$, do
    (2.1) If $\|\omega_i - c_s\| > \text{threshold}$ for all $s = 1, \dots, k$, then
    (2.2) $k = k + 1$, $c_k = \omega_i$.
(3) Assign every instance $\omega_i \in \text{Dataset}$ to the closest centroid to make $k$ clusters $C_1, C_2, \dots, C_k$.
(4) Calculate the cluster centroids $\bar{\omega}_i = \frac{1}{k_i} \sum_{j=1}^{k_i} \omega_{ij}$, $i = 1, \dots, k$, where $k_i$ is the number of instances in $C_i$.
(5) For every instance $\omega_i \in \text{Dataset}$ do
    (5.1) Reassign $\omega_i$ to the closest cluster centroid: $\omega_i \in C_s$ is moved from $C_s$ to $C_t$ if $\|\omega_i - \bar{\omega}_t\| \le \|\omega_i - \bar{\omega}_j\|$ for all $j = 1, \dots, k$, $j \ne s$.
    (5.2) Recalculate the centroids of clusters $C_s$ and $C_t$.
(6) If the cluster instances are stabilized, then stop; else go to Step (4).

Pseudocode 2: Pseudocode of the modified K-means algorithm.

(11) Combine the results yielded by the analysis agents using (4).
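Pseudocode 2 can be rendered compactly in Python as the following sketch, assuming numeric (already encoded) instances and Euclidean distance. Steps (1) and (2) open a new cluster whenever an instance is farther than `threshold` from every existing centroid, so the threshold determines k. Steps (3) to (6) are ordinary K-means refinement. For simplicity, this version recomputes all centroids once per pass instead of updating the two affected centroids immediately after each move as in step (5.2). It is therefore a batch variant, not a line-by-line transcription.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _closest(x: Vector, centroids: List[List[float]]) -> int:
    """Index of the centroid nearest to x (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda s: math.dist(x, centroids[s]))


def modified_kmeans(data: Sequence[Vector], threshold: float,
                    max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    # Steps (1)-(2): the first instance seeds cluster 1; any instance farther
    # than `threshold` from all current centroids seeds a new cluster.
    centroids = [list(data[0])]
    for x in data[1:]:
        if all(math.dist(x, c) > threshold for c in centroids):
            centroids.append(list(x))

    # Step (3): assign every instance to its closest centroid.
    labels = [_closest(x, centroids) for x in data]

    for _ in range(max_iter):
        # Step (4): recompute each centroid as the mean of its members.
        for s in range(len(centroids)):
            members = [x for x, lab in zip(data, labels) if lab == s]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[s] = [sum(col) / len(members) for col in zip(*members)]

        # Step (5): reassign instances to the closest updated centroid.
        changed = False
        for i, x in enumerate(data):
            t = _closest(x, centroids)
            if t != labels[i]:
                labels[i], changed = t, True

        # Step (6): stop once no instance changes cluster.
        if not changed:
            break
    return centroids, labels
```

With the threshold of 4000 chosen in Section 4.1, a call such as `modified_kmeans(training_vectors, threshold=4000)` (where `training_vectors` is a hypothetical list of encoded training records) returns the centroids and the cluster label of each record; `len(centroids)` is the resulting k.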

$$\text{Data set} = \{S_1, S_2, \dots, S_j, \dots, S_n\} \tag{2}$$

subject to

$$S_j = \{\text{Instances} \in \text{Data set} : \text{each instance} \notin S_i,\ \forall i = 1, \dots, n,\ i \ne j\}, \qquad |S_j| = \frac{|\text{Data set}|}{n}, \quad j = 1, \dots, n,$$

where $n \le$ the number of core CPUs available in the system.
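Equation (2) describes a partition into n disjoint subsets of equal size. The following is a minimal sketch, assuming the dataset fits in memory as a Python sequence and that contiguous slices are acceptable; the paper does not say how instances are assigned to subsets. The `cores` argument should be the total number of cores in the system. Its default, `os.cpu_count()`, reports only the local machine.

```python
import os
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def split_dataset(instances: Sequence[T], n: int,
                  cores: Optional[int] = None) -> List[Sequence[T]]:
    """Split `instances` into n disjoint, contiguous subsets S_1..S_n as in (2).

    (2) requires n <= available cores. When len(instances) is not divisible
    by n, the first subsets get one extra instance each, so every |S_j|
    differs from |Data set| / n by less than one.
    """
    available = cores if cores is not None else (os.cpu_count() or 1)
    if not 1 <= n <= available:
        raise ValueError(f"n must be between 1 and {available}, got {n}")
    size, remainder = divmod(len(instances), n)
    subsets, start = [], 0
    for j in range(n):
        end = start + size + (1 if j < remainder else 0)
        subsets.append(instances[start:end])
        start = end
    return subsets
```

Contiguous slicing keeps the sketch simple. A random shuffle before splitting would be a reasonable alternative if the captured data were ordered by attack type.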

3.1.2. Analysis Agent. A set of reactive analysis agents is created in the other hosts within the system environment by using (3), where the number of analysis agents is equal to the number of subsets resulting from the splitting process. Each analysis agent receives one subset of testing data along with the centroids of clusters and the decision trees that were created in the training phase by the coordinator agent. In fact, the coordinator agent sends a message about the number of agents needed to the deliberative agent resident in each host, and the resident agent then creates these analysis agents. This indirection is needed because, in JADE, if the coordinator agent creates the analysis agents directly in the other hosts, the analysis agents are created in these hosts only logically; physically, they are created in the host of the coordinator agent. Consequently, the analysis agents would use the same CPU cores and memory as the coordinator agent's host. Each analysis agent runs as a thread on one of the CPU cores of its host [56]. Hence, if a host has four cores, it can run four agent threads simultaneously in parallel.
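One way to decide how many analysis agents each host should create, while respecting the per-host core limit in (3), is sketched below. The round-robin placement is our assumption, since the paper states only the constraint. In the JADE setting, the resulting per-host counts would be what the coordinator sends to each host's resident agent.

```python
from typing import List, Sequence


def allocate_agents(n_subsets: int, host_cores: Sequence[int]) -> List[int]:
    """Return host_of[j], the index of the host that runs analysis agent AA_j.

    Agents are spread round-robin over the hosts, and no host receives more
    agents than it has CPU cores (the constraint of (3)).
    """
    if n_subsets > sum(host_cores):
        raise ValueError("more subsets than CPU cores in the system")
    load = [0] * len(host_cores)
    host_of: List[int] = []
    host = 0
    for _ in range(n_subsets):
        # Skip hosts whose cores are all busy.
        while load[host] >= host_cores[host]:
            host = (host + 1) % len(host_cores)
        host_of.append(host)
        load[host] += 1
        host = (host + 1) % len(host_cores)
    return host_of
```

For the configuration in Section 4.3 (five hosts with eight cores each), 40 subsets give exactly eight agents per host, the maximum allowed.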

$$\forall S_j:\ \text{create } AA_j \in \text{Host}_i \rightarrow \text{analysis}(S_j, AA_j), \quad j = 1, \ldots, n,\ 1 \le i \le m \tag{3}$$

subject to

$$\text{number of AA in Host}_i \le \text{number of CPU cores in Host}_i,$$

where $AA_j$ represents analysis agent $j$, $\text{analysis}(S_j, AA_j)$ represents the analysis function used to analyze subset $S_j$ by analysis agent $AA_j$, and $m$ represents the number of hosts available in the system environment.

In the analysis agent, each instance is first tested against the closest centroid of clusters, after which the decision tree corresponding to this centroid is used to determine the type of the instance. If the instance attributes do not match any class in the decision tree, the instance is treated as an attack, and the decision trees are updated with the data pertaining to this attack to assist with future detection. The analysis agent operates as follows:

(1) Receive the subset data, centroids of clusters, and trees from the communication agent.

(2) Call the pseudocode (Pseudocodes 3 and 4) to analyze the data.

(3) Return the results to the coordinator agent by using the communication agent.
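One way to satisfy the constraint attached to (3) is sketched below. The round-robin placement policy, the function name `place_agents`, and the host names are assumptions for illustration; the paper does not specify how agents are distributed across hosts beyond the per-host core limit, and the JADE container mechanics described above are not modeled.

```python
from typing import Dict, List


def place_agents(n_subsets: int, host_cores: Dict[str, int]) -> Dict[str, List[int]]:
    """Assign analysis agents AA_1..AA_n (one per subset) to hosts.

    Hosts are visited round-robin, and a host is skipped once it holds as
    many agents as it has CPU cores, mirroring the constraint on (3).
    """
    capacity = sum(host_cores.values())
    if n_subsets > capacity:
        raise ValueError(f"{n_subsets} agents exceed the {capacity} available cores")
    hosts = list(host_cores)
    placement: Dict[str, List[int]] = {h: [] for h in hosts}
    i = 0
    for j in range(1, n_subsets + 1):
        # advance past full hosts; terminates because n_subsets <= capacity
        while len(placement[hosts[i % len(hosts)]]) >= host_cores[hosts[i % len(hosts)]]:
            i += 1
        placement[hosts[i % len(hosts)]].append(j)
        i += 1
    return placement
```

For example, `place_agents(5, {"h1": 2, "h2": 8})` yields `{"h1": [1, 3], "h2": [2, 4, 5]}`: once `h1` reaches its two cores, the remaining agents go to `h2`.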

Finally, the coordinator agent combines all the results produced by the analysis agents by using (4) and provides the final results to the system administrator. At this point, an alert is raised so that the system administrator can deal with the situation.

$$\text{Normal instances} = \sum_{i=1}^{n} \text{Normal}(AA_i), \qquad \text{Attack instances} = \sum_{i=1}^{n} \text{Attack}(AA_i) \tag{4}$$
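The combination in (4) amounts to summing per-agent counts. The sketch below stands in for the analysis agents with a thread pool, one worker per subset; `classify` is a hypothetical callback representing Pseudocodes 3 and 4, and any label other than `"normal"` (including `"unknown"`) is counted as an attack, following the rule stated above for unmatched instances.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

Instance = dict


def analyze_subset(subset: Sequence[Instance],
                   classify: Callable[[Instance], str]) -> Dict[str, int]:
    """One analysis agent: count normal and attack predictions in its subset."""
    counts = {"normal": 0, "attack": 0}
    for instance in subset:
        # anything that is not "normal" (including "unknown") counts as an attack
        key = "normal" if classify(instance) == "normal" else "attack"
        counts[key] += 1
    return counts


def run_and_combine(subsets: List[Sequence[Instance]],
                    classify: Callable[[Instance], str]) -> Dict[str, int]:
    """Run one worker per subset, then sum the per-agent counts as in (4)."""
    if not subsets:
        return {"normal": 0, "attack": 0}
    with ThreadPoolExecutor(max_workers=len(subsets)) as pool:
        partial = list(pool.map(lambda s: analyze_subset(s, classify), subsets))
    return {
        "normal": sum(p["normal"] for p in partial),
        "attack": sum(p["attack"] for p in partial),
    }
```

In CPython, the global interpreter lock prevents threads from running pure-Python code on several cores at once; a `ProcessPoolExecutor` with a module-level (picklable) `classify` would be needed for true multicore execution, whereas JADE agents run as Java threads.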

3.1.3. Communication Agent. The communication agent is responsible for transferring data and results between agents. It operates through the following steps:

(1) Receive datasets, centroids of clusters, and decision trees from the coordinator agent.

(2) Move the above from the coordinator agent host to the analysis agent host.

6 The Scientific World Journal

Input: testing dataset, centroids, trees
Output: prediction $P$ for the instances of the testing dataset
(1) For every instance $\omega_i \in$ testing dataset Do
(1.1) Choose centroid $c_j$ such that $\lVert \omega_i - c_j \rVert < \lVert \omega_i - c_s \rVert$ for all $s = 1, \ldots, k$, $s \neq j$
(1.2) $P$ = add(Call pseudocode MatchTree($\omega_i$, $root_j$))

Pseudocode 3: Pseudocode of the determination of the closest centroid for an instance.

Input: instance $\omega$, root
Output: Class
(1) If root.branch = 0 Then return root.value
(2) Choose attribute $a_i \in \omega$ corresponding to the index of root
(3) For all branchValues $\in$ root Do
(3.1) If branchValue$_j = a_i$ exists Then return MatchTree($\omega$, root.branch$_j$)
(3.2) return "unknown"

Pseudocode 4: Pseudocode of MatchTree.

(3) Give the dataset, centroids of clusters, and decision trees to the analysis agent.

(4) Receive the results from the analysis agent

(5) Move the results from the analysis agent host to the coordinator agent host.

(6) Provide the results to the coordinator agent

3.2. Modified K-Means Algorithm. The main advantage of modified K-means, which distinguishes it from other adjusted K-means variants in the extant literature, is its ability to consider all possible eventualities by treating all the divergent points in the dataset as initial centroids of clusters, rather than randomly selecting a specific set of initial centroids as is typically done. In other words, modified K-means constructs clusters that cover all the cases characterized by significant differences among instances. Thus, modified K-means aims to distribute the dataset instances to appropriate clusters with the best accuracy. Moreover, unlike other adjusted K-means approaches, modified K-means does not require the number of clusters $k$ to be specified, as it is determined dynamically. The main difference between the modified and the standard K-means lies in the selection of the initial centroids of clusters, as shown in the following steps:

(1) Select the first instance of the dataset as the first cluster centroid.

(2) Select as the next centroid an instance whose distance from all the previously selected centroids is greater than the specified threshold (the best threshold, 4000, is determined in the first experiment).

(3) Repeat Step (2) until the end of the dataset is reached.

(4) Apply the remaining steps of standard K-means to the selected initial centroids of clusters.

Pseudocode 2 shows the pseudocode of modified K-means. Our modification of K-means is evident in Steps (1) and (2) of the pseudocode.
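Steps (1)–(4) can be sketched as follows, assuming Euclidean distance on numeric feature vectors. This is an illustrative Python sketch, not the authors' implementation: for brevity, Step (4) uses batch reassignment (all instances reassigned, then all centroids recomputed) rather than the per-instance reassignment of Pseudocode 2, Step (5), and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def nearest(x: Point, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to x (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(x, centroids[j]))


def select_initial_centroids(data: Sequence[Point], threshold: float) -> List[List[float]]:
    """Steps (1)-(3): the first instance is the first centroid; each later
    instance farther than `threshold` from every chosen centroid is added."""
    centroids = [list(data[0])]
    for x in data[1:]:
        if all(math.dist(x, c) > threshold for c in centroids):
            centroids.append(list(x))
    return centroids


def modified_kmeans(data: Sequence[Point], threshold: float,
                    max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Step (4): standard K-means iterations from the selected centroids.

    k = len(centroids) is determined by the data and the threshold.
    """
    centroids = select_initial_centroids(data, threshold)
    labels = [-1] * len(data)
    for _ in range(max_iter):
        new_labels = [nearest(x, centroids) for x in data]
        if new_labels == labels:  # cluster memberships are stable: stop
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

The number of clusters returned depends only on the data order and the threshold, which is why $k$ need not be specified in advance.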

3.3. C4.5 Algorithm. After the training dataset instances are distributed among the clusters by modified K-means, the standard C4.5 technique developed by Quinlan [57] is used to build one decision tree for each cluster. More details and the pseudocode of C4.5 can be found elsewhere [58, 59].
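C4.5 chooses split attributes by gain ratio, that is, information gain normalized by split information. The minimal sketch below covers only that criterion for a discrete attribute; continuous-attribute thresholds, missing values, and pruning, all part of full C4.5, are omitted, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a class-label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """C4.5 gain ratio of splitting `labels` on a discrete attribute."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # information gain = H(labels) - weighted entropy of the partitions
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    # split information = entropy of the partition sizes themselves
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0
```

Dividing by the split information penalizes attributes with many distinct values: an attribute that separates the classes perfectly but uses four values instead of two gets half the score.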

3.4. Testing Phase. This phase is implemented by the analysis agent and is executed in two stages to test the network traffic data. In the first stage, the closest centroid to the testing instance is chosen (the pseudocode of this stage is shown in Pseudocode 3). In the second stage, the decision tree corresponding to the centroid chosen in the first stage is used to test the instance and identify its appropriate class. The pseudocode of the second stage is shown in Pseudocode 4.
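The two testing stages can be sketched in Python as below. The tree representation (nested dictionaries with `index`, `branches`, and `value` keys) is an assumption made for illustration, and `match_tree` walks the tree iteratively rather than recursively; it returns `"unknown"` when no branch matches, as in Pseudocode 4.

```python
import math
from typing import Any, Dict, List, Sequence

# A tree node is either a leaf {"value": class_label} or an internal node
# {"index": attribute_index, "branches": {attribute_value: child_node}}.
Node = Dict[str, Any]


def closest_centroid(x: Sequence[float], centroids: List[Sequence[float]]) -> int:
    """Pseudocode 3, step (1.1): index of the nearest cluster centroid."""
    return min(range(len(centroids)), key=lambda j: math.dist(x, centroids[j]))


def match_tree(x: Sequence[Any], node: Node) -> str:
    """Pseudocode 4: follow the branch whose value equals the instance's
    attribute value; return "unknown" when no branch matches."""
    while "branches" in node:
        child = node["branches"].get(x[node["index"]])
        if child is None:
            return "unknown"
        node = child
    return node["value"]


def predict(test_set: Sequence[Sequence[float]], centroids: List[Sequence[float]],
            trees: List[Node]) -> List[str]:
    """Pseudocode 3: classify each instance with its nearest cluster's tree."""
    return [match_tree(x, trees[closest_centroid(x, centroids)]) for x in test_set]
```

An instance routed to a tree in which no branch matches comes back as `"unknown"`, which the analysis agent then treats as an attack.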

4. Experimental Setup and Analysis

We used the benchmark KDD Cup 1999 [60] to evaluate the MAS-IDS performance. In most previous works in this field, the authors used cross-validation, such as 10-fold, for evaluation. Cross-validation uses the same classes in training and testing, without adding new classes in the testing stage; thus, these works could achieve high performance in terms of accuracy and detection rate. However, the strength of an IDS stems from its ability to detect unknown (new) attacks. The KDD Cup 1999 dataset consists of two datasets: the 10% KDDCUP dataset (used for training) and the Corrected dataset (employed in testing). More details about KDD Cup 1999 can be found in the extant literature [61]. Among the available performance measures, accuracy (Acc), detection rate (DR), and false alarm rate (FAR) are the most popular, and they are used here to evaluate the MAS-IDS performance:

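$$\text{Acc} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{DR} = \frac{TP}{TP + FN}, \qquad \text{FAR} = \frac{FP}{TN + FP} \tag{5}$$

where $TP$, $TN$, $FP$, and $FN$ denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.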
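The measures in (5), together with the precision, specificity, and F-measure reported later in Table 3, can be computed from confusion-matrix counts as below. The definitions of the last three are the standard ones and are an assumption here, since the paper does not state them; they are consistent with Table 3, where specificity equals 1 − FAR.

```python
from typing import Dict


def ids_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Measures from (5) plus the extra columns of Table 3."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    dr = tp / (tp + fn)              # detection rate (recall)
    far = fp / (tn + fp)             # false alarm rate
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)     # equals 1 - FAR
    f_measure = 2 * precision * dr / (precision + dr)
    return {"acc": acc, "dr": dr, "far": far, "precision": precision,
            "specificity": specificity, "f_measure": f_measure}
```

For instance, with 80 true positives, 90 true negatives, 10 false positives, and 20 false negatives, accuracy is 0.85, DR is 0.8, and FAR is 0.1.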

The computers used to implement the experiments are equipped with a Core i7 3.40 GHz processor with 8 CPU cores and 6 GB RAM, running Windows 7 Professional 64-bit. The experiments were conducted on the JADE platform and implemented in Java.

Table 1 shows the details of the datasets used to evaluate the performance of MAS-IDS, the conventional method (hybrid standard K-means with C4.5), and other techniques. It should be noted that the training datasets (trainDS1, trainDS2, trainDS3, and trainDS4) were generated randomly from the 10% KDDCUP dataset, while the testing datasets (testDS1, testDS2, testDS3, and testDS4) were generated randomly from the Corrected dataset.

The three symbolic attributes, protocol, service, and flag, are preprocessed by converting them to numeric values. For example, the three values of the protocol attribute, tcp, udp, and icmp, are converted to 1, 2, and 3, respectively, and the same approach is adopted for the remaining attributes.
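A minimal sketch of this encoding is shown below. Only the protocol mapping (tcp → 1, udp → 2, icmp → 3) comes from the text; assigning codes to unseen service and flag values in order of first appearance is an assumption, and `encode_symbolic` is a hypothetical name.

```python
from typing import Dict, List, Sequence, Tuple

# Mapping stated in the text for the protocol attribute.
PROTOCOL_CODES: Dict[str, int] = {"tcp": 1, "udp": 2, "icmp": 3}


def encode_symbolic(values: Sequence[str],
                    codes: Dict[str, int]) -> Tuple[List[int], Dict[str, int]]:
    """Replace each symbol with an integer code.

    Symbols not yet in `codes` receive the next unused code, in order of
    first appearance (an assumption for service and flag). Returns the
    encoded column and the extended mapping.
    """
    mapping = dict(codes)  # copy so the caller's mapping is not modified
    encoded = []
    for v in values:
        if v not in mapping:
            mapping[v] = max(mapping.values(), default=0) + 1
        encoded.append(mapping[v])
    return encoded, mapping
```

The mapping returned for the training data should be reused when encoding the testing datasets, so that a given symbol always receives the same code.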

In this study, three experiments were carried out. In the first experiment, the best value of the threshold was computed. In the second experiment, the MAS-IDS performance was evaluated by comparing its results with those obtained through the conventional method and other techniques available in Weka and Matlab. In the third experiment, we compared the processing time required by MAS-IDS with that of hybrid modified K-means with C4.5 in a nonagent environment.

4.1. Identifying the Best Threshold for Modified K-Means. Modified K-means requires a predetermined threshold value to select the initial centroids of clusters. In this experiment, all training datasets in Table 1 are used with testDS1 to compute the average accuracy for different threshold values (1000–10000). The value that yields the highest accuracy is chosen as the threshold for modified K-means. As can be seen in Figure 2, this value is 4000, which results in an average accuracy of 0.90155. We used all the training datasets with only one testing dataset to choose the threshold value because the modified K-means approach is applied only to the training dataset to construct the clusters. In all subsequent experiments, the chosen threshold (4000) is employed with hybrid modified K-means and C4.5.
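The threshold search described above is a grid search that maximizes average accuracy. In the sketch below, `evaluate` is a hypothetical callback that would train hybrid modified K-means with C4.5 on one training set using the given threshold and return its accuracy on testDS1; `best_threshold` is likewise an illustrative name.

```python
from statistics import mean
from typing import Any, Callable, Iterable, Sequence


def best_threshold(thresholds: Iterable[float], train_sets: Sequence[Any], test_set: Any,
                   evaluate: Callable[[Any, Any, float], float]) -> float:
    """Return the threshold whose accuracy, averaged over all training
    sets, is highest (the first one wins on ties)."""
    return max(thresholds,
               key=lambda t: mean(evaluate(train, test_set, t) for train in train_sets))
```

A call such as `best_threshold(range(1000, 11000, 1000), train_sets, test_ds1, evaluate)` would cover the 1000–10000 range used in this experiment.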

4.2. MAS-IDS Performance. In order to compare MAS-IDS with the hybrid standard K-means and C4.5 [30], the best value of $k$ for K-means is identified first. Typically, K-means is

Table 1: The details of evaluation datasets.

| Dataset | Normal | DoS | Probe | R2L | U2R | Total |
|---|---|---|---|---|---|---|
| trainDS1 | 900 | 1000 | 300 | 500 | 300 | 3000 |
| trainDS2 | 1100 | 1300 | 300 | 800 | 500 | 4000 |
| trainDS3 | 1500 | 1800 | 400 | 1000 | 300 | 5000 |
| trainDS4 | 1800 | 1800 | 500 | 1100 | 800 | 6000 |
| testDS1 | 5000 | 3000 | 700 | 900 | 400 | 10000 |
| testDS2 | 10000 | 7000 | 1000 | 1500 | 500 | 20000 |
| testDS3 | 15000 | 10000 | 1500 | 2500 | 1000 | 30000 |
| testDS4 | 20000 | 14000 | 1500 | 3000 | 1500 | 40000 |

Figure 2: Computing the best threshold value for modified K-means (accuracy versus threshold).

Figure 3: The performance of hybrid standard K-means and C4.5 (accuracy, DR, and FAR versus $k$).

run independently for different values of $k$, and the partition that appears the most meaningful to the domain expert is selected [62]. Figure 3 shows the performance of hybrid standard K-means with C4.5 for different $k$ values ($k = 10, 20, 30, \ldots, 100$). The best $k$ value is 10, because it yields the highest accuracy (90.67%) and detection rate (84.80%). Only the false alarm rate (3.46%) is not the best, as a rate of 2.1% is achieved when $k = 100$. Thus, in all subsequent experiments, we adopt $k = 10$ as the number of clusters.


Figure 4: ROC curve for testDS1 (true positive rate versus false positive rate; hybrid modified K-means and C4.5 versus hybrid K-means and C4.5).

Figure 5: ROC curve for testDS2 (true positive rate versus false positive rate; hybrid modified K-means and C4.5 versus hybrid K-means and C4.5).

Figure 6: ROC curve for testDS3 (true positive rate versus false positive rate; hybrid modified K-means and C4.5 versus hybrid K-means and C4.5).

The ROC curves in Figures 4, 5, 6, and 7 show the performance of the proposed method in comparison with hybrid K-means with C4.5 [30].

According to the ROC curves, the proposed hybrid modified K-means with C4.5 in MAS-IDS achieved better

Figure 7: ROC curve for testDS4 (true positive rate versus false positive rate; hybrid modified K-means and C4.5 versus hybrid K-means and C4.5).

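Table 2: The accuracy results of the MAS-IDS versus hybrid K-means and C4.5 [30].

| Training dataset | Testing dataset | MAS-IDS | Hybrid K-means and C4.5 |
|---|---|---|---|
| trainDS1 | testDS1 | 0.9031 | 0.8993 |
| trainDS1 | testDS2 | 0.91745 | 0.916021 |
| trainDS1 | testDS3 | 0.9126 | 0.909633 |
| trainDS1 | testDS4 | 0.9147 | 0.911625 |
| trainDS2 | testDS1 | 0.8874 | 0.8742 |
| trainDS2 | testDS2 | 0.9107 | 0.9008 |
| trainDS2 | testDS3 | 0.904367 | 0.894967 |
| trainDS2 | testDS4 | 0.90545 | 0.89475 |
| trainDS3 | testDS1 | 0.9058 | 0.8932 |
| trainDS3 | testDS2 | 0.92225 | 0.9135 |
| trainDS3 | testDS3 | 0.91667 | 0.9049 |
| trainDS3 | testDS4 | 0.916125 | 0.90805 |
| trainDS4 | testDS1 | 0.9099 | 0.8963 |
| trainDS4 | testDS2 | 0.9215 | 0.90815 |
| trainDS4 | testDS3 | 0.9155 | 0.9016 |
| trainDS4 | testDS4 | 0.918025 | 0.90625 |

$p$ value: 0.00000028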

results in comparison with the conventional method. For this comparison, MAS-IDS was tested by computing the classification results for each training dataset using all the testing datasets presented in Table 1, and the $t$-test shows that MAS-IDS significantly improved accuracy, with $p$ value < 0.05 (0.00000028). Table 2 shows the accuracy comparison between MAS-IDS and hybrid K-means with C4.5. Table 3 shows the average results of the MAS-IDS evaluation, along with the comparison with the conventional method and other methods from Weka and Matlab.
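The paper does not state which form of the $t$-test was used. Assuming a paired test over the 16 train/test combinations in Table 2 (both methods are evaluated on the same pairs), the $t$ statistic can be computed with the Python standard library as below; converting it to a $p$ value requires a $t$-distribution CDF (for example, SciPy's), which is not shown.

```python
import math
from statistics import mean, stdev

# Accuracies from Table 2, row by row (trainDS1..trainDS4 x testDS1..testDS4).
mas_ids = [0.9031, 0.91745, 0.9126, 0.9147,
           0.8874, 0.9107, 0.904367, 0.90545,
           0.9058, 0.92225, 0.91667, 0.916125,
           0.9099, 0.9215, 0.9155, 0.918025]
hybrid = [0.8993, 0.916021, 0.909633, 0.911625,
          0.8742, 0.9008, 0.894967, 0.89475,
          0.8932, 0.9135, 0.9049, 0.90805,
          0.8963, 0.90815, 0.9016, 0.90625]

diffs = [a - b for a, b in zip(mas_ids, hybrid)]
n = len(diffs)
# paired t statistic: mean difference over its standard error, df = n - 1
t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
print(f"mean difference = {mean(diffs):.5f}, t = {t_stat:.2f}, df = {n - 1}")
```

For the values in Table 2, this gives a mean accuracy gain of about 0.0093 and a $t$ statistic of about 8.7 with 15 degrees of freedom.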

As can be seen from Table 3, the MAS-IDS approach achieves higher accuracy, detection rate, and F-measure. However, its false alarm rate, precision, and specificity are not superior to those achieved by some of the other methods, especially the Decision Table, which produces


Table 3: Comparison of the MAS-IDS performance with other methods using different measures.

| Method | Accuracy | DR | FAR | Precision | Specificity | F-measure |
|---|---|---|---|---|---|---|
| MAS-IDS | 0.9113 | 0.8526 | 0.0299 | 0.9665 | 0.9701 | 0.9056 |
| Hybrid K-means and C4.5 (2012) | 0.9021 | 0.8394 | 0.0353 | 0.9600 | 0.9647 | 0.8954 |
| Bayes Net | 0.9017 | 0.8177 | 0.0142 | 0.9829 | 0.9858 | 0.8926 |
| Naïve Bayes | 0.8150 | 0.7727 | 0.1427 | 0.8578 | 0.8573 | 0.8076 |
| SMO | 0.8805 | 0.7785 | 0.0174 | 0.9781 | 0.9826 | 0.8664 |
| IBk | 0.8886 | 0.7962 | 0.0190 | 0.9766 | 0.9810 | 0.8771 |
| J48 | 0.8513 | 0.7298 | 0.0273 | 0.9638 | 0.9727 | 0.8299 |
| NBTree | 0.9007 | 0.8096 | 0.0081 | 0.9900 | 0.9919 | 0.8903 |
| Decision Table | 0.8306 | 0.6631 | 0.0020 | 0.9970 | 0.9980 | 0.7956 |
| JRip | 0.8377 | 0.6983 | 0.0229 | 0.9682 | 0.9771 | 0.8111 |
| LibSVM | 0.7964 | 0.8120 | 0.2191 | 0.8169 | 0.7809 | 0.8068 |

Figure 8: Performance comparison of MAS-IDS with other methods (accuracy, DR, and FAR for MAS-IDS, hybrid K-means and C4.5, Bayes Net, Naïve Bayes, SMO, IBk, J48, NBTree, Decision Table, JRip, and LibSVM).

the best ratios for these measures. In state-of-the-art methods, accuracy is the measure usually reported for IDSs, as it reflects correct and erroneous classifications together. Thus, when comparing various methods, we adopt accuracy as the primary measure. On this basis, the performance of our MAS-IDS is superior to that of the other methods, as shown in Table 3. More specifically, the average MAS-IDS accuracy computed over all testing and training datasets is 0.9113, which is greater than those achieved by the other methods. Figure 8 shows the performance of all methods using the data given in Table 3.

4.3. MAS-IDS Processing Time. The last experiment demonstrates the strength of MAS-IDS in reducing data classification processing time by using a multiagent system. In this experiment, five of the previously specified computers were used. In addition, we used the fourth training dataset (trainDS4) from Table 1 with four new large testing datasets to evaluate the ability of MAS-IDS to process large datasets in less time. Table 4 shows the characteristics of the new testing datasets.

To show the ability of MAS-IDS to reduce the processing time, the MAS-IDS approach is compared with

Table 4: Characteristics of new testing datasets used to evaluate the MAS-IDS processing time.

| Dataset | Normal | DoS | Probe | R2L | U2R | Total |
|---|---|---|---|---|---|---|
| newTestDS1 | 35000 | 35000 | 10000 | 10000 | 10000 | 100000 |
| newTestDS2 | 70000 | 70000 | 20000 | 25000 | 15000 | 200000 |
| newTestDS3 | 100000 | 100000 | 30000 | 50000 | 20000 | 300000 |
| newTestDS4 | 150000 | 150000 | 30000 | 50000 | 20000 | 400000 |

the nonagent hybrid modified K-means and C4.5. Here, MAS-IDS is run again each time a new computer is added. In other words, MAS-IDS initially runs on one computer; when a second computer is added, it runs on both, and so on until all five computers are used. Table 5 shows the processing times of this experiment. The maximum number of agents that can run on each computer is eight, because each computer has eight CPU cores and each core can run only one agent at a time. The training time of this experiment is 8814 s. It should be noted that the coordinator agent runs on the first computer of the system environment.

The results presented in Table 5 depend on both the number of computers and the number of agents. The results in the upper left corner of Table 5 pertain to the case of one computer with one agent; this is the worst case and should be compared with the nonagent hybrid modified K-means with C4.5. In contrast, the results in the lower right corner of Table 5 represent the best case of MAS-IDS (maximum number of computers and agents). Furthermore, the data show that when two computers are used, MAS-IDS achieves little or no improvement (for newTestDS4 the processing time even increases), because the cost of transferring data through the network increases the processing time. This problem is mitigated as additional computers are introduced. Nonetheless, the MAS-IDS processing time for a large dataset such as newTestDS4 remains inadequate, because the dataset subsets are still large and take a long time to transfer to the other computers through the network. This problem diminishes as more computers are employed, because the dataset is divided into smaller subsets. Finally, the MAS-IDS processing time decreases with


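Table 5: Comparison of processing time required by MAS-IDS and the nonagent hybrid modified K-means and C4.5 (processing time in seconds, by number of computers).

| Number of agents | Testing dataset | 1 computer | 2 computers | 3 computers | 4 computers | 5 computers |
|---|---|---|---|---|---|---|
| 1 | newTestDS1 | 15.349 | 13.57 | 13.219 | 7.841 | 6.972 |
| 1 | newTestDS2 | 31.607 | 26.554 | 26.228 | 15.309 | 13.424 |
| 1 | newTestDS3 | 45.893 | 41.587 | 35.910 | 23.462 | 20.162 |
| 1 | newTestDS4 | 64.172 | 76.688 | 72.101 | 44.509 | 37.723 |
| 2 | newTestDS1 | 8.852 | 8.304 | 7.102 | 5.843 | 4.630 |
| 2 | newTestDS2 | 17.339 | 17.883 | 16.310 | 13.160 | 10.110 |
| 2 | newTestDS3 | 26.184 | 27.500 | 22.502 | 16.931 | 15.313 |
| 2 | newTestDS4 | 40.226 | 64.332 | 49.853 | 37.263 | 36.832 |
| 3 | newTestDS1 | 8.121 | 8.242 | 6.940 | 5.34 | 4.575 |
| 3 | newTestDS2 | 13.472 | 12.767 | 11.743 | 9.957 | 8.374 |
| 3 | newTestDS3 | 20.735 | 22.689 | 20.186 | 15.788 | 14.421 |
| 3 | newTestDS4 | 32.604 | 52.819 | 48.814 | 36.589 | 35.536 |
| 4 | newTestDS1 | 6.699 | 6.818 | 5.891 | 4.631 | 3.854 |
| 4 | newTestDS2 | 11.776 | 14.209 | 11.104 | 9.375 | 8.116 |
| 4 | newTestDS3 | 18.613 | 20.927 | 17.613 | 14.490 | 13.413 |
| 4 | newTestDS4 | 29.399 | 51.852 | 46.811 | 35.564 | 34.90 |
| 5 | newTestDS1 | 6.146 | 6.659 | 5.579 | 4.555 | 3.715 |
| 5 | newTestDS2 | 11.534 | 13.685 | 10.580 | 9.198 | 7.924 |
| 5 | newTestDS3 | 17.568 | 20.662 | 17.497 | 14.44 | 13.62 |
| 5 | newTestDS4 | 29.349 | 51.527 | 43.956 | 34.521 | 32.349 |
| 6 | newTestDS1 | 6.68 | 6.601 | 5.393 | 4.465 | 3.224 |
| 6 | newTestDS2 | 11.318 | 12.922 | 10.531 | 8.980 | 7.567 |
| 6 | newTestDS3 | 17.419 | 20.203 | 16.938 | 15.788 | 12.656 |
| 6 | newTestDS4 | 28.660 | 50.67 | 41.475 | 32.390 | 31.989 |
| 7 | newTestDS1 | 5.871 | 6.443 | 5.272 | 4.258 | 3.150 |
| 7 | newTestDS2 | 11.134 | 12.787 | 10.494 | 8.680 | 7.685 |
| 7 | newTestDS3 | 17.223 | 20.683 | 16.556 | 15.178 | 12.362 |
| 7 | newTestDS4 | 28.438 | 45.539 | 39.859 | 30.447 | 30.20 |
| 8 | newTestDS1 | 5.481 | 6.204 | 4.289 | 4.84 | 3.89 |
| 8 | newTestDS2 | 11.011 | 13.24 | 9.913 | 8.369 | 7.399 |
| 8 | newTestDS3 | 17.150 | 19.713 | 16.483 | 14.81 | 11.181 |
| 8 | newTestDS4 | 27.303 | 40.39 | 38.156 | 30.196 | 29.851 |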

the addition of each new computer, as the number of agents also increases. Network specifications such as bandwidth and speed play an important role in reducing the MAS-IDS processing time. Figures 9 and 10 show the effect of increasing the number of agents and the number of computers, respectively, on the MAS-IDS processing time. In Figure 9, five computers are used, while in Figure 10 one agent is used, as shown in Table 5.

The best case of MAS-IDS processing time in comparison with the nonagent hybrid modified K-means and C4.5 is shown in Figure 11.

Finally, since the proposed system uses each CPU core to run one analysis agent, the cost of system resources correlates positively with the number of agents. At the same time, as the number of analysis agents increases, the data subset analyzed by each agent becomes very small, and the analysis then needs only one or two seconds of processing time. Consequently, the proposed system balances the physical components (the number of CPU cores) against the number of agents that can be created, as in (2). Figure 12 compares the average cost of system resources (CPU consumption) when MAS-IDS uses 5 computers with 8 analysis agents on each computer (40 agents in total) and when it uses one analysis agent on each computer (5 agents in total), on the same datasets.

From Figure 12, the processing time at the highest peak of CPU utilization when one agent is used (6 s) is greater than the processing time at the highest peak of utilization


Figure 9: Time required to process the testing datasets in relation to the number of agents (processing time in seconds for newTestDS1–newTestDS4).

Figure 10: Time required to process the testing datasets in relation to the number of computers (processing time in seconds for newTestDS1–newTestDS4).

of CPU when 8 agents are used, which lasts only 1 s. As a consequence, when the number of agents is small, the processing time is long and the cost of system resources is low, whereas when the number of agents increases, the processing time is short and the cost of system resources is high. The memory cost did not exceed 10% in any experiment.

This experiment demonstrates that MAS-IDS has great potential to reduce the IDS processing time relative to methods that do not employ agents. The reduction in processing time for MAS-IDS can reach up to 70% relative to the nonagent approach. In this experiment, we used only five computers; with a greater number of computers, a higher percentage reduction in processing time could be achieved.

5. Conclusion

In this work, we have proposed hybrid modified K-means with C4.5 for IDS in a MAS environment. Hybrid modified K-means with C4.5 is used to improve the classification accuracy, while MAS is used to reduce the processing time of IDS.

Figure 11: Comparison of MAS-IDS processing time with that of the nonagent hybrid modified K-means and C4.5 (processing time in seconds for newTestDS1–newTestDS4; nonagent hybrid modified K-means and C4.5 versus best case of MAS-IDS).

Figure 12: The cost of system resources (CPUs): CPU utilization (%) versus processing time (s) for 8 agents and 1 agent.

The modification of K-means is based on choosing initial centroids of clusters that represent all cases of the dataset, which allows the number of clusters $k$ to be determined automatically. Three types of agents (coordinator, analysis, and communication agents) are used. The KDD Cup 1999 dataset is employed, and the JADE platform with five computers is used to implement the proposed method.

MAS-IDS demonstrated that a multiagent system has significant potential for reducing IDS processing time, with reductions of up to 70%. In addition, the hybrid modified K-means with C4.5 approach achieved higher accuracy than the hybrid K-means and C4.5 as well as other techniques available in Weka and Matlab. The $t$-test of accuracy comparing MAS-IDS with the conventional hybrid K-means and C4.5 method confirmed that the former was superior ($p$ value of 0.00000028). This indicates that MAS-IDS has high potential to improve the performance of intrusion detection systems.

In future work, we will attempt to further improve the IDS accuracy by combining the proposed method with other techniques. We will also try to apply our method to other datasets and to a real data network to make the system


more suitable for real environments. We will use the new attacks detected by the system as unknown attacks to retrain the proposed method as feedback. In addition, we expect to reduce the IDS processing time further by using a greater number of computers.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work is supported by National University of Malaysia (UKM) Grant no. AP2013-007.

References

[1] H.-J. Liao, C.-H. Richard Lin, Y.-C. Lin, and K.-Y. Tung, “Intrusion detection system: a comprehensive review,” Journal of Network and Computer Applications, vol. 36, no. 1, pp. 16–24, 2013.

[2] N. Sengupta, J. Sen, J. Sil, and M. Saha, “Designing of on line intrusion detection system using rough set theory and Q-learning algorithm,” Neurocomputing, vol. 111, pp. 161–168, 2013.

[3] L. Koc, T. A. Mazzuchi, and S. Sarkani, “A network intrusion detection system based on a Hidden Naïve Bayes multiclass classifier,” Expert Systems with Applications, vol. 39, no. 18, pp. 13492–13500, 2012.

[4] M. Uddin, A. A. Rehman, N. Uddin, J. Memon, R. Alsaqour, and S. Kazi, “Signature-based multi-layer distributed intrusion detection system using mobile agents,” International Journal of Network Security, vol. 15, no. 2, pp. 97–105, 2013.

[5] C. N. Modi, D. R. Patel, A. Patel, and M. Rajarajan, “Integrating signature apriori based network intrusion detection system (NIDS) in cloud computing,” Procedia Technology, vol. 6, pp. 905–912, 2012.

[6] H. Mohamed, L. Adil, T. Saida, et al., “A collaborative intrusion detection and prevention system in cloud computing,” in Proceedings of the IEEE (AFRICON '13), pp. 1–5, IEEE, September 2013.

[7] S.-J. Horng, M.-Y. Su, Y.-H. Chen, et al., “A novel intrusion detection system based on hierarchical clustering and support vector machines,” Expert Systems with Applications, vol. 38, no. 1, pp. 306–313, 2011.

[8] M. Chowdhary, S. Suri, and M. Bhutani, “Comparative study of intrusion detection system,” International Journal of Computer Sciences and Engineering, vol. 2, no. 4, pp. 197–200, 2014.

[9] I. Corona, G. Giacinto, and F. Roli, “Adversarial attacks against intrusion detection systems: taxonomy, solutions and open issues,” Information Sciences, vol. 239, pp. 201–225, 2013.

[10] S. Shamshirband, N. B. Anuar, M. L. M. Kiah, and A. Patel, “An appraisal and design of a multi-agent system based cooperative wireless intrusion detection computational intelligence technique,” Engineering Applications of Artificial Intelligence, vol. 26, no. 9, pp. 2105–2127, 2013.

[11] M. Roesch, “Snort: lightweight intrusion detection for networks,” in Proceedings of the 13th USENIX Conference on System Administration (LISA '99), pp. 229–238, 1999.

[12] D. Barbara and S. Jajodia, Applications of Data Mining in Computer Security, Springer, 2002.

[13] P. Natesan, P. Balasubramanie, and G. Gowrison, “Improving the attack detection rate in network intrusion detection using adaboost algorithm,” Journal of Computer Science, vol. 8, no. 7, pp. 1041–1048, 2012.

[14] A. Bivens, C. Palagiri, R. Smith, B. Szymanski, and M. Embrechts, “Network-based intrusion detection using neural networks,” in Proceedings of the Intelligent Engineering Systems through Artificial Neural Networks, vol. 12, pp. 579–584, November 2002.

[15] Y. Li and W. Jie, “The method of network intrusion detection based on the neural network GCBP algorithm,” in Proceedings of the International Conference on Computer Science and Information Processing (CSIP '12), pp. 1082–1086, IEEE, August 2012.

[16] J. Lin, T. Huang, and B. Zhao, “A fast fuzzy set intrusion detection model,” in International Symposium on Knowledge Acquisition and Modeling (KAM '08), pp. 601–605, December 2008.

[17] A. Abraham, R. Jain, J. Thomas, and S. Y. Han, “D-SCIDS: distributed soft computing intrusion detection system,” Journal of Network and Computer Applications, vol. 30, no. 1, pp. 81–98, 2007.

[18] V. V. Kumari, S. Pamidi, and A. Govardhan, “Integrated Bayes network and hidden Markov model for host based IDS,” International Journal of Computer Applications, vol. 41, no. 20, pp. 45–49, 2012.

[19] M. A. Hasan, M. Nasser, B. Pal, and S. Ahmad, “Support vector machine and random forest modeling for intrusion detection system (IDS),” Journal of Intelligent Learning Systems and Applications, vol. 6, no. 1, pp. 45–52, 2014.

[20] C. Xiang, P. C. Yong, and L. S. Meng, “Design of multiple-level hybrid classifier for intrusion detection system using Bayesian clustering and decision trees,” Pattern Recognition Letters, vol. 29, no. 7, pp. 918–924, 2008.

[21] M. N. Huhns, Distributed Artificial Intelligence, Elsevier, 2012.

[22] S. J. Stolfo, A. L. Prodromidis, S. Tselepis, et al., “JAM: java agents for meta-learning over distributed databases,” in Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD '97), pp. 74–81, 1997.

[23] P. Kannadiga and M. Zulkernine, “DIDMA: a distributed intrusion detection system using mobile agents,” in Proceedings of the 6th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing and 1st ACIS International Workshop on Self-Assembling Wireless Networks (SNPD/SAWN '05), pp. 238–245, IEEE, May 2005.

[24] L. Portnoy, Intrusion Detection with Unlabeled Data Using Clustering, 2000.

[25] M. Jianliang, S. Haikun, and B. Ling, “The application on intrusion detection based on K-means cluster algorithm,” in Proceedings of the International Forum on Information Technology and Applications (IFITA '09), vol. 1, pp. 150–152, IEEE, Chengdu, China, May 2009.

[26] M. Sabhnani and G. Serpen, “Application of machine learning algorithms to KDD intrusion detection dataset within misuse detection context,” in Proceedings of the International Conference on Machine Learning: Models, Technologies and Applications (MLMTA '03), pp. 209–215, June 2003.

[27] G. Munz, S. Li, and G. Carle, “Traffic anomaly detection using k-means clustering,” in Proceedings of the GI/ITG Workshop MMBnet, 2007.

[28] V. Kumar, H. Chauhan, and D. Panwar, “K-means clustering approach to analyze NSL-KDD intrusion detection dataset,” International Journal of Soft Computing and Engineering, vol. 3, no. 4, pp. 1–4, 2013.

[29] S. Chawla and A. Gionis, “k-means--: a unified approach to clustering and outlier detection,” in Proceedings of the SIAM International Conference on Data Mining (SDM '13), pp. 189–197, SIAM, 2013.

[30] A. P. Muniyandi, R. Rajeswari, and R. Rajaram, “Network anomaly detection by cascading K-means clustering and C4.5 decision tree algorithm,” Procedia Engineering, vol. 30, pp. 174–182, 2012.

[31] L. Xiao, Z. Shao, and G. Liu, “K-means algorithm based on particle swarm optimization algorithm for anomaly intrusion detection,” in Proceedings of the 6th World Congress on Intelligent Control and Automation (WCICA '06), pp. 5854–5858, IEEE, June 2006.

[32] Z. Muda, W. Yassin, M. N. Sulaiman, and N. I. Udzir, “Intrusion detection based on K-Means clustering and Naïve Bayes classification,” in Proceedings of the 7th International Conference on Information Technology in Asia (CITA '11), pp. 1–6, IEEE, July 2011.

[33] H.-B. Wang, H.-L. Yang, Z.-J. Xu, and Z. Yuan, “A clustering algorithm use SOM and K-means in intrusion detection,” in Proceedings of the 1st International Conference on E-Business and E-Government (ICEE '10), pp. 1281–1284, May 2010.

[34] A M Chandrasekhar and K Raghuveer ldquoIntrusion detectiontechnique by using k-means fuzzy neural network and SVMclassifiersrdquo in Proceedings of the 3rd International Conference onComputer Communication and Informatics (ICCCIrsquo 13) pp 1ndash3January 2013

[35] R. Goel, A. Sardana, and R. C. Joshi, “Parallel misuse and anomaly detection model,” International Journal of Network Security, vol. 14, no. 4, pp. 211–222, 2012.

[36] O. Depren, M. Topallar, E. Anarim, and M. K. Ciliz, “An intelligent intrusion detection system (IDS) for anomaly and misuse detection in computer networks,” Expert Systems with Applications, vol. 29, no. 4, pp. 713–722, 2005.

[37] A. S. A. Aziz, A. E. Hassanien, S. E.-O. Hanaf, and M. Tolba, “Multi-layer hybrid machine learning techniques for anomalies detection and classification approach,” in Proceedings of the 13th International Conference on Hybrid Intelligent Systems (HIS '13), pp. 215–220, IEEE, Gammarth, Tunisia, December 2013.

[38] M. Ektefa, S. Memar, F. Sidi, and L. S. Affendey, “Intrusion detection using data mining techniques,” in Proceedings of the International Conference on Information Retrieval and Knowledge Management: Exploring the Invisible World (CAMP '10), pp. 200–203, IEEE, March 2010.

[39] G. MeeraGandhi, K. Appavoo, and S. Srivasta, “Effective network intrusion detection using classifiers decision trees and decision rules,” International Journal of Advanced Networking and Applications, vol. 2, no. 3, pp. 686–692, 2010.

[40] H. Chauhan, V. Kumar, S. Pundir, and E. S. Pilli, “A comparative study of classification techniques for intrusion detection,” in Proceedings of the International Symposium on Computational and Business Intelligence (ISCBI '13), pp. 40–43, IEEE, August 2013.

[41] C. Katar, “Combining multiple techniques for intrusion detection,” International Journal of Computer Science and Network Security, vol. 6, no. 2B, pp. 208–218, 2006.

[42] S R Gaddam V V Phoha and K S Balagani ldquoK-means+id3 anovelmethod for supervised anomaly detection by cascading k-means clustering and id3 decision tree learning methodsrdquo IEEE

Transactions on Knowledge and Data Engineering vol 19 no 3pp 345ndash354 2007

[43] D Dasgupta F Gonzalez K Yallapu J Gomez and R Yarram-settii ldquoCIDS an agent-based intrusion detection systemrdquo Com-puters amp Security vol 24 no 5 pp 387ndash398 2005

[44] D L Hancock and G B Lamont ldquoMulti agent system for net-work attack classification using flow-based intrusion detectionrdquoin IEEE Congress of Evolutionary Computation (CEC rsquo11) pp1535ndash1542 June 2011

[45] X Zhu Z Huang and H Zhou ldquoDesign of a multi-agentbased intelligent intrusion detection systemrdquo in Proceedings ofthe 1st International Symposium on Pervasive Computing andApplications (SPCA rsquo06) pp 290ndash295 August 2006

[46] M El Ajjouri S Benhadou and H Medromi ldquoIntelligentarchitecture based onMAS andCBR for intrusion detectionrdquo inProceedings of the 4th Edition of National Security Days (JNS4)pp 1ndash4 IEEE May 2014

[47] J Yang X Liu T Li G Liang and S Liu ldquoDistributed agentsmodel for intrusion detection based on AISrdquo Knowledge-BasedSystems vol 22 no 2 pp 115ndash119 2009

[48] J MacQueen ldquoSome methods for classification and analysis ofmultivariate observationsrdquo in Proceedings of the 5th BerkeleySymposium on Mathematical Statistics and Probability pp 281ndash297 Berkeley Calif USA 1967

[49] J M Pena J A Lozano and P Larranaga ldquoAn empiricalcomparison of four initialization methods for the K-Meansalgorithmrdquo Pattern Recognition Letters vol 20 no 10 pp 1027ndash1040 1999

[50] G H Ball and D J Hall ldquoA clustering technique for summa-rizing multivariate datardquo Behavioral Science vol 12 no 2 pp153ndash155 1967

[51] I Katsavounidis C-C J Kuo and Z Zhang ldquoNew initializationtechnique for generalized Lloyd iterationrdquo IEEE Signal Process-ing Letters vol 1 no 10 pp 144ndash146 1994

[52] M D B Al-Daoud ldquoA new algorithm for cluster initializationrdquoin Proceedings of the WECrsquo05 The 2nd World EnformatikaConference 2007

[53] D Arthur and S Vassilvitskii ldquok-means++ the advantages ofcareful seedingrdquo in Proceedings of the 18th Annual ACM-SIAMSymposium on Discrete Algorithms pp 1027ndash1035 Society forIndustrial and Applied Mathematics New Orleans La USAJanuary 2007

[54] M Erisoglu N Calis and S Sakallioglu ldquoA new algorithm forinitial cluster centers in k-means algorithmrdquo Pattern Recogni-tion Letters vol 32 no 14 pp 1701ndash1705 2011

[55] L Yongzhong Y Ge X Jing et al ldquoAnomaly detection forclustering algorithm based on particle swarm optimizationrdquoJournal of Jiangsu University of Science and Technology (NaturalScience Edition) vol 23 no 1 pp 51ndash55 2009

[56] W Cong J Morris and W Xiaojun ldquoHigh performance deeppacket inspection on multi-core platformrdquo in Proceedings of the2nd IEEE International Conference on Broadband Network andMultimedia Technology (IC-BNMTrsquo 09) pp 619ndash622 October2009

[57] J R Quinlan C4 5 Programs for Machine Learning MorganKaufmann Publishers 1993

[58] S Ruggieri ldquoEfficient C45 [classification algorithm]rdquo IEEETransactions on Knowledge and Data Engineering vol 14 no 2pp 438ndash444 2002

[59] X Wu and V Kumar The Top Ten Algorithms in Data MiningCRC Press New York NY USA 2010

14 The Scientific World Journal

[60] KDD Cup 1999, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.

[61] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in Proceedings of the 2nd IEEE Symposium on Computational Intelligence for Security and Defence Applications, pp. 1-6, IEEE, July 2009.

[62] A. K. Jain, "Data clustering: 50 years beyond K-means," Pattern Recognition Letters, vol. 31, no. 8, pp. 651-666, 2010.


Input: testing dataset, centroids, trees
Output: prediction P for the instances of the testing dataset
(1) For every instance ω_i ∈ testing dataset Do
    (1.1) Choose centroid c_j such that ‖ω_i − c_j‖ < ‖ω_i − c_s‖ for all s = 1, …, k and s ≠ j
    (1.2) P = add(Call pseudo code MatchTree(ω_i, root_j))

Pseudocode 3: Pseudocode of the determination of the closest centroid for an instance.

Input: instance ω, root
Output: class
(1) If root.branch = 0 (root has no branches) Then return root.value
(2) Choose attribute a_i ∈ ω corresponding to the index of root
(3) For all branchValue_j ∈ root Do
    (3.1) If branchValue_j = a_i Then return MatchTree(ω, root.branch_j)
(4) return "unknown" (no branch value matches a_i)

Pseudocode 4: Pseudocode of MatchTree.

(3) Give the dataset, the centroids of clusters, and the decision trees to the analysis agent

(4) Receive the results from the analysis agent

(5) Move the results from the analysis agent host to the coordinator agent host

(6) Provide the results to the coordinator agent

3.2. Modified K-Means Algorithm. The main advantage of modified K-means that distinguishes it from other adjusted K-means variants in the extant literature is its ability to consider all possible eventualities by treating all divergent points in the dataset as initial centroids of clusters, rather than randomly selecting a specific set of initial centroids, as is typically done. In other words, modified K-means constructs clusters that cover all the cases characterized by significant differences among instances, and thus distributes the dataset instances into suitable clusters with the aim of achieving the best accuracy. Moreover, unlike other adjusted K-means variants, modified K-means does not require the number of clusters k to be specified in advance, because it is determined dynamically by the threshold. The main difference between the modified and the standard K-means lies in the selection of the initial centroids of clusters, as shown in the following steps:

(1) Select the first instance of the dataset as the first centroid.

(2) Select as the next centroid an instance whose distance from all the previously selected centroids is greater than the specified threshold (the best threshold, 4000, was identified in the first experiment; see Section 4.1).

(3) Repeat Step (2) until the end of the dataset is reached.

(4) Apply the remaining steps of standard K-means to the selected initial centroids of clusters.

Pseudocode 2 shows the pseudocode of modified K-means. Our modification of K-means is evident in Steps (1) and (2) of the pseudocode.
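The initial-centroid selection in Steps (1)-(4) can be sketched in Python as follows. This is a minimal illustration, not the authors' JADE/Java implementation: it assumes Euclidean distance (the distance used at test time in Pseudocode 3), reads Step (2) as requiring a candidate to be farther than the threshold from every centroid selected so far, and uses hypothetical function names and a hypothetical iteration cap (`max_iter`).

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_initial_centroids(data: Sequence[Sequence[float]], threshold: float) -> List[List[float]]:
    # Step (1): the first instance becomes the first centroid
    centroids = [list(data[0])]
    # Step (3): scan the remaining instances to the end of the dataset
    for point in data[1:]:
        # Step (2): keep the instance only if it is farther than `threshold`
        # from every centroid selected so far (assumed interpretation)
        if all(euclidean(point, c) > threshold for c in centroids):
            centroids.append(list(point))
    return centroids


def modified_kmeans(data: Sequence[Sequence[float]], threshold: float,
                    max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    centroids = select_initial_centroids(data, threshold)
    k = len(centroids)  # k emerges from the threshold; it is not an input
    labels: List[int] = []
    # Step (4): standard K-means iterations starting from the selected centroids
    for _ in range(max_iter):
        # Assignment step: each instance goes to its nearest centroid
        new_labels = [min(range(k), key=lambda j: euclidean(p, centroids[j])) for p in data]
        if new_labels == labels:
            break  # assignments unchanged, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

On toy data such as `[[0, 0], [0, 1], [10, 10], [10, 11], [0, 0.5]]` with a threshold of 5, only `[0, 0]` and `[10, 10]` pass the threshold test, so k = 2 emerges from the data rather than being supplied. Note that the single pass makes the result depend on instance order, since the first instance always becomes the first centroid.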

3.3. C4.5 Algorithm. After the training dataset instances are distributed among the clusters using modified K-means, the standard C4.5 technique developed by Quinlan [57] is used to build a decision tree for each cluster. More details and the pseudocode of C4.5 can be found elsewhere [58, 59].
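C4.5 chooses the attribute to split on by the gain ratio, that is, the information gain of a split divided by the split information of the resulting partition. The sketch below computes this criterion for one categorical attribute. It is a simplified illustration only: it omits C4.5's binary threshold splits for continuous attributes, its handling of missing values, and pruning, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    n = len(labels)
    # Partition the class labels by the attribute value of each instance
    groups: Dict[Hashable, List[Hashable]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Information gain: entropy before the split minus weighted entropy after it
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information: entropy of the partition sizes themselves
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    # An attribute with a single value cannot split the data
    return gain / split_info if split_info > 0 else 0.0
```

Normalizing by the split information penalizes attributes with many distinct values, which plain information gain would otherwise favor.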

3.4. Testing Phase. This phase is implemented by the analysis agent and is executed in two stages to test the network traffic data. In the first stage, the centroid closest to the testing instance is chosen (the pseudocode of this stage is shown in Pseudocode 3). In the second stage, the tree corresponding to the centroid chosen in the first stage is applied to the instance in order to identify the appropriate class for this instance. The pseudocode of the second stage is shown in Pseudocode 4.
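A minimal Python sketch of the two testing stages (Pseudocodes 3 and 4) follows. The `Node` class is a hypothetical tree representation in which a node with no branches is a leaf holding a class label, and an internal node tests one attribute by exact value match, as in Pseudocode 4; real C4.5 trees also compare continuous attributes against thresholds, which this sketch does not model. Distances are assumed to be Euclidean.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class Node:
    """Hypothetical decision-tree node; it is a leaf when `branches` is empty."""
    value: Optional[str] = None      # class label stored at a leaf
    attribute_index: int = 0         # index of the attribute tested at this node
    branches: Dict[object, "Node"] = field(default_factory=dict)


def match_tree(instance: Sequence, root: Node) -> Optional[str]:
    # Step (1): a node without branches is a leaf, so return its class
    if not root.branches:
        return root.value
    # Step (2): read the attribute value tested at this node
    a = instance[root.attribute_index]
    # Step (3.1): follow the branch whose value equals the attribute value
    if a in root.branches:
        return match_tree(instance, root.branches[a])
    # Step (4): no branch value matches
    return "unknown"


def closest_centroid(instance: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    # Pseudocode 3, step (1.1): index j of the nearest centroid c_j
    return min(range(len(centroids)), key=lambda j: math.dist(instance, centroids[j]))


def predict(testing: Sequence[Sequence[float]], centroids, trees: Sequence[Node]) -> List[Optional[str]]:
    # Pseudocode 3, step (1.2): classify each instance with its cluster's tree
    return [match_tree(w, trees[closest_centroid(w, centroids)]) for w in testing]
```

For example, with two clusters whose trees are a one-level tree on attribute 1 and a single "attack" leaf, an instance near the first centroid whose attribute 1 has no matching branch is labeled "unknown", mirroring the fallback in Pseudocode 4.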

4. Experimental Setup and Analysis

We used the benchmark KDD Cup 1999 dataset [60] to evaluate the MAS-IDS performance. In most of the previous works in this field, the authors used cross-validation, such as 10-fold cross-validation, for evaluation. Cross-validation is based on using the same classes of training data without adding new classes in the testing stage; thus, these works could achieve high performance in terms of accuracy and detection rate. However, the strength of an IDS stems from its ability to detect unknown (new) attacks. The KDD Cup 1999 dataset consists of two datasets: the 10% KDD Cup dataset (used for training) and the Corrected dataset (employed in testing), which contains attack types that do not appear in the training data. More details about KDD Cup 1999 can be found in the extant literature [61]. Among the available performance measures, accuracy (Acc), detection rate (DR), and false alarm rate (FAR) are the most popular, and they are used here to evaluate the MAS-IDS performance:

$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{DR} = \frac{TP}{TP + FN}, \qquad \mathrm{FAR} = \frac{FP}{TN + FP}. \tag{5}$$
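The three measures in (5), together with the precision, specificity, and F-measure reported later in Table 3, can be computed from confusion-matrix counts as sketched below. Precision, specificity, and F-measure are not defined in (5); the standard definitions are assumed here (specificity = TN/(TN + FP), which equals 1 − FAR and is consistent with the FAR and specificity columns of Table 3). The function name is hypothetical.

```python
from typing import Dict


def ids_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    dr = tp / (tp + fn)              # detection rate (recall), as in (5)
    precision = tp / (tp + fp)       # standard definition (assumed)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),   # as in (5)
        "dr": dr,
        "far": fp / (tn + fp),                         # as in (5)
        "precision": precision,
        "specificity": tn / (tn + fp),                 # equals 1 - FAR
        "f_measure": 2 * precision * dr / (precision + dr),
    }
```

For instance, TP = 8, TN = 9, FP = 1, and FN = 2 give an accuracy of 17/20 = 0.85, a DR of 0.8, and a FAR of 0.1.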

The computers used to run the experiments are each equipped with a Core-i7 3.40 GHz processor with eight CPU cores and 6 GB of RAM, running Windows 7 Professional 64-bit. The experiments were conducted on the JADE platform and implemented in Java.

Table 1 shows the details of the datasets used to evaluate the performance of MAS-IDS, the conventional method (hybrid standard K-means with C4.5), and other techniques. Note that the training datasets (trainDS1, trainDS2, trainDS3, and trainDS4) were generated randomly from the 10% KDD Cup dataset, while the testing datasets (testDS1, testDS2, testDS3, and testDS4) were generated randomly from the Corrected dataset.

The symbolic attributes are preprocessed before use. The three symbolic attributes, protocol, service, and flag, are converted to numeric values. For example, the three values of the protocol attribute, tcp, udp, and icmp, are converted to 1, 2, and 3, respectively, and the same approach is adopted for the remaining symbolic attributes.
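The conversion of symbolic attributes can be sketched as a per-attribute codebook that maps each symbol to 1, 2, 3, and so on. The protocol order (tcp, udp, icmp to 1, 2, 3) is taken from the text; for service and flag the order is not stated, so this sketch assumes order of first appearance. The column indices follow the standard KDD Cup 1999 record layout (duration, protocol_type, service, flag, ...), and the helper names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# Positions of the symbolic attributes in a KDD Cup 1999 record
PROTOCOL, SERVICE, FLAG = 1, 2, 3


def build_codebook(values: Sequence[str], order: Optional[Sequence[str]] = None) -> Dict[str, int]:
    """Map each symbol to 1, 2, 3, ... (explicit order if given, else first appearance)."""
    codebook: Dict[str, int] = {}
    for v in (order if order is not None else values):
        if v not in codebook:
            codebook[v] = len(codebook) + 1  # codes start at 1, as in the text
    return codebook


def encode_column(rows: List[list], col: int, codebook: Dict[str, int]) -> List[list]:
    encoded = [list(r) for r in rows]  # copy so the raw records stay untouched
    for r in encoded:
        r[col] = codebook[r[col]]  # a KeyError flags a symbol missing from the codebook
    return encoded
```

Building the codebooks from the training data and reusing them for the testing data keeps the numeric codes consistent across both sets.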

In this study, three experiments were carried out. In the first experiment, the best value of the threshold was computed. In the second experiment, the MAS-IDS performance was evaluated by comparing the results yielded by MAS-IDS with those obtained through the conventional method and other techniques available in Weka and Matlab. In the third experiment, we compared the processing time required by MAS-IDS with that of hybrid modified K-means with C4.5 in a nonagent environment.

4.1. Identifying the Best Threshold for Modified K-Means. Modified K-means requires a predetermined threshold value to select the initial centroids of clusters. In this experiment, all training datasets in Table 1 are used with testDS1 to compute the average accuracy for different threshold values (1000-10000). The value that yields the highest accuracy is chosen as the threshold for modified K-means. As can be seen in Figure 2, the best threshold value is 4000, which results in an average accuracy of 0.90155. We used all the training datasets with only one testing dataset to choose the threshold value because modified K-means is applied only to the training dataset to construct the clusters. In all subsequent experiments, the chosen threshold (4000) is employed with hybrid modified K-means and C4.5.
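The threshold selection can be expressed as a grid search that averages accuracy over the training datasets for each candidate value. In this sketch, `evaluate` is a hypothetical callable standing in for training hybrid modified K-means with C4.5 at threshold t and scoring it on testDS1, and the step of 1000 between candidates is an assumption (the text gives only the range 1000-10000).

```python
from typing import Callable, Iterable, Optional, Sequence, Tuple


def best_threshold(thresholds: Iterable[float], train_sets: Sequence[object], test_set: object,
                   evaluate: Callable[[object, object, float], float]) -> Tuple[Optional[float], float]:
    best_t, best_acc = None, float("-inf")
    for t in thresholds:
        # Average accuracy over all training datasets for this threshold
        avg = sum(evaluate(train, test_set, t) for train in train_sets) / len(train_sets)
        if avg > best_acc:  # strict '>' keeps the smallest threshold on ties
            best_t, best_acc = t, avg
    return best_t, best_acc
```

A call such as `best_threshold(range(1000, 10001, 1000), train_sets, test_ds1, evaluate)` returns the chosen threshold together with its average accuracy.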

Table 1: The details of the evaluation datasets.

| Dataset | Normal | DoS | Probe | R2L | U2R | Total |
|---|---|---|---|---|---|---|
| trainDS1 | 900 | 1000 | 300 | 500 | 300 | 3000 |
| trainDS2 | 1100 | 1300 | 300 | 800 | 500 | 4000 |
| trainDS3 | 1500 | 1800 | 400 | 1000 | 300 | 5000 |
| trainDS4 | 1800 | 1800 | 500 | 1100 | 800 | 6000 |
| testDS1 | 5000 | 3000 | 700 | 900 | 400 | 10000 |
| testDS2 | 10000 | 7000 | 1000 | 1500 | 500 | 20000 |
| testDS3 | 15000 | 10000 | 1500 | 2500 | 1000 | 30000 |
| testDS4 | 20000 | 14000 | 1500 | 3000 | 1500 | 40000 |

Figure 2: Computing the best threshold value for modified K-means (accuracy versus threshold, 0-10000).

4.2. MAS-IDS Performance. In order to compare MAS-IDS with the hybrid standard K-means and C4.5 [30], the best value of k for K-means is identified first. Typically, K-means is run independently for different values of k, and the partition that appears the most meaningful to the domain expert is selected [62]. Figure 3 shows the performance of hybrid standard K-means with C4.5 for different k values (k = 10, 20, 30, ..., 100). The best k value is 10 because it yields the highest accuracy (90.67%) and detection rate (84.80%). As can be seen, only the false alarm rate (3.46%) is not optimal, since 2.1% is achieved when k = 100. Thus, in all subsequent experiments we adopt k = 10 as the best number of clusters.

Figure 3: The performance of hybrid standard K-means and C4.5 (accuracy, DR, and FAR versus k).


Figure 4: ROC curve for testDS1 (true positive rate versus false positive rate; hybrid modified K-means and C4.5 versus hybrid K-means and C4.5).

Figure 5: ROC curve for testDS2 (true positive rate versus false positive rate; hybrid modified K-means and C4.5 versus hybrid K-means and C4.5).

Figure 6: ROC curve for testDS3 (true positive rate versus false positive rate; hybrid modified K-means and C4.5 versus hybrid K-means and C4.5).

Figure 7: ROC curve for testDS4 (true positive rate versus false positive rate; hybrid modified K-means and C4.5 versus hybrid K-means and C4.5).

The ROC curves in Figures 4, 5, 6, and 7 show the performance of the proposed method in comparison with hybrid K-means with C4.5 [30]. According to these ROC curves, the proposed hybrid modified K-means with C4.5 in MAS-IDS achieved better results than the conventional method. The t-test shows that MAS-IDS significantly improved accuracy, with p value < 0.05 (0.00000028). For this comparison, MAS-IDS was tested by computing the classification results for each training dataset using all the testing datasets presented in Table 1. Table 2 shows the accuracy comparison between MAS-IDS and hybrid K-means with C4.5. Table 3 shows the average results of the MAS-IDS evaluation along with the comparison with the conventional method and other methods from Weka and Matlab.

Table 2: The accuracy results of MAS-IDS versus hybrid K-means and C4.5 [30].

| Training dataset | Testing dataset | MAS-IDS | Hybrid K-means and C4.5 |
|---|---|---|---|
| trainDS1 | testDS1 | 0.9031 | 0.8993 |
| | testDS2 | 0.91745 | 0.916021 |
| | testDS3 | 0.9126 | 0.909633 |
| | testDS4 | 0.9147 | 0.911625 |
| trainDS2 | testDS1 | 0.8874 | 0.8742 |
| | testDS2 | 0.9107 | 0.9008 |
| | testDS3 | 0.904367 | 0.894967 |
| | testDS4 | 0.90545 | 0.89475 |
| trainDS3 | testDS1 | 0.9058 | 0.8932 |
| | testDS2 | 0.92225 | 0.9135 |
| | testDS3 | 0.91667 | 0.9049 |
| | testDS4 | 0.916125 | 0.90805 |
| trainDS4 | testDS1 | 0.9099 | 0.8963 |
| | testDS2 | 0.9215 | 0.90815 |
| | testDS3 | 0.9155 | 0.9016 |
| | testDS4 | 0.918025 | 0.90625 |

p value: 0.00000028
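The reported t-test can be checked against Table 2 with a paired t statistic over the 16 training/testing combinations. The paper does not state which t-test variant was used; a paired test is assumed here because each MAS-IDS accuracy has a matching baseline accuracy on the same training/testing pair. The function name is hypothetical.

```python
import math
from statistics import mean, stdev

# Table 2 accuracies, ordered trainDS1..trainDS4, each with testDS1..testDS4
mas_ids = [0.9031, 0.91745, 0.9126, 0.9147,
           0.8874, 0.9107, 0.904367, 0.90545,
           0.9058, 0.92225, 0.91667, 0.916125,
           0.9099, 0.9215, 0.9155, 0.918025]
hybrid = [0.8993, 0.916021, 0.909633, 0.911625,
          0.8742, 0.9008, 0.894967, 0.89475,
          0.8932, 0.9135, 0.9049, 0.90805,
          0.8963, 0.90815, 0.9016, 0.90625]


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched samples."""
    d = [x - y for x, y in zip(a, b)]
    # Mean difference divided by its standard error (sample stdev / sqrt(n))
    t = mean(d) / (stdev(d) / math.sqrt(len(d)))
    return t, len(d) - 1


t, df = paired_t(mas_ids, hybrid)
```

This gives t of roughly 8.7 with 15 degrees of freedom; the corresponding two-sided p value is on the order of 10^-7, which is consistent with the reported 0.00000028. The mean of the MAS-IDS column (about 0.9113) also matches the MAS-IDS accuracy in Table 3. With SciPy available, `scipy.stats.ttest_rel(mas_ids, hybrid)` returns the statistic and p value directly.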

As can be seen from Table 3, the MAS-IDS approach achieves the highest accuracy, detection rate, and F-measure. However, its false alarm rate, precision, and specificity are not superior to those achieved by the other methods, especially the Decision Table, which produces the best values for these measures. In state-of-the-art methods, IDS accuracy is usually measured because of the equivalence between the error and correct rates. Thus, when comparing various methods, we adopt accuracy as the primary measure. On this basis, the performance of our MAS-IDS is superior to that of the other methods, as shown in Table 3. More specifically, the average MAS-IDS accuracy computed using all testing datasets and all training datasets is 0.9113, which is greater than those achieved by the other methods. Figure 8 shows the performance of all methods using the data given in Table 3.

Table 3: Comparison of the MAS-IDS performance with other methods using different measures.

| Method | Accuracy | DR | FAR | Precision | Specificity | F-measure |
|---|---|---|---|---|---|---|
| MAS-IDS | 0.9113 | 0.8526 | 0.0299 | 0.9665 | 0.9701 | 0.9056 |
| Hybrid K-means and C4.5 (2012) | 0.9021 | 0.8394 | 0.0353 | 0.9600 | 0.9647 | 0.8954 |
| Bayes Net | 0.9017 | 0.8177 | 0.0142 | 0.9829 | 0.9858 | 0.8926 |
| Naïve Bayes | 0.8150 | 0.7727 | 0.1427 | 0.8578 | 0.8573 | 0.8076 |
| SMO | 0.8805 | 0.7785 | 0.0174 | 0.9781 | 0.9826 | 0.8664 |
| IBk | 0.8886 | 0.7962 | 0.0190 | 0.9766 | 0.9810 | 0.8771 |
| J48 | 0.8513 | 0.7298 | 0.0273 | 0.9638 | 0.9727 | 0.8299 |
| NBTree | 0.9007 | 0.8096 | 0.0081 | 0.9900 | 0.9919 | 0.8903 |
| Decision Table | 0.8306 | 0.6631 | 0.0020 | 0.9970 | 0.9980 | 0.7956 |
| JRip | 0.8377 | 0.6983 | 0.0229 | 0.9682 | 0.9771 | 0.8111 |
| LibSVM | 0.7964 | 0.8120 | 0.2191 | 0.8169 | 0.7809 | 0.8068 |

Figure 8: Comparison of the performance (accuracy, DR, and FAR) of MAS-IDS with other methods.

4.3. MAS-IDS Processing Time. The last experiment demonstrates the strength of MAS-IDS in reducing the data classification processing time by using a multiagent system. In this experiment, five of the previously specified computers were used. In addition, we used the fourth training dataset (trainDS4) from Table 1 with four new large testing datasets to evaluate the ability of MAS-IDS to process large datasets in less time. Table 4 shows the characteristics of the new testing datasets.

Table 4: Characteristics of the new testing datasets used to evaluate the MAS-IDS processing time.

| Dataset | Normal | DoS | Probe | R2L | U2R | Total |
|---|---|---|---|---|---|---|
| newTestDS1 | 35000 | 35000 | 10000 | 10000 | 10000 | 100000 |
| newTestDS2 | 70000 | 70000 | 20000 | 25000 | 15000 | 200000 |
| newTestDS3 | 100000 | 100000 | 30000 | 50000 | 20000 | 300000 |
| newTestDS4 | 150000 | 150000 | 30000 | 50000 | 20000 | 400000 |

To show the ability of MAS-IDS to reduce the processing time, the MAS-IDS approach is compared with the nonagent hybrid modified K-means and C4.5. Here, MAS-IDS is run again every time a new computer is added. In other words, MAS-IDS initially runs on one computer; when a second computer is added, it runs on both, and so on until all five computers are used. Table 5 shows the processing time of this experiment. The maximum number of agents that can be run on each computer is eight, because each computer has eight CPU cores and each core can run only one agent at a time. The training time of this experiment is 8814 s. Note that the coordinator agent runs on the first computer of the system environment.
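The division of a testing dataset among analysis agents can be sketched as an even, contiguous split over all agents, with at most one agent per CPU core (eight per computer in this setup). The even split and the contiguous assignment of subsets to computers are assumptions made for illustration; the coordinator agent in MAS-IDS may partition the data differently, and the function name is hypothetical.

```python
from typing import List, Sequence


def partition(instances: Sequence, n_computers: int, agents_per_computer: int,
              cores_per_computer: int = 8) -> List[List[list]]:
    """Split instances into near-equal subsets, grouped per computer (hypothetical helper)."""
    if not 1 <= agents_per_computer <= cores_per_computer:
        raise ValueError("each CPU core runs at most one analysis agent")
    n_agents = n_computers * agents_per_computer
    base, extra = divmod(len(instances), n_agents)
    subsets, start = [], 0
    for i in range(n_agents):
        size = base + (1 if i < extra else 0)  # the first `extra` agents get one more instance
        subsets.append(list(instances[start:start + size]))
        start += size
    # Computer c receives the subsets of agents c*apc .. (c+1)*apc - 1
    return [subsets[c * agents_per_computer:(c + 1) * agents_per_computer]
            for c in range(n_computers)]
```

For example, newTestDS1 (100000 instances) on five computers with eight agents each yields 40 subsets of 2500 instances, which illustrates why the per-agent analysis work shrinks quickly as agents are added, while the cost of shipping subsets across the network remains.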

The results presented in Table 5 depend on the number of computers and the number of agents. The results in the upper left corner of Table 5 pertain to the case of one computer with one agent; this is the worst case and should be compared with the nonagent hybrid modified K-means with C4.5. On the other hand, the results in the lower right corner of Table 5 represent the best case of MAS-IDS (maximum number of computers and agents). Furthermore, as can be seen from the data, when two computers are used, MAS-IDS achieves no improvement relative to other approaches, because the cost of transferring data through the network increases the processing time. However, this problem is mitigated by introducing additional computers. Nonetheless, the MAS-IDS processing time on a large dataset such as newTestDS4 remains inadequate, because the data subsets are still large and take a long time to transfer to other computers through the network. This problem is reduced when a larger number of computers is employed, because the dataset is divided into smaller subsets. Overall, the MAS-IDS processing time decreases as computers are added and the number of agents increases. Network specifications such as bandwidth and speed play an important role in reducing the MAS-IDS processing time. Figures 9 and 10 show the effect of increasing the number of agents and the number of computers, respectively, on the MAS-IDS processing time. In Figure 9, five computers are used, while in Figure 10 only one agent is used, as shown in Table 5.

Table 5: Comparison of the processing time (in seconds) required by MAS-IDS and the nonagent hybrid modified K-means and C4.5, by number of agents and number of computers.

| Number of agents | Testing dataset | 1 computer | 2 computers | 3 computers | 4 computers | 5 computers |
|---|---|---|---|---|---|---|
| 1 | newTestDS1 | 15.349 | 13.57 | 13.219 | 7.841 | 6.972 |
| | newTestDS2 | 31.607 | 26.554 | 26.228 | 15.309 | 13.424 |
| | newTestDS3 | 45.893 | 41.587 | 35.910 | 23.462 | 20.162 |
| | newTestDS4 | 64.172 | 76.688 | 72.101 | 44.509 | 37.723 |
| 2 | newTestDS1 | 8.852 | 8.304 | 7.102 | 5.843 | 4.630 |
| | newTestDS2 | 17.339 | 17.883 | 16.310 | 13.160 | 10.110 |
| | newTestDS3 | 26.184 | 27.500 | 22.502 | 16.931 | 15.313 |
| | newTestDS4 | 40.226 | 64.332 | 49.853 | 37.263 | 36.832 |
| 3 | newTestDS1 | 8.121 | 8.242 | 6.940 | 5.34 | 4.575 |
| | newTestDS2 | 13.472 | 12.767 | 11.743 | 9.957 | 8.374 |
| | newTestDS3 | 20.735 | 22.689 | 20.186 | 15.788 | 14.421 |
| | newTestDS4 | 32.604 | 52.819 | 48.814 | 36.589 | 35.536 |
| 4 | newTestDS1 | 6.699 | 6.818 | 5.891 | 4.631 | 3.854 |
| | newTestDS2 | 11.776 | 14.209 | 11.104 | 9.375 | 8.116 |
| | newTestDS3 | 18.613 | 20.927 | 17.613 | 14.490 | 13.413 |
| | newTestDS4 | 29.399 | 51.852 | 46.811 | 35.564 | 34.90 |
| 5 | newTestDS1 | 6.146 | 6.659 | 5.579 | 4.555 | 3.715 |
| | newTestDS2 | 11.534 | 13.685 | 10.580 | 9.198 | 7.924 |
| | newTestDS3 | 17.568 | 20.662 | 17.497 | 14.44 | 13.62 |
| | newTestDS4 | 29.349 | 51.527 | 43.956 | 34.521 | 32.349 |
| 6 | newTestDS1 | 6.68 | 6.601 | 5.393 | 4.465 | 3.224 |
| | newTestDS2 | 11.318 | 12.922 | 10.531 | 8.980 | 7.567 |
| | newTestDS3 | 17.419 | 20.203 | 16.938 | 15.788 | 12.656 |
| | newTestDS4 | 28.660 | 50.67 | 41.475 | 32.390 | 31.989 |
| 7 | newTestDS1 | 5.871 | 6.443 | 5.272 | 4.258 | 3.150 |
| | newTestDS2 | 11.134 | 12.787 | 10.494 | 8.680 | 7.685 |
| | newTestDS3 | 17.223 | 20.683 | 16.556 | 15.178 | 12.362 |
| | newTestDS4 | 28.438 | 45.539 | 39.859 | 30.447 | 30.20 |
| 8 | newTestDS1 | 5.481 | 6.204 | 4.289 | 4.84 | 3.89 |
| | newTestDS2 | 11.011 | 13.24 | 9.913 | 8.369 | 7.399 |
| | newTestDS3 | 17.150 | 19.713 | 16.483 | 14.81 | 11.181 |
| | newTestDS4 | 27.303 | 40.39 | 38.156 | 30.196 | 29.851 |

The best case of MAS-IDS processing time in comparison with the nonagent hybrid modified K-means and C4.5 is shown in Figure 11.

Finally, since the proposed system uses each CPU core to run one analysis agent, the cost of system resources is positively correlated with the number of agents. At the same time, as the number of analysis agents increases, the size of each data subset to be analyzed becomes very small, and thus the analysis process needs only one or two seconds of processing time. Consequently, the proposed system balances the physical components (number of CPU cores) against the number of agents that can be created, as in (2). Figure 12 compares the average cost of system resources (CPU consumption) when MAS-IDS uses 5 computers with 8 analysis agents on each computer (40 agents in total) and when it uses one analysis agent on each computer (5 agents in total) on the same datasets.

From Figure 12, the duration of the highest peak of CPU utilization when one agent is used (6 s) is longer than that of the highest peak when 8 agents are used, which lasts only 1 s. As a consequence, when the number of agents is small, the processing time is long and the system cost is low, whereas as the number of agents increases, the processing time becomes short and the system cost high. The memory cost of system resources does not exceed 10% in any of the experiments.

Figure 9: Time required to process the testing datasets in relation to the number of agents (processing time in seconds versus number of agents, newTestDS1-newTestDS4).

Figure 10: Time required to process the testing datasets in relation to the number of computers (processing time in seconds versus number of computers, newTestDS1-newTestDS4).

This experiment demonstrates that MAS-IDS has great potential to reduce the IDS processing time relative to methods that do not employ agents. The reduction in processing time achieved by MAS-IDS can reach up to 70% relative to other approaches. In this experiment, we used only five computers; with a greater number of computers, a higher percentage reduction in processing time is expected.

Figure 11: Comparison of the MAS-IDS processing time with that of the nonagent hybrid modified K-means and C4.5 (processing time in seconds for newTestDS1-newTestDS4; nonagent method versus best case of MAS-IDS).

Figure 12: The cost of system resources (CPU utilization in percent versus processing time in seconds, for 8 agents and 1 agent).

5. Conclusion

In this work, we have proposed hybrid modified K-means with C4.5 for IDS in a MAS environment. Hybrid modified K-means with C4.5 is used to improve the classification accuracy, while the MAS is used to reduce the processing time of the IDS. The modification of K-means is based on choosing initial centroids of clusters that represent all cases of the dataset, which allows the number of clusters k to be determined automatically. Three types of agents (coordinator, analysis, and communication agent) are used. The KDD Cup 1999 dataset is employed, and the JADE platform with five computers is used to implement the proposed method.

MAS-IDS demonstrated that a multiagent system has significant potential for reducing the IDS processing time: a reduction in processing time of up to 70% was achieved. In addition, the hybrid modified K-means with C4.5 approach performed better than the hybrid K-means and C4.5 as well as other techniques available in Weka and Matlab. The t-test of accuracy comparing MAS-IDS with the conventional K-means and C4.5 method confirmed that the former was superior (p value of 0.00000028). This indicates that MAS-IDS has high potential to improve the performance of intrusion detection systems.

In future work, we will attempt to further improve the IDS accuracy by combining the proposed method with other techniques. We will also try to apply our method to other datasets and to a real data network to make the system more suitable for real environments. We will use the new attacks detected by the system as unknown attacks to retrain the proposed method as feedback. In addition, we expect to reduce the IDS processing time by using a greater number of computers.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work is supported by the National University of Malaysia (UKM) Grant no. AP2013-007.

References

[1] H-J Liao C-H Richard Lin Y-C Lin and K-Y TungldquoIntrusion detection system a comprehensive reviewrdquo Journalof Network and Computer Applications vol 36 no 1 pp 16ndash242013

[2] N Sengupta J Sen J Sil and M Saha ldquoDesigning of online intrusion detection system using rough set theory and Q-learning algorithmrdquoNeurocomputing vol 111 pp 161ndash168 2013

[3] L Koc T A Mazzuchi and S Sarkani ldquoA network intrusiondetection system based on a Hidden Naıve Bayes multiclassclassifierrdquo Expert Systems with Applications vol 39 no 18 pp13492ndash13500 2012

[4] M Uddin A A Rehman N Uddin J Memon R Alsaqourand S Kazi ldquoSignature-based multi-layer distributed intrusiondetection system using mobile agentsrdquo International Journal ofNetwork Security vol 15 no 2 pp 97ndash105 2013

[5] C N Modi D R Patel A Patel and M Rajarajan ldquoIntegratingsignature apriori based network intrusion detection system(NIDS) in cloud computingrdquo Procedia Technology vol 6 pp905ndash912 2012

[6] H Mohamed L Adil T Saida and et al ldquoA collaborativeintrusion detection and prevention system in cloud computingrdquoin Proceedings of the IEEE (AFRICON rsquo13) pp 1ndash5 IEEESeptember 2013

[7] S-J Horng M-Y Su Y-H Chen et al ldquoA novel intrusiondetection system based on hierarchical clustering and supportvector machinesrdquo Expert Systems with Applications vol 38 no1 pp 306ndash313 2011

[8] M Chowdhary S Suri and M Bhutani ldquoComparative study ofintrusion detection systemrdquo International Journal of ComputerSciences and Engineering vol 2 no 4 pp 197ndash200 2014

[9] I Corona G Giacinto and F Roli ldquoAdversarial attacks againstintrusion detection systems taxonomy solutions and openissuesrdquo Information Sciences vol 239 pp 201ndash225 2013

[10] S Shamshirband N B Anuar M L M Kiah and A Patel ldquoAnappraisal and design of a multi-agent system based cooperativewireless intrusion detection computational intelligence tech-niquerdquo Engineering Applications of Artificial Intelligence vol 26no 9 pp 2105ndash2127 2013

[11] M. Roesch, “Snort - lightweight intrusion detection for networks,” in Proceedings of the 13th USENIX Conference on System Administration (LISA ’99), pp. 229–238, 1999.

[12] D. Barbara and S. Jajodia, Applications of Data Mining in Computer Security, Springer, 2002.

[13] P. Natesan, P. Balasubramanie, and G. Gowrison, “Improving the attack detection rate in network intrusion detection using adaboost algorithm,” Journal of Computer Science, vol. 8, no. 7, pp. 1041–1048, 2012.

[14] A. Bivens, C. Palagiri, R. Smith, B. Szymanski, and M. Embrechts, “Network-based intrusion detection using neural networks,” in Proceedings of the Intelligent Engineering Systems through Artificial Neural Networks, vol. 12, pp. 579–584, November 2002.

[15] Y. Li and W. Jie, “The method of network intrusion detection based on the neural network GCBP algorithm,” in Proceedings of the International Conference on Computer Science and Information Processing (CSIP ’12), pp. 1082–1086, IEEE, August 2012.

[16] J. Lin, T. Huang, and B. Zhao, “A fast fuzzy set intrusion detection model,” in International Symposium on Knowledge Acquisition and Modeling (KAM ’08), pp. 601–605, December 2008.

[17] A. Abraham, R. Jain, J. Thomas, and S. Y. Han, “D-SCIDS: distributed soft computing intrusion detection system,” Journal of Network and Computer Applications, vol. 30, no. 1, pp. 81–98, 2007.

[18] V. V. Kumari, S. Pamidi, and A. Govardhan, “Integrated Bayes network and hidden Markov model for host based IDS,” International Journal of Computer Applications, vol. 41, no. 20, pp. 45–49, 2012.

[19] M. A. Hasan, M. Nasser, B. Pal, and S. Ahmad, “Support vector machine and random forest modeling for intrusion detection system (IDS),” Journal of Intelligent Learning Systems and Applications, vol. 6, no. 1, pp. 45–52, 2014.

[20] C. Xiang, P. C. Yong, and L. S. Meng, “Design of multiple-level hybrid classifier for intrusion detection system using Bayesian clustering and decision trees,” Pattern Recognition Letters, vol. 29, no. 7, pp. 918–924, 2008.

[21] M. N. Huhns, Distributed Artificial Intelligence, Elsevier, 2012.

[22] S. J. Stolfo, A. L. Prodromidis, S. Tselepis et al., “JAM: java agents for meta-learning over distributed databases,” in Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD ’97), pp. 74–81, 1997.

[23] P. Kannadiga and M. Zulkernine, “DIDMA: a distributed intrusion detection system using mobile agents,” in Proceedings of the 6th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing and 1st ACIS International Workshop on Self-Assembling Wireless Networks (SNPD/SAWN ’05), pp. 238–245, IEEE, May 2005.

[24] L. Portnoy, Intrusion Detection with Unlabeled Data Using Clustering, 2000.

[25] M. Jianliang, S. Haikun, and B. Ling, “The application on intrusion detection based on K-means cluster algorithm,” in Proceedings of the International Forum on Information Technology and Applications (IFITA ’09), vol. 1, pp. 150–152, IEEE, Chengdu, China, May 2009.

[26] M. Sabhnani and G. Serpen, “Application of machine learning algorithms to KDD intrusion detection dataset within misuse detection context,” in Proceedings of the International Conference on Machine Learning, Models, Technologies and Applications (MLMTA ’03), pp. 209–215, June 2003.

[27] G. Munz, S. Li, and G. Carle, “Traffic anomaly detection using k-means clustering,” in Proceedings of the GI/ITG Workshop MMBnet, 2007.

[28] V. Kumar, H. Chauhan, and D. Panwar, “K-means clustering approach to analyze NSL-KDD intrusion detection dataset,” International Journal of Soft Computing and Engineering, vol. 3, no. 4, pp. 1–4, 2013.

The Scientific World Journal 13

[29] S. Chawla and A. Gionis, “k-means--: a unified approach to clustering and outlier detection,” in Proceedings of the SIAM International Conference on Data Mining (SDM ’13), pp. 189–197, SIAM, 2013.

[30] A. P. Muniyandi, R. Rajeswari, and R. Rajaram, “Network anomaly detection by cascading K-means clustering and C4.5 decision tree algorithm,” Procedia Engineering, vol. 30, pp. 174–182, 2012.

[31] L. Xiao, Z. Shao, and G. Liu, “K-means algorithm based on particle swarm optimization algorithm for anomaly intrusion detection,” in Proceedings of the 6th World Congress on Intelligent Control and Automation (WCICA ’06), pp. 5854–5858, IEEE, June 2006.

[32] Z. Muda, W. Yassin, M. N. Sulaiman, and N. I. Udzir, “Intrusion detection based on K-Means clustering and Naïve Bayes classification,” in Proceedings of the 7th International Conference on Information Technology in Asia (CITA ’11), pp. 1–6, IEEE, July 2011.

[33] H.-B. Wang, H.-L. Yang, Z.-J. Xu, and Z. Yuan, “A clustering algorithm use SOM and K-means in intrusion detection,” in Proceedings of the 1st International Conference on E-Business and E-Government (ICEE ’10), pp. 1281–1284, May 2010.

[34] A. M. Chandrasekhar and K. Raghuveer, “Intrusion detection technique by using k-means, fuzzy neural network and SVM classifiers,” in Proceedings of the 3rd International Conference on Computer Communication and Informatics (ICCCI ’13), pp. 1–3, January 2013.

[35] R. Goel, A. Sardana, and R. C. Joshi, “Parallel misuse and anomaly detection model,” International Journal of Network Security, vol. 14, no. 4, pp. 211–222, 2012.

[36] O. Depren, M. Topallar, E. Anarim, and M. K. Ciliz, “An intelligent intrusion detection system (IDS) for anomaly and misuse detection in computer networks,” Expert Systems with Applications, vol. 29, no. 4, pp. 713–722, 2005.

[37] A. S. A. Aziz, A. E. Hassanien, S. E.-O. Hanaf, and M. Tolba, “Multi-layer hybrid machine learning techniques for anomalies detection and classification approach,” in Proceedings of the 13th International Conference on Hybrid Intelligent Systems (HIS ’13), pp. 215–220, IEEE, Gammarth, Tunisia, December 2013.

[38] M. Ektefa, S. Memar, F. Sidi, and L. S. Affendey, “Intrusion detection using data mining techniques,” in Proceedings of the International Conference on Information Retrieval and Knowledge Management: Exploring the Invisible World (CAMP ’10), pp. 200–203, IEEE, March 2010.

[39] G. MeeraGandhi, K. Appavoo, and S. Srivasta, “Effective network intrusion detection using classifiers decision trees and decision rules,” International Journal of Advanced Networking and Applications, vol. 2, no. 3, pp. 686–692, 2010.

[40] H. Chauhan, V. Kumar, S. Pundir, and E. S. Pilli, “A comparative study of classification techniques for intrusion detection,” in Proceedings of the International Symposium on Computational and Business Intelligence (ISCBI ’13), pp. 40–43, IEEE, August 2013.

[41] C. Katar, “Combining multiple techniques for intrusion detection,” International Journal of Computer Science and Network Security, vol. 6, no. 2B, pp. 208–218, 2006.

[42] S. R. Gaddam, V. V. Phoha, and K. S. Balagani, “K-means+id3: a novel method for supervised anomaly detection by cascading k-means clustering and id3 decision tree learning methods,” IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 3, pp. 345–354, 2007.

[43] D. Dasgupta, F. Gonzalez, K. Yallapu, J. Gomez, and R. Yarramsettii, “CIDS: an agent-based intrusion detection system,” Computers & Security, vol. 24, no. 5, pp. 387–398, 2005.

[44] D. L. Hancock and G. B. Lamont, “Multi agent system for network attack classification using flow-based intrusion detection,” in IEEE Congress of Evolutionary Computation (CEC ’11), pp. 1535–1542, June 2011.

[45] X. Zhu, Z. Huang, and H. Zhou, “Design of a multi-agent based intelligent intrusion detection system,” in Proceedings of the 1st International Symposium on Pervasive Computing and Applications (SPCA ’06), pp. 290–295, August 2006.

[46] M. El Ajjouri, S. Benhadou, and H. Medromi, “Intelligent architecture based on MAS and CBR for intrusion detection,” in Proceedings of the 4th Edition of National Security Days (JNS4), pp. 1–4, IEEE, May 2014.

[47] J. Yang, X. Liu, T. Li, G. Liang, and S. Liu, “Distributed agents model for intrusion detection based on AIS,” Knowledge-Based Systems, vol. 22, no. 2, pp. 115–119, 2009.

[48] J. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297, Berkeley, Calif, USA, 1967.

[49] J. M. Pena, J. A. Lozano, and P. Larranaga, “An empirical comparison of four initialization methods for the K-Means algorithm,” Pattern Recognition Letters, vol. 20, no. 10, pp. 1027–1040, 1999.

[50] G. H. Ball and D. J. Hall, “A clustering technique for summarizing multivariate data,” Behavioral Science, vol. 12, no. 2, pp. 153–155, 1967.

[51] I. Katsavounidis, C.-C. J. Kuo, and Z. Zhang, “New initialization technique for generalized Lloyd iteration,” IEEE Signal Processing Letters, vol. 1, no. 10, pp. 144–146, 1994.

[52] M. D. B. Al-Daoud, “A new algorithm for cluster initialization,” in Proceedings of the WEC’05: The 2nd World Enformatika Conference, 2007.

[53] D. Arthur and S. Vassilvitskii, “k-means++: the advantages of careful seeding,” in Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035, Society for Industrial and Applied Mathematics, New Orleans, La, USA, January 2007.

[54] M. Erisoglu, N. Calis, and S. Sakallioglu, “A new algorithm for initial cluster centers in k-means algorithm,” Pattern Recognition Letters, vol. 32, no. 14, pp. 1701–1705, 2011.

[55] L. Yongzhong, Y. Ge, X. Jing et al., “Anomaly detection for clustering algorithm based on particle swarm optimization,” Journal of Jiangsu University of Science and Technology (Natural Science Edition), vol. 23, no. 1, pp. 51–55, 2009.

[56] W. Cong, J. Morris, and W. Xiaojun, “High performance deep packet inspection on multi-core platform,” in Proceedings of the 2nd IEEE International Conference on Broadband Network and Multimedia Technology (IC-BNMT ’09), pp. 619–622, October 2009.

[57] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.

[58] S. Ruggieri, “Efficient C4.5 [classification algorithm],” IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 2, pp. 438–444, 2002.

[59] X. Wu and V. Kumar, The Top Ten Algorithms in Data Mining, CRC Press, New York, NY, USA, 2010.


[60] KDD Cup 1999, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.

[61] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, “A detailed analysis of the KDD CUP 99 data set,” in Proceedings of the 2nd IEEE Symposium on Computational Intelligence for Security and Defence Applications, pp. 1–6, IEEE, July 2009.

[62] A. K. Jain, “Data clustering: 50 years beyond K-means,” Pattern Recognition Letters, vol. 31, no. 8, pp. 651–666, 2010.


accuracy (Acc), detection rate (DR), and false alarm rate (FAR) are the most popular when evaluating the MAS-IDS performance:

$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{DR} = \frac{TP}{TP + FN}, \qquad \mathrm{FAR} = \frac{FP}{TN + FP}. \tag{5}$$
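The three measures in (5), together with the precision, specificity, and F-measure reported later in Table 3, can be computed from binary confusion counts as in the minimal sketch below. It assumes that attack records form the positive class and uses the standard textbook definitions of precision, specificity, and F-measure, which the text does not spell out; the class and function names are hypothetical, and the counts in the usage lines are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion counts, treating 'attack' as the positive class."""
    tp: int  # attacks correctly flagged
    tn: int  # normal records correctly passed
    fp: int  # normal records wrongly flagged (false alarms)
    fn: int  # attacks that were missed


def ids_metrics(c: Confusion) -> dict:
    """Acc, DR, and FAR as in (5), plus standard precision/specificity/F-measure."""
    total = c.tp + c.tn + c.fp + c.fn
    acc = (c.tp + c.tn) / total
    dr = c.tp / (c.tp + c.fn)            # detection rate (recall)
    far = c.fp / (c.tn + c.fp)           # false alarm rate
    precision = c.tp / (c.tp + c.fp)
    specificity = c.tn / (c.tn + c.fp)   # always equals 1 - FAR
    f_measure = 2 * precision * dr / (precision + dr)
    return {"Acc": acc, "DR": dr, "FAR": far, "Precision": precision,
            "Specificity": specificity, "F-measure": f_measure}


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    metrics = ids_metrics(Confusion(tp=850, tn=970, fp=30, fn=150))
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

Because specificity is 1 − FAR by construction, the FAR and specificity columns of Table 3 are complementary (for example, 0.0299 and 0.9701 for MAS-IDS).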

The computers used for the experiments are each equipped with a Core i7 processor at 3.40 GHz with eight CPU cores and 6 GB of RAM, running Windows 7 Professional (64-bit). The experiments were conducted on the JADE platform and implemented in Java.

Table 1 shows the details of the datasets used to evaluate the MAS-IDS performance against the conventional method (hybrid standard K-means with C4.5) and other techniques. The training datasets (trainDS1, trainDS2, trainDS3, and trainDS4) were generated randomly from the 10% KDD Cup dataset, while the testing datasets (testDS1, testDS2, testDS3, and testDS4) were generated randomly from the Corrected dataset.
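The random construction of the training and testing subsets can be sketched as class-stratified sampling: draw the per-class counts listed in Table 1 from a labelled pool. This is a minimal sketch under stated assumptions; the helper name, the record format, and the fixed seed are hypothetical, and because the text does not say whether records were drawn with or without replacement, the sketch makes that choice an explicit `replace` flag.

```python
import random
from collections import defaultdict

# Per-class counts of trainDS1 from Table 1.
TRAIN_DS1 = {"Normal": 900, "DoS": 1000, "Probe": 300, "R2L": 500, "U2R": 300}


def sample_by_class(records, label_of, counts, seed=0, replace=False):
    """Draw counts[label] records of each class from `records` (hypothetical helper).

    With replace=False every drawn record is distinct, which fails if a class has
    fewer records than requested; replace=True allows duplicates instead.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for record in records:
        by_class[label_of(record)].append(record)

    subset = []
    for label, n in counts.items():
        pool = by_class[label]
        if not pool:
            raise ValueError(f"no records of class {label!r}")
        if replace:
            subset.extend(rng.choices(pool, k=n))
        elif n > len(pool):
            raise ValueError(f"class {label!r} has {len(pool)} records, {n} requested")
        else:
            subset.extend(rng.sample(pool, n))
    rng.shuffle(subset)  # mix classes so the subset is not ordered by label
    return subset


if __name__ == "__main__":
    # Synthetic (record_id, label) pairs standing in for the 10% KDD Cup file.
    pool = [(i, label) for label in TRAIN_DS1 for i in range(2000)]
    train = sample_by_class(pool, lambda r: r[1], TRAIN_DS1, seed=1)
    print(len(train))
```

If a requested count exceeds the records available for a class, sampling without replacement is impossible, and the helper raises an error rather than silently duplicating records.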

The symbolic attributes are preprocessed by converting them to numeric values. There are three symbolic attributes: protocol, service, and flag. For example, the three values of the protocol attribute, tcp, udp, and icmp, are converted to 1, 2, and 3, respectively, and the same approach is adopted for the remaining attributes.
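This encoding amounts to one lookup table per symbolic column. In the minimal sketch below, the protocol order (tcp, udp, icmp mapped to 1, 2, 3) comes from the text; for service and flag the text only says that the same approach is used, so the sketch assumes codes are assigned in order of first appearance. Column positions 1, 2, and 3 follow the standard KDD Cup 1999 field order (protocol_type, service, flag), and the function names are hypothetical.

```python
def build_codebook(rows, column, preset=()):
    """Map each distinct value of a symbolic column to 1, 2, 3, ...

    Values listed in `preset` get the first codes, in that order; any other
    values are numbered in order of first appearance in `rows`.
    """
    codebook = {}
    for value in list(preset) + [row[column] for row in rows]:
        if value not in codebook:
            codebook[value] = len(codebook) + 1
    return codebook


def encode(rows, codebooks):
    """Return copies of `rows` with symbolic columns replaced by integer codes.

    Unseen values (e.g. a service that appears only in test data) receive the
    next free code; this is a design choice of the sketch, not of the paper.
    """
    encoded = []
    for row in rows:
        new_row = list(row)
        for column, book in codebooks.items():
            new_row[column] = book.setdefault(row[column], len(book) + 1)
        encoded.append(new_row)
    return encoded


if __name__ == "__main__":
    # Truncated KDD-style rows: duration, protocol_type, service, flag, ...
    rows = [[0, "udp", "private", "SF"],
            [0, "tcp", "http", "SF"],
            [0, "icmp", "ecr_i", "REJ"]]
    books = {1: build_codebook(rows, 1, preset=("tcp", "udp", "icmp")),
             2: build_codebook(rows, 2),
             3: build_codebook(rows, 3)}
    for row in encode(rows, books):
        print(row)
```

Building the codebooks once on the training data and reusing them for every testing dataset keeps the codes consistent between training and testing.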

In this study, three experiments were carried out. In the first experiment, the best threshold value was computed. In the second experiment, the MAS-IDS performance was evaluated by comparing its results with those obtained by the conventional method and by other techniques available in Weka and Matlab. In the third experiment, we compared the processing time required by MAS-IDS with that of hybrid modified K-means with C4.5 in a nonagent environment.

4.1. Identifying the Best Threshold for Modified K-Means. The modified K-means requires a predetermined threshold value to select the initial cluster centroids. In this experiment, all training datasets in Table 1 are used with testDS1 to compute the average accuracy for different threshold values (1000–10000), and the value that yields the highest accuracy is chosen as the threshold for modified K-means. As can be seen in Figure 2, this value is 4000, which results in an average accuracy of 0.90155. We used all the training datasets with only one testing dataset to choose the threshold because the modified K-means is applied only to the training dataset to construct the clusters. In all subsequent experiments, the chosen threshold (4000) is used with hybrid modified K-means and C4.5.
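The threshold search described above is a one-dimensional grid search that averages accuracy over the training datasets against a single testing dataset. In this minimal sketch, `evaluate` is a hypothetical callback that would train hybrid modified K-means with C4.5 on one training set at a given threshold and return its accuracy on the testing set; the step of 1000 between candidates is an assumption, since the text gives only the range 1000–10000.

```python
from statistics import mean


def choose_threshold(evaluate, train_sets, test_set, thresholds):
    """Return (threshold, average accuracy) for the best-scoring threshold.

    `evaluate(train, test, threshold)` must return an accuracy in [0, 1].
    On ties, the first candidate in `thresholds` is kept.
    """
    best_threshold, best_acc = None, float("-inf")
    for threshold in thresholds:
        avg_acc = mean(evaluate(train, test_set, threshold) for train in train_sets)
        if avg_acc > best_acc:
            best_threshold, best_acc = threshold, avg_acc
    return best_threshold, best_acc


# Candidate thresholds 1000, 2000, ..., 10000 (step size assumed).
THRESHOLDS = range(1000, 10001, 1000)

if __name__ == "__main__":
    # Toy stand-in for the real train/evaluate pipeline: accuracy peaks at 4000.
    def toy_evaluate(train, test, threshold):
        return 0.90 - abs(threshold - 4000) / 1e6

    print(choose_threshold(toy_evaluate, ["trainDS1", "trainDS2"], "testDS1", THRESHOLDS))
```

Averaging over all training sets, rather than tuning on one, reduces the chance that the chosen threshold fits the quirks of a single random sample.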

Table 1: The details of evaluation datasets.

| Dataset | Normal | DoS | Probe | R2L | U2R | Total |
|---|---|---|---|---|---|---|
| trainDS1 | 900 | 1000 | 300 | 500 | 300 | 3000 |
| trainDS2 | 1100 | 1300 | 300 | 800 | 500 | 4000 |
| trainDS3 | 1500 | 1800 | 400 | 1000 | 300 | 5000 |
| trainDS4 | 1800 | 1800 | 500 | 1100 | 800 | 6000 |
| testDS1 | 5000 | 3000 | 700 | 900 | 400 | 10000 |
| testDS2 | 10000 | 7000 | 1000 | 1500 | 500 | 20000 |
| testDS3 | 15000 | 10000 | 1500 | 2500 | 1000 | 30000 |
| testDS4 | 20000 | 14000 | 1500 | 3000 | 1500 | 40000 |

Figure 2: Computing the best threshold value for modified K-means (x-axis: threshold; y-axis: accuracy).

4.2. MAS-IDS Performance. In order to compare MAS-IDS with the hybrid standard K-means and C4.5 [30], the best value of k for K-means is identified first. Typically, K-means is run independently for different values of k, and the partition that appears most meaningful to the domain expert is selected [62]. Figure 3 shows the performance of hybrid standard K-means with C4.5 for different values of k (k = 10, 20, 30, ..., 100). The best value is k = 10, because it yields the highest accuracy (90.67%) and detection rate (84.80%). Only the false alarm rate (3.46%) is not optimal at k = 10, since a lower rate of 2.1% is achieved when k = 100. Thus, in all subsequent experiments, we adopt k = 10 as the best number of clusters.

Figure 3: The performance of hybrid standard K-means and C4.5 (x-axis: k; y-axis: performance; series: accuracy, DR, FAR).
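The cascade used by the conventional method can be pictured as follows: K-means first clusters the training data, a classifier is then learned inside each cluster, and a test record is routed to its nearest centroid and classified there. The minimal sketch below shows that routing with plain Lloyd's K-means and random initial centroids. To stay short and dependency-free, a majority-class rule stands in for the per-cluster C4.5 tree, so it illustrates the cascade structure rather than the full method of [30]; any decision-tree learner could be plugged in per cluster. The `select_k` loop mirrors the search over k = 10, 20, ..., 100 described above; all function names are hypothetical.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's K-means with k distinct random points as initial centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [-1] * len(points)
    for _ in range(iters):
        changed = False
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            if nearest != assign[i]:
                assign[i], changed = nearest, True
        if not changed:
            break  # assignments are stable, so centroids are already the means
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def fit_cascade(X, y, k, seed=0):
    """Cluster X, then label each cluster with its majority class (C4.5 stand-in)."""
    centroids, assign = kmeans(X, k, seed=seed)
    fallback = Counter(y).most_common(1)[0][0]
    labels = []
    for c in range(k):
        members = [y[i] for i in range(len(X)) if assign[i] == c]
        labels.append(Counter(members).most_common(1)[0][0] if members else fallback)
    return centroids, labels


def predict(model, x):
    """Route x to its nearest centroid and return that cluster's label."""
    centroids, labels = model
    nearest = min(range(len(centroids)), key=lambda c: math.dist(x, centroids[c]))
    return labels[nearest]


def select_k(X_train, y_train, X_test, y_test, ks=range(10, 101, 10)):
    """Return the k with the highest test accuracy (the first one wins ties)."""
    def accuracy(k):
        model = fit_cascade(X_train, y_train, k)
        hits = sum(predict(model, x) == t for x, t in zip(X_test, y_test))
        return hits / len(y_test)
    return max(ks, key=accuracy)
```

The training set must contain at least as many records as the largest k tried, because the initial centroids are drawn from the training points.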


Figure 4: ROC curve for testDS1 (x-axis: false positive rate; y-axis: true positive rate; series: hybrid modified K-means and C4.5, hybrid K-means and C4.5).

Figure 5: ROC curve for testDS2 (x-axis: false positive rate; y-axis: true positive rate; series: hybrid modified K-means and C4.5, hybrid K-means and C4.5).

Figure 6: ROC curve for testDS3 (x-axis: false positive rate; y-axis: true positive rate; series: hybrid modified K-means and C4.5, hybrid K-means and C4.5).

The ROC curves in Figures 4, 5, 6, and 7 compare the performance of the proposed method with that of hybrid K-means with C4.5 [30].

According to the ROC curves, the proposed hybrid modified K-means with C4.5 in MAS-IDS achieved better results than the conventional method. The t-test shows that MAS-IDS significantly improved accuracy, with a p value < 0.05 (0.00000028). The MAS-IDS was further tested by computing the classification results of each training dataset on all testing datasets presented in Table 1. Table 2 compares the accuracy of MAS-IDS and hybrid K-means with C4.5, and Table 3 shows the average results of the MAS-IDS evaluation along with those of the conventional method and other methods from Weka and Matlab.

Figure 7: ROC curve for testDS4 (x-axis: false positive rate; y-axis: true positive rate; series: hybrid modified K-means and C4.5, hybrid K-means and C4.5).

Table 2: The accuracy results of the MAS-IDS versus hybrid K-means and C4.5 [30].

| Training dataset | Testing dataset | MAS-IDS | Hybrid K-means and C4.5 |
|---|---|---|---|
| trainDS1 | testDS1 | 0.9031 | 0.8993 |
| trainDS1 | testDS2 | 0.91745 | 0.916021 |
| trainDS1 | testDS3 | 0.9126 | 0.909633 |
| trainDS1 | testDS4 | 0.9147 | 0.911625 |
| trainDS2 | testDS1 | 0.8874 | 0.8742 |
| trainDS2 | testDS2 | 0.9107 | 0.9008 |
| trainDS2 | testDS3 | 0.904367 | 0.894967 |
| trainDS2 | testDS4 | 0.90545 | 0.89475 |
| trainDS3 | testDS1 | 0.9058 | 0.8932 |
| trainDS3 | testDS2 | 0.92225 | 0.9135 |
| trainDS3 | testDS3 | 0.91667 | 0.9049 |
| trainDS3 | testDS4 | 0.916125 | 0.90805 |
| trainDS4 | testDS1 | 0.9099 | 0.8963 |
| trainDS4 | testDS2 | 0.9215 | 0.90815 |
| trainDS4 | testDS3 | 0.9155 | 0.9016 |
| trainDS4 | testDS4 | 0.918025 | 0.90625 |

p value: 0.00000028
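The significance test can be recomputed from the sixteen accuracy pairs in Table 2. The text does not say which form of t-test was used; this minimal sketch assumes a paired test, since both methods are evaluated on the same training/testing combinations, and computes the t statistic with the standard library only. The variable and function names are illustrative.

```python
import math
from statistics import mean, stdev

# Accuracy pairs from Table 2 in row order (trainDS1..4 x testDS1..4).
MAS_IDS = [0.9031, 0.91745, 0.9126, 0.9147,
           0.8874, 0.9107, 0.904367, 0.90545,
           0.9058, 0.92225, 0.91667, 0.916125,
           0.9099, 0.9215, 0.9155, 0.918025]
HYBRID = [0.8993, 0.916021, 0.909633, 0.911625,
          0.8742, 0.9008, 0.894967, 0.89475,
          0.8932, 0.9135, 0.9049, 0.90805,
          0.8963, 0.90815, 0.9016, 0.90625]


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))  # stdev is the sample SD
    return t, n - 1


if __name__ == "__main__":
    t, df = paired_t(MAS_IDS, HYBRID)
    print(f"mean MAS-IDS = {mean(MAS_IDS):.4f}, mean hybrid = {mean(HYBRID):.4f}")
    print(f"t = {t:.2f} with {df} degrees of freedom")
```

The two column means reproduce the accuracies listed for MAS-IDS and hybrid K-means with C4.5 in Table 3. The two-sided p value can then be read from a t distribution with `df` degrees of freedom, for example with SciPy's `scipy.stats.ttest_rel`, which performs the same paired test directly.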

As can be seen from Table 3, the MAS-IDS approach achieves the highest accuracy, detection rate, and F-measure. However, its false alarm rate, precision, and specificity are not superior to those of the other methods, especially the Decision Table, which produces the best values for these three measures. In state-of-the-art methods, IDS performance is usually measured by accuracy, which is the complement of the error rate. Thus, when comparing various methods, we adopt accuracy as the primary measure. On this basis, the performance of our MAS-IDS is superior to that of the other methods, as shown in Table 3. More specifically, the average MAS-IDS accuracy computed over all training and testing datasets is 0.9113, which is higher than that achieved by any other method. Figure 8 shows the performance of all methods using the data given in Table 3.

Table 3: Comparison of the MAS-IDS performance with other methods using different measures.

| Method | Accuracy | DR | FAR | Precision | Specificity | F-measure |
|---|---|---|---|---|---|---|
| MAS-IDS | 0.9113 | 0.8526 | 0.0299 | 0.9665 | 0.9701 | 0.9056 |
| Hybrid K-means and C4.5 (2012) | 0.9021 | 0.8394 | 0.0353 | 0.9600 | 0.9647 | 0.8954 |
| Bayes Net | 0.9017 | 0.8177 | 0.0142 | 0.9829 | 0.9858 | 0.8926 |
| Naïve Bayes | 0.8150 | 0.7727 | 0.1427 | 0.8578 | 0.8573 | 0.8076 |
| SMO | 0.8805 | 0.7785 | 0.0174 | 0.9781 | 0.9826 | 0.8664 |
| IBk | 0.8886 | 0.7962 | 0.0190 | 0.9766 | 0.9810 | 0.8771 |
| J48 | 0.8513 | 0.7298 | 0.0273 | 0.9638 | 0.9727 | 0.8299 |
| NBTree | 0.9007 | 0.8096 | 0.0081 | 0.9900 | 0.9919 | 0.8903 |
| Decision Table | 0.8306 | 0.6631 | 0.0020 | 0.9970 | 0.9980 | 0.7956 |
| JRip | 0.8377 | 0.6983 | 0.0229 | 0.9682 | 0.9771 | 0.8111 |
| LibSVM | 0.7964 | 0.8120 | 0.2191 | 0.8169 | 0.7809 | 0.8068 |

Figure 8: Performance comparison of MAS-IDS with other methods (accuracy, DR, and FAR for each method in Table 3).

4.3. MAS-IDS Processing Time. The last experiment demonstrates the strength of MAS-IDS in reducing data classification processing time by using a multiagent system. In this experiment, five of the previously specified computers were used. In addition, we used the fourth training dataset (trainDS4) from Table 1 with four new, large testing datasets to evaluate the ability of MAS-IDS to process large datasets in less time. Table 4 shows the characteristics of the new testing datasets.

Table 4: Characteristics of new testing datasets used to evaluate the MAS-IDS processing time.

| Dataset | Normal | DoS | Probe | R2L | U2R | Total |
|---|---|---|---|---|---|---|
| newTestDS1 | 35000 | 35000 | 10000 | 10000 | 10000 | 100000 |
| newTestDS2 | 70000 | 70000 | 20000 | 25000 | 15000 | 200000 |
| newTestDS3 | 100000 | 100000 | 30000 | 50000 | 20000 | 300000 |
| newTestDS4 | 150000 | 150000 | 30000 | 50000 | 20000 | 400000 |

To show the ability of MAS-IDS to reduce the processing time, the MAS-IDS approach is compared with the nonagent hybrid modified K-means and C4.5. Here, MAS-IDS is rerun every time a new computer is added. In other words, MAS-IDS initially runs on one computer; when a second computer is added, it runs on both, and so on until all five computers are used. Table 5 shows the processing times of this experiment. The maximum number of agents that can run on each computer is eight, because each computer has eight CPU cores and each core can run only one agent at a time. The training time of this experiment is 8814 s. Note that the coordinator agent runs on the first computer of the system environment.
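The division of a testing dataset among analysis agents can be illustrated on a single machine. This minimal sketch is not JADE code: a thread pool stands in for the analysis agents, `classify` is a hypothetical per-record classifier (for example, a trained hybrid modified K-means with C4.5 model), and the cap of eight agents per computer follows the one-agent-per-core rule stated above. Because of CPython's global interpreter lock, threads mainly illustrate the data flow; real CPU parallelism would need separate processes or, as in MAS-IDS, agents on separate machines.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_subsets(records, n_subsets):
    """Split records into n_subsets contiguous chunks whose sizes differ by at most one."""
    q, r = divmod(len(records), n_subsets)
    subsets, start = [], 0
    for i in range(n_subsets):
        end = start + q + (1 if i < r else 0)
        subsets.append(records[start:end])
        start = end
    return subsets


def analyse_in_parallel(records, classify, n_computers, agents_per_computer,
                        cores_per_computer=8):
    """Classify records with one worker per (simulated) analysis agent.

    At most `cores_per_computer` agents run on each computer, mirroring the
    one-agent-per-core rule. Results come back in the original record order.
    """
    n_agents = n_computers * min(agents_per_computer, cores_per_computer)
    subsets = split_into_subsets(records, n_agents)
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        parts = pool.map(lambda subset: [classify(x) for x in subset], subsets)
        return [label for part in parts for label in part]


if __name__ == "__main__":
    data = list(range(20))
    print([len(s) for s in split_into_subsets(data, 8)])
    print(analyse_in_parallel(data, lambda x: "attack" if x % 3 == 0 else "normal",
                              n_computers=5, agents_per_computer=8)[:6])
```

Keeping the subsets contiguous and re-joining them in submission order means the merged labels line up with the original records without any extra bookkeeping.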

Table 5: Comparison of processing time (in seconds) required by MAS-IDS and the nonagent hybrid modified K-means and C4.5.

| Agents per computer | Testing dataset | 1 computer | 2 computers | 3 computers | 4 computers | 5 computers |
|---|---|---|---|---|---|---|
| 1 | newTestDS1 | 15.349 | 13.57 | 13.219 | 7.841 | 6.972 |
| 1 | newTestDS2 | 31.607 | 26.554 | 26.228 | 15.309 | 13.424 |
| 1 | newTestDS3 | 45.893 | 41.587 | 35.910 | 23.462 | 20.162 |
| 1 | newTestDS4 | 64.172 | 76.688 | 72.101 | 44.509 | 37.723 |
| 2 | newTestDS1 | 8.852 | 8.304 | 7.102 | 5.843 | 4.630 |
| 2 | newTestDS2 | 17.339 | 17.883 | 16.310 | 13.160 | 10.110 |
| 2 | newTestDS3 | 26.184 | 27.500 | 22.502 | 16.931 | 15.313 |
| 2 | newTestDS4 | 40.226 | 64.332 | 49.853 | 37.263 | 36.832 |
| 3 | newTestDS1 | 8.121 | 8.242 | 6.940 | 5.34 | 4.575 |
| 3 | newTestDS2 | 13.472 | 12.767 | 11.743 | 9.957 | 8.374 |
| 3 | newTestDS3 | 20.735 | 22.689 | 20.186 | 15.788 | 14.421 |
| 3 | newTestDS4 | 32.604 | 52.819 | 48.814 | 36.589 | 35.536 |
| 4 | newTestDS1 | 6.699 | 6.818 | 5.891 | 4.631 | 3.854 |
| 4 | newTestDS2 | 11.776 | 14.209 | 11.104 | 9.375 | 8.116 |
| 4 | newTestDS3 | 18.613 | 20.927 | 17.613 | 14.490 | 13.413 |
| 4 | newTestDS4 | 29.399 | 51.852 | 46.811 | 35.564 | 34.90 |
| 5 | newTestDS1 | 6.146 | 6.659 | 5.579 | 4.555 | 3.715 |
| 5 | newTestDS2 | 11.534 | 13.685 | 10.580 | 9.198 | 7.924 |
| 5 | newTestDS3 | 17.568 | 20.662 | 17.497 | 14.44 | 13.62 |
| 5 | newTestDS4 | 29.349 | 51.527 | 43.956 | 34.521 | 32.349 |
| 6 | newTestDS1 | 6.68 | 6.601 | 5.393 | 4.465 | 3.224 |
| 6 | newTestDS2 | 11.318 | 12.922 | 10.531 | 8.980 | 7.567 |
| 6 | newTestDS3 | 17.419 | 20.203 | 16.938 | 15.788 | 12.656 |
| 6 | newTestDS4 | 28.660 | 50.67 | 41.475 | 32.390 | 31.989 |
| 7 | newTestDS1 | 5.871 | 6.443 | 5.272 | 4.258 | 3.150 |
| 7 | newTestDS2 | 11.134 | 12.787 | 10.494 | 8.680 | 7.685 |
| 7 | newTestDS3 | 17.223 | 20.683 | 16.556 | 15.178 | 12.362 |
| 7 | newTestDS4 | 28.438 | 45.539 | 39.859 | 30.447 | 30.20 |
| 8 | newTestDS1 | 5.481 | 6.204 | 4.289 | 4.84 | 3.89 |
| 8 | newTestDS2 | 11.011 | 13.24 | 9.913 | 8.369 | 7.399 |
| 8 | newTestDS3 | 17.150 | 19.713 | 16.483 | 14.81 | 11.181 |
| 8 | newTestDS4 | 27.303 | 40.39 | 38.156 | 30.196 | 29.851 |

The results presented in Table 5 depend on the number of computers and the number of agents. The upper left corner of Table 5 corresponds to one computer with one agent; this is the worst case and should be compared with the nonagent hybrid modified K-means with C4.5. The lower right corner of Table 5 represents the best case of MAS-IDS (maximum number of computers and agents). Furthermore, the data show that with two computers MAS-IDS achieves no improvement relative to other approaches, because the cost of transferring data over the network increases the processing time. This problem is mitigated as additional computers are introduced. Nonetheless, the MAS-IDS processing time for a large dataset such as newTestDS4 remains inadequate with few computers, because the subsets are still large and take a long time to transfer to the other computers over the network. This problem is eliminated when more computers are employed, since the dataset is then divided into smaller subsets. Finally, the MAS-IDS processing time decreases with the addition of each new computer as the number of agents also increases. Network specifications such as bandwidth and speed play an important role in reducing the MAS-IDS processing time. Figures 9 and 10 show the effect of increasing the number of agents and the number of computers, respectively, on the MAS-IDS processing time. Figure 9 uses five computers, whereas Figure 10 uses only one agent per computer, as shown in Table 5.
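The trends in Table 5 are easiest to read as percentage changes against a baseline run. The small helper below is an illustration only: it compares the one-agent newTestDS4 row of Table 5 against its own one-computer time, which makes the slowdown with two computers visible as a negative value. This baseline is chosen for illustration and is not the nonagent baseline of Figure 11; the helper and variable names are hypothetical.

```python
def percent_reduction(baseline_s, time_s):
    """Percentage reduction of time_s relative to baseline_s (negative = slower)."""
    return 100.0 * (baseline_s - time_s) / baseline_s


# Table 5, one agent per computer, newTestDS4, for 1 to 5 computers (seconds).
ONE_AGENT_DS4 = [64.172, 76.688, 72.101, 44.509, 37.723]

if __name__ == "__main__":
    baseline = ONE_AGENT_DS4[0]
    for n_computers, t in enumerate(ONE_AGENT_DS4, start=1):
        print(f"{n_computers} computer(s): {t:7.3f} s, "
              f"{percent_reduction(baseline, t):+6.1f}% vs one computer")
```

Expressing each configuration relative to the same baseline makes the network-transfer penalty at two and three computers stand out from the gains at four and five.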

The best case of MAS-IDS processing time in comparison with the nonagent hybrid modified K-means and C4.5 is shown in Figure 11.

Finally, since the proposed system uses each CPU core to run one analysis agent, the cost of system resources is positively correlated with the number of agents. At the same time, as the number of analysis agents increases, each data subset becomes very small, so its analysis needs only one or two seconds of processing time. Consequently, the proposed system balances the physical components (the number of CPU cores) against the number of agents that can be created, as in (2). Figure 12 compares the average cost of system resources (CPU consumption) when MAS-IDS uses 5 computers with 8 analysis agents on each computer (40 agents in total) and when it uses one analysis agent on each computer (5 agents in total), on the same datasets.

As Figure 12 shows, the highest peak of CPU utilization lasts 6 s when one agent is used but only 1 s when 8 agents are used. Consequently, with a small number of agents the processing time is long but the cost in system resources is low, whereas with a larger number of agents the processing time is short but the cost is high. Memory usage did not exceed 10% in any experiment.

Figure 9: Time required to process the testing datasets in relation to the number of agents (x-axis: number of agents; y-axis: processing time (s); series: newTestDS1 to newTestDS4).

Figure 10: Time required to process the testing datasets in relation to the number of computers (x-axis: number of computers; y-axis: processing time (s); series: newTestDS1 to newTestDS4).

This experiment demonstrates that MAS-IDS has great potential to reduce IDS processing time relative to methods that do not employ agents: the reduction in processing time achieved by MAS-IDS reaches up to 70% relative to other approaches. Only five computers were used in this experiment; with a greater number of computers, a higher percentage reduction in processing time could be expected.

5. Conclusion

In this work, we have proposed a hybrid of modified K-means with C4.5 for IDS in an MAS environment. Hybrid modified K-means with C4.5 is used to improve the classification accuracy, while MAS is used to reduce the IDS processing time.

Figure 11: Comparison of MAS-IDS processing time with that of the nonagent hybrid modified K-means and C4.5 (processing time (s) for newTestDS1 to newTestDS4; series: nonagent hybrid modified K-means and C4.5, best case of MAS-IDS).

Figure 12: The cost of system resources (CPUs) (x-axis: processing time (s); y-axis: CPU utilization (%); series: 8 agents, 1 agent).

The modification of K-means is based on choosing initial cluster centroids that represent all cases in the dataset, which allows the number of clusters k to be determined. Three types of agents (coordinator, analysis, and communication agents) are used. The KDD Cup 1999 dataset is employed, and the JADE platform with five computers is used to implement the proposed method.
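The modified K-means itself is specified earlier in the paper and is not reproduced in this section. As a loose illustration of how a distance threshold can both pick initial centroids and fix k, the sketch below uses a simple leader-style rule: a record becomes a new centroid when it lies farther than the threshold from every centroid chosen so far. This rule is an assumption, not necessarily the authors' procedure, and the function name is hypothetical. Its result depends on record order and on feature scaling, which is why a threshold such as 4000 is only meaningful for a particular preprocessing.

```python
import math


def threshold_initial_centroids(points, threshold):
    """Leader-style seeding (assumed rule): keep a point as a new centroid if its
    Euclidean distance to every centroid chosen so far exceeds `threshold`.
    The number of returned centroids is the number of clusters k."""
    centroids = []
    for p in points:
        if all(math.dist(p, c) > threshold for c in centroids):
            centroids.append(p)
    return centroids


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (10, 0), (10, 1), (30, 0)]
    for t in (5, 15):
        seeds = threshold_initial_centroids(pts, t)
        print(f"threshold={t}: k={len(seeds)}, centroids={seeds}")
```

A smaller threshold yields more, tighter clusters and a larger k; a larger threshold merges nearby groups, so sweeping the threshold (as in Section 4.1) indirectly sweeps k.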

MAS-IDS demonstrated that a multiagent system has significant potential for reducing IDS processing time, achieving a reduction of up to 70%. In addition, the hybrid modified K-means with C4.5 approach performed better than the hybrid K-means and C4.5 as well as the other techniques available in Weka and Matlab. The t-test of accuracy comparing MAS-IDS with the conventional K-means and C4.5 method confirmed that the former was superior (p value of 0.00000028). This indicates that MAS-IDS has high potential to improve the performance of intrusion detection systems.

In future work, we will attempt to further improve IDS accuracy by combining the proposed method with other techniques. We will also apply our method to other datasets and to a real data network to make the system more suitable for real environments. We will use new attacks detected by the system as unknown attacks to retrain the proposed method as feedback. In addition, we expect to reduce the IDS processing time further by using a greater number of computers.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work is supported by the National University of Malaysia (UKM) Grant no. AP2013-007.

References

[1] H.-J. Liao, C.-H. Richard Lin, Y.-C. Lin, and K.-Y. Tung, “Intrusion detection system: a comprehensive review,” Journal of Network and Computer Applications, vol. 36, no. 1, pp. 16–24, 2013.

[2] N. Sengupta, J. Sen, J. Sil, and M. Saha, “Designing of on line intrusion detection system using rough set theory and Q-learning algorithm,” Neurocomputing, vol. 111, pp. 161–168, 2013.

[3] L. Koc, T. A. Mazzuchi, and S. Sarkani, “A network intrusion detection system based on a Hidden Naïve Bayes multiclass classifier,” Expert Systems with Applications, vol. 39, no. 18, pp. 13492–13500, 2012.

[4] M. Uddin, A. A. Rehman, N. Uddin, J. Memon, R. Alsaqour, and S. Kazi, “Signature-based multi-layer distributed intrusion detection system using mobile agents,” International Journal of Network Security, vol. 15, no. 2, pp. 97–105, 2013.

[5] C. N. Modi, D. R. Patel, A. Patel, and M. Rajarajan, “Integrating signature apriori based network intrusion detection system (NIDS) in cloud computing,” Procedia Technology, vol. 6, pp. 905–912, 2012.

[6] H. Mohamed, L. Adil, T. Saida et al., “A collaborative intrusion detection and prevention system in cloud computing,” in Proceedings of the IEEE (AFRICON ’13), pp. 1–5, IEEE, September 2013.

[7] S.-J. Horng, M.-Y. Su, Y.-H. Chen et al., “A novel intrusion detection system based on hierarchical clustering and support vector machines,” Expert Systems with Applications, vol. 38, no. 1, pp. 306–313, 2011.

[8] M. Chowdhary, S. Suri, and M. Bhutani, “Comparative study of intrusion detection system,” International Journal of Computer Sciences and Engineering, vol. 2, no. 4, pp. 197–200, 2014.

[9] I. Corona, G. Giacinto, and F. Roli, “Adversarial attacks against intrusion detection systems: taxonomy, solutions and open issues,” Information Sciences, vol. 239, pp. 201–225, 2013.

[10] S. Shamshirband, N. B. Anuar, M. L. M. Kiah, and A. Patel, “An appraisal and design of a multi-agent system based cooperative wireless intrusion detection computational intelligence technique,” Engineering Applications of Artificial Intelligence, vol. 26, no. 9, pp. 2105–2127, 2013.

[11] M Roesch ldquoSnortmdashlightweight intrusion detection for net-worksrdquo in Proceedings of the 13th USENIX Conference on SystemAdministration (LISA rsquo99) pp 229ndash238 1999

[12] D Barbara and S Jajodia Applications of Data Mining inComputer Security Springer 2002

[13] P Natesan P Balasubramanie and G Gowrison ldquoImprovingthe attack detection rate in network intrusion detection usingadaboost algorithmrdquo Journal of Computer Science vol 8 no 7pp 1041ndash1048 2012

[14] A Bivens C Palagiri R Smith B Szymanski and MEmbrechts ldquoNetwork-based intrusion detection using neuralnetworksrdquo in Proceedings of the Intelligent Engineering SystemsthroughArtificial Neural Networks vol 12 pp 579ndash584 Novem-ber 2002

[15] Y Li and W Jie ldquoThe method of network intrusion detectionbased on the neural network GCBP algorithmrdquo in Proceedingsof the International Conference on Computer Science and Infor-mation Processing (CSIP rsquo12) pp 1082ndash1086 IEEE August 2012

[16] J Lin T Huang and B Zhao ldquoA fast fuzzy set intrusiondetection modelrdquo in International Symposium on KnowledgeAcquisition and Modeling (KAM rsquo08) pp 601ndash605 December2008

[17] A Abraham R Jain J Thomas and S Y Han ldquoD-SCIDSdistributed soft computing intrusion detection systemrdquo Journalof Network and Computer Applications vol 30 no 1 pp 81ndash982007

[18] V V Kumari S Pamidi and A Govardhan ldquoIntegrated Bayesnetwork and hidden Markov model for host based IDSrdquoInternational Journal of Computer Applications vol 41 no 20pp 45ndash49 2012

[19] M A Hasan M Nasser B Pal and S Ahmad ldquoSupportvector machine and random forest modeling for intrusiondetection system (IDS)rdquo Journal of Intelligent Learning Systemsand Applications vol 6 no 1 pp 45ndash52 2014

[20] C Xiang P C Yong and L S Meng ldquoDesign of multiple-levelhybrid classifier for intrusion detection system using Bayesianclustering and decision treesrdquo Pattern Recognition Letters vol29 no 7 pp 918ndash924 2008

[21] M. N. Huhns, Distributed Artificial Intelligence, Elsevier, 2012.

[22] S. J. Stolfo, A. L. Prodromidis, S. Tselepis et al., “JAM: java agents for meta-learning over distributed databases,” in Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD ’97), pp. 74–81, 1997.

[23] P. Kannadiga and M. Zulkernine, “DIDMA: a distributed intrusion detection system using mobile agents,” in Proceedings of the 6th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing and 1st ACIS International Workshop on Self-Assembling Wireless Networks (SNPD/SAWN ’05), pp. 238–245, IEEE, May 2005.

[24] L. Portnoy, Intrusion Detection with Unlabeled Data Using Clustering, 2000.

[25] M. Jianliang, S. Haikun, and B. Ling, “The application on intrusion detection based on K-means cluster algorithm,” in Proceedings of the International Forum on Information Technology and Applications (IFITA ’09), vol. 1, pp. 150–152, IEEE, Chengdu, China, May 2009.

[26] M. Sabhnani and G. Serpen, “Application of machine learning algorithms to KDD intrusion detection dataset within misuse detection context,” in Proceedings of the International Conference on Machine Learning, Models, Technologies and Applications (MLMTA ’03), pp. 209–215, June 2003.

[27] G. Munz, S. Li, and G. Carle, “Traffic anomaly detection using k-means clustering,” in Proceedings of the GI/ITG Workshop MMBnet, 2007.

[28] V. Kumar, H. Chauhan, and D. Panwar, “K-means clustering approach to analyze NSL-KDD intrusion detection dataset,” International Journal of Soft Computing and Engineering, vol. 3, no. 4, pp. 1–4, 2013.

[29] S. Chawla and A. Gionis, “k-means--: a unified approach to clustering and outlier detection,” in Proceedings of the SIAM International Conference on Data Mining (SDM ’13), pp. 189–197, SIAM, 2013.

[30] A. P. Muniyandi, R. Rajeswari, and R. Rajaram, “Network anomaly detection by cascading K-means clustering and C4.5 decision tree algorithm,” Procedia Engineering, vol. 30, pp. 174–182, 2012.

[31] L. Xiao, Z. Shao, and G. Liu, “K-means algorithm based on particle swarm optimization algorithm for anomaly intrusion detection,” in Proceedings of the 6th World Congress on Intelligent Control and Automation (WCICA ’06), pp. 5854–5858, IEEE, June 2006.

[32] Z. Muda, W. Yassin, M. N. Sulaiman, and N. I. Udzir, “Intrusion detection based on K-Means clustering and Naïve Bayes classification,” in Proceedings of the 7th International Conference on Information Technology in Asia (CITA ’11), pp. 1–6, IEEE, July 2011.

[33] H.-B. Wang, H.-L. Yang, Z.-J. Xu, and Z. Yuan, “A clustering algorithm use SOM and K-means in intrusion detection,” in Proceedings of the 1st International Conference on E-Business and E-Government (ICEE ’10), pp. 1281–1284, May 2010.

[34] A M Chandrasekhar and K Raghuveer ldquoIntrusion detectiontechnique by using k-means fuzzy neural network and SVMclassifiersrdquo in Proceedings of the 3rd International Conference onComputer Communication and Informatics (ICCCIrsquo 13) pp 1ndash3January 2013

[35] R. Goel, A. Sardana, and R. C. Joshi, “Parallel misuse and anomaly detection model,” International Journal of Network Security, vol. 14, no. 4, pp. 211–222, 2012.

[36] O. Depren, M. Topallar, E. Anarim, and M. K. Ciliz, “An intelligent intrusion detection system (IDS) for anomaly and misuse detection in computer networks,” Expert Systems with Applications, vol. 29, no. 4, pp. 713–722, 2005.

[37] A. S. A. Aziz, A. E. Hassanien, S. E.-O. Hanaf, and M. Tolba, “Multi-layer hybrid machine learning techniques for anomalies detection and classification approach,” in Proceedings of the 13th International Conference on Hybrid Intelligent Systems (HIS ’13), pp. 215–220, IEEE, Gammarth, Tunisia, December 2013.

[38] M. Ektefa, S. Memar, F. Sidi, and L. S. Affendey, “Intrusion detection using data mining techniques,” in Proceedings of the International Conference on Information Retrieval and Knowledge Management: Exploring the Invisible World (CAMP ’10), pp. 200–203, IEEE, March 2010.

[39] G. MeeraGandhi, K. Appavoo, and S. Srivasta, “Effective network intrusion detection using classifiers decision trees and decision rules,” International Journal of Advanced Networking and Applications, vol. 2, no. 3, pp. 686–692, 2010.

[40] H. Chauhan, V. Kumar, S. Pundir, and E. S. Pilli, “A comparative study of classification techniques for intrusion detection,” in Proceedings of the International Symposium on Computational and Business Intelligence (ISCBI ’13), pp. 40–43, IEEE, August 2013.

[41] C. Katar, “Combining multiple techniques for intrusion detection,” International Journal of Computer Science and Network Security, vol. 6, no. 2B, pp. 208–218, 2006.

[42] S. R. Gaddam, V. V. Phoha, and K. S. Balagani, “K-means+id3: a novel method for supervised anomaly detection by cascading k-means clustering and id3 decision tree learning methods,” IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 3, pp. 345–354, 2007.

[43] D. Dasgupta, F. Gonzalez, K. Yallapu, J. Gomez, and R. Yarramsettii, “CIDS: an agent-based intrusion detection system,” Computers & Security, vol. 24, no. 5, pp. 387–398, 2005.

[44] D. L. Hancock and G. B. Lamont, “Multi agent system for network attack classification using flow-based intrusion detection,” in IEEE Congress of Evolutionary Computation (CEC ’11), pp. 1535–1542, June 2011.

[45] X. Zhu, Z. Huang, and H. Zhou, “Design of a multi-agent based intelligent intrusion detection system,” in Proceedings of the 1st International Symposium on Pervasive Computing and Applications (SPCA ’06), pp. 290–295, August 2006.

[46] M. El Ajjouri, S. Benhadou, and H. Medromi, “Intelligent architecture based on MAS and CBR for intrusion detection,” in Proceedings of the 4th Edition of National Security Days (JNS4), pp. 1–4, IEEE, May 2014.

[47] J. Yang, X. Liu, T. Li, G. Liang, and S. Liu, “Distributed agents model for intrusion detection based on AIS,” Knowledge-Based Systems, vol. 22, no. 2, pp. 115–119, 2009.

[48] J. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297, Berkeley, Calif, USA, 1967.

[49] J. M. Pena, J. A. Lozano, and P. Larranaga, “An empirical comparison of four initialization methods for the K-Means algorithm,” Pattern Recognition Letters, vol. 20, no. 10, pp. 1027–1040, 1999.

[50] G. H. Ball and D. J. Hall, “A clustering technique for summarizing multivariate data,” Behavioral Science, vol. 12, no. 2, pp. 153–155, 1967.

[51] I. Katsavounidis, C.-C. J. Kuo, and Z. Zhang, “New initialization technique for generalized Lloyd iteration,” IEEE Signal Processing Letters, vol. 1, no. 10, pp. 144–146, 1994.

[52] M. D. B. Al-Daoud, “A new algorithm for cluster initialization,” in Proceedings of the WEC’05: The 2nd World Enformatika Conference, 2007.

[53] D. Arthur and S. Vassilvitskii, “k-means++: the advantages of careful seeding,” in Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035, Society for Industrial and Applied Mathematics, New Orleans, La, USA, January 2007.

[54] M. Erisoglu, N. Calis, and S. Sakallioglu, “A new algorithm for initial cluster centers in k-means algorithm,” Pattern Recognition Letters, vol. 32, no. 14, pp. 1701–1705, 2011.

[55] L. Yongzhong, Y. Ge, X. Jing et al., “Anomaly detection for clustering algorithm based on particle swarm optimization,” Journal of Jiangsu University of Science and Technology (Natural Science Edition), vol. 23, no. 1, pp. 51–55, 2009.

[56] W. Cong, J. Morris, and W. Xiaojun, “High performance deep packet inspection on multi-core platform,” in Proceedings of the 2nd IEEE International Conference on Broadband Network and Multimedia Technology (IC-BNMT ’09), pp. 619–622, October 2009.

[57] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.

[58] S. Ruggieri, “Efficient C4.5 [classification algorithm],” IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 2, pp. 438–444, 2002.

[59] X. Wu and V. Kumar, The Top Ten Algorithms in Data Mining, CRC Press, New York, NY, USA, 2010.

[60] KDD Cup 1999, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.

[61] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, “A detailed analysis of the KDD CUP 99 data set,” in Proceedings of the 2nd IEEE Symposium on Computational Intelligence for Security and Defence Applications, pp. 1–6, IEEE, July 2009.

[62] A. K. Jain, “Data clustering: 50 years beyond K-means,” Pattern Recognition Letters, vol. 31, no. 8, pp. 651–666, 2010.


Page 8: Research Article Hybrid Modified -Means with C4.5 for Intrusion ...downloads.hindawi.com/journals/tswj/2015/294761.pdf · Hybrid Modified -Means with C4.5 for Intrusion Detection


Figure 4: ROC curve for testDS1 (true positive rate versus false positive rate; hybrid modified K-means and C4.5 versus hybrid K-means and C4.5).

Figure 5: ROC curve for testDS2 (true positive rate versus false positive rate; hybrid modified K-means and C4.5 versus hybrid K-means and C4.5).

Figure 6: ROC curve for testDS3 (true positive rate versus false positive rate; hybrid modified K-means and C4.5 versus hybrid K-means and C4.5).

The ROC curves in Figures 4, 5, 6, and 7 compare the performance of the proposed method with that of hybrid K-means with C4.5 [30].

Figure 7: ROC curve for testDS4 (true positive rate versus false positive rate; hybrid modified K-means and C4.5 versus hybrid K-means and C4.5).

Table 2: The accuracy results of the MAS-IDS versus hybrid K-means and C4.5 [30].

Training dataset   Testing dataset   MAS-IDS    Hybrid K-means and C4.5
trainDS1           testDS1           0.9031     0.8993
trainDS1           testDS2           0.91745    0.916021
trainDS1           testDS3           0.9126     0.909633
trainDS1           testDS4           0.9147     0.911625
trainDS2           testDS1           0.8874     0.8742
trainDS2           testDS2           0.9107     0.9008
trainDS2           testDS3           0.904367   0.894967
trainDS2           testDS4           0.90545    0.89475
trainDS3           testDS1           0.9058     0.8932
trainDS3           testDS2           0.92225    0.9135
trainDS3           testDS3           0.91667    0.9049
trainDS3           testDS4           0.916125   0.90805
trainDS4           testDS1           0.9099     0.8963
trainDS4           testDS2           0.9215     0.90815
trainDS4           testDS3           0.9155     0.9016
trainDS4           testDS4           0.918025   0.90625

p value = 0.00000028

According to the ROC curves, the proposed hybrid modified K-means with C4.5 in MAS-IDS achieved better results than the conventional method. The t-test shows that MAS-IDS significantly improved accuracy (p value = 0.00000028 < 0.05). To obtain these results, the model built from each training dataset was used to classify all testing datasets presented in Table 1. Table 2 compares the accuracy of MAS-IDS with that of hybrid K-means with C4.5. Table 3 shows the average results of the MAS-IDS evaluation, together with a comparison against the conventional method and other methods from Weka and Matlab.
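As a check on these statistics, the sketch below recomputes the averages of Table 2 and a t-test over its 16 train/test pairs. The text says only that a t-test was used; treating the two columns as paired samples and using a two-tailed test is our assumption, and the function names are illustrative. The p-value comes from the regularized incomplete beta function, integrated numerically with Simpson's rule so that only the Python standard library is needed.

```python
import math

# Accuracies from Table 2, ordered trainDS1..trainDS4, each with testDS1..testDS4.
MAS_IDS = [0.9031, 0.91745, 0.9126, 0.9147,
           0.8874, 0.9107, 0.904367, 0.90545,
           0.9058, 0.92225, 0.91667, 0.916125,
           0.9099, 0.9215, 0.9155, 0.918025]
HYBRID = [0.8993, 0.916021, 0.909633, 0.911625,
          0.8742, 0.9008, 0.894967, 0.89475,
          0.8932, 0.9135, 0.9049, 0.90805,
          0.8963, 0.90815, 0.9016, 0.90625]


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for the differences a - b."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n), n - 1


def two_tailed_p(t, df, steps=2000):
    """P(|T| > |t|) for Student's t with df degrees of freedom.

    Uses P = I_x(df/2, 1/2) with x = df / (df + t^2), where I is the
    regularized incomplete beta function; the integral is evaluated with
    Simpson's rule (steps must be even).
    """
    a, b = df / 2, 0.5
    x = df / (df + t * t)
    h = x / steps
    f = lambda u: u ** (a - 1) * (1 - u) ** (b - 1)
    s = f(0.0) + f(x) + sum((4 if i % 2 else 2) * f(i * h) for i in range(1, steps))
    beta = math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))
    return (s * h / 3) / beta


if __name__ == "__main__":
    print("mean MAS-IDS accuracy:", round(sum(MAS_IDS) / len(MAS_IDS), 4))
    print("mean hybrid accuracy: ", round(sum(HYBRID) / len(HYBRID), 4))
    t, df = paired_t(MAS_IDS, HYBRID)
    print(f"t = {t:.3f}, df = {df}, p = {two_tailed_p(t, df):.2e}")
```

Run as a script, this prints mean accuracies of 0.9113 and 0.9021, matching Table 3, and a paired t statistic of about 8.7 with 15 degrees of freedom, whose two-tailed p-value is close to the reported 0.00000028. That agreement suggests, but does not prove, that a paired two-tailed test was used.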

As can be seen from Table 3, the MAS-IDS approach achieves the highest accuracy, detection rate, and F-measure. However, its false alarm rate, precision, and specificity are not superior to those achieved by the other methods, especially the Decision Table, which produces the best values for these three measures. In state-of-the-art methods, IDS performance is usually measured by accuracy, since the error rate is simply its complement. Thus, when comparing the methods, we adopt accuracy as the primary measure. On this basis, the performance of our MAS-IDS is superior to that of the other methods, as shown in Table 3. More specifically, the average MAS-IDS accuracy computed over all testing datasets and all training datasets is 0.9113, which is greater than that achieved by any other method. Figure 8 shows the performance of all methods using the data given in Table 3.

Table 3: Comparison of the MAS-IDS performance with other methods using different measures.

Method                            Accuracy  DR      FAR     Precision  Specificity  F-measure
MAS-IDS                           0.9113    0.8526  0.0299  0.9665     0.9701       0.9056
Hybrid K-means and C4.5 (2012)    0.9021    0.8394  0.0353  0.9600     0.9647       0.8954
Bayes Net                         0.9017    0.8177  0.0142  0.9829     0.9858       0.8926
Naïve Bayes                       0.8150    0.7727  0.1427  0.8578     0.8573       0.8076
SMO                               0.8805    0.7785  0.0174  0.9781     0.9826       0.8664
IBk                               0.8886    0.7962  0.0190  0.9766     0.9810       0.8771
J48                               0.8513    0.7298  0.0273  0.9638     0.9727       0.8299
NBTree                            0.9007    0.8096  0.0081  0.9900     0.9919       0.8903
Decision Table                    0.8306    0.6631  0.0020  0.9970     0.9980       0.7956
JRip                              0.8377    0.6983  0.0229  0.9682     0.9771       0.8111
LibSVM                            0.7964    0.8120  0.2191  0.8169     0.7809       0.8068

Figure 8: Comparison of the performance (accuracy, DR, and FAR) of MAS-IDS with other methods (hybrid K-means and C4.5, Bayes Net, Naïve Bayes, SMO, IBk, J48, NBTree, Decision Table, JRip, and LibSVM).
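The measures in Table 3 are not defined in this section. The sketch below uses the standard binary definitions, treating attack as the positive class; we assume, without this section confirming it, that the paper follows these conventions. The class and function names are illustrative, and the confusion-matrix counts in the usage example are hypothetical. The tabulated values are at least consistent with these definitions: in every row of Table 3, specificity equals 1 − FAR.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion matrix with 'attack' as the positive class."""
    tp: int  # attack records classified as attack
    fn: int  # attack records classified as normal (missed attacks)
    fp: int  # normal records classified as attack (false alarms)
    tn: int  # normal records classified as normal


def ids_metrics(c: Confusion) -> dict:
    """Accuracy, DR, FAR, precision, specificity, and F-measure."""
    dr = c.tp / (c.tp + c.fn)        # detection rate (recall, true positive rate)
    far = c.fp / (c.fp + c.tn)       # false alarm rate (false positive rate)
    precision = c.tp / (c.tp + c.fp)
    return {
        "accuracy": (c.tp + c.tn) / (c.tp + c.fn + c.fp + c.tn),
        "dr": dr,
        "far": far,
        "precision": precision,
        "specificity": c.tn / (c.fp + c.tn),  # equals 1 - far
        "f_measure": 2 * precision * dr / (precision + dr),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not taken from the paper).
    m = ids_metrics(Confusion(tp=85, fn=15, fp=3, tn=97))
    for name, value in m.items():
        print(f"{name:12s} {value:.4f}")
```

Applying the F-measure formula to the averaged MAS-IDS precision (0.9665) and DR (0.8526) gives about 0.9060 rather than the tabulated 0.9056, presumably because the table averages per-run F-measures instead of recomputing them from averaged inputs.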

4.3. MAS-IDS Processing Time

The last experiment demonstrates the strength of the MAS-IDS in reducing the data classification processing time by using a multiagent system. In this experiment, five of the previously specified computers were used. In addition, we used the fourth training dataset (trainDS4) from Table 1 with four new large testing datasets to evaluate the ability of MAS-IDS to process large datasets in less time. Table 4 shows the characteristics of the new testing datasets.

Table 4: Characteristics of new testing datasets used to evaluate the MAS-IDS processing time.

Dataset      Normal   DoS      Probe   R2L     U2R     Total
newTestDS1   35000    35000    10000   10000   10000   100000
newTestDS2   70000    70000    20000   25000   15000   200000
newTestDS3   100000   100000   30000   50000   20000   300000
newTestDS4   150000   150000   30000   50000   20000   400000

To show the ability of the MAS-IDS to reduce the processing time, the MAS-IDS approach is compared with the nonagent hybrid modified K-means and C4.5. Here, MAS-IDS is rerun every time a new computer is added. In other words, MAS-IDS initially runs on one computer; when a second computer is added, it runs on both, and so on until all five computers are used. Table 5 shows the processing time of this experiment. The maximum number of agents that can run on each computer is eight, because each computer has eight CPU cores and each core runs only one agent at a time. The training time of this experiment is 8814 s. Note that the coordinator agent runs on the first computer of the system environment.
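The coordinator agent's job of dividing a testing dataset among the analysis agents can be pictured with the sketch below. It assumes an equal, contiguous split with at most one analysis agent per CPU core (eight per computer in this experiment); the paper's own allocation rule, its Eq. (2), is not reproduced in this section, so `partition` and its parameters are hypothetical.

```python
def partition(n_records, computers, agents_per_computer, cores_per_computer=8):
    """Hypothetical helper: split record indices [0, n_records) into one
    contiguous, near-equal slice per analysis agent.

    Returns (computer_index, start, end) triples, with end exclusive.
    """
    if not 1 <= agents_per_computer <= cores_per_computer:
        raise ValueError("at most one analysis agent per CPU core")
    n_agents = computers * agents_per_computer
    base, extra = divmod(n_records, n_agents)
    slices, start = [], 0
    for i in range(n_agents):
        size = base + (1 if i < extra else 0)  # spread any remainder over the first agents
        slices.append((i // agents_per_computer, start, start + size))
        start += size
    return slices


if __name__ == "__main__":
    # newTestDS4 (400000 records) on 5 computers with 8 analysis agents each.
    parts = partition(400_000, computers=5, agents_per_computer=8)
    print(len(parts), "subsets; first:", parts[0], "last:", parts[-1])
```

For newTestDS4 on five computers with eight agents each, this yields 40 subsets of 10000 records, which shows how adding agents shrinks the amount of data each analysis agent must classify.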

The results presented in Table 5 depend on the number of computers and the number of agents. The upper-left entries of Table 5 correspond to one computer running one agent; this is the worst case and should be compared with the nonagent hybrid modified K-means with C4.5. The lower-right entries of Table 5 represent the best case of MAS-IDS (maximum number of computers and agents). Furthermore, the data show that with two computers MAS-IDS achieves no improvement relative to the other approaches, because transferring data through the network adds to the processing time. This problem is mitigated as more computers are introduced. Nonetheless, the MAS-IDS processing time for a large dataset such as newTestDS4 remains high, because the dataset subsets are still large and take a long time to transfer to the other computers through the network. This problem is eliminated when a large number of computers is employed, since the dataset is then divided into smaller subsets. Finally, the MAS-IDS processing time decreases with the addition of each new computer as the number of agents also increases. Network specifications such as bandwidth and speed play an important role in reducing the MAS-IDS processing time. Figures 9 and 10 show the effect of increasing the number of agents and the number of computers, respectively, on the MAS-IDS processing time. In Figure 9, five computers are used, while in Figure 10 only one agent is used, as shown in Table 5.

Table 5: Comparison of processing time required by MAS-IDS and the nonagent hybrid modified K-means and C4.5.

Processing time in seconds with 1, 2, 3, 4, and 5 computers:

Agents  Testing dataset   1        2        3        4        5
1       newTestDS1        15.349   13.57    13.219   7.841    6.972
1       newTestDS2        31.607   26.554   26.228   15.309   13.424
1       newTestDS3        45.893   41.587   35.910   23.462   20.162
1       newTestDS4        64.172   76.688   72.101   44.509   37.723
2       newTestDS1        8.852    8.304    7.102    5.843    4.630
2       newTestDS2        17.339   17.883   16.310   13.160   10.110
2       newTestDS3        26.184   27.500   22.502   16.931   15.313
2       newTestDS4        40.226   64.332   49.853   37.263   36.832
3       newTestDS1        8.121    8.242    6.940    5.34     4.575
3       newTestDS2        13.472   12.767   11.743   9.957    8.374
3       newTestDS3        20.735   22.689   20.186   15.788   14.421
3       newTestDS4        32.604   52.819   48.814   36.589   35.536
4       newTestDS1        6.699    6.818    5.891    4.631    3.854
4       newTestDS2        11.776   14.209   11.104   9.375    8.116
4       newTestDS3        18.613   20.927   17.613   14.490   13.413
4       newTestDS4        29.399   51.852   46.811   35.564   34.90
5       newTestDS1        6.146    6.659    5.579    4.555    3.715
5       newTestDS2        11.534   13.685   10.580   9.198    7.924
5       newTestDS3        17.568   20.662   17.497   14.44    13.62
5       newTestDS4        29.349   51.527   43.956   34.521   32.349
6       newTestDS1        6.68     6.601    5.393    4.465    3.224
6       newTestDS2        11.318   12.922   10.531   8.980    7.567
6       newTestDS3        17.419   20.203   16.938   15.788   12.656
6       newTestDS4        28.660   50.67    41.475   32.390   31.989
7       newTestDS1        5.871    6.443    5.272    4.258    3.150
7       newTestDS2        11.134   12.787   10.494   8.680    7.685
7       newTestDS3        17.223   20.683   16.556   15.178   12.362
7       newTestDS4        28.438   45.539   39.859   30.447   30.20
8       newTestDS1        5.481    6.204    4.289    4.84     3.89
8       newTestDS2        11.011   13.24    9.913    8.369    7.399
8       newTestDS3        17.150   19.713   16.483   14.81    11.181
8       newTestDS4        27.303   40.39    38.156   30.196   29.851
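The worst and best cases described above can be compared directly. The sketch below computes the percentage reduction in processing time from the upper-left cell of Table 5 (one agent on one computer) to the lower-right cell (eight agents on each of five computers); the dictionary and function names are illustrative. This baseline differs from the nonagent method of Figure 11, whose times Table 5 does not list, so these figures are not expected to equal the reported 70%.

```python
# Processing times in seconds, read from the corner cells of Table 5.
WORST = {  # 1 agent on 1 computer
    "newTestDS1": 15.349, "newTestDS2": 31.607,
    "newTestDS3": 45.893, "newTestDS4": 64.172,
}
BEST = {  # 8 agents on each of 5 computers
    "newTestDS1": 3.89, "newTestDS2": 7.399,
    "newTestDS3": 11.181, "newTestDS4": 29.851,
}


def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction in processing time from `before` to `after`."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    for name in WORST:
        print(f"{name}: {reduction_pct(WORST[name], BEST[name]):.1f}% less time")
```

This gives reductions of 74.7%, 76.6%, 75.6%, and 53.5% for newTestDS1 to newTestDS4. Note also that the lower-right cell is not always the fastest: for newTestDS1, Table 5 lists 3.150 s with seven agents on five computers versus 3.89 s with eight.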

The best case of MAS-IDS processing time in comparison with the nonagent hybrid modified K-means and C4.5 is shown in Figure 11.

Finally, since the proposed system uses each CPU core to run one analysis agent, the cost of system resources is positively correlated with the number of agents. At the same time, as the number of analysis agents increases, the data subset analyzed by each agent becomes very small, so the analysis needs only one or two seconds of processing time. Consequently, the proposed system balances the physical components (the number of CPU cores) against the number of agents that can be created, as in (2). Figure 12 compares the average cost of system resources (CPU consumption) when MAS-IDS uses five computers with eight analysis agents on each computer (40 agents in total) and when it uses one analysis agent on each computer (five agents in total) on the same datasets.

As Figure 12 shows, the highest peak of CPU utilization lasts 6 s with one agent, longer than the 1 s it lasts with eight agents. Consequently, when the number of agents is small, the processing time is long and the cost to the system is low, whereas as the number of agents increases, the processing time becomes short and the cost to the system becomes high. The memory cost of system resources did not exceed 10% in any experiment.

Figure 9: Time required to process the testing datasets in relation to the number of agents (processing time in s versus number of agents; newTestDS1 to newTestDS4).

Figure 10: Time required to process the testing datasets in relation to the number of computers (processing time in s versus number of computers; newTestDS1 to newTestDS4).

This experiment demonstrates that MAS-IDS has great potential to reduce the IDS processing time relative to methods that do not employ agents. The reduction in processing time achieved by MAS-IDS reaches up to 70% relative to other approaches. In this experiment, we used only five computers; with a greater number of computers, a higher percentage reduction in processing time could be achieved.

5. Conclusion

In this work, we have proposed a hybrid modified K-means with C4.5 for IDS in a MAS environment. The hybrid modified K-means with C4.5 is used to improve the classification accuracy, while MAS is used to reduce the processing time of IDS.

Figure 11: Comparison of MAS-IDS processing time with that of the nonagent hybrid modified K-means and C4.5 (processing time in s for newTestDS1 to newTestDS4; nonagent hybrid modified K-means and C4.5 versus best case of MAS-IDS).

Figure 12: The cost of system resources (CPUs) (CPU utilization in % versus processing time in s; 8 agents versus 1 agent).

The modification of K-means is based on choosing initial cluster centroids that represent all cases of the dataset, which allows the number of clusters k to be determined. Three types of agents (coordinator, analysis, and communication agent) are used. The KDD Cup 1999 dataset is employed, and the JADE platform with five computers is used to implement the proposed method.

MAS-IDS demonstrated that a multiagent system has significant potential for reducing the IDS processing time: a reduction of up to 70% was achieved. In addition, the hybrid modified K-means with C4.5 approach achieved higher accuracy than the hybrid K-means and C4.5, as well as the other techniques available in Weka and Matlab. The t-test of accuracy comparing MAS-IDS with the conventional K-means and C4.5 method confirmed that the former is superior (p value = 0.00000028). This indicates that MAS-IDS has high potential to improve the performance of intrusion detection systems.

In future work, we will attempt to improve the IDS accuracy further by combining the proposed method with other techniques. We will also try to apply our method to other datasets and to a real data network to make the system more suitable for real environments. New attacks that the system detects as unknown will be used as feedback to retrain the proposed method. In addition, we expect to reduce the IDS processing time further by using a greater number of computers.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work is supported by National University of Malaysia (UKM) Grant no. AP2013-007.

References

[1] H.-J. Liao, C.-H. Richard Lin, Y.-C. Lin, and K.-Y. Tung, “Intrusion detection system: a comprehensive review,” Journal of Network and Computer Applications, vol. 36, no. 1, pp. 16–24, 2013.

[2] N. Sengupta, J. Sen, J. Sil, and M. Saha, “Designing of on line intrusion detection system using rough set theory and Q-learning algorithm,” Neurocomputing, vol. 111, pp. 161–168, 2013.

[3] L. Koc, T. A. Mazzuchi, and S. Sarkani, “A network intrusion detection system based on a Hidden Naïve Bayes multiclass classifier,” Expert Systems with Applications, vol. 39, no. 18, pp. 13492–13500, 2012.

[4] M. Uddin, A. A. Rehman, N. Uddin, J. Memon, R. Alsaqour, and S. Kazi, “Signature-based multi-layer distributed intrusion detection system using mobile agents,” International Journal of Network Security, vol. 15, no. 2, pp. 97–105, 2013.

[5] C. N. Modi, D. R. Patel, A. Patel, and M. Rajarajan, “Integrating signature apriori based network intrusion detection system (NIDS) in cloud computing,” Procedia Technology, vol. 6, pp. 905–912, 2012.

[6] H. Mohamed, L. Adil, T. Saida et al., “A collaborative intrusion detection and prevention system in cloud computing,” in Proceedings of the IEEE (AFRICON ’13), pp. 1–5, IEEE, September 2013.

[7] S.-J. Horng, M.-Y. Su, Y.-H. Chen et al., “A novel intrusion detection system based on hierarchical clustering and support vector machines,” Expert Systems with Applications, vol. 38, no. 1, pp. 306–313, 2011.

[8] M. Chowdhary, S. Suri, and M. Bhutani, “Comparative study of intrusion detection system,” International Journal of Computer Sciences and Engineering, vol. 2, no. 4, pp. 197–200, 2014.

[9] I. Corona, G. Giacinto, and F. Roli, “Adversarial attacks against intrusion detection systems: taxonomy, solutions and open issues,” Information Sciences, vol. 239, pp. 201–225, 2013.

[10] S. Shamshirband, N. B. Anuar, M. L. M. Kiah, and A. Patel, “An appraisal and design of a multi-agent system based cooperative wireless intrusion detection computational intelligence technique,” Engineering Applications of Artificial Intelligence, vol. 26, no. 9, pp. 2105–2127, 2013.

[11] M Roesch ldquoSnortmdashlightweight intrusion detection for net-worksrdquo in Proceedings of the 13th USENIX Conference on SystemAdministration (LISA rsquo99) pp 229ndash238 1999

[12] D Barbara and S Jajodia Applications of Data Mining inComputer Security Springer 2002

[13] P Natesan P Balasubramanie and G Gowrison ldquoImprovingthe attack detection rate in network intrusion detection usingadaboost algorithmrdquo Journal of Computer Science vol 8 no 7pp 1041ndash1048 2012

[14] A Bivens C Palagiri R Smith B Szymanski and MEmbrechts ldquoNetwork-based intrusion detection using neuralnetworksrdquo in Proceedings of the Intelligent Engineering SystemsthroughArtificial Neural Networks vol 12 pp 579ndash584 Novem-ber 2002

[15] Y Li and W Jie ldquoThe method of network intrusion detectionbased on the neural network GCBP algorithmrdquo in Proceedingsof the International Conference on Computer Science and Infor-mation Processing (CSIP rsquo12) pp 1082ndash1086 IEEE August 2012

[16] J Lin T Huang and B Zhao ldquoA fast fuzzy set intrusiondetection modelrdquo in International Symposium on KnowledgeAcquisition and Modeling (KAM rsquo08) pp 601ndash605 December2008

[17] A Abraham R Jain J Thomas and S Y Han ldquoD-SCIDSdistributed soft computing intrusion detection systemrdquo Journalof Network and Computer Applications vol 30 no 1 pp 81ndash982007

[18] V V Kumari S Pamidi and A Govardhan ldquoIntegrated Bayesnetwork and hidden Markov model for host based IDSrdquoInternational Journal of Computer Applications vol 41 no 20pp 45ndash49 2012

[19] M A Hasan M Nasser B Pal and S Ahmad ldquoSupportvector machine and random forest modeling for intrusiondetection system (IDS)rdquo Journal of Intelligent Learning Systemsand Applications vol 6 no 1 pp 45ndash52 2014

[20] C Xiang P C Yong and L S Meng ldquoDesign of multiple-levelhybrid classifier for intrusion detection system using Bayesianclustering and decision treesrdquo Pattern Recognition Letters vol29 no 7 pp 918ndash924 2008

[21] M N Huhns Distributed Artificial Intelligence Elsevier 2012[22] S J Stolfo A L Prodromidis S Tselepis et al ldquoJAM java agents

for meta-learning over distributed databasesrdquo in Proceedings ofthe 3rd International Conference on Knowledge Discovery andData Mining (KDD rsquo97) pp 74ndash81 1997

[23] P Kannadiga andM Zulkernine ldquoDIDMA a distributed intru-sion detection system usingmobile agentsrdquo in Proceedings of the6th International Conference on Software Engineering ArtificialIntelligence Networking and ParallelDistributedComputing and1st ACIS International Workshop on Self-Assembling WirelessNetworks (SNPDSAWN rsquo05) pp 238ndash245 IEEE May 2005

[24] L Portnoy Intrusion Detection with Unlabeled Data UsingClustering 2000

[25] M Jianliang S Haikun and B Ling ldquoThe application onintrusion detection based on K-means cluster algorithmrdquo inProceedings of the International Forum on Information Tech-nology and Applications (IFITA rsquo09) vol 1 pp 150ndash152 IEEEChengdu China May 2009

[26] M Sabhnani and G Serpen ldquoApplication of machine learn-ing algorithms to KDD intrusion detection dataset withinmisuse detection contextrdquo in Proceedings of the InternationalConference on Machine Learning Models Technologies andApplications (MLMTA rsquo03) pp 209ndash215 June 2003

[27] G Munz S Li and G Carle ldquoTraffic anomaly detection usingk-means clusteringrdquo in Proceedings of the GIITG WorkshopMMBnet 2007

[28] V Kumar H Chauhan and D Panwar ldquoK-means clusteringapproach to analyze NSL-KDD intrusion detection datasetrdquo

The Scientific World Journal 13

International Journal of Soft Computing and Engineering vol 3no 4 pp 1ndash4 2013

[29] S Chawla and A Gionis ldquok-means- a unified approach toclustering and outlier detectionrdquo in Proceedings of the SIAMInternational Conference onDataMining (SDM 13) pp 189ndash197SIAM 2013

[30] A P Muniyandi R Rajeswari and R Rajaram ldquoNetworkanomaly detection by cascading K-means clustering and C45decision Tree algorithmrdquo Procedia Engineering vol 30 pp 174ndash182 2012

[31] L Xiao Z Shao and G Liu ldquoK-means algorithm based onparticle swarm optimization algorithm for anomaly intrusiondetectionrdquo inProceedings of the 6thWorldCongress on IntelligentControl and Automation (WCICA rsquo06) pp 5854ndash5858 IEEEJune 2006

[32] Z MudaW Yassin M N Sulaiman and N I Udzir ldquoIntrusiondetection based on K-Means clustering and Naıve Bayes classi-ficationrdquo in Proceedings of the 7th International Conference onInformation Technology in Asia (CITA rsquo11) pp 1ndash6 IEEE July2011

[33] H-B Wang H-L Yang Z-J Xu and Z Yuan ldquoA clusteringalgorithm use SOM and K-means in intrusion detectionrdquo inProceedings of the 1st International Conference on E-Business andE-Government (ICEE rsquo10) pp 1281ndash1284 May 2010

[34] A M Chandrasekhar and K Raghuveer ldquoIntrusion detectiontechnique by using k-means fuzzy neural network and SVMclassifiersrdquo in Proceedings of the 3rd International Conference onComputer Communication and Informatics (ICCCIrsquo 13) pp 1ndash3January 2013

[35] R Goel A Sardana and R C Joshi ldquoParallel misuse andanomaly detection modelrdquo International Journal of NetworkSecurity vol 14 no 4 pp 211ndash222 2012

[36] O Depren M Topallar E Anarim and M K Ciliz ldquoAnintelligent intrusion detection system (IDS) for anomaly andmisuse detection in computer networksrdquo Expert Systems withApplications vol 29 no 4 pp 713ndash722 2005

[37] A S A Aziz A E Hassanien S E-O Hanaf and M TolbaldquoMulti-layer hybrid machine learning techniques for anomaliesdetection and classification approachrdquo in Proceedings of the 13thInternational Conference on Hybrid Intelligent Systems (HIS rsquo13)pp 215ndash220 IEEE Gammarth Tunisia December 2013

[38] M Ektefa S Memar F Sidi and L S Affendey ldquoIntrusiondetection using data mining techniquesrdquo in Proceedings of theInternational Conference on Information Retrieval and Knowl-edgeManagement Exploring the InvisibleWorld (CAMP rsquo10) pp200ndash203 IEEE March 2010

[39] G MeeraGandhi K Appavoo and S Srivasta ldquoEffective net-work intrusion detection using classifiers decision trees anddecision rulesrdquo International Journal of Advanced Networkingand Applications vol 2 no 3 pp 686ndash692 2010

[40] H Chauhan V Kumar S Pundir and E S Pilli ldquoA comparativestudy of classification techniques for intrusion detectionrdquo inProceedings of the International Symposium on Computationaland Business Intelligence (ISCBI rsquo13) pp 40ndash43 IEEE August2013

[41] C Katar ldquoCombining multiple techniques for intrusion detec-tionrdquo International Journal of Computer Science and NetworkSecurity vol 6 no 2B pp 208ndash218 2006

[42] S R Gaddam V V Phoha and K S Balagani ldquoK-means+id3 anovelmethod for supervised anomaly detection by cascading k-means clustering and id3 decision tree learning methodsrdquo IEEE

Transactions on Knowledge and Data Engineering vol 19 no 3pp 345ndash354 2007

[43] D Dasgupta F Gonzalez K Yallapu J Gomez and R Yarram-settii ldquoCIDS an agent-based intrusion detection systemrdquo Com-puters amp Security vol 24 no 5 pp 387ndash398 2005

[44] D L Hancock and G B Lamont ldquoMulti agent system for net-work attack classification using flow-based intrusion detectionrdquoin IEEE Congress of Evolutionary Computation (CEC rsquo11) pp1535ndash1542 June 2011

[45] X Zhu Z Huang and H Zhou ldquoDesign of a multi-agentbased intelligent intrusion detection systemrdquo in Proceedings ofthe 1st International Symposium on Pervasive Computing andApplications (SPCA rsquo06) pp 290ndash295 August 2006

[46] M El Ajjouri S Benhadou and H Medromi ldquoIntelligentarchitecture based onMAS andCBR for intrusion detectionrdquo inProceedings of the 4th Edition of National Security Days (JNS4)pp 1ndash4 IEEE May 2014

[47] J Yang X Liu T Li G Liang and S Liu ldquoDistributed agentsmodel for intrusion detection based on AISrdquo Knowledge-BasedSystems vol 22 no 2 pp 115ndash119 2009

[48] J MacQueen ldquoSome methods for classification and analysis ofmultivariate observationsrdquo in Proceedings of the 5th BerkeleySymposium on Mathematical Statistics and Probability pp 281ndash297 Berkeley Calif USA 1967

[49] J M Pena J A Lozano and P Larranaga ldquoAn empiricalcomparison of four initialization methods for the K-Meansalgorithmrdquo Pattern Recognition Letters vol 20 no 10 pp 1027ndash1040 1999

[50] G H Ball and D J Hall ldquoA clustering technique for summa-rizing multivariate datardquo Behavioral Science vol 12 no 2 pp153ndash155 1967

[51] I Katsavounidis C-C J Kuo and Z Zhang ldquoNew initializationtechnique for generalized Lloyd iterationrdquo IEEE Signal Process-ing Letters vol 1 no 10 pp 144ndash146 1994

[52] M D B Al-Daoud ldquoA new algorithm for cluster initializationrdquoin Proceedings of the WECrsquo05 The 2nd World EnformatikaConference 2007

[53] D Arthur and S Vassilvitskii ldquok-means++ the advantages ofcareful seedingrdquo in Proceedings of the 18th Annual ACM-SIAMSymposium on Discrete Algorithms pp 1027ndash1035 Society forIndustrial and Applied Mathematics New Orleans La USAJanuary 2007

[54] M Erisoglu N Calis and S Sakallioglu ldquoA new algorithm forinitial cluster centers in k-means algorithmrdquo Pattern Recogni-tion Letters vol 32 no 14 pp 1701ndash1705 2011

[55] L Yongzhong Y Ge X Jing et al ldquoAnomaly detection forclustering algorithm based on particle swarm optimizationrdquoJournal of Jiangsu University of Science and Technology (NaturalScience Edition) vol 23 no 1 pp 51ndash55 2009

[56] W Cong J Morris and W Xiaojun ldquoHigh performance deeppacket inspection on multi-core platformrdquo in Proceedings of the2nd IEEE International Conference on Broadband Network andMultimedia Technology (IC-BNMTrsquo 09) pp 619ndash622 October2009

[57] J R Quinlan C4 5 Programs for Machine Learning MorganKaufmann Publishers 1993

[58] S Ruggieri ldquoEfficient C45 [classification algorithm]rdquo IEEETransactions on Knowledge and Data Engineering vol 14 no 2pp 438ndash444 2002

[59] X Wu and V Kumar The Top Ten Algorithms in Data MiningCRC Press New York NY USA 2010

14 The Scientific World Journal


Table 3: Comparison of the MAS-IDS performance with other methods using different measures.

Method | Accuracy | DR | FAR | Precision | Specificity | F-measure
MAS-IDS | 0.9113 | 0.8526 | 0.0299 | 0.9665 | 0.9701 | 0.9056
Hybrid K-means and C4.5 (2012) | 0.9021 | 0.8394 | 0.0353 | 0.9600 | 0.9647 | 0.8954
Bayes Net | 0.9017 | 0.8177 | 0.0142 | 0.9829 | 0.9858 | 0.8926
Naïve Bayes | 0.8150 | 0.7727 | 0.1427 | 0.8578 | 0.8573 | 0.8076
SMO | 0.8805 | 0.7785 | 0.0174 | 0.9781 | 0.9826 | 0.8664
IBk | 0.8886 | 0.7962 | 0.0190 | 0.9766 | 0.9810 | 0.8771
J48 | 0.8513 | 0.7298 | 0.0273 | 0.9638 | 0.9727 | 0.8299
NBTree | 0.9007 | 0.8096 | 0.0081 | 0.9900 | 0.9919 | 0.8903
Decision Table | 0.8306 | 0.6631 | 0.0020 | 0.9970 | 0.9980 | 0.7956
JRip | 0.8377 | 0.6983 | 0.0229 | 0.9682 | 0.9771 | 0.8111
LibSVM | 0.7964 | 0.8120 | 0.2191 | 0.8169 | 0.7809 | 0.8068

Figure 8: Performance comparison of MAS-IDS with other methods (accuracy, DR, and FAR on a 0 to 1 scale for each method in Table 3).

the best ratios. In state-of-the-art methods, IDS performance is usually reported as accuracy, because the correct rate and the error rate are equivalent views of the same result (one is the complement of the other). Thus, when comparing the various methods, we adopt accuracy as the main measure. On this basis, the performance of our MAS-IDS is superior to that of the other methods, as shown in Table 3. More specifically, the average MAS-IDS accuracy, computed over all testing and training datasets, is 0.9113, which is greater than that achieved by any other method. Figure 8 shows the performance of all methods using the data given in Table 3.
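The six measures in Table 3 are not redefined in this section. The following minimal Python sketch assumes the standard confusion-matrix definitions with attacks as the positive class; the paper's own definitions appear in an earlier section and may differ in detail. The names `Confusion` and `ids_metrics` are hypothetical, and the counts in the usage line are illustrative only. The script also checks that every row of Table 3 satisfies specificity = 1 − FAR.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion-matrix counts, with 'attack' as the positive class."""
    tp: int  # attacks correctly flagged
    tn: int  # normal records correctly passed
    fp: int  # normal records wrongly flagged (false alarms)
    fn: int  # attacks missed


def ids_metrics(c: Confusion) -> dict:
    """Standard confusion-matrix measures (assumed to match the paper's definitions)."""
    total = c.tp + c.tn + c.fp + c.fn
    dr = c.tp / (c.tp + c.fn)        # detection rate (recall)
    far = c.fp / (c.fp + c.tn)       # false alarm rate
    precision = c.tp / (c.tp + c.fp)
    return {
        "accuracy": (c.tp + c.tn) / total,
        "dr": dr,
        "far": far,
        "precision": precision,
        "specificity": c.tn / (c.tn + c.fp),  # equals 1 - FAR
        "f_measure": 2 * precision * dr / (precision + dr),
    }


# (FAR, specificity) pairs copied from Table 3.
TABLE3_FAR_SPEC = {
    "MAS-IDS": (0.0299, 0.9701),
    "Hybrid K-means and C4.5 (2012)": (0.0353, 0.9647),
    "Bayes Net": (0.0142, 0.9858),
    "Naive Bayes": (0.1427, 0.8573),
    "SMO": (0.0174, 0.9826),
    "IBk": (0.0190, 0.9810),
    "J48": (0.0273, 0.9727),
    "NBTree": (0.0081, 0.9919),
    "Decision Table": (0.0020, 0.9980),
    "JRip": (0.0229, 0.9771),
    "LibSVM": (0.2191, 0.7809),
}

if __name__ == "__main__":
    m = ids_metrics(Confusion(tp=80, tn=90, fp=10, fn=20))  # illustrative counts
    print({k: round(v, 4) for k, v in m.items()})
    for name, (far, spec) in TABLE3_FAR_SPEC.items():
        print(name, abs((1 - far) - spec) < 1e-9)  # True for every row
```

Because specificity is fully determined by FAR under these definitions, the two columns in Table 3 carry the same information.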

4.3. MAS-IDS Processing Time. The last experiment demonstrates the strength of MAS-IDS in reducing the data classification processing time by using a multiagent system. In this experiment, five of the previously specified computers were used. In addition, we used the fourth training dataset (trainDS4) from Table 1 with four new large testing datasets to evaluate how well MAS-IDS processes large datasets in less time. Table 4 shows the characteristics of the new testing datasets.

Table 4: Characteristics of the new testing datasets used to evaluate the MAS-IDS processing time.

Dataset | Normal | DoS | Probe | R2L | U2R | Total
newTestDS1 | 35000 | 35000 | 10000 | 10000 | 10000 | 100000
newTestDS2 | 70000 | 70000 | 20000 | 25000 | 15000 | 200000
newTestDS3 | 100000 | 100000 | 30000 | 50000 | 20000 | 300000
newTestDS4 | 150000 | 150000 | 30000 | 50000 | 20000 | 400000

To show the ability of MAS-IDS to reduce the processing time, the MAS-IDS approach is compared with the nonagent hybrid modified K-means and C4.5. Here, MAS-IDS is run again every time a new computer is added. In other words, MAS-IDS initially runs on one computer; when a second computer is added, it runs on both, and so on until all five computers are used. Table 5 shows the processing time of this experiment. The maximum number of agents that can run on each computer is eight, because each computer has eight CPU cores and each core runs only one agent at a time. The training time of this experiment is 8814 s. Note that the coordinator agent runs on the first computer of the system environment.
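How the coordinator agent divides a testing dataset among the analysis agents is only outlined here, and the agent-count formula (2) is not reproduced in this section. The sketch below makes two assumptions for illustration. First, the total number of agents is the number of computers times the agents per computer, capped at the eight cores of each machine. Second, the records are cut into contiguous subsets of nearly equal size. The function names are hypothetical, and this is not the JADE implementation used in the paper.

```python
def agent_count(computers: int, agents_per_computer: int, cores_per_computer: int = 8) -> int:
    """Total analysis agents, allowing at most one agent per CPU core on each computer."""
    return computers * min(agents_per_computer, cores_per_computer)


def split_into_subsets(records: list, n_subsets: int) -> list[list]:
    """Split records into n_subsets contiguous chunks whose sizes differ by at most one."""
    if n_subsets < 1:
        raise ValueError("need at least one subset")
    base, extra = divmod(len(records), n_subsets)
    subsets, start = [], 0
    for i in range(n_subsets):
        size = base + (1 if i < extra else 0)  # the first `extra` chunks take one more record
        subsets.append(records[start:start + size])
        start += size
    return subsets


if __name__ == "__main__":
    n_agents = agent_count(computers=5, agents_per_computer=8)  # 40 agents, as in the best case
    records = list(range(400_000))  # placeholder for the 400,000 newTestDS4 records
    subsets = split_into_subsets(records, n_agents)
    print(n_agents, len(subsets), len(subsets[0]))  # 40 40 10000
```

Equal-sized subsets keep the agents' workloads balanced, so the slowest agent (and the network transfer of its subset) bounds the overall processing time.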

Table 5: Comparison of the processing time (in seconds) required by MAS-IDS and by the nonagent hybrid modified K-means and C4.5.

Number of agents | Testing dataset | 1 computer | 2 computers | 3 computers | 4 computers | 5 computers
1 | newTestDS1 | 15.349 | 13.57 | 13.219 | 7.841 | 6.972
1 | newTestDS2 | 31.607 | 26.554 | 26.228 | 15.309 | 13.424
1 | newTestDS3 | 45.893 | 41.587 | 35.910 | 23.462 | 20.162
1 | newTestDS4 | 64.172 | 76.688 | 72.101 | 44.509 | 37.723
2 | newTestDS1 | 8.852 | 8.304 | 7.102 | 5.843 | 4.630
2 | newTestDS2 | 17.339 | 17.883 | 16.310 | 13.160 | 10.110
2 | newTestDS3 | 26.184 | 27.500 | 22.502 | 16.931 | 15.313
2 | newTestDS4 | 40.226 | 64.332 | 49.853 | 37.263 | 36.832
3 | newTestDS1 | 8.121 | 8.242 | 6.940 | 5.34 | 4.575
3 | newTestDS2 | 13.472 | 12.767 | 11.743 | 9.957 | 8.374
3 | newTestDS3 | 20.735 | 22.689 | 20.186 | 15.788 | 14.421
3 | newTestDS4 | 32.604 | 52.819 | 48.814 | 36.589 | 35.536
4 | newTestDS1 | 6.699 | 6.818 | 5.891 | 4.631 | 3.854
4 | newTestDS2 | 11.776 | 14.209 | 11.104 | 9.375 | 8.116
4 | newTestDS3 | 18.613 | 20.927 | 17.613 | 14.490 | 13.413
4 | newTestDS4 | 29.399 | 51.852 | 46.811 | 35.564 | 34.90
5 | newTestDS1 | 6.146 | 6.659 | 5.579 | 4.555 | 3.715
5 | newTestDS2 | 11.534 | 13.685 | 10.580 | 9.198 | 7.924
5 | newTestDS3 | 17.568 | 20.662 | 17.497 | 14.44 | 13.62
5 | newTestDS4 | 29.349 | 51.527 | 43.956 | 34.521 | 32.349
6 | newTestDS1 | 6.68 | 6.601 | 5.393 | 4.465 | 3.224
6 | newTestDS2 | 11.318 | 12.922 | 10.531 | 8.980 | 7.567
6 | newTestDS3 | 17.419 | 20.203 | 16.938 | 15.788 | 12.656
6 | newTestDS4 | 28.660 | 50.67 | 41.475 | 32.390 | 31.989
7 | newTestDS1 | 5.871 | 6.443 | 5.272 | 4.258 | 3.150
7 | newTestDS2 | 11.134 | 12.787 | 10.494 | 8.680 | 7.685
7 | newTestDS3 | 17.223 | 20.683 | 16.556 | 15.178 | 12.362
7 | newTestDS4 | 28.438 | 45.539 | 39.859 | 30.447 | 30.20
8 | newTestDS1 | 5.481 | 6.204 | 4.289 | 4.84 | 3.89
8 | newTestDS2 | 11.011 | 13.24 | 9.913 | 8.369 | 7.399
8 | newTestDS3 | 17.150 | 19.713 | 16.483 | 14.81 | 11.181
8 | newTestDS4 | 27.303 | 40.39 | 38.156 | 30.196 | 29.851

The results in Table 5 depend on both the number of computers and the number of agents. The upper left corner of Table 5 corresponds to one computer running one agent. This is the worst case, and it should be compared with the nonagent hybrid modified K-means with C4.5. The lower right corner of Table 5 represents the best case of MAS-IDS, with the maximum number of computers and agents. The data also show that with two computers, MAS-IDS achieves little or no improvement over a single computer, and in several settings the processing time increases, because transferring data through the network adds cost. This problem is mitigated as additional computers are introduced. Nonetheless, the MAS-IDS processing time on a large dataset such as newTestDS4 remains inadequate, because the data subsets are still large and take a long time to transfer to other computers through the network. This problem diminishes as more computers are employed, since the dataset is then divided into smaller subsets. Overall, the MAS-IDS processing time generally decreases as each new computer is added and as the number of agents increases. Network specifications such as bandwidth and speed play an important role in reducing the MAS-IDS processing time. Figures 9 and 10 show the effect of increasing the number of agents and the number of computers, respectively, on the MAS-IDS processing time. Figure 9 uses five computers, while Figure 10 uses only one agent, as shown in Table 5.

Figure 11 compares the best-case MAS-IDS processing time with that of the nonagent hybrid modified K-means and C4.5.

Finally, because the proposed system uses each CPU core to run one analysis agent, the cost of system resources is positively correlated with the number of agents. At the same time, as the number of analysis agents increases, each data subset becomes very small, so its analysis needs only one or two seconds of processing time. Consequently, the proposed system balances the physical components (the number of CPU cores) against the number of agents, which can be created as in (2). Figure 12 compares the average cost of system resources (CPU consumption) in two configurations on the same datasets: MAS-IDS on 5 computers with 8 analysis agents each (40 agents in total), and MAS-IDS on 5 computers with one analysis agent each (5 agents in total).

Figure 12 shows that the highest peak of CPU utilization lasts longer with one agent (6 s) than with 8 agents (only 1 s). Consequently, a small number of agents gives a long processing time at a low system cost, whereas a larger number of agents gives a short processing time at a high system cost. The memory cost of system resources does not exceed 10% in any experiment.

Figure 9: Time required to process the testing datasets in relation to the number of agents (processing time in seconds versus number of agents, one curve per dataset, newTestDS1 to newTestDS4).

Figure 10: Time required to process the testing datasets in relation to the number of computers (processing time in seconds versus number of computers, one curve per dataset, newTestDS1 to newTestDS4).

This experiment demonstrates that MAS-IDS has great potential to reduce the IDS processing time relative to methods that do not employ agents. The reduction in processing time achieved by MAS-IDS reaches up to 70% relative to other approaches. Only five computers were used in this experiment. With a greater number of computers, an even larger reduction in processing time could potentially be achieved.
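A reduction of this kind is computed as 100 × (T_baseline − T_MAS) / T_baseline. The minimal sketch below reads the Table 5 entries as seconds with three decimals (for example, 15.349 s and 3.89 s for newTestDS1). It assumes the baseline is the single-agent, single-computer cell, which the text pairs with the nonagent method, and that the best case is 8 agents on each of 5 computers. This section does not state exactly which values underlie the reported 70% figure, so the numbers printed here illustrate the calculation and are not a restatement of that figure.

```python
# Processing times in seconds, read from Table 5.
# Baseline: 1 agent on 1 computer; best case: 8 agents on each of 5 computers.
BASELINE = {"newTestDS1": 15.349, "newTestDS2": 31.607, "newTestDS3": 45.893, "newTestDS4": 64.172}
BEST = {"newTestDS1": 3.89, "newTestDS2": 7.399, "newTestDS3": 11.181, "newTestDS4": 29.851}


def percent_reduction(before: float, after: float) -> float:
    """Relative reduction in processing time, as a percentage of `before`."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    for name, before in BASELINE.items():
        print(f"{name}: {percent_reduction(before, BEST[name]):.1f}%")
```

With these cells, the reduction is smaller for newTestDS4 than for the other datasets. This is consistent with the text's observation that large subsets are costly to transfer over the network.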

5. Conclusion

In this work, we have proposed a hybrid of modified K-means with C4.5 for IDS in a MAS environment. The hybrid modified K-means with C4.5 is used to improve the classification accuracy, while the MAS is used to reduce the processing time of the IDS.
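C4.5 [57] chooses each split by the gain ratio: the information gain of an attribute divided by its split information. The following minimal sketch handles a categorical attribute only. Continuous attributes (most KDD Cup 1999 features), missing values, and pruning are omitted, and the sample records are hypothetical. The sketch is not the C4.5 implementation used in the paper.

```python
import math
from collections import Counter


def entropy(labels: list) -> float:
    """Shannon entropy, in bits, of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: list, labels: list) -> float:
    """C4.5 gain ratio of splitting `labels` on the categorical attribute `values`."""
    n = len(labels)
    groups: dict = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Information gain: parent entropy minus the size-weighted entropy of the branches.
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    # Split information penalises attributes that create many small branches.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    protocol = ["tcp", "tcp", "udp", "udp", "icmp", "icmp"]  # hypothetical records
    label = ["normal", "normal", "normal", "dos", "dos", "dos"]
    print(round(gain_ratio(protocol, label), 4))  # 0.4206
```

Dividing by split information is what distinguishes C4.5 from ID3's plain information gain. Without it, attributes with many distinct values would be favoured.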

Figure 11: Comparison of MAS-IDS processing time with that of the nonagent hybrid modified K-means and C4.5 (processing time in seconds for newTestDS1 to newTestDS4; series: nonagent hybrid modified K-means and C4.5, and best case of MAS-IDS).

Figure 12: The cost of system resources (CPUs) (CPU utilization in percent versus processing time in seconds, for 8 agents and for 1 agent).

The modification of K-means is based on choosing initial cluster centroids that represent all cases of the dataset, which allows the number of clusters k to be determined. Three types of agents are used: coordinator, analysis, and communication agents. The KDD Cup 1999 dataset is employed, and the proposed method is implemented on the JADE platform with five computers.
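The text states only that the modified K-means picks initial centroids that represent all cases of the dataset, which in turn fixes k. One plausible reading, assumed here for illustration and not confirmed by this section, is to seed one centroid per class with that class's mean and then run standard Lloyd iterations. The helper names and toy data below are hypothetical.

```python
import math

Point = tuple[float, ...]


def mean(points: list[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def class_mean_seeds(points: list[Point], labels: list[str]) -> dict[str, Point]:
    """Assumed seeding: one initial centroid per class (the class mean), so k = number of classes."""
    by_class: dict[str, list[Point]] = {}
    for p, y in zip(points, labels):
        by_class.setdefault(y, []).append(p)
    return {y: mean(ps) for y, ps in by_class.items()}


def kmeans(points: list[Point], seeds: list[Point], max_iter: int = 100) -> list[int]:
    """Standard Lloyd iterations from the given seeds; returns each point's cluster index."""
    centroids = list(seeds)  # copy so the caller's seeds are not modified
    assign: list[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_assign = [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_assign == assign:
            break  # no point changed cluster: converged
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = mean(members)
    return assign


if __name__ == "__main__":
    X = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]  # toy 2-D records
    y = ["normal", "normal", "dos", "dos"]
    seeds = class_mean_seeds(X, y)
    print(seeds)
    print(kmeans(X, list(seeds.values())))
```

In a cascade such as the hybrid described here, the resulting clusters would then be passed to a classifier such as C4.5. That stage is not shown.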

MAS-IDS demonstrated that a multiagent system has significant potential for reducing the IDS processing time: MAS-IDS achieved a reduction in processing time of up to 70%. Moreover, the hybrid modified K-means with C4.5 approach performed better than the hybrid K-means and C4.5, as well as other techniques available in Weka and Matlab. A t-test on accuracy comparing MAS-IDS with the conventional K-means and C4.5 method confirmed that the former was superior (p value of 0.00000028). This indicates that MAS-IDS has high potential to improve the performance of intrusion detection systems.
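The text does not state whether the t-test on accuracy was paired. The sketch below assumes a paired test over matched runs, in which the statistic is the mean difference divided by its standard error. It computes t and its degrees of freedom in pure Python. The p value would then be read from a Student t distribution with that many degrees of freedom, for example via `scipy.stats.ttest_rel`, which returns both the statistic and the p value. The accuracy values are hypothetical, not the paper's data.

```python
import math
import statistics


def paired_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences are constant; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


if __name__ == "__main__":
    # Hypothetical per-run accuracies, for illustration only (not the paper's data).
    proposed = [0.91, 0.92, 0.90, 0.93, 0.91]
    baseline = [0.89, 0.90, 0.89, 0.90, 0.88]
    t, df = paired_t(proposed, baseline)
    print(f"t = {t:.3f}, df = {df}")
```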

In future work, we will attempt to improve the IDS accuracy further by combining the proposed method with other techniques. We will also apply our method to other datasets and to a real data network to make the system more suitable for real environments. New attacks detected by the system will be treated as unknown attacks and fed back to retrain the proposed method. In addition, we expect to reduce the IDS processing time further by using a greater number of computers.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work is supported by the National University of Malaysia (UKM) Grant no. AP2013-007.

References

[1] H.-J. Liao, C.-H. Richard Lin, Y.-C. Lin, and K.-Y. Tung, “Intrusion detection system: a comprehensive review,” Journal of Network and Computer Applications, vol. 36, no. 1, pp. 16–24, 2013.

[2] N. Sengupta, J. Sen, J. Sil, and M. Saha, “Designing of on line intrusion detection system using rough set theory and Q-learning algorithm,” Neurocomputing, vol. 111, pp. 161–168, 2013.

[3] L. Koc, T. A. Mazzuchi, and S. Sarkani, “A network intrusion detection system based on a Hidden Naïve Bayes multiclass classifier,” Expert Systems with Applications, vol. 39, no. 18, pp. 13492–13500, 2012.

[4] M. Uddin, A. A. Rehman, N. Uddin, J. Memon, R. Alsaqour, and S. Kazi, “Signature-based multi-layer distributed intrusion detection system using mobile agents,” International Journal of Network Security, vol. 15, no. 2, pp. 97–105, 2013.

[5] C. N. Modi, D. R. Patel, A. Patel, and M. Rajarajan, “Integrating signature apriori based network intrusion detection system (NIDS) in cloud computing,” Procedia Technology, vol. 6, pp. 905–912, 2012.

[6] H. Mohamed, L. Adil, T. Saida et al., “A collaborative intrusion detection and prevention system in cloud computing,” in Proceedings of the IEEE (AFRICON ’13), pp. 1–5, IEEE, September 2013.

[7] S.-J. Horng, M.-Y. Su, Y.-H. Chen et al., “A novel intrusion detection system based on hierarchical clustering and support vector machines,” Expert Systems with Applications, vol. 38, no. 1, pp. 306–313, 2011.

[8] M. Chowdhary, S. Suri, and M. Bhutani, “Comparative study of intrusion detection system,” International Journal of Computer Sciences and Engineering, vol. 2, no. 4, pp. 197–200, 2014.

[9] I. Corona, G. Giacinto, and F. Roli, “Adversarial attacks against intrusion detection systems: taxonomy, solutions and open issues,” Information Sciences, vol. 239, pp. 201–225, 2013.

[10] S. Shamshirband, N. B. Anuar, M. L. M. Kiah, and A. Patel, “An appraisal and design of a multi-agent system based cooperative wireless intrusion detection computational intelligence technique,” Engineering Applications of Artificial Intelligence, vol. 26, no. 9, pp. 2105–2127, 2013.

[11] M. Roesch, “Snort - lightweight intrusion detection for networks,” in Proceedings of the 13th USENIX Conference on System Administration (LISA ’99), pp. 229–238, 1999.

[12] D. Barbara and S. Jajodia, Applications of Data Mining in Computer Security, Springer, 2002.

[13] P. Natesan, P. Balasubramanie, and G. Gowrison, “Improving the attack detection rate in network intrusion detection using adaboost algorithm,” Journal of Computer Science, vol. 8, no. 7, pp. 1041–1048, 2012.

[14] A. Bivens, C. Palagiri, R. Smith, B. Szymanski, and M. Embrechts, “Network-based intrusion detection using neural networks,” in Proceedings of the Intelligent Engineering Systems through Artificial Neural Networks, vol. 12, pp. 579–584, November 2002.

[15] Y. Li and W. Jie, “The method of network intrusion detection based on the neural network GCBP algorithm,” in Proceedings of the International Conference on Computer Science and Information Processing (CSIP ’12), pp. 1082–1086, IEEE, August 2012.

[16] J. Lin, T. Huang, and B. Zhao, “A fast fuzzy set intrusion detection model,” in International Symposium on Knowledge Acquisition and Modeling (KAM ’08), pp. 601–605, December 2008.

[17] A. Abraham, R. Jain, J. Thomas, and S. Y. Han, “D-SCIDS: distributed soft computing intrusion detection system,” Journal of Network and Computer Applications, vol. 30, no. 1, pp. 81–98, 2007.

[18] V. V. Kumari, S. Pamidi, and A. Govardhan, “Integrated Bayes network and hidden Markov model for host based IDS,” International Journal of Computer Applications, vol. 41, no. 20, pp. 45–49, 2012.

[19] M. A. Hasan, M. Nasser, B. Pal, and S. Ahmad, “Support vector machine and random forest modeling for intrusion detection system (IDS),” Journal of Intelligent Learning Systems and Applications, vol. 6, no. 1, pp. 45–52, 2014.

[20] C. Xiang, P. C. Yong, and L. S. Meng, “Design of multiple-level hybrid classifier for intrusion detection system using Bayesian clustering and decision trees,” Pattern Recognition Letters, vol. 29, no. 7, pp. 918–924, 2008.

[21] M. N. Huhns, Distributed Artificial Intelligence, Elsevier, 2012.

[22] S. J. Stolfo, A. L. Prodromidis, S. Tselepis et al., “JAM: java agents for meta-learning over distributed databases,” in Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD ’97), pp. 74–81, 1997.

[23] P. Kannadiga and M. Zulkernine, “DIDMA: a distributed intrusion detection system using mobile agents,” in Proceedings of the 6th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing and 1st ACIS International Workshop on Self-Assembling Wireless Networks (SNPD/SAWN ’05), pp. 238–245, IEEE, May 2005.

[24] L. Portnoy, Intrusion Detection with Unlabeled Data Using Clustering, 2000.

[25] M. Jianliang, S. Haikun, and B. Ling, “The application on intrusion detection based on K-means cluster algorithm,” in Proceedings of the International Forum on Information Technology and Applications (IFITA ’09), vol. 1, pp. 150–152, IEEE, Chengdu, China, May 2009.

[26] M. Sabhnani and G. Serpen, “Application of machine learning algorithms to KDD intrusion detection dataset within misuse detection context,” in Proceedings of the International Conference on Machine Learning: Models, Technologies and Applications (MLMTA ’03), pp. 209–215, June 2003.

[27] G. Munz, S. Li, and G. Carle, “Traffic anomaly detection using k-means clustering,” in Proceedings of the GI/ITG Workshop MMBnet, 2007.

[28] V. Kumar, H. Chauhan, and D. Panwar, “K-means clustering approach to analyze NSL-KDD intrusion detection dataset,” International Journal of Soft Computing and Engineering, vol. 3, no. 4, pp. 1–4, 2013.

[29] S. Chawla and A. Gionis, “k-means--: a unified approach to clustering and outlier detection,” in Proceedings of the SIAM International Conference on Data Mining (SDM ’13), pp. 189–197, SIAM, 2013.

[30] A. P. Muniyandi, R. Rajeswari, and R. Rajaram, “Network anomaly detection by cascading K-means clustering and C4.5 decision Tree algorithm,” Procedia Engineering, vol. 30, pp. 174–182, 2012.

[31] L. Xiao, Z. Shao, and G. Liu, “K-means algorithm based on particle swarm optimization algorithm for anomaly intrusion detection,” in Proceedings of the 6th World Congress on Intelligent Control and Automation (WCICA ’06), pp. 5854–5858, IEEE, June 2006.

[32] Z. Muda, W. Yassin, M. N. Sulaiman, and N. I. Udzir, “Intrusion detection based on K-Means clustering and Naïve Bayes classification,” in Proceedings of the 7th International Conference on Information Technology in Asia (CITA ’11), pp. 1–6, IEEE, July 2011.

[33] H.-B. Wang, H.-L. Yang, Z.-J. Xu, and Z. Yuan, “A clustering algorithm use SOM and K-means in intrusion detection,” in Proceedings of the 1st International Conference on E-Business and E-Government (ICEE ’10), pp. 1281–1284, May 2010.

[34] A. M. Chandrasekhar and K. Raghuveer, “Intrusion detection technique by using k-means, fuzzy neural network and SVM classifiers,” in Proceedings of the 3rd International Conference on Computer Communication and Informatics (ICCCI ’13), pp. 1–3, January 2013.

[35] R. Goel, A. Sardana, and R. C. Joshi, “Parallel misuse and anomaly detection model,” International Journal of Network Security, vol. 14, no. 4, pp. 211–222, 2012.

[36] O. Depren, M. Topallar, E. Anarim, and M. K. Ciliz, “An intelligent intrusion detection system (IDS) for anomaly and misuse detection in computer networks,” Expert Systems with Applications, vol. 29, no. 4, pp. 713–722, 2005.

[37] A. S. A. Aziz, A. E. Hassanien, S. E.-O. Hanafi, and M. Tolba, “Multi-layer hybrid machine learning techniques for anomalies detection and classification approach,” in Proceedings of the 13th International Conference on Hybrid Intelligent Systems (HIS ’13), pp. 215–220, IEEE, Gammarth, Tunisia, December 2013.

[38] M. Ektefa, S. Memar, F. Sidi, and L. S. Affendey, “Intrusion detection using data mining techniques,” in Proceedings of the International Conference on Information Retrieval and Knowledge Management: Exploring the Invisible World (CAMP ’10), pp. 200–203, IEEE, March 2010.

[39] G. MeeraGandhi, K. Appavoo, and S. Srivasta, “Effective network intrusion detection using classifiers decision trees and decision rules,” International Journal of Advanced Networking and Applications, vol. 2, no. 3, pp. 686–692, 2010.

[40] H. Chauhan, V. Kumar, S. Pundir, and E. S. Pilli, “A comparative study of classification techniques for intrusion detection,” in Proceedings of the International Symposium on Computational and Business Intelligence (ISCBI ’13), pp. 40–43, IEEE, August 2013.

[41] C. Katar, “Combining multiple techniques for intrusion detection,” International Journal of Computer Science and Network Security, vol. 6, no. 2B, pp. 208–218, 2006.

[42] S. R. Gaddam, V. V. Phoha, and K. S. Balagani, “K-means+id3: a novel method for supervised anomaly detection by cascading k-means clustering and id3 decision tree learning methods,” IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 3, pp. 345–354, 2007.

[43] D. Dasgupta, F. Gonzalez, K. Yallapu, J. Gomez, and R. Yarramsettii, “CIDS: an agent-based intrusion detection system,” Computers & Security, vol. 24, no. 5, pp. 387–398, 2005.

[44] D. L. Hancock and G. B. Lamont, “Multi agent system for network attack classification using flow-based intrusion detection,” in IEEE Congress of Evolutionary Computation (CEC ’11), pp. 1535–1542, June 2011.

[45] X. Zhu, Z. Huang, and H. Zhou, “Design of a multi-agent based intelligent intrusion detection system,” in Proceedings of the 1st International Symposium on Pervasive Computing and Applications (SPCA ’06), pp. 290–295, August 2006.

[46] M. El Ajjouri, S. Benhadou, and H. Medromi, “Intelligent architecture based on MAS and CBR for intrusion detection,” in Proceedings of the 4th Edition of National Security Days (JNS4), pp. 1–4, IEEE, May 2014.

[47] J. Yang, X. Liu, T. Li, G. Liang, and S. Liu, “Distributed agents model for intrusion detection based on AIS,” Knowledge-Based Systems, vol. 22, no. 2, pp. 115–119, 2009.

[48] J. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297, Berkeley, Calif, USA, 1967.

[49] J. M. Pena, J. A. Lozano, and P. Larranaga, “An empirical comparison of four initialization methods for the K-Means algorithm,” Pattern Recognition Letters, vol. 20, no. 10, pp. 1027–1040, 1999.

[50] G. H. Ball and D. J. Hall, “A clustering technique for summarizing multivariate data,” Behavioral Science, vol. 12, no. 2, pp. 153–155, 1967.

[51] I. Katsavounidis, C.-C. J. Kuo, and Z. Zhang, “New initialization technique for generalized Lloyd iteration,” IEEE Signal Processing Letters, vol. 1, no. 10, pp. 144–146, 1994.

[52] M. D. B. Al-Daoud, “A new algorithm for cluster initialization,” in Proceedings of the WEC’05: The 2nd World Enformatika Conference, 2007.

[53] D. Arthur and S. Vassilvitskii, “k-means++: the advantages of careful seeding,” in Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035, Society for Industrial and Applied Mathematics, New Orleans, La, USA, January 2007.

[54] M. Erisoglu, N. Calis, and S. Sakallioglu, “A new algorithm for initial cluster centers in k-means algorithm,” Pattern Recognition Letters, vol. 32, no. 14, pp. 1701–1705, 2011.

[55] L. Yongzhong, Y. Ge, X. Jing et al., “Anomaly detection for clustering algorithm based on particle swarm optimization,” Journal of Jiangsu University of Science and Technology (Natural Science Edition), vol. 23, no. 1, pp. 51–55, 2009.

[56] W. Cong, J. Morris, and W. Xiaojun, “High performance deep packet inspection on multi-core platform,” in Proceedings of the 2nd IEEE International Conference on Broadband Network and Multimedia Technology (IC-BNMT ’09), pp. 619–622, October 2009.

[57] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.

[58] S. Ruggieri, “Efficient C4.5 [classification algorithm],” IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 2, pp. 438–444, 2002.

[59] X. Wu and V. Kumar, The Top Ten Algorithms in Data Mining, CRC Press, New York, NY, USA, 2010.

[60] KDD Cup 1999, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.

[61] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, “A detailed analysis of the KDD CUP 99 data set,” in Proceedings of the 2nd IEEE Symposium on Computational Intelligence for Security and Defence Applications, pp. 1–6, IEEE, July 2009.

[62] A. K. Jain, “Data clustering: 50 years beyond K-means,” Pattern Recognition Letters, vol. 31, no. 8, pp. 651–666, 2010.


Table 5: Comparison of processing time required by MAS-IDS and by the nonagent hybrid modified K-means and C4.5. Processing time is in seconds; columns 1 to 5 give the number of computers.

Agents  Testing dataset  1       2       3       4       5
1       newTestDS1       15.349  13.57   13.219  7.841   6.972
1       newTestDS2       31.607  26.554  26.228  15.309  13.424
1       newTestDS3       45.893  41.587  35.910  23.462  20.162
1       newTestDS4       64.172  76.688  72.101  44.509  37.723
2       newTestDS1       8.852   8.304   7.102   5.843   4.630
2       newTestDS2       17.339  17.883  16.310  13.160  10.110
2       newTestDS3       26.184  27.500  22.502  16.931  15.313
2       newTestDS4       40.226  64.332  49.853  37.263  36.832
3       newTestDS1       8.121   8.242   6.940   5.34    4.575
3       newTestDS2       13.472  12.767  11.743  9.957   8.374
3       newTestDS3       20.735  22.689  20.186  15.788  14.421
3       newTestDS4       32.604  52.819  48.814  36.589  35.536
4       newTestDS1       6.699   6.818   5.891   4.631   3.854
4       newTestDS2       11.776  14.209  11.104  9.375   8.116
4       newTestDS3       18.613  20.927  17.613  14.490  13.413
4       newTestDS4       29.399  51.852  46.811  35.564  34.90
5       newTestDS1       6.146   6.659   5.579   4.555   3.715
5       newTestDS2       11.534  13.685  10.580  9.198   7.924
5       newTestDS3       17.568  20.662  17.497  14.44   13.62
5       newTestDS4       29.349  51.527  43.956  34.521  32.349
6       newTestDS1       6.68    6.601   5.393   4.465   3.224
6       newTestDS2       11.318  12.922  10.531  8.980   7.567
6       newTestDS3       17.419  20.203  16.938  15.788  12.656
6       newTestDS4       28.660  50.67   41.475  32.390  31.989
7       newTestDS1       5.871   6.443   5.272   4.258   3.150
7       newTestDS2       11.134  12.787  10.494  8.680   7.685
7       newTestDS3       17.223  20.683  16.556  15.178  12.362
7       newTestDS4       28.438  45.539  39.859  30.447  30.20
8       newTestDS1       5.481   6.204   4.289   4.84    3.89
8       newTestDS2       11.011  13.24   9.913   8.369   7.399
8       newTestDS3       17.150  19.713  16.483  14.81   11.181
8       newTestDS4       27.303  40.39   38.156  30.196  29.851

addition of each new computer, as the number of agents also increases. Network characteristics such as bandwidth and speed play an important role in reducing the MAS-IDS processing time. Figures 9 and 10 show the effect of increasing the number of agents and the number of computers, respectively, on the MAS-IDS processing time. The experiment in Figure 9 uses five computers, while the experiment in Figure 10 uses only one agent, as shown in Table 5.
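The trend plotted in Figure 9 can be quantified with a short Python sketch that computes the speedup S(n) = T(1)/T(n) of n agents relative to one agent, using the five-computer column of Table 5 for newTestDS1 and newTestDS4 (times in seconds). The dictionary and function names are illustrative only and are not part of MAS-IDS.

```python
# Processing times in seconds from the five-computer column of Table 5,
# indexed by the number of agents (1 to 8).
TIMES_5_COMPUTERS = {
    "newTestDS1": [6.972, 4.630, 4.575, 3.854, 3.715, 3.224, 3.150, 3.89],
    "newTestDS4": [37.723, 36.832, 35.536, 34.90, 32.349, 31.989, 30.20, 29.851],
}


def speedup_table(times):
    """Return (agents, speedup) pairs, where speedup = T(1 agent) / T(n agents)."""
    base = times[0]
    return [(n, base / t) for n, t in enumerate(times, start=1)]


for name, times in TIMES_5_COMPUTERS.items():
    print(name)
    for n, s in speedup_table(times):
        print(f"  {n} agent(s): {s:.2f}x")
```

With these values the best speedup for newTestDS1 is about 2.2x (7 agents) and for newTestDS4 about 1.26x (8 agents); newTestDS1 is slower with 8 agents than with 7, so adding agents does not always help.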

The best case of MAS-IDS processing time, compared with that of the nonagent hybrid modified K-means and C4.5, is shown in Figure 11.

Finally, since the proposed system uses each CPU core to run one analysis agent, the cost of system resources correlates positively with the number of agents. At the same time, as the number of analysis agents increases, each data subset to be analyzed becomes very small, so its analysis needs only one or two seconds of processing time. Consequently, the proposed system balances the physical components (the number of CPU cores) against the number of agents that can be created, as given by (2). Figure 12 compares the average cost of system resources (CPU consumption) when MAS-IDS uses 5 computers with 8 analysis agents on each computer (40 agents in total) and when it uses one analysis agent on each computer (5 agents in total), on the same datasets.
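The divide-and-distribute idea can be illustrated with a small Python sketch. It assumes one analysis agent per CPU core on each computer (the paper's exact rule is its equation (2), which is not reproduced here), splits the records into near-equal subsets, and uses a thread pool as a stand-in for JADE agents. The functions `coordinate` and `analyse` are hypothetical: `analyse` is a placeholder for the hybrid K-means/C4.5 analysis, and the coordinator simply merges the partial results.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def agents_per_computer(cores=None):
    """Assumed rule: one analysis agent per CPU core (defaults to this machine's count)."""
    return cores if cores is not None else (os.cpu_count() or 1)


def split_into_subsets(records, n_subsets):
    """Split records into n_subsets contiguous chunks whose sizes differ by at most one."""
    if n_subsets < 1:
        raise ValueError("n_subsets must be at least 1")
    q, r = divmod(len(records), n_subsets)
    subsets, start = [], 0
    for i in range(n_subsets):
        end = start + q + (1 if i < r else 0)
        subsets.append(records[start:end])
        start = end
    return subsets


def analyse(subset):
    """Hypothetical analysis agent: counts records labelled 'attack' in its subset."""
    return sum(1 for label in subset if label == "attack")


def coordinate(records, computers, cores_per_computer=None):
    """Hypothetical coordinator: create agents, distribute subsets, merge partial results."""
    n_agents = computers * agents_per_computer(cores_per_computer)
    subsets = split_into_subsets(records, n_agents)
    # Threads stand in for analysis agents running on separate cores/computers.
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        partial_counts = list(pool.map(analyse, subsets))
    return sum(partial_counts), [len(s) for s in subsets]


# 5 computers x 8 cores = 40 agents, as in the Figure 12 experiment.
total, sizes = coordinate(["normal", "attack"] * 100, computers=5, cores_per_computer=8)
print(total, len(sizes), min(sizes), max(sizes))
```

This prints `100 40 5 5`: 200 records are spread over 40 agents of 5 records each. Threads only keep the sketch self-contained; in a real deployment each subset would go to a separate agent, possibly on another machine.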

As Figure 12 shows, the processing time at the highest peak of CPU utilization is 6 s with one agent, compared with only 1 s with 8 agents. Consequently, a small number of agents yields a long processing time at a low cost in system resources, whereas a larger number of agents yields a short processing time at a higher cost. The cost of system resources in terms of memory does not exceed 10% in any experiment.

Figure 9: Time required to process the testing datasets in relation to the number of agents (x-axis: number of agents; y-axis: processing time (s); series: newTestDS1 to newTestDS4).

Figure 10: Time required to process the testing datasets in relation to the number of computers (x-axis: number of computers; y-axis: processing time (s); series: newTestDS1 to newTestDS4).

This experiment demonstrates that MAS-IDS has great potential to reduce IDS processing time relative to methods that do not employ agents: the reduction in processing time achieved by MAS-IDS reaches up to 70% relative to the other approaches. Only five computers were used in this experiment; with more computers, a larger reduction in processing time could be expected.
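A percentage reduction is easier to compare with parallel-speedup figures once converted: a reduction of r% leaves (100 - r)% of the original time, which corresponds to a speedup of 1/(1 - r/100). The sketch below is plain arithmetic; the 10 s and 3 s pair is a hypothetical illustration, not a measured result.

```python
def percent_reduction(baseline_s: float, new_s: float) -> float:
    """Percentage by which new_s is shorter than baseline_s."""
    return (baseline_s - new_s) / baseline_s * 100.0


def speedup_from_reduction(percent: float) -> float:
    """Speedup factor equivalent to a given percentage reduction in time."""
    return 1.0 / (1.0 - percent / 100.0)


# Hypothetical example: a 10 s baseline reduced to 3 s.
r = percent_reduction(10.0, 3.0)
print(f"reduction: {r:.1f}%  speedup: {speedup_from_reduction(r):.2f}x")
```

So a 70% reduction in processing time corresponds to roughly a 3.3x speedup.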

5. Conclusion

In this work, we have proposed a hybrid of modified K-means with C4.5 for IDS in a MAS environment. The hybrid modified K-means with C4.5 is used to improve the classification accuracy, while the MAS is used to reduce the processing time of the IDS.
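For readers less familiar with the C4.5 half of the hybrid, the sketch below shows the split criterion C4.5 uses, the gain ratio (information gain divided by split information), for a single categorical attribute. It is a simplified illustration only: continuous-attribute thresholds, missing-value handling, and pruning are omitted, and the `protocol` values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5 gain ratio for splitting `labels` on a categorical attribute `values`."""
    n = len(labels)
    partitions = {}
    for v, lab in zip(values, labels):
        partitions.setdefault(v, []).append(lab)
    # Information gain: entropy before the split minus weighted entropy after it.
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = -sum((len(p) / n) * math.log2(len(p) / n) for p in partitions.values())
    return gain / split_info if split_info > 0 else 0.0


protocol = ["tcp", "tcp", "udp", "udp"]  # hypothetical attribute values
label = ["attack", "attack", "normal", "normal"]
print(gain_ratio(protocol, label))
```

Here the attribute separates the two classes perfectly, so the gain ratio is 1.0; an attribute with a single value has zero split information and is scored 0.0.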

Figure 11: Comparison of MAS-IDS processing time with that of the nonagent hybrid modified K-means and C4.5 (y-axis: processing time (s); groups: newTestDS1 to newTestDS4; series: nonagent hybrid modified K-means and C4.5, best case of MAS-IDS).

Figure 12: The cost of system resources (CPUs) (x-axis: processing time (s); y-axis: CPU utilization (%); series: 8 agents, 1 agent).

The modification of K-means is based on choosing initial cluster centroids that represent all cases in the dataset, which allows the number of clusters k to be determined. Three types of agents are used: coordinator, analysis, and communication agents. The KDD Cup 1999 dataset is employed, and the proposed method is implemented on the JADE platform with five computers.
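One possible reading of "initial centroids that represent all cases of the dataset" is to seed one centroid per class label at that class's mean, so that k equals the number of distinct labels. The Python sketch below implements that reading followed by standard Lloyd (K-means) iterations. It is an illustrative assumption, not necessarily the authors' exact procedure, and all function names are hypothetical.

```python
from collections import defaultdict


def class_mean_centroids(X, y):
    """Assumed initialisation: one centroid per class label, placed at the class mean.
    k is therefore the number of distinct labels present in the data."""
    groups = defaultdict(list)
    for x, lab in zip(X, y):
        groups[lab].append(x)
    labels = sorted(groups)
    centroids = [[sum(col) / len(groups[lab]) for col in zip(*groups[lab])] for lab in labels]
    return labels, centroids


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest(x, centroids):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda j: sq_dist(x, centroids[j]))


def kmeans(X, centroids, max_iter=100):
    """Standard Lloyd iterations starting from the supplied centroids."""
    centroids = [list(c) for c in centroids]
    assign = None
    for _ in range(max_iter):
        new_assign = [nearest(x, centroids) for x in X]
        if new_assign == assign:  # converged: no point changed cluster
            break
        assign = new_assign
        for j in range(len(centroids)):
            members = [x for x, a in zip(X, assign) if a == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


X = [[0, 0], [0, 1], [10, 10], [10, 11]]
y = ["normal", "normal", "attack", "attack"]
labels, init = class_mean_centroids(X, y)
assign, final = kmeans(X, init)
print(labels, assign)
```

Seeding from class means fixes k without a separate search and starts each cluster inside a known class region; in a hybrid scheme, each resulting cluster could then be passed to its own C4.5 classifier.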

MAS-IDS demonstrated that a multiagent system has significant potential for reducing IDS processing time: MAS-IDS achieved a reduction in processing time of up to 70%. In addition, the hybrid modified K-means with C4.5 approach performed better than the hybrid K-means and C4.5 as well as other techniques available in Weka and Matlab. A t-test of accuracy comparing MAS-IDS with the conventional K-means and C4.5 method confirmed that the former was superior (p-value of 0.00000028). This indicates that MAS-IDS has high potential to improve the performance of intrusion detection systems.
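The section does not state whether the accuracy t-test was paired or unpaired. Assuming a paired design over matched runs, the t statistic can be computed as below; the p-value then follows from the t distribution with n - 1 degrees of freedom (for example via `scipy.stats.ttest_rel`, which is not used here to keep the sketch dependency-free). The accuracy lists in the usage line are placeholders, not the paper's results.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for a paired t-test of samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two matched samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Placeholder accuracies for illustration only (not results from the paper).
t, df = paired_t_statistic([0.95, 0.96, 0.97], [0.90, 0.92, 0.91])
print(f"t = {t:.2f} with {df} degrees of freedom")
```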

In future work, we will attempt to improve IDS accuracy further by combining the proposed method with other techniques. We will also try to apply our method to other datasets and to a real data network to make the system more suitable for real environments. We will use new attacks detected by the system as unknown attacks to retrain the proposed method as feedback. In addition, we expect the IDS processing time to decrease further when a greater number of computers is used.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work is supported by the National University of Malaysia (UKM) Grant no. AP2013-007.

References

[1] H.-J. Liao, C.-H. Richard Lin, Y.-C. Lin, and K.-Y. Tung, "Intrusion detection system: a comprehensive review," Journal of Network and Computer Applications, vol. 36, no. 1, pp. 16–24, 2013.

[2] N. Sengupta, J. Sen, J. Sil, and M. Saha, "Designing of on line intrusion detection system using rough set theory and Q-learning algorithm," Neurocomputing, vol. 111, pp. 161–168, 2013.

[3] L. Koc, T. A. Mazzuchi, and S. Sarkani, "A network intrusion detection system based on a Hidden Naïve Bayes multiclass classifier," Expert Systems with Applications, vol. 39, no. 18, pp. 13492–13500, 2012.

[4] M. Uddin, A. A. Rehman, N. Uddin, J. Memon, R. Alsaqour, and S. Kazi, "Signature-based multi-layer distributed intrusion detection system using mobile agents," International Journal of Network Security, vol. 15, no. 2, pp. 97–105, 2013.

[5] C. N. Modi, D. R. Patel, A. Patel, and M. Rajarajan, "Integrating signature apriori based network intrusion detection system (NIDS) in cloud computing," Procedia Technology, vol. 6, pp. 905–912, 2012.

[6] H. Mohamed, L. Adil, T. Saida et al., "A collaborative intrusion detection and prevention system in cloud computing," in Proceedings of the IEEE (AFRICON '13), pp. 1–5, IEEE, September 2013.

[7] S.-J. Horng, M.-Y. Su, Y.-H. Chen et al., "A novel intrusion detection system based on hierarchical clustering and support vector machines," Expert Systems with Applications, vol. 38, no. 1, pp. 306–313, 2011.

[8] M. Chowdhary, S. Suri, and M. Bhutani, "Comparative study of intrusion detection system," International Journal of Computer Sciences and Engineering, vol. 2, no. 4, pp. 197–200, 2014.

[9] I. Corona, G. Giacinto, and F. Roli, "Adversarial attacks against intrusion detection systems: taxonomy, solutions and open issues," Information Sciences, vol. 239, pp. 201–225, 2013.

[10] S. Shamshirband, N. B. Anuar, M. L. M. Kiah, and A. Patel, "An appraisal and design of a multi-agent system based cooperative wireless intrusion detection computational intelligence technique," Engineering Applications of Artificial Intelligence, vol. 26, no. 9, pp. 2105–2127, 2013.

[11] M. Roesch, "Snort - lightweight intrusion detection for networks," in Proceedings of the 13th USENIX Conference on System Administration (LISA '99), pp. 229–238, 1999.

[12] D. Barbará and S. Jajodia, Applications of Data Mining in Computer Security, Springer, 2002.

[13] P. Natesan, P. Balasubramanie, and G. Gowrison, "Improving the attack detection rate in network intrusion detection using adaboost algorithm," Journal of Computer Science, vol. 8, no. 7, pp. 1041–1048, 2012.

[14] A. Bivens, C. Palagiri, R. Smith, B. Szymanski, and M. Embrechts, "Network-based intrusion detection using neural networks," in Proceedings of the Intelligent Engineering Systems through Artificial Neural Networks, vol. 12, pp. 579–584, November 2002.

[15] Y. Li and W. Jie, "The method of network intrusion detection based on the neural network GCBP algorithm," in Proceedings of the International Conference on Computer Science and Information Processing (CSIP '12), pp. 1082–1086, IEEE, August 2012.

[16] J. Lin, T. Huang, and B. Zhao, "A fast fuzzy set intrusion detection model," in International Symposium on Knowledge Acquisition and Modeling (KAM '08), pp. 601–605, December 2008.

[17] A. Abraham, R. Jain, J. Thomas, and S. Y. Han, "D-SCIDS: distributed soft computing intrusion detection system," Journal of Network and Computer Applications, vol. 30, no. 1, pp. 81–98, 2007.

[18] V. V. Kumari, S. Pamidi, and A. Govardhan, "Integrated Bayes network and hidden Markov model for host based IDS," International Journal of Computer Applications, vol. 41, no. 20, pp. 45–49, 2012.

[19] M. A. Hasan, M. Nasser, B. Pal, and S. Ahmad, "Support vector machine and random forest modeling for intrusion detection system (IDS)," Journal of Intelligent Learning Systems and Applications, vol. 6, no. 1, pp. 45–52, 2014.

[20] C. Xiang, P. C. Yong, and L. S. Meng, "Design of multiple-level hybrid classifier for intrusion detection system using Bayesian clustering and decision trees," Pattern Recognition Letters, vol. 29, no. 7, pp. 918–924, 2008.

[21] M. N. Huhns, Distributed Artificial Intelligence, Elsevier, 2012.

[22] S. J. Stolfo, A. L. Prodromidis, S. Tselepis et al., "JAM: java agents for meta-learning over distributed databases," in Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD '97), pp. 74–81, 1997.

[23] P. Kannadiga and M. Zulkernine, "DIDMA: a distributed intrusion detection system using mobile agents," in Proceedings of the 6th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing and 1st ACIS International Workshop on Self-Assembling Wireless Networks (SNPD/SAWN '05), pp. 238–245, IEEE, May 2005.

[24] L. Portnoy, Intrusion Detection with Unlabeled Data Using Clustering, 2000.

[25] M. Jianliang, S. Haikun, and B. Ling, "The application on intrusion detection based on K-means cluster algorithm," in Proceedings of the International Forum on Information Technology and Applications (IFITA '09), vol. 1, pp. 150–152, IEEE, Chengdu, China, May 2009.

[26] M. Sabhnani and G. Serpen, "Application of machine learning algorithms to KDD intrusion detection dataset within misuse detection context," in Proceedings of the International Conference on Machine Learning; Models, Technologies and Applications (MLMTA '03), pp. 209–215, June 2003.

[27] G. Münz, S. Li, and G. Carle, "Traffic anomaly detection using k-means clustering," in Proceedings of the GI/ITG Workshop MMBnet, 2007.

[28] V. Kumar, H. Chauhan, and D. Panwar, "K-means clustering approach to analyze NSL-KDD intrusion detection dataset," International Journal of Soft Computing and Engineering, vol. 3, no. 4, pp. 1–4, 2013.

[29] S. Chawla and A. Gionis, "k-means--: a unified approach to clustering and outlier detection," in Proceedings of the SIAM International Conference on Data Mining (SDM '13), pp. 189–197, SIAM, 2013.

[30] A. P. Muniyandi, R. Rajeswari, and R. Rajaram, "Network anomaly detection by cascading K-means clustering and C4.5 decision tree algorithm," Procedia Engineering, vol. 30, pp. 174–182, 2012.

[31] L. Xiao, Z. Shao, and G. Liu, "K-means algorithm based on particle swarm optimization algorithm for anomaly intrusion detection," in Proceedings of the 6th World Congress on Intelligent Control and Automation (WCICA '06), pp. 5854–5858, IEEE, June 2006.

[32] Z. Muda, W. Yassin, M. N. Sulaiman, and N. I. Udzir, "Intrusion detection based on K-Means clustering and Naïve Bayes classification," in Proceedings of the 7th International Conference on Information Technology in Asia (CITA '11), pp. 1–6, IEEE, July 2011.

[33] H.-B. Wang, H.-L. Yang, Z.-J. Xu, and Z. Yuan, "A clustering algorithm use SOM and K-means in intrusion detection," in Proceedings of the 1st International Conference on E-Business and E-Government (ICEE '10), pp. 1281–1284, May 2010.

[34] A. M. Chandrasekhar and K. Raghuveer, "Intrusion detection technique by using k-means, fuzzy neural network and SVM classifiers," in Proceedings of the 3rd International Conference on Computer Communication and Informatics (ICCCI '13), pp. 1–3, January 2013.

[35] R. Goel, A. Sardana, and R. C. Joshi, "Parallel misuse and anomaly detection model," International Journal of Network Security, vol. 14, no. 4, pp. 211–222, 2012.

[36] O. Depren, M. Topallar, E. Anarim, and M. K. Ciliz, "An intelligent intrusion detection system (IDS) for anomaly and misuse detection in computer networks," Expert Systems with Applications, vol. 29, no. 4, pp. 713–722, 2005.

[37] A. S. A. Aziz, A. E. Hassanien, S. E.-O. Hanaf, and M. Tolba, "Multi-layer hybrid machine learning techniques for anomalies detection and classification approach," in Proceedings of the 13th International Conference on Hybrid Intelligent Systems (HIS '13), pp. 215–220, IEEE, Gammarth, Tunisia, December 2013.

[38] M. Ektefa, S. Memar, F. Sidi, and L. S. Affendey, "Intrusion detection using data mining techniques," in Proceedings of the International Conference on Information Retrieval and Knowledge Management: Exploring the Invisible World (CAMP '10), pp. 200–203, IEEE, March 2010.

[39] G. MeeraGandhi, K. Appavoo, and S. Srivasta, "Effective network intrusion detection using classifiers decision trees and decision rules," International Journal of Advanced Networking and Applications, vol. 2, no. 3, pp. 686–692, 2010.

[40] H. Chauhan, V. Kumar, S. Pundir, and E. S. Pilli, "A comparative study of classification techniques for intrusion detection," in Proceedings of the International Symposium on Computational and Business Intelligence (ISCBI '13), pp. 40–43, IEEE, August 2013.

[41] C. Katar, "Combining multiple techniques for intrusion detection," International Journal of Computer Science and Network Security, vol. 6, no. 2B, pp. 208–218, 2006.

[42] S. R. Gaddam, V. V. Phoha, and K. S. Balagani, "K-means+id3: a novel method for supervised anomaly detection by cascading k-means clustering and id3 decision tree learning methods," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 3, pp. 345–354, 2007.

[43] D. Dasgupta, F. Gonzalez, K. Yallapu, J. Gomez, and R. Yarramsettii, "CIDS: an agent-based intrusion detection system," Computers & Security, vol. 24, no. 5, pp. 387–398, 2005.

[44] D. L. Hancock and G. B. Lamont, "Multi agent system for network attack classification using flow-based intrusion detection," in IEEE Congress of Evolutionary Computation (CEC '11), pp. 1535–1542, June 2011.

[45] X. Zhu, Z. Huang, and H. Zhou, "Design of a multi-agent based intelligent intrusion detection system," in Proceedings of the 1st International Symposium on Pervasive Computing and Applications (SPCA '06), pp. 290–295, August 2006.

[46] M. El Ajjouri, S. Benhadou, and H. Medromi, "Intelligent architecture based on MAS and CBR for intrusion detection," in Proceedings of the 4th Edition of National Security Days (JNS4), pp. 1–4, IEEE, May 2014.

[47] J. Yang, X. Liu, T. Li, G. Liang, and S. Liu, "Distributed agents model for intrusion detection based on AIS," Knowledge-Based Systems, vol. 22, no. 2, pp. 115–119, 2009.

[48] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297, Berkeley, Calif, USA, 1967.

[49] J. M. Peña, J. A. Lozano, and P. Larrañaga, "An empirical comparison of four initialization methods for the K-Means algorithm," Pattern Recognition Letters, vol. 20, no. 10, pp. 1027–1040, 1999.

[50] G. H. Ball and D. J. Hall, "A clustering technique for summarizing multivariate data," Behavioral Science, vol. 12, no. 2, pp. 153–155, 1967.

[51] I. Katsavounidis, C.-C. J. Kuo, and Z. Zhang, "New initialization technique for generalized Lloyd iteration," IEEE Signal Processing Letters, vol. 1, no. 10, pp. 144–146, 1994.

[52] M. D. B. Al-Daoud, "A new algorithm for cluster initialization," in Proceedings of the WEC'05: The 2nd World Enformatika Conference, 2007.

[53] D. Arthur and S. Vassilvitskii, "k-means++: the advantages of careful seeding," in Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035, Society for Industrial and Applied Mathematics, New Orleans, La, USA, January 2007.

[54] M. Erisoglu, N. Calis, and S. Sakallioglu, "A new algorithm for initial cluster centers in k-means algorithm," Pattern Recognition Letters, vol. 32, no. 14, pp. 1701–1705, 2011.

[55] L. Yongzhong, Y. Ge, X. Jing et al., "Anomaly detection for clustering algorithm based on particle swarm optimization," Journal of Jiangsu University of Science and Technology (Natural Science Edition), vol. 23, no. 1, pp. 51–55, 2009.

[56] W. Cong, J. Morris, and W. Xiaojun, "High performance deep packet inspection on multi-core platform," in Proceedings of the 2nd IEEE International Conference on Broadband Network and Multimedia Technology (IC-BNMT '09), pp. 619–622, October 2009.

[57] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.

[58] S. Ruggieri, "Efficient C4.5 [classification algorithm]," IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 2, pp. 438–444, 2002.

[59] X. Wu and V. Kumar, The Top Ten Algorithms in Data Mining, CRC Press, New York, NY, USA, 2010.

[60] KDD Cup 1999, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.

[61] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in Proceedings of the 2nd IEEE Symposium on Computational Intelligence for Security and Defence Applications, pp. 1–6, IEEE, July 2009.

[62] A. K. Jain, "Data clustering: 50 years beyond K-means," Pattern Recognition Letters, vol. 31, no. 8, pp. 651–666, 2010.


[39] G MeeraGandhi K Appavoo and S Srivasta ldquoEffective net-work intrusion detection using classifiers decision trees anddecision rulesrdquo International Journal of Advanced Networkingand Applications vol 2 no 3 pp 686ndash692 2010

[40] H Chauhan V Kumar S Pundir and E S Pilli ldquoA comparativestudy of classification techniques for intrusion detectionrdquo inProceedings of the International Symposium on Computationaland Business Intelligence (ISCBI rsquo13) pp 40ndash43 IEEE August2013

[41] C Katar ldquoCombining multiple techniques for intrusion detec-tionrdquo International Journal of Computer Science and NetworkSecurity vol 6 no 2B pp 208ndash218 2006

[42] S R Gaddam V V Phoha and K S Balagani ldquoK-means+id3 anovelmethod for supervised anomaly detection by cascading k-means clustering and id3 decision tree learning methodsrdquo IEEE

Transactions on Knowledge and Data Engineering vol 19 no 3pp 345ndash354 2007

[43] D Dasgupta F Gonzalez K Yallapu J Gomez and R Yarram-settii ldquoCIDS an agent-based intrusion detection systemrdquo Com-puters amp Security vol 24 no 5 pp 387ndash398 2005

[44] D L Hancock and G B Lamont ldquoMulti agent system for net-work attack classification using flow-based intrusion detectionrdquoin IEEE Congress of Evolutionary Computation (CEC rsquo11) pp1535ndash1542 June 2011

[45] X Zhu Z Huang and H Zhou ldquoDesign of a multi-agentbased intelligent intrusion detection systemrdquo in Proceedings ofthe 1st International Symposium on Pervasive Computing andApplications (SPCA rsquo06) pp 290ndash295 August 2006

[46] M El Ajjouri S Benhadou and H Medromi ldquoIntelligentarchitecture based onMAS andCBR for intrusion detectionrdquo inProceedings of the 4th Edition of National Security Days (JNS4)pp 1ndash4 IEEE May 2014

[47] J Yang X Liu T Li G Liang and S Liu ldquoDistributed agentsmodel for intrusion detection based on AISrdquo Knowledge-BasedSystems vol 22 no 2 pp 115ndash119 2009

[48] J MacQueen ldquoSome methods for classification and analysis ofmultivariate observationsrdquo in Proceedings of the 5th BerkeleySymposium on Mathematical Statistics and Probability pp 281ndash297 Berkeley Calif USA 1967

[49] J M Pena J A Lozano and P Larranaga ldquoAn empiricalcomparison of four initialization methods for the K-Meansalgorithmrdquo Pattern Recognition Letters vol 20 no 10 pp 1027ndash1040 1999

[50] G H Ball and D J Hall ldquoA clustering technique for summa-rizing multivariate datardquo Behavioral Science vol 12 no 2 pp153ndash155 1967

[51] I Katsavounidis C-C J Kuo and Z Zhang ldquoNew initializationtechnique for generalized Lloyd iterationrdquo IEEE Signal Process-ing Letters vol 1 no 10 pp 144ndash146 1994

[52] M D B Al-Daoud ldquoA new algorithm for cluster initializationrdquoin Proceedings of the WECrsquo05 The 2nd World EnformatikaConference 2007

[53] D Arthur and S Vassilvitskii ldquok-means++ the advantages ofcareful seedingrdquo in Proceedings of the 18th Annual ACM-SIAMSymposium on Discrete Algorithms pp 1027ndash1035 Society forIndustrial and Applied Mathematics New Orleans La USAJanuary 2007

[54] M Erisoglu N Calis and S Sakallioglu ldquoA new algorithm forinitial cluster centers in k-means algorithmrdquo Pattern Recogni-tion Letters vol 32 no 14 pp 1701ndash1705 2011

[55] L Yongzhong Y Ge X Jing et al ldquoAnomaly detection forclustering algorithm based on particle swarm optimizationrdquoJournal of Jiangsu University of Science and Technology (NaturalScience Edition) vol 23 no 1 pp 51ndash55 2009

[56] W Cong J Morris and W Xiaojun ldquoHigh performance deeppacket inspection on multi-core platformrdquo in Proceedings of the2nd IEEE International Conference on Broadband Network andMultimedia Technology (IC-BNMTrsquo 09) pp 619ndash622 October2009

[57] J R Quinlan C4 5 Programs for Machine Learning MorganKaufmann Publishers 1993

[58] S Ruggieri ldquoEfficient C45 [classification algorithm]rdquo IEEETransactions on Knowledge and Data Engineering vol 14 no 2pp 438ndash444 2002

[59] X Wu and V Kumar The Top Ten Algorithms in Data MiningCRC Press New York NY USA 2010

14 The Scientific World Journal

[60] KDD Cup 1999 httpkddicsuciedudatabaseskddcup99kddcup99html

[61] M Tavallaee E BagheriW Lu and A A Ghorbani ldquoA detailedanalysis of the KDD CUP 99 data setrdquo in Proceedings of the 2ndIEEE Symposium on Computational Intelligence for Security andDefence Applications pp 1ndash6 IEEE July 2009

[62] A K Jain ldquoData clustering 50 years beyond K-meansrdquo PatternRecognition Letters vol 31 no 8 pp 651ndash666 2010

more suitable for a real environment. We will use the new attacks detected by the system as unknown attacks to retrain the proposed method as feedback. In addition, we expect to reduce the IDS processing time when using a greater number of computers.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work is supported by the National University of Malaysia (UKM) Grant no. AP2013-007.

References

[1] H.-J. Liao, C.-H. Richard Lin, Y.-C. Lin, and K.-Y. Tung, "Intrusion detection system: a comprehensive review," Journal of Network and Computer Applications, vol. 36, no. 1, pp. 16–24, 2013.

[2] N. Sengupta, J. Sen, J. Sil, and M. Saha, "Designing of on line intrusion detection system using rough set theory and Q-learning algorithm," Neurocomputing, vol. 111, pp. 161–168, 2013.

[3] L. Koc, T. A. Mazzuchi, and S. Sarkani, "A network intrusion detection system based on a Hidden Naïve Bayes multiclass classifier," Expert Systems with Applications, vol. 39, no. 18, pp. 13492–13500, 2012.

[4] M. Uddin, A. A. Rehman, N. Uddin, J. Memon, R. Alsaqour, and S. Kazi, "Signature-based multi-layer distributed intrusion detection system using mobile agents," International Journal of Network Security, vol. 15, no. 2, pp. 97–105, 2013.

[5] C. N. Modi, D. R. Patel, A. Patel, and M. Rajarajan, "Integrating signature apriori based network intrusion detection system (NIDS) in cloud computing," Procedia Technology, vol. 6, pp. 905–912, 2012.

[6] H. Mohamed, L. Adil, T. Saida, et al., "A collaborative intrusion detection and prevention system in cloud computing," in Proceedings of the IEEE (AFRICON '13), pp. 1–5, IEEE, September 2013.

[7] S.-J. Horng, M.-Y. Su, Y.-H. Chen, et al., "A novel intrusion detection system based on hierarchical clustering and support vector machines," Expert Systems with Applications, vol. 38, no. 1, pp. 306–313, 2011.

[8] M. Chowdhary, S. Suri, and M. Bhutani, "Comparative study of intrusion detection system," International Journal of Computer Sciences and Engineering, vol. 2, no. 4, pp. 197–200, 2014.

[9] I. Corona, G. Giacinto, and F. Roli, "Adversarial attacks against intrusion detection systems: taxonomy, solutions and open issues," Information Sciences, vol. 239, pp. 201–225, 2013.

[10] S. Shamshirband, N. B. Anuar, M. L. M. Kiah, and A. Patel, "An appraisal and design of a multi-agent system based cooperative wireless intrusion detection computational intelligence technique," Engineering Applications of Artificial Intelligence, vol. 26, no. 9, pp. 2105–2127, 2013.

[11] M. Roesch, "Snort: lightweight intrusion detection for networks," in Proceedings of the 13th USENIX Conference on System Administration (LISA '99), pp. 229–238, 1999.

[12] D. Barbara and S. Jajodia, Applications of Data Mining in Computer Security, Springer, 2002.

[13] P. Natesan, P. Balasubramanie, and G. Gowrison, "Improving the attack detection rate in network intrusion detection using adaboost algorithm," Journal of Computer Science, vol. 8, no. 7, pp. 1041–1048, 2012.

[14] A. Bivens, C. Palagiri, R. Smith, B. Szymanski, and M. Embrechts, "Network-based intrusion detection using neural networks," in Proceedings of the Intelligent Engineering Systems through Artificial Neural Networks, vol. 12, pp. 579–584, November 2002.

[15] Y. Li and W. Jie, "The method of network intrusion detection based on the neural network GCBP algorithm," in Proceedings of the International Conference on Computer Science and Information Processing (CSIP '12), pp. 1082–1086, IEEE, August 2012.

[16] J. Lin, T. Huang, and B. Zhao, "A fast fuzzy set intrusion detection model," in International Symposium on Knowledge Acquisition and Modeling (KAM '08), pp. 601–605, December 2008.

[17] A. Abraham, R. Jain, J. Thomas, and S. Y. Han, "D-SCIDS: distributed soft computing intrusion detection system," Journal of Network and Computer Applications, vol. 30, no. 1, pp. 81–98, 2007.

[18] V. V. Kumari, S. Pamidi, and A. Govardhan, "Integrated Bayes network and hidden Markov model for host based IDS," International Journal of Computer Applications, vol. 41, no. 20, pp. 45–49, 2012.

[19] M. A. Hasan, M. Nasser, B. Pal, and S. Ahmad, "Support vector machine and random forest modeling for intrusion detection system (IDS)," Journal of Intelligent Learning Systems and Applications, vol. 6, no. 1, pp. 45–52, 2014.

[20] C. Xiang, P. C. Yong, and L. S. Meng, "Design of multiple-level hybrid classifier for intrusion detection system using Bayesian clustering and decision trees," Pattern Recognition Letters, vol. 29, no. 7, pp. 918–924, 2008.

[21] M. N. Huhns, Distributed Artificial Intelligence, Elsevier, 2012.

[22] S. J. Stolfo, A. L. Prodromidis, S. Tselepis, et al., "JAM: java agents for meta-learning over distributed databases," in Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD '97), pp. 74–81, 1997.

[23] P. Kannadiga and M. Zulkernine, "DIDMA: a distributed intrusion detection system using mobile agents," in Proceedings of the 6th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing and 1st ACIS International Workshop on Self-Assembling Wireless Networks (SNPD/SAWN '05), pp. 238–245, IEEE, May 2005.

[24] L. Portnoy, Intrusion Detection with Unlabeled Data Using Clustering, 2000.

[25] M. Jianliang, S. Haikun, and B. Ling, "The application on intrusion detection based on K-means cluster algorithm," in Proceedings of the International Forum on Information Technology and Applications (IFITA '09), vol. 1, pp. 150–152, IEEE, Chengdu, China, May 2009.

[26] M. Sabhnani and G. Serpen, "Application of machine learning algorithms to KDD intrusion detection dataset within misuse detection context," in Proceedings of the International Conference on Machine Learning: Models, Technologies and Applications (MLMTA '03), pp. 209–215, June 2003.

[27] G. Munz, S. Li, and G. Carle, "Traffic anomaly detection using k-means clustering," in Proceedings of the GI/ITG Workshop MMBnet, 2007.

[28] V. Kumar, H. Chauhan, and D. Panwar, "K-means clustering approach to analyze NSL-KDD intrusion detection dataset," International Journal of Soft Computing and Engineering, vol. 3, no. 4, pp. 1–4, 2013.

[29] S. Chawla and A. Gionis, "k-means--: a unified approach to clustering and outlier detection," in Proceedings of the SIAM International Conference on Data Mining (SDM '13), pp. 189–197, SIAM, 2013.

[30] A. P. Muniyandi, R. Rajeswari, and R. Rajaram, "Network anomaly detection by cascading K-means clustering and C4.5 decision tree algorithm," Procedia Engineering, vol. 30, pp. 174–182, 2012.

[31] L. Xiao, Z. Shao, and G. Liu, "K-means algorithm based on particle swarm optimization algorithm for anomaly intrusion detection," in Proceedings of the 6th World Congress on Intelligent Control and Automation (WCICA '06), pp. 5854–5858, IEEE, June 2006.

[32] Z. Muda, W. Yassin, M. N. Sulaiman, and N. I. Udzir, "Intrusion detection based on K-Means clustering and Naïve Bayes classification," in Proceedings of the 7th International Conference on Information Technology in Asia (CITA '11), pp. 1–6, IEEE, July 2011.

[33] H.-B. Wang, H.-L. Yang, Z.-J. Xu, and Z. Yuan, "A clustering algorithm use SOM and K-means in intrusion detection," in Proceedings of the 1st International Conference on E-Business and E-Government (ICEE '10), pp. 1281–1284, May 2010.

[34] A M Chandrasekhar and K Raghuveer ldquoIntrusion detectiontechnique by using k-means fuzzy neural network and SVMclassifiersrdquo in Proceedings of the 3rd International Conference onComputer Communication and Informatics (ICCCIrsquo 13) pp 1ndash3January 2013

[35] R. Goel, A. Sardana, and R. C. Joshi, "Parallel misuse and anomaly detection model," International Journal of Network Security, vol. 14, no. 4, pp. 211–222, 2012.

[36] O. Depren, M. Topallar, E. Anarim, and M. K. Ciliz, "An intelligent intrusion detection system (IDS) for anomaly and misuse detection in computer networks," Expert Systems with Applications, vol. 29, no. 4, pp. 713–722, 2005.

[37] A. S. A. Aziz, A. E. Hassanien, S. E.-O. Hanaf, and M. Tolba, "Multi-layer hybrid machine learning techniques for anomalies detection and classification approach," in Proceedings of the 13th International Conference on Hybrid Intelligent Systems (HIS '13), pp. 215–220, IEEE, Gammarth, Tunisia, December 2013.

[38] M. Ektefa, S. Memar, F. Sidi, and L. S. Affendey, "Intrusion detection using data mining techniques," in Proceedings of the International Conference on Information Retrieval and Knowledge Management: Exploring the Invisible World (CAMP '10), pp. 200–203, IEEE, March 2010.

[39] G. MeeraGandhi, K. Appavoo, and S. Srivasta, "Effective network intrusion detection using classifiers, decision trees and decision rules," International Journal of Advanced Networking and Applications, vol. 2, no. 3, pp. 686–692, 2010.

[40] H. Chauhan, V. Kumar, S. Pundir, and E. S. Pilli, "A comparative study of classification techniques for intrusion detection," in Proceedings of the International Symposium on Computational and Business Intelligence (ISCBI '13), pp. 40–43, IEEE, August 2013.

[41] C. Katar, "Combining multiple techniques for intrusion detection," International Journal of Computer Science and Network Security, vol. 6, no. 2B, pp. 208–218, 2006.

[42] S. R. Gaddam, V. V. Phoha, and K. S. Balagani, "K-means+id3: a novel method for supervised anomaly detection by cascading k-means clustering and id3 decision tree learning methods," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 3, pp. 345–354, 2007.

[43] D. Dasgupta, F. Gonzalez, K. Yallapu, J. Gomez, and R. Yarramsettii, "CIDS: an agent-based intrusion detection system," Computers & Security, vol. 24, no. 5, pp. 387–398, 2005.

[44] D. L. Hancock and G. B. Lamont, "Multi agent system for network attack classification using flow-based intrusion detection," in IEEE Congress of Evolutionary Computation (CEC '11), pp. 1535–1542, June 2011.

[45] X. Zhu, Z. Huang, and H. Zhou, "Design of a multi-agent based intelligent intrusion detection system," in Proceedings of the 1st International Symposium on Pervasive Computing and Applications (SPCA '06), pp. 290–295, August 2006.

[46] M. El Ajjouri, S. Benhadou, and H. Medromi, "Intelligent architecture based on MAS and CBR for intrusion detection," in Proceedings of the 4th Edition of National Security Days (JNS4), pp. 1–4, IEEE, May 2014.

[47] J. Yang, X. Liu, T. Li, G. Liang, and S. Liu, "Distributed agents model for intrusion detection based on AIS," Knowledge-Based Systems, vol. 22, no. 2, pp. 115–119, 2009.

[48] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297, Berkeley, Calif, USA, 1967.

[49] J. M. Pena, J. A. Lozano, and P. Larranaga, "An empirical comparison of four initialization methods for the K-Means algorithm," Pattern Recognition Letters, vol. 20, no. 10, pp. 1027–1040, 1999.

[50] G. H. Ball and D. J. Hall, "A clustering technique for summarizing multivariate data," Behavioral Science, vol. 12, no. 2, pp. 153–155, 1967.

[51] I. Katsavounidis, C.-C. J. Kuo, and Z. Zhang, "New initialization technique for generalized Lloyd iteration," IEEE Signal Processing Letters, vol. 1, no. 10, pp. 144–146, 1994.

[52] M. D. B. Al-Daoud, "A new algorithm for cluster initialization," in Proceedings of the WEC'05: The 2nd World Enformatika Conference, 2007.

[53] D. Arthur and S. Vassilvitskii, "k-means++: the advantages of careful seeding," in Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035, Society for Industrial and Applied Mathematics, New Orleans, La, USA, January 2007.

[54] M. Erisoglu, N. Calis, and S. Sakallioglu, "A new algorithm for initial cluster centers in k-means algorithm," Pattern Recognition Letters, vol. 32, no. 14, pp. 1701–1705, 2011.

[55] L. Yongzhong, Y. Ge, X. Jing, et al., "Anomaly detection for clustering algorithm based on particle swarm optimization," Journal of Jiangsu University of Science and Technology (Natural Science Edition), vol. 23, no. 1, pp. 51–55, 2009.

[56] W. Cong, J. Morris, and W. Xiaojun, "High performance deep packet inspection on multi-core platform," in Proceedings of the 2nd IEEE International Conference on Broadband Network and Multimedia Technology (IC-BNMT '09), pp. 619–622, October 2009.

[57] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.

[58] S. Ruggieri, "Efficient C4.5 [classification algorithm]," IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 2, pp. 438–444, 2002.

[59] X. Wu and V. Kumar, The Top Ten Algorithms in Data Mining, CRC Press, New York, NY, USA, 2010.

[60] KDD Cup 1999, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.

[61] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in Proceedings of the 2nd IEEE Symposium on Computational Intelligence for Security and Defence Applications, pp. 1–6, IEEE, July 2009.

[62] A. K. Jain, "Data clustering: 50 years beyond K-means," Pattern Recognition Letters, vol. 31, no. 8, pp. 651–666, 2010.

The Scientific World Journal 13

International Journal of Soft Computing and Engineering vol 3no 4 pp 1ndash4 2013

[29] S Chawla and A Gionis ldquok-means- a unified approach toclustering and outlier detectionrdquo in Proceedings of the SIAMInternational Conference onDataMining (SDM 13) pp 189ndash197SIAM 2013

[30] A P Muniyandi R Rajeswari and R Rajaram ldquoNetworkanomaly detection by cascading K-means clustering and C45decision Tree algorithmrdquo Procedia Engineering vol 30 pp 174ndash182 2012

[31] L Xiao Z Shao and G Liu ldquoK-means algorithm based onparticle swarm optimization algorithm for anomaly intrusiondetectionrdquo inProceedings of the 6thWorldCongress on IntelligentControl and Automation (WCICA rsquo06) pp 5854ndash5858 IEEEJune 2006

[32] Z MudaW Yassin M N Sulaiman and N I Udzir ldquoIntrusiondetection based on K-Means clustering and Naıve Bayes classi-ficationrdquo in Proceedings of the 7th International Conference onInformation Technology in Asia (CITA rsquo11) pp 1ndash6 IEEE July2011

[33] H-B Wang H-L Yang Z-J Xu and Z Yuan ldquoA clusteringalgorithm use SOM and K-means in intrusion detectionrdquo inProceedings of the 1st International Conference on E-Business andE-Government (ICEE rsquo10) pp 1281ndash1284 May 2010

[34] A M Chandrasekhar and K Raghuveer ldquoIntrusion detectiontechnique by using k-means fuzzy neural network and SVMclassifiersrdquo in Proceedings of the 3rd International Conference onComputer Communication and Informatics (ICCCIrsquo 13) pp 1ndash3January 2013

[35] R Goel A Sardana and R C Joshi ldquoParallel misuse andanomaly detection modelrdquo International Journal of NetworkSecurity vol 14 no 4 pp 211ndash222 2012

[36] O Depren M Topallar E Anarim and M K Ciliz ldquoAnintelligent intrusion detection system (IDS) for anomaly andmisuse detection in computer networksrdquo Expert Systems withApplications vol 29 no 4 pp 713ndash722 2005

[37] A S A Aziz A E Hassanien S E-O Hanaf and M TolbaldquoMulti-layer hybrid machine learning techniques for anomaliesdetection and classification approachrdquo in Proceedings of the 13thInternational Conference on Hybrid Intelligent Systems (HIS rsquo13)pp 215ndash220 IEEE Gammarth Tunisia December 2013

[38] M Ektefa S Memar F Sidi and L S Affendey ldquoIntrusiondetection using data mining techniquesrdquo in Proceedings of theInternational Conference on Information Retrieval and Knowl-edgeManagement Exploring the InvisibleWorld (CAMP rsquo10) pp200ndash203 IEEE March 2010

[39] G MeeraGandhi K Appavoo and S Srivasta ldquoEffective net-work intrusion detection using classifiers decision trees anddecision rulesrdquo International Journal of Advanced Networkingand Applications vol 2 no 3 pp 686ndash692 2010

[40] H Chauhan V Kumar S Pundir and E S Pilli ldquoA comparativestudy of classification techniques for intrusion detectionrdquo inProceedings of the International Symposium on Computationaland Business Intelligence (ISCBI rsquo13) pp 40ndash43 IEEE August2013

[41] C Katar ldquoCombining multiple techniques for intrusion detec-tionrdquo International Journal of Computer Science and NetworkSecurity vol 6 no 2B pp 208ndash218 2006

[42] S R Gaddam V V Phoha and K S Balagani ldquoK-means+id3 anovelmethod for supervised anomaly detection by cascading k-means clustering and id3 decision tree learning methodsrdquo IEEE

Transactions on Knowledge and Data Engineering vol 19 no 3pp 345ndash354 2007

[43] D Dasgupta F Gonzalez K Yallapu J Gomez and R Yarram-settii ldquoCIDS an agent-based intrusion detection systemrdquo Com-puters amp Security vol 24 no 5 pp 387ndash398 2005

[44] D L Hancock and G B Lamont ldquoMulti agent system for net-work attack classification using flow-based intrusion detectionrdquoin IEEE Congress of Evolutionary Computation (CEC rsquo11) pp1535ndash1542 June 2011

[45] X Zhu Z Huang and H Zhou ldquoDesign of a multi-agentbased intelligent intrusion detection systemrdquo in Proceedings ofthe 1st International Symposium on Pervasive Computing andApplications (SPCA rsquo06) pp 290ndash295 August 2006

[46] M El Ajjouri S Benhadou and H Medromi ldquoIntelligentarchitecture based onMAS andCBR for intrusion detectionrdquo inProceedings of the 4th Edition of National Security Days (JNS4)pp 1ndash4 IEEE May 2014

[47] J Yang X Liu T Li G Liang and S Liu ldquoDistributed agentsmodel for intrusion detection based on AISrdquo Knowledge-BasedSystems vol 22 no 2 pp 115ndash119 2009

[48] J MacQueen ldquoSome methods for classification and analysis ofmultivariate observationsrdquo in Proceedings of the 5th BerkeleySymposium on Mathematical Statistics and Probability pp 281ndash297 Berkeley Calif USA 1967

[49] J M Pena J A Lozano and P Larranaga ldquoAn empiricalcomparison of four initialization methods for the K-Meansalgorithmrdquo Pattern Recognition Letters vol 20 no 10 pp 1027ndash1040 1999

[50] G H Ball and D J Hall ldquoA clustering technique for summa-rizing multivariate datardquo Behavioral Science vol 12 no 2 pp153ndash155 1967

[51] I Katsavounidis C-C J Kuo and Z Zhang ldquoNew initializationtechnique for generalized Lloyd iterationrdquo IEEE Signal Process-ing Letters vol 1 no 10 pp 144ndash146 1994

[52] M D B Al-Daoud ldquoA new algorithm for cluster initializationrdquoin Proceedings of the WECrsquo05 The 2nd World EnformatikaConference 2007

[53] D Arthur and S Vassilvitskii ldquok-means++ the advantages ofcareful seedingrdquo in Proceedings of the 18th Annual ACM-SIAMSymposium on Discrete Algorithms pp 1027ndash1035 Society forIndustrial and Applied Mathematics New Orleans La USAJanuary 2007

[54] M Erisoglu N Calis and S Sakallioglu ldquoA new algorithm forinitial cluster centers in k-means algorithmrdquo Pattern Recogni-tion Letters vol 32 no 14 pp 1701ndash1705 2011

[55] L Yongzhong Y Ge X Jing et al ldquoAnomaly detection forclustering algorithm based on particle swarm optimizationrdquoJournal of Jiangsu University of Science and Technology (NaturalScience Edition) vol 23 no 1 pp 51ndash55 2009

[56] W Cong J Morris and W Xiaojun ldquoHigh performance deeppacket inspection on multi-core platformrdquo in Proceedings of the2nd IEEE International Conference on Broadband Network andMultimedia Technology (IC-BNMTrsquo 09) pp 619ndash622 October2009

[57] J R Quinlan C4 5 Programs for Machine Learning MorganKaufmann Publishers 1993

[58] S Ruggieri ldquoEfficient C45 [classification algorithm]rdquo IEEETransactions on Knowledge and Data Engineering vol 14 no 2pp 438ndash444 2002

[59] X Wu and V Kumar The Top Ten Algorithms in Data MiningCRC Press New York NY USA 2010

14 The Scientific World Journal

[60] KDD Cup 1999 httpkddicsuciedudatabaseskddcup99kddcup99html

[61] M Tavallaee E BagheriW Lu and A A Ghorbani ldquoA detailedanalysis of the KDD CUP 99 data setrdquo in Proceedings of the 2ndIEEE Symposium on Computational Intelligence for Security andDefence Applications pp 1ndash6 IEEE July 2009

[62] A K Jain ldquoData clustering 50 years beyond K-meansrdquo PatternRecognition Letters vol 31 no 8 pp 651ndash666 2010


