Machine Learning Algorithms for Detection of Network Intrusions
Ljiljana Trajković, [email protected]
Communication Networks Laboratory, http://www.ensc.sfu.ca/cnl
School of Engineering Science, Simon Fraser University, Vancouver, British Columbia, Canada
[Photo] Simon Fraser University, Burnaby Campus
IWCSN 2018, Nanjing, China, October 22, 2018
Collaborators
§ Zhida Li
§ Soroush Haeri
§ Qingye Ding
§ Prerna Batta
§ Ana Laura Gonzales
§ Nabil Al Rousan
[Figure] lhr: 535,102 nodes and 601,678 links (source: http://www.caida.org/home/)
Roadmap
§ Introduction and survey of past work
§ Intrusion detection
§ Data processing
§ Classification algorithms
§ Experimental procedure
§ Performance evaluation
§ Conclusion
§ References
Motivation
§ Detecting and analyzing network anomalies and intrusions are important topics in cyber security.
§ Network intrusions may be classified using machine learning algorithms such as Recurrent Neural Networks (RNNs) and the Broad Learning System (BLS).
§ Classification models are trained and tested using the NSL-KDD dataset, which contains records of intrusion and regular network connections.
§ Performance results indicate that the BLS algorithm achieves comparable accuracy with a shorter training time.
Border Gateway Protocol
§ BGP's main function is to optimally route data between Autonomous Systems (ASes)
§ AS: a collection of BGP routers (peers) within a single administrative domain
§ Four types of BGP messages: open, keepalive, update, and notification
§ BGP anomalies: Slammer, Nimda, Code Red I, routing misconfigurations
Machine learning
§ Machine learning models classify data using a feature matrix
§ Feature matrix: rows are data points, columns are feature values
§ Algorithms: Logistic Regression, Naïve Bayes, Support Vector Machine (SVM)
§ SVM places the decision boundary geometrically midway between the support vectors (see the sketch below)
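To make the feature-matrix view concrete, here is a minimal sketch (not part of the talk) using scikit-learn; the feature values and labels are hypothetical:

```python
# Rows of the feature matrix are data points, columns are feature values;
# an SVM separates the two classes. All numbers below are made up.
import numpy as np
from sklearn.svm import SVC

X = np.array([
    [120, 4, 0],   # regular
    [130, 5, 1],   # regular
    [115, 4, 0],   # regular
    [900, 9, 7],   # anomalous
    [850, 8, 6],   # anomalous
    [940, 9, 8],   # anomalous
])
y = np.array([0, 0, 0, 1, 1, 1])     # 0 = regular, 1 = anomaly

clf = SVC(kernel="linear").fit(X, y)
print(clf.support_vectors_)          # support vectors that define the margin
print(clf.predict([[880, 9, 7]]))    # classify a new data point -> [1]
```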
Feature extraction: BGP messages
§ Extract 37 features
§ Sample every minute during a five-day period:
  § the peak day of an anomaly
  § two days prior and two days after the peak day
§ 7,200 samples for each anomalous event (5 days × 1,440 minutes per day):
  § 5,760 regular (non-anomalous) samples from the four surrounding days
  § 1,440 anomalous samples from the peak day
  § an imbalanced dataset
BGP features

| Feature | Definition | Category |
|---|---|---|
| 1 | Number of announcements | Volume |
| 2 | Number of withdrawals | Volume |
| 3 | Number of announced NLRI prefixes | Volume |
| 4 | Number of withdrawn NLRI prefixes | Volume |
| 5 | Average AS-path length | AS-path |
| 6 | Maximum AS-path length | AS-path |
| 7 | Average unique AS-path length | AS-path |
| 8 | Number of duplicate announcements | Volume |
| 9 | Number of duplicate withdrawals | Volume |
| 10 | Number of implicit withdrawals | Volume |
BGP features (cont.)

| Feature | Definition | Category |
|---|---|---|
| 11 | Average edit distance | AS-path |
| 12 | Maximum edit distance | AS-path |
| 13 | Inter-arrival time | Volume |
| 14–24 | Maximum edit distance = n, where n = 7, ..., 17 | AS-path |
| 25–33 | Maximum AS-path length = n, where n = 7, ..., 15 | AS-path |
| 34 | Number of IGP packets | Volume |
| 35 | Number of EGP packets | Volume |
| 36 | Number of incomplete packets | Volume |
| 37 | Packet size (B) | Volume |
Feature extraction: BGP messages
§ BGP enables the exchange of routing information between gateway routers using update messages
§ Collections of BGP update messages:
  § Réseaux IP Européens (RIPE) Routing Information Service (RIS) project
  § Route Views
§ Available in Multi-threaded Routing Toolkit (MRT) binary format (a parsing sketch follows)
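One possible way to read these collections in code, assuming the pybgpstream package (the CAIDA BGPStream Python bindings, which fetch RIPE RIS MRT archives); the collector name and time window are illustrative assumptions, not the talk's exact setup:

```python
# Count BGP announcements per one-minute bin around the Slammer event.
from collections import Counter
import pybgpstream

stream = pybgpstream.BGPStream(
    from_time="2003-01-23 00:00:00", until_time="2003-01-28 00:00:00",
    collectors=["rrc04"],          # one RIPE RIS collector, chosen for illustration
    record_type="updates")

per_minute = Counter()
for elem in stream:
    if elem.type == "A":           # "A" = announcement, "W" = withdrawal
        per_minute[int(elem.time) // 60] += 1   # feature 1: announcement volume
print(len(per_minute), "one-minute samples")
```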
BGP anomalies
§ Slammer: infected Microsoft SQL servers through a small piece of code that generated IP addresses at random
§ Nimda: exploited vulnerabilities in Microsoft Internet Information Services (IIS) web servers and Internet Explorer 5
§ Code Red I: attacked Microsoft IIS web servers by replicating itself through IIS server weaknesses
Duration of BGP events

| Anomaly | Date | Anomaly (min) | Regular (min) |
|---|---|---|---|
| Slammer | January 25, 2003 | 869 | 6,331 |
| Nimda | September 18-20, 2001 | 3,521 | 3,679 |
| Code Red I | July 19, 2001 | 600 | 6,600 |
[Figure] Number of BGP announcements: Slammer
[Figure] Number of BGP announcements: Nimda
[Figure] Number of BGP announcements: Code Red I
[Figure] Number of BGP announcements: Regular
Feature selection
§ Reduces redundancy among features and improves the classification accuracy
§ The decision tree algorithm was used for feature selection:
  § one of the most successful techniques for supervised classification learning
  § handles both numerical and categorical features
  § publicly available software tool: C5.0 (a sketch with a stand-in tree follows)
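A sketch of tree-based feature selection; the talk used C5.0, and scikit-learn's CART decision tree is used here as a freely available stand-in, on placeholder data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(7200, 37))      # placeholder for the 37 BGP features
y = rng.integers(0, 2, size=7200)    # placeholder labels (regular / anomaly)

tree = DecisionTreeClassifier(max_depth=8).fit(X, y)
# Features with non-zero importance appear in the constructed tree; the
# rest (features 30-33, and sometimes 22, in the talk) would be removed.
selected = np.flatnonzero(tree.feature_importances_ > 0) + 1   # 1-based IDs
print("Selected features:", selected)
```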
Feature selection: decision tree

| Dataset | Training data | Selected features |
|---|---|---|
| Dataset 1 | Slammer + Nimda | 1–21, 23–29, 34–37 |
| Dataset 2 | Slammer + Code Red I | 1–22, 24–29, 34–37 |
| Dataset 3 | Code Red I + Nimda | 1–29, 34–37 |

§ Either four (30, 31, 32, 33) or five (22, 30, 31, 32, 33) features are removed in the constructed trees, mainly because the features are numerical and some are used repeatedly.
Support Vector Machine
§ SVM defines a separating hyperplane to assign the target variables into distinct categories
§ It is a non-probabilistic binary classifier
§ Used for classification problems and in pattern recognition applications
§ A modified version of logistic regression
Support Vector Machine
§ For a given dataset with $n$ training data points, SVM finds the maximum-margin hyperplane separating different classes of data:

$$D = \{(x_i, y_i)\}, \quad x_i \in \mathbb{R}^p, \; y_i \in \{1, -1\}, \; \forall i = 1, 2, \dots, n,$$

where $x_i$ is the $p$-dimensional input vector and $y_i$ is the output value (1 or $-1$).
§ The decision boundary separating the two classes is given by:

$$w_0^T x + b = 0,$$

where $w_0$ is the optimal weight vector and $b$ is the bias.
§ For linearly separable training data, the margins are defined as $w_0^T x + b = 1$ and $w_0^T x + b = -1$.
Support Vector Machine

[Figure] SVM with linear kernel: correctly classified regular (circles) and anomalous (stars) data points, as well as one incorrectly classified regular (circle) data point
Support Vector Machine
§ The distance between the margins is $2/\|w_0\|$
§ The objective is to minimize $\|w_0\|$
§ The maximum-margin hyperplane is obtained by solving:

$$\min_{w, \xi} \; C \sum_{i=1}^{n} \xi_i + \frac{1}{2} \|w\|^2,$$

with constraints $y_i f(x_i) \geq 1 - \xi_i, \; i = 1, \dots, n$, where $C$ is the regularization parameter that trades off the separation of the two classes against the error on the training dataset, $y_i$ is the target value, and $\xi_i$ are the slack variables (see the sketch below).
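A small sketch (assumed, not from the talk) of how the regularization parameter $C$ shapes the soft margin, using scikit-learn on synthetic data:

```python
# Larger C penalizes slack more heavily: fewer margin violations are
# tolerated and typically fewer points remain support vectors.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: {clf.n_support_.sum()} support vectors")
```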
Support Vector Machine: kernel trick
§ Instead of calculating each mapping, the "kernel trick" is used to directly calculate the inner product in the input space
§ The mapping defines the feature space and generates a decision boundary for the input data points
§ Using the "kernel trick" reduces the complexity of the optimization problem, which now depends only on the input space instead of the feature space (a worked check follows)
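A worked check of the kernel trick for the homogeneous quadratic kernel $K(x, z) = (x \cdot z)^2$, which equals the inner product under the explicit map $\phi(x) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)$; the vectors are arbitrary examples:

```python
import numpy as np

def phi(v):                       # explicit map into the 3-D feature space
    return np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])
print(np.dot(x, z) ** 2)          # kernel in input space: (1*3 + 2*4)^2 = 121
print(np.dot(phi(x), phi(z)))     # same value via feature space: 121.0
```

The kernel evaluates the feature-space inner product without ever constructing $\phi(x)$, which is what keeps the optimization in the input space.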
Support Vector Machine
§ Instead of employing the minimization model directly, the problem may be formulated using the Lagrangian dual with multipliers $\alpha$:

$$\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle,$$

subject to:

$$0 \leq \alpha_i \leq C \quad \forall i = 1, 2, \dots, n \qquad \text{and} \qquad \sum_{i=1}^{n} \alpha_i y_i = 0.$$
Support Vector Machine

[Figure] SVM with a nonlinear kernel function: the three-dimensional space shows a hyperplane dividing regular (circles) and anomalous (stars) data points
Experimental procedure
§ Step 1: Use all 37 features, or select the most relevant features using the decision tree algorithm
§ Step 2: Train the SVM with linear, quadratic, or cubic kernels
§ Step 3: Test the models using various datasets
§ Step 4: Evaluate the SVM kernels based on accuracy and F-Score (a pipeline sketch follows this list)
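A sketch of Steps 1–4 as one pipeline (an assumed implementation: the talk used MATLAB, and the data loading below is a synthetic placeholder for the BGP feature matrices):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in: 7,200 samples, 37 features, imbalanced like the BGP data.
X, y = make_classification(n_samples=7200, n_features=37,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, params in [("linear", dict(kernel="linear")),
                     ("quadratic", dict(kernel="poly", degree=2)),
                     ("cubic", dict(kernel="poly", degree=3))]:
    clf = SVC(**params).fit(X_train, y_train)         # Step 2
    pred = clf.predict(X_test)                        # Step 3
    print(name, accuracy_score(y_test, pred),         # Step 4
          f1_score(y_test, pred))
```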
Training and test datasets

| | Training dataset | Test dataset |
|---|---|---|
| Dataset 1 | Slammer and Nimda | Code Red I |
| Dataset 2 | Nimda and Code Red I | Slammer |
Experimental procedure
§ MATLAB 2017b Statistics and Machine Learning Toolbox
§ The performance of SVM with various kernels is evaluated using combinations of datasets
§ SVM performance is measured based on accuracy and F-Score
§ The confusion matrix is used to evaluate the performance of classification algorithms
§ True positives (TP) and false negatives (FN) are the numbers of anomalous data points classified as anomalous and regular, respectively
Performance measures
§ Accuracy: (TP + TN)/(TP + TN + FP + FN)
§ F-Score is the harmonic mean of precision and sensitivity: 2 × (precision × sensitivity)/(precision + sensitivity)
  § precision: TP/(TP + FP)
  § sensitivity: TP/(TP + FN)
§ A worked example with hypothetical counts follows this list
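A worked example; the confusion-matrix counts below are hypothetical, chosen only to illustrate the formulas:

```python
TP, TN, FP, FN = 900, 5200, 300, 540          # assumed counts, not the talk's results

accuracy = (TP + TN) / (TP + TN + FP + FN)    # 6100 / 6940 ~= 0.879
precision = TP / (TP + FP)                    # 900 / 1200 = 0.75
sensitivity = TP / (TP + FN)                  # 900 / 1440 = 0.625
f_score = 2 * precision * sensitivity / (precision + sensitivity)   # ~= 0.682
print(accuracy, precision, sensitivity, f_score)
```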
SVM with linear kernel

| Selected features | Training dataset | Accuracy (%) Test | Accuracy (%) RIPE | Accuracy (%) BCNET | F-Score (%) Test |
|---|---|---|---|---|---|
| 1–37 | Dataset 1 | 72.76 | 61.34 | 54.21 | 73.60 |
| 1–37 | Dataset 2 | 70.81 | 52.89 | 45.36 | 73.19 |
| 1–21, 23–29, 34–37 | Dataset 1 | 74.71 | 63.26 | 55.39 | 76.29 |
| 1–21, 23–29, 34–37 | Dataset 2 | 73.27 | 54.12 | 49.38 | 74.48 |
SVM with quadratic kernel

| Selected features | Training dataset | Accuracy (%) Test | Accuracy (%) RIPE | Accuracy (%) BCNET | F-Score (%) Test |
|---|---|---|---|---|---|
| 1–37 | Dataset 1 | 58.55 | 52.73 | 43.68 | 58.85 |
| 1–37 | Dataset 2 | 61.27 | 42.87 | 35.52 | 63.15 |
| 1–21, 23–29, 34–37 | Dataset 1 | 63.84 | 58.51 | 46.39 | 67.24 |
| 1–21, 23–29, 34–37 | Dataset 2 | 63.36 | 46.55 | 38.73 | 64.68 |
SVM with cubic kernel

| Selected features | Training dataset | Accuracy (%) Test | Accuracy (%) RIPE | Accuracy (%) BCNET | F-Score (%) Test |
|---|---|---|---|---|---|
| 1–37 | Dataset 1 | 65.33 | 54.31 | 45.53 | 68.55 |
| 1–37 | Dataset 2 | 63.15 | 46.23 | 40.17 | 65.49 |
| 1–21, 23–29, 34–37 | Dataset 1 | 69.21 | 58.12 | 49.26 | 70.14 |
| 1–21, 23–29, 34–37 | Dataset 2 | 67.79 | 49.78 | 42.36 | 69.55 |
SVM algorithm and kernels
§ The SVM algorithm is one of the most efficient ML tools
§ Kernels are used to transform the input data into a high-dimensional space
§ Their performance depends on both the feature selection and the type of dataset
§ The analyzed BGP anomaly datasets are linearly separable
§ SVM with the linear kernel outperforms SVMs with quadratic and cubic kernels
Intrusion detection
Various detection systems have been designed using machine learning techniques that help detect malicious intentions of network users.
§ Classification algorithms: J48, Naïve Bayes (NB), NB Tree, Random Forests (RF), Random Tree (RT), Multilayer Perceptron (MP), Support Vector Machine (SVM)
§ Deep learning algorithms: Network Intrusion Detection System (NIDS) and Recurrent Neural Network Intrusion Detection System (RNN-IDS)
Intrusion detection (cont.)
§ Hybrid framework: Binary Classifier (BC) modules based on the C4.5 algorithm, an aggregation module, and a k-NN module
§ Recurrent Neural Networks (RNNs): Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Bidirectional LSTM (Bi-LSTM)
§ Broad Learning System (BLS): an alternative to deep learning networks, with an increased number of mapped features and enhancement nodes (a minimal sketch of the idea follows)
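A minimal numpy sketch of the BLS idea (after Chen and Liu, 2018): random mapped-feature nodes and enhancement nodes are concatenated, and only the output weights are learned, via ridge regression. Node counts, the activation, and the regularization constant are illustrative assumptions, not the talk's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 122))           # placeholder for encoded NSL-KDD features
Y = np.eye(2)[rng.integers(0, 2, 1000)]    # one-hot labels (binary case)

Wf = rng.normal(size=(X.shape[1], 100))    # random mapped-feature weights
Z = np.tanh(X @ Wf)                        # mapped feature nodes
We = rng.normal(size=(Z.shape[1], 300))    # random enhancement weights
H = np.tanh(Z @ We)                        # enhancement nodes
A = np.hstack([Z, H])                      # broad layer: [Z | H]

lam = 1e-3                                 # ridge regularization
W = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)
pred = (A @ W).argmax(axis=1)
print("train accuracy:", (pred == Y.argmax(axis=1)).mean())
```

Because only the output weights W are solved for in closed form, training is fast, which matches the short BLS training times reported later in the talk.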
Data Processing
§ NSL-KDD dataset: types of intrusion attacks
https://web.archive.org/web/20150205070216/http://nsl.cs.unb.ca/NSL-KDD/

| Type | Intrusion attacks |
|---|---|
| DoS (denial-of-service) | back, land, neptune, pod, smurf, teardrop, mailbomb, processtable, udpstorm, apache2, worm |
| U2R (user-to-root) | buffer-overflow, loadmodule, perl, rootkit, sqlattack, xterm, ps |
| R2L (remote-to-local) | ftp-write, guess-passwd, imap, multihop, phf, spy, warezmaster, xlock, xsnoop, snmpguess, snmpgetattack, httptunnel, sendmail, named |
| Probe | ipsweep, nmap, portsweep, satan, mscan, saint |
Data Processing
§ NSL-KDD dataset: number of data points

| | Regular | DoS | U2R | R2L | Probe | Total |
|---|---|---|---|---|---|---|
| KDDTrain+ | 67,343 | 45,927 | 52 | 995 | 11,656 | 125,973 |
| KDDTest+ | 9,711 | 7,458 | 200 | 2,754 | 2,421 | 22,544 |
| KDDTest-21 | 2,152 | 4,342 | 200 | 2,754 | 2,402 | 11,850 |
Classification Algorithms
§ [Figure] Repeating module for the Long Short-Term Memory (LSTM) neural network
§ [Figure] Repeating module for the Gated Recurrent Unit (GRU) neural network
§ [Figure] Deep learning neural network model (a sketch of the three RNN variants follows this list)
§ [Figure] Module of the Broad Learning System algorithm with increments of mapped features, enhancement nodes, and new input data
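A sketch of the three RNN variants as classifiers (assuming TensorFlow/Keras; the talk does not name the framework, and the input shape and layer sizes are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

def make_model(cell):
    return tf.keras.Sequential([
        tf.keras.Input(shape=(1, 122)),          # one time step of 122 encoded features
        cell,                                    # the recurrent "repeating module"
        layers.Dense(5, activation="softmax"),   # five-way: Regular/DoS/U2R/R2L/Probe
    ])

models = {
    "LSTM": make_model(layers.LSTM(80)),
    "GRU": make_model(layers.GRU(80)),
    "Bi-LSTM": make_model(layers.Bidirectional(layers.LSTM(80))),
}
for name, m in models.items():
    m.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
    print(name, m.count_params())
```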
Experimental Procedure
• Step 1: Convert categorical features into numerical features using dummy coding for the training and test datasets
• Step 2: Normalize the training and test datasets (see the sketch after this list)
• Step 3: Tune the model parameters using 10-fold cross-validation
• Step 4: Test the LSTM, GRU, Bi-LSTM, and BLS models using the KDDTest+ and KDDTest-21 datasets
• Step 5: Evaluate the derived models based on accuracy and F-Score for binary and multiple classes
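A sketch of Steps 1 and 2 (an assumed implementation; the toy frame stands in for NSL-KDD rows, which mix numerical fields with categorical ones such as protocol_type):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

train = pd.DataFrame({"duration": [0, 2, 151],
                      "protocol_type": ["tcp", "udp", "tcp"]})
test = pd.DataFrame({"duration": [5, 0],
                     "protocol_type": ["icmp", "tcp"]})

# Step 1: dummy coding; align test columns to the training columns
# (categories unseen in training, e.g. "icmp" here, are dropped).
train_enc = pd.get_dummies(train, columns=["protocol_type"])
test_enc = pd.get_dummies(test, columns=["protocol_type"]).reindex(
    columns=train_enc.columns, fill_value=0)

# Step 2: normalization; the scaler is fit on the training data only.
scaler = MinMaxScaler().fit(train_enc)
X_train = scaler.transform(train_enc)
X_test = scaler.transform(test_enc)
```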
Performance Evaluation

Two-way classification:

| | LSTM | GRU | Bi-LSTM | BLS |
|---|---|---|---|---|
| Accuracy (%), KDDTest+ | 82.68 | 82.87 | 81.03 | 84.14 |
| Accuracy (%), KDDTest-21 | 64.32 | 65.42 | 64.31 | 72.64 |
| F-Score (%), KDDTest+ | 82.76 | 83.05 | 81.23 | 84.68 |
| F-Score (%), KDDTest-21 | 73.18 | 74.60 | 73.49 | 80.61 |

Five-way classification:

| | LSTM | GRU | Bi-LSTM | BLS |
|---|---|---|---|---|
| Accuracy (%), KDDTest+ | 79.56 | 80.17 | 79.44 | 82.47 |
| Accuracy (%), KDDTest-21 | 60.51 | 60.75 | 60.80 | 70.30 |
Performance Evaluation

Two-way classification:

| | NB Tree [1] | RT [1] | NIDS [2] | RNN-IDS [3] |
|---|---|---|---|---|
| Accuracy (%), KDDTest+ | 82.02 | 81.59 | 75.75 | 83.28 |
| Accuracy (%), KDDTest-21 | 66.16 | 58.51 | N/A | 68.55 |
Performance Evaluation

Five-way classification:

| | J48 [3] | NB [3] | NB Tree [3] | MP [3] |
|---|---|---|---|---|
| Accuracy (%), KDDTest+ | 74.60 | 74.40 | 75.40 | 78.10 |
| Accuracy (%), KDDTest-21 | 51.90 | 55.77 | 55.40 | 58.40 |

| | LSTM | GRU | Bi-LSTM | BLS | RNN-IDS [3] |
|---|---|---|---|---|---|
| Training time (s) | 355.86 | 345.04 | 497.66 | 21.92 | 5,516.00 |
Conclusion
§ Three types of RNNs and a BLS have been employed to detect network intrusions.
§ On the KDDTest+ and KDDTest-21 datasets, BLS shows better performance than LSTM, GRU, Bi-LSTM, and most reported results.
§ BLS performance depends on the number of mapped features and enhancement nodes.
§ While additional mapped features and enhancement nodes improve BLS performance, they require more memory and longer training time.
§ An advantage of the BLS model is that it requires considerably shorter training time than conventional deep learning networks.
References
[1] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in Proc. IEEE Symp. Comput. Intell. in Security and Defense Appl. (CISDA), Ottawa, ON, Canada, July 2009, pp. 1–6.
[2] T. A. Tang, L. Mhamdi, D. McLernon, S. A. R. Zaidi, and M. Ghogho, "Deep learning approach for network intrusion detection in software defined networking," in Proc. Wireless Netw. Mobile Commun. (WINCOM), Fez, Morocco, Oct. 2016, pp. 258–263.
[3] C. Yin, Y. Zhu, J. Fei, and X. He, "A deep learning approach for intrusion detection using recurrent neural networks," IEEE Access, vol. 5, pp. 21954–21961, Nov. 2017.
[4] L. Li, Y. Yu, S. Bai, Y. Hou, and X. Chen, "An effective two-step intrusion detection approach based on binary classification and k-NN," IEEE Access, vol. 6, pp. 12060–12073, Mar. 2018.
[5] C. L. P. Chen and Z. Liu, "Broad learning system: an effective and efficient incremental learning system without the need for deep architecture," IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 1, pp. 10–24, Jan. 2018.
[6] NSL-KDD Data Set [Online]. Available: https://web.archive.org/web/20150205070216/http://nsl.cs.unb.ca/NSL-KDD/. Accessed: July 18, 2018.
References: sources of data
§ RIPE RIS raw data: http://www.ripe.net/data-tools/
§ University of Oregon Route Views project: http://www.routeviews.org/
§ BCNET: http://www.bc.net/
§ NSL-KDD Data Set: https://web.archive.org/web/20150205070216/http://nsl.cs.unb.ca/NSL-KDD/
Publications: http://www.sfu.ca/~ljilja/cnl
§ Z. Li, P. Batta, and Lj. Trajković, "Comparison of machine learning algorithms for detection of network intrusions," in Proc. IEEE International Conference on Systems, Man, and Cybernetics (SMC 2018), Miyazaki, Japan, Oct. 2018, pp. 4238–4243.
§ P. Batta, M. Singh, Z. Li, Q. Ding, and Lj. Trajković, "Evaluation of support vector machine kernels for detecting network anomalies," in Proc. IEEE Int. Symp. Circuits and Systems, Florence, Italy, May 2018, pp. 1–4.
§ Q. Ding, Z. Li, S. Haeri, and Lj. Trajković, "Application of machine learning techniques to detecting anomalies in communication networks: datasets and feature selection algorithms," in Cyber Threat Intelligence, M. Conti, A. Dehghantanha, and T. Dargahi, Eds., Berlin: Springer, 2018, pp. 47–70.
§ Q. Ding, Z. Li, S. Haeri, and Lj. Trajković, "Application of machine learning techniques to detecting anomalies in communication networks: classification algorithms," in Cyber Threat Intelligence, M. Conti, A. Dehghantanha, and T. Dargahi, Eds., Berlin: Springer, 2018, pp. 71–92.
§ Q. Ding, Z. Li, P. Batta, and Lj. Trajković, "Detecting BGP anomalies using machine learning techniques," in Proc. IEEE International Conference on Systems, Man, and Cybernetics (SMC 2016), Budapest, Hungary, Oct. 2016, pp. 3352–3355.
Publications: http://www.sfu.ca/~ljilja/cnl
§ Y. Li, H. J. Xing, Q. Hua, X.-Z. Wang, P. Batta, S. Haeri, and Lj. Trajković, "Classification of BGP anomalies using decision trees and fuzzy rough sets," in Proc. IEEE International Conference on Systems, Man, and Cybernetics (SMC 2014), San Diego, CA, USA, Oct. 2014, pp. 1312–1317.
§ T. Farah and Lj. Trajković, "Anonym: a tool for anonymization of the Internet traffic," in Proc. 2013 IEEE International Conference on Cybernetics (CYBCONF 2013), Lausanne, Switzerland, June 2013, pp. 261–266.
§ N. Al-Rousan, S. Haeri, and Lj. Trajković, "Feature selection for classification of BGP anomalies using Bayesian models," in Proc. International Conference on Machine Learning and Cybernetics (ICMLC 2012), Xi'an, China, July 2012, pp. 140–147.
§ N. Al-Rousan and Lj. Trajković, "Machine learning models for classification of BGP anomalies," in Proc. IEEE Conf. High Performance Switching and Routing (HPSR 2012), Belgrade, Serbia, June 2012, pp. 103–108.
§ T. Farah, S. Lally, R. Gill, N. Al-Rousan, R. Paul, D. Xu, and Lj. Trajković, "Collection of BCNET BGP traffic," in Proc. 23rd ITC, San Francisco, CA, USA, Sept. 2011, pp. 322–323.
§ S. Lally, T. Farah, R. Gill, R. Paul, N. Al-Rousan, and Lj. Trajković, "Collection and characterization of BCNET BGP traffic," in Proc. 2011 IEEE Pacific Rim Conf. Communications, Computers and Signal Processing, Victoria, BC, Canada, Aug. 2011, pp. 830–835.