
Network Anomaly Detection
i.cs.hku.hk/fyp/2016/fyp16021/docs/interimpresentation.pdf


▸ Network Anomaly Detection
  ▹ Machine Learning
  ▹ Deep Learning
▸ Related Studies
▸ Methodology
▸ Deep Learning Optimization
  ▹ MNIST, CIFAR Dataset
  ▹ Deep Learning Research


▸ K-means
  ▹ G. Münz, S. Li, and G. Carle, "Traffic anomaly detection using k-means clustering," in GI/ITG Workshop MMBnet, 2007.


▸ K-means + Naïve Bayes
  ▹ S. Varuna and P. Natesan, "An integration of k-means clustering and naïve bayes classifier for Intrusion Detection," presented at the 2015 3rd International Conference on Signal Processing, Communication and Networking (ICSCN), 2015.


▸ Naïve Bayes + Decision Tree
  ▹ D. M. Farid, N. Harbi, and M. Z. Rahman, "Combining naive bayes and decision tree for adaptive intrusion detection," arXiv preprint arXiv:1005.4496, 2010.
▸ K-medoids + Naïve Bayes
  ▹ R. Chitrakar and C. Huang, "Anomaly based Intrusion Detection using Hybrid Learning Approach of combining k-Medoids Clustering and Naïve Bayes Classification," presented at the 2012 8th International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM), 2012.


▸ Payload-based anomaly detector (PAYL)
  ▹ K. Wang and S. J. Stolfo, "Anomalous payload-based network intrusion detection," presented at the International Workshop on Recent Advances in Intrusion Detection, 2004.


▸ Deep learning allows computers to learn complicated patterns using various algorithms.

▸ The structure of artificial neural networks allows us to implement the concept of deep learning.

“Deep learning” by Goodfellow, et al. (2016)


▸ A Deep Learning Approach
  ▹ Q. Niyaz, W. Sun, A. Y. Javaid, and M. Alam, "A Deep Learning Approach for Network Intrusion Detection System," presented at the 9th EAI International Conference on Bio-inspired Information and Communications Technologies, New York, 2015.


▸ Protocol classification with Payload
  ▹ Z. Wang, "The Applications of Deep Learning on Traffic Identification," presented at the Black Hat, USA, 2015.
  ▹ Classify the application layer protocol with payload data
    - Feature learning
    - Protocol classification

[Figure: protocol labels shown: SSL, HTTP_Proxy, MySQL]

▸ Data: Packets retrieved from HKUCS network
▸ Preprocessing:
  ▹ Check for data labels
  ▹ Discard erroneous packets
  ▹ For fixed size input of neural network: padding and truncation
▸ Deep learning neural network for classification
  ▹ Training
  ▹ Parameter Tuning
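
The padding and truncation step can be illustrated with a minimal sketch. The target length of 8 bytes, the zero pad byte, and the helper names `to_fixed_length` and `normalise` are assumptions made for illustration only; the slides do not state the input size or the padding value actually used.

```python
def to_fixed_length(payload, length, pad_byte=0):
    """Truncate or right-pad a packet payload to exactly `length` byte values."""
    values = list(payload[:length])                      # truncation
    values.extend([pad_byte] * (length - len(values)))   # padding (no-op if already full)
    return values


def normalise(values):
    """Scale byte values from [0, 255] to [0, 1] for use as network input."""
    return [v / 255.0 for v in values]


# Hypothetical payloads: one shorter and one longer than the target length.
short = to_fixed_length(b"GET /", 8)
long = to_fixed_length(b"\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03", 8)

print(short)  # padded with zeros on the right
print(long)   # cut to the first 8 bytes
print(normalise(short))
```

Every packet then yields a vector of the same size, which is what a network with a fixed-size input layer needs.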


▸ Anomaly detection
  ▹ Modify the objective of the previous model
    - Different application – different baseline
    - Keep the result of feature learning
  ▹ May include features such as size of packets, number of packets arrived in a given period, etc.
  ▹ 1D Convolutional Neural Network

[Figure: 1D convolution, showing input, padding, kernel, and output]
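
A 1D convolution with zero padding, as named on this slide, can be sketched in a few lines. This is a generic illustration rather than the project's model: the signal, the kernel values, and the function name `conv1d` are assumptions, and the operation is the sliding dot product (cross-correlation) that deep learning libraries usually call convolution.

```python
def conv1d(signal, kernel, padding=0):
    """Slide `kernel` over `signal`, with `padding` zeros added on each side."""
    padded = [0.0] * padding + list(signal) + [0.0] * padding
    k = len(kernel)
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(len(padded) - k + 1)
    ]


signal = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical input, e.g. scaled payload bytes
kernel = [1.0, 0.0, -1.0]            # hypothetical kernel of width 3

valid = conv1d(signal, kernel)             # no padding: output shrinks to 3 values
same = conv1d(signal, kernel, padding=1)   # padding 1: output keeps the input length

print(valid)
print(same)
```

With a kernel of width 3, padding of 1 on each side keeps the output the same length as the input; without padding the output loses two positions.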


▸ MNIST
  ▹ 28px × 28px images of hand-written digits
  ▹ 60,000 training samples, 10,000 test samples
▸ CIFAR-100
  ▹ 32px × 32px 100-class tiny color images
  ▹ 50,000 training samples, 10,000 test samples


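MLP                  CNN
Input 28x28 images   Input 28x28 images
400 units            30 filters 3x3
Activation (ReLU)    Activation (ReLU)
Dropout (0.2)        Dropout (0.2)
400 units            30 filters 3x3
Activation (ReLU)    Activation (ReLU)
Dropout (0.2)        Dropout (0.2)
400 units            Maxpooling (2x2)
                     120 units
Activation (ReLU)    Activation (ReLU)
Dropout (0.2)        Dropout (0.2)
10-way softmax       10-way softmax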


▸ All models were trained on the fyp server (single-core Intel i7 CPU, 2GB memory)
▸ Batch size = 100
▸ 20 epochs (12k updates)
▸ Stochastic gradient descent
  ▹ Learning rate = 0.01

                    MLP       CNN
No. of parameters   638,810   528,160
Test Accuracy       0.9667    0.9769
Test Loss           0.1051    0.0783
Train Accuracy      0.9601    0.9719
Train Loss          0.1369    0.0913
Training time       260s      3000s


MLP                  CNN
Input 32x32 images   Input 32x32 images
3000 units           32 filters 3x3
Activation (ReLU)    Activation (ReLU)
Dropout (0.2)        32 filters 3x3
3000 units           Activation (ReLU)
Activation (ReLU)    Maxpooling (2x2)
Dropout (0.2)        Dropout (0.25)
2000 units           64 filters 3x3
Activation (ReLU)    Activation (ReLU)
Dropout (0.2)        64 filters 3x3
2000 units           Activation (ReLU)
Activation (ReLU)    Maxpooling (2x2)
Dropout (0.2)        Dropout (0.25)
1000 units           512 units
Activation (ReLU)    Activation (ReLU)
Dropout (0.2)        Dropout (0.5)
100-way softmax      100-way softmax


▸ Batch size
  ▹ MLP: 128
  ▹ CNN: 32
▸ 100 epochs
  ▹ MLP: 39k updates
  ▹ CNN: 156k updates (25 epochs: 39k updates)
▸ Stochastic gradient descent
  ▹ Learning rate: 0.025
  ▹ Decay: 10⁻⁶
  ▹ Momentum: 0.9 (Nesterov)
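
The update counts quoted on the training slides follow from the dataset size, batch size, and number of epochs. The short check below assumes that a final partial batch still counts as one update; the helper name `total_updates` is illustrative.

```python
import math


def total_updates(num_samples, batch_size, epochs):
    """Number of gradient updates, counting a partial last batch as one update."""
    return math.ceil(num_samples / batch_size) * epochs


mnist = total_updates(60_000, 100, 20)          # MNIST: batch 100, 20 epochs
cifar_mlp = total_updates(50_000, 128, 100)     # CIFAR-100 MLP: batch 128, 100 epochs
cifar_cnn = total_updates(50_000, 32, 100)      # CIFAR-100 CNN: batch 32, 100 epochs
cifar_cnn_25 = total_updates(50_000, 32, 25)    # CIFAR-100 CNN after 25 epochs

print(mnist, cifar_mlp, cifar_cnn, cifar_cnn_25)
```

The results round to the 12k, 39k, 156k, and 39k figures on the slides, which is why the CNN at 25 epochs is the like-for-like comparison with the MLP at 100 epochs.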


                    MLP          CNN (25 epochs)   CNN (100 epochs)
No. of parameters   30,327,100   1,297,028         1,297,028
Test Accuracy       0.2535       0.4002            0.4270
Test Loss           4.1776       2.3355            2.2208
Train Accuracy      0.5805       0.3235            0.3705
Train Loss          1.5860       2.6801            2.4434
Training time       20,000s      6,750s            27,000s
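
The parameter counts reported for the MNIST and CIFAR-100 models can be reproduced from the layer lists. The sketch below is a worked check, not the original training code. It assumes every layer has a bias term, that the MNIST CNN uses unpadded ("valid") convolutions, and that the CIFAR-100 CNN alternates "same" and "valid" padding within each pair of convolutions. The slides do not state the padding; these choices are the ones under which the counts match.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv_params(c_in, c_out, k):
    """Weights plus biases of a k x k convolution layer."""
    return c_in * c_out * k * k + c_out


def mlp_params(sizes):
    """Total parameters of an MLP given its layer widths, input first."""
    return sum(dense_params(a, b) for a, b in zip(sizes, sizes[1:]))


def conv_out(side, k, padding):
    """Spatial side length after a stride-1 convolution."""
    return side if padding == "same" else side - k + 1


# MNIST MLP: 784 -> 400 -> 400 -> 400 -> 10
mnist_mlp = mlp_params([28 * 28, 400, 400, 400, 10])

# MNIST CNN: two valid 3x3 convolutions (30 filters), 2x2 pooling, 120 units, softmax
side = conv_out(conv_out(28, 3, "valid"), 3, "valid") // 2
mnist_cnn = (conv_params(1, 30, 3) + conv_params(30, 30, 3)
             + dense_params(side * side * 30, 120) + dense_params(120, 10))

# CIFAR-100 MLP: 3072 -> 3000 -> 3000 -> 2000 -> 2000 -> 1000 -> 100
cifar_mlp = mlp_params([32 * 32 * 3, 3000, 3000, 2000, 2000, 1000, 100])

# CIFAR-100 CNN: (same, valid, pool) twice, then 512 units and softmax
side = conv_out(conv_out(32, 3, "same"), 3, "valid") // 2
side = conv_out(conv_out(side, 3, "same"), 3, "valid") // 2
cifar_cnn = (conv_params(3, 32, 3) + conv_params(32, 32, 3)
             + conv_params(32, 64, 3) + conv_params(64, 64, 3)
             + dense_params(side * side * 64, 512) + dense_params(512, 100))

print(mnist_mlp, mnist_cnn, cifar_mlp, cifar_cnn)
```

In both CNNs most of the parameters sit in the first fully connected layer after flattening, which is why the CIFAR-100 CNN is more than twenty times smaller than the MLP despite having more layers.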


▸ Exponential Linear Unit (ELU)
  ▹ Units with non-zero means will cause a bias shift in the next layer.
  ▹ If more units are correlated, the bias is larger.

\[
f(x) = \begin{cases} x, & x > 0 \\ \alpha(\exp(x) - 1), & x \le 0 \end{cases}
\qquad
f'(x) = \begin{cases} 1, & x > 0 \\ f(x) + \alpha, & x \le 0 \end{cases}
\]

▸ 75.72% accuracy in CIFAR-100 (Best)
D.-A. Clevert, T. Unterthiner, and S. Hochreiter, "Fast and accurate deep network learning by exponential linear units (ELUs)," arXiv preprint arXiv:1511.07289, 2015.
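
The ELU definition and its derivative translate directly into code. The sketch below is a scalar illustration with α = 1 as the default; the function names `elu` and `elu_grad` are chosen here for illustration. It also checks the derivative identity f'(x) = f(x) + α for x ≤ 0 against a finite-difference estimate.

```python
import math


def elu(x, alpha=1.0):
    """Exponential linear unit: identity for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def elu_grad(x, alpha=1.0):
    """Derivative of ELU: 1 for x > 0, f(x) + alpha otherwise."""
    return 1.0 if x > 0 else elu(x, alpha) + alpha


def numeric_grad(x, h=1e-6):
    """Central finite-difference estimate of the ELU derivative."""
    return (elu(x + h) - elu(x - h)) / (2.0 * h)


for x in (-2.0, -0.5, 0.5, 2.0):
    print(x, elu(x), elu_grad(x), numeric_grad(x))
```

Unlike ReLU, the negative branch saturates at -α instead of being cut to zero, so unit outputs can have a mean closer to zero, which is the bias-shift argument made on the slide.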


Input 32x32 images
384 filters 3x3 → ELU (α=1) → Maxpooling 2x2
384 filters 1x1 → 384 filters 2x2 → 640 filters 2x2 → 640 filters 2x2 → ELU (α=1) → Dropout(0.1) → Maxpooling 2x2
640 filters 1x1 → 768 filters 2x2 → 768 filters 2x2 → 768 filters 2x2 → ELU (α=1) → Dropout(0.2) → Maxpooling 2x2
768 filters 1x1 → 896 filters 2x2 → 896 filters 2x2 → ELU (α=1) → Dropout(0.3) → Maxpooling 2x2
896 filters 3x3 → 1024 filters 2x2 → ELU (α=1) → Dropout(0.4) → Maxpooling 2x2
1024 filters 1x1 → 1152 filters 2x2 → ELU (α=1) → Dropout(0.5) → Maxpooling 2x2
1152 filters 1x1 → ELU (α=1)
100-way Softmax


▸ Layer Sequential Unit Variance (LSUV) Initialization
  ▹ Initializing the model with Gaussian noise 𝒩(0, 0.01²) became popular after CNNs showed their success in 2012.
  ▹ Glorot & Bengio proposed a formula to estimate the standard deviation, under the assumption that there is no non-linearity between layers.
  ▹ A data-driven weight initialization that generalizes the previous method was proposed by Mishkin & Matas (2016).

▸ 72.34% accuracy in CIFAR-100
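
The data-driven idea behind LSUV can be sketched for a single linear layer: pass a batch through the layer, measure the variance of its outputs, and rescale the weights until that variance is close to one. This is a simplified illustration, not the full method. LSUV as published starts from an orthonormal initialization and proceeds layer by layer through the network; the sketch below starts from 𝒩(0, 0.01²) noise and handles one layer. The layer sizes, batch, tolerance, and function names are all assumptions.

```python
import math
import random


def layer_output(weights, batch):
    """Outputs of a bias-free linear layer for every sample in the batch."""
    return [[sum(w * x for w, x in zip(row, sample)) for row in weights]
            for sample in batch]


def variance(values):
    """Population variance of a flat list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def scale_to_unit_variance(weights, batch, tol=1e-3, max_iter=10):
    """Rescale weights until the layer's output variance on `batch` is about 1."""
    for _ in range(max_iter):
        outputs = [v for row in layer_output(weights, batch) for v in row]
        var = variance(outputs)
        if abs(var - 1.0) < tol:
            break
        scale = math.sqrt(var)
        weights = [[w / scale for w in row] for row in weights]
    return weights


rng = random.Random(0)
fan_in, fan_out = 8, 4
weights = [[rng.gauss(0.0, 0.01) for _ in range(fan_in)] for _ in range(fan_out)]
batch = [[rng.gauss(0.0, 1.0) for _ in range(fan_in)] for _ in range(32)]

scaled = scale_to_unit_variance(weights, batch)
final_var = variance([v for row in layer_output(scaled, batch) for v in row])

# For comparison: the Glorot & Bengio standard deviation for the same layer shape.
glorot_std = math.sqrt(2.0 / (fan_in + fan_out))

print(final_var, glorot_std)
```

The Glorot formula depends only on the layer shape, whereas the rescaling loop uses actual data, which is the sense in which the data-driven method generalizes it.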


▸ Implement ELU + LSUV initialization
  ▹ Batch size = 100
  ▹ SGD
    - Decay: 10⁻⁶
    - Momentum: 0.9 (Nesterov)
    - Learning rate: 0.005 [1-200 epochs], 0.0025 [201-400 epochs], 0.0005 [401-500 epochs]
    - Training time: 740s/epoch
▸ Test accuracy: 0.7015
  ▹ Train accuracy: 0.7915

Input 32x32 images
80 filters 3x3 → 80 filters 1x1 → ELU (α=1) → Maxpooling 2x2
140 filters 3x3 → 140 filters 2x2 → ELU (α=1) → Dropout(0.1) → Maxpooling 2x2
180 filters 2x2 → 180 filters 1x1 → ELU (α=1) → Dropout(0.2) → Maxpooling 2x2
200 filters 2x2 → 200 filters 1x1 → ELU (α=1) → Dropout(0.3) → Maxpooling 2x2
512 units → ELU (α=1) → Dropout(0.5)
100-way Softmax
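
The stepped learning-rate schedule used for the ELU + LSUV run (0.005, then 0.0025, then 0.0005) can be written as a small lookup function. The sketch below uses the epoch boundaries from this slide; the names `SCHEDULE` and `learning_rate` are illustrative, and the total training time assumes all 500 epochs ran at the quoted 740 s per epoch.

```python
# (last epoch of the stage, learning rate), taken from the slide
SCHEDULE = [(200, 0.005), (400, 0.0025), (500, 0.0005)]


def learning_rate(epoch):
    """Learning rate for a 1-based epoch number under the stepped schedule."""
    if epoch < 1:
        raise ValueError("epoch numbers start at 1")
    for last_epoch, rate in SCHEDULE:
        if epoch <= last_epoch:
            return rate
    raise ValueError("epoch is beyond the end of the schedule")


total_seconds = 500 * 740            # assumes every epoch takes 740 s
total_hours = total_seconds / 3600

print(learning_rate(1), learning_rate(201), learning_rate(401))
print(total_seconds, round(total_hours, 1))
```

At 740 s per epoch, a full 500-epoch run comes to roughly 103 hours on the stated hardware.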


▸ Comparison
  ▹ SGD
    - Learning rate: 0.01 [1-100 epochs], 0.001 [101-200 epochs], 0.0001 [201-300 epochs]

Input 32x32 images
80 filters 3x3 → 80 filters 1x1 → Activation → Maxpooling 2x2
140 filters 3x3 → 140 filters 2x2 → Activation → Dropout(0.1) → Maxpooling 2x2
180 filters 2x2 → 180 filters 1x1 → Activation → Dropout(0.2) → Maxpooling 2x2
200 filters 2x2 → 200 filters 1x1 → Activation → Dropout(0.3) → Maxpooling 2x2
512 units → Activation → Dropout(0.5)
100-way Softmax

                 ELU      ReLU     Leaky ReLU
Test Accuracy    0.6837   0.6523   0.6773
Test Loss        1.1104   1.2186   1.1202
Train Accuracy   0.7076   0.6649   0.6953
Train Loss       0.9757   1.1291   1.1301


▸ Activation
  ▹ Adaptive piecewise linear (APL) activation unit
▸ Pooling
  ▹ Fractional max-pooling
  ▹ All convolutional net


▸ With the increase in computation power, more complex deep learning models can be used to perform classification.

▸ Deep learning research is active in object recognition, speech recognition, and natural language processing.

▸ Our project focuses on applying deep learning to network anomaly detection. Existing optimization techniques can be applied to various models.


Any questions?

Wu Tien Hsuan (Kevin)

[email protected]


Appendix

1. Results of previous works

2. Dataset details

1) KDD Cup 99

2) CIFAR-100

3. Nesterov Gradient, Adaptive piecewise linear (APL) activation unit

References (Interim report)

G. Münz, S. Li, and G. Carle, "Traffic anomaly detection using k-means clustering," in GI/ITG Workshop MMBnet, 2007.

S. Varuna and P. Natesan, "An integration of k-means clustering and naïve bayes classifier for Intrusion Detection," presented at the 2015 3rd International Conference on Signal Processing, Communication and Networking (ICSCN), 2015.

D. M. Farid, N. Harbi, and M. Z. Rahman, "Combining naive bayes and decision tree for adaptive intrusion detection," arXiv preprint arXiv:1005.4496, 2010.

R. Chitrakar and C. Huang, "Anomaly based Intrusion Detection using Hybrid Learning Approach of combining k-Medoids Clustering and Naïve Bayes Classification," presented at the 2012 8th International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM), 2012.

K. Wang and S. J. Stolfo, "Anomalous payload-based network intrusion detection," presented at the International Workshop on Recent Advances in Intrusion Detection, 2004.

Q. Niyaz, W. Sun, A. Y. Javaid, and M. Alam, "A Deep Learning Approach for Network Intrusion Detection System," presented at the 9th EAI International Conference on Bio-inspired Information and Communications Technologies, New York, 2015.

Z. Wang, "The Applications of Deep Learning on Traffic Identification," presented at the Black Hat, USA, 2015.


KDD99

Attacks fall into four main categories:

• DOS: denial-of-service, e.g. syn flood;
• R2L: unauthorized access from a remote machine, e.g. guessing password;
• U2R: unauthorized access to local superuser (root) privileges, e.g., various "buffer overflow" attacks;
• probing: surveillance and other probing, e.g., port scanning.
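
Grouping raw connection labels into the four categories above is a simple lookup. The sketch below is an illustration only: the mapping covers a handful of label names from the KDD Cup 99 label set (for example `neptune` for SYN flood and `guess_passwd` for password guessing), not the full list, and the sample records and the helper name `categorize` are hypothetical.

```python
from collections import Counter

# Partial mapping from KDD Cup 99 labels to the four categories (illustrative subset).
CATEGORY_OF = {
    "neptune": "DOS",            # SYN flood
    "smurf": "DOS",
    "guess_passwd": "R2L",       # password guessing
    "buffer_overflow": "U2R",
    "portsweep": "probing",      # port scanning
    "normal": "normal",
}


def categorize(label):
    """Map a raw label (optionally ending in '.') to its attack category."""
    return CATEGORY_OF.get(label.rstrip("."), "unknown")


# Hypothetical labels as they might appear in the last column of the data file.
labels = ["normal.", "neptune.", "smurf.", "portsweep.", "guess_passwd.", "neptune."]
counts = Counter(categorize(label) for label in labels)

print(counts)
```

Labels outside the subset fall through to "unknown", which makes gaps in the mapping visible instead of silently miscounting them.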


CIFAR-100


Nesterov Gradient

Adaptive piecewise linear (APL) activation unit


References

1. Symantec. Internet Security Threat Report 2016. 2016. Available from: https://www.symantec.com/security-center/threat-report.
2. Cisco. Snort. 2016. Available from: https://www.snort.org/.
3. Open Information Security Foundation. Suricata. 2016. Available from: https://suricata-ids.org/.
4. LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proceedings of the IEEE. 1998;86:2278-324.
5. Krizhevsky A, Hinton G. Learning multiple layers of features from tiny images. 2009.
6. Patcha A, Park J-M. An overview of anomaly detection techniques: Existing solutions and latest technological trends. Computer Networks. 2007;51:3448-70.
7. Agrawal S, Agrawal J. Survey on Anomaly Detection using Data Mining Techniques. Procedia Computer Science. 2015;60:708-13.
8. Goodfellow I, Bengio Y, Courville A. Deep Learning. 2016.
9. Münz G, Li S, Carle G. Traffic anomaly detection using k-means clustering. GI/ITG Workshop MMBnet; 2007.
10. Chitrakar R, Huang C. Anomaly based Intrusion Detection using Hybrid Learning Approach of combining k-Medoids Clustering and Naïve Bayes Classification. 2012 8th International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM); 2012.
11. Muda Z, Yassin W, Sulaiman M, Udzir NI. A K-Means and Naive Bayes learning approach for better intrusion detection. Information Technology Journal. 2011;10:648-55.

12. Varuna S, Natesan P. An integration of k-means clustering and naïve bayes classifier for Intrusion Detection. 2015 3rd International Conference on Signal Processing, Communication and Networking (ICSCN); 2015.
13. Farid DM, Harbi N, Rahman MZ. Combining naive bayes and decision tree for adaptive intrusion detection. arXiv preprint arXiv:1005.4496. 2010.
14. Wang K, Stolfo SJ. Anomalous payload-based network intrusion detection. International Workshop on Recent Advances in Intrusion Detection; 2004.
15. Niyaz Q, Sun W, Javaid AY, Alam M. A Deep Learning Approach for Network Intrusion Detection System. 9th EAI International Conference on Bio-inspired Information and Communications Technologies; 2015.
16. Hettich S, Bay SD. The UCI KDD Archive. Irvine, CA: University of California, Department of Information and Computer Science; 1999.
17. Wang Z. The Applications of Deep Learning on Traffic Identification. Black Hat; 2015.
18. Clevert D-A, Unterthiner T, Hochreiter S. Fast and accurate deep network learning by exponential linear units (ELUs). arXiv preprint arXiv:1511.07289. 2015.
19. Mishkin D, Matas J. All you need is a good init. arXiv preprint arXiv:1511.06422. 2015.
20. Glorot X, Bengio Y. Understanding the difficulty of training deep feedforward neural networks. Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS); 2010.
21. Agostinelli F, Hoffman M, Sadowski P, Baldi P. Learning activation functions to improve deep neural networks. arXiv preprint arXiv:1412.6830. 2014.
22. Graham B. Fractional max-pooling. arXiv preprint arXiv:1412.6071. 2014.
23. Springenberg JT, Dosovitskiy A, Brox T, Riedmiller M. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806. 2014.