
Robustness of IDS based on adversarial machine learning

Paul Peseux
Supervisor: Gregory Blanc

Telecom SudParis

August 28, 2019


Outline

1 Intrusion Detection System

2 NovGAN

3 SWAGAN

4 Conclusion



Intrusion Detection System

A system that has to detect intrusions

It raises alerts but does not take any action. Alerts are stored, in a database for example.

It is a binary classifier



Intrusion Detection System

Figure: IDS example


Intrusion Detection System

To detect intrusions, one can use:

Rule-based methods

Statistical methods

→ Machine Learning (the current hype) to train the classifier; a minimal sketch follows the list:

Random Forest [Farnaaz 2016]

Decision Trees [Hota 2014]

Artificial Neural Network [Ingre 2015]

. . .
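A minimal sketch of such an ML-based IDS with scikit-learn, in the spirit of [Farnaaz 2016]; the data here is a random placeholder standing in for extracted traffic features:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Placeholder data: rows are traffic records, columns are numeric features
# (e.g. duration, src_bytes, dst_bytes); labels are 0 = normal, 1 = intrusion.
X = np.random.rand(1000, 10)
y = np.random.randint(0, 2, 1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# The IDS is a binary classifier: here a random forest, as in [Farnaaz 2016].
ids = RandomForestClassifier(n_estimators=100)
ids.fit(X_train, y_train)

# An alert is raised (but not acted upon) for every record flagged as intrusion.
alerts = X_test[ids.predict(X_test) == 1]
print(classification_report(y_test, ids.predict(X_test)))
```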



GANs

GANs appeared in 2014 [Goodfellow 2014]. There has been a lot of work on them since:

[Radford 2016]

[Tolstikhin 2017]

[Arjovsky 2017]

This Person Does Not Exist

CycleGAN [Zhu 2017]

. . .


GANs

Figure: GAN representation. Illustration found in [Spi]


NovGAN

Generator Modification

Attacker G objectives

Evade the IDS

Hurt (DDoS, MITM, Phishing, ...)

Generating very realistic traffic is not enough: realistic but harmless traffic does not serve the attacker!


Generator Modification

Classical GAN [Goodfellow 2014] learning can be seen as a min-max game between 2 players:

$$\min_G \max_D \; \mathbb{E}_{P_d}[\log D(x)] + \mathbb{E}_{P_z}[\log(1 - D(G(z)))]$$
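A minimal sketch of this min-max game in PyTorch on toy data; the architectures, data distribution and hyperparameters are illustrative placeholders, not the ones used in this work:

```python
import torch
import torch.nn as nn

latent_dim, data_dim, batch = 8, 2, 64

# Tiny MLP generator and discriminator (illustrative architectures).
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(batch, data_dim) + 2.0   # stand-in for samples from P_d
    z = torch.randn(batch, latent_dim)          # z ~ P_z
    fake = G(z)

    # D ascends E[log D(x)] + E[log(1 - D(G(z)))].
    opt_d.zero_grad()
    loss_d = bce(D(real), torch.ones(batch, 1)) \
           + bce(D(fake.detach()), torch.zeros(batch, 1))
    loss_d.backward()
    opt_d.step()

    # G minimizes -E[log D(G(z))] (the usual non-saturating form).
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(batch, 1))
    loss_g.backward()
    opt_g.step()
```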


Generator Modification

Definition

Let's call hurting function any $C^\infty$ map $M$ such that

$$M : T \to \mathbb{R},\quad t \mapsto M(t), \qquad \text{with } 0 \le M \le 1.$$

The hurting objective of $G$ is measured with $\mathbb{E}_{P_z}[M(G(z))]$
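For instance, a toy hurting function, $C^\infty$ and bounded in [0, 1]; the choice of feature is purely illustrative:

```python
import torch

def toy_hurting(t: torch.Tensor) -> torch.Tensor:
    # A smooth map T -> [0, 1]: the sigmoid of the first feature.
    # "Hurting" here grows with t[..., 0]; any smooth bounded map would do.
    return torch.sigmoid(t[..., 0])
```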



Generator Modification

From the Generator point of view [Goodfellow 2014]:

$$\text{loss}_G = -\mathbb{E}_{x \sim \mathcal{N}(0,1)}[\log D(G(x))]$$

A first idea is

$$\text{loss}_G = -\mathbb{E}_{x \sim \mathcal{N}(0,1)}[\log D(G(x))] - \mathbb{E}_{x \sim \mathcal{N}(0,1)}[M(G(x))]$$

Proposed (in a different context) in [Marriott 2018]

This gives poor results on our datasets

One has to link those two objectives



Generator Modification

Another way to link those two objectives is a weighted average, where the weight depends on the hurting value. With $L_{D,G}(x) = -\log D(G(x))$:

$$\text{loss}_G = \mathbb{E}_{x \sim \mathcal{N}(0,1)}\big[M(G(x))\, L_{D,G}(x) + (1 - M(G(x)))(\alpha\, L_{D,G}(x) + \text{offset})\big]$$
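A sketch of this weighted loss in PyTorch, assuming d_out holds $D(G(x))$ and hurt holds $M(G(x))$ for a batch; the default alpha and offset values are arbitrary placeholders, not the ones from our experiments:

```python
import torch

def novgan_generator_loss(d_out: torch.Tensor, hurt: torch.Tensor,
                          alpha: float = 1.0, offset: float = 0.2) -> torch.Tensor:
    # L_{D,G}(x) = -log D(G(x)); the epsilon avoids log(0).
    l_dg = -torch.log(d_out + 1e-8)
    # Hurting samples keep the plain GAN loss; harmless ones get a scaled
    # loss plus a constant offset, linking the realism and hurting objectives.
    return (hurt * l_dg + (1.0 - hurt) * (alpha * l_dg + offset)).mean()
```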



Generator Modification

Figure: 3D view of the new cost function. Representation not to scale


Results on MNIST

Figure: Generated Images without modification


Results on MNIST

Figure: Generated images with α = 1 and offset = 10.2


Results on MNIST

Figure: Generated images with α = 2.5 and offset = 10.3


Results on MNIST

Figure: In blue, hurting distribution of the test data. In orange, hurting distribution of the generated data


Results on NSL-KDD

On network traffic, hurting is difficult to define.

We extended the Common Vulnerability Scoring System (CVSS) score to the NSL-KDD dataset:

$$\text{Exploitability} = 20 \times \text{AccessVector} \times \text{AttackComplexity} \times \text{Authentication}$$

$$\text{Impact} = 10.41 \times (1 - (1 - \text{ConfImpact}) \times (1 - \text{IntegImpact}) \times (1 - \text{AvailImpact}))$$

$$f(\text{Impact}) = \begin{cases} 0, & \text{if Impact} = 0 \\ 1.176, & \text{otherwise} \end{cases}$$

$$\text{BaseScore} = (0.6 \times \text{Impact} + 0.4 \times \text{Exploitability} - 1.5) \times f(\text{Impact})$$
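These formulas translate directly into code. A minimal sketch, assuming each metric has already been mapped to its CVSS v2 numeric level; the function name and signature are ours:

```python
def cvss2_base_score(access_vector: float, attack_complexity: float,
                     authentication: float, conf_impact: float,
                     integ_impact: float, avail_impact: float) -> float:
    # CVSS v2 base score from the metric values above.
    exploitability = 20 * access_vector * attack_complexity * authentication
    impact = 10.41 * (1 - (1 - conf_impact) * (1 - integ_impact) * (1 - avail_impact))
    f_impact = 0.0 if impact == 0 else 1.176
    return (0.6 * impact + 0.4 * exploitability - 1.5) * f_impact
```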



Results on NSL-KDD

Figure: Hurting value distribution on NSL-KDD

Surprisingly, the hurting function is a decent classifier by itself.
Warning: the hurting function is highly data-specific.




Limits on NovGAN

No theoretical proof, unlike the major papers [Goodfellow 2014] [Arjovsky 2017] [Srivastava 2017]

Some small results on a toy example

Figure: Nash equilibrium




SWAGAN

Definition

Mode collapse is a failure case that sometimes happens during GAN training: the Generator stays stuck in a specific mode of the real data distribution and the gradient is stuck at 0.

Figure: Mode Collapse


SWAGAN

Input: n, p, s
Initialization: create n (generator, discriminator) duos
while more than one duo remains do
    for duo in duos do
        train duo for p epochs
    end
    i = 0, ~g = ~0, ~d = ~0
    while i < s do
        shuffle duos
        ~g += perf(generators)
        ~d += perf(discriminators)
        i += 1
    end
    remove the worst generator: argmin(~g)
    remove the worst discriminator: argmin(~d)
    shuffle duos
end
Output: trained Generator and Discriminator

Algorithm 1: SWAGAN algorithm
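A Python sketch of this tournament; train, perf_g and perf_d are assumed helper callbacks (training one duo, and scoring a model within a duo) that are not part of the original algorithm:

```python
import random

def swagan(gens, discs, p, s, train, perf_g, perf_d):
    """Sketch of the SWAGAN tournament over equal-length model lists."""
    while len(gens) > 1:
        # 1. Train each (generator, discriminator) duo for p epochs.
        for g, d in zip(gens, discs):
            train(g, d, epochs=p)
        # 2. Score every model over s randomly re-paired rounds.
        g_scores = {id(g): 0.0 for g in gens}
        d_scores = {id(d): 0.0 for d in discs}
        for _ in range(s):
            random.shuffle(discs)  # shuffle the duos
            for g, d in zip(gens, discs):
                g_scores[id(g)] += perf_g(g, d)
                d_scores[id(d)] += perf_d(g, d)
        # 3. Drop the worst generator and the worst discriminator.
        gens.remove(min(gens, key=lambda g: g_scores[id(g)]))
        discs.remove(min(discs, key=lambda d: d_scores[id(d)]))
        random.shuffle(discs)
    return gens[0], discs[0]
```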

SWAGAN

Figure: SWAGAN starting architecture

Figure: SWAGAN step



SWAGAN Risk

Figure: SWAGAN Risk


Results on MNIST

Figure: Loss evolution during SWAGAN training


Results on NSL-KDD

Figure: Erratic SWAGAN learning on NSL-KDD


Results on NSL-KDD

Figure: NSL-KDD data visualized on its first 2 components



Conclusion

We presented 2 new versions of GANs:

NovGAN based on Generator loss modification

SWAGAN based on shuffling

Great results on MNIST

More difficult on network traffic datasets



Conclusion

Thanks for your attention. Any questions?



References

Martín Arjovsky, Soumith Chintala and Léon Bottou. Wasserstein Generative Adversarial Networks. In ICML, 2017.

Nabila Farnaaz and Md. Abdul Jabbar. Random Forest Modeling for Network Intrusion Detection System. 2016.

Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville and Yoshua Bengio. Generative Adversarial Nets. In NIPS, 2014.

H. S. Hota and Akhilesh Kumar Shrivas. Decision Tree Techniques Applied on NSL-KDD Data and Its Comparison with Various Feature Selection Techniques. 2014.

Bhupendra Ingre and Anamika Yadav. Performance analysis of NSL-KDD dataset using ANN. 2015 International Conference on Signal Processing and Communication Engineering Systems, pages 92–96, 2015.

Richard T. Marriott, Sami Romdhani and Liming Chen. Intra-class Variation Isolation in Conditional GANs. ArXiv, vol. abs/1811.11296, 2018.

Alec Radford, Luke Metz and Soumith Chintala. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. CoRR, vol. abs/1511.06434, 2016.

Andrea Missinato. https://www.spindox.it/en/blog/generative-adversarial-neural-networks/. Accessed: 2019-05-07.

Akash Srivastava, Lazar Valkov, Chris Russell, Michael U. Gutmann and Charles A. Sutton. VEEGAN: Reducing Mode Collapse in GANs using Implicit Variational Learning. In NIPS, 2017.

Ilya O. Tolstikhin, Sylvain Gelly, Olivier Bousquet, Carl-Johann Simon-Gabriel and Bernhard Schölkopf. AdaGAN: Boosting Generative Models. In NIPS, 2017.

Jun-Yan Zhu, Taesung Park, Phillip Isola and Alexei A. Efros. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. 2017 IEEE International Conference on Computer Vision (ICCV), pages 2242–2251, 2017.


Appendices

$$M_{KDD}(\vec{T}) = \frac{\text{impact}}{2} \times \text{exploitability}$$

with $\text{impact} = 1 - (1 - naf) \times (1 - nfc) \times \left(1 - \frac{sb + db}{2}\right)$

and $\text{exploitability} = 50 \times naf \times ns \times \frac{sb + db}{2} \times (rs + nr) \times \frac{sa}{2}$

(a Python sketch follows the variable list)

naf : num-access-file

ns : num-shells

sb : src-bytes

db : dst-bytes

rs : root-shell

nr : num-root

sa : su-attempted

nfc : num-file-creation
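A sketch of this hurting function in Python, assuming all listed features are normalized to [0, 1]; that normalization, and the exact grouping of the reconstructed fractions, are our assumptions:

```python
def m_kdd(naf, ns, sb, db, rs, nr, sa, nfc):
    # Extended CVSS-style hurting score over NSL-KDD features,
    # all assumed pre-normalized to [0, 1].
    impact = 1 - (1 - naf) * (1 - nfc) * (1 - (sb + db) / 2)
    exploitability = 50 * naf * ns * ((sb + db) / 2) * (rs + nr) * (sa / 2)
    return impact / 2 * exploitability
```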

