
Robustness of IDS based on adversarial machine learning

Paul Peseux
Supervisor: Gregory Blanc

Telecom SudParis

August 28, 2019


Outline

1 Intrusion Detection System

2 NovGAN

3 SWAGAN

4 Conclusion



Intrusion Detection System

A system that has to detect intrusions

It raises alerts but does not take any action. Alerts are stored, in a database for example.

It is a binary classifier



Intrusion Detection System

Figure: IDS example


Intrusion Detection System

To detect intrusions, one can use:

Rule-based methods

Statistical methods

→ Machine Learning (the current hype) to train the classifier; a minimal sketch follows the list:

Random Forest [Farnaaz 2016]

Decision Trees [Hota 2014]

Artificial Neural Network [Ingre 2015]

. . .
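A minimal sketch of such an ML-based IDS with scikit-learn, in the spirit of [Farnaaz 2016]; the data here is a random placeholder standing in for extracted traffic features:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Placeholder data: rows are traffic records, columns are numeric features
# (e.g. duration, src_bytes, dst_bytes); labels are 0 = normal, 1 = intrusion.
X = np.random.rand(1000, 10)
y = np.random.randint(0, 2, 1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# The IDS is a binary classifier: here a random forest, as in [Farnaaz 2016].
ids = RandomForestClassifier(n_estimators=100)
ids.fit(X_train, y_train)

# An alert is raised (but not acted upon) for every record flagged as intrusion.
alerts = X_test[ids.predict(X_test) == 1]
print(classification_report(y_test, ids.predict(X_test)))
```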



GANs

GANs appeared in 2014 [Goodfellow 2014]. There has been a lot of work on them since:

[Radford 2016]

[Tolstikhin 2017]

[Arjovsky 2017]

This Person Does Not Exist

CycleGAN [Zhu 2017]

. . .


GANs

Figure: GAN representation. Illustration found in [Spi]


NovGAN

Generator Modification

Attacker G objectives

Evade the IDS

Hurt (DDoS, MITM, Phishing, ...)

Generating very realistic traffic is not enough: realistic but harmless traffic does not serve the attacker!


Generator Modification

Classical GAN [Goodfellow 2014] learning can be seen as a min-max game between 2 players:

$$\min_G \max_D \; \mathbb{E}_{P_d}[\log D(x)] + \mathbb{E}_{P_z}[\log(1 - D(G(z)))]$$
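A minimal sketch of this min-max game in PyTorch on toy data; the architectures, data distribution and hyperparameters are illustrative placeholders, not the ones used in this work:

```python
import torch
import torch.nn as nn

latent_dim, data_dim, batch = 8, 2, 64

# Tiny MLP generator and discriminator (illustrative architectures).
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(batch, data_dim) + 2.0   # stand-in for samples from P_d
    z = torch.randn(batch, latent_dim)          # z ~ P_z
    fake = G(z)

    # D ascends E[log D(x)] + E[log(1 - D(G(z)))].
    opt_d.zero_grad()
    loss_d = bce(D(real), torch.ones(batch, 1)) \
           + bce(D(fake.detach()), torch.zeros(batch, 1))
    loss_d.backward()
    opt_d.step()

    # G minimizes -E[log D(G(z))] (the usual non-saturating form).
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(batch, 1))
    loss_g.backward()
    opt_g.step()
```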


Generator Modification

Definition

Let's call hurting function any $C^\infty$ map $M$ such that

$$M : T \to \mathbb{R},\quad t \mapsto M(t), \qquad \text{with } 0 \le M \le 1.$$

The hurting objective of $G$ is measured with $\mathbb{E}_{P_z}[M(G(z))]$
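For instance, a toy hurting function, $C^\infty$ and bounded in [0, 1]; the choice of feature is purely illustrative:

```python
import torch

def toy_hurting(t: torch.Tensor) -> torch.Tensor:
    # A smooth map T -> [0, 1]: the sigmoid of the first feature.
    # "Hurting" here grows with t[..., 0]; any smooth bounded map would do.
    return torch.sigmoid(t[..., 0])
```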



Generator Modification

From the Generator point of view [Goodfellow 2014]:

$$\text{loss}_G = -\mathbb{E}_{x \sim \mathcal{N}(0,1)}[\log D(G(x))]$$

A first idea is

$$\text{loss}_G = -\mathbb{E}_{x \sim \mathcal{N}(0,1)}[\log D(G(x))] - \mathbb{E}_{x \sim \mathcal{N}(0,1)}[M(G(x))]$$

Proposed (in a different context) in [Marriott 2018]

This gives poor results on our datasets

One has to link those two objectives



Generator Modification

Another way to link those two objectives is a weighted average, where the weight depends on the hurting value. With $L_{D,G}(x) = -\log D(G(x))$:

$$\text{loss}_G = \mathbb{E}_{x \sim \mathcal{N}(0,1)}\big[M(G(x))\, L_{D,G}(x) + (1 - M(G(x)))(\alpha\, L_{D,G}(x) + \text{offset})\big]$$
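A sketch of this weighted loss in PyTorch, assuming d_out holds $D(G(x))$ and hurt holds $M(G(x))$ for a batch; the default alpha and offset values are arbitrary placeholders, not the ones from our experiments:

```python
import torch

def novgan_generator_loss(d_out: torch.Tensor, hurt: torch.Tensor,
                          alpha: float = 1.0, offset: float = 0.2) -> torch.Tensor:
    # L_{D,G}(x) = -log D(G(x)); the epsilon avoids log(0).
    l_dg = -torch.log(d_out + 1e-8)
    # Hurting samples keep the plain GAN loss; harmless ones get a scaled
    # loss plus a constant offset, linking the realism and hurting objectives.
    return (hurt * l_dg + (1.0 - hurt) * (alpha * l_dg + offset)).mean()
```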



Generator Modification

Figure: 3D view of the new cost function. Representation not to scale


Results on MNIST

Figure: Generated Images without modification


Results on MNIST

Figure: Generated images with α = 1 and offset = 10.2


Results on MNIST

Figure: Generated images with α = 2.5 and offset = 10.3


Results on MNIST

Figure: In blue, hurting distribution of the test data. In orange, hurting distribution of the generated data


Results on NSL-KDD

On network traffic, hurting is difficult to define.

We extended the Common Vulnerability Scoring System (CVSS) score to the NSL-KDD dataset:

$$\text{Exploitability} = 20 \times \text{AccessVector} \times \text{AttackComplexity} \times \text{Authentication}$$

$$\text{Impact} = 10.41 \times (1 - (1 - \text{ConfImpact}) \times (1 - \text{IntegImpact}) \times (1 - \text{AvailImpact}))$$

$$f(\text{Impact}) = \begin{cases} 0, & \text{if Impact} = 0 \\ 1.176, & \text{otherwise} \end{cases}$$

$$\text{BaseScore} = (0.6 \times \text{Impact} + 0.4 \times \text{Exploitability} - 1.5) \times f(\text{Impact})$$
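These formulas translate directly into code. A minimal sketch, assuming each metric has already been mapped to its CVSS v2 numeric level; the function name and signature are ours:

```python
def cvss2_base_score(access_vector: float, attack_complexity: float,
                     authentication: float, conf_impact: float,
                     integ_impact: float, avail_impact: float) -> float:
    # CVSS v2 base score from the metric values above.
    exploitability = 20 * access_vector * attack_complexity * authentication
    impact = 10.41 * (1 - (1 - conf_impact) * (1 - integ_impact) * (1 - avail_impact))
    f_impact = 0.0 if impact == 0 else 1.176
    return (0.6 * impact + 0.4 * exploitability - 1.5) * f_impact
```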



Results on NSL-KDD

Figure: Hurting value distribution on NSL-KDD

Surprisingly, the hurting function is a decent classifier by itself.
Warning: the hurting function is highly data-specific.




Limits on NovGAN

No theoretical proof, unlike the major papers [Goodfellow 2014] [Arjovsky 2017] [Srivastava 2017]

Some small results on a toy example

Figure: Nash equilibrium




SWAGAN

Definition

Mode collapse is a failure case that sometimes happens during GAN training: the Generator stays stuck in a specific mode of the real data distribution and the gradient is stuck at 0.

Figure: Mode Collapse


SWAGAN

Input: n, p, s
Initialization: create n (generator, discriminator) duos
while more than one duo remains do
    for duo in duos do
        train duo for p epochs
    end
    i = 0, ~g = ~0, ~d = ~0
    while i < s do
        shuffle duos
        ~g += perf(generators)
        ~d += perf(discriminators)
        i += 1
    end
    remove the worst generator: argmin(~g)
    remove the worst discriminator: argmin(~d)
    shuffle duos
end
Output: trained Generator and Discriminator

Algorithm 1: SWAGAN algorithm
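A Python sketch of this tournament; train, perf_g and perf_d are assumed helper callbacks (training one duo, and scoring a model within a duo) that are not part of the original algorithm:

```python
import random

def swagan(gens, discs, p, s, train, perf_g, perf_d):
    """Sketch of the SWAGAN tournament over equal-length model lists."""
    while len(gens) > 1:
        # 1. Train each (generator, discriminator) duo for p epochs.
        for g, d in zip(gens, discs):
            train(g, d, epochs=p)
        # 2. Score every model over s randomly re-paired rounds.
        g_scores = {id(g): 0.0 for g in gens}
        d_scores = {id(d): 0.0 for d in discs}
        for _ in range(s):
            random.shuffle(discs)  # shuffle the duos
            for g, d in zip(gens, discs):
                g_scores[id(g)] += perf_g(g, d)
                d_scores[id(d)] += perf_d(g, d)
        # 3. Drop the worst generator and the worst discriminator.
        gens.remove(min(gens, key=lambda g: g_scores[id(g)]))
        discs.remove(min(discs, key=lambda d: d_scores[id(d)]))
        random.shuffle(discs)
    return gens[0], discs[0]
```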

SWAGAN

Figure: SWAGAN starting architecture

Figure: SWAGAN step



SWAGAN Risk

Figure: SWAGAN Risk


Results on MNIST

Figure: Loss evolution during SWAGAN training


Results on NSL-KDD

Figure: Erratic SWAGAN learning on NSL-KDD


Results on NSL-KDD

Figure: NSL-KDD data visualized on its first 2 components



Conclusion

We presented 2 new versions of GANs:

NovGAN based on Generator loss modification

SWAGAN based on shuffling

Great results on MNIST

More difficult on network traffic datasets



Conclusion

Thanks for your attention. Any questions?



References

Martín Arjovsky, Soumith Chintala and Léon Bottou. Wasserstein Generative Adversarial Networks. In ICML, 2017.

Nabila Farnaaz and Md. Abdul Jabbar. Random Forest Modeling for Network Intrusion Detection System. 2016.

Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville and Yoshua Bengio. Generative Adversarial Nets. In NIPS, 2014.

H. S. Hota and Akhilesh Kumar Shrivas. Decision Tree Techniques Applied on NSL-KDD Data and Its Comparison with Various Feature Selection Techniques. 2014.

Bhupendra Ingre and Anamika Yadav. Performance analysis of NSL-KDD dataset using ANN. 2015 International Conference on Signal Processing and Communication Engineering Systems, pages 92–96, 2015.

Richard T. Marriott, Sami Romdhani and Liming Chen. Intra-class Variation Isolation in Conditional GANs. ArXiv, vol. abs/1811.11296, 2018.

Alec Radford, Luke Metz and Soumith Chintala. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. CoRR, vol. abs/1511.06434, 2016.

Andrea Missinato. https://www.spindox.it/en/blog/generative-adversarial-neural-networks/. Accessed: 2019-05-07.

Akash Srivastava, Lazar Valkov, Chris Russell, Michael U. Gutmann and Charles A. Sutton. VEEGAN: Reducing Mode Collapse in GANs using Implicit Variational Learning. In NIPS, 2017.

Ilya O. Tolstikhin, Sylvain Gelly, Olivier Bousquet, Carl-Johann Simon-Gabriel and Bernhard Schölkopf. AdaGAN: Boosting Generative Models. In NIPS, 2017.

Jun-Yan Zhu, Taesung Park, Phillip Isola and Alexei A. Efros. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. 2017 IEEE International Conference on Computer Vision (ICCV), pages 2242–2251, 2017.


Appendices

$$M_{KDD}(\vec{T}) = \frac{\text{impact}}{2} \times \text{exploitability}$$

with $\text{impact} = 1 - (1 - naf) \times (1 - nfc) \times \left(1 - \frac{sb + db}{2}\right)$

and $\text{exploitability} = 50 \times naf \times ns \times \frac{sb + db}{2} \times (rs + nr) \times \frac{sa}{2}$

(a Python sketch follows the variable list)

naf : num-access-file

ns : num-shells

sb : src-bytes

db : dst-bytes

rs : root-shell

nr : num-root

sa : su-attempted

nfc : num-file-creation
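A sketch of this hurting function in Python, assuming all listed features are normalized to [0, 1]; that normalization, and the exact grouping of the reconstructed fractions, are our assumptions:

```python
def m_kdd(naf, ns, sb, db, rs, nr, sa, nfc):
    # Extended CVSS-style hurting score over NSL-KDD features,
    # all assumed pre-normalized to [0, 1].
    impact = 1 - (1 - naf) * (1 - nfc) * (1 - (sb + db) / 2)
    exploitability = 50 * naf * ns * ((sb + db) / 2) * (rs + nr) * (sa / 2)
    return impact / 2 * exploitability
```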

