Trust Region Based Adversarial Attack on Neural Networks

Zhewei Yao (1), Amir Gholami (1), Peng Xu (2), Kurt Keutzer (1), Michael W. Mahoney (1)

(1) University of California, Berkeley
(2) Stanford University

IEEE Conference on Computer Vision and Pattern Recognition 2019

Presented by Calvin Yong (UCF)



Table of Contents

1 Introduction

2 Background

3 Contributions

4 Trust Region Optimization

5 Proposed Method

6 Results

7 Conclusion


Introduction

Neural networks are vulnerable to adversarial examples: inputs that are crafted to fool the network.

The perturbations are usually imperceptible, can cause a significant decrease in accuracy, and can transfer to other networks that the attacker has not seen.

Figure: An attack on an image using the FGSM method

Explaining and harnessing adversarial examples (Goodfellow et al.)


One Pixel Attack

One Pixel Attack for Fooling Deep Neural Networks (Su et al.)


3D adversarial objects

Synthesizing Robust Adversarial Examples (Athalye et al.)


Physical attacks on traffic signs

Robust Physical-World Attacks on Deep Learning Visual Classification (Eykholt et al.)


Adversarial Patches to Attack Person Detection

Adversarial Patches to Attack Person Detection (Thys et al.)


Semantic Segmentation and Object Detection

Adversarial Examples for Semantic Segmentation and Object Detection (Xie et al.)


LIDAR attack

Adversarial Objects Against LiDAR-Based Autonomous Driving Systems (Cao et al.)


Definitions

Untargeted attacks change the model's classification to any wrong label. Targeted attacks change the model's classification to a specific class.

One-shot (one-step) attacks require a single computational step to generate an adversarial example. Methods that require an iterative loop are called iterative attacks.

White-box attacks need complete information about the target network (network architecture, gradients, parameters, etc.), while black-box attacks do not need such information.


Problem Formulation

Szegedy et al. first formalized the problem of finding adversarial examples as the following optimization problem:

$$\min_r \; \|r\|_2 \quad \text{subject to} \quad f(x + r) = l, \quad x + r \in [0, 1]^m$$

where $l$ is the target label.

However, this problem is often computationally infeasible to solve exactly. A common approach is to approximate it.


L-BFGS

Szegedy et al. solved the following alternative optimization problem using box-constrained L-BFGS:

$$\min_r \; c\,|r| + \mathrm{loss}_f(x + r, l) \quad \text{subject to} \quad x + r \in [0, 1]^m$$

Intriguing properties of neural networks (Szegedy et al.)


Fast Gradient Sign Method

Goodfellow et al. proposed the Fast Gradient Sign Method (FGSM), a single-shot L∞ attack that requires only one backpropagation call. The adversarial example is given by

$$x^{adv} = x + \epsilon \, \mathrm{sign}(\nabla_x J(x, y))$$

where $J$ is the loss function. The adversarial example is then clipped to a specified range.

Explaining and harnessing adversarial examples (Goodfellow et al.)
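A minimal NumPy sketch of the FGSM update above. The logistic-regression model, its weights `w` and `b`, and the helper names `loss_and_grad` and `fgsm` are illustrative assumptions, not part of the original slides; a real attack would obtain the gradient from the network by backpropagation.

```python
import numpy as np

def loss_and_grad(x, y, w, b):
    """Cross-entropy loss J(x, y) of a toy logistic model and its gradient w.r.t. x."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    loss = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return loss, (p - y) * w

def fgsm(x, grad, eps, lo=0.0, hi=1.0):
    """x_adv = clip(x + eps * sign(grad)): one gradient call, L-infinity budget eps."""
    return np.clip(x + eps * np.sign(grad), lo, hi)

# Hypothetical toy data: a 3-"pixel" input in [0, 1] with true label y = 1.
w, b = np.array([2.0, -1.0, 0.5]), 0.0
x, y = np.array([0.6, 0.2, 0.5]), 1.0

loss_clean, g = loss_and_grad(x, y, w, b)
x_adv = fgsm(x, g, eps=0.1)
loss_adv, _ = loss_and_grad(x_adv, y, w, b)
```

Every coordinate moves by exactly `eps` in the direction that increases the loss, so the perturbation has L∞ norm `eps` unless clipping shortens it.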


Basic Iterative Method

Kurakin et al. extended FGSM into an iterative method:

$$x^{adv}_0 = x, \qquad x^{adv}_{n+1} = \mathrm{clip}\left(x^{adv}_n + \alpha \, \mathrm{sign}(\nabla_x J(x^{adv}_n, y))\right)$$

Often more effective than FGSM

Adversarial examples in the physical world (Kurakin et al.)
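The iteration above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions: the clip is taken to mean "stay within `eps` of the clean input in L∞ and inside the valid range", and the logistic model behind `grad_fn` is a hypothetical stand-in for a network.

```python
import numpy as np

def bim(x, y, grad_fn, eps, alpha, steps, lo=0.0, hi=1.0):
    """Repeat small sign-gradient steps, clipping after each one."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv, y))
        # Assumed clip: stay within eps of x (L-infinity) and inside [lo, hi].
        x_adv = np.clip(np.clip(x_adv, x - eps, x + eps), lo, hi)
    return x_adv

# Hypothetical toy model: logistic regression with cross-entropy loss.
w, b = np.array([2.0, -1.0, 0.5]), 0.0

def grad_fn(x, y):
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    return (p - y) * w  # gradient of the loss with respect to x

x = np.array([0.6, 0.2, 0.5])
x_adv = bim(x, y=1.0, grad_fn=grad_fn, eps=0.1, alpha=0.03, steps=5)
```

Note that the step size `alpha` is fixed for the whole run; nothing in the loop adapts it to how well the previous step worked.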


DeepFool

DeepFool is a fast iterative untargeted attack that produces adversarial examples by linearizing the network, at each iteration, into an affine multiclass classifier.

DeepFool: a simple and accurate method to fool deep neural networks (Moosavi-Dezfooli et al.)
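For a classifier that is exactly affine, $z = Wx + b$, the step has a closed form: project $x$ onto the nearest decision boundary. A minimal sketch under that assumption; the weights and the `overshoot` value are illustrative. For a real network, the rows of `W` would be replaced by gradients of the logits at the current iterate, and the step would be repeated until the label changes.

```python
import numpy as np

def deepfool_affine_step(x, W, b, overshoot=0.02):
    """Smallest L2 move that crosses the nearest boundary of an affine classifier."""
    z = W @ x + b
    t = int(np.argmax(z))                    # currently predicted class
    best_r, best_dist = None, np.inf
    for k in range(len(z)):
        if k == t:
            continue
        w_k = W[k] - W[t]                    # normal of the boundary between k and t
        f_k = z[k] - z[t]                    # negative while t is still predicted
        dist = abs(f_k) / np.linalg.norm(w_k)
        if dist < best_dist:
            best_dist = dist
            best_r = (abs(f_k) / (w_k @ w_k)) * w_k
    # The small overshoot pushes the point just past the boundary.
    return x + (1.0 + overshoot) * best_r

W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
b = np.zeros(3)
x = np.array([2.0, 1.0])
x_adv = deepfool_affine_step(x, W, b)
```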


Carlini-Wagner Attack

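The Carlini-Wagner (CW) attack is a strong iterative targeted L2 attack that uses gradient descent to minimize

$$\|\delta\|_2 + c \, f(x + \delta)$$

where

$$\delta_i = \tfrac{1}{2}(\tanh(w_i) + 1) - x_i$$

$$f(x) = \max\left(\max_{i \neq t}\{Z(x)_i\} - Z(x)_t, \; -\kappa\right)$$

Here $Z(x)$ denotes the logits, $t$ is the target class, and $\kappa$ is a confidence margin.

It has been shown to beat defensive distillation, which was believed to be a robust defense against adversarial examples.

It is very sensitive to hyperparameter tuning.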

Towards Evaluating the Robustness of Neural Networks (Carlini and Wagner)
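A small sketch of how the objective above can be evaluated, following the formulas as written on this slide. The identity "network" `logits_fn` and the values of `c` and `kappa` are illustrative assumptions; an actual attack would minimize this quantity over `w` with a gradient-based optimizer.

```python
import numpy as np

def cw_loss(w, x, logits_fn, t, c=1.0, kappa=0.0):
    """Objective ||delta||_2 + c * f(x + delta) for target class t."""
    delta = 0.5 * (np.tanh(w) + 1.0) - x        # tanh keeps x + delta inside [0, 1]
    z = logits_fn(x + delta)
    margin = np.max(np.delete(z, t)) - z[t]     # best other logit minus target logit
    f = max(margin, -kappa)                     # no further reward once margin <= -kappa
    return np.linalg.norm(delta) + c * f

# Hypothetical toy "network": the logits are the input itself.
logits_fn = lambda v: v
x = np.array([0.2, 0.5, 0.9])
w0 = np.arctanh(2.0 * x - 1.0)                  # starting point with delta = 0

loss_target_0 = cw_loss(w0, x, logits_fn, t=0, c=2.0)
loss_target_2 = cw_loss(w0, x, logits_fn, t=2, c=1.0, kappa=1.0)
```

The change of variables removes the box constraint: any real-valued `w` maps to a valid image, so an unconstrained optimizer can be used.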


Problems and Challenges

Understanding adversarial examples
  - Why are DNNs brittle?

Stronger attacks
  - Physical attacks
  - Black box attacks

Defense
  - Denoising, affine transformations
  - Training a classifier to predict clean and adversarial examples
  - Training / finetuning on adversarial examples (adversarial training)

Detection
  - Classifying images as clean or adversarial
  - Locating adversarial patches


Other Problems

Adversarial training can be slow with stronger attacks

Iterative methods do not adjust the step size

In an attack setting, queries to a real world classifier may be limited or costly


Contributions

The authors propose a white-box targeted attack based on trust region (TR) optimization.

Their method can adaptively choose the perturbation magnitude at each iteration, which removes the need for expensive hyperparameter tuning.

The TR attack can produce perturbations faster than CW (up to 37.5×) and smaller in magnitude than DeepFool.

Their method can easily be extended to second-order TR attacks, which could be useful for nonlinear activation functions.


Trust Region Optimization

Trust region methods are a class of iterative nonlinear optimization algorithms.

They are based around trust regions, which are (usually) balls around the current point in which a quadratic model approximation is used to find a step direction.

https://optimization.mccormick.northwestern.edu/index.php/Trust-region_methods


Trust Region Subproblem

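Optimizing a function using a trust region method involves solving trust region subproblems

$$\min_p \; m_k(p) = f_k + g_k^T p + \tfrac{1}{2} p^T B_k p \quad \text{s.t.} \quad \|p\| \le \Delta_k$$

where $\Delta_k$ is the trust region radius, $g_k$ is the gradient at the current point, and $B_k$ is the Hessian (or an approximation of it).

The minimizer of $m_k$ along the steepest descent direction $-g_k$ within the trust region is called the Cauchy point. It is a cheap approximate solution to the subproblem.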
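The Cauchy point has a closed form. A minimal NumPy sketch, assuming the standard textbook definition (the minimizer of $m_k$ along $-g_k$ inside the ball $\|p\| \le \Delta_k$); the function name `cauchy_point` and the test values are illustrative.

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Minimizer of g.p + 0.5 p.B.p along -g, restricted to ||p|| <= delta."""
    g_norm = np.linalg.norm(g)
    if g_norm == 0.0:
        return np.zeros_like(g)                 # stationary point: no descent direction
    gBg = g @ B @ g
    # With non-positive curvature along -g the model keeps decreasing,
    # so the step goes all the way to the boundary (tau = 1).
    tau = 1.0 if gBg <= 0 else min(g_norm**3 / (delta * gBg), 1.0)
    return -tau * (delta / g_norm) * g

g = np.array([1.0, 0.0])
B = np.eye(2)
p_interior = cauchy_point(g, B, delta=2.0)      # model minimum lies inside the region
p_boundary = cauchy_point(g, B, delta=0.5)      # model minimum lies outside: clip to radius
```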


Updating the Trust Region

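To update the size of the trust region at each iteration, we compute the ratio of the actual reduction to the reduction predicted by the model:

$$\rho_k = \frac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)}$$

Based on the value of $\rho_k$ and on whether the step $p_k$ reaches the trust region boundary, we may choose to increase the radius, keep it the same, or decrease it.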
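One way to write the update rule as code. The parameter names and default values ($t_1$, $t_2$, $\eta_1$, $\eta_2$, $\Delta_M$) are borrowed from the worked example later in this deck; using $\eta_1$ as the threshold for accepting a step is an assumption of this sketch.

```python
import math

def update_trust_region(rho, step_norm, delta, delta_max=5.0,
                        t1=0.25, t2=2.0, eta1=0.2, eta2=0.75):
    """Return (new_radius, accept_step) for one trust region iteration."""
    if rho < eta1:
        # Poor agreement between model and function: reject the step, shrink.
        return t1 * delta, False
    if rho > eta2 and math.isclose(step_norm, delta):
        # Very good agreement and the step hit the boundary: expand, up to delta_max.
        return min(t2 * delta, delta_max), True
    # Acceptable agreement: take the step, keep the radius.
    return delta, True
```

For example, `update_trust_region(0.99, 2.0, 2.0)` doubles the radius to 4, while `update_trust_region(-0.16, 5.0, 5.0)` rejects the step and shrinks the radius to 1.25.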


Trust region example - Initial Start

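We will find the minimum of the Branin function. The Branin function has 3 global minima.

$$\min f(x_1, x_2) = (x_2 - 0.129 x_1^2 + 1.6 x_1 - 6)^2 + 6.07 \cos(x_1) + 10$$

The initial variables we define are
  - $x = (6, 14)$ (starting point)
  - $\Delta_0 = 2$, $\Delta_M = 5$ (initial and maximum radius)
  - $t_1 = 0.25$, $t_2 = 2$ (shrink and expand factors)
  - $\eta_1 = 0.2$, $\eta_2 = 0.75$ (thresholds on $\rho_k$)

Figure: Contour of Branin function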

https://optimization.mccormick.northwestern.edu/index.php/Trust-region_methods


Iteration 1

The algorithm starts at the green point, and the trust region is defined as the area inside the circle centered at the starting point.

After computing the Cauchy point $p_k$, we evaluate the ratio $\rho_k$, which comes out to be 0.99. Since $\rho_k > \eta_2$, we take a full step $x_{k+1} = x_k + p_k = (5.767, 12.014)$, and we increase the radius of the trust region to $\Delta_{k+1} = \min(t_2 \Delta_k, \Delta_M) = 4$.

https://optimization.mccormick.northwestern.edu/index.php/Trust-region_methods


Iteration 2

Starting with the new point and the larger trust region, we compute the Cauchy point and $\rho_k$ again and get $\rho_k = 0.98$. Thus, we take a full step, and the trust region radius increases again.

https://optimization.mccormick.northwestern.edu/index.php/Trust-region_methods


Iteration 3

This time, $\rho_k = 0.578$, which is not large enough ($\rho_k \le \eta_2$) to expand the trust region again. So we step again, but keep the radius the same.

https://optimization.mccormick.northwestern.edu/index.php/Trust-region_methods


Iteration 4

The model is not a good prediction at the new point, which gives us $\rho_k = -0.16$. We do not step, and we decrease the radius of the trust region by a factor of $t_1 = 0.25$.

https://optimization.mccormick.northwestern.edu/index.php/Trust-region_methods


Iteration 5

We get $\rho_k = 0.729$, which is not large enough ($\rho_k \le \eta_2 = 0.75$) to expand the radius. So we step forward and keep the radius the same.

https://optimization.mccormick.northwestern.edu/index.php/Trust-region_methods


Iteration 6

In this case, $\rho_k = 0.989$. While this is high enough to increase the radius, the step does not reach the trust region boundary ($\|p_k\| \neq \Delta_k$), so we take the (shorter) step and keep the radius unchanged.

https://optimization.mccormick.northwestern.edu/index.php/Trust-region_methods


Final Trajectory after 20 iterations

The algorithm terminates when the norm of the gradient is close to 0, or when the difference between successive points is close to 0.

https://optimization.mccormick.northwestern.edu/index.php/Trust-region_methods
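The walkthrough can be reproduced approximately with a short NumPy script. This is a sketch under stated assumptions: the subproblem is solved only approximately with the Cauchy point, a step is accepted whenever $\rho_k \ge \eta_1$, and the gradient and Hessian are derived by hand from the function given earlier. With these choices the first step lands at roughly (5.767, 12.014), as in Iteration 1; later iterates are not guaranteed to match the figures.

```python
import numpy as np

def f(x):
    u = x[1] - 0.129 * x[0]**2 + 1.6 * x[0] - 6.0
    return u**2 + 6.07 * np.cos(x[0]) + 10.0

def grad(x):
    u = x[1] - 0.129 * x[0]**2 + 1.6 * x[0] - 6.0
    du = -0.258 * x[0] + 1.6                     # du/dx1 (du/dx2 = 1)
    return np.array([2.0 * u * du - 6.07 * np.sin(x[0]), 2.0 * u])

def hess(x):
    u = x[1] - 0.129 * x[0]**2 + 1.6 * x[0] - 6.0
    du = -0.258 * x[0] + 1.6
    h11 = 2.0 * du**2 - 0.516 * u - 6.07 * np.cos(x[0])
    return np.array([[h11, 2.0 * du], [2.0 * du, 2.0]])

def trust_region(x, delta=2.0, delta_max=5.0, t1=0.25, t2=2.0,
                 eta1=0.2, eta2=0.75, iters=20):
    path = [x]
    for _ in range(iters):
        g, B = grad(x), hess(x)
        g_norm = np.linalg.norm(g)
        if g_norm < 1e-8:                        # gradient close to 0: stop
            break
        gBg = g @ B @ g
        tau = 1.0 if gBg <= 0 else min(g_norm**3 / (delta * gBg), 1.0)
        p = -tau * (delta / g_norm) * g          # Cauchy point
        predicted = -(g @ p + 0.5 * p @ B @ p)   # m(0) - m(p), positive here
        rho = (f(x) - f(x + p)) / predicted
        if rho < eta1:
            delta *= t1                          # reject the step, shrink the region
        else:
            x = x + p                            # accept the step
            if rho > eta2 and np.isclose(np.linalg.norm(p), delta):
                delta = min(t2 * delta, delta_max)
        path.append(x)
    return x, path

x_final, path = trust_region(np.array([6.0, 14.0]))
```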


Proposed Method

To find adversarial perturbations within a trust region, the authors solve

$$\min_{\|\Delta x^j\|_p \le \epsilon^j} \; m^j(\Delta x^j) = \langle \Delta x^j, g^j_{t,i} \rangle + \tfrac{1}{2} \langle \Delta x^j, H^j_{t,i} \Delta x^j \rangle$$

where $\epsilon^j$ is the TR radius at the $j$th iteration, $m^j$ is the approximation of the kernel function $f(x^{j-1}) = z^{j-1}_t - z^{j-1}_i$, with $g^j_{t,i}$ and $H^j_{t,i}$ denoting the corresponding gradient and Hessian.

If a ReLU activation is used, the Hessian is zero almost everywhere, so they can omit the Hessian and use a first-order approximation instead.
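With the Hessian dropped and $p = 2$, the subproblem has the closed-form solution $\Delta x = -\epsilon \, g / \|g\|_2$, and the radius can be adapted from the ratio of actual to predicted decrease. The sketch below illustrates that idea only. It is not the authors' algorithm: the two-class toy network `logits`, the thresholds `sigma1` and `sigma2`, the growth factor `eta`, and the accept/reject rule are all assumptions made for the example.

```python
import numpy as np

def logits(x):
    """Hypothetical two-class smooth toy network: z = tanh(x)."""
    return np.tanh(x)

def margin_and_grad(x, t, i):
    """Kernel f(x) = z_t - z_i and its gradient with respect to x."""
    z = logits(x)
    g = np.zeros_like(x)
    g[t] = 1.0 - z[t]**2
    g[i] = -(1.0 - z[i]**2)
    return z[t] - z[i], g

def tr_attack(x, t, i, eps=0.05, eta=2.0, sigma1=0.25, sigma2=0.75, max_iter=100):
    x_adv = x.copy()
    for _ in range(max_iter):
        f_old, g = margin_and_grad(x_adv, t, i)
        if f_old < 0:                            # class i now beats class t: done
            break
        g_norm = np.linalg.norm(g)
        step = -eps * g / g_norm                 # minimizer of <dx, g> over ||dx||_2 <= eps
        f_new, _ = margin_and_grad(x_adv + step, t, i)
        rho = (f_old - f_new) / (eps * g_norm)   # actual / predicted decrease
        if rho >= sigma1:
            x_adv = x_adv + step                 # linear model was good enough: accept
        if rho > sigma2:
            eps *= eta                           # very good agreement: enlarge the radius
        elif rho < sigma1:
            eps /= eta                           # poor agreement: shrink the radius
    return x_adv

x = np.array([1.0, -0.5])                        # classified as class 0
x_adv = tr_attack(x, t=0, i=1)
```

Unlike the fixed step size of the Basic Iterative Method, the radius `eps` here grows while the linear model predicts the margin well and shrinks when it does not.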


Metrics

The authors evaluated the speed of their method by recording the number of seconds it takes to find an adversarial image.

To measure perturbation, the authors use relative perturbation. The relative perturbation of an image $x$ is defined as

$$\rho_p = \frac{\|\Delta x\|_p}{\|x\|_p}.$$
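As a quick illustration, the metric can be computed with `numpy.linalg.norm` on the flattened image. The helper name `relative_perturbation` and the tiny two-pixel "image" are made up for the example.

```python
import numpy as np

def relative_perturbation(x, x_adv, p=2):
    """rho_p = ||x_adv - x||_p / ||x||_p, computed over all pixels."""
    dx = (x_adv - x).ravel()
    return np.linalg.norm(dx, ord=p) / np.linalg.norm(x.ravel(), ord=p)

x = np.array([[3.0, 4.0]])
x_adv = x + np.array([[0.3, 0.4]])

rho_2 = relative_perturbation(x, x_adv, p=2)         # 0.5 / 5
rho_inf = relative_perturbation(x, x_adv, p=np.inf)  # 0.4 / 4
```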


Types of attacks used

Two types of attacks are performed: best class attack and hardest class attack. In the best class attack, we target the class

$$\arg\min_j \frac{z_t - z_j}{\|\nabla_x (z_t - z_j)\|}.$$

Similarly, in the hardest class attack, we target the class

$$\arg\max_j \frac{z_t - z_j}{\|\nabla_x (z_t - z_j)\|}.$$
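A small sketch of the two selection rules, assuming the logits `z` and their Jacobian `J` (one row per class) are available. The affine model used to produce them is hypothetical and chosen so that the scores are easy to check by hand.

```python
import numpy as np

def class_scores(z, J, t):
    """(z_t - z_j) / ||grad_x (z_t - z_j)|| for every j != t (NaN at j = t)."""
    scores = np.full(len(z), np.nan)
    for j in range(len(z)):
        if j != t:
            scores[j] = (z[t] - z[j]) / np.linalg.norm(J[t] - J[j])
    return scores

# Hypothetical affine model z = W x, whose Jacobian is simply W.
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
x = np.array([2.0, 1.0])
z, t = W @ x, 0                                  # t is the true (predicted) class

scores = class_scores(z, W, t)
best_class = int(np.nanargmin(scores))           # closest boundary, easiest target
hardest_class = int(np.nanargmax(scores))        # farthest boundary, hardest target
```

The score is a first-order estimate of the distance from $x$ to the decision boundary between classes $t$ and $j$, which is why the smallest score marks the easiest target.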


Summary of setup

Datasets
  - CIFAR10
  - ImageNet

Attacks
  - Iterative FGSM (only L∞)
  - DeepFool
  - Carlini-Wagner (only L2)
  - TR Non-Adap
  - TR Adap

Networks
  - CIFAR10
      - AlexLike
      - AlexLike-S (AlexLike with Swish activation)
      - ResNet
      - Wide ResNet
  - ImageNet
      - AlexNet
      - VGG16
      - ResNet50
      - DenseNet121


Time Performance on ImageNet

TR and TR Adap produce perturbations similar to those of CW, but in significantly less time (up to 37.5× faster).

Figure 4: Perturbation magnitude vs. time


Qualitative Results

Figure 2: Attacks on VGG-16 with L∞ norm. The TR perturbation is 1.9× smaller than that of DeepFool (DF).


CIFAR10 Results

For all tables and models, the perturbation is chosen such that the accuracy of the target model is reduced to less than 0.1%.

Table 1: Best class attack.


CIFAR10 Results

Table 2: Hardest class attack


ImageNet Results

Table 3: Best class attack


ImageNet Results

Table 4: Hardest class attack


Second order attack results


Conclusion

The authors propose a white-box targeted attack called the TR attack, which is based on trust region methods.

Their method can produce smaller adversarial perturbations very quickly.

The TR attack can adaptively choose the perturbation magnitude at each iteration, and can be easily extended to a second-order attack.


References I

Yulong Cao, Chaowei Xiao, Dawei Yang, Jing Fang, Ruigang Yang, Mingyan Liu, and Bo Li. Adversarial objects against lidar-based autonomous driving systems. arXiv preprint arXiv:1907.05418, 2019.

Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. Robust physical-world attacks on deep learning visual classification. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.

Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015.

Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.

Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security - ASIA CCS '17, 2017.


References II

Jiawei Su, Danilo Vasconcellos Vargas, and Kouichi Sakurai. One pixel attack for fooling deep neural networks. IEEE Transactions on Evolutionary Computation, 23(5):828–841, Oct 2019.

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations, 2014.

Simen Thys, Wiebe Van Ranst, and Toon Goedeme. Fooling automated surveillance cameras: Adversarial patches to attack person detection. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2019.

Cihang Xie, Jianyu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, and Alan Yuille. Adversarial examples for semantic segmentation and object detection. 2017 IEEE International Conference on Computer Vision (ICCV), Oct 2017.
