
On the Convergence and Robustness of Adversarial Training

Yisen Wang * 1 Xingjun Ma * 2 James Bailey 2 Jinfeng Yi 1 Bowen Zhou 1 Quanquan Gu 3

Abstract

Improving the robustness of deep neural networks (DNNs) to adversarial examples is an important yet challenging problem for secure deep learning. Among existing defense techniques, adversarial training with Projected Gradient Descent (PGD) is amongst the most effective. Adversarial training solves a min-max optimization problem, with the inner maximization generating adversarial examples by maximizing the classification loss, and the outer minimization finding model parameters by minimizing the loss on the adversarial examples generated in the inner maximization. A criterion that measures how well the inner maximization is solved is therefore crucial for adversarial training. In this paper, we propose such a criterion, namely the First-Order Stationary Condition for constrained optimization (FOSC), to quantitatively evaluate the convergence quality of adversarial examples found in the inner maximization. With FOSC, we find that to ensure better robustness, it is essential to use adversarial examples with better convergence quality at the later stages of training. Yet at the early stages, high convergence quality adversarial examples are not necessary and may even lead to poor robustness. Based on these observations, we propose a dynamic training strategy to gradually increase the convergence quality of the generated adversarial examples, which significantly improves the robustness of adversarial training. Our theoretical and empirical results show the effectiveness of the proposed method.

*Equal contribution. 1JD.com 2The University of Melbourne 3The University of California, Los Angeles. Correspondence to: Quanquan Gu <[email protected]>.

Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).

1. Introduction

Although deep neural networks (DNNs) have achieved great success in a number of fields such as computer vision (He et al., 2016) and natural language processing (Devlin et al., 2018), they are vulnerable to adversarial examples crafted by adding small, human-imperceptible adversarial perturbations to normal examples (Szegedy et al., 2013; Goodfellow et al., 2015). Such vulnerability of DNNs raises security concerns about their practicability in security-sensitive applications such as face recognition (Kurakin et al., 2016) and autonomous driving (Chen et al., 2015). Defense techniques that can improve DNN robustness to adversarial examples have thus become crucial for secure deep learning.

There exist several defense techniques (i.e., "defense models"), such as input denoising (Guo et al., 2018), gradient regularization (Papernot et al., 2017), and adversarial training (Madry et al., 2018). However, many of these defense models provide either only marginal robustness or have been evaded by new attacks (Athalye et al., 2018). One defense model that demonstrates moderate robustness, and has thus far not been comprehensively attacked, is adversarial training (Athalye et al., 2018). Given a C-class dataset S = {(x_i^0, y_i)}_{i=1}^n with x_i^0 ∈ R^d as a normal example in the d-dimensional input space and y_i ∈ {1, ..., C} as its associated label, the objective of adversarial training is to solve the following min-max optimization problem:

    min_θ (1/n) ∑_{i=1}^n max_{‖x_i − x_i^0‖_∞ ≤ ε} ℓ(h_θ(x_i), y_i),    (1)

where h_θ : R^d → R^C is the DNN function, x_i is the adversarial example of x_i^0, ℓ(h_θ(x_i), y_i) is the loss function on the adversarial example (x_i, y_i), and ε is the maximum perturbation constraint¹. The inner maximization problem is to find an adversarial example x_i within the ε-ball around a given normal example x_i^0 (i.e., ‖x_i − x_i^0‖_∞ ≤ ε) that maximizes the classification loss ℓ. It is typically nonconcave with respect to the adversarial example. The outer minimization problem, on the other hand, is to find model parameters that minimize the loss ℓ on the adversarial examples {x_i}_{i=1}^n generated by the inner maximization. This is the problem of training a robust classifier on adversarial examples. Therefore, how well the inner maximization problem is solved directly affects the performance of the outer minimization, i.e., the robustness of the classifier.

¹We focus only on the infinity norm constraint in this paper, but our algorithms and theory apply to other norms as well.
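To make the structure of Eq. (1) concrete, the following PyTorch-style sketch shows how the outer minimization wraps an abstract inner maximizer. It is only an illustrative skeleton under the assumptions of this note, not the authors' implementation; the names inner_maximize, model, and loader are placeholders.

import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, inner_maximize, epsilon):
    # One epoch of the min-max problem in Eq. (1).
    # inner_maximize(model, x0, y, epsilon) is assumed to return adversarial
    # examples inside the L-infinity epsilon-ball around x0 (the inner max).
    model.train()
    for x0, y in loader:
        # Inner maximization: craft adversarial examples around the clean batch.
        x_adv = inner_maximize(model, x0, y, epsilon)
        # Outer minimization: one SGD step on the loss of the adversarial batch.
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()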

Several attack methods have been used to solve the inner maximization problem, such as the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2015) and Projected Gradient Descent (PGD) (Madry et al., 2018). However, the degree to which they solve the inner maximization problem has not been thoroughly studied. Without an appropriate criterion to measure how well the inner maximization is solved, the adversarial training procedure is difficult to monitor or improve. In this paper, we propose such a criterion, namely the First-Order Stationary Condition for constrained optimization (FOSC), to measure the convergence quality of the adversarial examples found in the inner maximization. Our proposed FOSC facilitates monitoring and understanding adversarial training through the lens of convergence quality of the inner maximization, and this in turn motivates us to propose an improved training strategy for better robustness. Our main contributions are as follows:

• We propose a principled criterion, FOSC, to measure the convergence quality of adversarial examples found in the inner maximization problem of adversarial training. It is well correlated with the adversarial strength of adversarial examples, and is also a good indicator of the robustness of adversarial training.

• With FOSC, we find that better robustness of adversarial training is associated with training on adversarial examples with better convergence quality in the later stages. However, in the early stages, high convergence quality adversarial examples are not necessary and can even be harmful.

• We propose a dynamic training strategy to gradually increase the convergence quality of the generated adversarial examples and provide a theoretical guarantee on the overall (min-max) convergence. Experiments show that the dynamic strategy significantly improves the robustness of adversarial training.

2. Related Work

2.1. Adversarial Attack

Given a normal example (x_i^0, y_i) and a DNN h_θ, the goal of an attacking method is to find an adversarial example x_i that remains in the ε-ball centered at x_i^0 (i.e., ‖x_i − x_i^0‖_∞ ≤ ε) but can fool the DNN into making an incorrect prediction (h_θ(x_i) ≠ y_i). A wide range of attacking methods have been proposed for crafting adversarial examples. Here, we only mention a selection.

Fast Gradient Sign Method (FGSM). FGSM perturbs a normal example x^0 for one step (x^1) by the amount ε along the gradient direction (Goodfellow et al., 2015):

    x^1 = x^0 + ε · sign(∇_x ℓ(h_θ(x^0), y)).    (2)
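For reference, here is a minimal PyTorch-style sketch of the FGSM update in Eq. (2). It is illustrative only; the clamp to [0, 1] assumes image inputs normalized to that range, as in Section 5.

import torch
import torch.nn.functional as F

def fgsm_attack(model, x0, y, epsilon):
    # Single-step attack: move by epsilon along the sign of the loss gradient (Eq. (2)).
    x0 = x0.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x0), y)
    grad = torch.autograd.grad(loss, x0)[0]
    x1 = x0 + epsilon * grad.sign()
    return x1.clamp(0.0, 1.0).detach()   # keep pixels in the valid [0, 1] range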

Projected Gradient Descent (PGD). PGD perturbs a normal example x^0 for a number of steps K with a smaller step size. After each step of perturbation, PGD projects the adversarial example back onto the ε-ball of x^0 if it goes beyond the ε-ball (Madry et al., 2018):

    x^k = Π(x^{k−1} + α · sign(∇_x ℓ(h_θ(x^{k−1}), y))),    (3)

where α is the step size, Π(·) is the projection function, and x^k is the adversarial example at the k-th step.
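A PyTorch-style sketch of the PGD iteration in Eq. (3) follows. The random start inside the ε-ball mirrors the training setup described in Section 5; all names are placeholders of this sketch rather than the authors' code.

import torch
import torch.nn.functional as F

def pgd_attack(model, x0, y, epsilon, alpha, steps, random_start=True):
    # Iterative attack: K signed-gradient steps, each followed by projection
    # onto the L-infinity epsilon-ball around x0 (Eq. (3)).
    x = x0.clone().detach()
    if random_start:
        x = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0.0, 1.0)
    for _ in range(steps):
        x.requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        grad = torch.autograd.grad(loss, x)[0]
        x = x.detach() + alpha * grad.sign()
        x = torch.min(torch.max(x, x0 - epsilon), x0 + epsilon)  # projection Π(·)
        x = x.clamp(0.0, 1.0)
    return x.detach()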

There are also other types of attacking methods, e.g., the Jacobian-based Saliency Map Attack (JSMA) (Papernot et al., 2016a), the C&W attack (Carlini & Wagner, 2017) and the Frank-Wolfe based attack (Chen et al., 2018). PGD is regarded as the strongest first-order attack, and C&W is among the strongest attacks to date.

2.2. Adversarial Defense

A number of defense models have been developed, such as defensive distillation (Papernot et al., 2016b), feature analysis (Xu et al., 2017; Ma et al., 2018), input denoising (Guo et al., 2018; Liao et al., 2018; Samangouei et al., 2018), gradient regularization (Gu & Rigazio, 2014; Papernot et al., 2017; Tramer et al., 2018; Ross & Doshi-Velez, 2018), model compression (Liu et al., 2018; Das et al., 2018; Rakin et al., 2018) and adversarial training (Goodfellow et al., 2015; Nøkland, 2015; Madry et al., 2018), among which adversarial training is the most effective.

Adversarial training improves model robustness by training on adversarial examples generated by FGSM and PGD (Goodfellow et al., 2015; Madry et al., 2018). Tramer et al. (2018) proposed ensemble adversarial training on adversarial examples generated from a number of pretrained models. Kolter & Wong (2018) developed a provably robust model that minimizes the worst-case loss over a convex outer region. In a recent study by Athalye et al. (2018), adversarial training on PGD adversarial examples was demonstrated to be the state-of-the-art defense model. Several improvements of PGD adversarial training have also been proposed, such as Lipschitz regularization (Cisse et al., 2017; Hein & Andriushchenko, 2017; Yan et al., 2018; Farnia et al., 2019) and curriculum adversarial training (Cai et al., 2018).

Despite these studies, a deeper understanding of adversarial training and a clear direction for further improvement are largely missing. The inner maximization problem in Eq. (1) lacks an effective criterion that can quantitatively measure the convergence quality of the training adversarial examples generated by different attacking methods (which in turn influences the analysis of the whole min-max problem). In this paper, we propose such a criterion and provide a new understanding of the robustness of adversarial training. We also design a dynamic training strategy that significantly improves the robustness of standard PGD adversarial training.


3. Evaluation of the Inner Maximization

3.1. Quantitative Criterion: FOSC

In Eq. (1), the inner maximization problem is a constrained optimization problem, and it is in general globally nonconcave. Since the gradient norm of the objective is not an appropriate criterion for nonconvex/nonconcave constrained optimization problems, we propose, inspired by the Frank-Wolfe gap (Frank & Wolfe, 1956), a First-Order Stationary Condition for constrained optimization (FOSC) as the convergence criterion for the inner maximization problem; it is affine invariant and not tied to any specific choice of norm:

    c(x^k) = max_{x∈X} ⟨x − x^k, ∇_x f(θ, x^k)⟩,    (4)

where X = {x : ‖x − x^0‖_∞ ≤ ε} is the input domain of the ε-ball around the normal example x^0, f(θ, x^k) = ℓ(h_θ(x^k), y), and ⟨·,·⟩ is the inner product. Note that c(x^k) ≥ 0, and a smaller value of c(x^k) indicates a better solution of the inner maximization (or equivalently, better convergence quality of the adversarial example x^k).

The criterion FOSC in Eq. (4) can be shown to have the following closed-form solution:

    c(x^k) = max_{x∈X} ⟨x − x^k, ∇_x f(θ, x^k)⟩
           = max_{x∈X} ⟨x − x^0 + x^0 − x^k, ∇_x f(θ, x^k)⟩
           = max_{x∈X} ⟨x − x^0, ∇_x f(θ, x^k)⟩ + ⟨x^k − x^0, −∇_x f(θ, x^k)⟩
           = ε‖∇_x f(θ, x^k)‖_1 − ⟨x^k − x^0, ∇_x f(θ, x^k)⟩.

As an example-wise criterion, c(x^k) measures the convergence quality of the adversarial example x^k with respect to both the perturbation constraint and the loss function. Optimal convergence, c(x^k) = 0, is achieved when either 1) ∇_x f(θ, x^k) = 0, i.e., x^k is a stationary point in the interior of X; or 2) x^k − x^0 = ε · sign(∇_x f(θ, x^k)), i.e., a local maximum of f(θ, x^k) is reached on the boundary of X. The proposed criterion FOSC allows the monitoring of the convergence quality of the inner maximization problem, and provides a new perspective on adversarial training.
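The closed form makes FOSC cheap to compute from quantities already available during a PGD step. The sketch below is illustrative only; the batching and naming are assumptions of this note, not the paper's code. It evaluates c(x^k) = ε‖∇_x f(θ, x^k)‖_1 − ⟨x^k − x^0, ∇_x f(θ, x^k)⟩ per example.

import torch
import torch.nn.functional as F

def fosc(model, x_k, x0, y, epsilon):
    # FOSC closed form: eps * ||grad||_1 - <x_k - x0, grad>, computed per example.
    x_k = x_k.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_k), y, reduction='sum')
    grad = torch.autograd.grad(loss, x_k)[0]
    g = grad.flatten(1)                      # shape (batch, d)
    diff = (x_k - x0).detach().flatten(1)    # shape (batch, d)
    return epsilon * g.abs().sum(dim=1) - (diff * g).sum(dim=1)

During adversarial training, such a routine can be called after each PGD step to decide, per example, whether a target convergence quality has been reached (see Section 4.1).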

3.2. FOSC View of Adversarial Training

In this subsection, we use FOSC to investigate the robustness and learning process of adversarial training. First, though, we investigate its correlation with the traditional measures of accuracy and loss.

FOSC View of Adversarial Strength. We train an 8-layer Convolutional Neural Network (CNN) on CIFAR-10 using 10-step PGD (PGD-10) with step size ε/4 and maximum perturbation ε = 8/255, following the standard setting in Madry et al. (2018). We then apply the same PGD-10 attack to CIFAR-10 test images to craft adversarial examples, and divide the crafted adversarial examples into 20 consecutive groups according to their FOSC values, ranging from 0.0 to 0.1. The test accuracy and average loss of the adversarial examples in each group are shown in Figure 1a. We observe that FOSC has a linear correlation with both accuracy and loss: the lower the FOSC, the lower (resp. higher) the accuracy (resp. loss).

[Figure 1. The correlation between convergence quality (FOSC) and adversarial strength (accuracy and loss). (a) For PGD-10 CIFAR-10 adversarial examples: the lower the FOSC (x-axis), the lower the accuracy (left y-axis) and the higher the loss (right y-axis). (b) For 20 randomly selected adversarial examples (each line is an example): PGD perturbation step (x-axis) versus FOSC (left y-axis) and loss (right y-axis).]

We further show the intermediate perturbation steps of PGD for 20 randomly selected adversarial examples in Figure 1b. As the perturbation step increases, FOSC decreases consistently towards 0, while the loss increases and stabilizes over a much wider range of values. Compared to the loss, FOSC provides a comparable and consistent measurement of adversarial strength: the closer the FOSC is to 0, the stronger the attack.

In summary, the proposed FOSC is well correlated with adversarial strength and is also more consistent than the loss, making it a promising tool to monitor adversarial training.

FOSC View of Adversarial Robustness. We first investigate the correlation among the final robustness of adversarial training, the loss, and FOSC. In particular, we evaluate PGD adversarial training on CIFAR-10 in two settings: 1) varying the PGD step size from ε down to ε/8 while fixing the step number at 20, and 2) varying the PGD step number from 10 to 40 while fixing the step size at ε/6. In each setting, we cross test (white-box) the robustness of the final model against PGD attacks in the same setting on CIFAR-10 test images. For each defense model, we also compute the distributions of FOSC and loss (using Gaussian kernel density estimation (Parzen, 1962)) for the adversarial examples generated in the last epoch.

[Figure 2. Robustness of PGD adversarial training with (a) varying step size (fixed step number 20), or (b) varying step number (fixed step size ε/6). The FOSC distributions (c)/(d) reflect the robustness of adversarial training in (a)/(b), i.e., the lower the FOSC, the better the robustness. The loss distributions (e)/(f) are almost the same for the different settings.]

As shown in Figure 2, when varying the step size, the best robustness against all test attacks is observed for PGD-ε/2 or PGD-ε/4 (Figure 2a), whose FOSC distributions are more concentrated around 0 (Figure 2c), while their loss distributions are almost the same (Figure 2e). When varying the step number, the final robustness values are very similar (Figure 2b), which is also reflected by the similar FOSC distributions (Figure 2d), and the loss distributions are again almost the same (Figure 2f). Revisiting Figure 2b, where the step size is ε/6, it is notable that increasing the number of PGD steps brings only marginal or no robustness gain once the steps are more than sufficient to reach the surface of the ε-ball: 12 steps of ε/6 perturbation following the same gradient direction can reach the surface of the ε-ball from any starting point. The above observations indicate that FOSC is a more reliable indicator of the final robustness of PGD adversarial training than the loss.

Rethinking the Adversarial Training Process. To provide more insights into the learning process of adversarial training, we show the distributions of FOSC at three distinct learning stages in Figure 3: 1) the early stage (epoch 10), 2) the middle stage (epoch 60), and 3) the later stage (epoch 100), out of 120 epochs in total. We only focus on two defense models: training with 10-step PGD-ε/4 and 10-step PGD-ε/8 (the best and worst models observed in Figure 2a, respectively).

[Figure 3. (a) FOSC distribution at intermediate epochs (10, 60, 100) for adversarial training with 10-step PGD of step size ε/4 (PGD-ε/4) and step size ε/8 (PGD-ε/8); (b) the robustness of training with PGD and of training with FGSM first and then PGD; (c) FOSC distribution at intermediate epochs for training with FGSM first and then PGD. Distributions at the 60-th and 100-th epochs overlap each other in (a)/(c).]

In Figure 3a, for both models, FOSC at the early stage is significantly lower than at the following two stages. Thus, at the early stage, both models can easily find high convergence quality adversarial examples for training; however, it becomes more difficult to do so at the following stages. This suggests overfitting to strong PGD adversarial examples at the early stage. To verify this, we replace the first 20 epochs of PGD-ε/4 training with a much weaker FGSM attack (1-step perturbation of size ε), denoted "FGSM-PGD", and show its robustness and FOSC distribution in Figures 3b and 3c, respectively. We find that by simply using weaker FGSM adversarial examples at the early stage, the final robustness and the convergence quality of the adversarial examples found by PGD at the later stage are both significantly improved. The FOSC density within [0, 0.1] improves to above 35% (green solid line in Figure 3c) from less than 30% (green solid line in Figure 3a). This indicates that strong PGD attacks are not necessary at the early stage of training and may even deteriorate the robustness. In the next section, we propose a dynamic training strategy to address this issue.

4. Dynamic Adversarial Training

In this section, we first introduce the proposed dynamic adversarial training strategy. We then provide a theoretical convergence analysis of the proposed approach for the min-max problem in Eq. (1).

4.1. The Proposed Dynamic Training Strategy

As mentioned in Section 3.2, training on adversarial examples of better convergence quality at the later stages leads to higher robustness. However, at the early stages, training on high convergence quality adversarial examples may not be helpful. Recalling the criterion FOSC proposed in Section 3.1, we have seen that it is strongly correlated with adversarial strength; thus, it can be used to monitor the strength of adversarial examples at a fine-grained level. We therefore propose to train DNNs with adversarial examples of gradually decreasing FOSC value (increasing convergence quality), so as to ensure that the network is trained on weak adversarial examples at the early stages and strong adversarial examples at the later stages.

Our proposed dynamic adversarial training algorithm is shown in Algorithm 1. The dynamic FOSC criterion c_t = max(c_max − t · c_max/T′, 0) controls the minimum FOSC value (maximum adversarial strength) of the adversarial examples at the t-th epoch of training (T′ is slightly smaller than the total number of epochs T to ensure that the later stages can be trained with criterion 0). In the early stages of training, c_t is close to the maximum FOSC value c_max, corresponding to weak adversarial examples; it then decreases linearly towards zero as training progresses², and is zero after the T′-th epoch of training. We use PGD to generate the training adversarial examples; however, at each perturbation step of PGD, we monitor the FOSC value and stop the perturbation process for any adversarial example whose FOSC value is already smaller than c_t, which is implemented via an indicator control vector V. c_max can be estimated by the average FOSC value on a batch of weak adversarial examples, such as those produced by FGSM.

²This is only a simple strategy that works well in our experiments; other strategies could also work here.

Algorithm 1 Dynamic Adversarial Training

Input: network h_θ, training data S, initial model parameters θ_0, step size η_t, mini-batch B, maximum FOSC value c_max, training epochs T, FOSC control epoch T′, PGD steps K, PGD step size α, maximum perturbation ε.
for t = 0 to T − 1 do
    c_t = max(c_max − t · c_max/T′, 0)
    for each batch x^0_B do
        V = 1_B   # control vector with all elements set to 1
        k = 0
        while ∑V > 0 and k < K do
            x^{k+1}_B = x^k_B + V · α · sign(∇_x ℓ(h_θ(x^k_B), y))
            x^{k+1}_B = clip(x^{k+1}_B, x^0_B − ε, x^0_B + ε)
            V = 1(c(x^{k+1}_B) > c_t)   # an element of V becomes 0 once its FOSC value is smaller than c_t
            k = k + 1
        end while
        θ_{t+1} = θ_t − η_t ĝ(θ_t)   # ĝ(θ_t): stochastic gradient on the generated adversarial examples
    end for
end for
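A compact PyTorch-style sketch of the FOSC-controlled inner loop of Algorithm 1 is given below. It reuses the ideas of the earlier pgd_attack and fosc sketches; the tensor layout (NCHW images) and helper names are assumptions of this sketch rather than the authors' released code.

import torch
import torch.nn.functional as F

def dynamic_pgd_batch(model, x0, y, epsilon, alpha, steps, c_t):
    # PGD whose per-example perturbation stops once the FOSC value drops below c_t.
    x = x0.clone().detach()
    active = torch.ones(x0.size(0), device=x0.device)    # the control vector V
    for _ in range(steps):
        if active.sum() == 0:
            break
        x.requires_grad_(True)
        loss = F.cross_entropy(model(x), y, reduction='sum')
        grad = torch.autograd.grad(loss, x)[0]
        mask = active.view(-1, 1, 1, 1)                   # assumes NCHW image batches
        x = x.detach() + mask * alpha * grad.sign()
        x = torch.min(torch.max(x, x0 - epsilon), x0 + epsilon).clamp(0.0, 1.0)
        # FOSC closed form, per example, using the gradient at the updated point.
        x_tmp = x.clone().detach().requires_grad_(True)
        g = torch.autograd.grad(F.cross_entropy(model(x_tmp), y, reduction='sum'), x_tmp)[0].flatten(1)
        c = epsilon * g.abs().sum(1) - ((x - x0).flatten(1) * g).sum(1)
        active = (c > c_t).float()                        # V_i -> 0 once FOSC <= c_t
    return x.detach()

def c_schedule(t, c_max, t_prime):
    # Linearly decreasing FOSC target: c_t = max(c_max - t * c_max / T', 0).
    return max(c_max - t * c_max / t_prime, 0.0)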

4.2. Convergence Analysis

We provide a convergence analysis of our proposed dynamic adversarial training approach (as opposed to just the inner maximization problem) for solving the overall min-max optimization problem in Eq. (1). Due to the nonlinearities in DNNs such as ReLU (Nair & Hinton, 2010) and max-pooling functions, the exact assumptions of Danskin's theorem (Danskin, 2012) do not hold. Nevertheless, given the criterion FOSC that ensures an approximate maximizer of the inner maximization problem, we can still provide a theoretical convergence guarantee.

In detail, let x_i^*(θ) = argmax_{x_i∈X_i} f(θ, x_i), where f(θ, x) = ℓ(h_θ(x), y) is a shorthand notation for the classification loss function, X_i = {x : ‖x − x_i^0‖_∞ ≤ ε}, and f_i(θ) = max_{x_i∈X_i} f(θ, x_i) = f(θ, x_i^*(θ)). Then x̂_i(θ) is a δ-approximate solution to x_i^*(θ) if it satisfies

    c(x̂_i(θ)) = max_{x∈X_i} ⟨x − x̂_i(θ), ∇_x f(θ, x̂_i(θ))⟩ ≤ δ.    (5)

In addition, denote the objective function in Eq. (1) by L_S(θ), and its gradient by ∇L_S(θ) = (1/n) ∑_{i=1}^n ∇f_i(θ) = (1/n) ∑_{i=1}^n ∇_θ f(θ, x_i^*(θ)). Let g(θ) = (1/|B|) ∑_{i∈B} ∇f_i(θ) be the stochastic gradient of L_S(θ), where B is the mini-batch; we have E[g(θ)] = ∇L_S(θ). Let ∇_θ f(θ, x̂(θ)) be the gradient of f(θ, x̂(θ)) with respect to θ, and let ĝ(θ) = (1/|B|) ∑_{i∈B} ∇_θ f(θ, x̂_i(θ)) be the approximate stochastic gradient of L_S(θ).

Before we provide the convergence analysis, we first lay out a few assumptions that are needed for our analysis.

Assumption 1. The function f(θ, x) satisfies the following gradient Lipschitz conditions:

    sup_x ‖∇_θ f(θ, x) − ∇_θ f(θ′, x)‖_2 ≤ L_θθ ‖θ − θ′‖_2,
    sup_θ ‖∇_θ f(θ, x) − ∇_θ f(θ, x′)‖_2 ≤ L_θx ‖x − x′‖_2,
    sup_x ‖∇_x f(θ, x) − ∇_x f(θ′, x)‖_2 ≤ L_xθ ‖θ − θ′‖_2,

where L_θθ, L_θx, L_xθ are positive constants.

Assumption 1 was made in Sinha et al. (2018), and requires that the loss function be smooth in its first and second arguments. While ReLU (Nair & Hinton, 2010) is non-differentiable, recent studies (Allen-Zhu et al., 2018; Du et al., 2018; Zou et al., 2018; Cao & Gu, 2019) showed that the loss function of overparameterized deep neural networks is semi-smooth. This helps justify Assumption 1.

Assumption 2. f(θ, x) is locally µ-strongly concave in X_i = {x : ‖x − x_i^0‖_∞ ≤ ε} for all i ∈ [n], i.e., for any x_1, x_2 ∈ X_i it holds that

    f(θ, x_1) ≤ f(θ, x_2) + ⟨∇_x f(θ, x_2), x_1 − x_2⟩ − (µ/2) ‖x_1 − x_2‖_2^2.

Assumption 2 can be verified using the relation between robust optimization and distributional robust optimization (refer to Sinha et al. (2018); Lee & Raginsky (2018)).

Assumption 3. The variance of the stochastic gradient g(θ) is bounded by a constant σ^2 > 0:

    E[‖g(θ) − ∇L_S(θ)‖_2^2] ≤ σ^2,

where ∇L_S(θ) is the full gradient.

Assumption 3 is a common assumption made for the analysis of stochastic gradient based optimization algorithms.


Theorem 1. Suppose Assumptions 1, 2 and 3 hold, and let ∆ = L_S(θ_0) − min_θ L_S(θ). If the step size of the outer minimization is set to η_t = η = min(1/L, √(∆/(Lσ^2 T))), then the output of Algorithm 1 satisfies

    (1/T) ∑_{t=0}^{T−1} E[‖∇L_S(θ_t)‖_2^2] ≤ 4σ √(L∆/T) + 5 L_θx^2 δ / µ,

where L = L_θx L_xθ / µ + L_θθ.

The complete proof can be found in the supplementary material. Theorem 1 suggests that if the inner maximization is solved up to a precision such that the criterion FOSC is less than δ, Algorithm 1 can converge to a first-order stationary point at a sublinear rate, up to a precision of 5 L_θx^2 δ / µ. In practice, if δ is sufficiently small that 5 L_θx^2 δ / µ is small enough, Algorithm 1 can find a robust model θ_T. This supports the validity of Algorithm 1.

4.3. Relation to Curriculum Learning

Curriculum learning (Bengio et al., 2009) is a learning paradigm in which a model learns from easy examples first and then gradually learns from more and more difficult examples. For training with normal examples, it has been shown to speed up convergence and improve generalization. This methodology has been adopted in many applications to enhance model training (Kumar et al., 2010; Jiang et al., 2015). The main challenge for curriculum learning is to define a proper criterion to determine the difficulty/hardness of training examples, so as to design a learning curriculum (i.e., a sequential ordering) mechanism. Our proposed criterion FOSC in Section 3.1 can serve as such a difficulty measure for training examples, and our proposed dynamic approach can be regarded as a type of curriculum learning.

Curriculum learning was used in adversarial training by Cai et al. (2018), with the perturbation step of PGD as the difficulty measure. Their assumption is that more perturbation steps indicate stronger adversarial examples. However, this is not a reliable assumption from the FOSC view of the inner maximization problem: more steps may overshoot and result in suboptimal adversarial examples. Empirical comparisons with Cai et al. (2018) will be shown in Sec. 5.

5. Experiments

In this section, we evaluate the robustness of our proposed training strategy (Dynamic) compared with several state-of-the-art defense models, in both the white-box and black-box settings, on the benchmark datasets MNIST and CIFAR-10. We also provide analysis and insights on the robustness of different defense models. For all adversarial examples, we adopt the infinity norm ball as the maximum perturbation constraint (Madry et al., 2018).

Baselines. The baseline defense models we use include: 1) Unsecured: unsecured training on normal examples; 2) Standard: standard adversarial training with PGD attacks (Madry et al., 2018); 3) Curriculum: curriculum adversarial training with PGD attacks with a gradually increasing number of perturbation steps (Cai et al., 2018).

5.1. Robustness Evaluation

Defense Settings. For MNIST, defense models use a 4-layer CNN: 2 convolutional layers followed by 2 dense layers. Batch normalization (BatchNorm) (Ioffe & Szegedy, 2015) and max-pooling (MaxPool) are applied after each convolutional layer. For CIFAR-10, defense models adopt an 8-layer CNN architecture: 3 convolutional blocks followed by 2 dense layers, with each convolutional block containing 2 convolutional layers. BatchNorm is applied after each convolutional layer, and MaxPool is applied after every convolutional block. Defense models for both MNIST and CIFAR-10 are trained using Stochastic Gradient Descent (SGD) with momentum 0.9, weight decay 10^-4 and an initial learning rate of 0.01. The learning rate is divided by 10 at the 20-th and 40-th epochs for MNIST (50 epochs in total), and at the 60-th and 100-th epochs for CIFAR-10 (120 epochs in total). All images are normalized into [0, 1].
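For concreteness, a PyTorch-style sketch of the described CIFAR-10 defense architecture follows. Only the block structure (3 convolutional blocks of 2 conv layers with BatchNorm, MaxPool after each block, then 2 dense layers) comes from the text; the channel widths, hidden size, and activation choice are illustrative assumptions, since the paper does not specify them.

import torch.nn as nn

def conv_block(c_in, c_out):
    # Two conv layers, each followed by BatchNorm, with MaxPool closing the block.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(),
        nn.MaxPool2d(2),
    )

class EightLayerCNN(nn.Module):
    # 3 convolutional blocks (6 conv layers) + 2 dense layers, as described above.
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(conv_block(3, 64), conv_block(64, 128), conv_block(128, 256))
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(256 * 4 * 4, 512), nn.ReLU(), nn.Linear(512, num_classes)
        )

    def forward(self, x):
        # 32x32 CIFAR-10 inputs become 4x4 feature maps after the three MaxPool layers.
        return self.classifier(self.features(x))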

Except for the Unsecured model, all other defense models, including our proposed Dynamic model, are trained under the same PGD adversarial training scheme: a 10-step PGD attack with random start (adding an initial random perturbation in [−ε, ε] to the normal examples before the PGD perturbation) and step size ε/4. The maximum perturbation is set to ε = 0.3 for MNIST and ε = 8/255 for CIFAR-10, which is a standard setting for adversarial defense (Athalye et al., 2018; Madry et al., 2018). For the Dynamic model, we set c_max = 0.5 for both MNIST and CIFAR-10, and T′ = 40 for MNIST and T′ = 100 for CIFAR-10. Other parameters of the baselines are configured as per their original papers.

White-box Robustness. For MNIST and CIFAR-10, the attacks used in the white-box setting are generated from the original test set images by attacking the defense models with 4 attacking methods: FGSM, PGD-10 (10-step PGD), PGD-20 (20-step PGD), and C&W∞ (the L∞ version of C&W optimized by PGD for 30 steps). In the white-box setting, all attacking methods have full access to the defense model parameters and are constrained by the same maximum perturbation ε. We report the classification accuracy of a defense model under white-box attacks as its white-box robustness.
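A minimal sketch of this robustness evaluation (accuracy on the test set under a chosen attack) is given below; attack_fn and the earlier pgd_attack sketch it refers to are assumptions of this illustration, not the paper's evaluation code.

import torch

def robust_accuracy(model, loader, attack_fn):
    # Fraction of test examples still classified correctly after the attack.
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x_adv = attack_fn(model, x, y)   # e.g. lambda m, x, y: pgd_attack(m, x, y, 8/255, 2/255, 20)
        with torch.no_grad():
            pred = model(x_adv).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.size(0)
    return 100.0 * correct / total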

Table 1. White-box robustness (accuracy (%) on white-box test attacks) of different defense models on the MNIST and CIFAR-10 datasets.

Defense     | MNIST: Clean  FGSM   PGD-10  PGD-20  C&W∞  | CIFAR-10: Clean  FGSM   PGD-10  PGD-20  C&W∞
Unsecured   | 99.20         14.04  0.0     0.0     0.0   | 89.39            2.2    0.0     0.0     0.0
Standard    | 97.61         94.71  91.21   90.62   91.03 | 66.31            48.65  44.39   40.02   36.33
Curriculum  | 98.62         95.51  91.24   90.65   91.12 | 72.40            50.47  45.54   40.12   35.77
Dynamic     | 97.96         95.34  91.63   91.27   91.47 | 72.17            52.81  48.06   42.40   37.26

Table 2. Black-box robustness (accuracy (%) on black-box test attacks) of different defense models on the MNIST and CIFAR-10 datasets.

Defense     | MNIST: FGSM  PGD-10  PGD-20  C&W∞  | CIFAR-10: FGSM  PGD-10  PGD-20  C&W∞
Standard    | 96.12        95.73   95.73   97.20 | 65.65           65.80   65.60   66.12
Curriculum  | 96.59        95.87   96.09   97.52 | 71.25           71.44   71.13   71.94
Dynamic     | 97.60        97.01   96.97   98.36 | 71.95           72.15   72.02   72.85

The white-box results are reported in Table 1. On both datasets, the Unsecured model achieves the best test accuracy on clean (unperturbed) images. However, it is not robust to adversarial examples: accuracy drops to zero under strong attacks like PGD-10/20 or C&W∞. The proposed Dynamic model achieves nearly the best robustness among all the defense models. Comparing the robustness on MNIST and CIFAR-10, the improvements are more significant on CIFAR-10. This may be because MNIST, consisting of only black-and-white digits, is a relatively simple dataset on which different defense models all work comparably well. Compared to Standard adversarial training, Dynamic training with convergence quality controlled adversarial examples improves the robustness to a certain extent, especially on the more challenging CIFAR-10 with natural images. This robustness gain is possibly limited by the capacity of the small model (only an 8-layer CNN); we therefore shortly present a series of experiments on WideResNet (Zagoruyko & Komodakis, 2016) where the power of the Dynamic strategy is fully unleashed. In Table 1, we also see that Curriculum improves robustness against weak attacks like FGSM but is less effective against strong attacks like PGD/C&W∞.

Benchmarking the State-of-the-art on WideResNet. To analyze the full power of our proposed Dynamic training strategy and also benchmark state-of-the-art robustness on CIFAR-10, we conduct experiments on a large-capacity network, WideResNet (Zagoruyko & Komodakis, 2016) (10 times wider than a standard ResNet (He et al., 2016)), using the same settings as Madry et al. (2018). The WideResNet achieves an accuracy of 95.2% on clean CIFAR-10 test images. For comparison, we include Madry's WideResNet adversarial training and the Curriculum model. White-box robustness against FGSM, PGD-20 and C&W∞ attacks is shown in Table 3. Our proposed Dynamic model demonstrates a significant boost over Madry's WideResNet adversarial training on FGSM and PGD-20, while the Curriculum model achieves only slight gains. For the strongest attack, C&W∞, Curriculum's robustness decreases by 4% compared to Madry's, while the Dynamic model achieves the highest robustness.

Table 3. White-box robustness (%) of different defense models on the CIFAR-10 dataset using the WideResNet setting of Madry's baseline.

Defense     Clean   FGSM    PGD-20  C&W∞
Madry's     87.3    56.1    45.8    46.8
Curriculum  77.43   57.17   46.06   42.28
Dynamic     85.03   63.53   48.70   47.27

Black-box Robustness. Black-box test attacks are generated on the original test set images by attacking a surrogate model whose architecture is either a copy of the defense model (for MNIST) or a more complex ResNet-50 (He et al., 2016) model (for CIFAR-10). Both surrogate models are trained separately from the defense models on the original training sets using Standard adversarial training (10-step PGD attack with a random start and step size ε/4). The attacking methods used here are the same as in the white-box evaluation: FGSM, PGD-10, PGD-20, and C&W∞.

The robustness of different defense models against black-box attacks is reported in Table 2. Again, the proposed Dynamic model achieves higher robustness than the other defense models. Curriculum also demonstrates a clear improvement over Standard adversarial training. The distinctive robustness boosts of Dynamic and Curriculum indicate that training with weak adversarial examples at the early stage can improve the final robustness.

Compared with the white-box robustness in Table 1, all defense models achieve higher robustness against black-box attacks, even for the CIFAR-10 black-box attacks that are generated based on a much more complex ResNet-50 network (the defense network is only an 8-layer CNN). This implies that black-box attacks are indeed less powerful than white-box attacks, at least for the tested attacks. It is also observed that robustness tends to increase from weak attacks like FGSM to stronger attacks like C&W∞. This implies that stronger attacks tend to have less transferability, an observation which is consistent with Madry et al. (2018).


5.2. Further Analysis

Different Maximum Perturbation Constraints ε. We analyze the robustness of the defense models Standard, Curriculum and the proposed Dynamic under different maximum perturbation constraints ε on CIFAR-10. For efficiency, we use the same 8-layer CNN defense architecture as in Sec. 5.1. Figure 4a shows the white-box robustness of defense models trained with ε = 8/255 against PGD-10 attacks with varying ε ∈ [2/255, 8/255]. Curriculum and Dynamic models substantially improve the robustness of Standard adversarial training, a result consistent with Sec. 5.1. Dynamic training is better than Curriculum against stronger attacks with larger perturbations (ε = 8/255), while Curriculum is effective against attacks with smaller perturbations (ε = 4/255, 2/255), with performance similar to Dynamic. We also train the defense models with different ε ∈ [2/255, 8/255], and then test their white-box robustness under the same ε (all defense models tend to have similarly low robustness if the testing ε is larger than the training ε). As illustrated in Figure 4b, training with weak attacks at the early stages might have a limit: the robustness gain tends to decrease when the maximum perturbation decreases to ε = 2/255. This is not surprising given that the inner maximization problem of adversarial training becomes more concave and easier to solve for a smaller ε-ball. However, robustness at this extremely small perturbation scale is arguably less interesting for secure deep learning.

[Figure 4. (a) White-box robustness against PGD-10 attacks with different testing ε ∈ [2/255, 8/255]; (b) white-box robustness of defense models trained on PGD-10 with different training ε ∈ [2/255, 8/255].]

Adversarial Training Process. To understand the learning dynamics of the 3 defense models, we plot the distributions of FOSC at different training epochs in Figure 5. We choose epochs 10/60/100 for Standard and Dynamic, and epochs 60/90/120 for Curriculum, as it trains without adversarial examples at the early epochs. We see that both Curriculum and Dynamic learn from adversarial examples of increasing convergence quality (decreasing FOSC). The difference is that Dynamic has more precise control over the convergence quality due to its use of the proposed criterion FOSC, demonstrating more concentrated FOSC distributions that are more clearly separated at different stages of training. For Curriculum, in contrast, the convergence quality of adversarial examples generated by the same number of perturbation steps can span a wide range of values (e.g., the flat blue line in Figure 5b), containing both weak and strong adversarial examples. Regarding the later stages of training (epoch 100/120), Dynamic ends up with the best convergence quality (FOSC density over 40% in Figure 5c), followed by Curriculum (FOSC density over 30% in Figure 5b) and Standard (FOSC density less than 30% in Figure 5a), which is well aligned with their final robustness reported in Tables 1 and 2.

[Figure 5. The distributions of FOSC at different epochs of training on CIFAR-10 with 10-step PGD of step size ε/4 and ε = 8/255: (a) Standard; (b) Curriculum; (c) Dynamic.]

6. Discussion and Conclusion

In this paper, we proposed a criterion, the First-Order Stationary Condition for constrained optimization (FOSC), to measure the convergence quality of adversarial examples found in the inner maximization of adversarial training. The proposed criterion FOSC is well correlated with adversarial strength and is more consistent than the loss. Based on FOSC, we found that higher robustness of adversarial training can be achieved by training on better convergence quality adversarial examples at the later stages, rather than at the early stages. Following that, we proposed a dynamic training strategy and proved the convergence of the proposed approach for the overall min-max optimization problem under certain assumptions. On benchmark datasets, especially on CIFAR-10 under the WideResNet architecture for attacks with maximum perturbation constraint ε = 8/255, our proposed dynamic strategy achieved a significant robustness gain against Madry's state-of-the-art baselines.

Our findings imply that including very hard adversarial examples too early in training may inhibit DNN feature learning or encourage the premature learning of overly complex features that provide less compression of patterns in the data. Experimental evidence also suggests that the later stages are more correlated with the final robustness, while the early stages are more associated with generalization. Therefore, we conjecture that higher robustness can be obtained by further increasing the diversity of weak adversarial examples in the early stages or by generating more powerful adversarial examples in the later stages. The precise characterization of how the early and later stages interact with each other is still an open problem. We believe further exploration of this direction will lead to more robust models.


Acknowledgements

We would like to thank the anonymous reviewers for their helpful comments.

References

Allen-Zhu, Z., Li, Y., and Song, Z. A convergence theory for deep learning via over-parameterization. arXiv preprint arXiv:1811.03962, 2018.
Athalye, A., Carlini, N., and Wagner, D. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In ICML, 2018.
Bengio, Y., Louradour, J., Collobert, R., and Weston, J. Curriculum learning. In ICML, 2009.
Cai, Q.-Z., Du, M., Liu, C., and Song, D. Curriculum adversarial training. In IJCAI, 2018.
Cao, Y. and Gu, Q. A generalization theory of gradient descent for learning over-parameterized deep ReLU networks. arXiv preprint arXiv:1902.01384, 2019.
Carlini, N. and Wagner, D. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy, 2017.
Chen, C., Seff, A., Kornhauser, A., and Xiao, J. DeepDriving: Learning affordance for direct perception in autonomous driving. In CVPR, 2015.
Chen, J., Yi, J., and Gu, Q. A Frank-Wolfe framework for efficient and effective adversarial attacks. arXiv preprint arXiv:1811.10828, 2018.
Cisse, M., Bojanowski, P., Grave, E., Dauphin, Y., and Usunier, N. Parseval networks: Improving robustness to adversarial examples. In ICML, 2017.
Danskin, J. M. The theory of max-min and its application to weapons allocation problems, volume 5. Springer Science & Business Media, 2012.
Das, N., Shanbhogue, M., Chen, S.-T., Hohman, F., Li, S., Chen, L., Kounavis, M. E., and Chau, D. H. Compression to the rescue: Defending from adversarial attacks across modalities. In KDD, 2018.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
Du, S. S., Lee, J. D., Li, H., Wang, L., and Zhai, X. Gradient descent finds global minima of deep neural networks. arXiv preprint arXiv:1811.03804, 2018.
Farnia, F., Zhang, J., and Tse, D. Generalizable adversarial training via spectral normalization. In ICLR, 2019.
Frank, M. and Wolfe, P. An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3(1-2):95–110, 1956.
Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. In ICLR, 2015.
Gu, S. and Rigazio, L. Towards deep neural network architectures robust to adversarial examples. arXiv preprint arXiv:1412.5068, 2014.
Guo, C., Rana, M., Cisse, M., and van der Maaten, L. Countering adversarial images using input transformations. In ICLR, 2018.
He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In CVPR, 2016.
Hein, M. and Andriushchenko, M. Formal guarantees on the robustness of a classifier against adversarial manipulation. In NeurIPS, 2017.
Ioffe, S. and Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
Jiang, L., Meng, D., Zhao, Q., Shan, S., and Hauptmann, A. G. Self-paced curriculum learning. In AAAI, 2015.
Kolter, J. Z. and Wong, E. Provable defenses against adversarial examples via the convex outer adversarial polytope. In ICML, 2018.
Kumar, M. P., Packer, B., and Koller, D. Self-paced learning for latent variable models. In NeurIPS, 2010.
Kurakin, A., Goodfellow, I., and Bengio, S. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
Lee, J. and Raginsky, M. Minimax statistical learning with Wasserstein distances. In NeurIPS, 2018.
Liao, F., Liang, M., Dong, Y., Pang, T., Hu, X., and Zhu, J. Defense against adversarial attacks using high-level representation guided denoiser. In CVPR, 2018.
Liu, Q., Liu, T., Liu, Z., Wang, Y., Jin, Y., and Wen, W. Security analysis and enhancement of model compressed deep learning systems under adversarial attacks. In ASPDAC, 2018.
Ma, X., Li, B., Wang, Y., Erfani, S. M., Wijewickrema, S., Schoenebeck, G., Song, D., Houle, M. E., and Bailey, J. Characterizing adversarial subspaces using local intrinsic dimensionality. arXiv preprint arXiv:1801.02613, 2018.
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. In ICLR, 2018.
Nair, V. and Hinton, G. E. Rectified linear units improve restricted Boltzmann machines. In ICML, 2010.
Nøkland, A. Improving back-propagation by adding an adversarial gradient. arXiv preprint arXiv:1510.04189, 2015.
Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z. B., and Swami, A. The limitations of deep learning in adversarial settings. In IEEE European Symposium on Security and Privacy, 2016a.
Papernot, N., McDaniel, P., Wu, X., Jha, S., and Swami, A. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy, 2016b.
Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z. B., and Swami, A. Practical black-box attacks against machine learning. In Asia CCS, 2017.
Parzen, E. On estimation of a probability density function and mode. The Annals of Mathematical Statistics, 33(3):1065–1076, 1962.
Rakin, A. S., Yi, J., Gong, B., and Fan, D. Defend deep neural networks against adversarial examples via fixed and dynamic quantized activation functions. arXiv preprint arXiv:1807.06714, 2018.
Ross, A. S. and Doshi-Velez, F. Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. In AAAI, 2018.
Samangouei, P., Kabkab, M., and Chellappa, R. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. In ICLR, 2018.
Sinha, A., Namkoong, H., and Duchi, J. Certifiable distributional robustness with principled adversarial training. In ICLR, 2018.
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
Tramer, F., Kurakin, A., Papernot, N., Goodfellow, I., Boneh, D., and McDaniel, P. Ensemble adversarial training: Attacks and defenses. In ICLR, 2018.
Xu, W., Evans, D., and Qi, Y. Feature squeezing: Detecting adversarial examples in deep neural networks. arXiv preprint arXiv:1704.01155, 2017.
Yan, Z., Guo, Y., and Zhang, C. Deep Defense: Training DNNs with improved adversarial robustness. In NeurIPS, 2018.
Zagoruyko, S. and Komodakis, N. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016.
Zou, D., Cao, Y., Zhou, D., and Gu, Q. Stochastic gradient descent optimizes over-parameterized deep ReLU networks. arXiv preprint arXiv:1811.08888, 2018.