
Universal adversarial perturbations

Seyed-Mohsen Moosavi-Dezfooli∗†

[email protected]

Alhussein Fawzi∗†

[email protected]

Omar Fawzi‡

[email protected]

Pascal Frossard†

[email protected]

Abstract

Given a state-of-the-art deep neural network classifier, we show the existence of a universal (image-agnostic) and very small perturbation vector that causes natural images to be misclassified with high probability. We propose a systematic algorithm for computing universal perturbations, and show that state-of-the-art deep neural networks are highly vulnerable to such perturbations, albeit being quasi-imperceptible to the human eye. We further empirically analyze these universal perturbations and show, in particular, that they generalize very well across neural networks. The surprising existence of universal perturbations reveals important geometric correlations among the high-dimensional decision boundary of classifiers. It further outlines potential security breaches with the existence of single directions in the input space that adversaries can possibly exploit to break a classifier on most natural images.

1. Introduction

Can we find a single small image perturbation that fools a state-of-the-art deep neural network classifier on all natural images? We show in this paper the existence of such quasi-imperceptible universal perturbation vectors that cause natural images to be misclassified with high probability. Specifically, by adding such a quasi-imperceptible perturbation to natural images, the label estimated by the deep neural network is changed with high probability (see Fig. 1). Such perturbations are dubbed universal, as they are image-agnostic. The existence of these perturbations is problematic when the classifier is deployed in real-world (and possibly hostile) environments, as such a single perturbation can be exploited by adversaries to break the classifier. Indeed, the perturbation process involves the mere addition of one very small perturbation to all natural images, and can be relatively straightforward for adversaries to implement in real-world environments, while being relatively difficult to detect, as such perturbations are very small and thus do not significantly affect data distributions.

∗The first two authors contributed equally to this work.
†École Polytechnique Fédérale de Lausanne, Switzerland
‡ENS de Lyon, LIP, UMR 5668 ENS Lyon - CNRS - UCBL - INRIA, Université de Lyon, France

Figure 1: When added to a natural image, a universal perturbation image causes the image to be misclassified by the deep neural network with high probability. Left images: original natural images, with estimated labels Joystick, Whiptail lizard, Balloon, Lycaenid, Tibetan mastiff, Thresher, Grille, Flagpole, and Face powder. Central image: universal perturbation. Right images: perturbed images, with estimated labels Labrador, Chihuahua, Chihuahua, Jay, Labrador, Labrador, Tibetan mastiff, Brabancon griffon, and Border terrier.


The surprising existence of universal perturbations further reveals new insights on the topology of the decision boundaries of deep neural networks. We summarize the main contributions of this paper as follows:

• We show the existence of universal image-agnostic perturbations for state-of-the-art deep neural networks.

• We propose an efficient algorithm for finding such perturbations. The algorithm seeks a universal perturbation for a set of training points, and proceeds by aggregating atomic perturbation vectors that send successive datapoints to the decision boundary of the classifier.

• We show that universal perturbations have a remarkable generalization property, as perturbations computed for a rather small set of training points fool new images with high probability.

• We show that such perturbations are not only universal across images, but also generalize well across deep neural networks. Such perturbations are therefore doubly universal, both with respect to the data and the network architectures.

• We explain and analyze the high vulnerability of deep neural networks to universal perturbations by examining the geometric correlation between different parts of the decision boundary.

The robustness of image classifiers to structured and unstructured perturbations has recently attracted a lot of attention [19, 16, 20, 3, 5, 13, 14, 4]. Despite the impressive performance of deep neural network architectures on challenging visual classification benchmarks [7, 10, 21, 11], these classifiers were shown to be highly vulnerable to perturbations. In [19], such networks are shown to be unstable to very small and often imperceptible additive adversarial perturbations. Such carefully crafted perturbations are either estimated by solving an optimization problem [19, 12, 1] or through one step of gradient ascent [6], and result in a perturbation that fools a specific data point. A fundamental property of these adversarial perturbations is their intrinsic dependence on datapoints: the perturbations are specifically crafted for each data point independently. As a result, the computation of an adversarial perturbation for a new data point requires solving a data-dependent optimization problem from scratch, which uses the full knowledge of the classification model. This is different from the universal perturbation considered in this paper, as we seek a single perturbation vector that fools the network on most natural images. Perturbing a new datapoint then only involves the mere addition of the universal perturbation to the image (and does not require solving an optimization problem or computing gradients). We emphasize that our notion of universal perturbation differs from the generalization of adversarial perturbations studied in [19], where perturbations computed on the MNIST task were shown to generalize well across different neural network architectures. Instead, we examine the existence of universal perturbations that are common to most data points belonging to the data distribution.

2. Universal perturbations

We formalize in this section the notion of universal perturbations, and propose a method for estimating such perturbations. Let µ denote a distribution of images in R^d, and let k define a classification function that outputs for each image x ∈ R^d an estimated label k(x). The main focus of this paper is to seek perturbation vectors v ∈ R^d that fool the classifier k on almost all datapoints sampled from µ. That is, we seek a vector v such that

k(x + v) ≠ k(x) for “most” x ∼ µ.

We coin such a perturbation universal, as it represents a fixed image-agnostic perturbation that causes label change for most images sampled from the data distribution µ. We focus here on the case where the distribution µ represents the set of natural images, hence containing a huge amount of variability. In that context, we examine the existence of very small universal perturbations (in terms of the ℓp norm with p ∈ [1, ∞)) that misclassify most images. The goal is therefore to find v that satisfies the following two constraints:

1. ‖v‖p ≤ ξ,

2. P_{x∼µ}( k(x + v) ≠ k(x) ) ≥ 1 − δ.

The parameter ξ controls the magnitude of the perturbation vector v, and δ quantifies the desired fooling rate for all images sampled from the distribution µ.

Algorithm. Let X = {x1, . . . , xm} be a set of images sampled from the distribution µ. Our proposed algorithm seeks a universal perturbation v, such that ‖v‖p ≤ ξ, while fooling most data points in X. The proposed algorithm proceeds iteratively over the data points in X and gradually builds the universal perturbation, as illustrated in Fig. 2. At each iteration, the minimal perturbation ∆vi that sends the current perturbed point, xi + v, to the decision boundary of the classifier is computed, and aggregated to the current instance of the universal perturbation. In more detail, provided the current universal perturbation v does not fool data point xi, we seek the extra perturbation ∆vi with minimal norm that fools data point xi, by solving the following optimization problem:

∆vi ← arg min_r ‖r‖2  s.t.  k(xi + v + r) ≠ k(xi).   (1)
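In practice, problem (1) is solved only approximately (we use a DeepFool-style solver [12], as discussed below). As a minimal illustration of the core step of such a solver, the following sketch shows the one-step linearization for a binary classifier; `f` and `grad_f` are assumed callables, and this is our simplified illustration, not the authors' implementation:

```python
import numpy as np

def linearized_minimal_perturbation(x, f, grad_f, overshoot=0.02):
    """One linearization step toward the decision boundary of a *binary*
    classifier with decision function f (label = sign of f).

    Under a local linear model f(x + r) ~ f(x) + <grad_f(x), r>, the
    smallest perturbation reaching the boundary f = 0 is
        r = -f(x) * grad_f(x) / ||grad_f(x)||_2^2.
    DeepFool-style solvers iterate such steps over the closest competing
    classes; this sketch only shows the binary, single-step core.
    """
    g = grad_f(x)
    r = -f(x) * g / (np.linalg.norm(g.ravel()) ** 2)
    return (1 + overshoot) * r  # small overshoot to actually cross the boundary
```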

To ensure that the constraint ‖v‖p ≤ ξ is satisfied, the updated universal perturbation is further projected onto the ℓp ball of radius ξ centered at 0. That is, let Pp,ξ be the projection operator defined as follows:

Pp,ξ(x0) = arg min_x ‖x − x0‖2  subject to  ‖x‖p ≤ ξ.

Then, our update rule is given by:

v ← Pp,ξ(v + ∆vi).
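For the two norms used in the experiments below (p = 2 and p = ∞), this projection admits a simple closed form. The following is a minimal NumPy sketch of the operator (our illustration, not the authors' code; the function name is ours):

```python
import numpy as np

def project_lp_ball(v, xi, p):
    """Project v onto the l_p ball of radius xi centered at 0.

    Closed forms for the two cases used in the paper:
      p = 2:    rescale v if its l2 norm exceeds xi;
      p = inf:  clip every coordinate to [-xi, xi].
    """
    if p == 2:
        norm = np.linalg.norm(v.ravel())
        return v if norm <= xi else v * (xi / norm)
    if p == np.inf:
        return np.clip(v, -xi, xi)
    raise ValueError("only p = 2 and p = inf are handled in this sketch")

# Example: a random perturbation projected onto the l_inf ball of radius 10.
rng = np.random.default_rng(0)
v = rng.normal(scale=30.0, size=(224, 224, 3))
assert np.abs(project_lp_ball(v, xi=10, p=np.inf)).max() <= 10
```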

Several passes over the data set X are performed to improve the quality of the universal perturbation. The algorithm is terminated when the empirical “fooling rate” on the perturbed data set Xv := {x1 + v, . . . , xm + v} exceeds the target threshold 1 − δ. That is, we stop the algorithm whenever

$$\mathrm{Err}(X_v) := \frac{1}{m}\sum_{i=1}^{m} \mathbb{1}_{\,k(x_i+v)\neq k(x_i)} \;\geq\; 1-\delta.$$

The detailed algorithm is provided in Algorithm 1. Note that, in practice, the number of data points m in X need not be large to compute a universal perturbation that is valid for the whole distribution µ. In particular, we can set m to be much smaller than the number of training points (see Section 3).

The proposed algorithm involves solving at most m instances of the optimization problem in Eq. (1) at each pass. While this optimization problem is not convex when k is a standard classifier (e.g., a deep neural network), several efficient approximate methods have been devised for solving this problem [19, 12, 8]. We use in the following the approach in [12] for its efficiency. It should further be noticed that the objective of Algorithm 1 is not to find the smallest universal perturbation that fools most data points sampled from the distribution, but rather to find one such perturbation with sufficiently small norm. In particular, different random shufflings of the set X naturally lead to a diverse set of universal perturbations v satisfying the required constraints. The proposed algorithm can therefore be leveraged to generate multiple universal perturbations for a deep neural network (see next section for visual examples).
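As a sketch of how such a diverse set could be generated (assuming any Algorithm-1-style routine, such as the one shown after the pseudocode below; all names are illustrative):

```python
import numpy as np

def diverse_universal_perturbations(X, compute_universal_perturbation,
                                    n_perturbations=5, seed=0):
    """Run the (order-dependent) universal-perturbation algorithm on
    several random shufflings of X to obtain distinct perturbations.

    X: array of shape (m, ...) of sampled images.
    compute_universal_perturbation: X -> v, any Algorithm-1-style routine.
    """
    rng = np.random.default_rng(seed)
    perturbations = []
    for _ in range(n_perturbations):
        order = rng.permutation(len(X))  # a fresh shuffling of the data set
        perturbations.append(compute_universal_perturbation(X[order]))
    return perturbations
```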

3. Universal perturbations for deep nets

We now analyze the robustness of state-of-the-art deep neural network classifiers to universal perturbations using Algorithm 1.


Figure 2: Schematic representation of the proposed algorithm used to compute universal perturbations. In this illustration, data points x1, x2 and x3 are super-imposed, and the classification regions Ri are shown in different colors. Our algorithm proceeds by sequentially aggregating the minimal perturbations sending the current perturbed points xi + v outside of the corresponding classification region Ri.

Algorithm 1 Computation of universal perturbations.

1: input: Data points X, classifier k, desired ℓp norm of the perturbation ξ, desired accuracy on perturbed samples δ.
2: output: Universal perturbation vector v.
3: Initialize v ← 0.
4: while Err(Xv) ≤ 1 − δ do
5:   for each datapoint xi ∈ X do
6:     if k(xi + v) = k(xi) then
7:       Compute the minimal perturbation that sends xi + v to the decision boundary:
           ∆vi ← arg min_r ‖r‖2  s.t.  k(xi + v + r) ≠ k(xi).
8:       Update the perturbation:
           v ← Pp,ξ(v + ∆vi).
9:     end if
10:  end for
11: end while
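To make the loop concrete, here is a minimal NumPy rendering of Algorithm 1 for p = ∞ (a sketch, not the authors' implementation: `classify` is any function returning the network's estimated label, and `minimal_perturbation` stands in for the approximate solver of Eq. (1)):

```python
import numpy as np

def universal_perturbation(X, classify, minimal_perturbation,
                           xi=10.0, delta=0.2, max_passes=10):
    """Sketch of Algorithm 1 (p = inf): accumulate per-sample minimal
    perturbations and project onto the l_inf ball of radius xi.

    X: array of shape (m, ...) of sampled images.
    classify: image -> estimated label k(x).
    minimal_perturbation: (image, classify) -> smallest r such that
        classify(image + r) != classify(image), cf. Eq. (1).
    """
    labels = [classify(x) for x in X]  # labels of the clean data points
    v = np.zeros_like(X[0], dtype=np.float64)

    def err(v):  # empirical fooling rate Err(X_v)
        return np.mean([classify(x + v) != y for x, y in zip(X, labels)])

    for _ in range(max_passes):
        if err(v) > 1 - delta:         # stop once Err(X_v) exceeds 1 - delta
            break
        for x, y in zip(X, labels):
            if classify(x + v) == y:   # v does not fool this point yet
                dv = minimal_perturbation(x + v, classify)
                v = np.clip(v + dv, -xi, xi)  # projection P_{inf, xi}
    return v
```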

            CaffeNet [9]  VGG-F [2]  VGG-16 [17]  VGG-19 [17]  GoogLeNet [18]  ResNet-152 [7]
ℓ2    X     85.4%         85.9%      90.7%        86.9%        82.9%           89.7%
      Val.  85.6%         87.0%      90.3%        84.5%        82.0%           88.5%
ℓ∞    X     93.1%         93.8%      78.5%        77.8%        80.8%           85.4%
      Val.  93.3%         93.7%      78.3%        77.8%        78.9%           84.0%

Table 1: Fooling ratios on the set X, and the validation set.

In a first experiment, we assess the estimated universal perturbations for different recent deep neural networks on the ILSVRC 2012 [15] validation set (50'000 images), and report the fooling ratio, that is, the proportion of images that change labels when perturbed by our universal perturbation. Results are reported for p = 2 and p = ∞, where we respectively set ξ = 2'000 and ξ = 10. These numerical values were chosen in order to obtain a perturbation whose norm is significantly smaller than the image norms, such that the perturbation is quasi-imperceptible when added to natural images. Results are listed in Table 1. Each result is reported on the set X, which is used to compute the perturbation, as well as on the validation set (which is not used in the computation of the universal perturbation). Observe that for all networks, the universal perturbation achieves very high fooling rates on the validation set. Specifically, the universal perturbations computed for CaffeNet and VGG-F fool more than 90% of the validation set (when p = ∞). In other words, for any natural image in the validation set, the mere addition of our universal perturbation fools the classifier more than 9 times out of 10. This result is moreover not specific to these architectures, as we can also find universal perturbations that cause the VGG, GoogLeNet and ResNet classifiers to be fooled on natural images with probability approaching 80%. These results have an element of surprise, as they show the existence of single universal perturbation vectors that cause natural images to be misclassified with high probability, albeit being quasi-imperceptible to humans. To verify this latter claim, we show visual examples of perturbed images in Fig. 3, where the GoogLeNet architecture is used. These images are either taken from the ILSVRC 2012 validation set (rows 1 and 2), or taken by a mobile phone camera (row 3). Observe that in most cases, the universal perturbation is quasi-imperceptible, yet this powerful image-agnostic perturbation is able to misclassify any image with high probability for state-of-the-art classifiers. We refer to Appendix A for the original (unperturbed) images, as well as their ground truth labels. We visualize the universal perturbations corresponding to different networks in Fig. 4. It should be noted that such universal perturbations are not unique, as many different universal perturbations (all satisfying the two required constraints) can be generated for the same network. In Fig. 5, we visualize five different universal perturbations obtained by using different random shufflings of the set X. Observe that such universal perturbations are different, although they exhibit a similar pattern. This is moreover confirmed by computing the normalized inner products between pairs of perturbation images: the normalized inner products do not exceed 0.1, which shows that one can find diverse universal perturbations.

While the above universal perturbations are computed for a set X of 10'000 images from the training set (i.e., on average 10 images per class), we now examine the influence of the size of X on the quality of the universal perturbation. We show in Fig. 6 the fooling rates obtained on the validation set for different sizes of X for GoogLeNet. Note for example that with a set X containing only 500 images, we can fool more than 30% of the images in the validation set. This result is significant when compared to the number of classes in ImageNet (1'000), as it shows that we can fool a large set of unseen images, even when using a set X containing less than one image per class! The universal perturbations computed using Algorithm 1 therefore have a remarkable generalization power over unseen data points, and can be computed on a very small set of training images.

Cross-model universality. While the computed perturbations are universal across unseen data points, we now examine their cross-model universality. That is, we study to which extent universal perturbations computed for a specific architecture (e.g., VGG-19) are also valid for another architecture (e.g., GoogLeNet). Table 2 displays a matrix summarizing the universality of such perturbations across six different architectures. For each architecture, we compute a universal perturbation and report the fooling ratios on all other architectures; we report these in the rows of Table 2. Observe that, for some architectures, the universal perturbations generalize very well across other architectures. For example, universal perturbations computed for the VGG-19 network have a fooling ratio above 53% for all other tested architectures. This result shows that our universal perturbations are, to some extent, doubly-universal, as they generalize well across data points and very different architectures. It should be noted that, in [19], adversarial perturbations were previously shown to generalize well, to some extent, across different neural networks on the MNIST problem. Our results are however different, as we show the generalizability of universal perturbations across different architectures on the ImageNet data set.

Figure 3: Examples of perturbed images and their corresponding labels: wool, Indian elephant, Indian elephant, African grey; tabby, African grey, common newt, carousel; grey fox, macaw, three-toed sloth, macaw. The first two rows of images belong to the ILSVRC 2012 validation set, and the last row contains random images taken by a mobile phone camera. We refer to the appendix for the original images.

This result shows that such perturbations are of practical relevance, as they generalize well across data points and architectures. In particular, in order to fool a new image on an unknown neural network, the mere addition of a universal perturbation computed on the VGG-19 architecture is likely to misclassify the data point.

Visualization of the effect of universal perturbations. To gain insights on the effect of universal perturbations on natural images, we now visualize the distribution of labels on the ImageNet validation set. Specifically, we build a directed graph G = (V, E), whose vertices denote the labels, and whose directed edges e = (i → j) indicate that the majority of images of class i are fooled into label j when applying the universal perturbation. The existence of an edge i → j therefore suggests that the preferred fooling label for images of class i is j. We construct this graph for GoogLeNet and, due to space constraints, visualize the full graph in Appendix A. The visualization of this graph shows a very peculiar topology. In particular, the graph is a union of disjoint components, where all edges in one component mostly connect to one target label. See Fig. 7 for an illustration of two different connected components. This visualization clearly shows the existence of several dominant labels, and that universal perturbations mostly make natural images classified with such labels. We hypothesize that these dominant labels occupy large regions in the image space, and therefore represent good candidate labels for fooling most natural images. Note that such dominant labels are automatically found by the algorithm to generate universal perturbations, and are not imposed a priori in the computation of perturbations.
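Constructing G from predictions is a simple counting exercise. A sketch of one way to do it (ours; `orig_labels` and `pert_labels` are assumed to hold the clean and perturbed predictions over the validation set):

```python
from collections import Counter, defaultdict

def fooling_graph(orig_labels, pert_labels):
    """Build the directed edges i -> j of the graph G = (V, E):
    j is the label that the majority of fooled images of class i are
    mapped to (ties broken arbitrarily by Counter)."""
    targets = defaultdict(Counter)
    for i, j in zip(orig_labels, pert_labels):
        if i != j:  # only fooled images define edges
            targets[i][j] += 1
    return {i: counts.most_common(1)[0][0] for i, counts in targets.items()}

# Toy example: class 0 is mostly fooled into class 2.
edges = fooling_graph([0, 0, 0, 1], [2, 2, 3, 1])
assert edges[0] == 2
```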

4. Explaining the vulnerability to universal perturbations

The goal of this section is to analyze and explain the high vulnerability of deep neural network classifiers to universal perturbations. To understand the unique characteristics of universal perturbations, we first compare such perturbations with other types of perturbations, namely i) random perturbation, ii) sum of adversarial perturbations over X, and iii) mean of the images (or ImageNet bias).

Figure 4: Universal perturbations computed for different deep neural network architectures: (a) CaffeNet, (b) VGG-F, (c) VGG-16, (d) VGG-19, (e) GoogLeNet, (f) ResNet-152. Images generated with p = ∞, ξ = 10. The pixel values are scaled for visibility.

Figure 5: Diversity of universal perturbations for the GoogLeNet architecture. The five perturbations are generated using different random shufflings of the set X. Note that the normalized inner product between any pair of universal perturbations does not exceed 0.1, which highlights the diversity of such perturbations.

For each perturbation, we depict a phase transition graph in Fig. 8 showing the fooling rate on the validation set with respect to the ℓ2 norm of the perturbation. Different perturbation norms are achieved by scaling each perturbation with a multiplicative factor to reach the target norm. Note that the universal perturbation is computed for ξ = 2'000, and is also scaled accordingly.

Observe that the proposed universal perturbation quickly reaches very high fooling rates, even when the perturbation is constrained to be of small norm. For example, the universal perturbation computed using Algorithm 1 achieves a fooling rate of 85% when the ℓ2 norm is constrained to ξ = 2'000, while other perturbations achieve much smaller ratios for comparable norms.

Table 2: Generalizability of the universal perturbations across different networks. The percentages indicate the fooling rates. The rows indicate the architecture for which the universal perturbation is computed, and the columns indicate the architecture for which the fooling rate is reported.

             VGG-F    CaffeNet  GoogLeNet  VGG-16   VGG-19   ResNet-152
VGG-F        93.7%    71.8%     48.4%      42.1%    42.1%    47.4%
CaffeNet     74.0%    93.3%     47.7%      39.9%    39.9%    48.0%
GoogLeNet    46.2%    43.8%     78.9%      39.2%    39.8%    45.5%
VGG-16       63.4%    55.8%     56.5%      78.3%    73.1%    63.4%
VGG-19       64.0%    57.2%     53.6%      73.5%    77.8%    58.0%
ResNet-152   46.3%    46.3%     50.5%      47.0%    45.5%    84.0%


Figure 6: Fooling ratio on the validation set versus the cardinality of X. Note that even when the universal perturbation is computed on a very small set X (compared to the training and validation sets), the fooling ratio on the validation set is large.

In particular, random vectors sampled uniformly from the sphere of radius 2'000 only fool 10% of the validation set. The large difference between universal and random perturbations suggests that the universal perturbation exploits some geometric correlations between different parts of the decision boundary of the classifier. In fact, if the orientations of the decision boundary in the neighborhood of different data points were completely uncorrelated (and independent of the distance to the decision boundary), the norm of the best universal perturbation would be comparable to that of a random perturbation. Note that the latter quantity is well understood (see [5]), as the norm of the random perturbation required to fool a specific data point precisely behaves as Θ(√d ‖r‖2), where d is the dimension of the input space, and ‖r‖2 is the distance between the data point and the decision boundary (or, equivalently, the norm of the smallest adversarial perturbation). For the considered ImageNet classification task, this quantity is equal to √d ‖r‖2 ≈ 2 × 10^4 for most data points, which is at least one order of magnitude larger than the universal perturbation (ξ = 2'000). This substantial difference between random and universal perturbations thereby suggests redundancies in the geometry of the decision boundaries that we now explore.
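As a quick numeric check of this order-of-magnitude claim (a sketch assuming 224 × 224 × 3 inputs; the value of ‖r‖2 below is a hypothetical typical boundary distance, chosen to reproduce the ≈ 2 × 10^4 figure quoted above):

```python
import numpy as np

d = 224 * 224 * 3  # input dimension for the considered networks (assumed)
r_norm = 50.0      # typical distance to the decision boundary (hypothetical)

# Norm a *random* perturbation needs to fool one point: Theta(sqrt(d) * ||r||_2).
random_budget = np.sqrt(d) * r_norm
print(f"sqrt(d) * ||r||_2 ~ {random_budget:.0f}")  # ~ 19,400, i.e. ~ 2 x 10^4

# Versus the universal perturbation budget used in the experiments:
xi = 2000.0
print(f"ratio: {random_budget / xi:.1f}x larger")  # about an order of magnitude
```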

For each image x in the validation set, we compute the adversarial perturbation vector r(x) = arg min_r ‖r‖2 s.t. k(x + r) ≠ k(x). It is easy to see that r(x) is normal to the decision boundary of the classifier (at x + r(x)). The vector r(x) hence captures the local geometry of the decision boundary in the region surrounding the data point x. To quantify the correlation between different regions of the decision boundary of the classifier, we define the matrix

$$N = \left[ \frac{r(x_1)}{\|r(x_1)\|_2} \;\cdots\; \frac{r(x_n)}{\|r(x_n)\|_2} \right]$$

of normal vectors to the decision boundaries in the vicinity of data points in the validation set. For binary linear classifiers, the decision boundary is a hyperplane, and N is of rank 1, as all normal vectors are collinear. To capture more generally the correlations in the decision boundary of complex classifiers, we compute the singular values of the matrix N. The singular values of the matrix N, computed for the CaffeNet architecture, are shown in Fig. 9. We further show in the same figure the singular values obtained when the columns of N are sampled uniformly at random from the unit sphere. Observe that, while the latter singular values have a slow decay, the singular values of N decay quickly, which confirms the existence of large correlations and redundancies in the decision boundary of deep networks. More precisely, this shows the existence of a subspace S of low dimension d′ (with d′ ≪ d) that contains most normal vectors to the decision boundary in regions surrounding natural images. We hypothesize that the existence of universal perturbations fooling most natural images is partly due to the existence of such a low-dimensional subspace that captures the correlations among different regions of the decision boundary.
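In code, this analysis boils down to normalizing the perturbation vectors and taking one SVD; a sketch (assuming a matrix `R` whose rows are the flattened adversarial perturbations r(x_i), obtained from any minimal-perturbation solver):

```python
import numpy as np

def normal_matrix_singular_values(R):
    """R: array of shape (n, d), row i holding the flattened adversarial
    perturbation r(x_i), which is normal to the decision boundary near x_i.
    Returns the singular values of the d x n matrix
    N = [r(x_1)/||r(x_1)||_2, ..., r(x_n)/||r(x_n)||_2]."""
    N = (R / np.linalg.norm(R, axis=1, keepdims=True)).T  # shape (d, n)
    return np.linalg.svd(N, compute_uv=False)

# Baseline: columns drawn uniformly from the unit sphere decay slowly.
rng = np.random.default_rng(0)
d, n = 1000, 500  # toy sizes; the paper uses d ~ 1.5e5 and the validation set
R_rand = rng.normal(size=(n, d))
print(normal_matrix_singular_values(R_rand)[:5])
```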

Figure 7: Two connected components of the graph G = (V, E), where the vertices are the set of labels, and directed edges i → j indicate that most images of class i are fooled into class j. Labels in the two components shown include: great grey owl, platypus, nematode, dowitcher, Arctic fox, leopard, digital clock, fountain, slide rule, space shuttle (first component); and cash machine, pillow, computer keyboard, dining table, envelope, medicine chest, microwave, mosquito net, pencil box, plate rack, quilt, refrigerator, television, tray, wardrobe, window shade (second component).

Figure 8: Comparison between the fooling rates of different perturbations (Universal, Random, Sum, ImageNet bias), shown as a function of the ℓ2 norm of the perturbation.

In fact, this subspace “collects” normals to the decision boundary in different regions, and perturbations belonging to this subspace are therefore likely to fool datapoints. To verify this hypothesis, we choose a random vector of norm ξ = 2'000 belonging to the subspace S spanned by the first 100 singular vectors, and compute its fooling ratio on a different set of images (i.e., a set of images that has not been used to compute the SVD). Such a perturbation can fool nearly 38% of these images, thereby showing that a random direction in this subspace S significantly outperforms random perturbations (we recall that such perturbations can only fool 10% of the data). Fig. 10 illustrates the subspace S that captures the correlations in the decision boundary.
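This verification experiment can be sketched as follows (illustrative names: `U_s` is assumed to hold the first 100 left singular vectors of N as columns, and `classify` is a trained network's label function):

```python
import numpy as np

def random_direction_in_subspace(U_s, xi, rng):
    """Draw a random vector of l2 norm xi from the subspace S spanned
    by the columns of U_s (here: the first 100 singular vectors of N)."""
    v = U_s @ rng.normal(size=U_s.shape[1])  # random combination of the basis
    return v * (xi / np.linalg.norm(v))

def fooling_ratio(v, images, classify):
    """Fraction of held-out images whose estimated label changes under v."""
    return np.mean([classify(x + v.reshape(x.shape)) != classify(x)
                    for x in images])

# Usage sketch (U_s from an SVD of N; heldout_images not used for the SVD):
# rng = np.random.default_rng(0)
# v = random_direction_in_subspace(U_s, xi=2000.0, rng=rng)
# print(fooling_ratio(v, heldout_images, classify))
```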

Figure 9: Singular values of the matrix N containing normal vectors to the decision boundary, computed for the CaffeNet architecture, compared with the singular values obtained when the columns are sampled uniformly at random from the unit sphere.

It should further be noted that the existence of this low-dimensional subspace explains the surprising generalization properties of universal perturbations observed in Fig. 6, where one can build relatively generalizable universal perturbations with only 500 images (less than one image per class).

Unlike the above experiment, the proposed algorithm does not choose a random vector in this subspace, but rather chooses a specific direction in order to maximize the overall fooling rate. This explains the gap between the fooling rates obtained with the random vector strategy in S and with Algorithm 1, respectively.

Figure 10: Illustration of the low-dimensional subspace S containing normal vectors to the decision boundary in regions surrounding natural images. For the purpose of this illustration, we super-impose three data points {x_i}_{i=1}^3, and show the adversarial perturbations {r_i}_{i=1}^3 that send the respective datapoints to the decision boundaries {B_i}_{i=1}^3. Note that the r_i all live in the subspace S.

5. Conclusions

We showed the existence of small universal perturbations that can fool state-of-the-art classifiers on natural images. We proposed an iterative algorithm to generate universal perturbations, and highlighted several properties of such perturbations. In particular, we showed that universal perturbations generalize well across different classification models, resulting in doubly-universal perturbations (image-agnostic, network-agnostic). We further explained the existence of such perturbations with the correlation between different regions of the decision boundary. This provides insights on the geometry of the decision boundaries of deep neural networks, and contributes to a better understanding of such systems. A theoretical analysis of the geometric correlations between different parts of the decision boundary will be the subject of future research.

References

[1] O. Bastani, Y. Ioannou, L. Lampropoulos, D. Vytiniotis, A. Nori, and A. Criminisi. Measuring neural net robustness with constraints. In Neural Information Processing Systems (NIPS), 2016.

[2] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman. Return of the devil in the details: Delving deep into convolutional nets. In British Machine Vision Conference (BMVC), 2014.

[3] A. Fawzi, O. Fawzi, and P. Frossard. Analysis of classifiers' robustness to adversarial perturbations. CoRR, abs/1502.02590, 2015.

[4] A. Fawzi and P. Frossard. Manitest: Are classifiers really invariant? In British Machine Vision Conference (BMVC), pages 106.1–106.13, 2015.

[5] A. Fawzi, S. Moosavi-Dezfooli, and P. Frossard. Robustness of classifiers: from adversarial to random noise. In Neural Information Processing Systems (NIPS), 2016.

[6] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR), 2015.

[7] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

[8] R. Huang, B. Xu, D. Schuurmans, and C. Szepesvari. Learning with a strong adversary. CoRR, abs/1511.03034, 2015.

[9] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In ACM International Conference on Multimedia (MM), pages 675–678, 2014.

[10] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS), pages 1097–1105, 2012.

[11] Q. V. Le, W. Y. Zou, S. Y. Yeung, and A. Y. Ng. Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3361–3368, 2011.

[12] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard. DeepFool: a simple and accurate method to fool deep neural networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

[13] A. Nguyen, J. Yosinski, and J. Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 427–436, 2015.

[14] E. Rodner, M. Simon, R. Fisher, and J. Denzler. Fine-grained recognition in the noisy wild: Sensitivity analysis of convolutional neural networks approaches. In British Machine Vision Conference (BMVC), 2016.

[15] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. Berg, and L. Fei-Fei. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.

[16] S. Sabour, Y. Cao, F. Faghri, and D. J. Fleet. Adversarial manipulation of deep representations. In International Conference on Learning Representations (ICLR), 2016.

[17] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations (ICLR), 2014.

[18] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

[19] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), 2014.

[20] P. Tabacof and E. Valle. Exploring the space of adversarial images. In IEEE International Joint Conference on Neural Networks (IJCNN), 2016.

[21] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance in face verification. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1701–1708, 2014.

A. Appendix

Fig. 11 shows the original images corresponding to the experiment in Fig. 3. Fig. 12 visualizes the graph showing relations between original and perturbed labels (see Section 3 for more details).

Figure 11: Original images, with ground truth labels: Bouvier des Flandres, Christmas stocking, Scottish deerhound, ski mask; porcupine, killer whale, European fire salamander, toyshop; fox squirrel, pot, Arabian camel, coffeepot. The first two rows are randomly chosen images from the validation set, and the last row of images are personal images taken from a mobile phone camera.


strawberry

Figure 12: Graph representing the relation between original and perturbed labels. Note that “dominant labels” appearsystematically. Please zoom for readability. Isolated nodes are removed from this visualization for readability.