PatchGuard: A Provably Robust Defense against Adversarial Patches via Small Receptive Fields and Masking

Chong Xiang†, Arjun Nitin Bhagoji‡, Vikash Sehwag†, Prateek Mittal†
†Princeton University ‡University of Chicago

USENIX Security Symposium 2021



Adversarial Example Attacks: Small Perturbations for Test-Time Model Misclassification

Normal example (x): classified as Dog (y). Adversarial example (x + δ): classified as Cat (y′).

Add imperceptible perturbations δ by solving:

max_δ L(M(x + δ), y)

where L(·) is the loss function and M(·) is the model.

A threat to ML models! Challenge: requires global perturbations.

Our Focus: Localized Adversarial Patch Attacks

Examples: Adversarial Patch [1], LaVAN Attack [2]. A REAL-WORLD threat:

1. All perturbations are within one local region (the patch)
2. Patch pixels can take arbitrary values
3. Realizable in the physical world: print and attach the patch!
4. The patch can be anywhere on the image
5. The patch size should be reasonable (it shouldn't block the entire salient object)

[1] Brown et al., "Adversarial Patch," NeurIPS Workshops 2017
[2] Karmon et al., "LaVAN: Localized and Visible Adversarial Noise," ICML 2018

Defense Objective: Provable Robustness on Certified Test Images

Given a test image, its ground-truth label y, and a threat model (patch sizes, shapes, and location set), the provable analysis outputs either "I think it is a dog" together with "My prediction will never change" (the prediction is always correct; the image is certified!), or "I can't say anything for sure."

Provable robust accuracy / certified accuracy: the fraction of test images that are
1. Correctly classified
2. Provably robust to any (adaptive) localized patch attack within the threat model

Our Contribution: PatchGuard Defense Framework with Provable Robustness

PatchGuard aims to prevent the localized patch from dominating the global prediction:

• Small receptive field: bound the number of corrupted features
• Secure feature aggregation: do robust prediction on partially corrupted features

Receptive Field: the Region of the Input Image that an Extracted Feature Is Looking at

Each entry of the local feature map is computed only from the pixels inside its receptive field. [Figure: a local feature map extracted entry by entry as the receptive field slides over the input image.]

Aggregate Local Features for Global Prediction

The local feature map is aggregated into a global feature, which yields the global prediction / logits: Dog!

Key Insight: the Receptive Field Size Determines the Number of Features Corrupted by the Adversarial Patch

Example 1: CNN with large receptive fields (e.g., ResNet with 483 × 483 px). Every feature-map entry has a receptive field that overlaps the patch. [Figure: feature-map animation omitted.]

Note: all features corrupted! Little hope for us to do a robust prediction.

Example 2: CNN with small receptive fields (e.g., BagNet with 17 × 17 px). Only the few feature-map entries whose small receptive fields overlap the patch are affected. [Figure: feature-map animation omitted.]

Note: only one feature corrupted! A major step towards robust prediction!

Key Insight: the Small Receptive Field Bounds the Number of Features Corrupted by the Adversarial Patch

The number of corrupted features k (along one axis) satisfies:

k = ⌈(p + r − 1) / s⌉

where p is the patch size, r the receptive field size, and s the receptive field stride (more details are in the paper).

A smaller receptive field gives fewer corrupted features!
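The bound above is easy to compute directly. A minimal sketch in Python; the stride of 8 used for both comparisons below is an illustrative assumption, not a value from the slide:

```python
import math

def num_corrupted_features(p: int, r: int, s: int) -> int:
    """Number of feature-map entries (along one axis) whose receptive
    field can overlap a p-pixel patch: k = ceil((p + r - 1) / s)."""
    return math.ceil((p + r - 1) / s)

# A 32 px patch against BagNet's small 17 px receptive field...
print(num_corrupted_features(p=32, r=17, s=8))   # 6
# ...versus a ResNet-like 483 px receptive field.
print(num_corrupted_features(p=32, r=483, s=8))  # 65
```

The same patch corrupts an order of magnitude fewer features when the receptive field is small, which is exactly why the framework starts from a small-receptive-field backbone.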


Vulnerability of Insecure Feature Aggregation

[Figure: local feature map with one extremely large corrupted entry; aggregation yields global logits of 35 for cat vs. 27 for dog: Cat!]

Extremely large malicious values dominate insecure feature aggregation and the global prediction. We need secure feature aggregation to limit the adversarial effect:
• Robust masking to detect and remove large values
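Why a single corrupted feature breaks insecure aggregation: under plain summation, one extremely large injected value outweighs all the benign evidence. A toy sketch with made-up per-location class evidence (the values are illustrative, not from the slide):

```python
# Benign local evidence across six feature locations; the patch
# controls only the last location but injects a huge value (30).
cat_evidence = [0, 1, 0, 3, 2, 30]  # 30 is the malicious entry
dog_evidence = [1, 2, 2, 7, 6, 5]

# Insecure aggregation: plain summation of local evidence.
print(sum(cat_evidence), sum(dog_evidence))  # 36 23 -> "cat" wins
```

Even though five of six locations favor "dog", the one corrupted location flips the global prediction, motivating the robust masking defense that follows.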

Leveraging Local Logits for Robust Masking

Local logits: make a local prediction based on each local feature. Mapping every entry of the local feature map to local logits produces a local prediction / logits map. [Figure: animation building the local logits map entry by entry omitted.]

Aggregating the local logits gives the same global logits prediction: 35 for cat vs. 27 for dog: Cat!

A Better Visualization: Local Logits Map Slice

• One local logits map slice for one class
• Class evidence: the elements of each slice

Local logits map slice for cat (Cat: 35):
30  0  0
 1  0  1
 2  0  1

Local logits map slice for dog (Dog: 27):
 0  2  2
 0  7  6
 1  5  4

Robust Masking: Algorithm

1. Clip all negative values to zero
2. Move a sliding window over each local logits slice (a 1 × 1 window here)
3. Calculate the class evidence sum within each window
4. Mask the window with the highest sum

Local logits map slice for cat (Cat: 35):
30  0  0
 1  0  1
 2  0  1

Local logits map slice for dog (Dog: 27):
 0  2  2
 0  7  6
 1  5  4
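The four steps can be sketched in a few lines. A minimal illustration for one local-logits slice with the 1 × 1 masking window from the slides; this is a didactic sketch, not the paper's full implementation:

```python
def robust_masking_evidence(slice_2d):
    """Class evidence after robust masking of one local-logits slice."""
    # 1. Clip all negative values to zero.
    s = [[max(v, 0) for v in row] for row in slice_2d]
    values = [v for row in s for v in row]
    # 2-3. Slide a 1x1 window; each window's evidence sum is just the
    #      element itself, so the highest-sum window is the max element.
    best = max(values)
    # 4. Mask that window, then sum the remaining class evidence.
    return sum(values) - best

# The slices from the slide (adversarial setting):
cat_slice = [[30, 0, 0], [1, 0, 1], [2, 0, 1]]  # total evidence 35
dog_slice = [[0, 2, 2], [0, 7, 6], [1, 5, 4]]   # total evidence 27
print(robust_masking_evidence(cat_slice))  # 5  (the suspicious 30 is masked)
print(robust_masking_evidence(dog_slice))  # 20 (the benign 7 is masked)
```

After masking, the injected 30 no longer dominates: dog (20) beats cat (5), matching the next slide.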

Robust Masking: Prediction in the Adversarial Setting

Applying the algorithm to the adversarial slices masks the highest-sum 1 × 1 window in each slice (the 30 in the cat slice; the 7 in the dog slice):

Local logits map slice for cat after masking: Cat: 5
Local logits map slice for dog after masking: Dog: 20

The prediction in the adversarial setting is subject to partial feature masking.

Robust Masking: Prediction in the Clean Setting

In the clean setting the cat slice is [0 0 0; 1 0 1; 2 0 1] (Cat: 5) and the dog slice is [1 2 2; 0 7 6; 1 5 4] (Dog: 28). Masking the highest-sum window in each slice gives Cat: 3 and Dog: 21; the model still predicts dog.

The prediction in the clean setting is generally invariant to partial feature masking.


Recall: Provable Robustness on Certified Test Images

Provable robust accuracy / certified accuracy: the fraction of test images that are
1. Correctly classified
2. Provably robust to any (adaptive) localized patch attack within the threat model

Provable Analysis: Bounds on Class Evidence

The adversary can control the values within a small window of each slice (a 1 × 1 window here, marked "?"):

Local logits map slice for cat:
?  0  0
1  0  1
2  0  1

Local logits map slice for dog:
?  2  2
0  7  6
1  5  4

1. The adversary cannot increase the malicious class evidence too much:
• A large value will be masked
• Robust masking imposes an upper bound on the class evidence sum
2. The adversary cannot decrease the benign class evidence too much:
• The patch can only push malicious values to zero
• Clipping all negative values imposes a lower bound on the class evidence sum

We can derive bounds that apply to any attack strategy! (formal proof in the paper)

Provable Analysis: Example

For the slices above, the bounds are:

Class | Lower Bound | Upper Bound
Cat   | 3           | 5
Dog   | 20          | 27

• 20 (lower bound of dog) > 5 (upper bound of cat): provably robust (always predicts dog)!
• Try all possible patch locations in the threat model; if the check holds everywhere, this image is certified :)
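The certification decision reduces to a simple comparison of bounds. A sketch, assuming we already have per-class evidence bounds (after robust masking) that hold for every patch location in the threat model; the function and dictionary names are illustrative, not the paper's API:

```python
def is_certified(bounds, true_class):
    """Certified iff the true class's evidence lower bound beats every
    other class's upper bound, for bounds valid at all patch locations."""
    true_lower = bounds[true_class][0]
    return all(true_lower > upper
               for cls, (lower, upper) in bounds.items()
               if cls != true_class)

# The slide's example: Cat evidence in [3, 5], Dog evidence in [20, 27].
print(is_certified({"cat": (3, 5), "dog": (20, 27)}, "dog"))  # True
```

Since no attack within the threat model can push cat above 5 or dog below 20, the prediction provably stays "dog".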

Evaluation: Substantial Provable Robustness

1. PatchGuard achieves substantial provable robustness (robustness evaluated against a 2%-pixel square patch anywhere on the image). Top-5 accuracy for ImageNet is good!

Accuracy         | 10-class ImageNette | 1000-class ImageNet
                 | Clean  | Robust     | Clean  | Robust
PatchGuard       | 95.0%  | 86.7%      | 54.6%  | 26.0%
PatchGuard-Top-5 | --     | --         | 76.6%  | 56.9%

Evaluation: State-of-the-art Clean Accuracy and Provable Robust Accuracy

2. IBP is too computationally expensive to scale to high-resolution images
3. PatchGuard significantly outperforms CBN and DS
• Improvement over CBN on ImageNet: 5% clean accuracy; 19% provable robust accuracy (2x better!)
• Improvement over DS on ImageNet: 10% clean accuracy; 12% provable robust accuracy (1x better!)

Accuracy   | 10-class ImageNette | 1000-class ImageNet
           | Clean  | Robust     | Clean  | Robust
PatchGuard | 95.0%  | 86.7%      | 54.6%  | 26.0%
IBP [1]    | Computationally infeasible
CBN [2]    | 94.9%  | 60.9%      | 49.5%  | 7.1%
DS [3]     | 92.1%  | 79.1%      | 44.4%  | 14.4%

[1] Chiang et al., "Certified Defenses for Adversarial Patches," ICLR 2020
[2] Zhang et al., "Clipped BagNet: Defending against Sticker Attacks with Clipped Bag-of-Features," DLS Workshop 2020
[3] Levine et al., "(De)randomized Smoothing for Certifiable Defense against Patch Attacks," NeurIPS 2020

Discussion: Generalizability of PatchGuard

PatchGuard as a general defense framework:

Provably Robust Defense         | Small receptive field    | Secure feature aggregation
PatchGuard (ours)               | BagNet                   | Robust masking
Clipped BagNet (CBN) [1]        | BagNet                   | Clipping + average pooling
Derandomized Smoothing (DS) [2] | Pixel patches to ResNet  | Majority voting
BagCert [3]                     | Modified BagNet          | Majority voting
Randomized Cropping [4]         | Cropped images to ResNet | Majority voting

[1] Zhang et al., "Clipped BagNet: Defending against Sticker Attacks with Clipped Bag-of-Features," DLS Workshop 2020
[2] Levine et al., "(De)randomized Smoothing for Certifiable Defense against Patch Attacks," NeurIPS 2020
[3] Metzen et al., "Efficient Certified Defenses against Patch Attacks on Image Classifiers," ICLR 2021
[4] Lin et al., "Certified Robustness against Physically-Realizable Patch Attack via Randomized Cropping," ICLR OpenReview 2021

Discussion: Limitations

1. The small receptive field hurts clean accuracy (a provable robustness vs. clean accuracy trade-off). The accuracy drop is especially obvious on ImageNet (from 76.1% to 56.5%).
2. The masking operation requires additional parameters (e.g., number of masks, mask size, mask shape).

Accuracy              | 10-class ImageNette | 1000-class ImageNet
                      | Clean  | Robust     | Clean  | Robust
ResNet-50 (483 × 483) | 99.6%  | --         | 76.1%  | --
BagNet-17 (17 × 17)   | 95.9%  | --         | 56.5%  | --
PatchGuard            | 95.0%  | 86.7%      | 54.6%  | 26.0%
PatchGuard-Top-5      | --     | --         | 76.6%  | 56.9%

Takeaways

1. PatchGuard: a general defense framework
• Small receptive field
• Secure feature aggregation
2. Provably robust defense
• Predictions are always correct on certified images
3. State-of-the-art defense performance
• Clean accuracy
• Provable robust accuracy

Thank you!

Chong Xiang, Princeton University: [email protected]
Arjun Nitin Bhagoji, University of Chicago: [email protected]
Vikash Sehwag, Princeton University: [email protected]
Prateek Mittal, Princeton University: [email protected]

Technical Report / GitHub