On Security and Sparsity of Linear Classifiers for Adversarial Settings


Pattern Recognition and Applications Lab
Department of Electrical and Electronic Engineering
University of Cagliari, Italy

Ambra Demontis, Paolo Russu, Battista Biggio, Giorgio Fumera, Fabio Roli
battista.biggio@diee.unica.it

S+SSPR, Mérida, Mexico, Dec. 1, 2016

http://pralab.diee.unica.it

Recent Applications of Machine Learning

• Consumer technologies for personal applications


iPhone 5s with Fingerprint Recognition…


… Cracked a Few Days After Its Release


EU FP7 Project: TABULA RASA


New Challenges for Machine Learning

• The use of machine learning opens up big new possibilities, but also new security risks

• Proliferation and sophistication of attacks and cyberthreats
  – Skilled / economically motivated attackers (e.g., ransomware)

• Several security systems use machine learning to detect attacks, but… is machine learning secure enough?


Classifier Evasion


Is Machine Learning Secure Enough?

• Problem: how to evade a linear (trained) classifier?

Original email (x): "Start 2007 with a bang! Make WBFS YOUR PORTFOLIO's first winner of the year..."

features:  start  bang  portfolio  winner  year  …  university  campus
x       =  [  1     1       1        1      1   …      0          0  ]
w       =  [ +2    +1      +1       +1     +1   …     −3         −4  ]

f(x) = sign(wᵀx): wᵀx = +6 > 0 → SPAM (correctly classified)

Manipulated email (x′): "St4rt 2007 with a b4ng! Make WBFS YOUR PORTFOLIO's first winner of the year... campus"

x′      =  [  0     0       1        1      1   …      0          1  ]

f(x′) = sign(wᵀx′): wᵀx′ = +3 − 4 < 0 → HAM (misclassified email)
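The arithmetic of this example can be checked directly. A minimal numpy sketch, keeping only the seven words visible on the slide (the elided "…" features are dropped):

```python
import numpy as np

# Feature weights of the trained linear spam filter, in the order shown:
# start, bang, portfolio, winner, year, university, campus
w = np.array([+2, +1, +1, +1, +1, -3, -4])

# Original spam: contains start, bang, portfolio, winner, year
x = np.array([1, 1, 1, 1, 1, 0, 0])

# Manipulated spam: "start"/"bang" obfuscated (St4rt, b4ng), "campus" added
x_adv = np.array([0, 0, 1, 1, 1, 0, 1])

print(int(w @ x))      # +6 > 0: classified as SPAM (correct)
print(int(w @ x_adv))  # +3 - 4 = -1 < 0: misclassified as HAM
```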


Evasion of Linear Classifiers

• Formalized as an optimization problem
  – Goal: to minimize the discriminant function
    • i.e., to be classified as legitimate with the maximum confidence
  – Constraints on input data manipulation
    • e.g., number of words to be modified in each spam email


$\min_{x'} \; w^\top x' \qquad \text{s.t.} \quad d(x, x') \le d_{\max}$


Dense and Sparse Evasion Attacks

• L2-norm noise corresponds to dense evasion attacks

– All features are modified by a small amount

• L1-norm noise corresponds to sparse evasion attacks

– Few features are significantly modified


Dense ($\ell_2$): $\min_{x'} \; w^\top x' \quad \text{s.t.} \quad \|x - x'\|_2^2 \le d_{\max}$

Sparse ($\ell_1$): $\min_{x'} \; w^\top x' \quad \text{s.t.} \quad \|x - x'\|_1 \le d_{\max}$
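For a linear classifier both attacks have a closed-form solution, which makes the dense/sparse distinction easy to see in code. A minimal sketch (function names and toy data are ours; the dense version uses the plain ℓ2 ball, which only rescales the squared-norm budget):

```python
import numpy as np

def dense_attack(x, w, d_max):
    """l2-constrained minimizer of w @ x': spread the budget over all
    features by moving against w (||x - x'||_2 <= d_max)."""
    return x - d_max * w / np.linalg.norm(w)

def sparse_attack(x, w, d_max):
    """l1-constrained minimizer of w @ x': spend the whole budget on the
    single feature with the largest |w_j| (||x - x'||_1 <= d_max)."""
    k = np.argmax(np.abs(w))
    x_adv = x.astype(float).copy()
    x_adv[k] -= d_max * np.sign(w[k])
    return x_adv

w = np.array([0.5, -2.0, 1.0])
x = np.ones(3)

print(dense_attack(x, w, d_max=1.0))   # every feature perturbed a little
print(sparse_attack(x, w, d_max=1.0))  # only feature 1 (|w| = 2) changed
```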


Examples on Handwritten Digits (9 vs 8)


[Figure: sparse evasion attack (ℓ1-norm constrained): original sample vs. manipulated sample, SVM g(x) = −0.216. Dense evasion attack (ℓ2-norm constrained): original sample vs. manipulated sample, cSVM g(x) = 0.242.]


Robustness and Regularization


Robustness and Regularization [Xu et al., JMLR 2009]

• SVM learning is equivalent to a robust optimization problem:

$\min_{w,b}\; \underbrace{\tfrac{1}{2} w^\top w}_{1/\text{margin}} + C \sum_i \underbrace{\max\bigl(0,\, 1 - y_i f(x_i)\bigr)}_{\substack{\text{classification error on}\\ \text{training data (hinge loss)}}} \;\equiv\; \min_{w,b}\, \max_{u_i \in \,\mathcal{U}}\; \sum_i \max\bigl(0,\, 1 - y_i f(x_i + u_i)\bigr)$

where $u_i$ is a bounded perturbation from the uncertainty set $\mathcal{U}$.


Generalizing to Other Norms

• The optimal regularizer should use the dual norm of the noise uncertainty set


l2-norm regularization is optimal against l2-norm noise!

Infinity-norm regularization is optimal against l1-norm noise!

$\min_{w,b} \; \tfrac{1}{2} w^\top w + C \sum_i \max\bigl(0,\, 1 - y_i f(x_i)\bigr)$

$\min_{w,b} \; \|w\|_\infty + C \sum_i \max\bigl(0,\, 1 - y_i f(x_i)\bigr), \qquad \|w\|_\infty = \max_{i=1,\dots,d} |w_i|$
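The ∞-norm objective is indeed solvable as a linear program: introduce a scalar t = ‖w‖∞ via the constraints −t ≤ w_j ≤ t, plus slack variables ξ_i for the hinge loss. A sketch with scipy.optimize.linprog (the function name and toy data are our assumptions, not the authors' implementation):

```python
import numpy as np
from scipy.optimize import linprog

def inf_norm_svm(X, y, C=1.0):
    """Train min ||w||_inf + C * sum(hinge) as a linear program.
    Variables: [w_1..w_d, b, t, xi_1..xi_n], with t >= |w_j| for all j."""
    n, d = X.shape
    c = np.concatenate([np.zeros(d + 1), [1.0], C * np.ones(n)])
    # hinge constraints: -y_i * (w @ x_i + b) - xi_i <= -1
    A_margin = np.hstack([-y[:, None] * X, -y[:, None],
                          np.zeros((n, 1)), -np.eye(n)])
    b_margin = -np.ones(n)
    # box constraints: w_j - t <= 0 and -w_j - t <= 0
    A_box = np.vstack([
        np.hstack([np.eye(d), np.zeros((d, 1)), -np.ones((d, 1)), np.zeros((d, n))]),
        np.hstack([-np.eye(d), np.zeros((d, 1)), -np.ones((d, 1)), np.zeros((d, n))]),
    ])
    A_ub = np.vstack([A_margin, A_box])
    b_ub = np.concatenate([b_margin, np.zeros(2 * d)])
    bounds = [(None, None)] * (d + 1) + [(0, None)] * (n + 1)  # w, b free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:d], res.x[d]

# tiny linearly separable toy problem
X = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = inf_norm_svm(X, y, C=10.0)
print(np.sign(X @ w + b))  # should match y
```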


Interesting Fact

• Infinity-norm SVM is more secure against L1 attacks as it bounds the maximum absolute value of the feature weights

• This explains the heuristic intuition of using more uniform feature weights in previous work [Kolcz and Teo, 2009; Biggio et al., 2010]


Security and Sparsity of Linear Classifiers


Security vs Sparsity

• Problem: SVM and Infinity-norm SVM provide dense solutions!

• Trade-off between security (to ℓ2 or ℓ1 attacks) and sparsity
  – Sparsity reduces computational complexity at test time!


Elastic-Net Regularization [H. Zou & T. Hastie, 2005]

• Originally proposed for feature selection
  – to group correlated features together

• Trade-off between sparsity and security against l2-norm attacks


$\|w\|_{\text{elnet}} = (1 - \lambda)\, \|w\|_1 + \tfrac{\lambda}{2}\, \|w\|_2^2$
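scikit-learn's SGDClassifier supports exactly this combination of hinge loss and elastic-net penalty, so the trade-off can be explored directly; note that its l1_ratio is the weight of the ℓ1 term, i.e. it plays the role of (1 − λ) above. A small sketch on synthetic data (all parameter values are illustrative, not the paper's setup):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
# toy data: only the first 5 of 50 features carry signal
X = rng.randn(200, 50)
y = (X[:, :5].sum(axis=1) > 0).astype(int)

# hinge loss + elastic-net penalty; l1_ratio trades sparsity (l1 term)
# against security to l2 attacks (l2 term)
clf = SGDClassifier(loss="hinge", penalty="elasticnet", l1_ratio=0.8,
                    alpha=0.01, max_iter=2000, random_state=0)
clf.fit(X, y)

print("training accuracy:", clf.score(X, y))
print("fraction of zero weights:", np.mean(clf.coef_ == 0))
```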


Octagonal Regularization

• Trade-off between sparsity and security against l1-norm attacks


$\|w\|_{\text{oct}} = (1 - \rho)\, \|w\|_1 + \rho\, \|w\|_\infty$
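A quick numeric check of how the octagonal regularizer behaves at its two extremes (variable names are ours): for two weight vectors with the same ℓ1 mass, increasing ρ penalizes a single peaked weight far more than evenly spread weights, which is exactly the push toward evenness (security against sparse attacks) at the cost of sparsity.

```python
import numpy as np

def octagonal(w, rho):
    """Octagonal regularizer: (1 - rho) * ||w||_1 + rho * ||w||_inf."""
    w = np.asarray(w, dtype=float)
    return (1 - rho) * np.abs(w).sum() + rho * np.abs(w).max()

w_peaked = [3.0, 0.0, 0.0, 0.0]    # one large weight: sparse but insecure
w_even = [0.75, 0.75, 0.75, 0.75]  # same l1 mass, evenly spread

for rho in (0.0, 0.5, 1.0):
    print(rho, octagonal(w_peaked, rho), octagonal(w_even, rho))
# at rho=0 both cost 3.0 (pure l1); as rho -> 1 the even vector becomes
# far cheaper (0.75 vs 3.0), so minimization favors even, secure weights
```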


Experimental Analysis


Linear Classifiers

• SVM (quadratic programming):
  $\min_{w,b} \; \tfrac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{n} \max\bigl(0,\, 1 - y_i f(x_i)\bigr)$

• Infinity-norm SVM (linear programming):
  $\min_{w,b} \; \|w\|_\infty + C \sum_{i=1}^{n} \max\bigl(0,\, 1 - y_i f(x_i)\bigr)$

• 1-norm SVM (linear programming):
  $\min_{w,b} \; \|w\|_1 + C \sum_{i=1}^{n} \max\bigl(0,\, 1 - y_i f(x_i)\bigr)$

• Elastic-net SVM (quadratic programming):
  $\min_{w,b} \; (1-\lambda)\|w\|_1 + \tfrac{\lambda}{2}\|w\|_2^2 + C \sum_{i=1}^{n} \max\bigl(0,\, 1 - y_i f(x_i)\bigr)$

• Octagonal SVM (linear programming):
  $\min_{w,b} \; (1-\rho)\|w\|_1 + \rho\|w\|_\infty + C \sum_{i=1}^{n} \max\bigl(0,\, 1 - y_i f(x_i)\bigr)$

with $f(x) = w^\top x + b$.


Security and Sparsity Measures

• Sparsity
  – S: fraction of weights equal to zero

• Security (Weight Evenness)
  – E = 1/d if only one weight is different from zero
  – E = 1 if all weights are equal in absolute value

• Parameter selection with 5-fold cross-validation, optimizing: AUC + 0.1·S + 0.1·E


$S = \tfrac{1}{d}\, \bigl|\{ w_k \mid w_k = 0,\; k = 1, \dots, d \}\bigr|$

$E = \tfrac{1}{d}\, \dfrac{\|w\|_1}{\|w\|_\infty} \in \bigl[\tfrac{1}{d},\, 1\bigr]$
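Both measures are one-liners; a sketch (function names are ours) matching the definitions above:

```python
import numpy as np

def sparsity(w):
    """S: fraction of weights exactly equal to zero."""
    return float(np.mean(np.asarray(w, dtype=float) == 0))

def evenness(w):
    """E: weight evenness (1/d) * ||w||_1 / ||w||_inf, in [1/d, 1].
    E = 1/d when a single weight is nonzero; E = 1 when all weights
    share the same absolute value."""
    w = np.asarray(w, dtype=float)
    return float(np.abs(w).sum() / (w.size * np.abs(w).max()))

print(sparsity([0.0, 0.0, 2.0, -1.0]))   # 0.5
print(evenness([0.0, 0.0, 0.0, 5.0]))    # 0.25 = 1/d
print(evenness([1.0, -1.0, 1.0, -1.0]))  # 1.0
```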


Results on Spam Filtering: Sparse Evasion Attack

• 5,000 samples from TREC 07 (spam/ham emails)
• 200 features (words) selected to maximize information gain
• Results averaged over 5 repetitions, using 500 TR/TS samples
• (S, E) measures reported in the legend (in %)


[Plot: AUC10% vs. d_max (maximum number of words modified in each spam). Legend, (S, E) in %: SVM (0, 37); ∞-norm (4, 96); 1-norm (86, 4); el-net (67, 6); 8gon (12, 88).]


Results on PDF Malware Detection: Sparse Evasion Attack

• PDF: hierarchy of interconnected objects (keyword/value pairs)


• 11,500 samples; 5 repetitions, 500 TR/TS samples
• 114 features (keywords) selected with information gain; feature values: keyword counts
• Example PDF objects: "13 0 obj << /Kids [1 0 R 11 0 R] /Type /Page ... >> endobj", "17 0 obj << /Type /Encoding ... >> endobj", yielding counts such as /Type: 2, /Page: 1, /Encoding: 1

[Plot: AUC10% vs. d_max (maximum number of keywords added in each malicious PDF file). Legend, (S, E) in %: SVM (0, 47); ∞-norm (0, 100); 1-norm (91, 2); el-net (55, 13); 8gon (69, 29).]


Conclusions and Future Work

• We have shed light on the theoretical and practical implications of sparsity and security in linear classifiers

• We have defined a novel regularizer to tune the trade-off between sparsity and security against sparse evasion attacks

• Future work
  – To investigate a similar trade-off for:
    • poisoning (training) attacks
    • nonlinear classifiers


Any questions? Thanks for your attention!


Limited-Knowledge (LK) attacks


[Diagram: the attacker draws surrogate training data from the underlying distribution p(X, Y), sends queries to the targeted classifier f(x) to get labels, and uses them to learn a surrogate classifier f′(x).]

f’(x)

Recommended