[Figure: histogram over adversarial examples, showing the number of words changed per example]
Crafting Adversarial Attacks on Recurrent Neural Networks (RNNs)
Mark Anderson, Andrew Bartolo, Pulkit Tandon
{mark01,bartolo,tpulkit}@stanford.edu
Summary
• RNNs are used in a variety of applications to recognize and predict sequential data. However, they are vulnerable to adversaries; e.g., a cleverly placed word may change the predicted sentiment of a movie review from positive to negative.
• We built Naïve Bayes, SVM, and LSTM models to predict movie-review sentiment, and built two black-box adversaries. We show that NB and SVM are sensitive to these attacks, while LSTMs are relatively robust.
• Finally, we implemented a recent Jacobian-based technique for generating adversaries for the LSTM, and found that LSTM accuracy falls below 40% after replacing an average of 8.7 words per review. We also found examples where the misclassification was brought on by a seemingly random word, indicating that the LSTM might not be truly learning sentiment.

Data & Features
We train on a pre-labeled set of 12,500 positive and 12,500 negative movie reviews collected from IMDb [1]. Reviews average 233 words. For compatibility with the NumPy and TensorFlow input models, SVM and LSTM reviews are capped at 250 words. We strip all punctuation from the reviews but leave stopwords.
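The preprocessing described above can be sketched as follows; the exact tokenizer the project used is not shown on the poster, so this minimal version (lowercasing, punctuation stripping, 250-word cap) is an illustrative assumption:

```python
import re

MAX_WORDS = 250  # cap used for the SVM and LSTM input models

def preprocess(review):
    """Lowercase, strip punctuation, keep stopwords, cap at MAX_WORDS tokens."""
    text = re.sub(r"[^a-z0-9\s]", "", review.lower())  # drop punctuation only
    return text.split()[:MAX_WORDS]

words = preprocess("The movie is terrific!")
```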
Models
[Figure: training accuracy vs. # iterations, 64- and 128-hidden-unit LSTM]
The Word2Vec + LSTM architecture [3]:
• Single-layer RNN with LSTM cells
• Linear SVM
• Naïve Bayes with Laplace smoothing
[Figure: PCA run over the dataset]

Features:
1. Bag-of-words: a one-hot vector the size of the dictionary (400k words). Used for the Naïve Bayes and SVM models.
2. Word vectors [2]: a pre-determined embedding in 50-dimensional space. Used for the LSTM model.
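A minimal sketch of the bag-of-words featurization for the NB and SVM models; the toy five-word dictionary stands in for the real 400k-word one:

```python
import numpy as np

# Toy dictionary standing in for the 400k-word vocabulary (assumption).
dictionary = {"the": 0, "movie": 1, "is": 2, "terrific": 3, "terrible": 4}

def bag_of_words(tokens):
    """Binary bag-of-words vector over the dictionary, one entry per word."""
    v = np.zeros(len(dictionary))
    for t in tokens:
        if t in dictionary:
            v[dictionary[t]] = 1.0
    return v

v = bag_of_words(["the", "movie", "is", "terrific"])
```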
Jacobian Saliency Map Adversary [3]

Input: f (prediction model), x (example sentence), D (dictionary)

Algorithm:
1. y := f(x)
2. x* := x
3. J_f(x)[y] := ∂f_y / ∂x
4. while f(x*) == y:
5.     select a word i in sequence x*
6.     w := argmin_{z ∈ D} ||sign(x*[i] − z) − sign(J_f(x)[i, y])||
7.     x*[i] := w
8. end
9. return x*

[Figure: worked example on x = "The movie is terrific", perturbing x[2] = "movie" in the direction −sign(J_f(x)[i, y])]

                 y = Pos   y = Neg
True Positive       54        65
True Negative      110        62
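The word-swap step of this algorithm can be sketched in Python. The three-word dictionary of 2-D embeddings, the linear scorer `f`, and the hand-supplied Jacobian row are illustrative assumptions, not the project's actual LSTM and gradients:

```python
import numpy as np

# Toy stand-ins (assumptions): 2-D word embeddings and a linear scorer.
D = {"good": np.array([1.0, 0.5]),
     "bad":  np.array([-1.0, -0.5]),
     "ok":   np.array([0.1, 0.0])}
w_clf = np.array([1.0, 1.0])  # hypothetical linear classifier weights

def f(x):
    """Predict 1 (positive) if the summed embedding score is positive."""
    return int(sum(e @ w_clf for e in x) > 0)

def jsma_step(x, i, jac_row):
    """Step 6 of the loop: pick the dictionary word whose swap direction
    best matches sign(J_f(x)[i, y])."""
    return min(D, key=lambda wd: np.linalg.norm(
        np.sign(x[i] - D[wd]) - np.sign(jac_row)))

x = [D["good"], D["good"]]   # "good good" -> classified positive
y = f(x)
jac_row = w_clf              # gradient of the positive score w.r.t. x[i]
swap = jsma_step(x, 0, jac_row)   # word to substitute at position 0
```

Replacing `x[0]` with the selected word flips the toy classifier's output, mirroring the while-loop's exit condition `f(x*) != y`.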
References
[1] A. Maas, R. Daly, P. Pham, D. Huang, A. Ng, and C. Potts, "Learning Word Vectors for Sentiment Analysis," in Proc. of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011, pp. 142-150.
[2] A. Deshpande, "Sentiment Analysis with LSTMs," Oct. 3, 2017. [Online]. Available: https://github.com/adeshpande3/LSTM-Sentiment-Analysis.
[3] N. Papernot, P. McDaniel, A. Swami, and R. Harang, "Crafting Adversarial Input Sequences for Recurrent Neural Networks," Apr. 28, 2016.
[Figure: histogram of adversarial samples; average # words changed: 8.7]

Model Accuracy vs. Adversary

Adversary                  Naïve Bayes     SVM      LSTM
Training                      89.24%     98.19%    94.36%
Testing (no adversary)        80.98%     86.51%    81.77%
Testing, tack-on              69.64%     81.65%    79.08%
Testing, 1-strongest          64.27%     80.49%    77.76%
Testing, 3-strongest          31.66%     54.34%    68.17%
Testing, 5-strongest          14.05%     32.95%    59.36%
Testing, JSMA                    -          -      39.86%
Analysis
• SVM and NB perform similarly to the LSTM on the test set without an adversary. This implies the data is well-separated, as seen independently in the PCA plot.
• The LSTM is the most robust to our black-box adversaries.
• The black-box adversaries were words strongly associated with sentiment.
• Model accuracies fell monotonically with increasing adversary strength.
• Jacobian-based methods do not always change the most positive/negative words. Seemingly random word injection changes the prediction, leading us to question whether LSTMs are actually learning the sentiment; e.g.: "This excellent movie made me cry!" → "this excellent tsunga telsim grrr cry"
Future Work
• Implement a deeper LSTM with mean-pooling layers
• Optimize memory allocation in the TensorFlow code for the JSMA method
• Adversarially train the LSTM network on JSMA-generated adversaries
• Use the Stanford NLP Parser to automate grammar checking
Intuitive Black-Box Adversaries
• Based on Naïve Bayes' "strongest" words: the words most polarizing toward positive or negative classification
• Adversarial words:
  • Positive sway: "edie," "antwone," "din," "gunga," "yokai"
  • Negative sway: "boll," "410," "uwe," "tashan," "hobgoblins"
• Tack-on: replace the first word with a random adversarial word
• N-strongest-word-swap: replace the review's N strongest word(s) with random adversarial word(s); we experimented with N ≤ 5
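The two black-box attacks can be sketched as follows; the sample review and the assumption that the strongest-word indices are precomputed from the Naïve Bayes weights are illustrative:

```python
import random

# Polarizing words taken from the Naive Bayes analysis above.
POSITIVE_SWAY = ["edie", "antwone", "din", "gunga", "yokai"]
NEGATIVE_SWAY = ["boll", "410", "uwe", "tashan", "hobgoblins"]

def tack_on(tokens, adversarial_words):
    """Tack-on attack: overwrite the first word with a random adversarial word."""
    out = list(tokens)
    out[0] = random.choice(adversarial_words)
    return out

def n_strongest_swap(tokens, strongest_indices, adversarial_words, n=5):
    """N-strongest-word-swap: replace the review's N strongest words
    (indices assumed precomputed from the NB weights) with adversarial words."""
    out = list(tokens)
    for i in strongest_indices[:n]:
        out[i] = random.choice(adversarial_words)
    return out

review = ["this", "movie", "was", "terrific"]
attacked = n_strongest_swap(review, strongest_indices=[3],
                            adversarial_words=NEGATIVE_SWAY, n=1)
```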
We performed a hyperparameter search and settled on an LSTM with a softmax output layer and 64 hidden units. For the linear SVM, we swept the learning rate and tried different features and kernels. The Naïve Bayes model is multinomial and uses log-probabilities.
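A toy sketch of a multinomial Naïve Bayes model with Laplace smoothing and log-probabilities, as described above; the tiny corpus and three-word vocabulary are illustrative assumptions:

```python
import numpy as np

def train_multinomial_nb(X, y, alpha=1.0):
    """Multinomial Naive Bayes with Laplace (add-one) smoothing, in log space.
    X: (n_docs, vocab) word-count matrix; y: 0/1 class labels."""
    log_prior, log_lik = [], []
    for c in (0, 1):
        Xc = X[y == c]
        log_prior.append(np.log(len(Xc) / len(X)))
        counts = Xc.sum(axis=0) + alpha          # Laplace smoothing
        log_lik.append(np.log(counts / counts.sum()))
    return np.array(log_prior), np.array(log_lik)

def predict(x, log_prior, log_lik):
    """argmax over classes of log P(c) + sum_j x_j * log P(word_j | c)."""
    return int(np.argmax(log_prior + log_lik @ x))

# Tiny corpus: two negative docs (class 0), two positive docs (class 1).
X = np.array([[2, 0, 1], [1, 0, 2], [0, 2, 1], [0, 3, 0]])
y = np.array([0, 0, 1, 1])
lp, ll = train_multinomial_nb(X, y)
pred = predict(np.array([0, 2, 0]), lp, ll)  # doc dominated by word 1
```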
LSTM accuracy after JSMA: 39.9%