Upload
vuongque
View
222
Download
0
Embed Size (px)
Citation preview
AIBasedAn*virus:Detec*ngAndroidMalwareVariantsWithaDeepLearningSystem
ThomasLeiWang@thomaslwang
About me • My first (boring) job was a virus analyst in 2004. • I had a dream…
Virus Analysis VS Image Recognition
Experiencedvirusanalystsome*mesisdoingimagerecogni*on!
ImageProvidedbytheMNISThandwriHendatabase
Sample increase VS signature efficiency decrease
NumberofMaliciousAndroidApps Dowgin:ARichVariantsAndroidAdwareFamilyNewDowginSamplesVSAverageDowginSamplesHitPerSignature
Maliciousapps,DowginsamplesandDowginsignaturesarecountedfromourdatabase.
Signaturebasedrules
Behavioralbasedrules
Opcodebasedrules
AIbaseddeeplearningsystem
Our evolution
Training FeatureExtrac+on
• Structuraltype
• Sta*s*caltype • Empiricaltype • Continuousvalue
• 0-1value
FeatureNormaliza+on
• Standardscorenormaliza*on
• CuVngtechnique
• Quan*lenormaliza*on
TraininginDeepNeuralNetwork
• PaddlePaddleplaYorm
• Residuallayer • AutoEncoder • Configura*ontunings
Models
• Malwaremodel
• PUAmodel
InputAPKfeatures Model Output
Prediction
APK
Feature extraction
Structuralfeatures• Numofuses-permissonsinAndroidManifest• Numberofpicturefilesin/res• Sizeof/res• NumberofclassesstartswithLcom/• NumofclassesstartswithLjava/• Numoffieldstypeboolean• Numofmethodswhichhasparameters>20
Empiricalfeatures• Hasexecutablefilein/res• Hasapkfilein/assets• RegisterDEVICE_ADMIN_ENABLEDbroadcastandhassendSMSMessagepermission
Sta*s*calfeatures• Countcer*ficatefieldsinsamplestoget100stringswithdiscrimina*[email protected]/benign=52
205 3 34.5 143234
285 68 296 7
13850 157 11218 847
1.23e+9 422 1004 177
0 398 13.333 125
Numeraliza*on(N=1235)
Con*nuousvalue(N=571)
0-1value(N=664)
0 0 0 0
0 0 0 1
0 1 0 0
0 1 1 0
0 1 0 0
‘’’’’’’’’’’’’’
‘’’’’’’’’’’‘
CuVngtechnique
Feature normalization
Con*nuousvalue
Gaussiandistribu*on
[-1,1]
Standardscorenormaliza*on
Tomakefeaturesmorediscrimina*vePrecisionincreasedby9%
Noiseproblem
Long-taileddistribu*on
Quan*lenormaliza*on
CuVngtechnique
Mul*modaldistribu*on
Tanh ReLU
Training in deep neural network
AutoEncoder
Configura*ons:• Hiddenlayerac*va*onfunc*on:TanhandReLU• Costfunc*on:Mul*classcrossentropy• Learningmethod:ADADELTA• Finallayerac*va*onfunc*on:Sormax• Passes:20–30
Inputlayer
NormalizedCon*nuousvalue(n=571)
0-1value(n=664)
Hiddenlayer1
Hiddenlayer2 Hiddenlayer3 Residuallayer iden*ty
outputlayer
n=256
n=256
ReLU
ReLU
Tanh
n=256 n=256
Sormax
n=256
Tanh Tanh
NetworkArchitecture
0
1
TrainedonPaddlePaddleplaYormwith15M+samples
Prediction & Evaluation
Models
ExtractapkfeaturesonthephonePerf:140ms/apkTraffic:1kB/apk
Sendfeaturestothecloud
Returnpredic*ontothephone
Predic*on
Predictinthecloud
Apkfeatures
Produc+ondeployment
0.880.890.90.910.920.930.940.950.960.97
Jan2016 Mar2016 May2016 Jul2016
Thelife*meofmodeltrainedonJan2016
ThemodelistrainedonJan2016andtestedagainstAV-TESTJan,Mar,MayandJuly’ssamples.Recallratedroppedby7.6%in6months.
0.95
0.955
0.96
0.965
0.97
0.975
0.98
0.985
0.99
0.995
0 0.02 0.04 0.06 0.08 0.1
Detec*onperformanceasROCcurve
ROCcurveistestagainstAV-TESTJuly’ssamples:7613Androidmalware,3020legi*mateAndroidapps,total10633.
Detec+onperformance
True
Posi*veRate
FalsePosi*veRate
Modellife+me
Recall
Limitations • Can’t provide explanations
for its detection results • Can’t understand code
meaning. • Build on static analysis and
lack of dynamic inspection. • Can’t self learning, need
continuous training with labeled data.
Advantages • More difficult to evade • Fixed-size
Conclusion • Feature extraction is the key step
• Virus analyst experience can help to find valuable features. • AutoEncoder neural network can be used to extract the most valuable
features from a large number of features. • This system is designed to detect Android malware, but these
methods can also be used in detecting malware in other platforms.
• Our system learns in image recognition way. It’s effective only in detecting malware variants.
Thank you • Welcome contact me
• Twitter: @thomaslwang • Email: [email protected]
• Welcome cooperation and partnership with us • Acknowledgement
• Baidu IDL: Lyv Qin, Xiao Zhou, Jie Zhou, Errui Ding, Yuanqing Lin, Andrew Ng
• Partner: Liuping Hou, Jinke Liu, Zhijun Jia, Yanyan Ji • PaddlePaddle platform http://paddlepaddle.org