This talk covers the use of machine learning algorithms to detect malicious Android applications. I will describe how a high-performance tool for this task was designed at Yandex on top of MatrixNet. I will also show in which cases analytical methods of malware detection help block many simple samples of malicious code. Then we will discuss how such methods can be improved to detect more sophisticated malware.
1
2
Fast detection of Android malware
Yury Leonychev
3
Introduction
4
Android application
APK
Manifest (AndroidManifest.xml)
Code (classes.dex and native libraries)
Meta information (META-INF)
Resources (files and resources.arsc)
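Since an APK is an ordinary ZIP archive, the parts listed above can be located with the Python standard library. The following is a minimal sketch: the `classify_entry` helper, its category names and the demo archive are illustrative assumptions, not part of any tool mentioned in the talk.

```python
import io
import zipfile


def classify_entry(name: str) -> str:
    """Map an archive entry name to one of the APK parts listed above."""
    if name == "AndroidManifest.xml":
        return "manifest"
    if name.startswith("META-INF/"):
        return "meta"
    # DEX bytecode and native libraries (stored under lib/<abi>/) count as code.
    if name.endswith(".dex") or name.startswith("lib/"):
        return "code"
    return "resources"


def apk_layout(apk) -> dict:
    """Group the entry names of an APK (path or file object) by part."""
    layout = {"manifest": [], "code": [], "meta": [], "resources": []}
    with zipfile.ZipFile(apk) as archive:
        for name in archive.namelist():
            layout[classify_entry(name)].append(name)
    return layout


def make_demo_apk() -> io.BytesIO:
    """Build a tiny in-memory archive with APK-like entry names (demo only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in ("AndroidManifest.xml", "classes.dex",
                     "lib/armeabi/libfoo.so", "META-INF/MANIFEST.MF",
                     "resources.arsc", "res/layout/main.xml"):
            archive.writestr(name, b"")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for part, names in apk_layout(make_demo_apk()).items():
        print(part, names)
```

Passing a real file path to `apk_layout` works the same way, because `zipfile.ZipFile` accepts both paths and file objects.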
5
Brief list of tools for APK analysis
• Androguard (ultimate tool by @adesnos and others) – used by VirusTotal, APKInspector, etc.
• SCanDroid (Adam P. Fuchs, Avik Chaudhuri, and Jeffrey S. Foster)
• TaintDroid (researchers from Intel, Penn State University, Duke University)
• DroidBox (dynamic analysis by Patrik Lantz) – used by ApkScan
6
Is this all? Really?
• http://www.apk-analyzer.net
• http://anubis.iseclab.org
• http://apkscan.nviso.be
7
Our task is more complex
Malware detector
8
Methods of malware detection
Static analysis
• Advantages
– An APK has predictable content. Application behavior can be learned by simply reading the file
– Checks are safe
• Limitations
– Can be ineffective against sophisticated malware and obfuscation techniques
– We cannot be certain of the behavior, because the app is never executed
9
Methods of malware detection
Dynamic analysis
• Advantages
– Clear results and interpretation
– Open source solutions available
• Limitations
– Not fast (enough)
– Can be detected and bypassed
– Big ecosystem requires big infrastructure
10
Methods of malware detection
Signature analysis
• Advantages
– Effective for known malware
– Commercial solutions available
• Limitations
– Signature databases require regular (and frequent) updates
– Not effective for new malware
– Do you have a team of virus analysts?
11
Methods of malware detection
It seems that the most efficient approach is a hybrid solution
12
MatrixNet
What is The Matrix?
13
Why can we use machine learning?
Abstract task description:
• We have a set of objects (APK files). We should divide this set into two subsets (malware and normal)
• For every element of the main set we can compute a fixed number of features
• The subsets are just the result of a simple classification task, so we can try to choose effective features
14
What is MatrixNet?
MatrixNet is an implementation of the gradient boosted decision trees algorithm.
MatrixNet differs a bit from the standard algorithm:
• It uses oblivious trees
• It accounts for the sample count in each leaf
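To make the "oblivious trees" point concrete: in an oblivious tree every node at a given depth tests the same feature against the same threshold, so a tree of depth d reduces to d comparisons that form a d-bit leaf index. The sketch below shows that evaluation scheme in plain Python. It is a generic illustration of a boosted ensemble of oblivious trees, not MatrixNet's actual code, and the trees, features and leaf values are invented for the demo.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ObliviousTree:
    # One (feature_index, threshold) pair per level, shared by all nodes of that level.
    splits: List[Tuple[int, float]]
    # 2 ** len(splits) leaf values, indexed by the bits of the comparisons.
    leaves: List[float]

    def predict(self, x: Sequence[float]) -> float:
        index = 0
        for feature, threshold in self.splits:
            # Each comparison contributes one bit of the leaf index.
            index = (index << 1) | (1 if x[feature] > threshold else 0)
        return self.leaves[index]


def ensemble_score(trees: Sequence[ObliviousTree], x: Sequence[float]) -> float:
    """Boosted ensemble: the score is the sum of the individual tree outputs."""
    return sum(tree.predict(x) for tree in trees)


# Hypothetical two-tree model over features [n_permissions, n_reflection_calls].
TREES = [
    ObliviousTree(splits=[(0, 10.0), (1, 3.0)], leaves=[-1.0, 0.5, -0.2, 1.5]),
    ObliviousTree(splits=[(1, 0.0)], leaves=[-0.5, 0.4]),
]

if __name__ == "__main__":
    print(ensemble_score(TREES, [12, 5]))  # positive score
    print(ensemble_score(TREES, [3, 0]))   # negative score
```

Because all nodes of a level share one test, evaluation needs no branching on tree structure, which is one reason this tree shape is fast to apply.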
15
Why is MatrixNet powerful?
• It is a machine learning algorithm for classification tasks
• A key feature of this method is its resistance to overfitting
16
MatrixNet post learning optimization
17
MatrixNet post learning optimization
Copyright © 2013 by Sidney Harris.
18
How does it work?
Offline learning process:
• Choosing features
• Choosing samples
• Manual classification (malware or not)
• Learning on the combined set of apps
• Calculating mistakes
19
Features
What kind of features to use:
• Permissions
• URIs in strings and other resources
• Adware library usage
• Obfuscation methods
• …
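A classifier needs every APK described by a numeric vector of the same shape. The sketch below turns two of the feature kinds listed above (permissions and URIs found in strings) into such a vector. The tracked permission list, the feature names and the URL pattern are assumptions made for illustration; the talk does not specify the exact feature set.

```python
import re
from typing import Dict, Iterable, List

# Illustrative subset; the talk does not list the exact permissions used.
TRACKED_PERMISSIONS = [
    "android.permission.SEND_SMS",
    "android.permission.RECEIVE_SMS",
    "android.permission.CALL_PHONE",
    "android.permission.READ_LOGS",
]
# Deliberately simple pattern for http(s) URLs embedded in strings.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_features(permissions: Iterable[str],
                     strings: Iterable[str]) -> Dict[str, float]:
    """Build named numeric features from requested permissions and strings."""
    requested = set(permissions)
    features = {f"perm:{p}": float(p in requested) for p in TRACKED_PERMISSIONS}
    features["perm_count"] = float(len(requested))
    features["url_count"] = float(
        sum(len(URL_PATTERN.findall(s)) for s in strings))
    return features


def to_vector(features: Dict[str, float]) -> List[float]:
    """Fix the feature order so every APK yields a vector of the same shape."""
    return [features[name] for name in sorted(features)]


if __name__ == "__main__":
    feats = extract_features(
        ["android.permission.SEND_SMS", "android.permission.INTERNET"],
        ["see http://example.com/a and https://example.org", "no url here"],
    )
    print(to_vector(feats))
```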
20
Samples and classification
Malware applications:
• VirusTotal feed
• Samples from malicious sites
Normal applications:
• Manual testing
• Trusted developers
• Yandex applications
21
Formula
[Diagram labels: Normal, Malware, Features, MatrixNet, Learning, Features weight, Features cost]
22
Measuring mistakes
[Diagram labels: Formula 1 with Features cost 1, Formula N with Features cost N, Normal, Malware]
Formula with a cool confusion matrix and low cost
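One way to read this slide is as a selection problem: several trained formulas are compared by their mistakes and by the cost of the features they need. The sketch below encodes one possible rule, taking the cheapest formula whose error rate stays under a threshold. The rule itself, the `Candidate` fields and all numbers are assumptions for illustration; the talk does not state how the final formula was chosen.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    false_positives: int
    false_negatives: int
    total: int
    feature_cost: float  # hypothetical unit, e.g. seconds to compute the features

    @property
    def error_rate(self) -> float:
        return (self.false_positives + self.false_negatives) / self.total


def pick_formula(candidates: List[Candidate],
                 max_error: float) -> Optional[Candidate]:
    """Cheapest candidate whose error rate is acceptable (assumed rule)."""
    good = [c for c in candidates if c.error_rate <= max_error]
    return min(good, key=lambda c: c.feature_cost) if good else None


# Invented candidates: more expensive features buy fewer mistakes.
CANDIDATES = [
    Candidate("formula_a", 10, 10, 1000, feature_cost=5.0),
    Candidate("formula_b", 25, 15, 1000, feature_cost=0.3),
    Candidate("formula_c", 70, 50, 1000, feature_cost=0.01),
]

if __name__ == "__main__":
    best = pick_formula(CANDIDATES, max_error=0.05)
    print(best.name if best else "no acceptable formula")
```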
23
Analyzer architecture
Fine! I'll go build my own casino, with blackjack and big data
24
Main parts
• Parsers
• Analyzers
• Oracle
• Report
25
In depth
APK
Parsers: ManifestParser, ResourceParser, MetaInfoParser, ClassesParser
Analyzers: PermissionAnalyzer, PackageAnalyzer, URLAnalyzer, ReflectionAnalyzer
Reports: XHTMLReporter, JSONReporter
Oracle: MatrixNet
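The four stages can be wired together as a simple pipeline: parsed data goes through the analyzers, their features are merged, the oracle scores them, and a reporter serializes the result. The sketch below follows that flow. The function names, the shape of the parsed data and the toy oracle weights are all hypothetical; only the stage names come from the slide.

```python
import json
from typing import Callable, Dict, List

Parsed = Dict[str, object]
Features = Dict[str, float]


def permission_analyzer(parsed: Parsed) -> Features:
    """Analyzer stage: derive a feature from the parsed manifest data."""
    return {"perm_count": float(len(parsed.get("permissions", [])))}


def url_analyzer(parsed: Parsed) -> Features:
    """Analyzer stage: derive a feature from URLs found by the parsers."""
    return {"url_count": float(len(parsed.get("urls", [])))}


def toy_oracle(features: Features) -> float:
    # Stand-in for the trained model; the weights are invented for the demo.
    return 0.2 * features["perm_count"] + 0.1 * features["url_count"] - 1.0


def run_pipeline(parsed: Parsed,
                 analyzers: List[Callable[[Parsed], Features]],
                 oracle: Callable[[Features], float]) -> str:
    """Merge analyzer outputs, score them, and emit a JSON report."""
    features: Features = {}
    for analyzer in analyzers:
        features.update(analyzer(parsed))
    score = oracle(features)
    report = {"features": features, "score": score,
              "verdict": "malware" if score > 0 else "normal"}
    return json.dumps(report, sort_keys=True)


if __name__ == "__main__":
    parsed = {"permissions": ["p"] * 8, "urls": []}
    print(run_pipeline(parsed, [permission_analyzer, url_analyzer], toy_oracle))
```

Keeping analyzers as independent functions over the same parsed data means a new feature can be added without touching the parsers or the reporter.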
26
ManifestParser
Gets around some obfuscation methods:
• HEUR:Backdoor.AndroidOS.Obad.a
27
ManifestParser
<?xml version="1.0" encoding="utf-8"?>
<manifest ="singleTop" android:versionCode="2" ="2.0" android:installLocation="internalOnly" package="com.android.system.admin" xmlns:android="http://schemas.android.com/apk/res/android">
<uses-permission ="android.permission.READ_LOGS" />
<uses-permission ="android.permission.WAKE_LOCK" />
…
<uses-permission ="android.permission.RECEIVE_SMS" />
<uses-permission ="android.permission.SEND_SMS" />
<uses-permission ="android.permission.CALL_PHONE" />
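The manifest above contains attributes whose names are missing, which is enough to make a strict XML parser reject it. The sketch below shows one tolerant approach on already decoded manifest text: a regular expression that captures the first quoted value inside each `<uses-permission>` tag, whether or not the attribute name is present. This is an illustration only, not the ManifestParser from the talk, and it does not handle the binary XML format in which manifests are stored inside an APK.

```python
import re

# Matches a <uses-permission> tag and captures the first quoted attribute value,
# whether or not the attribute name (normally android:name) is present.
PERMISSION_TAG = re.compile(r'<uses-permission\b[^>]*?=\s*"([^"]+)"[^>]*>')


def extract_permissions(manifest_text: str) -> list:
    """Return permission names in document order."""
    return PERMISSION_TAG.findall(manifest_text)


# Demo input mixing a nameless attribute with a well-formed one.
SAMPLE = '''
<manifest package="com.android.system.admin">
  <uses-permission ="android.permission.READ_LOGS" />
  <uses-permission android:name="android.permission.SEND_SMS" />
</manifest>
'''

if __name__ == "__main__":
    print(extract_permissions(SAMPLE))
```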
28
ClassesParser
• Parser for DEX files
• Internal DEX disassembler
• Call graph builder
• Embeds “real” function/variable names into the disassembly listing
• Builds a list of used procedures and functions
29
ClassesParser
Disassembler: https://github.com/tracer0tong/de
Example: ./de.py test1.dex.dat
[[0, 'sget-object v0, {type} [{class}].{field} // field@2225'],
[2, 'invoke-virtual v0 @13970 // {class}->{method}'],
[5, 'move-result-object v0'],
[6, 'check-cast v0, [{type_name}] // type@0958'],
[8, 'return-object v0']]
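The listing above is a plain Python list of `[offset, text]` pairs, so it can be post-processed directly. The sketch below counts mnemonics and call sites in such a listing. The `opcode_histogram` and `invoke_count` helpers are hypothetical and not part of the disassembler; the input is copied from the example output.

```python
from collections import Counter

# Listing copied from the example output above.
LISTING = [
    [0, 'sget-object v0, {type} [{class}].{field} // field@2225'],
    [2, 'invoke-virtual v0 @13970 // {class}->{method}'],
    [5, 'move-result-object v0'],
    [6, 'check-cast v0, [{type_name}] // type@0958'],
    [8, 'return-object v0'],
]


def opcode_histogram(listing):
    """Count mnemonics: the first whitespace-separated token of each line."""
    return Counter(text.split()[0] for _offset, text in listing)


def invoke_count(listing):
    """Number of invoke-* instructions, i.e. call sites in this listing."""
    histogram = opcode_histogram(listing)
    return sum(n for op, n in histogram.items() if op.startswith("invoke-"))


if __name__ == "__main__":
    print(opcode_histogram(LISTING))
    print(invoke_count(LISTING))
```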
30
ReflectionAnalyzer
java.lang.reflect.*
• Classes: Field, Method, etc.
• Functions: getClass(), getDeclaredField(), etc.
31
ReflectionAnalyzer
Output:
• Report:
There is some reflections usage:
[email protected]>getContentResolver calls: [email protected]>forName
[email protected]>onActivityResult calls: [email protected]>forName
• The number of reflection calls is a feature.
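Given a call graph as (caller, callee) pairs, counting reflection usage is a filter over the callees. The sketch below assumes fully qualified dotted method names and an illustrative, non-exhaustive list of reflection entry points; the real ReflectionAnalyzer's input format and rules are not described in the talk.

```python
from typing import Iterable, List, Tuple

# Callee prefixes treated as reflection; an illustrative, non-exhaustive list.
REFLECTION_PREFIXES = (
    "java.lang.reflect.",
    "java.lang.Class.forName",
    "java.lang.Class.getDeclaredField",
    "java.lang.Class.getDeclaredMethod",
    "java.lang.Object.getClass",
)


def reflection_calls(call_graph: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only the edges whose callee is a reflection entry point."""
    return [(caller, callee) for caller, callee in call_graph
            if callee.startswith(REFLECTION_PREFIXES)]


# Hypothetical call graph edges for the demo.
EDGES = [
    ("com.example.Main.onCreate", "java.lang.Class.forName"),
    ("com.example.Main.onCreate", "android.util.Log.d"),
    ("com.example.Util.call", "java.lang.reflect.Method.invoke"),
]

if __name__ == "__main__":
    calls = reflection_calls(EDGES)
    for caller, callee in calls:
        print(f"{caller} calls: {callee}")
    print("reflection_call_count =", len(calls))
```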
32
Service architecture
[Diagram: the following stack, shown twice]
Nginx
Gunicorn
Flask
Celery
MongoDB
33
Case study
34
Let's try it on...
Yandex.Store application feed:
• More than 50K Android applications
• More than 200 new/updated apps per week
• Open for developers (no strict manual verification)
35
Performance. Check timing
~2 ms
~0.25 s
~4.5 min
36
Performance. Amount of checks
• More than 16,000 applications checked in 1 hour on 1 cluster node
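As a quick sanity check on this throughput figure, the arithmetic can be written out. The sketch assumes checks run strictly one after another on the node, which the slide does not state, so the per-check time is only an upper-bound style estimate.

```python
def seconds_per_check(checks_per_hour: float) -> float:
    """Average wall-clock time per check, assuming sequential processing."""
    return 3600.0 / checks_per_hour


def checks_per_day(checks_per_hour: float) -> float:
    """Daily capacity of one node at the same sustained rate."""
    return 24.0 * checks_per_hour


if __name__ == "__main__":
    print(seconds_per_check(16000))  # seconds per application
    print(checks_per_day(16000))     # applications per day on one node
```

At 16,000 checks per hour this gives 0.225 s per application and 384,000 applications per day on a single node.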
37
Confusion matrix
                   Verdict: Malware (Score > 0)   Verdict: Normal (Score < 0)
Actual: Malware    485 (97%)                      15 (3%)
Actual: Normal     25 (5%)                        475 (95%)
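The usual summary metrics follow directly from the four counts in this matrix. The sketch below computes them, treating "malware" as the positive class; the function and key names are illustrative.

```python
def metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard binary classification metrics from confusion matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
    }


# Counts from the matrix: 485 malware caught, 15 missed,
# 25 normal apps flagged, 475 normal apps passed.
M = metrics(tp=485, fn=15, fp=25, tn=475)

if __name__ == "__main__":
    for name, value in M.items():
        print(f"{name}: {value:.3f}")
```

On these counts the accuracy is 0.96, recall is 0.97, the false positive rate is 0.05, and precision is 485/510, roughly 0.951.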
38
(Un)predictable results
• Applications with the malicious adware library AirPush were classified as malware
• But we had no special features for adware in the first version
39
Conclusion
It’s alive… alive!
40
It works!
• Analytic methods work fine for detecting Android malware
• Machine learning is not “rocket science”, but a cool and effective tool
• Open API coming soon.
41
Thanks for your attention