Anomaly Detection Algorithms for Malware Traffic Analysis using Tamper Resistant Features
Dr. Patrick McDaniel
Berkay Celik
Fall 2015
‣ Introduction
‣ Motivation
‣ Related Work
‣ Data
‣ Approach
‣ Experimental Results
‣ Comparison with Previous Work
‣ Conclusion and Discussion
‣ References
Malware Infection
Image credit: http://www.vblaze.com/
Malware/Legitimate Communication
Do features extracted from packet headers discriminate legitimate application traffic from malware traffic?
• How many packets should be aggregated for feature
extraction?
• Which feature subset should be used for detection?
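The aggregation question can be made concrete with a minimal sketch. It assumes, hypothetically, that a flow arrives as a time-ordered list of `(timestamp, size_bytes)` tuples and that features are computed over consecutive, non-overlapping windows of `n` packets. The window size `n` is the parameter the first question asks about; dropping the trailing partial window is a choice of this sketch, not something the slides specify.

```python
from typing import List, Tuple

# Hypothetical packet representation: (timestamp in seconds, size in bytes)
Packet = Tuple[float, int]


def aggregate_windows(packets: List[Packet], n: int) -> List[List[Packet]]:
    """Split a time-ordered packet sequence into non-overlapping windows of n packets.

    Packets that do not fill a complete final window are dropped, so every
    window yields features computed over exactly n packets.
    """
    if n <= 0:
        raise ValueError("window size n must be positive")
    full = len(packets) - len(packets) % n  # number of packets in complete windows
    return [packets[i:i + n] for i in range(0, full, n)]
```

For example, `aggregate_windows(flow, 2)` on a five-packet flow yields two windows and discards the fifth packet; larger `n` gives fewer but statistically richer feature vectors per flow.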
Goal:
To analyze and detect the network-level behavior of malware traffic after it blends into normal traffic:
‣ Focus on detecting malware heartbeat traffic
‣ Features should be tamper resistant (i.e., not easy to manipulate, unlike port numbers or flags in packet headers)
‣ Malware traffic is rare, so anomaly detection algorithms are evaluated
Related Work
Current state of the art
• Systems based on known signatures: Well studied, but they cannot detect unknown malware traffic
• Payload inspection: Raises privacy concerns, is defeated by payload encryption, and is hard to scale to high-speed (multigigabit) networks
• Feature representation: Drawbacks in selecting “tamper-proof” features, e.g., relying on port numbers, payload information, or protocol-specific information, and modelling the traffic with unrealistic malware traffic features
• Supervised classification algorithms: They require labeled samples of the targeted anomalies, which is a disadvantage
Dataset
‣ Legitimate traffic: traces of a small-scale organization network recorded at the University of Twente, which has around 35 employees and over 100 students
• Legitimate traffic: 7753 instances × 13 features
• Malware traffic: 3513 instances × 13 features (16 different malware families in total)
Image credit: http://www.vblaze.com/
Feature space (13 features, all continuous):
• Flow duration: Difference between last packet time and first
packet time
• Count of Payload (+): The count of all the packets with at
least a byte of data payload
• Min data size (+): Minimum payload size observed
• Mean of bytes (-): Data bytes divided by the total number of
packets
• Initial Data Length (*): The total number of bytes sent in the initial window
• RTT samples (*): Total number of RTT samples found across all packets
• Median and Variance of bytes (+): Median and variance of the total bytes per packet
• IP ratio (*): Ratio between the maximum packet size and the minimum packet size
• Goodput (*): Total number of frame bytes divided by the flow duration
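Most of the features above can be computed directly from per-packet records. The sketch below assumes a hypothetical `Packet` record holding a timestamp, the frame size, and the transport payload size. Taking the minimum data size over payload-carrying packets only and using the sample variance are assumptions of this sketch. Initial Data Length and RTT samples need TCP-level state tracking and are omitted.

```python
import statistics
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Packet:
    # Hypothetical per-packet record; field names are illustrative only.
    timestamp: float    # seconds
    frame_bytes: int    # total bytes on the wire (headers + payload)
    payload_bytes: int  # transport-layer data bytes


def flow_features(packets: List[Packet]) -> Dict[str, float]:
    """Compute a subset of the listed flow features for one flow or window."""
    if len(packets) < 2:
        raise ValueError("need at least two packets")
    times = [p.timestamp for p in packets]
    sizes = [p.frame_bytes for p in packets]
    payloads = [p.payload_bytes for p in packets if p.payload_bytes > 0]
    duration = max(times) - min(times)
    return {
        # Difference between last and first packet time
        "flow_duration": duration,
        # Packets carrying at least one byte of payload
        "payload_count": float(len(payloads)),
        # Smallest non-zero payload (assumption: zero-payload packets ignored)
        "min_data_size": float(min(payloads)) if payloads else 0.0,
        # Data bytes divided by the total number of packets
        "mean_bytes": sum(p.payload_bytes for p in packets) / len(packets),
        # Median and sample variance of per-packet frame sizes
        "median_bytes": float(statistics.median(sizes)),
        "variance_bytes": statistics.variance(sizes),
        # Ratio of the largest to the smallest packet size
        "ip_ratio": max(sizes) / min(sizes),
        # Frame bytes per second of flow duration (0 if duration is zero)
        "goodput": sum(sizes) / duration if duration > 0 else 0.0,
    }
```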
Feature selection:
These papers served as guidelines for the feature selection process:
• Wei Li, Marco Canini, Andrew W Moore, and Raffaele Bolla.
Efficient application identification and the temporal and
spatial stability of classification schema, Computer Networks,
2009
• A. Moore, D. Zuev, and M. Crogan. Discriminators for use in
flow based classification. Queen Mary and Westfield
College, Department of Computer Science, 2005
• Terry Nelms, Roberto Perdisci, and Mustaque Ahamad.
ExecScent: Mining for new C&C domains in live networks with
adaptive control protocol templates. In USENIX Security, 2013
Approach
Overview of Framework
Steps to achieve the goal
Approach
• One-class support vector machine (OCSVM)
• The distance to the kth nearest neighbor (k-NN)
• K-means clustering by finding the distance from data to the
nearest cluster centre
• Least squares anomaly detection (LSAD) based on the least squares probabilistic classifier
Image credit: official scikit-learn documentation, One-class SVM
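Two of the listed detectors are easy to sketch with NumPy alone. Both are fit on legitimate traffic only and return an anomaly score where larger means more suspicious: the distance to the k-th nearest legitimate training point, and the distance to the nearest K-means centre (plain Lloyd's algorithm with random initial centres). Function names, the seed, and the iteration cap are illustrative assumptions, and features are assumed to be standardized beforehand because the flow features live on very different scales. Scores are meant for held-out flows; a training point scored against its own training set has distance 0 to itself. For OCSVM, scikit-learn's `sklearn.svm.OneClassSVM` is a ready implementation; LSAD is not sketched here.

```python
import numpy as np


def knn_score(train: np.ndarray, test: np.ndarray, k: int) -> np.ndarray:
    """Euclidean distance from each test point to its k-th nearest training point."""
    # Pairwise distances, shape (n_test, n_train)
    d = np.linalg.norm(test[:, None, :] - train[None, :, :], axis=2)
    # k-th smallest distance per test point (k is 1-based)
    return np.sort(d, axis=1)[:, k - 1]


def kmeans_fit(train: np.ndarray, k: int, iters: int = 100, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's algorithm; returns the k cluster centres."""
    rng = np.random.default_rng(seed)
    centres = train[rng.choice(len(train), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign every training point to its nearest centre
        dist = np.linalg.norm(train[:, None, :] - centres[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)
        # Move each centre to the mean of its points (keep it if the cluster is empty)
        new = np.array([train[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return centres


def kmeans_score(centres: np.ndarray, test: np.ndarray) -> np.ndarray:
    """Distance from each test point to the nearest cluster centre."""
    return np.min(np.linalg.norm(test[:, None, :] - centres[None, :, :], axis=2), axis=1)
```

A flow is then flagged as malware when its score exceeds a threshold; sweeping that threshold is what produces the ROC curves used in the evaluation.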
Approach
Evaluation Metrics
• AUC (Area Under Curve)
• ROC curve when necessary
• Further experiments for analysis of malware
traffic
• Confusion matrix
• False positive and false negative counts
• Interpretation of PCA and K-means clustering
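AUC can be computed without drawing the curve: it equals the probability that a randomly chosen malware sample receives a higher anomaly score than a randomly chosen legitimate one, with ties counted as one half. This pairwise sketch is quadratic in the number of samples and is meant for clarity; a sort-based version or `sklearn.metrics.roc_auc_score` is the practical choice for thousands of flows.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney statistic.

    labels: 1 = malware (positive), 0 = legitimate (negative).
    scores: higher means more anomalous. Tied pairs count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 1.0 means every malware flow outscores every legitimate flow; 0.5 is chance level.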
Experimental Setup:
• Hyperparameters are set using a subset of the training set
• Stratified k-fold cross-validation (k is selected depending on the amount of malware traffic) or random sampling is applied, depending on the number of malware instances
• A paired t-test with significance level 0.05 is used to report the differences between the algorithms' AUC values
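A sketch of the two setup steps, under stated assumptions. Stratified folds are built here by shuffling each class separately and dealing its indices round-robin, so every fold keeps roughly the overall class proportions. The paired t statistic then compares two algorithms' per-fold AUCs; it is compared against the two-sided critical value of Student's t distribution with (folds - 1) degrees of freedom, about 2.262 for 10 folds at the 0.05 level. `scipy.stats.ttest_rel` returns the same statistic together with a p-value. Function names and the seed are illustrative.

```python
import math
import random
from statistics import mean, stdev
from typing import List, Sequence


def stratified_folds(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Return k lists of sample indices with roughly equal class proportions."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal this class's indices round-robin so each fold gets its share
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def paired_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Paired t statistic for per-fold AUCs of two algorithms (df = len(a) - 1)."""
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are equal; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))
```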
Experimental Results
Experimental Results:
• Average ROC curves plot the percentage of correctly classified malicious samples (true positive rate) against the percentage of legitimate samples falsely classified as malicious (false positive rate)
(ROC curves for each fold are given in the report.)
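A sketch of how per-fold ROC curves can be computed and averaged. Each curve is built by lowering the decision threshold over the anomaly scores; the curves are then vertically averaged, i.e. their TPRs are averaged on a shared grid of FPR values. Vertical averaging and the 101-point grid are assumptions; the slides do not state how the average was formed. Scores are assumed to be larger for more anomalous flows, and every fold must contain both classes.

```python
from typing import List, Sequence, Tuple

Curve = Tuple[List[float], List[float]]  # (FPR values, TPR values)


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> Curve:
    """ROC points obtained by lowering the threshold from the highest score.

    labels: 1 = malware (positive), 0 = legitimate (negative).
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        tp += label
        fp += 1 - label
        # Emit a point only after the last sample sharing this score, so
        # tied scores move the curve in one step
        if i == len(pairs) - 1 or pairs[i + 1][0] != score:
            fpr.append(fp / n_neg)
            tpr.append(tp / n_pos)
    return fpr, tpr


def tpr_at(x: float, fpr: List[float], tpr: List[float]) -> float:
    """TPR at false positive rate x, interpolating linearly between ROC points.

    Where the curve rises vertically at x, the top of that segment is returned.
    """
    j = max(i for i, f in enumerate(fpr) if f <= x)  # last point at or left of x
    if j == len(fpr) - 1 or fpr[j] == x:
        return tpr[j]
    w = (x - fpr[j]) / (fpr[j + 1] - fpr[j])
    return tpr[j] + w * (tpr[j + 1] - tpr[j])


def mean_roc(folds: Sequence[Tuple[Sequence[float], Sequence[int]]], grid_size: int = 101) -> Curve:
    """Vertical averaging: mean TPR of the per-fold curves on a shared FPR grid."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    curves = [roc_points(s, y) for s, y in folds]
    mean_tpr = [sum(tpr_at(x, f, t) for f, t in curves) / len(curves) for x in grid]
    return grid, mean_tpr
```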
Experimental Results:
• Cross-validated ROC curves plot the percentage of correctly classified malicious samples (true positive rate) against the percentage of legitimate samples falsely classified as malicious (false positive rate)
Kaiten vs. Neris malware (ROC curves for each fold are given in the report.)
Lessons learned from initial results:
• No single algorithm consistently outperforms the others
• Detection performance decreases as malware families evolve, e.g., from Zeus V1 to Zeus V2
• Recent malware traffic is stealthier and evades detection by disguising its traffic
Understanding the sources of false negatives and false positives:
Number of malware flows classified as legitimate HTTP(S)
Number of legitimate HTTP(S) flows classified as malware
• Values shown are means; the standard deviation is within ±0.53 for all families
• Port numbers are used as ground-truth labels
• The C4.5 algorithm is used for classification
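C4.5 chooses splits by gain ratio (information gain divided by split information) and tests binary thresholds on continuous features such as these flow features. The sketch finds the best threshold for a single feature, with string labels standing in for the port-derived ground truth (e.g., HTTP(S) vs. other). It is a simplified illustration: the rest of C4.5 (recursive tree growth, missing-value handling, pruning, and its refinements to threshold selection) is not shown, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Optional, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values: Sequence[float], labels: Sequence[str]) -> Optional[Tuple[float, float]]:
    """Best binary split 'value <= threshold' on one continuous feature, by gain ratio.

    Returns (threshold, gain_ratio), or None if all values are equal.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([y for _, y in pairs])
    best: Optional[Tuple[float, float]] = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # thresholds only between distinct feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = base - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        # Split information: entropy of the partition sizes themselves
        split_info = entropy(["left"] * i + ["right"] * (n - i))
        ratio = gain / split_info
        if best is None or ratio > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, ratio)
    return best
```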
Detailed Analysis:
Confusion Matrix after cross validation
Baseline classifier (majority class) vs. C4.5 algorithm (more details are given in the report)
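The majority-class baseline predicts the most frequent training label for every flow, so a useful classifier has to beat it. The sketch shows that baseline and a plain confusion-matrix count (rows are true classes, columns are predicted classes); the class names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     classes: Sequence[str]) -> List[List[int]]:
    """Rows = true class, columns = predicted class, in the order given by `classes`."""
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m


def majority_baseline(y_train: Sequence[str], n_test: int) -> List[str]:
    """Predict the most frequent training label for every test sample."""
    majority = Counter(y_train).most_common(1)[0][0]
    return [majority] * n_test
```

Because the baseline never predicts the minority class, its confusion matrix has an empty column, which makes its false negatives easy to read off.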
Network Behavior of Malware Families:
Log-scale plot of the ratio of incoming to outgoing packet bytes
• The most similar HTTP traffic is observed between malware and legitimate traces, with ratios ranging from constant to varying
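The slides do not define the plotted quantity precisely; one plausible reading, sketched here as an assumption, is the base-10 log of a flow's total incoming bytes over its total outgoing bytes. On a log scale, imbalances in either direction are symmetric: a flow receiving ten times what it sends scores about +1, one sending ten times what it receives scores about -1. The 1-byte pseudo-count is also an assumption, added to avoid division by zero.

```python
import math
from typing import Sequence


def log_in_out_ratio(incoming: Sequence[int], outgoing: Sequence[int]) -> float:
    """log10 of total incoming bytes over total outgoing bytes for one flow.

    0 means balanced traffic; positive values mean the host mostly receives.
    A pseudo-count of 1 byte avoids division by zero and log(0).
    """
    return math.log10((sum(incoming) + 1) / (sum(outgoing) + 1))
```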
Analysis of Feature Space of Malware (Code Reuse):
Feature projection to two-dimensional space using PCA, with K-means clustering
• Tbot and Kaiten are close to each other and form a single cluster. Agabot, however, lies farther from the other malware families. Zeus V1, ZeusGameover, ZeusPonyloader, ZeusV2 and Sality fall in a similar feature range, and most of their instances are assigned to the same clusters
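A two-dimensional PCA projection can be sketched with NumPy's SVD: centre (and here also standardize, an assumption) the feature matrix, then project onto the first two right singular vectors. K-means can then be run on the projected points or on the full feature vectors; the slides do not say which.

```python
import numpy as np


def pca_2d(X: np.ndarray) -> np.ndarray:
    """Project the rows of X (flows x features) onto their first two principal components."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant features at zero instead of dividing by zero
    Z = (X - X.mean(axis=0)) / std
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:2].T
```

Standardizing matters here because raw byte counts would otherwise dominate features such as flow duration in the projection.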
Recent papers:
• Use multiple sources of information, i.e., features extracted not only from packets but also from IP addresses, DNS, HTTP requests, etc.
‣ T. Nelms, R. Perdisci, and M. Ahamad. ExecScent: Mining for new C&C domains in live networks with adaptive control protocol templates. In Proc. USENIX Security Symposium, 2013
• Focus on the pre-infection phase, whereas we assume that hosts are already infected and generating traffic, which is more challenging
‣ L. Invernizzi, S.-J. Lee, S. Miskovic, M. Mellia, R. Torres, C. Kruegel, S. Saha, and G. Vigna. Nazca: Detecting malware distribution in large-scale networks. In Proc. Network and Distributed System Security Symposium (NDSS), 2014
• Detection accuracy is mostly high because these works do not restrict themselves to tamper-resistant features
‣ Port numbers, flags, and payload are used
Conclusion/Discussion
• We presented a framework that evaluates detection performance on malware heartbeat traffic after it blends into legitimate application traffic
• Our framework effectively discriminates most of the C&C heartbeat traffic from legitimate traffic using only tamper-resistant transport-layer features
• We observe a substantial decrease in detection for recent malware families
‣ Their traffic is disguised as HTTP traffic to evade detection
• Code reuse is a common practice in malware families
• We discuss the importance of a tamper-resistant feature space, and of multiple sources of information, for reducing false negatives by improving the underlying feature space
Key Papers
Methodology and Insights:
F. Kocak, D. J. Miller, and G. Kesidis. Detecting anomalous latent classes in a batch of network traffic flows. In Proc. Information Sciences and Systems (CISS), 2014
Feature Selection:
Wei Li, Marco Canini, Andrew W Moore, and Raffaele Bolla. Efficient application identification and the temporal and spatial stability of classification schema, Computer Networks, 2009
Anomaly Detection Algorithms:
V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. In ACM Computing Surveys, 2009.
State of the art paper in this research area:
G. Gu, R. Perdisci, J. Zhang, W. Lee, et al. BotMiner: Clustering analysis of network traffic for protocol- and structure-independent botnet detection. In Proc. USENIX Security Symposium, 2008
QUESTIONS