THE RISE OF DGA MALWARES
ENRICO HUGO, S.KOM., CEH
IDNOG 4TH CONFERENCE | 27 JULY 2017 | JAKARTA, INDONESIA
AGENDA
• Distributed Denial of Service
• Botnet Architectures
• Domain Generation Algorithm
• DGA Detection Techniques
• Reverse Engineering
• Zipf’s Law
• Maximum Consonant Sequence Length
• Hierarchical Clustering
DISTRIBUTED DENIAL OF SERVICE
• DDoS is a prominent current threat, as seen in recent news on cyber attacks
• Mirai, for example, employs hundreds of thousands of infected network devices to perform DDoS
• These devices form a network of zombies or bots, so-called “botnet”
• A botnet is controlled by a person or a group of people known as the “botmaster(s)”
• Botmasters issue commands to the botnet after the bots have successfully established connections to the Command-and-Control (C&C) server(s)
BOTNET C&C LOOKUP
• A bot establishes a connection with its C&C server by first looking up the server's IP address
• Regardless of its architecture / topology, botnets mostly use fluxing
• There are two types of fluxing:
• IP Flux
• Domain Flux
IP FLUX
• A single Fully Qualified Domain Name (FQDN) associated with many constantly-changing IP addresses
• There are two types of IP Fluxing techniques:
• Single Flux
• Double Flux
DOMAIN FLUX
• Many FQDNs resolve to a single IP address
• Most of the time this IP address is the IP address of the proxy, not the actual C&C server
• One of the most popular techniques nowadays is the Domain Generation Algorithm (DGA)
DEFINITION
Domain generation algorithms (DGA) are algorithms seen in various families of malware that are used to periodically generate a large number of domain names that can be used as rendezvous points with their command and control servers.
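To make the definition concrete, here is a minimal toy sketch of a date-seeded DGA. It is not taken from any real malware family: the function name `toy_dga`, the seed string, the SHA-256 construction and the parameters are all assumptions chosen for illustration. The point it shows is that a bot and its botmaster can independently compute the same list of rendezvous domains from a shared seed and the current date.

```python
import hashlib
from datetime import date

def toy_dga(seed: str, day: date, count: int = 5, length: int = 10, tld: str = "com"):
    """Generate `count` pseudo-random domain names for a given day (illustrative only)."""
    domains = []
    for i in range(count):
        # Anyone who knows the seed and the date can recompute this material.
        material = f"{seed}-{day.isoformat()}-{i}".encode()
        digest = hashlib.sha256(material).hexdigest()
        # Map each hex digit (0-15) to a letter a-p so the label looks like random text.
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:length])
        domains.append(f"{label}.{tld}")
    return domains

# Hypothetical seed; the date is the day of this talk.
todays = toy_dga("example-seed", date(2017, 7, 27))
for name in todays:
    print(name)
```

Because the output depends only on the seed and the date, a defender who recovers the algorithm can precompute the same list, which is the idea behind the reverse engineering approach discussed later.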
CHARACTERISTICS
• NXDOMAIN responses
• Randomness usually appears in the second-level (2LD) or third-level (3LD) domain label
• A lot of requests from the same IP address
• Generated names range from completely unreadable strings (not compliant with Zipf’s Law) to dictionary words (harder to detect)
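The first and third characteristics can be checked together by counting NXDOMAIN responses per client. The sketch below assumes each log record has already been parsed into a `(client_ip, query_name, rcode)` tuple; that layout, the sample records (apart from the Conficker domain quoted elsewhere in this deck) and the threshold of 2 are illustrative assumptions, not a real log format or a recommended setting.

```python
from collections import Counter

def nxdomain_counts(records):
    """Count NXDOMAIN responses per client IP.

    `records` is an iterable of (client_ip, query_name, rcode) tuples.
    This tuple layout is an assumption, not a real DNS log format.
    """
    return Counter(ip for ip, _name, rcode in records if rcode == "NXDOMAIN")

# Made-up sample records for illustration.
sample = [
    ("10.0.0.5", "vofwxlbi.cn", "NXDOMAIN"),
    ("10.0.0.5", "qzkrtpwm.example", "NXDOMAIN"),
    ("10.0.0.5", "google.com", "NOERROR"),
    ("10.0.0.9", "detik.com", "NOERROR"),
    ("10.0.0.9", "typo.example", "NXDOMAIN"),
]

counts = nxdomain_counts(sample)
# Arbitrary threshold: flag clients with at least 2 NXDOMAIN answers.
suspects = [ip for ip, n in counts.items() if n >= 2]
print(suspects)
```

A single mistyped name also produces an NXDOMAIN, so a count like this is only a first filter and needs a threshold tuned to the monitored network.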
MALWARES USING DGA
• Kraken
• Conficker
• Gameover Zeus
• Pykspa
• Cryptolocker
• Dyre
• Darkshell
• Locky
• Mad Max
• PandaBanker
• Pushdo
• Ramnit
• Srizbi
• Torpig
• Virut
• etc.
DGA DETECTION TECHNIQUES
• Reverse Engineering (Generating Regular Expressions for DGA Detection)
• Zipf’s Law (Detecting the Existence of DGA within Log Files)
• Maximum Consonant Sequence Length (Detecting the DGA within Log Files)
• Hierarchical Clustering (Clustering Log Files)
DGARCHIVE
• Daniel Plohmann, Khaled Yakdan, Michael Klatt, Johannes Bader, and Elmar Gerhards-Padilla published a paper entitled “A Comprehensive Measurement Study of Domain Generating Malware” in which they discussed the many different categories of malware DGAs.
• In addition, they also managed to create DGArchive, a repository of DGA regexes from 69 malware families obtained by reverse engineering malware samples.
• Using the regexes, it is possible to generate a list of AGDs for the current day and use it as a blacklist before a DGA attack even starts.
DRAWBACK OF REGEX
• The regex provided by DGArchive is too generic
• For example, the DGA regular expression of Darkshell is [\s\S]{6}\.com, and google.com fits this regex
• Some other detection measures are necessary
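The false positive is easy to reproduce. The snippet below applies the Darkshell pattern quoted above to a few legitimate names using Python's standard `re` module; the list of test names is an arbitrary choice for illustration.

```python
import re

# Darkshell pattern as quoted on this slide: any six characters, then ".com".
DARKSHELL_PATTERN = re.compile(r"[\s\S]{6}\.com")

names = ["google.com", "amazon.com", "apple.com", "detik.com"]
results = {name: DARKSHELL_PATTERN.fullmatch(name) is not None for name in names}

for name, matched in results.items():
    print(name, matched)
# google.com True
# amazon.com True
# apple.com False
# detik.com False
```

Any legitimate `.com` domain with a six-character label matches, so the pattern alone cannot separate Darkshell AGDs from ordinary traffic.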
ZIPF’S LAW
Zipf's law states that given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. Thus the most frequent word will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, and so on.
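In symbols, writing f(r) for the frequency of the word at rank r (this notation is an assumption introduced here, not taken from the slides), the idealized form of the law is:

```latex
f(r) \approx \frac{f(1)}{r},
\qquad \text{so } f(2) \approx \tfrac{1}{2} f(1), \quad f(3) \approx \tfrac{1}{3} f(1).
```

Real corpora follow this only approximately, which is why the law is used here as a rough test of "natural-looking" text rather than an exact fit.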
N-GRAM FREQUENCIES
Let’s take facebook.com as an example:
• Unigrams = [‘f’, ‘a’, ‘c’, ‘e’, ‘b’, ‘o’, ‘o’, ‘k’, ‘c’, ‘o’, ‘m’]
• Bigrams = [‘fa’, ‘ac’, ‘ce’, ‘eb’, ‘bo’, ‘oo’, ‘ok’, ‘co’, ‘om’]
• Trigrams = [‘fac’, ‘ace’, ‘ceb’, ‘ebo’, ‘boo’, ‘ook’, ‘com’]
The bigram frequency:
• fa = 1
• ac = 1
• ce = 1
• eb = 1
• bo = 1
• oo = 1
• ok = 1
• co = 1
• om = 1
The unigram frequency:
• f = 1
• a = 1
• c = 2
• e = 1
• b = 1
• o = 3
• k = 1
• m = 1
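The n-gram lists above can be reproduced with a few lines of Python. This sketch assumes, as the example does, that n-grams are taken inside each dot-separated label and never span the dot; the helper name `ngrams` is hypothetical.

```python
from collections import Counter

def ngrams(domain: str, n: int):
    """Return the character n-grams of each dot-separated label, in order."""
    grams = []
    for label in domain.lower().split("."):
        # Labels shorter than n contribute nothing (empty range).
        grams.extend(label[i:i + n] for i in range(len(label) - n + 1))
    return grams

unigrams = ngrams("facebook.com", 1)
bigrams = ngrams("facebook.com", 2)
trigrams = ngrams("facebook.com", 3)
unigram_freq = Counter(unigrams)

print(bigrams)
# ['fa', 'ac', 'ce', 'eb', 'bo', 'oo', 'ok', 'co', 'om']
print(unigram_freq["o"], unigram_freq["c"])
# 3 2
```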
BIGRAM FREQUENCY OF LOG FILE
Given a DNS log file containing a list of domain names as follows:
• google.com
• facebook.co.id
• apple.com
• youtube.com
• klikbca.com
• twitter.com
• detik.com
The sorted bigram frequencies would be:
• co = 7
• om = 6
• ik = 2
• le = 2
• oo = 2
• ac = 1
• ca = 1
• it = 1
• ce = 1
• ap = 1
• go = 1
• et = 1
• gl = 1
• er = 1
• pp = 1
• tw = 1
• tt = 1
• tu = 1
• li = 1
• ti = 1
• te = 1
• pl = 1
• be = 1
• de = 1
• yo = 1
• bc = 1
• bo = 1
• wi = 1
• fa = 1
• eb = 1
• kb = 1
• ok = 1
• og = 1
• ut = 1
• kl = 1
• ou = 1
• ub = 1
• id = 1
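The counts for this log can be reproduced with `collections.Counter`. As in the facebook.com example, bigrams are assumed to be taken within each dot-separated label; the helper name `label_bigrams` is hypothetical.

```python
from collections import Counter

LOG = [
    "google.com", "facebook.co.id", "apple.com", "youtube.com",
    "klikbca.com", "twitter.com", "detik.com",
]

def label_bigrams(domain):
    """Yield the bigrams of each dot-separated label of a domain name."""
    for label in domain.lower().split("."):
        for i in range(len(label) - 1):
            yield label[i:i + 2]

bigram_freq = Counter(bg for domain in LOG for bg in label_bigrams(domain))
ranked = bigram_freq.most_common()  # sorted by frequency, highest first

print(ranked[:5])
print(len(bigram_freq), sum(bigram_freq.values()))
# 38 52
```

The order of bigrams that share the same count depends on the order in which they are first seen, so only the frequencies, not the exact ordering of the ties, should be compared with the list above.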
CONVERTING FREQUENCIES TO FREQUENCY RATIOS
• There are 38 distinct bigrams in the given DNS log file
• The sum of all 38 bigram frequencies is 52
• The most frequent bigram occurs 7 times, a frequency ratio of 7/52
• The least frequent bigrams occur once each, a frequency ratio of 1/52
• Therefore the maximum and minimum bigram frequency ratios are 0.1346 and 0.0192 respectively
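The arithmetic can be checked directly. The list below simply restates the frequencies from the previous slide (one 7, one 6, three 2s and thirty-three 1s); the variable names are illustrative.

```python
# Bigram frequencies from the example log: co=7, om=6, ik=le=oo=2, and 33 singletons.
frequencies = [7, 6, 2, 2, 2] + [1] * 33

total = sum(frequencies)
# Frequency ratio = count of one bigram / count of all bigram occurrences.
ratios = sorted((f / total for f in frequencies), reverse=True)
max_ratio, min_ratio = ratios[0], ratios[-1]

print(len(frequencies), total)
# 38 52
print(round(max_ratio, 4), round(min_ratio, 4))
# 0.1346 0.0192
```

Plotting `ratios` against rank gives the rank-frequency curve used to compare log files in the next slide.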
AGD VS HGD
• When the sorted bigram frequency ratios are plotted, Algorithmically-Generated Domains (AGD), such as the Conficker and Pykspa worm domains, produce a relatively straight line, while Human-Generated Domains (HGD), like Alexa’s Top 500 sites, produce an elbow-shaped graph.
• This observation leads to a formula for calculating the probability that a given log file contains DGA domains, i.e. that a DGA attack is under way. The higher the DGA probability rate, the more likely an ongoing DGA attack within the monitored log.
DISCOVERING DGA WITHIN LOG FILES
• Further observation of the polluted log file (identified using Zipf’s Law) reveals one of the most prominent DGA characteristics, which lets us distinguish AGDs from HGDs more reliably: the Maximum Consonant Sequence (MCS) Length. Generally, AGDs have a larger MCS Length than HGDs.
• Example:
• google.com has a maximum consonant sequence length of 2, since the longest consonant sequence is “gl”
• vofwxlbi.cn, one of the domains generated by Conficker worm, has a Maximum Consonant Sequence Length of 5 and the longest sequence is “fwxlb”
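A minimal sketch of the MCS computation follows. It assumes that the vowels are a, e, i, o, u (so y and w count as consonants) and that dots, digits and hyphens end a run; the slides do not state these rules, and the function name is hypothetical.

```python
def max_consonant_sequence(domain: str) -> str:
    """Return the longest run of consecutive consonants in a domain name."""
    vowels = set("aeiou")  # assumption: every other letter is a consonant
    best = current = ""
    for ch in domain.lower():
        if ch.isalpha() and ch not in vowels:
            current += ch
            if len(current) > len(best):
                best = current
        else:
            # Vowels, dots, digits and hyphens all break the run.
            current = ""
    return best

for name in ["google.com", "vofwxlbi.cn"]:
    seq = max_consonant_sequence(name)
    print(name, seq, len(seq))
# google.com gl 2
# vofwxlbi.cn fwxlb 5
```

In a log-wide scan, the MCS Length of each queried name could be compared against a threshold, which would have to be chosen from the traffic being monitored.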
FEATURES
Level 1
• Query Class
• Query Type
Level 2
• Response Code
Level 3
• Query Length
• Numeric Chars
Level 4
• Query Label
Level 5
• Numeric Chars
ACCURACY OF DETECTION
• Applying the accuracy formula yields 0.913, i.e. about 91% accuracy
COUNTERMEASURES – DNS RPZ
• Obtain daily DGA log file from http://data.netlab.360.com/feeds/dga/dga.txt
• Parse the feed using the dnsanalysis library in Python
• Export the domains to a text file and load them into a DNS RPZ zone
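The export step can be sketched in plain Python, used here in place of the dnsanalysis library, whose API is not shown in these slides. The sketch assumes the feed is tab-separated with the domain in the second column and that comment lines start with `#`; both are assumptions about the feed layout, and the sample lines are made up apart from the Conficker domain quoted earlier. Each output line is an RPZ record of the form `name CNAME .`, which tells an RPZ-capable resolver to answer NXDOMAIN for that name.

```python
def feed_to_rpz(lines, domain_column=1):
    """Turn DGA feed lines into RPZ records that make the resolver answer NXDOMAIN."""
    seen = set()
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) <= domain_column:
            continue  # malformed line: no domain column
        domain = fields[domain_column].strip().lower().rstrip(".")
        if domain and domain not in seen:
            seen.add(domain)
            # Owner name is relative to the RPZ zone origin; "CNAME ." means NXDOMAIN.
            records.append(f"{domain} CNAME .")
    return records

# Made-up sample in the assumed column layout.
sample_feed = [
    "# family\tdomain\tvalid-from\tvalid-to",
    "conficker\tvofwxlbi.cn\t2017-07-27 00:00:00\t2017-07-27 23:59:59",
    "conficker\tvofwxlbi.cn\t2017-07-27 00:00:00\t2017-07-27 23:59:59",
]

rpz_records = feed_to_rpz(sample_feed)
print("\n".join(rpz_records))
# vofwxlbi.cn CNAME .
```

The resulting lines would be appended after the SOA and NS records of the RPZ zone file; the zone name and resolver configuration depend on the DNS server in use.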
REFERENCES
• Botnet Communication Topologies
https://www.damballa.com/downloads/r_pubs/WP_Botnet_Communications_Primer.pdf
• A Comprehensive Measurement Study of Domain Generating Malware
https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_plohmann.pdf
• DGArchive – A deep dive into domain generating malware
https://www.botconf.eu/wp-content/uploads/2015/12/OK-P06-Plohmann-DGArchive.pdf
• Using DNS RPZ to Block Malicious DNS Requests
https://blogs.cisco.com/security/using-dns-rpz-to-block-malicious-dns-requests