1

Detecting Malicious Flux Service Networks through Passive Analysis of Recursive DNS Traces

Speaker: Jun-Yi Zheng
2010/03/29

2

Reference

Roberto Perdisci, Igino Corona, David Dagon, and Wenke Lee. "Detecting Malicious Flux Service Networks through Passive Analysis of Recursive DNS Traces." ACSAC '09

3

Outline

Introduction
System Architecture
Experiments
Conclusions

4

Introduction

Fast-flux service networks (FFSNs)
  A new (~2007) technique to maximize botnet availability
  Simple idea: add an additional indirection layer (i.e., a proxy) between victims and controlling elements
  A large number of proxy hosts (flux agents) are used to relay requests to the back-end server (mother-ship)
  A decentralized botnet with constantly changing public DNS records

5

Fast-flux Botnet Architecture

6

Characteristics of Flux Domain Names

Short time-to-live (TTL)
The set of resolved IPs returned at each query changes rapidly
The overall set of resolved IPs obtained by querying the same domain name over time is often very large
The resolved IPs are scattered across many different networks

7

Approach

Passive analysis of recursive DNS (RDNS) traffic
  Not limited to domains found in email spam and precompiled domain blacklists
  Active probing may be detected by the attacker
Classify domains
  In previous works, single domain names are considered independently of each other

8

System Overview

9

Notation

$q^{(d)}$: a DNS query performed by a user at time $t_i$ to resolve the set of IP addresses owned by domain name $d$
$Q_i^{(d)}$: the total number of DNS queries related to $d$ seen up to time $t_i$
$T^{(d)}$: the TTL of the DNS response
$\check{T}_i^{(d)}$: the maximum TTL ever observed for $d$
$P^{(d)}$: the set of resolved IPs returned by the RDNS server
$\mathrm{prefix}(P^{(d)}, 16)$: the set of distinct /16 network prefixes extracted from $P^{(d)}$
$R_i^{(d)}$: the cumulative set of all the resolved IPs ever seen for $d$ up to time $t_i$
$G_i^{(d)}$: a sequence of pairs $\{(t_j, r_j^{(d)})\}_{j=1..i}$, where $r_j^{(d)} = |R_j^{(d)}| - |R_{j-1}^{(d)}|$
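As a rough illustration of how these per-domain quantities could be maintained while responses arrive, the sketch below folds each query into a small record. The class name `DomainState`, its field names, and the choice to append a pair to G only when new IPs appear are assumptions made for this example and are not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class DomainState:
    """Hypothetical per-domain record holding Q, the maximum TTL, R and G."""
    queries: int = 0                              # Q: number of queries seen so far
    max_ttl: int = 0                              # maximum TTL observed so far
    resolved: set = field(default_factory=set)    # R: cumulative set of resolved IPs
    growth: list = field(default_factory=list)    # G: pairs (t_j, r_j)

    def update(self, t, ttl, ips):
        """Fold one query q = (t, T, P) into the state and return r_j."""
        before = len(self.resolved)
        self.resolved |= set(ips)
        r = len(self.resolved) - before           # r_j = |R_j| - |R_{j-1}|
        self.queries += 1
        self.max_ttl = max(self.max_ttl, ttl)
        if r > 0:                                 # assumption: record growth events only
            self.growth.append((t, r))
        return r
```

Calling `update` once per observed response keeps the record current, so the later filters can read Q, R and G without rescanning the trace.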

10

Traffic Volume Reduction (F1)

$q^{(d)} = (t_i, T^{(d)}, P^{(d)})$

F1-a) $T^{(d)} \le 10800$ seconds (i.e., 3 hours)
  Domain names with a TTL above 10800 seconds are unlikely to be "fluxing"
F1-b) $|P^{(d)}| \ge 3$ OR $T^{(d)} \le 30$ seconds
  Because the uptime of each flux agent is not easily predictable, flux domains return a large set of resolved IPs or use a very low TTL (equal or close to zero)
F1-c) $p = |\mathrm{prefix}(P^{(d)}, 16)| \, / \, |P^{(d)}| \ge 1/3$
  Flux agents are often scattered across many different networks and organizations
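The F1 rules can be sketched as a single predicate over one DNS response. This is a minimal illustration that assumes the three rules are combined with AND and that IPs are dotted-quad IPv4 strings; the function names `prefix16` and `passes_f1` are hypothetical.

```python
def prefix16(ips):
    """Return the set of distinct /16 prefixes of dotted-quad IPv4 strings."""
    return {".".join(ip.split(".")[:2]) for ip in ips}


def passes_f1(ttl, ips, max_ttl=10800, min_ips=3, low_ttl=30, min_diversity=1 / 3):
    """Return True if a response (T, P) passes F1-a, F1-b and F1-c."""
    ips = set(ips)
    if not ips:
        return False                                  # nothing resolved, nothing to keep
    f1a = ttl <= max_ttl                              # F1-a: TTL of at most 3 hours
    f1b = len(ips) >= min_ips or ttl <= low_ttl       # F1-b: many IPs or very low TTL
    f1c = len(prefix16(ips)) / len(ips) >= min_diversity  # F1-c: /16 prefix diversity
    return f1a and f1b and f1c                        # assumption: rules are ANDed
```

For example, `passes_f1(300, ["1.2.3.4", "5.6.7.8", "9.10.11.12"])` keeps the response, while the same IPs with a TTL of 86400 seconds are dropped by F1-a.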

11

Periodic List Pruning (F2)

$d = (t_i, Q_i^{(d)}, \check{T}_i^{(d)}, R_i^{(d)}, G_i^{(d)})$

F2-a) $Q_i^{(d)} > 100$ AND $|G_i^{(d)}| < 3$ AND ($|R_i^{(d)}| \le 5$ OR $p \le 0.5$)
  If F2-a holds, remove $d$ from the list of candidate flux domains
  Domain names that do not pass F2 are very unlikely to be related to flux services
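The pruning rule can be written as a short predicate. The sketch below assumes that p is the /16 prefix diversity computed over the cumulative IP set R and that |G| counts only the time steps at which new IPs appeared; both are readings of the slide rather than stated facts, and the function names are hypothetical.

```python
def prefix16_diversity(ips):
    """Ratio of distinct /16 prefixes to distinct IPs (0.0 for an empty set)."""
    ips = set(ips)
    if not ips:
        return 0.0
    prefixes = {".".join(ip.split(".")[:2]) for ip in ips}
    return len(prefixes) / len(ips)


def should_prune(queries, growth_events, resolved_ips):
    """F2-a: True if the candidate domain should be removed from the list.

    queries       -- Q, number of queries seen for the domain
    growth_events -- |G|, assumed to count steps where new IPs appeared
    resolved_ips  -- R, cumulative set of resolved IPs
    """
    p = prefix16_diversity(resolved_ips)          # assumption: p is computed over R
    return (queries > 100
            and growth_events < 3
            and (len(set(resolved_ips)) <= 5 or p <= 0.5))
```

A domain queried 500 times that only ever resolved to two IPs in one /16 is pruned, while a domain with few queries is kept until more evidence accumulates.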

12

Domain Clustering

IP-based domain clustering
  A number of fast-flux domain names all point to the same flux service
Single-linkage hierarchical clustering algorithm
  Input: a similarity matrix; Output: a dendrogram
  The length of the edges represents the distance between clusters

$$\mathrm{sim}(\alpha, \beta) = \frac{|R^{(\alpha)} \cap R^{(\beta)}|}{|R^{(\alpha)} \cup R^{(\beta)}|} \cdot \frac{1}{1 + e^{\gamma - \min(|R^{(\alpha)}|, |R^{(\beta)}|)}} \in [0, 1], \quad \gamma = 3$$
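A minimal sketch of IP-based clustering follows. It reads the similarity as the Jaccard overlap of two domains' cumulative IP sets, damped by a sigmoid of the smaller set size with a constant of 3. Treating `1 - sim` as the distance and the `cut` height are assumptions of this example. Cutting a single-linkage dendrogram at a given height is equivalent to taking connected components over all pairs within that distance, which is what the union-find below does instead of building the full dendrogram.

```python
import math


def sim(r_a, r_b, gamma=3.0):
    """Similarity in [0, 1] between two domains given their cumulative IP sets."""
    union = r_a | r_b
    if not union:
        return 0.0
    jaccard = len(r_a & r_b) / len(union)
    # Sigmoid weight: small IP sets give little evidence, so they score lower.
    weight = 1.0 / (1.0 + math.exp(gamma - min(len(r_a), len(r_b))))
    return jaccard * weight


def single_linkage_clusters(resolved, cut):
    """Group domains whose single-linkage distance (1 - sim) is <= cut.

    resolved -- dict mapping domain name to its set of resolved IPs
    cut      -- hypothetical dendrogram cut height in [0, 1]
    """
    domains = list(resolved)
    parent = {d: d for d in domains}

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]   # path halving
            d = parent[d]
        return d

    for i, a in enumerate(domains):
        for b in domains[i + 1:]:
            if 1.0 - sim(resolved[a], resolved[b]) <= cut:
                parent[find(a)] = find(b)   # merge the two clusters

    clusters = {}
    for d in domains:
        clusters.setdefault(find(d), set()).add(d)
    return list(clusters.values())
```

The pairwise loop is quadratic in the number of domains, which is acceptable for a sketch but would need a library implementation or blocking by IP for lists of realistic size.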

13

Service Classifier

“Passive” features: collected by passively monitoring the DNS queries
  Ψ1: Number of resolved IPs
  Ψ2: Number of domains
  Ψ3: Avg. TTL per domain
  Ψ4: Network prefix diversity
    The ratio between the number of distinct /16 network prefixes and the total number of IPs
  Ψ5: Number of domains per network
    How many domains can be associated with the IPs in a cluster, throughout different epochs
  Ψ6: IP growth ratio

$$\Psi_6 = \frac{1}{|C_i|} \sum_{d \in C_i} \frac{|R^{(d)}|}{Q^{(d)}}$$
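To make the passive features concrete, the sketch below computes Ψ1 to Ψ4 and Ψ6 for one cluster, reading Ψ6 as the mean of |R(d)| / Q(d) over the domains in the cluster. The input layout (a dict with `ips`, `queries` and `ttls` per domain), the key names, and taking Ψ3 as the mean of per-domain average TTLs are assumptions of this example. Ψ5 is omitted because it needs data from several epochs.

```python
def passive_features(cluster):
    """Compute Psi1-Psi4 and Psi6 for a non-empty cluster.

    cluster -- dict mapping each domain name to a dict with keys
               "ips" (cumulative set R), "queries" (Q > 0) and
               "ttls" (non-empty list of observed TTLs)
    """
    all_ips = set().union(*(d["ips"] for d in cluster.values()))
    prefixes = {".".join(ip.split(".")[:2]) for ip in all_ips}
    avg_ttls = [sum(d["ttls"]) / len(d["ttls"]) for d in cluster.values()]
    growth = [len(d["ips"]) / d["queries"] for d in cluster.values()]
    return {
        "psi1_num_ips": len(all_ips),
        "psi2_num_domains": len(cluster),
        "psi3_avg_ttl": sum(avg_ttls) / len(avg_ttls),
        "psi4_prefix_diversity": len(prefixes) / len(all_ips),
        "psi6_ip_growth_ratio": sum(growth) / len(growth),
    }
```

The resulting dictionary is one feature vector per cluster, which is the kind of input a decision-tree classifier would consume.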

14

Service Classifier

“Active” features: need some additional external information to be computed
  Ψ7: Autonomous System (AS) diversity
  Ψ8: BGP prefix diversity
  Ψ9: Organization diversity
  Ψ10: Country Code diversity
  Ψ11: Dynamic IP ratio
    Perform a reverse (type PTR) DNS lookup for each IP and search the result for keywords such as “dhcp”, “dsl”, “dial-up”, etc.
  Ψ12: Average Uptime Index
    Obtained by actively probing each IP in a cluster about six times a day for a predefined number of days

C4.5 decision-tree classifier
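As one example of an active feature, the dynamic IP ratio can be sketched as a keyword match over reverse-DNS names that have already been collected. The function name, the input layout, and the choice to divide by all IPs in the cluster (including those without a PTR record) are assumptions; only the three keywords named on the slide are used.

```python
DYNAMIC_KEYWORDS = ("dhcp", "dsl", "dial-up")   # keywords named on the slide


def dynamic_ip_ratio(ptr_names):
    """Fraction of IPs whose PTR name contains a dynamic-address keyword.

    ptr_names -- dict mapping each IP to its reverse-DNS name,
                 or None when the PTR lookup returned nothing
    """
    if not ptr_names:
        return 0.0
    dynamic = sum(
        1
        for name in ptr_names.values()
        if name and any(k in name.lower() for k in DYNAMIC_KEYWORDS)
    )
    return dynamic / len(ptr_names)
```

Keeping the lookup itself outside the function makes the feature easy to test offline; the PTR names could be gathered separately, for instance with `socket.gethostbyaddr`.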

15

Collecting RDNS Traffic

2009/3/1 ~ 2009/4/14
Two traffic sensors in front of two different RDNS servers of an ISP
More than 4 million users
About 1.3 billion DNS queries of type A and CNAME per sensor
Over 2.5 billion DNS queries per day, related to hundreds of millions of distinct domain names

16

Evaluation of the Service Classifier

We manually inspected and labeled a fairly large number of clusters of domains

Features            AUC             DR              FP
All Features        0.992 (0.003)   99.7% (0.36)    0.3% (0.36)
Passive Features    0.993 (0.005)   99.4% (0.53)    0.6% (0.53)
Ψ6, Ψ3, Ψ5          0.989 (0.006)   99.3% (0.49)    0.7% (0.49)

Table I: Classification performance computed using 5-fold cross-validation. AUC = Area Under the ROC Curve; DR = Detection Rate; FP = False Positive Rate. The numbers in parentheses represent the standard deviation of each measure.

17

Can this Contribute to Spam Filtering?

Intuition
  If the domain name of a website points to one or more previously detected flux agents, it is very likely that the website is malicious
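This intuition amounts to a set intersection between the IPs a domain resolves to and the flux agents already detected. The sketch below is illustrative only; the function name and the use of plain IP strings are assumptions.

```python
def flux_agent_overlap(resolved_ips, known_flux_agents):
    """Return the resolved IPs that are previously detected flux agents."""
    return set(resolved_ips) & set(known_flux_agents)


def looks_malicious(resolved_ips, known_flux_agents):
    """Flag a domain if it points to at least one known flux agent."""
    return bool(flux_agent_overlap(resolved_ips, known_flux_agents))
```

For instance, a domain found in a spam email could be resolved once and passed through `looks_malicious` together with the agent IPs collected from the detected flux networks.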

18

Detection rate: 90% to 95%
Several of the domain names detected as malicious did not appear to have a “fluxing” behavior themselves, but resolved to a fixed set of IPs that partially intersected with the IPs of flux agents

19

Conclusions

A passive approach for detecting malicious flux service networks in the wild
  Not limited to the analysis of suspicious domain names extracted from spam emails or precompiled domain blacklists
Our passive detection and tracking of malicious flux service networks may benefit spam filtering applications
