Exploiting Temporal Persistence to Detect Covert Botnet Channels


Frederic Giroire (CNRS, France), Jaideep Chandrashekar, Nina Taft, Eve Schooler, and Dina Papagiannaki (Intel Research)

RAID’09

2009/9/4 Speaker: Li-Ming Chen

Outline

Introduction
Temporal Persistence
Design and Implementation
Dataset and Evaluation
Conclusion and Comments


Botnet

A botnet is a collection of compromised end-hosts
  Controlled by a bot-master
  Through a command and control (C&C) channel
  Used to launch various malicious activities: DDoS, spamming, stealing private data, etc.

Why are botnets so common and dangerous?
  Low maintenance cost and ease of use (e.g., through IRC)
  Non-technical criminals can buy or rent botnets
  Botnet-based underground economy


Botnet Detection

Traditional intrusion detection
  Misuse detection
    Drawback: only works for known attacks, and is easy to evade
  Anomaly detection
    Can detect activated zombie hosts
    But only after a delay: the time between a host joining a botnet and being instructed to carry out a malicious task

Directions for mitigating botnet problems
  1.) Prevent the recruitment
  2.) Detect the covert C&C channel (focus)
  3.) Detect attacks being carried out by the bots


Botnet Detection (related work)

Anomaly-based IRC channels detection (based on protocol/payload analysis)

BotHunter – chains together various alarms to detect a whole (or partial) botnet lifecycle (USENIX Sec. ’07)

BotSniffer – focuses on detecting the C&C server (NDSS ’08)
  Looks for similar behaviors toward the same destination (centralized botnet)

BotMiner – clusters attack traffic and normal (C&C) traffic, then performs cross-clustering to identify hosts that undertake both kinds of communication (USENIX Sec. ’08)


Objective of this Paper

Aim to detect botnet C&C communications on an end-host
  Define “destination atoms”
  Measure the temporal regularity (persistence) of individual destination atoms on each end-host
  Identify suspicious C&C communications

Compared with other detection techniques:
  Does not attempt to identify attack traffic in the traffic stream
  Does not attempt to correlate activities across hosts



Observations

C&C traffic:
  Each bot needs to communicate regularly with a C&C server
  This is a common behavior across different bots

This C&C communication might be very stealthy
  To avoid being detected

However, without “frequent” communication with a C&C server, the bot becomes invisible to the bot-master
  The bot still needs to maintain this communication over time

C&C communication may be infrequent but persistent


Observations (cont’d)

Normal communications
  An end-host, on any particular day, may communicate with a large set of destination end-points
  However, most of these destinations are transient
    Contacted a few times and never again
  A smaller, stable set of destinations is visited regularly
    Work-related sites, news/entertainment websites, sites contacted by applications

Need to distinguish C&C traffic from these regularly visited destinations


Approach (how to exploit temporal persistence to detect botnet channels)

Introduce a notion called “destination atoms” and a metric called “persistence” to capture lightweight yet regular communication

Training:
  Persistent destination atoms are added to a host’s whitelist during a training period
  The whitelist requires infrequent updating (due to the persistence)

Test:
  Track the persistence of new destination atoms that are not already whitelisted
  Identify the C&C traffic and destination
  For stealthy attacks: track persistence at multiple timescales concurrently



Destination Atoms

A destination atom is an aggregation of destinations
  Only care about the network service being connected to, not so much the actual destination IP address
  E.g., the particular addresses that respond to google.com vary by location and time (but the user just wants to access the Google service)

Mapping:
  Given (dstIP, dstPort, proto), obtain the atom (dstService, dstPort, proto)


Example of Destination Atoms

Destination Atoms contacted by somehost.intel.com


How to extract Services? (by heuristic)

1. If the src. and dst. belong to different domains, the service name is simply the 2nd-level domain name of the dst.
   E.g., google.com, yahoo.com

2. If the src. and dst. belong to the same domain, the service name is the 3rd-level domain name
   E.g., mail.intel.com, print.intel.com


How to extract Services? (by heuristic) (cont’d)

3. Utilize application-level information (when higher-level application semantics are available)
   E.g., dst. atom for FTP service: (ftp.service.com, 21:>1024, tcp)

4. Use the destination port to distinguish services on a single destination host (one that provides a number of distinct services)

5. When an address cannot be mapped to a name, use the IP address as the service name
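Heuristics 1, 2, and 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the reverse DNS lookup is assumed to have been done by the caller (who passes `dst_name`, or `None` when no name is available), and "2nd-level domain" is taken naively as the last two labels, which is wrong for suffixes such as co.uk.

```python
from typing import Optional, Tuple

Atom = Tuple[str, int, str]  # (dstService, dstPort, proto)

def last_labels(name: str, count: int) -> str:
    """Return the last `count` dot-separated labels of a host name."""
    return ".".join(name.split(".")[-count:])

def destination_atom(src_name: str, dst_ip: str, dst_port: int,
                     proto: str, dst_name: Optional[str] = None) -> Atom:
    """Map a connection to a destination atom (hypothetical helper)."""
    if dst_name is None:
        # Heuristic 5: no name available, fall back to the IP address.
        service = dst_ip
    elif last_labels(dst_name, 2) == last_labels(src_name, 2):
        # Heuristic 2: same domain, use the 3rd-level domain name.
        service = last_labels(dst_name, 3)
    else:
        # Heuristic 1: different domains, use the 2nd-level domain name.
        service = last_labels(dst_name, 2)
    return (service, dst_port, proto)

external = destination_atom("somehost.intel.com", "74.125.0.1", 80, "tcp",
                            "www.google.com")
internal = destination_atom("somehost.intel.com", "10.0.0.5", 25, "tcp",
                            "mail.intel.com")
unnamed = destination_atom("somehost.intel.com", "192.0.2.7", 6667, "tcp")
```

Heuristics 3 and 4 are left out because they depend on application-level information that a header-only sketch does not have.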


Persistence

Host A generates outgoing traffic, observed through a (sliding) observation window W that is divided into measurement windows s

W ≡ [s1, s2, …, sn]

The persistence of a destination atom d in the observation window W is defined as:

  p(d) = \frac{1}{n} \sum_{i=1}^{n} 1_d(s_i)

  1_d(s_i) is an indicator function: it returns 1 if si > 0 (the host contacted d during si), and 0 if si = 0

Say d is persistent if:

  p(d) > p^*

  p* is a pre-defined threshold
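As a worked example, persistence can be read as the fraction of measurement windows in which the atom was contacted at least once. The sketch below assumes each window is summarized by a connection count; the function names are hypothetical, and the default threshold of 0.6 is the value chosen later in these slides.

```python
from typing import Sequence

def persistence(window_counts: Sequence[int]) -> float:
    """Fraction of measurement windows with at least one connection."""
    if not window_counts:
        raise ValueError("observation window must contain at least one slot")
    active = sum(1 for count in window_counts if count > 0)
    return active / len(window_counts)

def is_persistent(window_counts: Sequence[int], p_star: float = 0.6) -> bool:
    """An atom is persistent when its persistence exceeds the threshold."""
    return persistence(window_counts) > p_star

# Ten measurement windows; the atom was contacted in seven of them.
counts = [3, 0, 1, 1, 0, 2, 1, 0, 1, 1]
p = persistence(counts)          # 7 / 10 = 0.7
flag = is_persistent(counts)     # 0.7 > 0.6
```

Note that the number of connections inside a window does not matter, only whether there was any, which is why a low-volume channel can still score high.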


Persistence in Multiple Timescales

Botnets differ from one another, and we cannot know a priori the frequency of C&C communication
  Need to design a method that can track persistence over several observation windows simultaneously

Timescale:
  Select k overlapping timescales, from the smallest to the largest:

  (W_1, s_1), (W_2, s_2), \ldots, (W_k, s_k)

  For each timescale j, compute the persistence p^{(j)}(d)

And the judgment of persistence is: d is persistent if p^{(j)}(d) exceeds p* at any one of the timescales, i.e. \max_j p^{(j)}(d) > p^*
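A minimal sketch of the multi-timescale decision, assuming the rule is that exceeding p* at any single timescale is enough (that is, the maximum persistence over the timescales is compared against p*). The function names and the dictionary layout (measurement window size in hours mapped to that timescale's slots) are hypothetical.

```python
from typing import Dict, Sequence, Tuple

def persistence(slots: Sequence[int]) -> float:
    """Fraction of slots in which the atom was seen (non-zero entry)."""
    return sum(1 for slot in slots if slot) / len(slots)

def judge(slots_by_timescale: Dict[int, Sequence[int]],
          p_star: float = 0.6) -> Tuple[bool, int, Dict[int, float]]:
    """Return (is_persistent, best_timescale, per-timescale scores)."""
    scores = {s: persistence(slots) for s, slots in slots_by_timescale.items()}
    best = max(scores, key=scores.get)
    return scores[best] > p_star, best, scores

# An atom contacted once every 4 hours looks sparse at the 1-hour timescale
# but fills every slot at the 4-hour timescale.
observed = {
    1: [1, 0, 0, 0, 1, 0, 0, 0, 1, 0],
    4: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
}
persistent, timescale, scores = judge(observed)
```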


Multiple Timescales - Implementation

Size of the measurement window
  {s1, s2, …, s7} = {1, 4, 8, 12, 16, 20, 24} (hours)
  In preliminary analysis, 87% of connections to the same destination atom are separated by at least 1 hr

Choose n = 10
  Wj = n * sj
  From the smallest timescale (Wmin = 10, smin = 1) to the largest (Wmax = 240, smax = 24)

Implementation
  k separate bitmaps? Not necessary
  Each slot sj at the smallest timescale (bitmap) is covered by a slot in the next higher timescale
  A higher-timescale slot is the OR of the smaller slots it covers
  Therefore, only need to construct a single long bitmap that covers all the timescales
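The OR trick can be illustrated as follows: keep one bitmap at the smallest timescale (one bit per hour) and derive each coarser slot by OR-ing the hourly bits it covers. This is a sketch under assumptions: the function names are hypothetical, and slots are aligned to the most recent n * s hours, which may differ from the alignment used in the paper.

```python
from typing import List, Sequence

def coarsen(hour_bits: Sequence[bool], s: int, n: int = 10) -> List[bool]:
    """OR each run of s hourly bits into one slot, over the last n*s hours."""
    needed = n * s
    if len(hour_bits) < needed:
        raise ValueError("not enough history for this timescale")
    recent = hour_bits[-needed:]
    return [any(recent[i:i + s]) for i in range(0, needed, s)]

def persistence_at(hour_bits: Sequence[bool], s: int, n: int = 10) -> float:
    """Persistence at timescale s, computed from the single hourly bitmap."""
    return sum(coarsen(hour_bits, s, n)) / n

# 240 hours of history (W_max = 10 * 24); the atom is contacted every 4th hour.
hour_bits = [hour % 4 == 0 for hour in range(240)]
p_1h = persistence_at(hour_bits, 1)    # 2 of the last 10 hours are active
p_4h = persistence_at(hour_bits, 4)    # every 4-hour slot contains a contact
p_24h = persistence_at(hour_bits, 24)  # every 24-hour slot contains a contact
```

With p* = 0.6 this atom would be missed at the 1-hour timescale and caught at the 4-hour one, which is the motivation for tracking several timescales at once.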


Compute Persistence

Bitmaps are stored in the DCT, indexed by individual atoms d (one bitmap for each atom d)
  Each bitmap covers multiple timescales
  Each entry keeps the bitmap length and an index to the current bit (ring buffer)

At the end of each smin, compute the persistence for all dst. atoms

(A separate process handles each outgoing connection and checks whether the destination atom is whitelisted)
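The ring-buffer bitmap can be sketched as a fixed-length list of bits plus an index to the current slot. The class and method names below are hypothetical and only illustrate the bookkeeping, not the authors' data structure.

```python
from typing import List

class RingBitmap:
    """Fixed-length bitmap with a moving index to the current slot."""

    def __init__(self, length: int) -> None:
        self.bits: List[bool] = [False] * length
        self.idx = 0  # position of the current (most recent) slot

    def mark(self) -> None:
        """Record that the atom was contacted during the current slot."""
        self.bits[self.idx] = True

    def advance(self) -> None:
        """Move to the next slot, overwriting the oldest one."""
        self.idx = (self.idx + 1) % len(self.bits)
        self.bits[self.idx] = False

    def recent(self, k: int) -> List[bool]:
        """Return the k most recent slots, oldest first."""
        n = len(self.bits)
        return [self.bits[(self.idx - k + 1 + i) % n] for i in range(k)]

rb = RingBitmap(4)
rb.mark()        # contact in slot 0
rb.advance()     # slot 1: no contact
rb.advance()
rb.mark()        # contact in slot 2
first = rb.recent(3)
for _ in range(3):
    rb.advance() # wraps around and overwrites slots 3, 0, 1
second = rb.recent(4)
```

A table keyed by destination atom would hold one such bitmap per atom, and `advance` would be called on every entry at the end of each smallest measurement window.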


Whitelist – Training and Detection

Training and detection stages proceed (almost) identically
  Persistence of destinations is tracked, and an alarm is raised when it crosses a specified threshold

Training:
  An alarm simply results in the atom being inserted into the whitelist

Detection:
  Check the whitelist
  An alarm is exposed to the end user for further analysis (if benign, insert into the whitelist; if malicious, block connections)
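The difference between the two stages comes down to what happens when a non-whitelisted atom crosses the threshold. A minimal sketch, with hypothetical names and string return values chosen only for illustration:

```python
from typing import Set, Tuple

Atom = Tuple[str, int, str]  # (dstService, dstPort, proto)

def on_persistent_atom(atom: Atom, whitelist: Set[Atom], training: bool) -> str:
    """Decide what to do with an atom whose persistence crossed p*."""
    if atom in whitelist:
        return "ignore"            # already known to be benign
    if training:
        whitelist.add(atom)        # training: the alarm just grows the whitelist
        return "whitelisted"
    return "alarm"                 # detection: expose to the end user

whitelist: Set[Atom] = set()
mail = ("mail.intel.com", 25, "tcp")
suspect = ("192.0.2.7", 6667, "tcp")

r1 = on_persistent_atom(mail, whitelist, training=True)
r2 = on_persistent_atom(mail, whitelist, training=False)
r3 = on_persistent_atom(suspect, whitelist, training=False)
```

In detection mode the end user's verdict would then either add the atom to the whitelist or block its connections.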



End Host Traffic Traces

Collected at 157 hosts over a 5-week period (2007/1~2)
  All packet headers were collected

Data is divided into training and testing sets
  The training set is used to determine the threshold and build the per-user whitelists
  The testing set is used to assess detection performance (FP rate and FN rate)


Botnet Traffic Traces

Collect botnet binaries, execute them on a WinXP SP2 VM, and generate botnet traffic
  No other IP traffic is sent out of the VM
  Hard work: binaries crash, C&C server not found; only 12 binaries work!

In the test dataset, this botnet traffic is overlaid on top of the normal traffic traces (conn./min.)


System Properties

For the system to work well, whitelists should be:
  Stable, changing infrequently
  Small, which speeds up searching

[Figure: CDF of p(d) across all the atoms seen in training data]
  A user typically has few persistent dst. atoms
  Select p* = 0.6

[Figure: Distribution of per-host whitelist sizes computed using p* = 0.6 (total 157 users)]
  Sizes are small and manageable


C&C Detection

Overlaid bot trace data on top of each of the 157 user traces

[Table: Various properties of the detected botnets; persistence (> 0.6); s = (1, 4, 8, 16, 20, 24), W = 10 * s]
  A botnet might use multiple timescales for different dst. atoms!
  Stealthy botnets
  Also detects non-centralized (P2P) botnets


C&C Detection (cont’d)

Use the ROC curve to compute the FP and detection rates
  In an enterprise network, the FP rate might be low (well-behaved users); however, in the real world, the FP rate will rise!
  Whitelist applications, e.g., BitTorrent

[Figure: ROC curve]
  Knee: best threshold

[Figure: FP across users (p* = 0.6; total 157 users)]
  A small number of users see a large number of alarms
  (avg. 5.3 benign dst. atoms per user)


Detecting Botnet Attack Traffic

Study how whitelists can boost the detection rates of more traditional volume-based anomaly detectors
  A whitelist contains known good destinations
  Traffic going to these destinations must be “anomaly free” (and can be filtered out)
  Use a simple connection count detector with a 99.9th-percentile threshold

After filtering, the detection rate is better (e.g., Aimbot-25, VB-666)
  The benefit of filtering is apparent when botnet traffic volumes are low to moderate
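The effect of filtering can be illustrated with a toy connection count detector. Everything here is an assumption made for the example: the function names, the nearest-rank percentile, the per-minute windows, the use of plain strings as atoms, and the synthetic traffic. It is not the detector configuration or data from the paper.

```python
import math
from typing import FrozenSet, Iterable, List, Tuple

def percentile_threshold(counts: List[int], q: float = 99.9) -> int:
    """Nearest-rank percentile of the training counts."""
    ordered = sorted(counts)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

def counts_per_window(conns: Iterable[Tuple[int, str]], num_windows: int,
                      whitelist: FrozenSet[str] = frozenset()) -> List[int]:
    """Count connections per window, skipping whitelisted atoms."""
    counts = [0] * num_windows
    for window, atom in conns:
        if atom not in whitelist:
            counts[window] += 1
    return counts

whitelist = frozenset({"intel.com"})

# Synthetic training traffic: 100 one-minute windows, dominated by a
# whitelisted atom, plus a trickle of other destinations.
train = [(m, "intel.com") for m in range(100) for _ in range(20)]
train += [(m, "other.example") for m in range(100) for _ in range(m % 3)]

raw_thr = percentile_threshold(counts_per_window(train, 100))
filt_thr = percentile_threshold(counts_per_window(train, 100, whitelist))

# One test window: 5 benign connections plus 6 low-volume bot connections.
test = [(0, "intel.com")] * 5 + [(0, "bot.example")] * 6
raw_alarm = counts_per_window(test, 1)[0] > raw_thr
filt_alarm = counts_per_window(test, 1, whitelist)[0] > filt_thr
```

In this toy setting the unfiltered detector misses the bot because benign volume inflates the threshold, while the filtered one flags it, which matches the observation that filtering helps most at low to moderate botnet traffic volumes.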



Conclusions

Introduce “persistence” as a temporal measure of regularity in connections to “destination atoms”
  Persistence does not require any protocol semantics or payload inspection to detect the malware

Describe a method that builds whitelists of known good destination atoms
  In order to isolate persistent destinations (likely C&C channels)

Evaluation shows that the proposed method successfully identified C&C destinations in every botnet instance

The proposed method can also boost the traditional detection algorithm by filtering traffic


My Comments

Uses a multi-resolution approach to explore the temporal behavior of a bot
  A bot connects to C&C server(s) periodically

Can cooperate with other botnet detection techniques (not host-based)

In detection, raising an alarm does not imply finding the attack
  Further analysis of the destination and the traffic is required

What are the limitations of the multi-resolution approach?
