Detecting Spammers with SNARE: Spatio-temporal Network-level Automatic Reputation Engine
Shuang Hao, Nadeem Ahmed Syed, Nick Feamster, Alexander G. Gray, Sven Krasser
Klevis Luli
Slide 2
SNARE Overview
Sender reputation system that automatically classifies email senders based on various network-level features.
o No content checking; lightweight
o Not blacklisting
Topics:
o Features that help distinguish spammers from legitimate senders
o Automated reputation engine
o Implementation
o Evasion and limitations
o Evaluation
o Future work
Slide 3
Single-packet features
No previous history from the IP address is required, only a single packet from the IP address in question. The receiver does not need to accept the connection request.
o Geographic (geodesic) distance: spam tends to travel longer geographic distances between sender and receiver.
o Sender neighborhood density: a cluster of senders in a small address space could be a botnet.
o Probability ratio of spam to ham (legitimate email) at the time of day the IP packet arrives: legitimate email follows a characteristic time-of-day pattern.
o AS number of the sender: more reliable than the IP address; a large fraction of spam comes from a small number of ASes.
o Open ports on the sender: legitimate mail senders usually provide other services, so they listen on more than one port.
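As a rough illustration of how three of these single-packet features could be computed, here is a minimal Python sketch. The function names, the haversine formula for geodesic distance, the use of the k = 20 nearest IPv4 neighbors for density, and the add-one-smoothed ratio P(hour | spam) / P(hour | ham) are all assumptions made for this example, not details taken from the slides. It also assumes that sender and receiver coordinates already come from some IP geolocation database.

```python
import ipaddress
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def geodesic_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (haversine formula) between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def neighborhood_density(sender_ip, recent_sender_ips, k=20):
    """Average numeric distance from sender_ip to its k nearest neighbors in IPv4 space.

    A small value means many senders packed into a small address range (botnet-like).
    k=20 is an assumed default for this sketch.
    """
    x = int(ipaddress.IPv4Address(sender_ip))
    dists = sorted(
        abs(x - int(ipaddress.IPv4Address(ip))) for ip in recent_sender_ips if ip != sender_ip
    )[:k]
    return sum(dists) / len(dists) if dists else float("inf")


def time_of_day_ratios(history):
    """history: iterable of (hour 0-23, is_spam). Returns {hour: P(hour|spam) / P(hour|ham)}.

    Add-one smoothing over the 24 hours avoids division by zero for empty hours.
    """
    spam, ham = Counter(), Counter()
    for hour, is_spam in history:
        counter = spam if is_spam else ham
        counter[hour] += 1
    n_spam, n_ham = sum(spam.values()), sum(ham.values())
    return {
        h: ((spam[h] + 1) / (n_spam + 24)) / ((ham[h] + 1) / (n_ham + 24))
        for h in range(24)
    }
```

For example, geodesic_distance_km(0, 0, 0, 1) is one degree of arc along the equator, roughly 111.2 km. The AS-number and open-port features are omitted because they depend on external routing tables and port scans.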
Slide 4
Single-packet features
Slide 5
Single-header and single-message features
Collected after looking at SMTP headers or messages, so the receiver must accept the connection. They provide increased confidence.
o Number of recipients in the To field: spam usually has more recipients than ham.
o Length of message: spam tends to be short, and its length varies less than that of legitimate email.
Aggregate features
Constructed if some history from an IP address is available. By summarizing behavior over multiple messages and over time, these aggregate features may yield a more reliable prediction. They are built from the per-message values of:
o geodesic distance between the sender and recipient
o number of recipients in the To field of the SMTP header
o message body length in bytes
This comes at the cost of increased latency, because messages need to be collected first.
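The sketch below shows one way to extract the two per-message features and then aggregate them per sender IP. It assumes, for illustration only, that aggregation means the mean and population standard deviation of each feature; the helper names are hypothetical, and the body-length helper only measures single-part messages.

```python
import statistics
from collections import defaultdict
from email import message_from_string
from email.utils import getaddresses

FEATURES = ("distance_km", "n_recipients", "body_len")


def single_message_features(raw_message):
    """Return (number of addresses in To, body length in bytes) for one raw message.

    Simplification: only single-part messages are measured; multipart bodies count as 0.
    """
    msg = message_from_string(raw_message)
    addresses = [addr for _, addr in getaddresses(msg.get_all("To", [])) if addr]
    body = msg.get_payload(decode=True) or b""  # get_payload returns None for multipart
    return len(addresses), len(body)


def aggregate_features(records):
    """records: iterable of (sender_ip, distance_km, n_recipients, body_len).

    Returns {sender_ip: {feature: (mean, population std dev)}} over that IP's messages.
    """
    by_ip = defaultdict(list)
    for ip, *values in records:
        by_ip[ip].append(values)
    summary = {}
    for ip, rows in by_ip.items():
        columns = zip(*rows)  # one tuple per feature, in FEATURES order
        summary[ip] = {
            name: (statistics.fmean(col), statistics.pstdev(col))
            for name, col in zip(FEATURES, columns)
        }
    return summary
```

A per-IP summary like this can only be computed once several messages from that IP have been seen, which is the latency cost noted above.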
Slide 6
Automated reputation engine
Uses the RuleFit supervised learning algorithm, which fits a linear combination of base learner functions f_k(x) of the input variables x: $F(x) = a_0 + \sum_{k=1}^{K} a_k f_k(x)$.
o Rules from a decision tree are used as base learners.
o Once trained, it classifies email automatically.
o It can evaluate the relative importance of features: input variables that frequently appear in important rules or basis functions are deemed more relevant.
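To make the RuleFit description concrete, the following sketch scores a message with a small, hand-written rule ensemble and computes feature importance in the style of Friedman and Popescu: each rule k gets importance |a_k| sqrt(s_k(1 - s_k)), where s_k is its support, and that importance is shared equally among the variables in the rule. The rules, coefficients, supports, and feature names are hypothetical; real RuleFit learns rules from tree ensembles, fits the coefficients with an L1 (lasso) penalty, and can also include linear terms, which are omitted here.

```python
import math
from dataclasses import dataclass


@dataclass
class Rule:
    conditions: list  # (feature, low, high) triples; fires if low <= x[feature] < high for all
    coef: float       # a_k, fitted by RuleFit's L1-penalized linear stage
    support: float    # s_k, fraction of training messages on which the rule fires

    def fires(self, x):
        return all(low <= x[f] < high for f, low, high in self.conditions)


def rulefit_score(x, intercept, rules):
    """F(x) = a_0 + sum_k a_k r_k(x), where r_k(x) is 1 if rule k fires and 0 otherwise."""
    return intercept + sum(r.coef for r in rules if r.fires(x))


def feature_importance(rules):
    """Rule importance I_k = |a_k| sqrt(s_k (1 - s_k)), shared equally by the rule's variables."""
    importance = {}
    for r in rules:
        i_k = abs(r.coef) * math.sqrt(r.support * (1 - r.support))
        names = {f for f, _, _ in r.conditions}
        for f in names:
            importance[f] = importance.get(f, 0.0) + i_k / len(names)
    return importance


# Hypothetical rules over two SNARE-style features (not learned from real data).
RULES = [
    Rule([("n_recipients", 5, math.inf)], coef=2.0, support=0.5),
    Rule([("n_recipients", 5, math.inf), ("distance_km", 3000, math.inf)], coef=1.0, support=0.5),
]
```

With an intercept of -1.0, a message with 10 recipients sent from 5000 km away fires both rules and scores 2.0. Because n_recipients appears in both rules, it ends up with more importance than distance_km, matching the idea that variables in important rules are deemed more relevant.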
Slide 7
Implementation
Other scenarios:
o A standalone DNS-based blacklist
o A first-pass filter before existing mechanisms
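For the standalone blacklist scenario, a DNS-based blacklist conventionally answers queries of the form reversed-IPv4-octets.zone (the convention documented in RFC 5782). The sketch below builds such a query name and shows a first-pass routing decision on a reputation score. The zone name snare.example.org, the function names, and the reject/accept thresholds are hypothetical placeholders, not values from the slides.

```python
import ipaddress
import socket

SNARE_ZONE = "snare.example.org"  # hypothetical zone name


def dnsbl_query_name(ip, zone=SNARE_ZONE):
    """Build a DNSBL-style query name: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone=SNARE_ZONE):
    """True if the zone returns any A record for ip; lookup failures count as not listed."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:  # NXDOMAIN (and other resolver errors) in this simplified sketch
        return False


def first_pass(score, reject_at=0.9, accept_below=0.1):
    """Route a message using a spam-likeness score in [0, 1] (hypothetical thresholds)."""
    if score >= reject_at:
        return "reject"
    if score < accept_below:
        return "accept"
    return "content_filter"  # uncertain cases go on to the existing filtering mechanisms
```

In the first-pass role, only the uncertain middle band reaches the heavier content-based filters, which is where a lightweight network-level score saves work.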
Slide 8
Evasion and Limitations
o AS numbers: a robust indicator of malicious hosts; it is not easy for spammers to move mail servers or bot armies to a different AS.
o Message length: knowing that SNARE checks the length of messages, a spammer might start to randomize the lengths of his emails.
o Nearest neighbor: hard to modify. However, the botnet controller could direct bots on the same subnet to target different sets of destinations.
o Open ports: legitimate hosts could be blocking port scans.
o Geodesic distance: spammers could modify bots to send to closer recipients.
o Number of recipients: spammers could send to individual recipients one by one.
o Time of day: botnets could send email during legitimate peak hours to look legitimate.
Authors' main argument: the above changes are difficult or would limit the flexibility and efficiency of the botnet.
Other limitations: scaling, web-based email accounts.
Slide 9
Evaluation
14 days of data, October 22, 2007 to November 4, 2007. The data trace is divided into two parts:
o The first half is used for the measurement study.
o The second half is used to evaluate SNARE's performance.
RuleFit is trained with 1 million messages randomly sampled from each day (5% to 95% spam-to-ham ratio).
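The evaluation setup can be sketched as follows: the 14-day trace is split at its midpoint (October 22 to 28 for measurement, October 29 to November 4 for evaluation), up to 1 million messages are sampled per day, and detection and false positive rates are computed at a score threshold. The (day, message) input format, the fixed seed, and the detection_rates helper are assumptions for illustration, not the authors' evaluation code.

```python
import random
from collections import defaultdict
from datetime import date, timedelta

START = date(2007, 10, 22)
N_DAYS = 14
MEASUREMENT_DAYS = {START + timedelta(days=i) for i in range(N_DAYS // 2)}  # Oct 22 to Oct 28
EVALUATION_DAYS = {START + timedelta(days=i) for i in range(N_DAYS // 2, N_DAYS)}  # Oct 29 to Nov 4


def split_trace(messages):
    """messages: iterable of (day, message). Returns (measurement, evaluation) message lists."""
    measurement, evaluation = [], []
    for day, msg in messages:
        if day in MEASUREMENT_DAYS:
            measurement.append(msg)
        elif day in EVALUATION_DAYS:
            evaluation.append(msg)
    return measurement, evaluation  # days outside the trace window are dropped


def sample_per_day(messages, per_day=1_000_000, seed=0):
    """Uniformly sample up to per_day messages from each day (fixed seed for repeatability)."""
    rng = random.Random(seed)
    by_day = defaultdict(list)
    for day, msg in messages:
        by_day[day].append(msg)
    return {day: rng.sample(msgs, min(per_day, len(msgs))) for day, msgs in by_day.items()}


def detection_rates(scores, is_spam, threshold):
    """(true positive rate, false positive rate) when score >= threshold means spam.

    Assumes is_spam is a list containing both spam and ham examples.
    """
    tp = sum(1 for s, y in zip(scores, is_spam) if y and s >= threshold)
    fp = sum(1 for s, y in zip(scores, is_spam) if not y and s >= threshold)
    n_spam = sum(1 for y in is_spam if y)
    n_ham = len(is_spam) - n_spam
    return tp / n_spam, fp / n_ham
```

Sweeping the threshold over the evaluation half and recording both rates at each value traces out the trade-off between catching spam and misclassifying legitimate mail.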
Slide 10
Evaluation
Slide 11
Future Work
o Incorporating temporal features into the classification engine
o Making SNARE more evasion-resistant
o Refining the whitelist