JOHN P. JOHN, FANG YU
YINGLIAN XIE, MARTÍN ABADI
ARVIND KRISHNAMURTHY
PRESENTATION BY SAM KLOCK
Searching the Searchers with SearchAudit
Motivation (cont’d)
Search engines open opportunities for attackers
  Construct clever queries
  Find vulnerable sites
  Plant malware; spam (e.g., MyDoom)
  Do so stealthily and cheaply
Mitigation strategy: identify malicious queries
  May be able to deny results to user
  Identify attackers (probably bots)
  Interpret strategy, then anticipate and prevent
The question: how to do so
Proposed Approach
SearchAudit: framework for generating malicious queries
Input:
  Seed set of known malicious queries
  Search logs
Output:
  Large set of suspicious queries
  Regular expressions matching queries
Figure: seed set + search logs -> SearchAudit -> expanded set + regular expressions
Example queries (seed set, expanded set):
  inurl:gotoURL.asp?url=
  filetype:asp inurl:"shopdisplayproducts.asp"
  ext:pl inurl:cgi intitle:"FormMail *" -"*Referrer" -"* Denied" -sourceforge -error -cvs -input
  filetype:cgi inurl:tseekdir.cgi
  ...
Example regular expressions:
  "/includes/joomla\.php" site:\.[a-zA-Z]{2,3}
  "/includes/class_item\.php" site:[^?=#+@;&:]{2,4}
  "php-nuke" site:[^?=#+@;&:]{2,4}
  "modules\.php\?op=modload" site:\.[a-zA-Z0-9]{2,6}
Proposed Approach (cont’d)
Needed to implement:
  Seed set: milw0rm.com
  Search logs: Microsoft Research, Bing
  Way to expand seed set into more queries
  Way to infer regular expressions
Intended benefits:
  Harvesting lots of information
    Three months: ~1.2 TB of logs
  Interpret relationship between queries and attacks
  Use queries to find potential victims
  Stop attacks
Query Identification: Expansion
Basic idea: bootstrap on seed set
  Search logs for exact matches to seed queries
  Record IPs of hosts making seed queries
  Add other queries from those IPs to set
    Intuition: a host that makes one malicious query will probably make more
  Account for DHCP (IP addresses get reassigned): use only queries made on the same day
Figure: seed queries -> (log search) -> IP addresses -> (queries made on same day) -> queries made by IPs
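The expansion step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' Dryad/DryadLINQ implementation: the `LogEntry` record, its field names, and the function name `expand_seed_queries` are hypothetical, and the log is assumed to fit in memory.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, NamedTuple


class LogEntry(NamedTuple):
    """Hypothetical search-log record; the real log schema is not given in the slides."""
    day: str    # calendar day of the query, e.g. "2009-02-01"
    ip: str     # source IP address
    query: str  # raw query string


def expand_seed_queries(log: Iterable[LogEntry], seeds: set[str]) -> set[str]:
    """Return every query issued by an IP on a day when that IP made a seed query."""
    # Group queries by (IP, day) so that a DHCP-reassigned address on a
    # different day is not attributed to the same host.
    by_ip_day: dict[tuple[str, str], set[str]] = defaultdict(set)
    for entry in log:
        by_ip_day[(entry.ip, entry.day)].add(entry.query)

    expanded: set[str] = set()
    for queries in by_ip_day.values():
        # Exact match against the seed set marks this (IP, day) as suspicious.
        if queries & seeds:
            expanded |= queries
    return expanded
```

Grouping by (IP, day) rather than by IP alone is what keeps a later, unrelated user of the same address out of the expanded set.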
Query Identification: Regular Expressions
Goals:
  Account for variation in queries
  Take advantage of scripting
See paper for generation algorithm
Compute score for generated expressions
  Lower score: more specific
  Goal: discard overly general expressions (score > 0.6)
Consolidate to avoid overlap
Avoid proxies, public NAT for performance
Loopback for more queries
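The filtering and matching steps can be illustrated as follows. This sketch assumes the scores have already been computed by the generation algorithm in the paper; it does not reproduce that algorithm or its score definition. The function names and the scores in the usage example are hypothetical, and whether a score of exactly 0.6 is kept is an assumption, since the slides only say that scores above 0.6 are discarded.

```python
from __future__ import annotations

import re
from typing import Iterable

SCORE_THRESHOLD = 0.6  # from the slides: expressions scoring above this are too general


def keep_specific(scored_patterns: dict[str, float],
                  threshold: float = SCORE_THRESHOLD) -> list[re.Pattern[str]]:
    """Compile only the patterns whose (precomputed) score does not exceed the threshold."""
    return [re.compile(pattern)
            for pattern, score in scored_patterns.items()
            if score <= threshold]


def match_queries(patterns: Iterable[re.Pattern[str]], queries: Iterable[str]) -> set[str]:
    """Return the queries that are matched in full by at least one pattern."""
    compiled = list(patterns)
    return {q for q in queries if any(p.fullmatch(q) for p in compiled)}


# Usage with made-up scores: the first pattern is taken from the slides,
# the second is a deliberately over-general pattern for contrast.
scored = {
    r'"/includes/joomla\.php" site:\.[a-zA-Z]{2,3}': 0.1,
    r'.*php.*': 0.9,
}
specific = keep_specific(scored)
suspicious = match_queries(specific, ['"/includes/joomla.php" site:.de', "php tutorial"])
```

Feeding `suspicious` back in as a new seed set would correspond to the loopback step named on the slide.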
Query Identification: Results
Data from Bing and milw0rm
  500 queries
  Logs for Feb. 2009, Dec. 2009, Jan. 2010
    ~2 billion views per month
System implemented on Dryad/DryadLINQ
Initial observations:
  Using specificity scores < 0.6 seems to be effective
    Based on cookie heuristic
  Proxy elimination does not limit results
Query Identification: Results (cont’d)
Query expansion:
  122 of 500 queries matched in logs: 174 unique IPs
  Expanded to 800 unique queries, 264 IPs
Regular expressions matched 3,560 queries, 1,001 IPs
Incomplete seeds
  Tried with subsets of original set
  Coverage still good
Query Identification: Results (cont’d)
Loopback:
  Multiple loopbacks got more results
  One iteration is good enough
Overall statistics
  Tens of thousands of IPs each month
  Hundreds of thousands of unique queries each month
  Dec. 2009: a set of unusual attacker IPs causes a spike
Query Identification: Verification
Want to show queries are malicious
  Sometimes easy: 73% of queries associated with security/hacker sites
What about others?
  No ground truth exists
  So: look for bot-like features
    Individual level (one IP)
    Group level (multiple IPs)
Individual bots
  New cookie
  Whether a link was clicked
Groups of bots
  Data often fixed by botnets
    User agent string
    Metadata for requests
  Tendencies dictated by scripts
    Pages viewed per query
    Time between queries
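A rough sketch of how some of these features could be computed for one IP follows. The `QueryEvent` record and its fields are hypothetical, and summarizing inter-query time by its median is an assumption made for illustration; the slides name the features but not how they are aggregated.

```python
from __future__ import annotations

from statistics import median
from typing import NamedTuple, Sequence


class QueryEvent(NamedTuple):
    """Hypothetical per-query record for a single IP."""
    timestamp: float  # seconds since some epoch
    new_cookie: bool  # the request arrived without an existing cookie
    clicked: bool     # the user clicked at least one result link


def individual_features(events: Sequence[QueryEvent]) -> dict[str, float | None]:
    """Summarize bot-like signals for the queries issued by one IP."""
    if not events:
        raise ValueError("need at least one query event")
    count = len(events)
    times = sorted(e.timestamp for e in events)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        # Bots tend not to keep cookies, so most of their requests look new.
        "new_cookie_fraction": sum(e.new_cookie for e in events) / count,
        # Scripts scrape result pages and rarely click through.
        "click_fraction": sum(e.clicked for e in events) / count,
        # Scripted querying tends to produce regular spacing between queries.
        "median_gap_seconds": median(gaps) if gaps else None,
    }
```

Group-level checks would then compare these values, together with fixed fields such as the user agent string, across all IPs matched by the same regular expression.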
Query Identification: Verification (cont’d)
Substantial variation between host behavior for normal queries and suspicious queries
Observations on Stage One
Regular expressions can become obsolete
  Just need fresh logs and a new seed to get new ones
Attacker awareness of technique yields adaptation
  Example: mix in normal user queries
    Goal: trick SearchAudit into identifying the attacker as a proxy
    Hard to do: needs to be appropriate to time and place
    Anyway: proxy elimination is an optimization only
  Injecting randomness also possible, but makes querying less productive
  Could obviate cookie heuristic, but it is replaceable
All attackers need to be careful to succeed
Query Analysis
42,000 IPs gave suspicious queries globally
  U.S., Russia, China contribute almost 50%
  10% of IPs gave 90% of queries
Found 200 regular expressions
Reveal three kinds of attack-related queries:
  Vulnerable web sites
  Forum spamming
  Phishing on Windows Live Messenger
Queries for Vulnerable Websites
Queries look for exploitable server vulnerabilities
  GET variables embedded in URL (for SQL injection)
  Server software with known vulnerabilities (e.g., status pages)
SearchAudit as a defense:
  Pull suspicious queries for vulnerabilities
  Run queries; gather results
  Inspect results for vulnerabilities
  Notify sites of vulnerabilities
Example:
  inurl:index.php?content=X
  http://www.example.com/index.php?content=X'%20OR%20'1'%20OR%20'1=1'
Queries for Vulnerable Websites (cont’d)
With identified queries:
  Sampled 5,000 queries
  Obtained 80,490 URLs from 39,475 sites
Compared to malware/phishing lists:
  3-4% on anti-phishing lists
  1.5% on anti-malware lists
SQL injection vulnerability:
  Add a single-quote to variable in URL
  Look for SQL error
  12% of examined URLs showed an error
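The single-quote check can be sketched as two small helpers: one builds the modified URLs, the other inspects a response body that has already been fetched. The function names and the list of error markers are assumptions for illustration (the markers are typical MySQL and SQL Server messages, not the set used in the study), and the sketch makes no network requests itself.

```python
from __future__ import annotations

from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative, incomplete list of database error fragments (lowercase).
SQL_ERROR_MARKERS = (
    "you have an error in your sql syntax",  # MySQL
    "unclosed quotation mark",               # Microsoft SQL Server
)


def quote_probe_urls(url: str) -> list[str]:
    """Return one URL per GET variable, each with a single quote appended to that variable."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    probes = []
    for index, (name, value) in enumerate(params):
        mutated = params[:index] + [(name, value + "'")] + params[index + 1:]
        # urlencode percent-encodes the quote as %27.
        probes.append(urlunsplit(parts._replace(query=urlencode(mutated))))
    return probes


def looks_like_sql_error(body: str) -> bool:
    """Report whether a fetched response body contains a known SQL error fragment."""
    lowered = body.lower()
    return any(marker in lowered for marker in SQL_ERROR_MARKERS)
```

For `http://www.example.com/index.php?content=X`, the only probe produced is `http://www.example.com/index.php?content=X%27`.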
Queries for Forum Spamming
Query motivation:
  Find scriptable forums
  Good for spam, PageRank
Found 46 applicable regular expressions
Most IPs show transient behavior: probably bots
  All regular expression groups show at least one group similarity feature
IPs got less aggressive over time: more stealthy
Queries for Forum Spamming (cont’d)
Validation
  Project Honey Pot
    Dynamically generate e-mail address for each visiting IP
    E-mail received: must be spam
  12% of all IPs listed (vs. 0.5% for normal IPs)
Applications
  Use queries to find and clean targeted pages
  Deny results to malicious queries
Phishing via Windows Live Messenger
Queries triggered by normal users
  Victim receives message from a contact
  Follow link for party photos
  Taken to fake WLM login
  After giving credentials, redirected to Bing search for “party”
Bing search to avoid costs of hosting
Phishing via WLM (cont’d)
Detect via query referral field (source page)
  Found two regular expressions for referrals
  Both expressions: victim username embedded in URL
Over 180 phishing domains for 12 IPs detected
Compromised accounts show different login behaviors
Conclusion
Presented framework for finding suspicious queries
  Input: search logs, small set of seed queries
  Output: regular expressions, millions of suspicious queries
Analyzed suspicious queries
  Identified possible attacks
  Suggested means of prevention
Generally: attempted to demonstrate relationship between suspicious queries and the possibility of attack