A Crawler-based Study of Spyware on the Web
Authors: Alexander Moshchuk, Tanya Bragin, Steven D. Gribble, Henry M. Levy
Presented At: NDSS, 2006
Prepared By: Amit Shrivastava
Introduction
Spyware study: 80% of AOL users infected; 93 spyware components (known)
Goals: locate spyware on the Internet; gather Internet spyware statistics; quantitative analysis of spyware-laden content on the web
Introduction cont.
What is spyware? Crawling the web
Web executables Drive-by downloads
Results Improvements
Definition
Spyware: software that collects personal information about users without their knowledge
Spyware techniques: log keystrokes; collect web history; scan documents on hard disk
Types of Spyware
Spyware-infected executables: Content-type header; URL extension
Drive-by downloads: malicious web content; produce event triggers
Executable files
Finding executables: Content-type (HTTP header) indicates an executable (e.g., application/octet-stream); URL contains .exe, .cab, or .msi
Hidden executables: embedded file (.zip); URL hidden in JavaScript
Missed executables: hidden URL on dynamic page
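The two checks for finding executables (Content-type header and URL extension) can be sketched roughly as follows. This is an illustrative sketch, not the authors' crawler code: the function name `looks_like_executable` and the set of MIME types are assumptions, and the extension test here looks only at the end of the URL path.

```python
from urllib.parse import urlparse

# Extensions named on the slide.
EXECUTABLE_EXTENSIONS = (".exe", ".cab", ".msi")

# Assumed MIME types that commonly accompany binary downloads;
# the study's actual list is not given in these slides.
EXECUTABLE_CONTENT_TYPES = {
    "application/octet-stream",
    "application/x-msdownload",
}


def looks_like_executable(url, content_type=None):
    """Return True if the URL or the Content-type header suggests an executable."""
    path = urlparse(url).path.lower()
    if path.endswith(EXECUTABLE_EXTENSIONS):
        return True
    if content_type:
        # Drop parameters such as "; charset=..." before comparing.
        mime = content_type.split(";", 1)[0].strip().lower()
        return mime in EXECUTABLE_CONTENT_TYPES
    return False
```

A check like this cannot see executables packed inside archives or URLs that are only assembled by JavaScript, which matches the "hidden" and "missed" cases listed above.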
Executable files cont.
Download, install, and run in a clean VM: tool to automate the installer framework
EULA agreements: radio buttons and check boxes
Analyze file: Ad-Aware software; log identifies spyware program
Web Crawling
Heritrix public-domain Web crawler; searched 2,500+ web sites
Different categories:
1) Celebrity sites
2) Games sites
3) Music sites
4) Adult sites
5) Online news sites
6) Wallpaper sites
7) Pirate sites
Changing Spyware Environment
2 separate program crawls: May and October 2005
Most recent anti-spyware program used; October crawl detects more vulnerabilities
Executable Results
2 separate program crawls: May 2005 (18 million URLs); Oct 2005 (22 million URLs)
No appreciable change in spyware
Spyware Functions
Spyware-infected executables: contain various spyware functions; executables may have multiple functions
Spyware Upgrades
Spyware-infected executables: may have multiple spyware functions
1,294 infected .exe files found in Oct 2005: 880 detected; 414 new ones
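As a quick arithmetic check on the October figures, the sketch below assumes the 880 and 414 counts split the 1,294 infected executables into two disjoint groups (the variable names are illustrative only).

```python
infected_total = 1294      # infected executables found in Oct 2005
previously_detected = 880  # the "detected" group from the slide
newly_detected = 414       # the "new" group from the slide

# The two groups should account for every infected executable.
assert previously_detected + newly_detected == infected_total

share_new = newly_detected / infected_total
print(f"Share of new infected executables: {share_new:.1%}")
```

Under that assumption, roughly a third of the infected executables fall in the "new" group.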
Blacklisting Spyware
Block clients from accessing listed sites; done by firewall or proxy; blacklisting is ineffective
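A minimal sketch of the kind of host check a blacklisting proxy might perform is shown below. The function `is_blocked` and the example host names are hypothetical; this is not taken from the study.

```python
from urllib.parse import urlparse


def is_blocked(url, blacklist):
    """Return True if the URL's host, or any parent domain of it, is blacklisted."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # Check "a.b.c", then "b.c", then "c" against the list.
    return any(".".join(parts[i:]) in blacklist for i in range(len(parts)))


# Hypothetical blacklist for illustration.
blacklist = {"bad.example"}
print(is_blocked("http://cdn.bad.example/setup.exe", blacklist))
print(is_blocked("http://other.example/setup.exe", blacklist))
```

A list like this can only block hosts that are already on it, so any site not yet listed passes through unchecked.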
Drive-by Downloads
Spyware from visiting a web page: JavaScript embedded in HTML
Modifies system files; modifies registry entries
Event Triggers
Event occurs that matches a trigger
Trigger conditions: process creation; file activity (creation); suspicious process (file modification); registry file modified; browser/OS crash
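The trigger idea can be sketched as a simple match between events observed during a page visit and a set of trigger conditions. The `Event` type, the event kind names, and both functions are assumptions made for illustration; the slides do not describe how the study's monitoring was implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str         # hypothetical label for what was observed
    detail: str = ""  # free-form description, e.g. a path or key name


# Hypothetical labels mirroring the trigger conditions on the slide.
TRIGGER_KINDS = {
    "process_created",
    "file_created",
    "file_modified",
    "registry_modified",
    "crash",
}


def fired_triggers(events):
    """Return the observed events that match a trigger condition."""
    return [e for e in events if e.kind in TRIGGER_KINDS]


def is_suspicious_visit(events):
    """A page visit is flagged if at least one trigger fired."""
    return bool(fired_triggers(events))
```

For example, a visit that logs only a hypothetical `page_loaded` event would not be flagged, while one that also logs `registry_modified` would be.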
Drive-by Results
3 web crawls: May 2005 (45,000 URLs); Oct 2005 (same URLs); Oct 2005 (new URLs)
Decrease in infectious URLs
Increase in unique spyware programs
Origin of Drive-by DLs
Top 6 web categories (IE): Pirate sites, Celebrity, Music, Adult, Games, Wallpaper
Spyware Top 10
(Table: top 10 spyware programs, May 2005 and October 2005)
Spyware Trends
Decline in total number of spyware programs: increase of anti-spyware tools; automated patch installations; lawsuits against spyware distributors
Strengths
Analysis method: studies density of spyware on the Web; produces spyware trends over time
Calculated frequency of spyware on the web; distinguished security prompts (y/n)
Found 14% of spyware is malicious; density of spyware is substantial
Weaknesses
URL hidden in JavaScript, dynamic page
Limited by what Ad-Aware is able to detect
Different anti-spyware programs (May/Oct)
Did not crawl entire web