Cloak and Dagger
In a nutshell…
• Cloaking
• Cloaking in search engines
• Search engines’ response to cloaking
• Lifetime of cloaked search results
• Cloaked pages in search results
• Ubiquity of advertising on the Internet.
• Search, by and large, enjoys primacy.
• Search Engine Optimization (SEO) – doctoring of search results.
• For benign ends such as simplifying page content, optimizing load times, etc.
• For malicious purposes such as manipulating page ranking algorithms.
Cloaking
• Conceals the true nature of a Web site
• Keyword Stuffing – associating benign content with keywords
• Attracting traffic to scam pages
• Protecting the Web servers from being exposed
• Not scamming those who arrive at the site via different keywords.
Types of Cloaking
• Repeat Cloaking
• User Agent Cloaking
• Referrer Cloaking (sometimes also called “Click-through Cloaking”)
• IP Cloaking
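The four cloaking types above can be illustrated with a minimal sketch of the server-side decision logic a cloaker might use. All names and header checks here are hypothetical illustrations, not taken from the paper:

```python
# Hypothetical sketch of a cloaker's page-selection logic, combining the
# four cloaking types: user-agent, IP, referrer, and repeat cloaking.

CRAWLER_AGENTS = ("googlebot", "bingbot", "slurp")
SEARCH_REFERRERS = ("google.", "bing.", "yahoo.")

seen_ips = set()  # repeat cloaking: serve the scam page only once per IP

def choose_page(user_agent: str, referrer: str, client_ip: str,
                resolves_to_crawler: bool) -> str:
    ua = user_agent.lower()
    # User-agent cloaking: show benign content to known crawlers.
    if any(bot in ua for bot in CRAWLER_AGENTS):
        return "benign"
    # IP cloaking: a reverse-DNS lookup flagged this client as a crawler.
    if resolves_to_crawler:
        return "benign"
    # Referrer cloaking: only click-throughs from a search engine see the scam.
    if not any(se in referrer for se in SEARCH_REFERRERS):
        return "benign"
    # Repeat cloaking: a returning visitor gets the benign page.
    if client_ip in seen_ips:
        return "benign"
    seen_ips.add(client_ip)
    return "scam"
```

A real cloaker would combine these checks with server-side redirects; the point is only that each type keys off a different request property.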
DAGGER
Dagger encompasses five different functions –
• Collection of search terms
• Querying search results generated by search engines
• Crawling search results
• Detecting cloaking
• Repeating the above four processes to study variance in measurements
Collection of Search Terms
Two different kinds of cloaked search terms are targeted:
• TYPE 1: Search terms which contain popular words.
• Aimed at gathering high volumes of undifferentiated traffic.
• TYPE 2: Search terms which reflect highly targeted traffic
• Here cloaked content matches the cloaked search terms.
• TYPE 1: Use popular trending search terms
• Google Hot Searches – sheds light on search-engine-based data collection methods
• Alexa – client-based data collection methods
• Twitter trends – clue us in on social networking trends
• Cloaked page is entirely unrelated to the trending search terms
• TYPE 2: set of terms catering to a specific domain
• Content of the cloaked pages actually matches the search terms.
Querying Search Results
• Terms collected in the previous step are fed to the search engines
• Study the prevalence of cloaking across engines
• Examine their response to cloaking.
• Top 100 search results and accompanying metadata compiled into list
• “Known good” domain entries are eliminated in order to reduce false positives during data processing.
• Similar entries are grouped together with appropriate ‘count’.
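The result-compilation step above can be sketched in a few lines: take the top hits, drop known-good domains, and group the rest by domain with counts. The whitelist contents are illustrative assumptions, not the paper's actual list:

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical "known good" whitelist; the paper filters such domains
# to cut false positives before further processing.
KNOWN_GOOD = {"wikipedia.org", "amazon.com"}

def compile_results(results):
    """results: list of (url, metadata) pairs, e.g. the top 100 hits
    for one search term. Returns domain -> count, whitelist removed."""
    counts = Counter()
    for url, _meta in results:
        domain = urlparse(url).netloc.lower().removeprefix("www.")
        if domain in KNOWN_GOOD:
            continue
        counts[domain] += 1  # group similar entries with a count
    return counts
```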
Crawling Search Results
• Crawl the URLs.
• Process the fetched pages
• Detect cloaking in parallel
• Helps minimize any possible time-of-day effects.
• Multiple crawls
• Normal search user
• Googlebot Web crawler
• A user who does not click through the search result
• Detect pure user-agent cloaking without any checks on the referrer.
• 35% of cloaked search results for a single measurement perform pure user-agent cloaking.
• Pages that employ both user-agent and referrer cloaking are nearly always malicious.
• IP Cloaking - half of current cloaked search results do in fact employ IP cloaking via reverse DNS lookups.
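The three crawl perspectives above amount to fetching each URL with different request headers. A minimal sketch, assuming illustrative header values (the real Googlebot string and the crawler's actual logic may differ):

```python
import urllib.request

# Three crawl profiles: a search user who clicked through, the Googlebot
# crawler, and a browser user with no click-through (no Referer header).
PROFILES = {
    "search_user": {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "Referer": "https://www.google.com/search?q=term",
    },
    "googlebot": {
        "User-Agent": "Googlebot/2.1 (+http://www.google.com/bot.html)",
        # crawlers send no Referer header
    },
    "no_clickthrough": {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        # same browser agent but no Referer: isolates referrer cloaking
    },
}

def build_request(url: str, profile: str) -> urllib.request.Request:
    """Prepare a request for one URL under the given crawl profile."""
    return urllib.request.Request(url, headers=PROFILES[profile])
```

Comparing the pages fetched under "search_user" and "googlebot" exposes user-agent cloaking; comparing "search_user" and "no_clickthrough" isolates referrer cloaking.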
Detecting Cloaking
• Process the crawled data using multiple iterative passes
• Various transformations and analyses are applied
• This helps compile the information needed to detect cloaking.
• Each pass uses a comparison based approach:
• Apply the same transformations to the views of the same URL as seen by the user and the crawler
• Directly compare the result of the transformation using a scoring function
• Thresholding - detect pages that are actively cloaking and annotate them.
• Used for later analysis.
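One such comparison pass can be sketched as follows: normalize both views with the same transformation, score their similarity, and flag pages below a threshold. The tag-stripping transformation, Jaccard scoring, and the 0.3 threshold are illustrative choices here, not the paper's exact parameters:

```python
import re

def transform(html: str) -> set:
    # Example transformation: strip tags and keep the set of words.
    text = re.sub(r"<[^>]+>", " ", html).lower()
    return set(re.findall(r"[a-z0-9]+", text))

def similarity(user_view: str, crawler_view: str) -> float:
    # Directly compare the transformed views with a scoring function
    # (Jaccard similarity of word sets).
    a, b = transform(user_view), transform(crawler_view)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def is_cloaked(user_view: str, crawler_view: str,
               threshold: float = 0.3) -> bool:
    # Thresholding: a low similarity means the two views diverge,
    # i.e. the page is actively cloaking.
    return similarity(user_view, crawler_view) < threshold
```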
Temporal Re-measurement
• To study the lifetime of cloaked pages.
• Temporal component in Dagger.
• Fetch search results from search engines
• Crawl and process URLs at later instances of time.
• Measure the rate at which search engines respond to cloaking
• Measure the duration pages are cloaked
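The temporal loop can be sketched as re-running the detection pass over the same URLs at later instants and recording how long each page stays cloaked. `check_cloaked` stands in for the detection pass and is a hypothetical callback, not an API from the paper:

```python
def measure_lifetimes(urls, check_cloaked, snapshots):
    """Re-measure each URL at every snapshot time.

    urls: URLs drawn from earlier search results.
    check_cloaked(url, t): hypothetical callback wrapping the
        crawl-and-detect pass at time t.
    snapshots: timestamps of the repeated measurements.
    Returns url -> last time the page was observed cloaked.
    """
    last_cloaked = {}
    for t in snapshots:
        for url in urls:
            if check_cloaked(url, t):
                last_cloaked[url] = t
    return last_cloaked
```

The gap between a URL's first and last cloaked observation gives a lower bound on its cloaking duration; comparing against its rank over time shows how quickly the search engine responds.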
Cloaking Over Time
• In trending searches the terms constantly change.
• Cloakers target many more search terms and a broad demographic of potential victims
• Pharmaceutical search terms are static
• Represent product searches in a very specific domain.
• Cloakers have much more time to perform SEO to raise the rank of their cloaked pages.
• This results in more cloaked pages in the top results.
Sources of Search Terms
• Blackhat SEO – artificially boost the rankings of cloaked pages.
• Search engines detect cloaking either directly (analyzing pages) or indirectly (updating the ranking algorithm).
• Augmenting popular search terms with suggestions.
• Enables targeting the same semantic topic as popular search terms.
• Cloaking in search results highly influenced by the search terms.
Search Engine Response
• Search engines try to identify and thwart cloaking.
• Cloaked pages do regularly appear in search results.
• Many are removed or suppressed by the search engines within hours to a day.
• Cloaked search results rapidly begin to fall out of the top 100 within the first day, with a more gradual drop thereafter.
Cloaking Duration
• Cloakers manage their pages similarly, independent of the search engine.
• Pages are cloaked for long durations: over 80% remain cloaked past seven days.
• Cloakers want to maximize the time during which they reap the benefits of cloaking by attracting customers to scam sites, or victims to malware sites.
• Difficult to recycle a cloaked page to reuse at a later time.
Cloaked Content
• Redirection of users through a chain of advertising networks
• About half of the time a cloaked search result leads to some form of abuse.
• Long-term SEO campaigns constantly change the search terms they target and the hosts they use.
Domain Infrastructure
• Key resources needed to effectively deploy cloaking in a scam:
• Access to Web sites
• Access to domains
• For TYPE 1 terms, the majority of cloaked search results are in .com.
• For TYPE 2 terms, cloakers use the “reputation” of pages to boost their ranking in search results
Search Engine Optimization
• Since a major motivation for cloaking is to attract user traffic, we can extrapolate SEO performance based on the search result positions the cloaked pages occupy.
• Cloaking TYPE 1 terms targets popular terms that are very dynamic, with limited time and heavy competition for performing SEO on those search terms.
• Cloaking TYPE 2 terms is a highly focused task on a static set of terms
• Provides much longer time frames for performing SEO on cloaked pages for those terms.
Conclusion
• Cloaking has become a standard tool in the scammer’s toolbox
• Cloaking adds significant complexity for differentiating legitimate Web content from fraudulent pages.
• The majority of cloaked search results remain high in the rankings for 12 hours
• The pages themselves can persist far longer.
• Search engine providers will need to further reduce the lifetime of cloaked results to demonetize the underlying scam activity.