
Website abuse for search engine optimisation




Network Security, March 2008

Kerry Dye, SEO campaign delivery manager, Vertical Leap

In November 2007, 8.1bn web search queries were made in the US.1 That’s 8.1bn instances of someone trying to find something, which equates to roughly 27 searches per person. To a web marketer, that’s 27 opportunities to make a sale to one person. And those searches do not include paid-for adverts or other extra-cost marketing activities: in effect, they are free opportunities to sell, market or promote a product, service or idea. Consequently, this natural or organic traffic is fiercely fought over. When something this much in demand is enabled by, or reliant on, IP technologies, it is highly likely to be maliciously exploited.

To discuss the nature of a parasitic spam attack, we should first understand that it is fundamentally a defacement. Next, we should refresh our knowledge of how search engine results are gathered and qualified. Natural search results are worked out by the search engines based on their own particular algorithm, which is unique to each engine. This takes account of the text on the page, its relevancy and the links to the website. Search engine optimisation agencies have done enough research to work out approximately what the ranking factors are, and this information can be taken advantage of.

“Each link from one site to another is a vote of confidence for the other site, and is used to raise the recipient’s search position”

This particular attack takes advantage of the linking between sites. Each link from one site to another is a vote of confidence for the other site, and is used to raise the recipient’s search position. In the search engine optimisation (SEO) industry, this has led to widespread debate on the ethics of buying and selling links. For spammers, it has provided a way of manipulating the search results using the reputation of other sites.

This parasitic attack uses the good search position of the compromised site to promote the popularity of sites targeting competitive search terms, such as pills, ringtones or MP3/video downloads.

Profile of targeted sites

It appears that target sites are identified in different ways. One way is to identify sites that publicly declare their use of a technology, such as WordPress. This could be achieved with a Google search such as:

“Powered by WordPress” -html filetype:php -demo -wordpress.org -bugtraq

The attacker could then review each site for password vulnerabilities and effect a breach.

A more wide-reaching and economical method is to direct an attack at a hosting environment. Recent high-profile attacks on webhosting companies include IPower in the US, Fasthosts in the UK and MD Webhosting in Australia. When a host is attacked as a precursor to a parasitic attack, the fallout is that not all of the hosted sites are compromised. A smart attacker would know that this would make the subsequent parasitic attack too easy to detect. Instead, it seems that sites are cherry-picked. This is achieved by cross-checking each domain with its corresponding Google PageRank. The majority of sites discovered with inserted links are PageRank 6 or above.2

What to look for

Once a vulnerable site is identified, the malicious markup is inserted into the site. It is done in a way that is invisible to users. Unsophisticated examples merely insert links after the </body> tag of the site, so that they are present, but not detectable when the page is rendered. Better examples insert a hidden <div> into the page, which is more sophisticated in that it allows the attacker to target both the links and the link text. This gives maximum benefit to the site being parasitically linked to.

Many examples only have the markup code on the home page (it is, after all, usually the page with the highest PageRank on a site). Typically the <div> is located immediately after the <body> tag, or within the navigation menu code. Again this is a deliberate tactic, because link location on the page is used to indicate relevance: content higher up the page has slightly more importance in ranking a site.

However, there is also evidence that in some cases include files have been targeted, so that the malicious markup gets merged into the template of the site, making every page on the site ‘vote’ for the spam.

Discovering these malicious insertions can be achieved fairly easily. There are online tools available to check your outbound links. Normally they are used for detecting dead sites that no longer exist, but they are also useful for checking which sites you are linking out to. This functionality is also available in some web design software. For Firefox users there is a quick check. Go to the URL you want to query and, once the page has loaded, insert the following safe script into the address bar:

javascript:(function(){var as=document.getElementsByTagName(%22a%22);var str=%22<ul>%22;for(var i=0;i<as.length;i++){str+=%22<li><a href='%22+as[i].href+%22'>%22+as[i].href+%22</a></li>%22}str+=%22</ul>%22;with(window.open()){document.write(str);document.close();}})()

This should list all the links on the page; you can then scour them for misfits.
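The same check can be run outside the browser. The sketch below makes the same simplifying assumption as the bookmarklet (links expressed as plain href attributes in the HTML), pulls out every href and flags off-site destinations; the spam domain in the example is hypothetical.

```javascript
// Pull the href values out of a page's HTML. The regex is a
// simplification and will miss unusual markup.
function extractLinks(html) {
  const links = [];
  const re = /<a\s[^>]*href\s*=\s*["']([^"']+)["']/gi;
  let m;
  while ((m = re.exec(html)) !== null) {
    links.push(m[1]);
  }
  return links;
}

// Keep only absolute links that do not point at your own domain.
function offsiteLinks(html, ownDomain) {
  return extractLinks(html).filter(
    href => href.startsWith('http') && !href.includes(ownDomain)
  );
}

// Example: a page whose markup has had a hidden spam link inserted
// (hypothetical domains throughout).
const page =
  '<body><div style="display:none">' +
  '<a href="http://pills.example/">cheap pills</a></div>' +
  '<a href="/about.html">About</a>' +
  '<a href="http://www.example.com/contact">Contact</a></body>';

console.log(offsiteLinks(page, 'example.com')); // the spam link is the only misfit
```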

The implications of an attack

While this attack is evidence of a breach, it is unlikely to be picked up in a security review, whether an internal assessment or a third-party penetration test. This is because, firstly, it is not a particularly well reported issue and, secondly, checking on-page content in the form of links is unlikely to be part of a testing methodology. Similarly, it may go unnoticed by the marketing or web management function. Unless you have a very clued-up and dedicated resource for managing your search engine results, such as an optimisation agency, an SEO-savvy web developer or a competent webmaster, you are unlikely to be made aware of malicious link insertions.

The final fallback mechanism for identifying problems, visitors to your site, can’t help you either. If a site is compromised in a way that affects visitors, they are usually quite vocal in alerting you to the problem. But unlike even subtle attacks, such as cross-site scripting or cross-site request forgery, or really obvious attacks, such as adding a malware download to a site, there is no penalty, risk or signpost to your visitors that you have been compromised. The attack targets search engines, and they won’t alert you; they’ll just penalise you by pushing your site further and further down the rankings.

Your three best chances of uncovering the compromise are, to all intents and purposes, useless. Ideally, checking for this kind of insertion should be added to a regular website assessment checklist, for both your security and web management functions. The main impact of the attack is not the compromise itself, but the effect that it has on your site’s search engine results page (SERP) rankings. Clearly it is in the interest of the attacker to hide or cloak the insertion, but the different methods for keeping it under the radar will affect how much damage it will do. By inserting links within a hidden <div>, the attacker is replicating an ancient SEO technique called cloaking.
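A checklist item along these lines can be partly automated. The following sketch flags links that sit inside a container hidden with an inline display:none style, which is only the crudest of the hiding techniques described above; links hidden via CSS classes, external stylesheets or script would need a rendered-page check, and the domains below are hypothetical examples.

```javascript
// Flag <a> hrefs that appear inside a <div> hidden with an inline
// "display:none" style -- the simplest form of hidden-link insertion.
function hiddenLinkSuspects(html) {
  const suspects = [];
  const divRe = /<div[^>]*style\s*=\s*["'][^"']*display\s*:\s*none[^"']*["'][^>]*>([\s\S]*?)<\/div>/gi;
  let m;
  while ((m = divRe.exec(html)) !== null) {
    // Search only the hidden container's contents for links.
    const hrefRe = /<a\s[^>]*href\s*=\s*["']([^"']+)["']/gi;
    let a;
    while ((a = hrefRe.exec(m[1])) !== null) {
      suspects.push(a[1]);
    }
  }
  return suspects;
}

// A page with one visible link and one hidden spam link:
const sample =
  '<body><a href="/home">Home</a>' +
  '<div style="display:none"><a href="http://ringtones.example/">free ringtones</a></div>' +
  '</body>';

console.log(hiddenLinkSuspects(sample)); // only the hidden spam href is reported
```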

Cloak and tagger

Historically, cloaking was used to provide content or links that were seen by the search engine but not by the user. It enabled a website to game the engines by getting rankings for keywords that were barely relevant to its real content, giving a false indication of its relevance to searchers. Search engines are now largely wise to this. If they discover that a site employs cloaking, they may drop it completely from the SERPs, or penalise it by pushing it down the rankings.

“Your three best chances of uncovering the compromise are to all intents and purposes useless”

In some cases a corporate website may be strong enough to withstand the consequences of the attack without seeing obvious decreases in web performance, or by only losing search engine positions incrementally. In these cases the effects are insidious, and go unnoticed for many months. Ultimately the site may fall completely from the search engines. Even if it were to vanish from Google alone, the effect would be dramatic (Google has approximately 60% share of worldwide searches).3 With so many businesses heavily reliant on internet-generated revenue, this type of attack can result in crippling losses.

Figure 1: Sequence of events for link attack.

If you find you have been compromised by a parasitic spam attack, there are two elements to remediation. The first is to locate the vector for the attack and eliminate it. The second is to repair the site as quickly as possible and restore its image in the eyes of the search engines.

Repairing the damage to your search engine rankings

Once the malicious links are removed from the site, in most cases the site will automatically, but not immediately, return to its previous ranking positions. The results should be tracked to make sure this is the case. Due to differences between the engines, the time this takes may vary: Google results can bounce back in a few days, but Yahoo results react more slowly, recovering a few weeks after the markup’s removal.
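The tracking itself needs no more than simple bookkeeping. In the sketch below, the daily ranking samples are assumed to come from whatever rank-tracking tool you already use; the keyword positions and dates are invented for illustration, and lower rank numbers are better.

```javascript
// Given daily rank samples (oldest first) and the pre-attack baseline
// position, report the date the keyword first recovered, or null if it
// has not yet done so.
function recoveryDate(samples, baseline) {
  const hit = samples.find(s => s.rank <= baseline);
  return hit ? hit.date : null; // null: still recovering, keep tracking
}

const samples = [
  { date: '2008-03-01', rank: 48 }, // links just removed
  { date: '2008-03-04', rank: 19 },
  { date: '2008-03-07', rank: 3 },  // back to the pre-attack position
];
console.log(recoveryDate(samples, 3)); // prints 2008-03-07
```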

If the site has been delisted completely, then you will need to submit a re-inclusion request. This facility is available from all of the major search engines, and allows you to list your site, give the reason that you were removed and demonstrate that the problem has been corrected. In the case of this sort of attack, there is not normally a problem with getting a re-inclusion request authorised.

Google PageRank as reported by the Google Toolbar may take longer to recover. It appears that this is only refreshed quarterly, meaning that the toolbar PageRank may report incorrectly long after the issue has been fixed. If this metric is important to your site, for example for selling on-site advertising, then this could signal longer-term losses.

The future of spamming

This attack is yet another reminder that unscrupulous and criminal parties stand to profit from unethical and illegal practices. This will always be the case. What is changing is the evolutionary path of attacks. Currently the perception of spam is that it arrives via email and encourages people to compromise a system, or compromises it automatically. Accordingly, the available defence mechanisms cater for SMTP gateways and function via filtering. The parasitic spam attack that we have outlined has nothing to do with users, has no relationship with email and has been around for at least two years. Yet there appears to be no mention of it in any OWASP or OSSTMM communications, and little recognition of it in the security testing industry.

References

1. “November 2007 Search Market Share: The Market, Google, and Yahoo! All Break Search Records?”, Jeremy Crane, Compete.com blog, Nov 2007. <http://blog.compete.com/2007/12/12/search-market-share-november-google-yahoo-ask-msn-live/>

2. “Our Search: Google Technology”, Google, Feb 2008. <www.google.com/technology/>

3. “Baidu Ranked Third Largest Worldwide Search Property by comScore in December 2007”, comScore, Dec 2007. <www.comscore.com/press/release.asp?press=2018>

About the author

Kerry Dye is responsible for project management of search engine optimisation campaigns for specialist SEO agency Vertical Leap. Working wholly on internet projects since 1996, and recently specialising in online marketing, she believes that relevance in the search engine results is important for navigating the thousands of documents on the internet.


Figure 2: Hidden links inserted into a site’s navigation.