Duplicate Content Filters, Penalties and other Content Minefields 27th March 2012

Duplicate content presentation March 2012


DESCRIPTION

Some insights into why duplicate content on your website is a problem for Google. Workarounds and solutions are suggested; please let us know your thoughts.


Page 1: Duplicate content presentation   March 2012

Duplicate Content Filters, Penalties and other Content Minefields

27th March 2012

Page 2: Duplicate content presentation   March 2012

Search Quality – the Duplicate Content Headache

Google can’t afford a SERP like this:

1) Search engine optimization
Search engine optimization (SEO) is the process of improving the visibility of a website or a web page in search engines........

2) Search engine optimization
Search engine optimization (SEO) is the process of improving the visibility of a website or a web page in search engines........

3) Search engine optimization
Search engine optimization (SEO) is the process of improving the visibility of a website or a web page in search engines........

4) Search engine optimization
Search engine optimization (SEO) is the process of improving the visibility of a website or a web page in search engines........


Page 3: Duplicate content presentation   March 2012

Resource – the Duplicate Content Headache

Duplicate content has resource consequences for search engines:

Wastes Crawler resources - finite number of crawlers

Wastes Bandwidth – how often can you crawl 1 trillion documents and keep your index fresh?

Increases Query CPU time – how do you search 1 trillion documents as quickly as possible?


Page 4: Duplicate content presentation   March 2012

Document importance – Duplicate Content Headache

Duplicate content can be a signal of an important document:

• Song lyrics

• Scholarly texts and historical documents, e.g. the Bible (1,000 pages)

• The Linux manual (2,000 pages)

• Breaking News – Associated Press, Reuters

etc.


Page 5: Duplicate content presentation   March 2012

Types of Duplicate Content

Duplicate content comes in many forms

Intentional vs non intentional

On-site vs off-site


Page 6: Duplicate content presentation   March 2012

On-Site Duplicate Content (Impacts Quality Score)

Intentional

• Printer friendly pages

• Different font sizes

• PDF documents

• Archive (non graphics versions)

• Shopping filters (sort by and pagination)

• RSS feeds

Non-intentional

• Affiliate URLs - www.example.com/?btag=123

• AdWords campaigns - www.example.com/?utc=google

• Search results

• www vs non-www URLs

• https vs http

• Stubs/plugins
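Several of the non-intentional cases above (tracking parameters, www vs non-www, https vs http) are the same page reached through different URLs. A minimal sketch of how a crawl export could be grouped to spot them, assuming the tracking parameters are just `btag` and `utc` from the examples above; the function names are hypothetical and a real site would need its own parameter list:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: only the two tracking parameters shown on the slide.
TRACKING_PARAMS = {"btag", "utc"}


def canonical_form(url):
    """Collapse scheme, www prefix and tracking parameters into one form."""
    # URLs written without a scheme (as on the slide) get one added for parsing.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Keep only query parameters that are not known tracking parameters.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key.lower() not in TRACKING_PARAMS]
    path = parts.path or "/"
    # Scheme is forced to http purely so that http/https variants compare equal.
    return urlunsplit(("http", host, path, urlencode(query), ""))


def group_duplicates(urls):
    """Return {canonical form: [urls]} for forms reached by more than one URL."""
    groups = {}
    for url in urls:
        groups.setdefault(canonical_form(url), []).append(url)
    return {form: members for form, members in groups.items() if len(members) > 1}
```

Feeding it a list of crawled URLs returns only the groups with more than one member, which are the candidates for a redirect, canonical tag or noindex.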


Page 7: Duplicate content presentation   March 2012

On-Site Duplicate Content (Impacts Quality Score)

Worst-case scenario example: 10,000s of stub pages


This was 2 weeks after Andy had removed the duplicate links from the search pages on our advice, e.g.:

http://www.motors.co.uk/Ford-Escort-0-9999999---2

http://www.motors.co.uk/Ford-Escort-0-9999999--U-2-

http://www.motors.co.uk/Ford-Escort-0-9999999---2%20-

Page 8: Duplicate content presentation   March 2012

Off-Site Duplicate Content (Filters and Penalties)

Intentional vs non-intentional is somewhat grey

• Domain branding, e.g. .com, .co.za

• (Mobile website)

• Content syndication

• Content theft

• Staging websites: a common problem!!

Quality signals are often used to filter off-site Duplicates!!!


Page 9: Duplicate content presentation   March 2012

How Does Google Filter Off-site Duplicate Content

Authors feel they have a right to rank for their own content – Google’s Loyalty is to its users!!!

Google doesn’t necessarily reward the source or original, but assesses:

• Relevance (e.g. is an article in context)

• Domain authority & links (e.g. Google Knol, Facebook)

• Fresh content boost

• Site quality signals (e.g. internal duplicate content!!!)


Page 10: Duplicate content presentation   March 2012

Examples of Off-site Duplicate Content and Quality

Client with a .com.au and a .com with https duplicates

Casino Client with a lot of stub pages (pre Panda)

Casino site – severe health issues;


Page 11: Duplicate content presentation   March 2012

How to Diagnose (on-site) Duplicate Content

Link building will exacerbate duplicate content indexing

Keep an eye on indexed pages (weekly) and look for spikes in Google's index (and in Yahoo and Bing)

Look for site:example.com duplicates

Use Xenu link checker

Heed any Webmaster Tools warnings

Check your crawl and cache dates: frequent updates but stale cache dates = dupe content issues
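The weekly check on indexed pages can be automated once the counts are recorded somewhere. A minimal sketch, assuming you log one indexed-page count per week by hand or from a tool; the 25% threshold and the function name are illustrative assumptions, not a recommendation from the slides:

```python
def find_index_spikes(weekly_counts, threshold=0.25):
    """Return (week, previous, current) for every week-on-week jump above threshold.

    weekly_counts: indexed-page counts in chronological order, one per week.
    threshold: relative growth that counts as a spike (assumed value: 25%).
    """
    spikes = []
    for week in range(1, len(weekly_counts)):
        previous, current = weekly_counts[week - 1], weekly_counts[week]
        # Skip weeks with no baseline to avoid dividing by zero.
        if previous > 0 and (current - previous) / previous > threshold:
            spikes.append((week, previous, current))
    return spikes
```

A sudden jump that does not match newly published content is the cue to run the `site:example.com` check for duplicates.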


Page 12: Duplicate content presentation   March 2012

How to address on-site and off-site duplicate content

You have a whole armoury of potential tools, including:

• Robots.txt exclusion

• Robots meta tag

• Canonical tag

• Webmaster URL exclusion

• Password protection

• (301 redirects)

(File a DMCA against serial content thieves?)

A lot of well-meaning people give bad advice, though


Page 13: Duplicate content presentation   March 2012

Google Engineers Can’t Agree

Page 14: Duplicate content presentation   March 2012

Adam Lasnik – “Deftly Dealing with Duplicate Content” 2006

Probably the authoritative guide to duplicate content;

• What is duplicate content?

• What isn't duplicate content?

• Why does Google care about duplicate content?

• What does Google do about it?

• How can Webmasters proactively address duplicate content issues?


Page 15: Duplicate content presentation   March 2012

Deftly Dealing with... - Our advice/experience

Robots.txt

Routinely ignored by Google, probably because of malware

User-agent: *
Allow: /the-good-stuff/
Disallow: /the-malware/

Robots.txt is ignored unless combined with emergency Webmaster Tools URL removal (3 months)
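Before blaming the search engine, it is worth confirming that the rules say what you think they say. A minimal sketch using Python's standard `urllib.robotparser` against the example rules above; the helper name `is_crawlable` and the example.com URLs are assumptions for illustration. It only reports what a compliant crawler may fetch, not what ends up in the index:

```python
from urllib.robotparser import RobotFileParser

# The example rules from the slide.
ROBOTS_TXT = """\
User-agent: *
Allow: /the-good-stuff/
Disallow: /the-malware/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url, user_agent="*"):
    """True if the parsed rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)
```

For example, `is_crawlable("http://www.example.com/the-malware/page.html")` should come back false, while URLs under `/the-good-stuff/` should come back true.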


Page 16: Duplicate content presentation   March 2012

Our advice/experience

Canonical tag

Works great for cross-domain duplicate content

Largely ineffective for pagination eg shopping sites

Totally ineffective unless canonical URLs are VERY similar if not identical
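When auditing how the canonical tag is actually deployed, a first step is to read it back from the rendered pages. A minimal sketch using the standard library's `html.parser`; the class and function names are hypothetical, and it simply returns the first `rel="canonical"` link it finds:

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> element seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        if tag == "link" and rel == "canonical" and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```

Comparing the returned URL with the URL the page was fetched from shows at a glance which pages point elsewhere and how different the two URLs are.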


Page 17: Duplicate content presentation   March 2012

Our advice/experience

Robots Meta Tag

Noindex,Follow - 100% obeyed by Google and passes PageRank too

Very effective for pagination eg shopping sites

Works well for tracking links too (www.example.com/?affid=123456)

Doesn’t work when the page is also blocked in robots.txt (a blocked page is never fetched, so the tag is never seen)
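The pagination and tracking-link cases can be handled with one template rule. A minimal sketch, assuming tracking links carry the `affid` parameter from the example above and that pagination uses a hypothetical `page` query parameter; both names would need adapting to the real site:

```python
from urllib.parse import urlsplit, parse_qs

# Assumptions: "affid" is taken from the slide; "page" is a hypothetical
# pagination parameter.
TRACKING_PARAMS = {"affid"}
PAGE_PARAM = "page"
NOINDEX_FOLLOW = '<meta name="robots" content="noindex,follow">'


def robots_meta_tag(url):
    """Return the noindex,follow tag for tracking URLs and pages beyond page 1."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    has_tracking = any(name in TRACKING_PARAMS for name in query)
    is_deeper_page = any(value != "1" for value in query.get(PAGE_PARAM, []))
    # An empty string means: emit no robots meta tag, leave the page indexable.
    return NOINDEX_FOLLOW if has_tracking or is_deeper_page else ""
```

The page template would print the returned string inside `<head>`, so the clean URL and the first page of a listing stay indexable while the variants do not.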


Page 18: Duplicate content presentation   March 2012

Our advice/experience

Password Protect/htaccess 403 Forbidden

Works great for staging sites

Stubs - the problem is that this generates Webmaster Tools errors

Our feeling: best avoided on your main domain


Page 19: Duplicate content presentation   March 2012

Extreme Techniques to Avoid Dupe Content

Make all your backend .exe with htaccess

Page 20: Duplicate content presentation   March 2012

Summary

Duplicate content is a minefield!

Filters usually apply, penalties are very rare

You have the answer in your own hands

Stay on top of your site’s health – especially internal duplicate content

Page 21: Duplicate content presentation   March 2012

Thank you for your attention!

Thanks to:

Anton Groeneveldt

Carla dos Santos