Google Confidential and Proprietary
Making the most of your content
Guidelines, Tools, Advice
Chewy Trewhella, Developer Advocate
Agenda
• Understanding Search
• How Search Works
• Webmaster Guidelines
• Hot Topics
• Site Owner Resources
Understanding search
Google and Search Results
Overview
Google's mission is to organize the world's information and make it universally accessible and useful.
Online trends
Every day 100 million videos are served online (Source: Comscore)
Around 60 billion email messages are sent daily (Source: Deutsche Telekom)
Every month hundreds of millions of people search on Google (Source: Comscore)
10-25% of the web is new every time Google indexes it! (Source: Google)
How Search Works
Fundamentals of Crawling, Indexing, and Ranking
Life of a Query
Before the Search…
Crawl web → Calculate PageRank → Build index
Crawling the Web
We’re downloading a copy of the web
How we’ve become more intelligent about it:
• Understanding non-HTML content
• Working with partners to get “deep web” content
• Letting webmasters control crawl speed
We crawl continuously, scheduling visits to each page intelligently to maximize freshness
Calculating PageRank™
PageRank (PR) is a measure of the “importance” of a page based on incoming links from other pages. It is one factor we use to rank results.
Each link from A to B adds some amount of PR to B, based on the PR of A and the number of outbound links on A
PR is calculated for billions of pages and recorded for use in ranking
Both links count… But Link #1 counts more.
Diagram: bbc.co.uk (PR = 9) links to your site via Link #1; myblog links to your site via Link #2.
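The link-based calculation described above can be sketched as a simple power iteration. This is an illustrative toy, not Google's production method: the damping factor of 0.85, the iteration count, the even redistribution of rank from pages with no outbound links, and the example page names are all assumptions made for the sketch.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. `links` maps each page to the list of pages it links to.

    damping and iterations are assumed values chosen for illustration.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly across its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outbound links spreads its rank
                # evenly over all pages so that total rank stays at 1.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: "bbc" is well linked, "myblog" is not.
example_links = {
    "fan1": ["bbc"],
    "fan2": ["bbc"],
    "fan3": ["bbc"],
    "bbc": ["site_a"],
    "myblog": ["site_b"],
}
ranks = pagerank(example_links)
print(ranks["site_a"] > ranks["site_b"])
```

In this toy graph, the site linked from the well-linked page ends up with more rank than the site linked from the unlinked blog, which is the effect the slide describes.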
Building the Index
This is like the index of a book; a mapping of words to the pages on which they appear. The Web is our book.
We keep a posting list for every word we see and, for each occurrence of a word on a page, record where it occurs.
We then break up the index into shards and distribute them to many computers
When a user enters a query such as “Frans Bauer”, each computer searches a small piece of the index for matching pages
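A minimal sketch of the idea, assuming the index is sharded by document (the slide does not say how shards are chosen): each shard holds a word-to-positions mapping for its own pages, and a query is sent to every shard and the results merged. The function names, the CRC32-based shard choice, and the example URLs are hypothetical.

```python
import zlib
from collections import defaultdict


def shard_of(url, num_shards):
    # Stable hash so the same URL always lands on the same shard.
    return zlib.crc32(url.encode("utf-8")) % num_shards


def build_shards(docs, num_shards):
    """docs maps URL -> page text. Returns one small inverted index per shard."""
    shards = [defaultdict(dict) for _ in range(num_shards)]
    for url, text in docs.items():
        index = shards[shard_of(url, num_shards)]
        for position, word in enumerate(text.lower().split()):
            # Posting list entry: word -> {url: [positions where it occurs]}
            index[word].setdefault(url, []).append(position)
    return shards


def search(shards, query):
    """Fan the query out to every shard and merge the matching URLs."""
    words = query.lower().split()
    matches = set()
    for index in shards:
        candidates = None
        for word in words:
            urls = set(index.get(word, {}))
            candidates = urls if candidates is None else candidates & urls
        matches |= candidates or set()
    return matches


docs = {
    "http://example.com/a": "Frans Bauer tour dates",
    "http://example.com/b": "Frans Hals museum",
    "http://example.com/c": "concert by Frans Bauer",
}
print(sorted(search(build_shards(docs, 3), "Frans Bauer")))
```

Each shard answers only for its own pages, so the per-machine work stays small even as the total number of pages grows.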
Life of a Query
During the Search…
Submit query → Route → Fan out → Scan index → Select documents → Rank results → Present results
Ranking: How we do it
We order pages based on relevance and importance
• 200+ quality signals, many ranking change proposals every month
Importance (query-independent)
• The popularity and authoritativeness of a page, calculated when the index was built. On Google, this factor is known as PageRank
Relevance (to query)
• How well the content of a specific page (not site) matches the user’s search query, taking into account signals like the number of times the word appears, where it appears, anchor text of linking pages, etc.
Ranking’s goal is to list the most useful documents from the selected set in order.
Goals of ranking
We create general methods to improve our ranking that are scalable, impartial and provide benefit to users
• Hard problem 1: 25% of queries have not been seen in 3 months
• Hard problem 2: 10-25% of the web is new each time we crawl it
Considerations
• the query (intent, query variations, language)
• the user (web history, location, task)
• the content (PageRank, reputation, quality, language)
• the web (changing content, current events)
Webmaster Guidelines
And Basic Site Preparation
Basic site preparation
Discoverable: Can your site’s pages be found by Google?
Indexable: Can your site’s URLs be indexed? Are they unique?
Content: Is the content useful? Will user searches match? Is the structure and content of the page clear to Google?
Rank: How well do your site’s pages rank?
Webmaster Guidelines
Webmaster Guidelines: Site Structure
• Parameters: Keep URL parameters short and few.
• Good: http://www.google.com/search?q=amsterdam
• Bad: http://maps.google.com/maps?f=q&hl=en&geocode=&q=amsterdam&ie=UTF8&z=11&iwloc=addr
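One way to check a URL against this guideline is to count its query parameters. The sketch below uses Python's standard `urllib.parse`; the helper name `query_parameters` is made up for illustration, and the slide gives no specific threshold for "few".

```python
from urllib.parse import urlsplit, parse_qsl


def query_parameters(url):
    """Return the (name, value) pairs in a URL's query string, including empty ones."""
    return parse_qsl(urlsplit(url).query, keep_blank_values=True)


good = "http://www.google.com/search?q=amsterdam"
bad = ("http://maps.google.com/maps?f=q&hl=en&geocode=&q=amsterdam"
       "&ie=UTF8&z=11&iwloc=addr")

print(len(query_parameters(good)))  # 1
print(len(query_parameters(bad)))   # 7
```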
Webmaster Guidelines: Site Structure
• Directory structure: Make a site with a clear hierarchy and text links.
• Good:
• Bad:
Webmaster Guidelines: Site Structure
• Link structure: Every page should be reachable from at least one static text link.
Webmaster Guidelines: Site Structure
• Redirects: Google recommends that you use fewer than five redirects for each request.
• Example chain: / → /index.asp → /index.asp?jsessionid=weiru4895u89ur8932
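A rough sketch of checking a redirect chain against the "fewer than five" recommendation. The redirect table is a plain dictionary standing in for real HTTP responses, and the function name and default limit of four are assumptions for illustration.

```python
def redirect_chain(start, redirects, max_redirects=4):
    """Follow `redirects` (a dict of path -> target) from `start`.

    Raises ValueError on a loop or when the chain exceeds max_redirects.
    """
    chain = [start]
    while chain[-1] in redirects:
        target = redirects[chain[-1]]
        if target in chain:
            raise ValueError("redirect loop at " + target)
        chain.append(target)
        if len(chain) - 1 > max_redirects:
            raise ValueError("too many redirects")
    return chain


# The chain from the slide, expressed as a hypothetical redirect table.
site_redirects = {
    "/": "/index.asp",
    "/index.asp": "/index.asp?jsessionid=weiru4895u89ur8932",
}
chain = redirect_chain("/", site_redirects)
print(len(chain) - 1, "redirects:", " -> ".join(chain))
```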
Webmaster Guidelines: Title and Snippets
• User Queries: Think about how users actually search – not just what the brand manager says.
Webmaster Guidelines: Title and Snippets
• Title: Make sure that your TITLE tags are descriptive and accurate.
• Heading: does this follow the title, and continue the theme?
• Keywords: are they relevant terms?
Webmaster Guidelines: Title and Snippets
• Snippets: Different sources are used, including META tag for each page. Make sure they are descriptive of that page.
Webmaster Guidelines: Body Text
• Check your site: Use a text browser such as Lynx to examine your site, because most search engine spiders see your site much as Lynx would.
Webmaster Guidelines: Body Text
• Flash, JavaScript, etc: If fancy features keep you from seeing all of your site in a text browser, then search engine spiders may have trouble crawling your site.
• Like YouTube, use HTML for the majority of each page, and use Flash or JavaScript sparingly to provide rich content.
Webmaster Guidelines: Body Text
• Images: Make sure that your ALT attributes are descriptive and accurate.
• Googlebot can’t read images, so help it to understand them.
Webmaster Guidelines: Test and Measure EVERYTHING
• Analytics
• Website Optimiser
Webmaster Guidelines: Test and Measure EVERYTHING
Track all sources of traffic, not just Google AdWords
All search engines (both paid and natural) are supported
As well as referring sites, directories, etc.
Hot Topics
Geo-Localization, Universal, Flash, Robots, Duplicates, Paid Links
Different Results for Different Locations
Universal Search
• News/News Archive results: Submit your site for inclusion in Google News/News Archive
• Image results: Opt in to the enhanced image search
• Local results: Upload your locations to Local Business Center
• Video results: Host your video content on YouTube or submit your feed in Webmaster Tools.
• Blog results: Add your blog’s web feed to Google Blog Search
Searching Flash content
Flash can be great for high quality user experiences
Most search engines don’t index Flash movies
Google has a “first generation” Flash indexing solution
We’re improving Flash indexing
Many online video sites use Flash for rich content, but HTML for descriptive information.
Flash indexing today - best practices
Advantages:
• A rich user experience
• Flash sites can be highly interactive and “magical”
Disadvantages:
• Search engines struggle to read it
• Many users can’t access it: screen readers, mobile devices, Linux
Best practices:
• Use Flash for content, use HTML for navigation
Flash best practice recommendations
Use HTML for navigation, Flash for page content
• Allows us to see all the pages of your site
Use the description meta tag
• Gives us text to index if we can’t access your content
Use text tracks within Flash
• Google can extract text tracks, but not text burnt into images
Create an HTML version of your site for non-Flash users:
• Google can navigate and index this
• Great for users with page readers etc.
• Avoid cloaking: don’t show different versions based on user-agent
Robots exclusion protocol (aka robots.txt)
Robots Exclusion Protocol
• Tells search engine crawlers which parts of a site not to access
Created in June 1994. Now a de facto standard
• Ongoing work to improve
Simple examples
Robots.txt (top-level directory, text file):
user-agent: googlebot
disallow: /logs/
allow: /logs/introduction.html
META tags (HEAD section of HTML):
<meta name="googlebot" content="noindex">
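To see how the robots.txt rules above interact, here is a small sketch that evaluates a path against a list of rules. It assumes the precedence Google documents for its own crawler, where the longest matching rule wins and `allow` wins a tie; other robots.txt parsers may instead apply the first matching rule, so results can differ between tools. The function name and rule representation are hypothetical, and wildcards are not handled here.

```python
def is_allowed(path, rules):
    """rules is a list of (directive, path_prefix) pairs for one user-agent.

    Assumed precedence: the longest matching prefix wins; on a tie, allow wins.
    A path that matches no rule is allowed.
    """
    best = None
    for directive, prefix in rules:
        if path.startswith(prefix):
            candidate = (len(prefix), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


# The rules from the example robots.txt for user-agent: googlebot
googlebot_rules = [
    ("disallow", "/logs/"),
    ("allow", "/logs/introduction.html"),
]

print(is_allowed("/logs/introduction.html", googlebot_rules))  # True
print(is_allowed("/logs/2008-01.txt", googlebot_rules))        # False
print(is_allowed("/index.html", googlebot_rules))              # True
```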
Policy and per-page
Use robots.txt for:
• general rules about directories to exclude
Use META tags for:
• Per-page control
• Control when you don’t have access to robots.txt
Sophisticated control
Exclude files by type:
disallow: /*.ppt$
Control snippets and cache display:
<meta name="googlebot" content="nosnippet, noarchive">
• Useful for temporary content
• Use with care
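The file-type rule relies on two pattern characters: `*` matches any run of characters and a trailing `$` anchors the match to the end of the URL. A minimal sketch of that matching, translating a pattern into a regular expression, follows. The pattern is written here with a leading slash (`/*.ppt$`), and the function name is an assumption; this illustrates the syntax and is not any crawler's actual matcher.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern with * and a trailing $ into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and let each * match any run of characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


rule = pattern_to_regex("/*.ppt$")
print(bool(rule.match("/slides/talk.ppt")))             # True
print(bool(rule.match("/slides/talk.pptx")))            # False
print(bool(rule.match("/slides/talk.ppt?download=1")))  # False
```

Without the trailing `$`, the same pattern would also match `/slides/talk.pptx`, because a pattern without the anchor behaves like a prefix match.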
Duplicate Content
Negative effects
• Dilution of link popularity
• Long URLs: bad branding and user experience
Google’s solution
• Group URLs into clusters
• Select the best URL and consolidate URL properties to it
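The clustering idea can be illustrated with a toy normalizer that groups URLs differing only in case, parameter order, or session-style parameters, then picks one representative per group. This is not Google's algorithm: the list of ignored parameters, the "shortest URL wins" rule, and the example URLs are all assumptions for the sketch.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of parameters that do not change page content.
IGNORED_PARAMS = {"jsessionid", "sessionid", "sid"}


def canonical_key(url):
    """Normalize a URL so that likely duplicates share the same key."""
    parts = urlsplit(url)
    params = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in IGNORED_PARAMS
    )
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/",
         urlencode(params), "")
    )


def cluster(urls):
    """Group URLs by canonical key."""
    clusters = defaultdict(list)
    for url in urls:
        clusters[canonical_key(url)].append(url)
    return dict(clusters)


def pick_representative(urls):
    # Assumed rule: prefer the shortest URL, breaking ties alphabetically.
    return min(urls, key=lambda u: (len(u), u))


urls = [
    "http://Example.com/index.asp?jsessionid=abc",
    "http://example.com/index.asp",
    "http://example.com/about",
]
for key, members in cluster(urls).items():
    print(pick_representative(members), "<-", members)
```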
Paid Links Passing PageRank
It is a violation of webmaster guidelines
• Skews organic search results for users
• All major search engines oppose it
• Will impact the site’s reputation with Google
• If you feel you were impacted, fix the violation and submit a reconsideration request
It is not a violation when
• Buying or selling links for traffic or branding without passing PageRank
Site Owner Resources
Webmaster Central, Webmaster Tools, Sitemaps
Our goal is high-quality, objective search results
Our goal is to have the most relevant, useful search results on the web.
We strive to provide scalable, equitable support for all webmasters and all sites, large and small.
By some estimates, there are 100 million sites on the web, so we need something really scalable.
Life of a Query
Before the Search…
Crawl web → Calculate PageRank → Build index
Webmaster Central
http://www.google.com/webmasters
Questions Webmaster Central Can Answer
How can I improve my site's visibility in the web index?
How can I tell Google my desired geographies?
Where is my Google traffic coming from?
How do I ensure all my pages are indexed?
How can I change the snippet (or sitelink) under my site?
What’s the best way to redirect traffic?
Google Webmaster Tools
A free and easy way to improve your site’s visibility in Google search results
Available in 22 languages
“Dashboard” provides an overview of your account
• See the status of the websites and sitemaps in your account
Callouts: your website, Sitemap status, verified status
“Site Verification” gives you detailed reports
• Site verification ensures that only the true site owner gets access to detailed site statistics
• You can get site statistics before you submit a Sitemap
Callouts: verification options, verification status
Diagnostic reports help you troubleshoot crawl errors
• Overview shows a quick snapshot of the crawl and indexing status of your site
• Alerts webmasters about some violations of the webmaster guidelines
Callouts: message center alerts, index summary
Crawl errors show you which pages were problematic
• See error types for specific URLs to quickly identify and easily fix issues
Callouts: page type (web, mobile), crawl error summary, error detail, date stamp, URLs with errors
Mobile crawl errors show you which mobile pages had problems
• See error types for specific mobile URLs to quickly identify and easily fix issues
Mobile CHTML crawl errors
Mobile WML/ XHTML crawl errors
“Top Search Queries” show queries that drive traffic to your site
• See your top 20 search queries and search query click statistics
• Top position shows you where your pages were listed per search query
• Timeline shows query stats in the past
• Easily export a report with the CSV download feature
Callouts:
Search queries = impressions in search results
Position per query in search results
Query clicks = traffic
More stats
% out of the top 20 queries
Timeline = historical views
Mobile web statistics show traffic from mobile devices
• See top searches from mobile devices and top searches on mobile web.
Searches from mobile phones, PDAs, etc.
Select geographic specific domains
“What the Googlebot Sees”
• See common phrases & keywords on your site and in links to your site
Callouts: words on your site, links to your site, page type, encoding
“Crawl Stats” show your page distribution in Google
• See distribution of crawled pages
Callouts: PageRank distribution, URL with highest PageRank
“Index Stats” shows how your pages are indexed
• Learn more on how your pages are included in the Google index with advanced search operators
Type of advanced search operator
Links show which pages are linked outside your domain and how often
• See the pages with the most links pointing from your own site and outside sites
• Easily download full site data with the CSV download feature
Callouts: URLs in your site with # of links from outside websites, URLs in your site with # of links from within your site
Sitelinks shows generated sitelinks and blocking controls
• Sitelinks are automatically generated links listed under search results to help with site navigation
• View generated links, block links you do not want visible in search results, and provide feedback on inaccurate sitelinks
Callouts: current blocked sitelinks, automatically generated sitelinks, provide feedback on sitelinks
Re-inclusion request, spam report, paid links report forms
• Lets webmasters tell us when they have fixed quality violations, to help get back in the index faster
• Requests from Google Webmaster Tools are more “trusted” because they are from a registered user
• Spam and paid links reports help webmasters be good citizens by reporting spam results and websites selling or buying links
Robots.txt analysis helps to improve your coverage
• Confirm your robots.txt URL, status, “last downloaded”, and homepage access
• Test against different Googlebots including search, content, mobile, and image
Callouts: date stamp and status, test against different Google crawlers
Set crawl rate
• View 90 days of Googlebot activity and load on your servers
• Adjust crawl rate
Callouts: choose crawl rate, kilobytes/day downloaded by Googlebot, average page download time, # pages crawled (includes URLs that point to the same page)
“Geographic target” allows you to associate a site with a geographic region
• Submit geographic data for an entire site or site subdirectory
Specify full or partial geographic information
“Preferred domain” lets you tell us how you want URLs to be displayed
• You can choose www or non-www, or opt not to set an association
“Enable enhanced image search” to enhance search visibility of images on your site
• You can choose to let Google gather additional metadata about your images using Image Labeler
• More metadata = more relevant image search results
“Remove URLs” allows you to remove a URL, subdirectory, or site from the Google index
• Request a removal in 3 steps:
Step 1: Make a New Removal Request
Step 2: Select Removal type (site, sub directory, URL)
Step 3: Submit URL path
Sitemaps: make sure we know about your site
The problem: islands of links
• Pages that aren’t linked from outside your site
• Search engines can’t find these
The problem: crawling large sites
• Crawl of very large sites is limited
• If we know when pages have changed, we can optimize crawling
The solution: Sitemaps
• The open standard for providing a list of all your pages
• Supported by Google, Yahoo, Microsoft and Ask
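As a rough illustration of what such a list looks like, the sketch below writes a small Sitemap in the sitemaps.org XML format using Python's standard library. The helper name, the example URLs, and the dates are made up; only the `urlset`, `url`, `loc`, and `lastmod` elements and the namespace come from the Sitemap protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """entries is a list of (url, last_modified_or_None) pairs.

    Returns the Sitemap as an XML string.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            # Telling crawlers when a page changed helps them prioritize.
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages, including one that nothing else links to.
print(build_sitemap([
    ("http://www.example.com/", "2008-01-15"),
    ("http://www.example.com/unlinked-page.html", None),
]))
```

Including `lastmod` addresses the second problem on the slide, since it tells the crawler which pages have changed.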
Sitemaps: how they work
1. Keep Google informed of all your pages
2. Increase coverage of your pages in the Google index
Help improve the visibility of your pages on Google
Diagram labels: Your Site, WWW, Web crawl, Google.com. Sitemaps enhance the web crawl.
Submitting your Sitemap improves the visibility of your URLs
• Tell Google about every page on your site
Callouts: status of your Sitemap, your Sitemap
Additional types of Sitemaps: Mobile Sitemap
Webmasters can submit Sitemaps of URLs that serve mobile content into Google’s mobile index
• Mobile content is specifically designed to fit the small screens of mobile phones and devices
• Supported markup languages include XHTML, WML, and cHTML
Example: mobile web results for 'bbc' (results 1 - 10 of about 113,000)
1. BBC - WAP - BBC News. BBC Sport Ashes 2005 Highlights Entertainment. www.bbc.co.uk/mobile/
2. bbc.co.uk/mobile - BBC News. BBC Sport Films. news.bbc.co.uk/mobile/
Add a Sitemap in 3 steps
Step 1: Create a Sitemap with the Sitemap Generator
Step 2: Upload the Sitemap file to your website
Step 3: Add the Sitemap URL to your account
For accounts with multiple websites: You can include URLs from verified websites in a single Sitemap
What have we learnt?
Build content for users, not search engines
Test and measure everything
Sign up for Webmaster Tools and control how we crawl your site
Describe all the content on your site effectively
Submit a sitemap
Visit http://google.com/webmasters/
Help us to crawl your site comprehensively, so your content can be found
Q & A
Useful resources
Webmaster Central:
http://www.google.com/webmasters/
Sitemaps:
http://www.google.com/support/webmasters/bin/answer.py?answer=40318
http://www.sitemaps.org
Webmaster guidelines:
http://www.google.com/webmasters/guidelines.html