Alex Johnson
alex (at) white.net | @alex_cestrian
USING SERVER LOGS TO YOUR ADVANTAGE
@alex_cestrian #OptimiseOxford
What are server logs?
A server log is a simple text file which records activity on a server.
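What a single entry contains varies by server setup, but a typical line in the common Apache/Nginx "combined" format records the client IP, timestamp, request, status code, response size, referrer and user agent. The values below are made up for illustration:

```
66.249.66.1 - - [12/Jun/2016:06:25:24 +0100] "GET /dresses/summer HTTP/1.1" 200 8452 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
```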
So why bother looking at server logs?
There is only one resource that tells you what search engines are looking for on a domain: web server logs…
…including stuff they found 13 years ago.
How do we analyse all that data?
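Dedicated log analysis tools exist, but a short script goes a long way. As a minimal sketch, assuming logs in the combined format shown earlier, you might parse each line in Python and keep only GoogleBot's requests (access.log is a placeholder path):

```python
import re

# Matches the Apache/Nginx "combined" log format shown earlier.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def googlebot_hits(log_path):
    """Yield (url, status) for every request whose user agent claims to be GoogleBot."""
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            match = LOG_PATTERN.match(line)
            if match and "Googlebot" in match.group("agent"):
                yield match.group("url"), int(match.group("status"))

if __name__ == "__main__":
    for url, status in googlebot_hits("access.log"):
        print(status, url)
```

Note that anyone can claim to be GoogleBot in the user agent string, so verify suspicious hits with a reverse DNS lookup before trusting them.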
2 SCENARIOS
Scenario 1: IDENTIFY ORPHAN PAGES
An orphan is a page that is not linked to by another page on the site.
[Diagram: the Homepage links to Dresses, Skirts and Our offers; the seasonal Summer 2016 Offers page is no longer linked from anywhere, leaving it orphaned.]
Why are orphan pages bad?
• There may be a lot of them, and they may be competing with your ‘live’ content
• They waste GoogleBot’s crawl budget for your domain
So how do we find orphan pages using log files?
Upload a crawl of your website (from Screaming Frog, DeepCrawl etc)
Look for URLs in the logs that return a 200 status code ✅ but don't appear in the crawl of your site
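As a minimal sketch of that comparison, assuming the crawl is exported as a plain list of URLs (a hypothetical crawl_urls.txt, one URL per line) and reusing the googlebot_hits() helper from the earlier parsing sketch:

```python
from urllib.parse import urlsplit

def load_crawled_paths(crawl_export):
    """Paths the site crawler reached (hypothetical one-URL-per-line export)."""
    with open(crawl_export, encoding="utf-8") as handle:
        return {urlsplit(line.strip()).path for line in handle if line.strip()}

def orphan_candidates(log_path, crawl_export):
    """URLs GoogleBot fetched with a 200 that the crawler never found."""
    crawled = load_crawled_paths(crawl_export)
    # googlebot_hits() is the log-parsing helper sketched earlier
    fetched_ok = {urlsplit(url).path for url, status in googlebot_hits(log_path) if status == 200}
    return sorted(fetched_ok - crawled)

for path in orphan_candidates("access.log", "crawl_urls.txt"):
    print(path)
```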
What to do with the orphan pages you find:
• Redundant content of little value → 404/410 status code
• Relevant, valuable but out-of-date content → 301 redirect to a relevant live page
• Useful content that was orphaned accidentally → re-attach the page to the website
If GoogleBot is wasting lots of time in a specific folder full of orphan pages that hold no value, block it via robots.txt
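For example, if the logs show GoogleBot hammering a folder of valueless pages, a couple of lines in robots.txt stop the waste (the folder name here is just an illustration):

```
User-agent: Googlebot
Disallow: /summer-2016-offers/
```

Bear in mind robots.txt stops crawling, not indexing; pages already in the index may still appear in results.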
Scenario 2: IMPROVING CRAWL EFFICIENCY
Find where GoogleBot is wasting time
Find parameter-driven pages
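As a minimal sketch of one way to surface them, again reusing the googlebot_hits() helper from the earlier sketch: keep only requests with a query string and count hits per parameter name.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit

def parameter_hits(log_path):
    """Count GoogleBot requests per query-string parameter name."""
    counts = Counter()
    for url, _status in googlebot_hits(log_path):  # helper from the earlier sketch
        for name, _value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
            counts[name] += 1
    return counts

for name, hits in parameter_hits("access.log").most_common():
    print(f"{hits:>8}  ?{name}=")
```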
Block GoogleBot from crawling these URLs
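Google honours * and $ wildcards in robots.txt, so if a parameter adds nothing for search (a hypothetical sort parameter, say), rules like these keep GoogleBot out of those variants:

```
User-agent: Googlebot
Disallow: /*?sort=
Disallow: /*&sort=
```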
Find infrequently visited pages
Order by number of events: low to high
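A minimal sketch of that ordering, built on the same parsing helper: count GoogleBot events per path and list the least-crawled URLs first.

```python
from collections import Counter
from urllib.parse import urlsplit

def crawl_frequency(log_path):
    """GoogleBot request counts per URL path, least-crawled first."""
    counts = Counter(urlsplit(url).path for url, _status in googlebot_hits(log_path))
    return sorted(counts.items(), key=lambda item: item[1])

for path, events in crawl_frequency("access.log")[:50]:
    print(f"{events:>6}  {path}")
```

For each URL near the top of that list, ask the questions below.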
• Is this URL in the xml sitemap?
• Is the page too deep within the architecture?
• Is internal linking to this page optimal?
• Are links to this page travelling through multiple redirects?
• Can GoogleBot actually parse the links pointing to this page?
Find slow-loading pages
Look at all URLs and filter by average response time
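Standard combined-format logs don't record how long a response took, so this sketch assumes the log format has been extended with a time-taken field (%D in Apache, $request_time in Nginx) and that your parser exposes it as (path, duration) pairs; the averaging itself is then straightforward:

```python
from collections import defaultdict
from statistics import mean

def slowest_urls(records):
    """records: iterable of (path, duration) pairs from an extended log parser."""
    timings = defaultdict(list)
    for path, duration in records:
        timings[path].append(duration)
    averages = {path: mean(values) for path, values in timings.items()}
    # Slowest URLs first
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)

# e.g. slowest_urls(parsed_records)[:20] lists the 20 slowest URLs on average
```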
If time taken is consistently high, you need to look at how you can reduce the load time of the page
“See what GoogleBot is actually consuming. Improve GoogleBot’s diet.” (Oliver Mason, Brighton SEO 2016)
THANK YOU
ALEX JOHNSON | @alex_cestrian