libtutorials101rhc
Spiders and Algorithms
Search engines perform two technical tasks:
Search and Structure
search for new sites and add them to their databases
structure searches for users of the search engine
Searching for New Sites
Search engines use programs called ‘spiders’ to search for new sites to add
They crawl the web finding pages for inclusion by following links from pages already in their database:
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html
http://computer.howstuffworks.com/internet/basics/search-engine1.htm
See picture of how it works:
http://computer.howstuffworks.com/internet/basics/search-engine1.htm
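The link-following idea can be sketched in a few lines of Python. This is only a toy illustration, not how any real search engine is built: the `TOY_WEB` dictionary, the `.example` page names and the `crawl` function are all made up for this example, and no real pages are downloaded.

```python
from collections import deque

# Hypothetical toy "web": each page maps to the pages it links to.
TOY_WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["a.example"],
    "d.example": [],
    "e.example": ["a.example"],  # no known page links here
}


def crawl(seed_pages, web):
    """Follow links outward from pages already in the database.

    Returns the pages in the order the spider discovers them.
    """
    discovered = list(seed_pages)
    seen = set(seed_pages)
    queue = deque(seed_pages)
    while queue:
        page = queue.popleft()
        for link in web.get(page, []):
            if link not in seen:  # skip pages we already know about
                seen.add(link)
                discovered.append(link)
                queue.append(link)
    return discovered


found = crawl(["a.example"], TOY_WEB)
print(found)
```

In this toy web the spider never reaches `e.example`, because no page already in the database links to it. That is the practical consequence of finding pages only by following links.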
Structuring Searches for Users
All search engines use a search algorithm to structure searches for users.
In computer science, a search algorithm is an algorithm for “finding an item with specified properties among a collection of items”. *http://en.wikipedia.org/wiki/Search_algorithm
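The simplest instance of that definition is a linear search: look at each item in turn and stop at the first one with the specified property. The sketch below is a generic illustration with made-up data (`books`, `find_item`); it is not the algorithm of any particular search engine.

```python
def find_item(items, has_property):
    """Return the first item for which has_property(item) is true, else None."""
    for item in items:
        if has_property(item):
            return item
    return None


# Hypothetical collection of items to search.
books = [
    {"title": "Web Basics", "year": 2009},
    {"title": "Search Engines", "year": 2011},
    {"title": "Library Skills", "year": 2012},
]

# The "specified property" here: published after 2010.
match = find_item(books, lambda book: book["year"] > 2010)
print(match["title"])
```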
Search Algorithms are Proprietary
Most search engines keep their search algorithms secret, or proprietary [that means the algorithm is corporate property, and the company keeps at least part of it secret].
All search engines feature different [or at least slightly different] search algorithms
All search engines use some form of their own search algorithm
Google Search Algorithm
Let's look at the best-known search engine and how it searches
Google: their search algorithm operates according to a basic principle of relevance ranking
results are ranked according to an algorithm they call PageRank [the name is a Google trademark; the patent on the method is assigned to Stanford University]
See link for basic explanation of origins of Google search *http://en.wikipedia.org/wiki/Google
Google’s Search Algorithm
See link for explanation:
From http://en.wikipedia.org/wiki/PageRank
Pretty Mathematical! We won’t go into all that.
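For the curious, the core idea fits in a short sketch: a page is important if important pages link to it. The code below follows the simplified textbook description of PageRank (a damping factor of 0.85, repeated until the numbers settle). It is an assumption-laden toy, not Google's production algorithm, and the four-page `toy_links` web is invented for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank on a dict mapping each page to the pages it links to.

    Every link target must also appear as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal rank
    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to C, nothing links to D.
toy_links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

ranks = pagerank(toy_links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy web, C ends up with the highest rank because the most pages link to it, and D ends up lowest because nothing links to it.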
Algorithms in Simple Terms
However, the algorithm [like most search engine algorithms] can be broken down into basic concepts of 1) popularity, 2) density, and 3) keywords:
site popularity [how many other users search the site]
site density [how many other sites link to it]
keywords
Keywords are still key [no pun intended], and so is how they intersect with the first two
These are considered in:
ranking a site
including it in your search results
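To make the three concepts concrete, here is a toy scoring sketch. Everything in it is an assumption made for illustration: the `sites` data, the field names `visits` and `inbound_links`, and the weights in `toy_score` are invented, since real search engines keep their formulas secret.

```python
def toy_score(site, query_words):
    """Combine keywords, popularity and density into one made-up score."""
    text_words = site["text"].lower().split()
    keyword_hits = sum(text_words.count(word.lower()) for word in query_words)
    if keyword_hits == 0:
        return 0.0  # no keyword match: the site is not included at all
    popularity = site["visits"] / 1000     # how many users visit the site
    density = site["inbound_links"] / 10   # how many other sites link to it
    return keyword_hits * (1 + popularity + density)


def toy_results(sites, query_words):
    """Return names of matching sites, highest score first."""
    scored = [(toy_score(site, query_words), site["name"]) for site in sites]
    included = [(score, name) for score, name in scored if score > 0]
    included.sort(reverse=True)
    return [name for score, name in included]


# Hypothetical sites.
sites = [
    {"name": "site-one", "text": "library search tips and search tools",
     "visits": 500, "inbound_links": 20},
    {"name": "site-two", "text": "search engines explained",
     "visits": 2000, "inbound_links": 5},
    {"name": "site-three", "text": "cooking recipes",
     "visits": 9000, "inbound_links": 90},
]

results = toy_results(sites, ["search"])
print(results)
```

Note that `site-three` is the most popular and most linked-to site, yet it is left out because it does not match the keyword. Keywords decide inclusion; popularity and density only affect the order.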
Updating Search Algorithms
If that weren’t enough, search engines regularly update their search algorithms
http://www.webmarketingpros.com/blog/how-to-recover-from-the-google-penguin-update/
http://blog.junta42.com/2011/04/4-steps-to-make-googles-panda-update-work-for-you/
Google released two updates in the past few years, termed ‘Panda’ and ‘Penguin’ [similar to updates to PC or Mac operating systems, down to the catchy names]
Updating Search Algorithms
These updates were designed to catch ‘low-quality sites’ [those with little content, heavy advertising, or content copied from other pages] and eliminate them from searches
http://www.business2community.com/seo/animalistic-algorithms-googles-panda-and-penguin-shakeups-0270910
http://googleblog.blogspot.com/2011/02/finding-more-high-quality-sites-in.html
Technical Stuff
This is the ‘background’ information on how search engines search
We do not need to apply this ‘technical stuff’ ourselves in order to use a search engine
[i.e. we don’t need to explicitly construct search algorithms or know about spiders]
Web and Database Searching
Refer to p. 67, textbook, for discussion of controlled vocabulary:
All databases [this includes online catalogues and subscription databases like Ebsco] include controlled vocabulary
Controlled Vocabulary – LC Subject Headings
Databases and online catalogues [Ebsco, our RHC catalogue for books as well as others]
• Use controlled vocabulary
• Allow us to narrow by subjects
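The sketch below shows why controlled vocabulary makes narrowing by subject possible: different everyday words are mapped to one official heading, and every record is tagged with that heading. The small `SUBJECT_HEADINGS` table and the `CATALOGUE` records are hypothetical, made up for illustration, and do not reflect how Ebsco or the RHC catalogue is actually implemented.

```python
# Hypothetical mapping from everyday words to one controlled subject heading.
SUBJECT_HEADINGS = {
    "movies": "Motion pictures",
    "films": "Motion pictures",
    "cinema": "Motion pictures",
    "cars": "Automobiles",
}

# Hypothetical catalogue records, each tagged with a controlled heading.
CATALOGUE = [
    {"title": "A History of Film", "subject": "Motion pictures"},
    {"title": "Repairing Your Car", "subject": "Automobiles"},
    {"title": "Great Directors", "subject": "Motion pictures"},
]


def narrow_by_subject(records, user_term):
    """Translate the user's word into a heading, then filter by that heading."""
    heading = SUBJECT_HEADINGS.get(user_term.lower())
    if heading is None:
        return []  # the word is not in our small vocabulary table
    return [record["title"] for record in records if record["subject"] == heading]


movie_titles = narrow_by_subject(CATALOGUE, "movies")
film_titles = narrow_by_subject(CATALOGUE, "Films")
print(movie_titles)
```

Searching for ‘movies’ or ‘films’ returns the same titles, because both words lead to the same subject heading.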