HOW SEARCH ENGINES SEARCH Week 12

Spiders and Algorithms

Search engines perform two technical tasks:

Search and Structure

search for new sites and add them to their databases

structure searches for users of the search engine

Structuring Searches for Users

All search engines use a search algorithm to structure searches for users.

In computer science, a search algorithm is an algorithm for “finding an item with specified properties among a collection of items”. *http://en.wikipedia.org/wiki/Search_algorithm
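To make that definition concrete, here is a minimal sketch of the simplest possible search algorithm: check each item in a collection until one has the specified property. The function name `find_first` and the sample page titles are made up for illustration; real search engines use far more elaborate methods.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def find_first(items: Iterable[T], has_property: Callable[[T], bool]) -> Optional[T]:
    """Return the first item that satisfies has_property, or None if none does."""
    for item in items:
        if has_property(item):
            return item
    return None


# Hypothetical collection of page titles (illustration only).
pages = ["Cooking with cast iron", "How search engines work", "Week 12 notes"]

# The "specified property" here: the title contains the word "search".
match = find_first(pages, lambda title: "search" in title.lower())
```

Here `match` is the second title; if no title had the property, the function would return `None`.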

Search Algorithms are Proprietary

Most search engines keep their search algorithms proprietary [that means the algorithm is corporate property, and they keep it at least partially secret].

All search engines feature different [or at least slightly different] search algorithms

All search engines use some form of their own search algorithm

Google Search Algorithm

Let's look at the best-known search engine and how it searches

Google's search algorithm operates according to a basic principle of relevance ranking

Results are ranked according to an algorithm they call PageRank [the name is a Google trademark; the process itself is patented]

See link for basic explanation of origins of Google search *http://en.wikipedia.org/wiki/Google

Google’s Search Algorithm

[Figure: a picture of Google’s search algorithm (PageRank). For the source and an explanation, see http://en.wikipedia.org/wiki/PageRank]

Pretty Mathematical! We won’t go into all that.
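For the curious only: the core idea behind PageRank can be sketched in a few lines. The version below is a simplified textbook illustration, not Google's production algorithm. The function name, the tiny four-page "web", and the damping value of 0.85 (the figure commonly quoted in descriptions of PageRank) are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank.

    links maps each page to the list of pages it links to.
    Assumes every linked-to page also appears as a key in links.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal rank

    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page passes its rank on, split evenly among its links.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: A, B and D all link to C; C links back to A.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
```

In this toy web, page C ends up with the highest rank because the most pages link to it, and page D ends up lowest because nothing links to it.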

Algorithms in Simple Terms

However, the algorithm [like most search engine algorithms] can be broken down into three basic concepts: 1) popularity, 2) density, and 3) keywords:

• Site popularity [how many other users search the site]
• Site density [how many other sites link to it]
• Keywords

Keywords are still key [no pun intended], along with how they intersect with the first two

These are considered in:

• ranking a site
• including it in your search results
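As a rough illustration of how popularity, density, and keywords might combine, here is a toy scoring sketch. Everything in it is an assumption for teaching purposes: the `Site` fields, the sample sites, and the formula are invented, and no real search engine publishes a formula like this.

```python
import math
from dataclasses import dataclass


@dataclass
class Site:
    url: str
    visits: int          # stand-in for popularity
    inbound_links: int   # stand-in for density (sites linking to it)
    text: str            # page content, used for keyword matching


def keyword_matches(site, query):
    """Count how often the query's words appear in the site's text."""
    words = site.text.lower().split()
    return sum(words.count(term) for term in query.lower().split())


def score(site, query):
    """Toy relevance score; 0.0 means the site is left out of the results."""
    matches = keyword_matches(site, query)
    if matches == 0:
        return 0.0
    # Logarithms keep huge visit or link counts from swamping the keywords.
    boost = 1.0 + math.log10(1 + site.visits) + math.log10(1 + site.inbound_links)
    return matches * boost


def search(sites, query):
    """Include only sites that match, ranked from highest score to lowest."""
    scored = [(score(site, query), site) for site in sites]
    included = [pair for pair in scored if pair[0] > 0]
    included.sort(key=lambda pair: pair[0], reverse=True)
    return [site.url for _, site in included]


# Hypothetical sites (illustration only).
sites = [
    Site("http://b.example", 100, 5, "search engines and spiders"),
    Site("http://a.example", 10000, 500, "search engines rank pages"),
    Site("http://c.example", 50000, 900, "recipes for dinner"),
]
results = search(sites, "search engines")
```

Both `a.example` and `b.example` match the keywords equally, so popularity and links decide their order; `c.example` is the most popular site but is left out because it matches no keyword.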

Updating Search Algorithms

If that weren’t enough, search engines regularly update their search algorithms

http://www.webmarketingpros.com/blog/how-to-recover-from-the-google-penguin-update/

http://blog.junta42.com/2011/04/4-steps-to-make-googles-panda-update-work-for-you/

Google released two major updates in the past few years, termed ‘Panda’ (2011) and ‘Penguin’ (2012) [similar to updates to a PC or Mac operating system, down to the catchy names]

These updates were designed to catch ‘low-quality sites’ [those that have little content, are ad-heavy, or replicate other pages] and eliminate them from search results

http://www.business2community.com/seo/animalistic-algorithms-googles-panda-and-penguin-shakeups-0270910

http://googleblog.blogspot.com/2011/02/finding-more-high-quality-sites-in.html

Technical Stuff

This is the ‘background’ information on how search engines search

This ‘technical stuff’ is not information we need in order to use a search engine

[i.e. we don’t need to construct search algorithms ourselves or know about spiders]

However, it can help us think about how to construct our searches

Web and Database Searching

Refer to p. 67, textbook, for discussion of controlled vocabulary:

All databases [this includes online catalogues and subscription databases like Ebsco] use a controlled vocabulary

Controlled Vocabulary – LC Subject Headings

Databases and online catalogues [Ebsco, our RHC catalogue for books as well as others]

• Use controlled vocabulary

• Allow us to narrow by subjects
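A small sketch of why narrowing by subject helps. The catalogue records and subject headings below are invented for illustration (they are not checked against actual LC Subject Headings), and `narrow_by_subject` is a hypothetical helper, not a feature of any particular catalogue.

```python
# Hypothetical catalogue records, each tagged with subject headings.
catalogue = [
    {"title": "How Google Works", "subjects": ["Web search engines"]},
    {"title": "Finding Information Online",
     "subjects": ["Web search engines", "Internet searching"]},
    {"title": "The Search for Meaning", "subjects": ["Philosophy"]},
]


def keyword_in_title(records, keyword):
    """Plain keyword search: match the word wherever it appears in a title."""
    return [r["title"] for r in records if keyword.lower() in r["title"].lower()]


def narrow_by_subject(records, heading):
    """Controlled-vocabulary search: match the agreed-upon subject heading."""
    return [r["title"] for r in records if heading in r["subjects"]]


by_keyword = keyword_in_title(catalogue, "search")
by_subject = narrow_by_subject(catalogue, "Web search engines")
```

The keyword search finds only the philosophy book, because its title happens to contain the word "search". The subject search finds the two books that are actually about search engines, even though neither title uses that word.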

Web and Database Searching

The Internet features no controlled vocabulary

– i.e. no subject headings or agreed-upon subjects in databases

We can, however:

• Search specific fields

• Eliminate or specify terms or related terms
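These techniques can be sketched as a small helper that assembles a search string. The function `build_query` is hypothetical. The operators it uses (quotation marks for a phrase, `OR`, a leading `-` to eliminate a term, `intitle:` and `site:` to search specific fields) are ones Google has supported; other search engines may use different syntax.

```python
def quote(term):
    """Wrap multi-word phrases in quotation marks so they stay together."""
    return f'"{term}"' if " " in term else term


def build_query(terms, related=(), exclude=(), in_title=None, site=None):
    """Assemble a web search string from its parts.

    terms    -- words or phrases that must be specified
    related  -- related terms, any one of which may appear (joined with OR)
    exclude  -- terms to eliminate from the results
    in_title -- a word that must appear in the page title field
    site     -- restrict results to one site or domain, e.g. "edu"
    """
    parts = [quote(t) for t in terms]
    if related:
        parts.append("(" + " OR ".join(quote(t) for t in related) + ")")
    parts.extend("-" + quote(t) for t in exclude)
    if in_title:
        parts.append("intitle:" + in_title)
    if site:
        parts.append("site:" + site)
    return " ".join(parts)


query = build_query(
    ["search engine"],
    related=["spider", "crawler"],
    exclude=["jobs"],
    site="edu",
)
```

The resulting string is `"search engine" (spider OR crawler) -jobs site:edu`, which can be typed directly into the search box.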