May 1, 2023 1
How Do Search Engines Work?
MARYCIA KHARBUKI
1TJ13MCA03
MCA II Yr
What are Search Engines?
A search engine is a software system designed to search for information on the World Wide Web.
The World Wide Web (abbreviated as WWW or W3, also known as the Web) is a large collection of HTML pages on the Internet.
Each HTML page is connected to other pages through hyperlinks and URLs.
27/08/14 3
3 Basic Tasks
Crawling
Indexing
Searching
Crawling
Every search engine uses automated software programs known as robots, crawlers or spiders.
These crawlers visit websites, read each site’s meta tags and actual content, and follow the links that the site connects to.
Meta tag: lets the owner of a page specify keywords and concepts. It can also include a STOP signal that tells crawlers not to index the page or not to follow its links.
The crawler returns all of this information to a central repository, where the data is indexed.
The crawler will periodically return to the sites to check for any information that has changed.
The frequency with which this happens is determined by the administrators of the Search Engine.
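The crawl loop described above can be sketched in Python. This is a minimal illustration, not any search engine's actual crawler: it assumes a small hypothetical in-memory web (`WEB`, a dict of URL to HTML) instead of real HTTP fetching, and it assumes the STOP signal is the widely used robots meta tag, where `noindex` means "do not store this page" and `nofollow` means "do not follow its links".

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory web (URL -> HTML); a real crawler would fetch pages over HTTP.
WEB = {
    "http://news.example/": (
        '<html><head><meta name="keywords" content="News, World"></head><body>'
        '<a href="http://blog.example/">blog</a> <a href="http://private.example/">private</a>'
        "</body></html>"
    ),
    "http://blog.example/": '<html><body><a href="http://news.example/">home</a></body></html>',
    "http://private.example/": (
        '<html><head><meta name="robots" content="noindex, nofollow"></head><body>'
        '<a href="http://hidden.example/">hidden</a></body></html>'
    ),
    "http://hidden.example/": "<html><body>Only linked from a nofollow page.</body></html>",
}


class PageParser(HTMLParser):
    """Collects <meta name=... content=...> pairs and <a href=...> links from one page."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "meta" and name:
            self.meta[name.lower()] = (attrs.get("content") or "").lower()
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])


def crawl(seeds, web):
    """Breadth-first crawl from the seed URLs; returns {url: meta tags} for indexable pages."""
    repository = {}  # stands in for the central repository
    seen = set(seeds)
    frontier = deque(seeds)
    while frontier:
        url = frontier.popleft()
        html = web.get(url)
        if html is None:
            continue  # unreachable page
        parser = PageParser()
        parser.feed(html)
        robots = parser.meta.get("robots", "")
        if "noindex" not in robots:
            repository[url] = parser.meta  # keep what was read for the indexer
        if "nofollow" in robots:
            continue  # STOP signal: do not follow this page's links
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return repository


print(crawl(["http://news.example/"], WEB))
```

Breadth-first order is a simplifying choice here; the private page is neither stored nor expanded, so the page linked only from it is never reached.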
STOP! RESTRICTED DATA
Some Queries
Where does the crawler start from?
The usual starting points are lists of heavily used servers and very popular pages, e.g. news channel sites, online newspaper sites, etc.
The crawl process can also begin with a list of web addresses from past crawls or sitemaps provided by website owners.
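Combining several seed sources into a single starting list could look like the sketch below. The three input lists and the `seed_frontier` helper are hypothetical; normalising only the scheme and host (which are case-insensitive) while leaving the path untouched is an assumption, made so that duplicates such as `HTTPS://NEWS.example/` and `https://news.example/` are crawled only once.

```python
from urllib.parse import urlsplit


def seed_frontier(popular_pages, past_crawls, owner_submitted):
    """Merge seed lists into one crawl frontier, keeping the first copy of each URL."""
    frontier, seen = [], set()
    for source in (popular_pages, past_crawls, owner_submitted):
        for url in source:
            parts = urlsplit(url.strip())
            # Scheme and host are case-insensitive; the path is left as-is.
            key = (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query)
            if key not in seen:
                seen.add(key)
                frontier.append(url.strip())
    return frontier


popular = ["https://news.example/", "https://paper.example/today"]
past = ["HTTPS://NEWS.example/", "https://blog.example/post"]
owners = ["https://paper.example/today", "https://shop.example"]
print(seed_frontier(popular, past, owners))
```

Earlier sources win ties here, so popular pages keep their place at the front of the queue.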
How many spiders does a search engine send?
Google's original system, running four spiders (Googlebots) at its peak, could crawl over 100 pages per second, generating around 600 kilobytes of data each second.
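A quick back-of-the-envelope check of those figures shows what they imply per page and, if the peak rate were sustained, per day. This assumes decimal units (1 GB = 10^6 KB) and treats "over 100 pages" as exactly 100:

```latex
\begin{aligned}
\frac{600\ \text{KB/s}}{100\ \text{pages/s}} &= 6\ \text{KB per page} \\
100\ \text{pages/s} \times 86\,400\ \text{s/day} &= 8\,640\,000\ \text{pages/day} \\
600\ \text{KB/s} \times 86\,400\ \text{s/day} &= 51\,840\,000\ \text{KB/day} \approx 51.8\ \text{GB/day}
\end{aligned}
```

Because the real rate was somewhat above 100 pages per second, the true average was somewhat below 6 KB per page.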
Indexing
The search engine analyzes the contents of each page to determine how it should be indexed.
Words are extracted from the page's title, content and headings.
The engine stores the number of times each word appears on the page.
It assigns a weight to each entry, giving higher values to words that appear near the top of the document, in subheadings, in links or in the title of the page.
The index is then built from these factors.
An (inverted) index does not contain the documents themselves; it contains a list of words or phrases and, for each one, references to all the documents related to that word or phrase.
The processed pages are then transferred to the search engine’s database.
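The indexing steps above (extract words, count them, weight them by where they appear, then invert the mapping from page to words into word to pages) can be sketched as follows. The field names and every number in `FIELD_WEIGHTS`, `TOP_BONUS` and `TOP_WORDS` are hypothetical choices for illustration; the slide only says that title, subheading, link and near-the-top words get higher weights.

```python
import re
from collections import defaultdict

# Hypothetical weights: the slide only says title, subheading, link and
# near-the-top words count more; these particular numbers are assumptions.
FIELD_WEIGHTS = {"title": 3.0, "heading": 2.0, "link": 1.5, "body": 1.0}
TOP_BONUS = 0.5  # extra weight for a word among the first TOP_WORDS words of the body
TOP_WORDS = 20


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(pages):
    """pages: {url: {field: text}}. Returns {word: {url: (count, weight)}}."""
    index = defaultdict(dict)
    for url, fields in pages.items():
        counts = defaultdict(int)
        weights = defaultdict(float)
        for field, text in fields.items():
            for position, word in enumerate(tokenize(text)):
                counts[word] += 1
                weights[word] += FIELD_WEIGHTS.get(field, 1.0)
                if field == "body" and position < TOP_WORDS:
                    weights[word] += TOP_BONUS
        for word, count in counts.items():
            index[word][url] = (count, weights[word])
    return dict(index)


pages = {
    "a.example": {"title": "Search engines", "body": "Crawlers visit pages"},
    "b.example": {"title": "Spiders", "body": "Search engines use spiders"},
}
index = build_inverted_index(pages)
print(index["search"])
```

The index holds no page text at all, only words pointing back to pages, which is what the slide means by an index that "doesn't contain documents".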
Indexing
Some Queries
Does the search engine store every web page it finds on the Internet?
No. Depending on certain key factors, it decides whether to keep or discard each web page.
If a page meets those factors, everything about it is stored (text, audio, video, files, etc.).
Can a search engine database really hold all the web pages of the Internet?
The data is encoded before it is stored, to save storage space.
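The slide does not say which encoding is used. As a hedged illustration, the sketch below uses general-purpose zlib (DEFLATE) compression from Python's standard library as a stand-in; `store_page` and `load_page` are hypothetical helper names, not part of any search engine.

```python
import zlib


def store_page(html: str) -> bytes:
    """Encode a page as UTF-8 and compress it before writing it to storage."""
    return zlib.compress(html.encode("utf-8"), 9)


def load_page(blob: bytes) -> str:
    """Reverse store_page: decompress and decode back to the original text."""
    return zlib.decompress(blob).decode("utf-8")


page = "<html><body>" + "Search engines crawl, index and search the web. " * 50 + "</body></html>"
blob = store_page(page)
print(len(page.encode("utf-8")), "bytes ->", len(blob), "bytes")
```

The compression must be lossless so that the stored copy can be reproduced exactly when results are served; highly repetitive text like this sample shrinks far more than typical pages would.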
PageRank
PageRank was named after Larry Page, one of the founders of Google.
PageRank is an algorithm used by Google Search to rank websites in its search engine results.
PageRank is a way of measuring the importance of website pages.
PageRank works by counting the number and quality of links to a page to roughly estimate how important the website is.
The underlying assumption is that more important websites are likely to receive more links from other websites.
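The link-counting idea above is usually formalised as an iterative computation in which each page passes its current score, split evenly, to the pages it links to. The sketch below is the textbook power-iteration form with a damping factor of 0.85 (the value suggested in the original PageRank paper). The `links` graph is a hypothetical example, and Google's production ranking combines many other signals, so this is not its implementation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: {page: [pages it links to]}. Returns {page: score}; scores sum to 1."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}  # "random jump" share
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share  # each outgoing link passes an equal share
            else:
                for p in pages:  # dangling page: spread its score over all pages
                    new[p] += damping * rank[page] / n
        rank = new
    return rank


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Page C receives links from three pages and ends up highest, while D, which nobody links to, keeps only the baseline share.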
Searching
When a user enters a query into a search engine, the engine examines its index.
The index identifies the relevant web pages in the database.
The search engine returns only the results that are relevant or useful to the searcher’s query.
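Answering a query against an inverted index can be sketched like this. The tiny `index` (word to {URL: weight}), the rule that a page must contain every query word, and ranking by summed weight are all simplifying assumptions; real engines use far richer relevance models.

```python
def search(query, index, top_k=10):
    """index: {word: {url: weight}}. Returns URLs containing every query word,
    highest total weight first (ties broken alphabetically)."""
    words = query.lower().split()
    if not words:
        return []
    postings = [index.get(word, {}) for word in words]
    candidates = set(postings[0])
    for posting in postings[1:]:
        candidates &= set(posting)  # keep only pages that contain this word too
    ranked = sorted(candidates, key=lambda url: (-sum(p[url] for p in postings), url))
    return ranked[:top_k]


index = {
    "search": {"a.example": 3.0, "b.example": 1.5, "c.example": 1.0},
    "engines": {"a.example": 1.5, "b.example": 2.0},
    "spiders": {"b.example": 4.5},
}
print(search("Search engines", index))
```

Only pages listed under every query word are ever scored, so the work depends on the size of the matching lists rather than on the whole database.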
Searching
Searching
Searching
Some Facts
The search engine does not search the whole Internet when we type a query.
It searches the copy of the Internet stored in the engine’s database.
The query from the browser is matched against the search engine’s index.
When a match is found, the relevant web pages are retrieved from the engine’s database.
This is why search results appear within half a second of typing the query.
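The difference between scanning the stored copy at query time and consulting an index built in advance can be shown on a hypothetical three-page database (`STORED_PAGES`). Both functions return the same pages, but `answer` only does a dictionary lookup, while `full_scan` has to read every stored page; at web scale that difference is what keeps responses fast.

```python
from collections import defaultdict

# Hypothetical stored copy of crawled pages (the engine's database).
STORED_PAGES = {
    "a.example": "how search engines crawl the web",
    "b.example": "spiders follow links between pages",
    "c.example": "search results appear quickly",
}


def build_index(pages):
    """Run ahead of time, at indexing time, not when a query arrives."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.split():
            index[word].add(url)
    return index


INDEX = build_index(STORED_PAGES)


def answer(word):
    """Query time: one index lookup, then fetch the matching pages from the stored copy."""
    return {url: STORED_PAGES[url] for url in sorted(INDEX.get(word.lower(), ()))}


def full_scan(word):
    """What the engine avoids at query time: reading every stored page."""
    return {url: text for url, text in sorted(STORED_PAGES.items()) if word.lower() in text.split()}


print(list(answer("search")))
```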
How does one Search Engine differ from another?
Some search engines do not send their spiders to crawl the web very often.
Some search engines are directory based: they contain only the URL of the page, the page title and a one-paragraph description of the page.
Many search engines do not use PageRank as a ranking criterion.
Some return comparatively fewer results than other search engines.
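A directory-based engine, as described above, keeps only a URL, a title and a short description per page, so a query can only match those fields. The `DirectoryEntry` record and the all-words matching rule in this sketch are hypothetical illustrations, not the design of any particular directory.

```python
import re
from dataclasses import dataclass


@dataclass
class DirectoryEntry:
    """A directory listing: URL, page title and a one-paragraph description only."""
    url: str
    title: str
    description: str


def tokens(text):
    """Lower-cased set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def directory_search(query, entries):
    """Return URLs whose title or description contains every query word."""
    wanted = tokens(query)
    return [e.url for e in entries if wanted and wanted <= tokens(e.title + " " + e.description)]


entries = [
    DirectoryEntry("https://a.example/", "Search engines explained",
                   "An introduction to how search engines crawl and index the web."),
    DirectoryEntry("https://b.example/", "Web spiders", "Notes on crawler politeness."),
]
print(directory_search("search engines", entries))
```

Words that appear only in a page's full text can never match here, which is one plausible reason a directory returns fewer results than a crawler-based index.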
Top Search Engines
Google.com - Launched in 1998 by Larry Page and Sergey Brin, Google is by far the most popular search engine globally.
Yahoo.com - Started in 1994 by David Filo and Jerry Yang, Yahoo! is the second biggest search engine on the web.
Bing.com - Launched in 2009 by Microsoft, Bing is their latest web-based search service.
Ask.com - Ask was founded in 1996 with the idea of allowing users to get answers to questions posed in everyday, natural language, as well as traditional keyword searching.
DuckDuckGo.com - Founded in September 2008 by Gabriel Weinberg, DuckDuckGo (DDG) prides itself on respecting user privacy.
THANK YOU!