DINESH (Search Engine Seminar)


  • 8/7/2019 DINESH (Search Engine Seminar)


    DINESH KUMAR

    Roll: 07/CSE/3402


    WEB SEARCH ENGINE

    A web search engine is a tool designed to search for information on the World Wide Web. The search results may consist of web pages, images, information and other types of files. Search engines work algorithmically or through a mixture of algorithmic and human input.

    According to Netcraft, there are around 240,000,000 web domains globally.


    Current search engines:


    Generations of search engines

    First generation search engine:

    Search results depended on what was on the Web page itself. Ranking factors included keyword density, the title, and where in the document the keywords appeared.

    First generation engines also added relevancy for META tags and keywords in the domain name, plus a few bonus points for having keywords in the URL.
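    As an illustration of the on-page factors listed above, the sketch below computes keyword density and adds bonus points for a keyword in the title and in the URL. The function names and the bonus weights (`title_bonus`, `url_bonus`) are hypothetical values chosen for the example, not figures used by any real engine.

```python
import re


def keyword_density(text: str, keyword: str) -> float:
    """Return the fraction of words in `text` equal to `keyword` (case-insensitive)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


def first_generation_score(title: str, body: str, url: str, keyword: str,
                           title_bonus: float = 1.0, url_bonus: float = 0.5) -> float:
    """Toy on-page score: keyword density plus assumed bonuses for title and URL."""
    score = keyword_density(body, keyword)
    if keyword.lower() in title.lower():
        score += title_bonus  # hypothetical weight
    if keyword.lower() in url.lower():
        score += url_bonus  # hypothetical weight
    return score


density = keyword_density("Search engines search the web", "search")
score = first_generation_score(
    title="Search basics",
    body="Search engines search the web",
    url="http://example.test/search",
    keyword="search",
)
```

    Because every input comes from the page itself, a page author can inflate such a score at will, which is one reason later generations turned to signals from outside the page.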

    Second generation search engine:

    These engines track clicks, link popularity and link quality. They then added context, where two-word keyword pairs were extracted from a page to better categorize it.

    Google's PageRank system and the measurement of visit length are examples of second generation techniques.
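    Link popularity can be illustrated with the textbook power-iteration form of PageRank. This is a minimal sketch of the published idea, not Google's production system; the damping factor 0.85, the iteration count and the three-page graph are assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives the same small "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no out-links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

    In this graph, page "c" is linked from both "a" and "b", so it ends up with the highest rank, while "b" has a single in-link that carries only half of the rank of "a".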

    Third generation search engine:

    It adds word stemming to keep a search in context. Automatic extraction of keyword pairs helps categorize a page. It extracts data about the user's individual searching habits. It adds Web maps, which are a useful filtering tool for getting rid of duplicate sites.


    How a search engine works:

    A search engine operates in the following order:

    1. Web crawling

    2. Indexing

    3. Searching


    Web crawling

    Web search engines work by storing information about many web pages, which they retrieve from the WWW using a Web crawler (spider): an automated Web browser which follows every link it sees.

    Googlebot is Google's web crawling robot. It functions like a web browser: it sends a request to a web server for a web page, downloads the entire page, then hands it off to Google's indexer.

    Search engine spiders do not read pages the way a human does. Instead, they tend to see only particular elements and are blind to many extras (Flash, JavaScript, images) that are intended for humans.
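    The fetch-and-follow-links loop described above can be sketched as a breadth-first crawl. To keep the example runnable without network access, pages are served from a small in-memory dictionary (`FAKE_WEB`, a hypothetical stand-in for real HTTP requests); `crawl` and `LinkExtractor` are illustrative names, not part of Googlebot.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; `fetch(url)` returns HTML text or None."""
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be downloaded
            continue
        pages[url] = html  # this is what would be handed to the indexer
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages


# Hypothetical three-page site used instead of real HTTP requests.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a>',
    "http://example.test/b": '<a href="/">home</a> <a href="/missing">x</a>',
}
crawled = crawl("http://example.test/", FAKE_WEB.get)
```

    The `seen` set keeps the crawler from downloading the same page twice when several pages link to it. A real crawler would also respect robots.txt and rate limits, which this sketch leaves out.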


    Spider simulator: Bput.org

    As we can see, the images, Flash and JavaScript/VBScript do not have any impact on the web spider.

    The only things that matter are text, in-bound/out-bound links, meta keywords, etc.
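    A spider simulator of this kind can be approximated in a few lines. The sketch below keeps visible text, link targets and meta keywords, and skips script and style content; images are simply ignored. The class name `SpiderView` and the sample page are made up for the example and are not taken from the simulator shown on the slide.

```python
from html.parser import HTMLParser


class SpiderView(HTMLParser):
    """Keep only what a simple text-oriented spider would notice."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []
        self.meta_keywords = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.meta_keywords = [k.strip() for k in content.split(",") if k.strip()]
        # Other tags, including <img>, contribute nothing.

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Text inside <script> or <style> is not visible content.
        if not self._skip_depth and data.strip():
            self.text.append(data.strip())


PAGE = """<html><head><title>Demo</title>
<meta name="keywords" content="search, engine">
<script>var hidden = 1;</script></head>
<body><img src="logo.png"><p>Hello spiders</p>
<a href="/about">About us</a></body></html>"""

view = SpiderView()
view.feed(PAGE)
view.close()
```

    After parsing, `view.text`, `view.links` and `view.meta_keywords` hold everything this simplified spider retains; the script body and the image leave no trace.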


    Indexing

    The Web crawler gives the indexer the full text of the pages it finds. These pages are stored in Google's index database by search term, with each index entry storing a list of documents in which the term appears and the location within the text where it occurs. This data structure allows rapid access to documents that contain user query terms.

    To improve search performance, Google ignores stop words (such as the, is, on, or, of, how, why, as well as certain single digits and single letters). The indexer also ignores some punctuation and multiple spaces, and converts all letters to lowercase.
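    The data structure described here is commonly called an inverted index. A minimal sketch follows, assuming a tiny stop-word list taken from the examples above and, as a simplification, dropping every single-character token rather than only "certain" ones. The names `tokenize` and `build_index` are illustrative.

```python
import re
from collections import defaultdict

# Stop words named in the text; a real list would be longer.
STOP_WORDS = {"the", "is", "on", "or", "of", "how", "why"}


def tokenize(text):
    """Lowercase the text and drop punctuation and extra whitespace."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to {doc_id: [positions]} for all documents."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, term in enumerate(tokenize(text)):
            # Simplification: skip stop words and all one-character tokens.
            if term in STOP_WORDS or len(term) == 1:
                continue
            index[term].setdefault(doc_id, []).append(position)
    return dict(index)


docs = {
    1: "The spider crawls the Web.",
    2: "How the web index is built",
}
index = build_index(docs)
```

    Looking up `index["web"]` returns both documents with the word position in each, so a query term is resolved with one dictionary access instead of a scan over every page.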


    Searching:

    The query processor has several parts, including the user interface (search box), the engine that evaluates queries and matches them to relevant documents, and the results formatter.
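    The three parts of the query processor can be sketched as three small functions: one that parses what the user typed, one that matches terms against an index, and one that formats the hits. The index layout ({term: {doc_id: [positions]}}), the rule that all terms must match, and the ranking by occurrence count are assumptions made for this example.

```python
import re

STOP_WORDS = frozenset({"the", "is", "on", "or", "of", "how", "why"})


def parse_query(query):
    """User-interface side: turn the typed query into search terms."""
    return [t for t in re.findall(r"[a-z0-9]+", query.lower()) if t not in STOP_WORDS]


def evaluate(index, terms):
    """Return ids of documents containing every term, best match first."""
    if not terms:
        return []
    postings = [set(index.get(term, {})) for term in terms]
    matches = set.intersection(*postings)

    def score(doc_id):
        # Assumed ranking: total number of occurrences of the query terms.
        return sum(len(index[term][doc_id]) for term in terms)

    return sorted(matches, key=lambda doc_id: (-score(doc_id), doc_id))


def format_results(doc_ids, titles):
    """Results formatter: one numbered line per hit."""
    return [f"{rank}. {titles[doc_id]}" for rank, doc_id in enumerate(doc_ids, start=1)]


# Hypothetical index and titles for two documents.
INDEX = {
    "web": {1: [4], 2: [2, 7]},
    "crawler": {1: [1], 2: [0]},
    "index": {2: [3]},
}
TITLES = {1: "Crawling basics", 2: "Indexing the web"}

hits = evaluate(INDEX, parse_query("the web crawler"))
lines = format_results(hits, TITLES)
```

    Document 2 is listed first here because the query terms occur three times in it and only twice in document 1.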


    The future of search engines:

    3D search engine

    Theme search engine

    Meta search engine

    It's time to look beyond Google.


    3D search engine:

    A search engine that can mine catalogs of three-dimensional objects, which lets users create images as queries for searches.

    Query formulation

    Users can select objects from a catalog of images based on product groupings, or they can draw a 2D or 3D representation of the object they want to find.

    Search process

    It uses algorithms to convert the selected or drawn image-based query into a mathematical model. The search system then compares the mathematical description of the drawn or selected object to those of 3D objects stored in a database, looking for similarities in the described features.

    Ex: Princeton 3D Model Search Engine http://shape.cs.princeton.edu/search.html
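    To make the "mathematical description" idea concrete, the sketch below turns a 3D point set into a small histogram of point-to-centroid distances and picks the catalog object whose histogram is closest to the query's. This is a deliberately simple stand-in chosen for illustration; it is not the descriptor used by the Princeton engine, and the names `shape_descriptor`, `most_similar` and `CATALOG` are hypothetical.

```python
import math


def shape_descriptor(points, bins=4):
    """Histogram of point-to-centroid distances, normalized by the largest one."""
    n = len(points)
    centroid = tuple(sum(p[i] for p in points) / n for i in range(3))
    distances = [math.dist(p, centroid) for p in points]
    largest = max(distances) or 1.0  # avoid dividing by zero for a single point
    histogram = [0.0] * bins
    for d in distances:
        bucket = min(int(d / largest * bins), bins - 1)
        histogram[bucket] += 1.0 / n
    return histogram


def most_similar(query_points, catalog):
    """Return the catalog name whose descriptor is nearest to the query's."""
    query = shape_descriptor(query_points)
    return min(
        catalog,
        key=lambda name: math.dist(query, shape_descriptor(catalog[name])),
    )


# Hypothetical catalog: the corners of a cube and five points along a rod.
CUBE = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]
ROD = [(float(i), 0.0, 0.0) for i in range(5)]
CATALOG = {"cube": CUBE, "rod": ROD}

# The query is a cube twice as large; normalization makes size irrelevant.
BIG_CUBE = [(2 * x, 2 * y, 2 * z) for (x, y, z) in CUBE]
best = most_similar(BIG_CUBE, CATALOG)
```

    Dividing by the largest distance makes the descriptor independent of object size, so the enlarged cube still matches the catalog cube. Requires Python 3.8 or later for `math.dist`.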


    Theme search engine:

    It is also called 'in context' searching or 'on topic' searching.

    What you say your page is about, what the search engine calculates your page to be about, and what the rest of the Internet thinks your page is about must all match, according to the engines' mathematical formulas.

    The 2nd and 3rd generation search engines are examples of theme search engines.


    Meta search engine:

    A meta-search engine is a search tool that sends user requests to several other search engines and/or databases and aggregates the results into a single list or displays them according to their source.

    The premise is that the Web is too large for any one search engine to index it all and that more comprehensive search results can be obtained by combining the results from several search engines. This may also save the user from having to use multiple search engines separately. It also helps in deep web searching.

    Metasearch engines create what is known as a virtual database. They take a user's request, pass it to several other heterogeneous search engines and then compile the results.
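    The pass-along-and-compile step can be sketched as follows. The underlying engines are represented by two stub functions that return fixed ranked lists (hypothetical stand-ins for real search services), and results are merged by summing 1/rank across engines, which is one simple merging rule among many.

```python
def metasearch(query, engines):
    """Send `query` to every engine and merge the ranked results.

    `engines` maps an engine name to a callable returning a ranked list of URLs.
    Returns (url, [source engine names]) pairs, best first.
    """
    scores = {}
    sources = {}
    for name, search in engines.items():
        for rank, url in enumerate(search(query), start=1):
            # Assumed merging rule: a URL earns 1/rank from each engine listing it.
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
            sources.setdefault(url, []).append(name)
    merged = sorted(scores, key=lambda url: (-scores[url], url))
    return [(url, sources[url]) for url in merged]


# Stub engines with fixed answers, used instead of real search services.
def engine_a(query):
    return ["http://a.test/1", "http://shared.test/"]


def engine_b(query):
    return ["http://shared.test/", "http://b.test/1"]


results = metasearch("search engines", {"engine_a": engine_a, "engine_b": engine_b})
```

    The URL returned by both engines rises to the top of the single merged list, and the source names kept with each result allow the alternative display grouped by source.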


    Search engine optimization

    Search engine optimization (SEO) is the process of improving the volume or quality of traffic to a web site from search engines.

    Current Optimization Strategies

    1. Cloaking: Hide it from the spider's eye.

    2. Keyword Weight: Use proper keywords.

    4. Stop Words: Be careful with stop words.

    5. Redundancy: Don't use the same pages again.

    6. Lengthy Pages: Focus on one topic.
