Introduction

Search Engine Project Presentation

Page 1: Search Engine Project Presentation

Introduction

Page 2: Search Engine Project Presentation

• SUCHE: the German word for "search"

• The project is a fully functioning, extensible search engine.

• Also has:
– Auto Completion
– Spell Correction
– Query Language/Grammar Parsing
– Authentication
– Relevant Suggestions
– AJAX Based Querying
– Extensible via Plug-ins

Page 3: Search Engine Project Presentation

Overall structure

Page 4: Search Engine Project Presentation

Methodology

Page 5: Search Engine Project Presentation

User Management
• Django Authentication framework:
– User model
– Validation
– Authentication
– Authorized access

Page 6: Search Engine Project Presentation

Spell Correction
• Runs continuously in the background
• Reads words through the interface provided:
– DB
– Named pipes
• Words with counts are loaded at startup

Page 7: Search Engine Project Presentation

Spell Correction
• Based on Bayes’ theorem of conditional probability.

• We use: argmax_c P(w|c) P(c) / P(w), where:
– P(c) is the probability that a proposed correction c stands on its own.
– P(w|c) is the probability that w would be typed in a text when the author meant c.
– P(w) is the probability of the typed word w; it is the same for every candidate c, so it does not change which c wins.

 

Page 8: Search Engine Project Presentation

Spell Correction
• Process:
– Read the word.
– Generate candidate words by deletion, transposition, insertion, etc.
– Check whether each candidate is a known word and find its probability of occurrence.
– Return the most probable word.
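The process above can be sketched in Python roughly as follows. This is a minimal illustration and not the project's actual code: the `WORD_COUNTS` table is hypothetical (the slides say the real counts are loaded at startup), and the error model P(w|c) is simplified by treating every candidate one edit away as equally likely, so the choice falls to P(c) alone.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

# Hypothetical word counts, standing in for the counts loaded at startup.
WORD_COUNTS = Counter({"search": 50, "engine": 30, "each": 20, "starch": 2})
TOTAL = sum(WORD_COUNTS.values())


def probability(word):
    """P(c): relative frequency of a word in the loaded counts (0 if unseen)."""
    return WORD_COUNTS[word] / TOTAL


def edits1(word):
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + ch + right[1:]
                for left, right in splits if right for ch in ALPHABET]
    inserts = [left + ch + right for left, right in splits for ch in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def known(words):
    """Keep only the candidates that appear in the word counts."""
    return {w for w in words if w in WORD_COUNTS}


def correct(word):
    """Return the most probable known candidate, or the word itself if none."""
    candidates = known([word]) or known(edits1(word)) or {word}
    return max(candidates, key=probability)
```

A word that is already known is returned unchanged, and a word with no known candidate within one edit is passed through as typed.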

Page 9: Search Engine Project Presentation

Plugin Support
• Users can easily extend the engine with the features they require.

• Emphasizes selective implementation

• Plug-in designers can design and submit Plug-ins for approval.

• Separate deployment

• Private-Key based Verification

Page 10: Search Engine Project Presentation

Plugin Support
• Grammar/Language Parsing
– Each Plug-in has a specific grammar

• E.g. <temperature|temp><?for|of><$query>

• This matches a query such as: temperature of kathmandu

• Returns ‘kathmandu’ to the Temperature Plugin

Page 11: Search Engine Project Presentation

Plugin Support
• Process:
– Read the corrected query
– Format the words, i.e., remove unwanted spaces and symbols
– Match against the stored grammars
– Call the corresponding plug-in for its results
– Return the result
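One way to picture the grammar matching step is to translate each grammar into a regular expression. The sketch below is only an illustration: the reading of the syntax (`<a|b>` as required alternatives, `<?a|b>` as optional alternatives, `<$name>` as a capture of the remaining text) is inferred from the single example on the slides, and `compile_grammar`, `normalize` and `match_plugin` are hypothetical names.

```python
import re

TOKEN = re.compile(r"<([^<>]+)>")


def compile_grammar(grammar):
    """Translate a grammar such as <temperature|temp><?for|of><$query> to a regex."""
    parts = []
    for body in TOKEN.findall(grammar):
        if body.startswith("$"):
            # Capture the rest of the query under the given name.
            parts.append(r"(?P<%s>.+)" % body[1:])
        elif body.startswith("?"):
            alts = "|".join(re.escape(a) for a in body[1:].split("|"))
            parts.append(r"(?:(?:%s)\b\s*)?" % alts)  # optional keyword
        else:
            alts = "|".join(re.escape(a) for a in body.split("|"))
            parts.append(r"(?:%s)\b\s*" % alts)  # required keyword
    return re.compile("^" + "".join(parts) + "$", re.IGNORECASE)


def normalize(query):
    """Drop symbols and collapse repeated whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", query)
    return " ".join(cleaned.split()).lower()


def match_plugin(query, plugins):
    """Call the first plug-in whose grammar matches; return None if none does."""
    text = normalize(query)
    for pattern, handler in plugins:
        match = pattern.match(text)
        if match:
            return handler(**match.groupdict())
    return None


# Hypothetical temperature plug-in that just echoes the captured place name.
PLUGINS = [
    (compile_grammar("<temperature|temp><?for|of><$query>"),
     lambda query: ("temperature", query)),
]
```

With this setup, `match_plugin("temperature of kathmandu", PLUGINS)` hands `kathmandu` to the temperature handler, while a query that matches no grammar falls through to the normal search.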

Page 12: Search Engine Project Presentation

Crawler
• Process:
– Read scheduled URLs
– Visit URLs for fresh content
– Save the complete page
– Schedule another crawl date
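The crawl loop above might look something like the following sketch. It is an assumption-laden illustration: `CrawlScheduler` is a hypothetical class, pages are kept in memory instead of a database, the fixed `recrawl_after` interval is invented, and the `fetch` function is injected so the example runs without network access.

```python
import heapq
from datetime import datetime, timedelta


class CrawlScheduler:
    """Hypothetical scheduler: visits due URLs, saves pages, reschedules them."""

    def __init__(self, fetch, recrawl_after=timedelta(days=1)):
        self.fetch = fetch                  # function: url -> page content
        self.recrawl_after = recrawl_after  # assumed fixed, positive interval
        self.queue = []                     # heap of (due_time, url)
        self.pages = {}                     # url -> latest saved page

    def schedule(self, url, due):
        heapq.heappush(self.queue, (due, url))

    def run_due(self, now):
        """Crawl every URL whose scheduled time has passed; return those URLs."""
        visited = []
        while self.queue and self.queue[0][0] <= now:
            _, url = heapq.heappop(self.queue)
            self.pages[url] = self.fetch(url)             # save complete page
            self.schedule(url, now + self.recrawl_after)  # next crawl date
            visited.append(url)
        return visited
```

A heap keeps the earliest due URL at the front, so each run only looks at entries that are actually due.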

Page 13: Search Engine Project Presentation

Indexer
• Process:
– Read unprocessed websites
– Undo the index entries created from the previous content
– Analyze the content
– Create the reverse (inverted) index
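A rough sketch of these indexing steps, assuming an in-memory index and a simple alphanumeric tokenizer (both are assumptions, and `InvertedIndex` is a hypothetical name). Re-indexing a URL first removes the postings left by its previous content, then adds the new ones.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Hypothetical in-memory reverse index: word -> {url: term count}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.doc_words = {}  # url -> set of words currently indexed for it

    def _tokenize(self, text):
        return re.findall(r"[a-z0-9]+", text.lower())

    def index(self, url, text):
        # Undo the result of the previous content for this URL, if any.
        for word in self.doc_words.pop(url, set()):
            del self.postings[word][url]
            if not self.postings[word]:
                del self.postings[word]
        # Analyze the new content and record per-word counts.
        counts = {}
        for word in self._tokenize(text):
            counts[word] = counts.get(word, 0) + 1
        for word, n in counts.items():
            self.postings[word][url] = n
        self.doc_words[url] = set(counts)

    def lookup(self, word):
        """Return a copy of {url: count} for a word, or {} if it is unknown."""
        return dict(self.postings.get(word.lower(), {}))
```

Tracking which words each URL contributed is what makes the undo step cheap: only those postings are touched.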

Page 14: Search Engine Project Presentation

Search
• Process:
– Read the query from the user
– Pass it to the plugin handler
– Search for each word in the query
– Combine and rank the results
– Display the final result
– Uses the PageRank algorithm
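To make the combine-and-rank step concrete, here is a small sketch of PageRank by power iteration together with a lookup that orders hits by rank. The damping factor of 0.85, the fixed iteration count, and combining per-word results by intersection are all assumptions; the slides do not say how the project combines results or tunes PageRank.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict: page -> list of linked pages."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


def search(query, index, rank):
    """Find pages containing every query word, ordered by descending rank.

    `index` maps word -> set of URLs (an assumed, simplified index shape).
    """
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(set(index.get(word, ())) for word in words))
    return sorted(hits, key=lambda url: rank.get(url, 0.0), reverse=True)
```

In a fuller system the page rank would typically be blended with a per-query relevance score; here it is the only ranking signal to keep the example short.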

Page 15: Search Engine Project Presentation

Autocompletion
• Process:
– Get the current incomplete query
– Search for the query in the cache
– Complete the query using language models
– Return the various alternatives
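The autocompletion steps could be sketched as below. This is illustrative only: the cache is assumed to be a table of past queries with counts, and the "language model" is reduced to a bigram next-word count built from those same queries. `AutoCompleter` and its methods are hypothetical names.

```python
from collections import Counter, defaultdict


class AutoCompleter:
    """Hypothetical completer: cached queries first, bigram fallback second."""

    def __init__(self, past_queries):
        self.cache = Counter(q.lower() for q in past_queries)
        self.bigrams = defaultdict(Counter)  # word -> Counter of next words
        for query, n in self.cache.items():
            words = query.split()
            for prev, nxt in zip(words, words[1:]):
                self.bigrams[prev][nxt] += n

    def complete(self, partial, limit=3):
        partial = partial.lower()
        # 1. Look for cached queries that extend the incomplete query.
        cached = [q for q, _ in self.cache.most_common()
                  if q.startswith(partial) and q != partial]
        if cached:
            return cached[:limit]
        # 2. Otherwise suggest likely next words after the last typed word.
        words = partial.split()
        if not words:
            return []
        next_words = self.bigrams.get(words[-1], Counter())
        return [partial.rstrip() + " " + w
                for w, _ in next_words.most_common(limit)]
```

Cached completions are preferred because they are whole queries that users actually issued; the bigram fallback only guesses one word ahead.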

Page 16: Search Engine Project Presentation

Further recommendations
• Image search/classification
• Video search
• Knowledge extraction
• Improved NLP