aavaas-gajurel
Introduction
• SUCHE: named after "Suche", the German word for "search".
• The project is a fully functioning, extensible search engine.
• Also has:
– Auto completion
– Spell correction
– Query language/grammar parsing
– Authentication
– Relevant suggestions
– AJAX-based querying
– Extensibility via plug-ins
Overall structure
Methodology
User Management
• Django authentication framework:
– User model
– Validation
– Authentication
– Authorized access
Spell Correction
• Runs continuously in the background
• Reads words through the interfaces provided:
– DB
– Named pipes
• Words with counts loaded at startup
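Spell Correction
• Based on Bayes' theorem of conditional probability.
• For a typed word w, we choose the correction $\hat{c} = \arg\max_{c} \frac{P(w \mid c)\,P(c)}{P(w)}$, where:
– P(c) is the probability that a proposed correction c stands on its own, i.e. how often c occurs as a word.
– P(w|c) is the probability that w would be typed in a text when the author meant c.
– P(w) is the same for every candidate c, so it does not change which c is chosen.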
Spell Correction
• Process:
– Read the word
– Generate candidate words by deletion, transposition, insertion, etc.
– Check whether each candidate is a known word and find its occurrence probability.
– Return the most probable word.
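The process above can be sketched in a few lines of Python. This is a minimal illustration and not the project's actual code: the names `edits1` and `correct`, the lowercase English alphabet, and the restriction to candidates one edit away are all assumptions. P(w|c) is treated as equal for every candidate, so the choice reduces to the candidate with the highest count.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumption: lowercase English words only


def edits1(word):
    """Return every string one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + ch + right[1:] for left, right in splits if right for ch in ALPHABET]
    inserts = [left + ch + right for left, right in splits for ch in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Return the most probable known word for `word`, or `word` itself if none is found."""
    if word in counts:  # already a known word, nothing to correct
        return word
    candidates = [c for c in edits1(word) if c in counts]
    if not candidates:
        return word
    total = sum(counts.values())
    # P(c) is estimated as count / total; P(w|c) is assumed uniform over candidates.
    return max(candidates, key=lambda c: counts[c] / total)


# Hypothetical word counts, standing in for the counts loaded at startup.
word_counts = Counter({"search": 5, "engine": 3, "each": 1})
suggestion = correct("serach", word_counts)
```

A fuller version would also consider candidates two edits away and weight the edit types differently.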
Plugin Support
• Lets users easily extend the engine with the features they need.
• Emphasizes selective implementation
• Plug-in designers can design and submit plug-ins for approval.
• Separate deployment
• Private-key-based verification
Plugin Support
• Grammar/Language Parsing
– Each plug-in has a specific grammar
• E.g. <temperature|temp><?for|of><$query>
• This matches the query: temperature of kathmandu
• Returns 'kathmandu' to the Temperature plug-in
Plugin Support
• Process:
– Read the corrected query
– Format the words, i.e. remove unwanted spaces and symbols
– Match against the stored grammars
– Call the corresponding plug-in
– Return the result
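One way to read the grammar notation is as a compact regular expression. The sketch below assumes that `<a|b>` means one of the listed words, `<?a|b>` means an optional word, and `<$name>` captures the rest of the query. The function names `compile_grammar`, `normalize` and `dispatch` are hypothetical, and the real parser may work differently.

```python
import re


def compile_grammar(grammar):
    """Translate a grammar such as <temperature|temp><?for|of><$query> into a regex."""
    parts = []
    for token in re.findall(r"<([^<>]+)>", grammar):
        if token.startswith("$"):
            # <$name> captures the remaining text under that name
            parts.append(r"(?P<%s>.+)" % token[1:])
        else:
            optional = token.startswith("?")
            words = token.lstrip("?").split("|")
            group = r"(?:%s)\b\s*" % "|".join(re.escape(w) for w in words)
            parts.append("(?:%s)?" % group if optional else group)
    return re.compile("^" + "".join(parts) + "$")


def normalize(query):
    """Lowercase the query, drop symbols and collapse repeated spaces."""
    return " ".join(re.sub(r"[^\w\s]", " ", query.lower()).split())


def dispatch(query, plugins):
    """Call the handler of the first plug-in whose grammar matches the query."""
    text = normalize(query)
    for pattern, handler in plugins:
        match = pattern.match(text)
        if match:
            return handler(**match.groupdict())
    return None  # no plug-in matched; fall through to the normal search


# Hypothetical registry with a stand-in handler that just echoes what it received.
plugins = [(compile_grammar("<temperature|temp><?for|of><$query>"),
            lambda query: query)]
result = dispatch("temperature of kathmandu", plugins)
```

Returning `None` when nothing matches lets the caller fall back to the ordinary word search.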
Crawler
• Process:
– Read scheduled URLs
– Visit URLs for fresh content
– Save the complete page
– Schedule another crawl date
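The crawl loop can be sketched with a priority queue ordered by due date. Everything here is an assumption made for illustration: the names `fetch_page` and `crawl_due`, the in-memory `pages` dictionary standing in for page storage, and the fixed one-day revisit interval.

```python
import heapq
from datetime import datetime, timedelta
from urllib.request import urlopen


def fetch_page(url):
    """Download the complete page body as bytes."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def crawl_due(schedule, pages, now, fetch=fetch_page, interval=timedelta(days=1)):
    """Visit every URL whose crawl date has passed, save it, and reschedule it.

    `schedule` is a heap of (due_time, url) tuples; `pages` maps url -> saved content.
    """
    visited = []
    while schedule and schedule[0][0] <= now:
        _, url = heapq.heappop(schedule)
        try:
            pages[url] = fetch(url)  # save the complete page
        except OSError:
            pass  # network failure: keep any older copy and retry at the next date
        heapq.heappush(schedule, (now + interval, url))  # schedule another crawl date
        visited.append(url)
    return visited


# Usage with a stand-in fetch function, so no network access is needed.
schedule = [(datetime(2024, 1, 1), "http://a.example"),
            (datetime(2024, 1, 3), "http://b.example")]
heapq.heapify(schedule)
pages = {}
visited = crawl_due(schedule, pages, datetime(2024, 1, 2),
                    fetch=lambda url: b"<html>" + url.encode() + b"</html>")
```

Passing `fetch` as a parameter keeps the scheduling logic separate from the download and makes it easy to test.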
Indexer
• Process:
– Read unprocessed websites
– Undo the result of previous content
– Analyze content
– Create reverse index
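A reverse (inverted) index maps each term to the pages that contain it. The sketch below assumes a simple in-memory structure and a naive tokenizer. The class name `InvertedIndex` and its methods are hypothetical, not taken from the project.

```python
import re
from collections import Counter, defaultdict


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {url: term frequency}
        self.doc_terms = {}                # url -> set of terms indexed for that page

    def remove(self, url):
        """Undo the result of the previously indexed content of `url`."""
        for term in self.doc_terms.pop(url, ()):
            self.postings[term].pop(url, None)
            if not self.postings[term]:
                del self.postings[term]

    def add(self, url, text):
        """Analyze `text` and index it, replacing any earlier version of the page."""
        self.remove(url)
        counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
        for term, frequency in counts.items():
            self.postings[term][url] = frequency
        self.doc_terms[url] = set(counts)


index = InvertedIndex()
index.add("http://a.example", "Search engine search")
index.add("http://a.example", "Crawler")  # re-indexing removes the stale terms first
```

Keeping the set of terms per page is what makes the undo step cheap: only the postings that page touched need to be visited.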
Search
• Process:
– Read query from user
– Pass to plug-in handler
– Search for each word in query
– Combine and rank the result
– Display the final result
• Uses the PageRank algorithm
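The ranking step can be illustrated with a small power-iteration PageRank and a word lookup. How the project actually combines word matches with PageRank is not stated, so multiplying the summed term frequency by the page's rank is purely an assumption, as are the function names and the toy data.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping page -> list of linked pages."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


def search(query, postings, rank):
    """Look up each query word, combine the matches and order them by score."""
    scores = {}
    for word in query.lower().split():
        for url, frequency in postings.get(word, {}).items():
            scores[url] = scores.get(url, 0) + frequency
    # Assumed combination: summed term frequency weighted by PageRank.
    return sorted(scores, key=lambda url: scores[url] * rank.get(url, 0.0), reverse=True)


# Toy link graph and postings, made up for the example.
links = {"a": ["b"], "b": ["a"], "c": ["a"]}
postings = {"search": {"a": 1, "c": 1}, "engine": {"b": 1}}
ranks = pagerank(links)
results = search("search engine", postings, ranks)
```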
Autocompletion
• Process:
– Get the current incomplete query
– Search for the query in cache
– Complete the query using language models
– Return the various alternatives
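The autocompletion steps can be sketched as a cache in front of a completion model. Here the "language model" is reduced to frequency counts of past queries, which is an assumption and much simpler than a real language model. The class name `Autocompleter` and the sample queries are hypothetical.

```python
from collections import Counter


class Autocompleter:
    def __init__(self, past_queries, limit=5):
        # Stand-in language model: how often each full query was seen before.
        self.counts = Counter(q.strip().lower() for q in past_queries)
        self.limit = limit
        self.cache = {}  # prefix -> list of completions already computed

    def complete(self, prefix):
        """Return up to `limit` completions for the incomplete query `prefix`."""
        prefix = prefix.lstrip().lower()
        if prefix in self.cache:  # search for the query in the cache first
            return self.cache[prefix]
        matches = [q for q in self.counts if q.startswith(prefix)]
        # Most frequent first, ties broken alphabetically.
        matches.sort(key=lambda q: (-self.counts[q], q))
        self.cache[prefix] = matches[:self.limit]
        return self.cache[prefix]


completer = Autocompleter([
    "temperature of kathmandu",
    "temperature of kathmandu",
    "temperature of pokhara",
    "search engine",
])
alternatives = completer.complete("temp")
```

Since the page queries the server over AJAX, a cache like this lets repeated prefixes be answered without recomputing the matches.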
Further recommendations
• Image search/classification
• Video search
• Knowledge extraction
• Improved NLP