Efficient Instant-Fuzzy Search with Proximity Ranking
Authors: Inci Cetindil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li
ICDE Conference 2014
Presented by: Priagung Khusumanegara
Abstract
• The system finds answers to a query instantly as the user types keywords character by character.
• Fuzzy search improves the search experience by finding relevant answers whose keywords are similar to the query keywords.
• The main computational challenge in this paradigm is the high speed requirement.
• At the same time, good ranking functions are needed that consider the proximity of keywords when computing relevance scores.
Problem Statement & Proposed Solution
• Problem Statement:
  o Achieving efficient time and space complexity.
• Solution:
  o Index phrases with a proper indexing scheme.
  o Develop an incremental-computation algorithm for efficiently segmenting a query into phrases and computing relevant answers.
• Result Metrics: An experimental study on real data sets shows the tradeoffs between time, space, and quality of these solutions.
General Idea of Instant Search
• Instant search returns answers immediately, based on the partial query a user has typed in.
• Many users prefer seeing the search results instantly and formulating their queries accordingly, instead of being left in the dark until they hit the search button.
Architecture
Phrase Validator:
• When a search server receives a request, it first identifies all the valid phrases in the query that are in the dictionary D, and intersects their inverted lists.
• The Phrase Validator identifies the phrases (called “valid phrases”) in the query that are similar to a term in the dictionary D.
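The Phrase Validator step can be illustrated with a minimal sketch. It assumes "similar" means "within a small edit distance" and scans the dictionary by brute force; the slides describe a trie traversal instead, so this is a stand-in for the idea, not the paper's algorithm. The names edit_distance, valid_phrases, and max_dist are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def valid_phrases(keywords, dictionary, max_dist=1):
    """Map each contiguous keyword span (start, end) to the dictionary
    terms within max_dist edits of it. Spans with no match are omitted.
    Assumption: brute-force scan instead of a trie traversal."""
    found = {}
    for start in range(len(keywords)):
        for end in range(start + 1, len(keywords) + 1):
            phrase = " ".join(keywords[start:end])
            matches = [t for t in dictionary
                       if edit_distance(phrase, t) <= max_dist]
            if matches:
                found[(start, end)] = matches
    return found
```

For the mistyped query "powr point" and a dictionary containing "power", "point", and "power point", all three spans come back as valid phrases.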
Architecture (Cont’d)
Query Plan Builder:
• After identifying the valid phrases, the Query Plan Builder generates a Query Plan Q, which contains all the possible valid segmentations in a specific order.
• The ranking of Q determines the order in which the segmentations will be executed.
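A rough sketch of how a query plan could be built from the valid phrases follows. The slides do not state the ranking criterion, so ordering segmentations by fewest phrases first is an assumption made here for illustration; segmentations and build_query_plan are hypothetical names.

```python
def segmentations(n, valid_spans):
    """Enumerate every way to cover keyword positions 0..n with valid spans.
    valid_spans is a set of (start, end) pairs, end exclusive."""
    def extend(start):
        if start == n:
            yield []
            return
        for end in range(start + 1, n + 1):
            if (start, end) in valid_spans:
                for rest in extend(end):
                    yield [(start, end)] + rest
    return list(extend(0))


def build_query_plan(n, valid_spans):
    """Order the segmentations for execution.
    Assumption: segmentations with fewer (longer) phrases run first."""
    return sorted(segmentations(n, valid_spans), key=len)
```

With three keywords where the first two also form a valid phrase, the plan lists the two-phrase segmentation before the three-phrase one.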
Architecture (Cont’d)
Index Searcher:
• After Q is generated, the segmentations are passed into the Index Searcher one by one until the top-k answers are computed, or all the segmentations in the plan are used.
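The early-termination loop described above might look like the sketch below. It assumes each segmentation is a list of phrase strings and that inverted_lists maps a phrase to the record ids containing it; both are illustrative choices, and relevance scoring within a segmentation is left out.

```python
def search_top_k(plan, inverted_lists, k):
    """Run segmentations in plan order and stop once k answers are found.
    Assumption: answers from an earlier segmentation rank above later ones."""
    answers = []
    seen = set()
    for segmentation in plan:
        lists = [set(inverted_lists.get(phrase, ())) for phrase in segmentation]
        if not lists:
            continue
        # A record answers the segmentation only if it contains every phrase.
        for record_id in sorted(set.intersection(*lists)):
            if record_id not in seen:
                seen.add(record_id)
                answers.append(record_id)
        if len(answers) >= k:
            break  # top-k reached; remaining segmentations are skipped
    return answers[:k]
```

If the first segmentation already yields k records, the later and typically more expensive segmentations are never executed.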
Architecture (Cont’d)
Cache Module:
• The Phrase Validator uses the Cache module to validate a phrase without traversing the trie from scratch.
• The Index Searcher also benefits from the Cache: it can retrieve the answers to an earlier query to reduce the computational cost.
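The benefit of reusing earlier work can be shown with a deliberately simplified sketch. It uses exact prefix matching over a flat term list rather than fuzzy matching over a trie, and the PrefixCache class and its scanned counter are hypothetical; the point is only that a query extending the previous one can start from the previous matches.

```python
class PrefixCache:
    """Remembers which dictionary terms matched the previous prefix.
    Assumption: exact prefix matching, a simplification of fuzzy matching."""

    def __init__(self, dictionary):
        self.dictionary = sorted(dictionary)
        self.last_prefix = ""
        self.last_matches = list(self.dictionary)
        self.scanned = 0  # total number of terms examined so far

    def lookup(self, prefix):
        if prefix.startswith(self.last_prefix):
            # The user typed more characters: only earlier matches can survive.
            candidates = self.last_matches
        else:
            # Unrelated query: fall back to the full dictionary.
            candidates = self.dictionary
        self.scanned += len(candidates)
        matches = [t for t in candidates if t.startswith(prefix)]
        self.last_prefix, self.last_matches = prefix, matches
        return matches
```

Typing "p", "po", "pow" in sequence examines a shrinking candidate set at each keystroke instead of rescanning the whole dictionary.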
Computing Valid Phrases
Generating Valid Segmentations
Incremental Computation of Valid Phrases
Example Table for Architecture Explanation
• The data is stored in an indexed format. Two types of indices are used to structure it:
  1. Trie indices
  2. Forward indices
Index Structure
• Indices
  o Trie
  o Forward
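The two index types can be sketched together as follows. This is a minimal illustration under assumptions not stated on the slides: single words as terms, an inverted list stored per term id, and a forward index mapping each record to its term ids in order. TrieNode, Index, add_record, and complete are hypothetical names.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> TrieNode
        self.term_id = None  # set on nodes that end a dictionary term


class Index:
    def __init__(self):
        self.root = TrieNode()
        self.terms = []     # term_id -> term string
        self.inverted = {}  # term_id -> list of record ids containing the term
        self.forward = {}   # record_id -> term ids in record order

    def add_record(self, record_id, text):
        self.forward[record_id] = []
        for word in text.lower().split():
            node = self.root
            for ch in word:
                node = node.children.setdefault(ch, TrieNode())
            if node.term_id is None:  # first time this term is seen
                node.term_id = len(self.terms)
                self.terms.append(word)
                self.inverted[node.term_id] = []
            if record_id not in self.inverted[node.term_id]:
                self.inverted[node.term_id].append(record_id)
            self.forward[record_id].append(node.term_id)

    def complete(self, prefix):
        """Return all indexed terms that start with prefix, sorted."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        stack, found = [node], []
        while stack:
            current = stack.pop()
            if current.term_id is not None:
                found.append(self.terms[current.term_id])
            stack.extend(current.children.values())
        return sorted(found)
```

The trie answers "which terms match what has been typed so far", while the forward index lets a candidate record be checked for the remaining keywords without touching their inverted lists.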
Experiments
In the experiments, they implemented the following methods:
1. FindAll (“FA”)
2. QuerySegmentation (“QS”)
3. Term Pair (“TP”)
Efficiency of Computing Valid Phrases
Query Time
Cache Hit Rate
Scalability
Conclusion
• They studied how to improve the ranking of an instant-fuzzy search system by considering proximity information when computing top-k answers.
• They presented an incremental-computation algorithm for efficiently finding the indexed phrases in a query.
• The experiments on real data showed the efficiency of the proposed technique for 2-keyword and 3-keyword queries, which are common in search applications.