Efficient Instant-Fuzzy Search with Proximity Ranking Authors: Inci Cetindil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li ICDE Conference 2014 Presented


Efficient Instant-Fuzzy Search with Proximity Ranking

Authors: Inci Cetindil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li

ICDE Conference 2014

Presented by: Priagung Khusumanegara


Abstract

• Instant search finds answers to a query instantly while the user types in keywords character by character.

• Fuzzy search improves the user's search experience by finding relevant answers with keywords similar to the query keywords.

• A main computational challenge in this paradigm is the high-speed requirement.

• At the same time, we also need good ranking functions that consider the proximity of keywords to compute relevance scores.
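To make "keywords similar to query keywords" concrete, here is a minimal sketch that assumes edit distance is the similarity measure and that the keyword being typed is matched against prefixes of dictionary terms. The function names, the sample dictionary, and the threshold of 1 are assumptions chosen for illustration, not details taken from the slides.

```python
def prefix_edit_distance(query: str, term: str) -> int:
    """Smallest edit distance between `query` and any prefix of `term`."""
    # prev[j] holds the edit distance between query[:i] and term[:j].
    prev = list(range(len(term) + 1))
    for i, query_char in enumerate(query, start=1):
        curr = [i]
        for j, term_char in enumerate(term, start=1):
            cost = 0 if query_char == term_char else 1
            curr.append(min(prev[j] + 1,          # delete from query
                            curr[j - 1] + 1,      # insert into query
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    # The best prefix of `term` is the cheapest entry in the final row.
    return min(prev)


def fuzzy_matches(query: str, dictionary: list[str], threshold: int = 1) -> list[str]:
    """Terms whose best prefix is within `threshold` edits of `query`."""
    return [t for t in dictionary if prefix_edit_distance(query, t) <= threshold]


# Hypothetical dictionary; "sigmd" is a mistyped partial keyword.
print(fuzzy_matches("sigmd", ["sigmod", "sigir", "icde"]))
```

Taking the minimum over the last row is what lets a partially typed keyword match a longer term, which is the behavior instant search needs.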


Problem Statement & Proposed Solution

• Problem Statement:
o Achieving efficient time and space complexity.

• Solution:
o Index phrases with a proper indexing scheme, and
o Develop an incremental-computation algorithm for efficiently segmenting a query into phrases and computing relevant answers.

• Result Metrics: An experimental study on real data sets shows the tradeoffs between time, space, and quality of these solutions.


General Idea of Instant Search

• Instant search returns answers immediately, based on the partial query a user has typed in.

• Many users prefer seeing the search results instantly and formulating their queries accordingly, instead of being left in the dark until they hit the search button.


Architecture

Phrase Validator:

• When a search server receives a request, it first identifies all the valid phrases in the query that are in the dictionary D, and intersects their inverted lists.

• The Phrase Validator identifies the phrases (called “valid phrases”) in the query that are similar to a term in the dictionary D.
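The Phrase Validator step can be sketched as follows. This is a simplified illustration that checks exact membership in the dictionary D instead of similarity, and the function name, span representation, and sample data are assumptions made for the example.

```python
def valid_phrases(keywords: list[str], dictionary: set[str]) -> list[tuple[int, int]]:
    """Return (start, end) spans of contiguous keywords that form a dictionary phrase.

    Simplification: a phrase is "valid" only on an exact match, whereas the
    slides describe matching phrases that are similar to a dictionary term.
    """
    spans = []
    n = len(keywords)
    for start in range(n):
        for end in range(start + 1, n + 1):
            phrase = " ".join(keywords[start:end])
            if phrase in dictionary:
                spans.append((start, end))
    return spans


# Hypothetical dictionary D containing both single terms and phrases.
D = {"new", "york", "times", "new york", "new york times"}
print(valid_phrases(["new", "york", "times"], D))
# → [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]
```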


Architecture (Cont’d)

Query Plan Builder:

• After identifying the valid phrases, the Query Plan Builder generates a Query Plan Q, which contains all the possible valid segmentations in a specific order.

• The ranking of Q determines the order in which the segmentations will be executed.
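A minimal sketch of building a query plan from valid phrase spans is shown below. The ordering rule used here (segmentations with fewer, longer phrases first) is an assumption for illustration; the slides say only that Q is kept in a specific order. The function names and span format are likewise hypothetical.

```python
def segmentations(num_keywords: int, valid_spans: set[tuple[int, int]]) -> list[list[tuple[int, int]]]:
    """Enumerate every way to cover keywords 0..num_keywords with valid spans."""
    def extend(start):
        if start == num_keywords:
            yield []  # all keywords are covered
            return
        for end in range(start + 1, num_keywords + 1):
            if (start, end) in valid_spans:
                for rest in extend(end):
                    yield [(start, end)] + rest
    return list(extend(0))


def build_query_plan(num_keywords: int, valid_spans: set[tuple[int, int]]) -> list[list[tuple[int, int]]]:
    """Order segmentations so those with fewer phrases run first (assumed ranking)."""
    return sorted(segmentations(num_keywords, valid_spans), key=len)


spans = {(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)}
for segmentation in build_query_plan(3, spans):
    print(segmentation)
```

A span pair such as (0, 2) refers to keywords 0 and 1 taken together as one phrase, so each printed list is one way of cutting the query into indexed phrases.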


Architecture (Cont’d)

Index Searcher:

• After Q is generated, the segmentations are passed into the Index Searcher one by one until the top-k answers are computed, or all the segmentations in the plan are used.
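The early-termination loop described above can be sketched like this. It assumes each phrase maps to an inverted list of record ids and that answers are simply collected in plan order; the real system also scores answers by relevance, which is omitted here. All names and data are hypothetical.

```python
def search_top_k(plan: list[list[str]], inverted_lists: dict[str, set[int]], k: int) -> list[int]:
    """Execute segmentations in plan order until k answers are found."""
    answers: list[int] = []
    seen: set[int] = set()
    for segmentation in plan:
        lists = [inverted_lists.get(phrase, set()) for phrase in segmentation]
        # A record answers a segmentation only if it contains every phrase.
        matches = set.intersection(*lists) if lists else set()
        for record_id in sorted(matches):
            if record_id not in seen:
                seen.add(record_id)
                answers.append(record_id)
        if len(answers) >= k:
            break  # top-k reached, so later segmentations are skipped
    return answers[:k]


plan = [["new york times"], ["new york", "times"], ["new", "york", "times"]]
index = {
    "new york times": {1},
    "new york": {1, 2},
    "times": {1, 2, 3},
    "new": {1, 2, 4},
    "york": {1, 2, 4},
}
print(search_top_k(plan, index, k=2))  # → [1, 2]
```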


Architecture (Cont’d)

Cache Module:

• The Phrase Validator uses the Cache module to validate a phrase without traversing the trie from scratch.

• The Index Searcher benefits from the Cache by retrieving the answers to an earlier query, which reduces the computational cost.
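One way to picture the Cache module is a lookup keyed by earlier queries, where a new query reuses the state saved for its longest cached prefix. This is only a sketch under that assumption; the class name, its methods, and the idea of storing a list of record ids as the cached state are illustrative, not the system's actual design.

```python
class QueryCache:
    """Stores per-query state so a longer query can resume from a shorter one."""

    def __init__(self):
        self._entries = {}

    def put(self, query: str, state) -> None:
        self._entries[query] = state

    def longest_prefix(self, query: str):
        """Return (cached_query, state) for the longest cached prefix, or None."""
        for end in range(len(query), 0, -1):
            prefix = query[:end]
            if prefix in self._entries:
                return prefix, self._entries[prefix]
        return None


cache = QueryCache()
cache.put("new y", [1, 2])             # answers computed for an earlier keystroke
print(cache.longest_prefix("new yo"))  # → ('new y', [1, 2])
print(cache.longest_prefix("old"))     # → None
```

Because users type character by character, each query usually extends the previous one, so a prefix lookup of this kind would be hit often.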


Computing Valid Phrases


Generating Valid Segmentations


Incremental Computation of Valid Phrases


Example Table for Architecture Explanation

• This data is structured in indexed format. Two types of indices are used to structure this data:

1. Trie indices
2. Forward indices


Index Structure

• Indices
o Trie
o Forward
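The two index types can be illustrated with a small sketch: a trie whose nodes carry an inverted list for each complete term, and a forward index that maps each record id to its terms. The class, function names, and sample records are assumptions for this example, and the layout is deliberately simpler than a production index.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # next character -> TrieNode
        self.records = set()  # inverted list for the term ending at this node


def build_indexes(records: dict[int, str]):
    """Build a trie over all terms plus a forward index (record id -> terms)."""
    root = TrieNode()
    forward = {}
    for record_id, text in records.items():
        terms = text.lower().split()
        forward[record_id] = terms
        for term in terms:
            node = root
            for ch in term:
                node = node.children.setdefault(ch, TrieNode())
            node.records.add(record_id)
    return root, forward


def records_with_prefix(root: TrieNode, prefix: str) -> set[int]:
    """Collect the records of every term that starts with `prefix`."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return set()
    found = set()
    stack = [node]
    while stack:
        current = stack.pop()
        found |= current.records
        stack.extend(current.children.values())
    return found


trie, forward = build_indexes({1: "instant fuzzy search", 2: "fuzzy logic"})
print(records_with_prefix(trie, "fuz"))  # → {1, 2}
print(forward[2])                        # → ['fuzzy', 'logic']
```

The trie answers "which records contain a term starting with this prefix", while the forward index answers "which terms does this record contain", which is the lookup a proximity check on a candidate record would need.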


Experiments

In the experiments, they implemented the following methods:

1. FindAll (“FA”)
2. QuerySegmentation (“QS”)
3. Term Pair (“TP”)


Efficiency of Computing Valid Phrases


Query Time


Cache Hit Rate


Scalability


Conclusion

• They studied how to improve the ranking of an instant-fuzzy search system by considering proximity information when computing top-k answers.

• They presented an incremental-computation algorithm for efficiently finding the indexed phrases in a query.

• The experiments on real data showed the efficiency of the proposed technique for 2-keyword and 3-keyword queries, which are common in search applications.
