19
Efficient Instant-Fuzzy Search with Proximity Ranking Authors: Inci Centidil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li IDCE Conference 2014 Presented by: Priagung Khusumanegara 1

Efficient Instant-Fuzzy Search with Proximity Ranking Authors: Inci Centidil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li IDCE Conference 2014 Presented

Embed Size (px)

Citation preview

Page 1: Efficient Instant-Fuzzy Search with Proximity Ranking Authors: Inci Centidil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li IDCE Conference 2014 Presented

1

Efficient Instant-Fuzzy Search with Proximity

Ranking

Authors:Inci Centidil, Jamshid Esmaelnezhad, Taewoo

Kim, and Chen Li

IDCE Conference 2014

Presented by: Priagung Khusumanegara

Page 2: Efficient Instant-Fuzzy Search with Proximity Ranking Authors: Inci Centidil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li IDCE Conference 2014 Presented

2

• System finds answers to a query instantly while user types in keywords character-by-character.

• Fuzzy search improves user search experiences by finding relevant answers with keywords similar to query keywords.

• A main computational challenge in this paradigm is the high speed requirement

• At the same time, we also need good ranking functions that consider the proximity of keywords to compute relevance scores

Abstract

Page 3: Efficient Instant-Fuzzy Search with Proximity Ranking Authors: Inci Centidil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li IDCE Conference 2014 Presented

3

Problem Statement & Proposed

Solution

• Problem Statement:o Achieving efficient time & space complexities.

• Solution: o Index phrases with proper indexing scheme and o Develop an incremental-computation algorithm for efficiently

segmenting a query into phrases and computing relevant answers.

• Result Metrics: Experimental study on real data sets to show the tradeoffs between time, space, and quality of these solutions.

Page 4: Efficient Instant-Fuzzy Search with Proximity Ranking Authors: Inci Centidil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li IDCE Conference 2014 Presented

4

General Idea of Instant Search

• Instant search returns the answers immediately based on a partial query a user has typed in

• Many users prefer the experience of seeing the search results instantly and formulating their queries accordingly instead of being left in the dark until they hit the search button

Page 5: Efficient Instant-Fuzzy Search with Proximity Ranking Authors: Inci Centidil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li IDCE Conference 2014 Presented

5

ArchitecturePhrase Validator: • When a search server

receives a request, it first identifies all the valid phrases in the query that are in the dictionary D, and intersects their inverted lists.

• The Phrase Validator identifies the phrases (called “valid phrases”) in the query that are similar to a term in the dictionary D.

Page 6: Efficient Instant-Fuzzy Search with Proximity Ranking Authors: Inci Centidil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li IDCE Conference 2014 Presented

6

Architecture (Cont’d)Query Plan Builder: • After identifying the valid

phrases, the Query Plan Builder generates a Query Plan Q, which contains all the possible valid segmentations in a specific order.

• The ranking of Q determines the order in which the segmentations will be executed.

Page 7: Efficient Instant-Fuzzy Search with Proximity Ranking Authors: Inci Centidil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li IDCE Conference 2014 Presented

7

Architecture (Cont’d)Index Searcher: • After Q is generated, the

segmentations are passed into the Index Searcher one by one until the top-k answers are computed, or all the segmentations in the plan are used.

Page 8: Efficient Instant-Fuzzy Search with Proximity Ranking Authors: Inci Centidil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li IDCE Conference 2014 Presented

8

Architecture (Cont’d)Cache Module: • The Phrase Validator uses

the Cache module to validate a phrase without traversing the trie from scratch,

• While the Index Searcher benefits from the Cache by being able to retrieve the answers to an earlier query to reduce the computational cost.

Page 9: Efficient Instant-Fuzzy Search with Proximity Ranking Authors: Inci Centidil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li IDCE Conference 2014 Presented

9

Computing Valid Phrases

Page 10: Efficient Instant-Fuzzy Search with Proximity Ranking Authors: Inci Centidil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li IDCE Conference 2014 Presented

10

Generating Valid Segmentations

Page 11: Efficient Instant-Fuzzy Search with Proximity Ranking Authors: Inci Centidil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li IDCE Conference 2014 Presented

11

Incremental Computation of Valid Phrases

Page 12: Efficient Instant-Fuzzy Search with Proximity Ranking Authors: Inci Centidil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li IDCE Conference 2014 Presented

12

Example Table for Architecture

Explanation

• This data is structured in indexed format. Two types of indices are used to structure this data

1. Trie Indices2. Forward Indices

Page 13: Efficient Instant-Fuzzy Search with Proximity Ranking Authors: Inci Centidil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li IDCE Conference 2014 Presented

13

Index Structure

• Indiceso TrieoForward

Page 14: Efficient Instant-Fuzzy Search with Proximity Ranking Authors: Inci Centidil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li IDCE Conference 2014 Presented

14

ExperimentsIn the experiments, they implemented the following method:1. FindAll (“FA”)2. QuerySegmentation (“QS”)3. Term Pair (“TP”)

Page 15: Efficient Instant-Fuzzy Search with Proximity Ranking Authors: Inci Centidil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li IDCE Conference 2014 Presented

15

Efficiency of Computing Valid Phrases

Page 16: Efficient Instant-Fuzzy Search with Proximity Ranking Authors: Inci Centidil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li IDCE Conference 2014 Presented

16

Query Time

Page 17: Efficient Instant-Fuzzy Search with Proximity Ranking Authors: Inci Centidil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li IDCE Conference 2014 Presented

17

Cache Hit Rate

Page 18: Efficient Instant-Fuzzy Search with Proximity Ranking Authors: Inci Centidil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li IDCE Conference 2014 Presented

18

Scalability

Page 19: Efficient Instant-Fuzzy Search with Proximity Ranking Authors: Inci Centidil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li IDCE Conference 2014 Presented

19

Conclusion• They studied how to improve ranking of an

instant-fuzzy search system by considering proximity information when we need to compute top-k answers

• They presented an incremental-computation algorithm for finding the indexed phrases in a query efficiently

• The experiments on real data showed the efficiency of the proposed technique for 2-keyword and 3-keyword queries that are common in search applications.