Supporting Efficient Top-k Queries in Type-Ahead Search
Guoliang Li 1, Jiannan Wang 1, Chen Li 2, Jianhua Feng 1
1 Tsinghua University; 2 UC Irvine, Bimaple Technology Inc.
SIGIR 2012, Portland, Oregon

Query suggestions

Type-ahead search (instant search)

Finding answers instantly!

ipubmed.ics.uci.edu

Fuzzy search

Advantages of instant fuzzy search

Save time

Fat fingers!

Mobile friendly

Correct errors

Challenges

  • Speed: the “100ms rule”, prefix matching, fuzzy matching
  • Quality

Contributions

Techniques for computing top-k answers in instant fuzzy search without generating all candidates

  • Ranking framework
  • Index structures
  • Algorithms
  • Experimental evaluation

Outline

  • Problem formulation
  • Instant exact search
  • Instant fuzzy search
  • Experiments

Problem Formulation

  • Data: records
  • Query: w1, w2, …, wm, where wm is a partial keyword (prefix)
  • Answers: the k best records

Example query: graph icde li (li is a prefix)

ID   Record
r0   graph icdm
r1   graph group lui
r2   gray icde liu
r3   graph icde lin lui
r4   graph group icdm lin liu
r5   graph gray gross icdm lin liu
r6   gray group icdm lin liu
r7   gray gross group icde lin
r8   gross icde liu
r9   icdm liu

Ranking Framework

Query: graph icde li
Record: graph, gray, gross, icde, lin, liu

  • graph: Score(graph)
  • icde: Score(icde)
  • li: Max(Score(lin), Score(liu))

The record score is the Aggregate of the three keyword scores.
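The scoring rule on this slide can be sketched in a few lines. This is only an illustration: the weights are made up, `keyword_score` and `record_score` are hypothetical names, a missing keyword is assumed to score 0, and the aggregate is assumed to be a sum.

```python
# Keyword weights of one record; the numbers are invented for illustration.
record = {"graph": 4, "gray": 1, "gross": 2, "icde": 2, "lin": 9, "liu": 4}

def keyword_score(record, keyword, is_prefix):
    """Score of one query keyword against a record (0 if nothing matches)."""
    if not is_prefix:
        return record.get(keyword, 0)
    # A partial keyword takes its best-scoring completion (the "Max" step).
    return max((w for k, w in record.items() if k.startswith(keyword)), default=0)

def record_score(record, query, aggregate=sum):
    """Aggregate the per-keyword scores; only the last keyword is a prefix."""
    scores = [keyword_score(record, kw, i == len(query) - 1)
              for i, kw in enumerate(query)]
    return aggregate(scores)

total = record_score(record, ["graph", "icde", "li"])  # 4 + 2 + max(9, 4)
```

With these invented weights the prefix li contributes 9 (from lin), so the record scores 15.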

Index structures

  • Trie
  • Inverted Index

[Figure: trie over the keywords of records r0 to r9 (graph, gray, gross, group, icde, icdm, lin, liu, lui), shown next to the records table]
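A minimal sketch of the two structures, assuming a dictionary-based trie and unweighted inverted lists. The names `build_trie` and `complete` are hypothetical; the records are the ten example records from the problem formulation.

```python
RECORDS = {
    "r0": "graph icdm", "r1": "graph group lui", "r2": "gray icde liu",
    "r3": "graph icde lin lui", "r4": "graph group icdm lin liu",
    "r5": "graph gray gross icdm lin liu", "r6": "gray group icdm lin liu",
    "r7": "gray gross group icde lin", "r8": "gross icde liu", "r9": "icdm liu",
}

# Inverted index: keyword -> list of record IDs containing it.
inverted = {}
for rid, text in RECORDS.items():
    for kw in text.split():
        inverted.setdefault(kw, []).append(rid)

def build_trie(keywords):
    """Nested dicts, one level per character; "$" marks the end of a keyword."""
    root = {}
    for kw in keywords:
        node = root
        for ch in kw:
            node = node.setdefault(ch, {})
        node["$"] = kw
    return root

def complete(trie, prefix):
    """All keywords in the trie that start with prefix, in sorted order."""
    node = trie
    for ch in prefix:
        if ch not in node:
            return []
        node = node[ch]
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key == "$":
                found.append(child)
            else:
                stack.append(child)
    return sorted(found)

trie = build_trie(inverted)
```

Here `complete(trie, "li")` returns lin and liu, and `inverted["icde"]` lists r2, r3, r7, and r8.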

Basic Solution

Query: {graph, icde, li}, k = 1

[Figure: trie lookup; graph and icde match as complete keywords, and the prefix li matches lin and liu]

Too many candidates
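To make the cost of the basic solution concrete, here is a sketch that materializes every candidate before ranking. The weighted inverted lists are hypothetical, `basic_top_k` is an illustrative name, and the aggregate is assumed to be a sum.

```python
# Hypothetical weighted inverted lists: keyword -> {record ID: weight}.
INVERTED = {
    "graph": {"r3": 4, "r4": 7, "r5": 3},
    "icde":  {"r2": 2, "r3": 2, "r7": 5},
    "lin":   {"r3": 9, "r4": 2, "r5": 8, "r7": 1},
    "liu":   {"r2": 3, "r4": 7, "r5": 6},
}

def basic_top_k(query, k):
    """Compute all candidates first, then sort and keep the best k."""
    *complete_kws, prefix = query
    # Union the lists of every completion of the prefix, keeping the max weight.
    prefix_scores = {}
    for kw, postings in INVERTED.items():
        if kw.startswith(prefix):
            for rid, w in postings.items():
                prefix_scores[rid] = max(prefix_scores.get(rid, 0), w)
    # A candidate must also contain every complete keyword.
    candidates = set(prefix_scores)
    for kw in complete_kws:
        candidates &= set(INVERTED.get(kw, {}))
    scored = [(sum(INVERTED[kw][rid] for kw in complete_kws) + prefix_scores[rid], rid)
              for rid in candidates]
    return sorted(scored, reverse=True)[:k]

best = basic_top_k(["graph", "icde", "li"], k=1)
```

Every record on the lists of lin and liu is touched even though only one answer is requested, which is the problem the following optimizations address.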

Optimization 1: Heap-based Method

[Figure: Aggregate over the keywords graph, icde, and li; a Max Heap over the lists of lin and liu holds entries with scores 9 and 8 (the latter <r5, 8>); GetMax() returns the top entry]
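One way to read the heap on this slide is as a lazy merge of the inverted lists that complete the prefix, so that records come out in descending score order without building the full union. The sketch below assumes each list is already sorted by descending weight; the weights and the name `merged_sorted_access` are hypothetical.

```python
import heapq

def merged_sorted_access(lists):
    """Yield (weight, record) in descending weight order from several lists,
    each sorted by descending weight. A record is yielded once, with its
    largest weight."""
    heap = []
    for i, lst in enumerate(lists):
        if lst:
            w, rid = lst[0]
            heap.append((-w, rid, i, 0))      # negate: heapq is a min-heap
    heapq.heapify(heap)
    seen = set()
    while heap:
        neg_w, rid, i, pos = heapq.heappop(heap)      # GetMax()
        if pos + 1 < len(lists[i]):
            w, nxt = lists[i][pos + 1]
            heapq.heappush(heap, (-w, nxt, i, pos + 1))
        if rid not in seen:
            seen.add(rid)
            yield -neg_w, rid

lin = [(9, "r3"), (8, "r5"), (2, "r4"), (1, "r7")]
liu = [(7, "r4"), (6, "r5"), (3, "r2")]
stream = merged_sorted_access([lin, liu])
first_two = [next(stream), next(stream)]
```

Only as many heap operations are performed as entries are consumed, so asking for the first two results does not scan the tails of either list.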

Optimization 2: Top-k List-Merging Algorithm

Example: Threshold algorithm

  • Sorted access
  • Random access
  • Early termination

[Figure: worked example with threshold T = 15 and values 17, 14, 12, 12]
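The three ingredients named above (sorted access, random access, early termination) fit together as in the following sketch of the classic threshold algorithm. It assumes a sum aggregate and a score of 0 for a keyword the record lacks; the lists and the name `threshold_top_k` are hypothetical, and random access is simulated with dictionaries.

```python
import heapq

def threshold_top_k(lists, k):
    """lists: one list per query keyword, each [(weight, record), ...] sorted
    by descending weight. Returns the k best (score, record) pairs."""
    lookup = [{rid: w for w, rid in lst} for lst in lists]   # for random access
    top = []          # min-heap holding the best k (score, record) seen so far
    seen = set()
    for depth in range(max(map(len, lists), default=0)):
        last = []     # weight read from each list at this depth
        for lst in lists:
            if depth >= len(lst):
                last.append(0)
                continue
            w, rid = lst[depth]                               # sorted access
            last.append(w)
            if rid not in seen:
                seen.add(rid)
                score = sum(d.get(rid, 0) for d in lookup)    # random access
                heapq.heappush(top, (score, rid))
                if len(top) > k:
                    heapq.heappop(top)
        threshold = sum(last)    # upper bound on any record not yet seen
        if len(top) == k and top[0][0] >= threshold:
            break                                             # early termination
    return sorted(top, reverse=True)

graph = [(7, "r4"), (4, "r3"), (3, "r5")]
icde = [(5, "r7"), (2, "r2"), (2, "r3")]
li = [(9, "r3"), (8, "r5"), (7, "r4"), (3, "r2"), (1, "r7")]
result = threshold_top_k([graph, icde, li], k=1)
```

In this run the threshold drops from 21 to 14 after the second round of sorted accesses, which is below the best score of 15, so the scan stops before reading the third entry of any list.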

Efficient Random Access: How?

For one record: “graph” = 7, “icde” = 0, “li” = ?

[Figure: trie over the keywords and the records table (r0 to r4, …)]

Forward index [Ji et al. WWW’09]

For one record: “graph” = 7, “icde” = 0, “li” = ?

ID   Forward list (<keyword ID, weight>)
r0   <1, 2> <5, 3>
r1   <1, 3> <1, 9> <9, 6>
r2   <2, 9> <5, 2> <8, 3>
r3   <1, 4> <5, 2> <7, 9> <9, 4>
r4   <1, 7> <4, 3> <6, 9> <7, 2> <8, 7>
…    …

[Figure: trie whose leaves carry keyword IDs 1 to 9 and whose nodes carry keyword-ID ranges: [1, 1], [1, 2], [3, 3], [4, 4], [3, 4], [1, 4], [1, 4], [2, 2], [5, 6], [5, 6], [5, 6], [7, 8], [7, 9], [9, 9]]

Random Access Using Forward Index

For one record: “graph” = 7, “icde” = 0, “li” = 7

[Figure: the trie with keyword-ID ranges and the forward lists of r0 to r4, as on the forward index slide]
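Random access with a forward index can be sketched as a binary search for a keyword-ID range inside a record's forward list. The sketch assumes keyword IDs 1 to 9 are assigned in alphabetical order (which agrees with the ranges in the figure), uses the forward lists of r2, r3, and r4 from the slide, and replaces the trie lookup with a simple scan in the hypothetical helper `id_range`.

```python
from bisect import bisect_left

KEYWORDS = ["graph", "gray", "gross", "group", "icde", "icdm", "lin", "liu", "lui"]
KEYWORD_ID = {kw: i + 1 for i, kw in enumerate(KEYWORDS)}    # assumed IDs 1..9

# Forward lists from the slide: <keyword ID, weight> pairs sorted by keyword ID.
FORWARD = {
    "r2": [(2, 9), (5, 2), (8, 3)],
    "r3": [(1, 4), (5, 2), (7, 9), (9, 4)],
    "r4": [(1, 7), (4, 3), (6, 9), (7, 2), (8, 7)],
}

def id_range(prefix):
    """Keyword-ID range [lo, hi] of the trie node for prefix (None if absent).
    A real index would read this range off the trie node instead of scanning."""
    ids = [KEYWORD_ID[kw] for kw in KEYWORDS if kw.startswith(prefix)]
    return (ids[0], ids[-1]) if ids else None

def random_access(rid, prefix):
    """Best weight in the record's forward list whose keyword ID is in range."""
    rng = id_range(prefix)
    if rng is None:
        return 0
    lo, hi = rng
    fwd = FORWARD[rid]
    pos = bisect_left(fwd, (lo,))    # (lo,) sorts before every (lo, weight)
    best = 0
    while pos < len(fwd) and fwd[pos][0] <= hi:
        best = max(best, fwd[pos][1])
        pos += 1
    return best

scores = [random_access("r4", kw) for kw in ("graph", "icde", "li")]
```

For r4 this yields 7, 0, and 7, which agrees with the values shown on the slide.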

Outline

  • Problem formulation
  • Instant exact search
  • Instant fuzzy search
  • Experiments

Ranking Framework (Fuzzy matching)

Query: graph icde li
Record: graph, gray, icdm, gross, lin, liu

  • graph: Score(graph)
  • icde: Sim(icde, icdm) * Score(icdm)
  • li: Max(Score(lin), Score(liu), Sim(li, i) * Score(lin))

The record score is the Aggregate of the three keyword scores.

Computing Similar Prefixes [Ji et al. WWW’09]

Query: {graph, icde, li}, similarity threshold τ = 0.45

[Figure: trie over the keywords]
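The slides do not spell out the similarity function. The sketch below assumes one plausible definition, 1 minus the prefix edit distance divided by the query keyword length, and checks every keyword directly rather than walking the trie incrementally as the cited technique does. The function names are hypothetical.

```python
KEYWORDS = ["graph", "gray", "gross", "group", "icde", "icdm", "lin", "liu", "lui"]

def prefix_edit_distance(query, keyword):
    """Smallest edit distance between query and any prefix of keyword."""
    prev = list(range(len(keyword) + 1))       # distances from the empty query
    for i, qc in enumerate(query, 1):
        cur = [i]
        for j, kc in enumerate(keyword, 1):
            cur.append(min(prev[j] + 1,                # delete qc
                           cur[j - 1] + 1,             # insert kc
                           prev[j - 1] + (qc != kc)))  # match or substitute
        prev = cur
    return min(prev)                           # best over all keyword prefixes

def similarity(query, keyword):
    # Assumed definition, not taken from the slides.
    return 1 - prefix_edit_distance(query, keyword) / len(query)

def similar_keywords(query, tau):
    """Keywords whose similarity to the query keyword is at least tau."""
    result = {}
    for kw in KEYWORDS:
        s = similarity(query, kw)
        if s >= tau:
            result[kw] = s
    return result

matches = similar_keywords("li", 0.45)
```

Under this assumption, li matches lin and liu with similarity 1 and icde, icdm, and lui with similarity 0.5 (icde and icdm through their prefix i), which is in line with the Sim(li, i) term in the fuzzy ranking framework.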

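Top-k Algorithm

Query: graph icde li

[Figure: three max heaps, one per query keyword (graph, icde, li). The heap for graph reads the list of graph (×1); the heap for icde reads the lists of icde and icdm (multipliers ×0.5 and ×1 shown); the heap for li reads the lists of icde, icdm, lin, liu, and lui (each ×1 or ×0.5). Each list is weighted by its similarity. Heap entries shown include <r3, 9>, <r5, 8>, <r4, 4.5>, and <r5, 9>. The GetMax() results of the heaps are combined with sum.]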

Efficient Random Access (method 1): Probing on Forward Lists

For one record: “graph” = 7, “icde” = 9, “li” = ?

Binary Search: [5, 6], [7, 9], [7, 8], [9, 9], 7, 8, 9

[Figure: the trie with keyword-ID ranges and the forward lists of r0 to r4, as on the forward index slide]
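Method 1 can be sketched as one binary search per similar keyword-ID range, keeping the best similarity-weighted weight. The ranges and similarities below are assumptions chosen to match the example (keyword IDs 1 to 9 in alphabetical order, similarity 0.5 for one edit on a short keyword); `probe` is a hypothetical name.

```python
from bisect import bisect_left

# Forward list of r4 from the slide: <keyword ID, weight> sorted by keyword ID.
FORWARD_R4 = [(1, 7), (4, 3), (6, 9), (7, 2), (8, 7)]

# Assumed similar ranges for the query keyword "li": ((lo, hi), similarity).
LI_RANGES = [((5, 6), 0.5), ((7, 8), 1.0), ((9, 9), 0.5)]
# Assumed similar ranges for the query keyword "icde".
ICDE_RANGES = [((5, 5), 1.0), ((6, 6), 0.5)]

def probe(forward_list, ranges):
    """Best similarity * weight over all entries falling in a similar range."""
    best = 0.0
    for (lo, hi), sim in ranges:
        pos = bisect_left(forward_list, (lo,))   # first entry with ID >= lo
        while pos < len(forward_list) and forward_list[pos][0] <= hi:
            best = max(best, sim * forward_list[pos][1])
            pos += 1
    return best

li_score = probe(FORWARD_R4, LI_RANGES)
icde_score = probe(FORWARD_R4, ICDE_RANGES)
```

With these assumed similarities r4 scores 7.0 for li (through liu) and 4.5 for icde (through icdm at similarity 0.5). The cost is one binary search per range, so it grows with the number of similar prefixes.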

Efficient Random Access (method 2): Probing on Trie Leaf Nodes

For one record: “graph” = 7, “icde” = 9, “li” = ?

Traverse the forward list of the record

[Figure: the trie with keyword-ID ranges and the forward lists of r0 to r4; five leaf nodes are labeled with the query keyword and a similarity: (li, 0.5), (li, 0.5), (li, 1), (li, 1), (li, 0.5)]
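Method 2 can be sketched as a single pass over the record's forward list with a lookup of each keyword ID among the labeled leaves. The mapping of the five leaf labels to keyword IDs 5 to 9 is an assumption (IDs in alphabetical order: icde, icdm, lin, liu, lui), and `probe_leaves` is a hypothetical name.

```python
# Assumed leaf labels for the query keyword "li": keyword ID -> similarity.
LEAF_SIM = {5: 0.5, 6: 0.5, 7: 1.0, 8: 1.0, 9: 0.5}

# Forward lists from the slide: <keyword ID, weight> pairs.
FORWARD = {
    "r0": [(1, 2), (5, 3)],
    "r3": [(1, 4), (5, 2), (7, 9), (9, 4)],
    "r4": [(1, 7), (4, 3), (6, 9), (7, 2), (8, 7)],
}

def probe_leaves(forward_list, leaf_sim):
    """Scan the forward list once; each keyword ID is checked against the
    labeled leaves, and the best similarity * weight is kept."""
    return max((leaf_sim[kid] * w for kid, w in forward_list if kid in leaf_sim),
               default=0.0)

li_scores = {rid: probe_leaves(fwd, LEAF_SIM) for rid, fwd in FORWARD.items()}
```

The cost here depends on the length of the forward list rather than on the number of similar ranges, which is the trade-off against method 1.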

Optimization by materializing union lists

[Figure: trie over the keywords]

  • Time/space tradeoff
  • Cost-based analysis for a space budget
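A rough sketch of the idea: precompute, for selected trie nodes, the union of the inverted lists below them so that no heap merge is needed at query time. The greedy selection here is a placeholder heuristic for illustration only, not the cost-based analysis from the paper; the weights and function names are hypothetical.

```python
def union_list(lists):
    """Merge weighted lists into one list sorted by descending weight,
    keeping the largest weight per record."""
    best = {}
    for lst in lists:
        for w, rid in lst:
            if w > best.get(rid, 0):
                best[rid] = w
    return sorted(((w, rid) for rid, w in best.items()), reverse=True)

def materialize(node_lists, budget):
    """Placeholder heuristic: prefer nodes that merge the most lists, and
    materialize a node only if its union list still fits the entry budget."""
    chosen, used = {}, 0
    for node in sorted(node_lists, key=lambda n: len(node_lists[n]), reverse=True):
        merged = union_list(node_lists[node])
        if used + len(merged) <= budget:
            chosen[node] = merged
            used += len(merged)
    return chosen

lin = [(9, "r3"), (8, "r5"), (2, "r4"), (1, "r7")]
liu = [(7, "r4"), (6, "r5"), (3, "r2")]
lui = [(6, "r1"), (4, "r3")]
node_lists = {"li": [lin, liu], "l": [lin, liu, lui]}
chosen = materialize(node_lists, budget=6)
```

With a budget of six entries only the node l is materialized; with a budget of five only li fits, which illustrates the time/space trade-off in miniature.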

Outline

  • Problem formulation
  • Instant exact search
  • Instant fuzzy search
  • Experiments

Data sets and index costs

Exact Search (DBLP)

k = 10, similarity threshold τ = 0.6
Fuzzy Search

DBLP, k = 10, similarity threshold τ = 0.6

[Figure labels: TA, NRA]
Other results (not included in the paper)

  • More general ranking (e.g., positional information)
  • Other languages
  • Location-based search
Conclusions (ipubmed.ics.uci.edu)

Efficient techniques for instant fuzzy search

Acknowledgements

  • The authors have financial interest in Bimaple Technology Inc., a company currently commercializing some of the techniques described in this publication.
  • Chen Li was partially supported by NIH grant 1R21LM010143-01A1.
  • Guoliang Li, Jiannan Wang, and Jianhua Feng were partly supported by the National Natural Science Foundation of China under Grant No. 61003004, the National Grand Fundamental Research 973 Program of China under Grant No. 2011CB302206, a project of Tsinghua University under Grant No. 20111081073, and the “NExT Research Center” funded by MDA, Singapore, under Grant No. WBS:R-252-300-001-490.