Efficient Interactive Fuzzy Keyword Search
Shengyue Ji (1), Guoliang Li (2), Chen Li (1), Jianhua Feng (2)
(1) University of California, Irvine
(2) Tsinghua University
UC Irvine & Tsinghua | Efficient Interactive Fuzzy Keyword Search | Shengyue Ji, Guoliang Li, Chen Li, Jianhua Feng
Traditional Keyword Search
  Too many results!
  No result!
  Complicated and still no result!
Interactive Fuzzy Keyword Search
Features:
  Interactive: data exploration
  Fuzzy: error tolerant
  Multiple keywords: search on-the-fly
Fundamentals
Data
  R: a set of records
  W: a set of distinct words
Query
  Q = {p1, p2, …, pl}: a set of prefixes
  δ: edit-distance threshold
Query result
  RQ: a set of records such that each record has all query prefixes or their similar forms (conjunctive)
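The definitions above can be sketched in a few lines of Python. This is only an illustration of the query semantics, not the indexing method of the talk: it scans every record and uses a plain dynamic-programming edit distance. The names `prefix_edit_distance`, `answer`, and `RECORDS` are hypothetical, and the sample records reuse only the visible words of the example records shown on the multi-prefix intersection slides.

```python
def prefix_edit_distance(query: str, word: str) -> int:
    """Smallest edit distance between `query` and any prefix of `word`."""
    # prev[j] is the edit distance between the part of the query
    # processed so far and word[:j].
    prev = list(range(len(word) + 1))
    for i, qc in enumerate(query, start=1):
        cur = [i]
        for j, wc in enumerate(word, start=1):
            cur.append(min(prev[j] + 1,                # skip qc
                           cur[j - 1] + 1,             # skip wc
                           prev[j - 1] + (qc != wc)))  # match or substitute
        prev = cur
    return min(prev)  # best value over all prefixes word[:j]


def answer(records: dict, prefixes: list, delta: int) -> list:
    """IDs of records that contain a similar form of every query prefix."""
    result = []
    for rid, text in sorted(records.items()):
        words = text.lower().split()
        if all(any(prefix_edit_distance(p, w) <= delta for w in words)
               for p in prefixes):
            result.append(rid)
    return result


# Hypothetical sample data (visible words of the example records only).
RECORDS = {1: "li data", 2: "data", 3: "data lin", 4: "lu lin luis",
           5: "liu", 6: "vldb lin data", 7: "vldb", 8: "li vldb"}

print(answer(RECORDS, ["vldb", "li"], 0))  # [6, 8]
```

A full scan like this costs time proportional to the whole data set on every keystroke, which is the cost the trie-based steps in the rest of the talk are designed to avoid.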
Contributions / Outline
Step 1: Incremental fuzzy prefix matching
Step 2: Multi-prefix intersection methods; cache-based prefix intersection
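Observation
W = {exam, example, exemplar, exempt, sample}
δ = 2

Similar prefixes for Q’ = exampl:
Prefix    Distance
exam      2
examp     1
exampl    0
example   1
exemp     2
exempt    2
exempl    1
exempla   2
sampl     2

Similar prefixes for Q = example:
Prefix    Distance
examp     2
exampl    1
example   0
exempl    2
exempla   2
sample    2

Each similar prefix of Q is derived from a similar prefix of Q’ when the character e is appended:
Prefix of Q’ (distance)   Operation             Prefix of Q (distance)
examp (1)                 delete e              examp (2)
exampl (0)                delete e              exampl (1)
exampl (0)                match e               example (0)
exempl (1)                delete e              exempl (2)
exempl (1)                substitute e with a   exempla (2)
sampl (2)                 match e               sample (2)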
Trie Indexing
Computing the set of active nodes ΦQ
  Initialization
  Incremental step

[Figure: trie over W = {exam, example, exemplar, exempt, sample}; the active nodes for Q = example are labeled with their distances]

Active nodes for Q = example:
Prefix    Distance
examp     2
exampl    1
example   0
exempl    2
exempla   2
sample    2
Initialization
Q = ε

[Figure: trie over W; the nodes at depths 0, 1 and 2 are labeled with distances 0, 1 and 2]

Active nodes for Q = ε:
Prefix    Distance
ε         0
e         1
ex        2
s         1
sa        2

Φε is initialized with all nodes whose depth is at most δ.
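The initialization can be sketched as follows. As a simplifying assumption, each trie node is identified by the prefix string it spells, so the depth of a node is just the length of that string; the `$` end-of-word markers from the figure are left out. `build_trie` and `initial_active_nodes` are hypothetical helper names.

```python
def build_trie(words):
    """Map every prefix (trie node) to the set of its child characters."""
    trie = {"": set()}
    for word in words:
        for i, ch in enumerate(word):
            trie[word[:i]].add(ch)
            trie.setdefault(word[:i + 1], set())
    return trie


def initial_active_nodes(trie, delta):
    """Active nodes for the empty query: every node of depth <= delta.

    Turning the empty string into a prefix of length k takes k
    insertions, so the distance of a node equals its depth.
    """
    return {node: len(node) for node in trie if len(node) <= delta}


W = ["exam", "example", "exemplar", "exempt", "sample"]
trie = build_trie(W)
print(sorted(initial_active_nodes(trie, 2).items()))
# [('', 0), ('e', 1), ('ex', 2), ('s', 1), ('sa', 2)]
```

The empty string `''` stands for the root node ε, so the printed set matches the table for Q = ε.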
Incremental Computation: Algorithm
Incremental computation from ΦQ’ to ΦQ
add(ΦQ, <n, d>) has an effect only if there exists no active node in ΦQ with the same n and a smaller d

FOR EACH <n, d> FROM ΦQ’
  Deletion:      add(ΦQ, <n, d+1>)
  Substitution:  FOR EACH n’ FROM non-matching children of n
                   add(ΦQ, <n’, d+1>)
  Match:         add(ΦQ, <m, d>)       (m is the matching child of n)
  Insertion:     FOR EACH m’ FROM descendants of m
                   add(ΦQ, <m’, d+x>)  (x is the distance from m’ to m)
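A minimal Python sketch of this incremental step is given below, under a few assumptions that are not on the slide: trie nodes are identified by the prefix string they spell, `$` markers are omitted, and `add` also discards any pair whose distance exceeds δ (which follows from active nodes being the nodes within distance δ). The function names are hypothetical.

```python
def build_trie(words):
    """Map every prefix (trie node) to the set of its child characters."""
    trie = {"": set()}
    for word in words:
        for i, ch in enumerate(word):
            trie[word[:i]].add(ch)
            trie.setdefault(word[:i + 1], set())
    return trie


def add(active, node, dist, delta):
    """Keep <node, dist> only if it is within delta and improves on the stored value."""
    if dist <= delta and dist < active.get(node, delta + 1):
        active[node] = dist


def extend(trie, old_active, ch, delta):
    """Compute the active nodes of Q = Q' + ch from the active nodes of Q'."""
    new_active = {}
    for node, d in old_active.items():
        add(new_active, node, d + 1, delta)            # deletion of ch
        for c in trie[node]:
            child = node + c
            if c != ch:
                add(new_active, child, d + 1, delta)   # substitution
                continue
            add(new_active, child, d, delta)           # match
            # Insertion: descendants of the matching child, one extra
            # edit per level, pruned once the threshold is exceeded.
            stack = [(child, d)]
            while stack:
                m, dm = stack.pop()
                if dm + 1 > delta:
                    continue
                for c2 in trie[m]:
                    add(new_active, m + c2, dm + 1, delta)
                    stack.append((m + c2, dm + 1))
    return new_active


W = ["exam", "example", "exemplar", "exempt", "sample"]
DELTA = 2
trie = build_trie(W)
active = {node: len(node) for node in trie if len(node) <= DELTA}  # Q = ε
for ch in "example":
    active = extend(trie, active, ch, DELTA)
print(sorted(active.items()))
# [('examp', 2), ('exampl', 1), ('example', 0), ('exempl', 2), ('exempla', 2), ('sample', 2)]
```

Each keystroke only touches the previous active set and its nearby trie nodes, so the work per character does not depend on the length of the query typed so far.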
Incremental Computation: Example
Q = e

[Figure: trie over W, annotated with the active nodes for Q = ε and for Q = e]

Active nodes for Q = ε:
Prefix    Distance
ε         0
e         1
ex        2
s         1
sa        2

Candidates generated when e is appended (# Op is the resulting number of edit operations; Base is the active node of Q = ε that the candidate is derived from):
Prefix    # Op    Base    Op
ε         1       ε       del e
s         1       ε       sub e/s
e         0       ε       mat e
ex        1       ε       ins x
exa       2       ε       ins xa
exe       2       ε       ins xe
e         2       e       del e
ex        2       e       sub e/x
ex        3       ex      del e
exa       3       ex      sub e/a
exe       2       ex      mat e
s         2       s       del e
sa        2       s       sub e/a
sa        3       sa      del e

Keeping the smallest distance per prefix and dropping candidates above δ = 2 gives the active nodes for Q = e:
Prefix    Distance
ε         1
e         0
ex        1
exa       2
exe       2
s         1
sa        2
Incremental Computation: Discussion
Insertions
  Needed after matches
  Not needed after deletions and substitutions
    Deletions and insertions do not co-occur in adjacent positions
    Adjacent substitutions and insertions are interchangeable
Correctness and completeness
  Can be proved by reducing from/to edit-distance computation
Outline
Step 1: Incremental fuzzy prefix matching
Step 2: Multi-prefix intersection methods; cache-based prefix intersection
Multi-Prefix Intersection
Q = vldb li
Multi-prefix intersection: to return records such that each record has all query keywords as prefixes (or their similar forms)

ID   Record
1    Li data…
2    data…
3    data Lin…
4    Lu Lin Luis…
5    Liu…
6    VLDB Lin data…
7    VLDB…
8    Li VLDB…

Results for Q = vldb li:
ID   Record
6    VLDB Lin data…
8    Li VLDB…
Multi-Prefix Intersection: Method 1
Q = vldb li

[Figure: trie over the keywords of the eight example records, with an inverted list of record IDs attached to each keyword]

Keyword   Inverted list
data      1 2 3 6
li        1 8
lin       3 4 6
liu       5
lu        4
luis      4
vldb      6 7 8

Prefix    Union of inverted lists
li        1 3 4 5 6 8
vldb      6 7 8
Intersection: 6 8

Space cost: inverted index
Time cost: union + intersection
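Method 1 can be sketched as follows for exact prefixes. The inverted lists are the ones from the slide; the function names are hypothetical, and scanning all keywords with `startswith` stands in for walking the trie below the prefix node. In the fuzzy setting the union would additionally range over the keywords below every active node of a prefix, which this sketch leaves out.

```python
# Inverted lists from the slide: keyword -> IDs of records containing it.
INVERTED = {
    "data": [1, 2, 3, 6],
    "li": [1, 8],
    "lin": [3, 4, 6],
    "liu": [5],
    "lu": [4],
    "luis": [4],
    "vldb": [6, 7, 8],
}


def prefix_union(inverted, prefix):
    """Union of the inverted lists of all keywords that start with `prefix`."""
    ids = set()
    for keyword, postings in inverted.items():
        if keyword.startswith(prefix):
            ids.update(postings)
    return ids


def method1(inverted, prefixes):
    """Build one union list per query prefix, then intersect the unions."""
    if not prefixes:
        return []
    unions = [prefix_union(inverted, p) for p in prefixes]
    return sorted(set.intersection(*unions))


print(sorted(prefix_union(INVERTED, "li")))  # [1, 3, 4, 5, 6, 8]
print(method1(INVERTED, ["vldb", "li"]))     # [6, 8]
```

The union for a short prefix such as `l` can be nearly as large as the whole record set, which is why the time cost is dominated by the union and intersection steps.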
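Multi-Prefix Intersection: Method 2
Q = vldb li

[Figure: the same trie; keywords are numbered 1 to 7 and each node is labeled with the range [min, max] of keyword IDs below it]

Keyword ID   Keyword   Inverted list
1            data      1 2 3 6
2            li        1 8
3            lin       3 4 6
4            liu       5
5            lu        4
6            luis      4
7            vldb      6 7 8

Trie node(s)        Keyword ID range
root                [1, 7]
d, da, dat, data    [1, 1]
l                   [2, 6]
li                  [2, 4]
lin                 [3, 3]
liu                 [4, 4]
lu                  [5, 6]
lui, luis           [6, 6]
v, vl, vld, vldb    [7, 7]

Forward lists:
Record ID   Record            Forward list (keyword IDs)
1           Li data…          1 2
2           data…             1
3           data Lin…         1 3
4           Lu Lin Luis…      3 5 6
5           Liu…              4
6           VLDB Lin data…    1 3 7
7           VLDB…             7
8           Li VLDB…          2 7

Evaluating Q = vldb li:
  Read each record ID on the inverted list of vldb: 6 7 8
  Verify/probe: check whether its forward list contains a keyword ID in [2, 4], the range of li
    6   VLDB Lin data…   1 3 7   (3 is in range)
    8   Li VLDB…         2 7     (2 is in range)
  Record 7 has the forward list 7, with no ID in [2, 4], and is dropped.

Space cost: inverted + forward index
Time cost: probing forward lists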
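A sketch of Method 2 on the same data follows. The keyword IDs, inverted lists, and forward lists are the ones from the slide; the function names are hypothetical. Two simplifications are assumptions of this sketch: the keyword-ID range of a prefix is found by binary search over the sorted keyword list rather than read off a trie node, and the first query prefix is always the one whose inverted lists are read, as in the slide's example.

```python
from bisect import bisect_left

# Sorted keywords; the keyword ID is the position in this list plus one.
KEYWORDS = ["data", "li", "lin", "liu", "lu", "luis", "vldb"]
# Inverted lists: keyword ID -> record IDs.
INVERTED = {1: [1, 2, 3, 6], 2: [1, 8], 3: [3, 4, 6], 4: [5],
            5: [4], 6: [4], 7: [6, 7, 8]}
# Forward lists: record ID -> sorted keyword IDs in that record.
FORWARD = {1: [1, 2], 2: [1], 3: [1, 3], 4: [3, 5, 6],
           5: [4], 6: [1, 3, 7], 7: [7], 8: [2, 7]}


def keyword_range(prefix):
    """[min, max] keyword IDs of the keywords starting with `prefix`, or None."""
    lo = bisect_left(KEYWORDS, prefix)
    hi = lo
    while hi < len(KEYWORDS) and KEYWORDS[hi].startswith(prefix):
        hi += 1
    return (lo + 1, hi) if hi > lo else None


def probe(forward_list, id_range):
    """True if the sorted forward list holds a keyword ID inside id_range."""
    lo, hi = id_range
    i = bisect_left(forward_list, lo)
    return i < len(forward_list) and forward_list[i] <= hi


def method2(prefixes):
    """Read the lists of the first prefix; probe forward lists for the rest."""
    ranges = [keyword_range(p) for p in prefixes]
    if not ranges or None in ranges:
        return []
    first_lo, first_hi = ranges[0]
    candidates = set()
    for keyword_id in range(first_lo, first_hi + 1):
        candidates.update(INVERTED[keyword_id])
    return [rid for rid in sorted(candidates)
            if all(probe(FORWARD[rid], r) for r in ranges[1:])]


print(keyword_range("li"))        # (2, 4)
print(method2(["vldb", "li"]))    # [6, 8]
```

Each probe is a binary search in one forward list, so no union list has to be built for the probed prefix; the price is the extra space for the forward index.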
Experimental Results
Computing similar prefixes
Experimental Results
Multi-prefix intersection
Experimental Results
Overall scalability
Questions?
Thank You!
TASTIER: Efficient Auto-Completion, Type-Ahead Search
http://tastier.ics.uci.edu/