Effective Keyword Search in Relational Databases

Fang Liu (University of Illinois at Chicago), Clement Yu (University of Illinois at Chicago), Weiyi Meng (Binghamton University), Abdur Chowdhury (America Online, Inc.)


Outline

Introduction
IR ranking in text databases
Our ranking strategy in RDBs
Experiments
Conclusions and future work

SIGMOD 2006: Effective Keyword Search in Relational Databases

Introduction

Why keyword search in relational databases?

We want to search text data stored in relational databases.
SQL with the “contains” operator is not suitable for non-expert users.
Keyword search is tremendously successful in text databases, where documents are ranked by their similarity to the query, and it is usable by non-expert users.


Introduction: Text data in relational databases


Introduction

Suppose a user is looking for albums titled “off the wall”.


Introduction

Keyword search is very successful in text databases, where documents are ranked by their similarity to the query. Google, Yahoo, and MSN Search are examples.

So, let’s do keyword search in relational databases! (DBXplorer, BANKS, DISCOVER & IR-style DISCOVER, ObjectRank, Ranking Objects)


Introduction

Let’s do it, but how?

What are the answers to be ranked?
How should we rank these answers?


Introduction -- an answer

An answer for a given query Q is a tuple tree in which every leaf node contains at least one keyword in Q.
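The leaf condition above can be checked mechanically. The following is a minimal sketch, not the authors' implementation: the representation of a tuple tree (a dict of tuple texts plus an edge list), the whitespace tokenization, and the name `is_answer` are all assumptions made for illustration. It does not verify that the edges actually form a tree.

```python
from typing import Dict, List, Set, Tuple


def is_answer(texts: Dict[str, str],
              edges: List[Tuple[str, str]],
              keywords: Set[str]) -> bool:
    """Return True if every leaf tuple contains at least one query keyword.

    texts    -- hypothetical mapping from tuple id to its text content
    edges    -- undirected edges (joins) between tuple ids
    keywords -- the keywords of query Q
    """
    # Count how many joins touch each tuple; leaves have degree <= 1.
    degree = {node: 0 for node in texts}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1

    query = {k.lower() for k in keywords}
    for node, text in texts.items():
        if degree[node] <= 1:  # a leaf, or the only node of a one-tuple tree
            # Assumption: naive lowercase whitespace tokenization.
            if not query & set(text.lower().split()):
                return False
    return True


if __name__ == "__main__":
    tree = {"artist": "michael jackson", "album": "thriller", "song": "billie jean"}
    joins = [("artist", "album"), ("album", "song")]
    print(is_answer(tree, joins, {"jackson", "billie"}))    # both leaves match
    print(is_answer(tree, joins, {"jackson", "thriller"}))  # the song leaf has no keyword
```

Note that the internal tuple (the album in the first call) is allowed to contain no keyword; only the leaves are constrained.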

Introduction

We use a slightly modified version of the DISCOVER algorithm [DISCOVER] to produce all answers for a given query.


Introduction: Ranking

Our focus is on the effectiveness problem of ranking answers: the more relevant an answer is to the user query, the higher it should be ranked.


Introduction: Contributions

We identify four new factors that are critical to effective ranking, and we propose a new ranking strategy.
We design and conduct comprehensive experiments on the effectiveness problem.
Experimental results show that our strategy is significantly more effective than existing work.


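IR Ranking

Q = (k1, k2, ..., kn) is a query, D is a document, and Sim(Q, D) is the ranking score of D.

$$\mathrm{Sim}(Q, D) = \sum_{k \in Q, D} \mathrm{weight}(k, Q) \cdot \mathrm{weight}(k, D)$$

$$\mathrm{weight}(k, D) = \frac{\mathit{ntf}}{\mathit{ndl}} \cdot \mathit{idf}$$

$$\mathit{ntf} = 1 + \ln(1 + \ln(\mathit{tf}))$$

$$\mathit{idf} = \ln\frac{N}{\mathit{df} + 1}$$

$$\mathit{ndl} = (1 - s) + s \cdot \frac{\mathit{dl}}{\mathit{avgdl}}$$

Example values:

ntf: tf = 2 gives ntf = 1.53; tf = 10 gives ntf = 2.2.
idf: a keyword in about half of the documents gives idf = 0.69; in 1/100 of them, idf = 4.6; in 1/200,000 of them, idf = 12.
ndl (with s = 0.2): dl/avgdl = 1 gives ndl = 1; 1/2 gives ndl = 0.9; 1/10 gives ndl = 0.8; 2 gives ndl = 1.2; 10 gives ndl = 2.8.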
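The formulas on this slide translate directly into code. The sketch below is an illustration under stated assumptions rather than the authors' code: the function names are made up here, and the symbols are read in their standard IR sense (tf is the number of occurrences of k in D and must be at least 1, df is the number of documents containing k, N is the number of documents, dl is the length of D, and avgdl is the average document length). It can be used to reproduce the example values quoted on the slide.

```python
import math
from typing import Dict


def ntf(tf: int) -> float:
    """Normalized term frequency: 1 + ln(1 + ln(tf)); requires tf >= 1."""
    return 1.0 + math.log(1.0 + math.log(tf))


def idf(n_docs: int, df: int) -> float:
    """Inverse document frequency: ln(N / (df + 1))."""
    return math.log(n_docs / (df + 1))


def ndl(dl: float, avgdl: float, s: float = 0.2) -> float:
    """Normalized document length: (1 - s) + s * dl / avgdl."""
    return (1.0 - s) + s * dl / avgdl


def weight(tf: int, df: int, n_docs: int,
           dl: float, avgdl: float, s: float = 0.2) -> float:
    """weight(k, D) = ntf / ndl * idf."""
    return ntf(tf) / ndl(dl, avgdl, s) * idf(n_docs, df)


def sim(query_weights: Dict[str, float], doc_weights: Dict[str, float]) -> float:
    """Sim(Q, D): sum of weight(k, Q) * weight(k, D) over keywords in both."""
    return sum(wq * doc_weights[k]
               for k, wq in query_weights.items() if k in doc_weights)


if __name__ == "__main__":
    print(round(ntf(2), 2), round(ntf(10), 2))            # slide: 1.53 and 2.2
    print(round(idf(200, 99), 2), round(idf(100, 0), 2))  # slide: 0.69 and 4.6
    print(round(ndl(0.5, 1.0), 2), round(ndl(10, 1.0), 2))  # slide: 0.9 and 2.8
```

The double logarithm in ntf makes extra occurrences count for progressively less: going from tf = 2 to tf = 10 raises ntf only from about 1.53 to about 2.2.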


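Our Ranking Strategy

T = (D1, D2, ..., Dn), so Sim(Q, D) → Sim(Q, T)

$$\mathrm{Sim}(Q, D) = \sum_{k \in Q, D} \mathrm{weight}(k, Q) \cdot \mathrm{weight}(k, D)$$

$$\mathrm{Sim}(Q, T) = \sum_{k \in Q, T} \mathrm{weight}(k, Q) \cdot \mathrm{weight}(k, T)$$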


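Our Ranking Strategy

T = (D1, D2, ..., Dn), so Sim(Q, D) → Sim(Q, T)

$$\mathrm{Sim}(Q, T) = \sum_{k \in Q, T} \mathrm{weight}(k, Q) \cdot \mathrm{weight}(k, T)$$

$$\mathrm{weight}(k, T) = \mathrm{Comb}(\mathrm{weight}(k, D_1), \ldots, \mathrm{weight}(k, D_m))$$

$$\mathrm{weight}(k, D_i) = \frac{\mathit{ntf} \cdot \mathit{idf}^{g}}{\mathit{ndl} \cdot \mathit{Nsize}(T)}$$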


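Our Ranking Strategy

Tuple Tree Size Normalization

$$\mathit{Nsize}(T) = (1 - s) + s \cdot \frac{\mathit{size}(T)}{\mathit{avgsize}}$$

$$\mathrm{weight}(k, D_i) = \frac{\mathit{ntf} \cdot \mathit{idf}^{g}}{\mathit{ndl} \cdot \mathit{Nsize}(T)}$$

size(T): number of tuples in the tuple tree T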
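To see how the size factor behaves, here is a small sketch of Nsize(T). Two things are assumptions, because the slide does not state them: the default s = 0.2 is borrowed from the IR ranking slide, and avgsize is taken to be an average tuple-tree size supplied by the caller.

```python
def nsize(size: int, avgsize: float, s: float = 0.2) -> float:
    """Nsize(T) = (1 - s) + s * size(T) / avgsize.

    size    -- number of tuples in the tuple tree T
    avgsize -- average tuple-tree size (assumed to be given by the caller)
    s       -- slope parameter; 0.2 is an assumed default, not from the slide
    """
    return (1.0 - s) + s * size / avgsize


if __name__ == "__main__":
    # Trees smaller than average get Nsize < 1, larger ones get Nsize > 1.
    for size in (1, 3, 6):
        print(size, round(nsize(size, avgsize=3.0), 3))
```

Because Nsize(T) sits in the denominator of weight(k, D_i), a tree with more tuples than average has every keyword weight reduced, and a smaller tree has them increased.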


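Our Ranking Strategy

Document Length Normalization Reconsidered

$$\mathrm{weight}(k, D_i) = \frac{\mathit{ntf} \cdot \mathit{idf}^{g}}{\mathit{ndl} \cdot \mathit{Nsize}(T)}$$

$$\mathit{ndl} = \left((1 - s) + s \cdot \frac{\mathit{dl}}{\mathit{avgdl}}\right) \cdot (1 + \ln(\mathit{avgdl}))$$

dl: document length of Di
avgdl: average document length of the text column of Di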
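The revised ndl differs from the text-database version only by the extra factor (1 + ln(avgdl)). The sketch below is an illustration, with an assumed function name and an assumed default s = 0.2 carried over from the IR ranking slide. It compares a tuple of exactly average length in a short text column with one in a long text column; the column lengths are made-up numbers.

```python
import math


def ndl_rdb(dl: float, avgdl: float, s: float = 0.2) -> float:
    """ndl = ((1 - s) + s * dl / avgdl) * (1 + ln(avgdl)).

    dl    -- document length of D_i
    avgdl -- average document length of the text column that holds D_i
    s     -- slope parameter; 0.2 is an assumed default
    """
    return ((1.0 - s) + s * dl / avgdl) * (1.0 + math.log(avgdl))


if __name__ == "__main__":
    # Hypothetical columns: short titles (avgdl = 3) and long lyrics (avgdl = 300).
    print(round(ndl_rdb(3, 3), 3))      # average-length value in the short column
    print(round(ndl_rdb(300, 300), 3))  # average-length value in the long column
```

With the text-database formula both calls would give 1. With the extra factor, the value in the column with longer text on average gets a larger ndl, so a keyword match there contributes a smaller weight.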


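Our Ranking Strategy

Document Frequency Normalization

$$\mathrm{weight}(k, D_i) = \frac{\mathit{ntf} \cdot \mathit{idf}^{g}}{\mathit{ndl} \cdot \mathit{Nsize}(T)}$$

$$\mathit{idf}^{g} = \ln\frac{N^{g}}{\mathit{df}^{g} + 1}$$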


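Our Ranking Strategy

T = (D1, D2, ..., Dn)

$$\mathrm{weight}(k, T) = \mathrm{Comb}(\mathrm{weight}(k, D_1), \ldots, \mathrm{weight}(k, D_m))$$

$$\mathrm{Comb}() = \mathit{maxWgt} \cdot \left(1 + \ln\left(1 + \ln\frac{\mathit{sumWgt}}{\mathit{maxWgt}}\right)\right)$$

maxWgt is the maximum of the weight(k, Di) values.
sumWgt is the sum of the weight(k, Di) values.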
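A short sketch of Comb and of how it feeds Sim(Q, T) may help. It assumes that the per-tuple weights weight(k, D_i) have already been computed, that they are positive, and that the list passed in for a keyword contains only the tuples in which that keyword occurs. The names `comb` and `sim_tree` and the dictionary layout are illustrative choices, not part of the original slides.

```python
import math
from typing import Dict, List


def comb(weights: List[float]) -> float:
    """Comb() = maxWgt * (1 + ln(1 + ln(sumWgt / maxWgt))).

    weights -- positive weight(k, D_i) values for one keyword k in tree T
    """
    max_wgt = max(weights)
    sum_wgt = sum(weights)
    return max_wgt * (1.0 + math.log(1.0 + math.log(sum_wgt / max_wgt)))


def sim_tree(query_weights: Dict[str, float],
             tuple_weights: Dict[str, List[float]]) -> float:
    """Sim(Q, T): sum of weight(k, Q) * weight(k, T) over keywords in Q and T."""
    return sum(wq * comb(tuple_weights[k])
               for k, wq in query_weights.items()
               if tuple_weights.get(k))  # skip keywords absent from T


if __name__ == "__main__":
    print(comb([2.0]))                  # one matching tuple: Comb equals its weight
    print(round(comb([2.0, 2.0]), 3))   # a second equal match adds less than double
    print(round(sim_tree({"off": 1.0, "wall": 1.0},
                         {"off": [1.5], "wall": [2.0, 2.0]}), 3))
```

When a keyword occurs in a single tuple, sumWgt equals maxWgt and Comb returns that weight unchanged. Additional matching tuples raise the combined weight only through the double logarithm, so the largest single weight dominates.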


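Our Ranking Strategy

T = (D1, D2, ..., Dn), so Sim(Q, D) → Sim(Q, T)

$$\mathrm{weight}(k, T) = \mathrm{Comb}(\mathrm{weight}(k, D_1), \ldots, \mathrm{weight}(k, D_m))$$

Weight of k in a tuple of a tuple tree:

$$\mathrm{weight}(k, D_i) = \frac{\mathit{ntf} \cdot \mathit{idf}^{g}}{\mathit{ndl} \cdot \mathit{Nsize}(T)}$$

Weight of k in a document of a text database, for comparison:

$$\mathrm{weight}(k, D) = \frac{\mathit{ntf}}{\mathit{ndl}} \cdot \mathit{idf}$$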


Our Ranking Strategy

Schema Terms in Query
Example queries: “lyrics for How come by D12”; “usher the singer's lyrics to burn”

Phrase-based Ranking
Uses position information to boost phrase matching.

Concept-based Ranking
Can improve effectiveness.
Can assign semantics to answers.


Experiments – data set

A lyrics database
50 queries from an AOL query log
Relevance judgment: pooling + logs

Experiments: some queries

to me lyrics by lionel richie
inner smile texas lyrics
lionel richie lyrics
lionel richie lyrics you mean more to me
avril lavigne lyrics for the album under this skin
avril lavigne lyrics

Experiments – measure

Reciprocal rank: measures how well the system returns the first relevant answer. It is the reciprocal of the rank at which the first relevant answer appears.

MAP (mean average precision): precision is computed after each relevant answer is retrieved, and these precision values are averaged for each query. The mean over all queries gives a single number that measures overall effectiveness.
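Both measures are easy to compute from a ranked list of relevance judgments. The following is a minimal sketch with assumed function names. It follows the description above literally, averaging precision over the relevant answers that were actually retrieved; a common variant divides by the total number of relevant answers for the query instead.

```python
from typing import List


def reciprocal_rank(relevant: List[bool]) -> float:
    """1 / rank of the first relevant answer, or 0.0 if none is relevant."""
    for rank, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def average_precision(relevant: List[bool]) -> float:
    """Mean of the precision values measured at each relevant answer."""
    hits = 0
    precisions = []
    for rank, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / rank)  # precision at this cut-off
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(runs: List[List[bool]]) -> float:
    """Average of average_precision over all queries."""
    return sum(average_precision(run) for run in runs) / len(runs)


if __name__ == "__main__":
    ranked = [False, True, False, True]  # hypothetical judgments for one query
    print(reciprocal_rank(ranked))       # first relevant answer is at rank 2
    print(average_precision(ranked))     # precisions 1/2 and 2/4 are averaged
    print(mean_average_precision([[True], [False, True]]))
```

Each list holds the relevance judgments for one query's answers in ranked order, with the top-ranked answer first.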

Experiments – results

Our ranking strategy: the four new factors.

Experiments – results

Comparison with related works.


Conclusions

Effectiveness is as important as efficiency.
The four new factors are critical to search effectiveness.
Our strategy is significantly more effective than related works.


Future Work

Utilize link analysis
Combine non-text columns
Efficiency problem
More real-world data sets


Questions?
