Effective Keyword Search in Relational Databases
Fang Liu (University of Illinois at Chicago), Clement Yu (University of Illinois at Chicago), Weiyi Meng (Binghamton University), Abdur Chowdhury (America Online, Inc.)
- Introduction
- IR ranking in text databases
- Our ranking strategy in RDBs
- Experiments
- Conclusions and future work
SIGMOD 2006: Effective Keyword Search in Relational Databases
Introduction
Why keyword search in relational databases?
- We want to search text data stored in relational databases.
- SQL with the “contains” operator is not suitable for non-expert users.
- Keyword search is tremendously successful in text databases, where documents are ranked by their similarity to the query, and it is accessible to non-expert users.
Introduction
Text data in relational databases
Introduction
Suppose a user is looking for albums titled “off the wall”.
Introduction
- Keyword search is very successful in text databases, where documents are ranked by their similarity to the query. Google, Yahoo, and MSN Search are examples.
- So, let’s do keyword search in relational databases! (DBXplorer, BANKS, DISCOVER and IR-style DISCOVER, ObjectRank, Ranking Objects)
Introduction
Let’s do it, but how?
- What are the answers to be ranked?
- How should we rank these answers?
Introduction -- an answer
An answer for a given query Q: a tuple tree in which every leaf node contains at least one keyword in Q.
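The leaf condition in this definition can be checked mechanically. The sketch below is only an illustration: the dictionary-and-edge-list layout, the whitespace tokenization, and the sample tuples are assumptions made for the example, not part of the talk.

```python
def is_answer(texts, edges, query):
    """Return True if every leaf of the tuple tree contains a query keyword.

    texts: dict mapping a tuple id to its text (hypothetical layout).
    edges: list of (tuple_id, tuple_id) pairs that join tuples.
    query: list of keywords.
    The function assumes the edges already form a tree; it does not verify that.
    """
    keywords = {k.lower() for k in query}
    degree = {tid: 0 for tid in texts}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    for tid, deg in degree.items():
        if deg <= 1:  # a leaf, or the only tuple of a single-node tree
            words = set(texts[tid].lower().split())
            if not words & keywords:
                return False
    return True


# Illustrative data: an artist tuple joined to an album tuple.
texts = {"artist1": "Michael Jackson", "album1": "Off the Wall"}
edges = [("artist1", "album1")]

print(is_answer(texts, edges, ["jackson", "wall"]))  # both leaves match a keyword
print(is_answer(texts, edges, ["wall"]))             # the artist leaf matches nothing
```

With the second query the two-tuple tree is rejected, while the album tuple on its own would still qualify as a single-node answer.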
Introduction
Use a slightly modified version of the algorithm in [DISCOVER] to produce all answers for a given query.
Introduction: Ranking
Our focus is on the effectiveness problem of ranking answers: the more relevant an answer is to the user query, the higher it should be ranked.
Introduction: Contributions
- We identify four new factors that are critical to effective ranking, and we propose a new ranking strategy.
- We design and conduct comprehensive experiments on the effectiveness problem.
- Experimental results show that our strategy is significantly more effective than existing work.
IR Ranking
Q = (k1, k2, ..., kn) is the query, D is a document, and Sim(Q, D) is the ranking score of D.

$$\mathrm{Sim}(Q, D) = \sum_{k \in Q \cap D} \mathrm{weight}(k, Q) \cdot \mathrm{weight}(k, D)$$

$$\mathrm{weight}(k, D) = \frac{ntf}{ndl} \cdot idf$$

$$ntf = 1 + \ln(1 + \ln(tf))$$

$$idf = \ln \frac{N}{df + 1}$$

$$ndl = (1 - s) + s \cdot \frac{dl}{avgdl}$$

Notes (with s = 0.2):
- ntf: tf = 2 gives ntf = 1.53; tf = 10 gives ntf = 2.2.
- idf: a keyword in half of the documents gives idf = 0.69; in 1/100 of them, idf = 4.6; in 1/200,000 of them, idf = 12.
- ndl: dl/avgdl = 1 gives ndl = 1; 1/2 gives ndl = 0.9; 1/10 gives ndl = 0.8; 2 gives ndl = 1.2; 10 gives ndl = 2.8.
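The formulas on this slide translate almost line by line into code. The following is a minimal sketch, not the authors' implementation: the function names are made up for the example, and weight(k, Q) is assumed to be the keyword's frequency in the query, which the slide does not state.

```python
import math


def ntf(tf):
    """Normalized term frequency: 1 + ln(1 + ln(tf)), for tf >= 1."""
    return 1 + math.log(1 + math.log(tf))


def idf(n_docs, df):
    """Inverse document frequency: ln(N / (df + 1))."""
    return math.log(n_docs / (df + 1))


def ndl(dl, avgdl, s=0.2):
    """Normalized document length: (1 - s) + s * dl / avgdl."""
    return (1 - s) + s * dl / avgdl


def weight_kd(tf, df, n_docs, dl, avgdl, s=0.2):
    """weight(k, D) = ntf / ndl * idf."""
    return ntf(tf) / ndl(dl, avgdl, s) * idf(n_docs, df)


def sim(query, doc_tokens, df, n_docs, avgdl, s=0.2):
    """Sim(Q, D): sum over keywords that occur in both Q and D."""
    score = 0.0
    for k in set(query):
        tf = doc_tokens.count(k)
        if tf == 0:
            continue  # the sum only covers keywords present in D
        qtf = query.count(k)  # assumption: weight(k, Q) is the query term frequency
        score += qtf * weight_kd(tf, df[k], n_docs, len(doc_tokens), avgdl, s)
    return score


print(round(ntf(2), 2), round(ntf(10), 1))           # 1.53 2.2
print(round(ndl(5, 10), 2), round(ndl(100, 10), 2))  # 0.9 2.8
```

The printed values match the notes on the slide: the double logarithm dampens high term frequencies, and with s = 0.2 a document ten times the average length is penalized by a factor of 2.8 rather than 10.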
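Our Ranking Strategy
Treat a tuple tree T as a sequence of documents, T = (D1, D2, ..., Dm), so Sim(Q, D) → Sim(Q, T):

$$\mathrm{Sim}(Q, D) = \sum_{k \in Q \cap D} \mathrm{weight}(k, Q) \cdot \mathrm{weight}(k, D)$$

$$\mathrm{Sim}(Q, T) = \sum_{k \in Q \cap T} \mathrm{weight}(k, Q) \cdot \mathrm{weight}(k, T)$$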
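Our Ranking Strategy
T = (D1, D2, ..., Dm), so Sim(Q, D) → Sim(Q, T)

$$\mathrm{Sim}(Q, T) = \sum_{k \in Q \cap T} \mathrm{weight}(k, Q) \cdot \mathrm{weight}(k, T)$$

$$\mathrm{weight}(k, D_i) = \frac{ntf \cdot idf^{g}}{ndl \cdot \mathrm{Nsize}(T)}$$

$$\mathrm{weight}(k, T) = \mathrm{Comb}(\mathrm{weight}(k, D_1), \ldots, \mathrm{weight}(k, D_m))$$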
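Our Ranking Strategy: Tuple Tree Size Normalization

$$\mathrm{weight}(k, D_i) = \frac{ntf \cdot idf^{g}}{ndl \cdot \mathrm{Nsize}(T)}$$

$$\mathrm{Nsize}(T) = (1 - s) + s \cdot \frac{\mathrm{size}(T)}{avgsize}$$

size(T): the number of tuples in the tuple tree T.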
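Our Ranking Strategy: Document Length Normalization Reconsidered

$$\mathrm{weight}(k, D_i) = \frac{ntf \cdot idf^{g}}{ndl \cdot \mathrm{Nsize}(T)}$$

$$ndl = \left((1 - s) + s \cdot \frac{dl}{avgdl}\right) \cdot (1 + \ln(avgdl))$$

dl: the document length of Di.
avgdl: the average document length of the text column of Di.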
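A small numeric sketch may help show what the extra factor 1 + ln(avgdl) changes. The two columns and their average lengths below are hypothetical, chosen only to contrast a short-text column with a long-text one; s = 0.2 is likewise an assumed setting.

```python
import math


def ndl_rdb(dl, avgdl, s=0.2):
    """Revised normalization: ((1 - s) + s * dl / avgdl) * (1 + ln(avgdl))."""
    return ((1 - s) + s * dl / avgdl) * (1 + math.log(avgdl))


# Hypothetical columns: titles average 3 words, lyrics average 300 words.
title_ndl = ndl_rdb(dl=3, avgdl=3)
lyrics_ndl = ndl_rdb(dl=300, avgdl=300)

print(round(title_ndl, 2))   # 2.1
print(round(lyrics_ndl, 2))  # 6.7
```

Without the extra factor both values would be 1.0, because each value is exactly as long as its own column's average. With it, a match in a column that holds long text on average is normalized more heavily than a match in a column of short text.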
Our Ranking Strategy: Document Frequency Normalization

$$\mathrm{weight}(k, D_i) = \frac{ntf \cdot idf^{g}}{ndl \cdot \mathrm{Nsize}(T)}$$

$$idf^{g} = \ln \frac{N^{g}}{df^{g} + 1}$$
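Our Ranking Strategy
T = (D1, D2, ..., Dm)

$$\mathrm{weight}(k, T) = \mathrm{Comb}(\mathrm{weight}(k, D_1), \ldots, \mathrm{weight}(k, D_m))$$

$$\mathrm{Comb}() = maxWgt \cdot \left(1 + \ln\left(1 + \ln\frac{sumWgt}{maxWgt}\right)\right)$$

maxWgt is the maximum of weight(k, Di); sumWgt is the sum of weight(k, Di).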
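The Comb function can be sketched in a few lines. The handling of the case where no Di contains the keyword (returning 0) is an assumption added to avoid dividing by zero; the slide does not cover it.

```python
import math


def comb(weights):
    """Comb() = maxWgt * (1 + ln(1 + ln(sumWgt / maxWgt)))."""
    positive = [w for w in weights if w > 0]
    if not positive:
        return 0.0  # assumption: a keyword matched nowhere contributes nothing
    max_wgt = max(positive)
    sum_wgt = sum(positive)
    return max_wgt * (1 + math.log(1 + math.log(sum_wgt / max_wgt)))


print(comb([2.0]))                 # 2.0
print(round(comb([2.0, 2.0]), 2))  # 3.05
print(round(comb([2.0, 0.1]), 2))  # 2.1
```

A single match returns its own weight unchanged. A second match of equal weight raises the result to about 3.05 rather than 4.0, so additional matches help with diminishing returns, in the same way ntf dampens raw term frequency.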
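Our Ranking Strategy
T = (D1, D2, ..., Dm), so Sim(Q, D) → Sim(Q, T)

$$\mathrm{weight}(k, T) = \mathrm{Comb}(\mathrm{weight}(k, D_1), \ldots, \mathrm{weight}(k, D_m))$$

$$\mathrm{weight}(k, D_i) = \frac{ntf \cdot idf^{g}}{ndl \cdot \mathrm{Nsize}(T)}$$

For comparison, the weight used in text databases:

$$\mathrm{weight}(k, D) = \frac{ntf}{ndl} \cdot idf$$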
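Putting the pieces together, a score for a whole tuple tree could be computed roughly as follows. This is a sketch under several assumptions that the slides do not settle: N^g and df^g are taken as precomputed inputs, the same s = 0.2 is used for both normalizations, weight(k, Q) is the query term frequency, and all names and sample statistics are invented for the example.

```python
import math


def comb(weights):
    """Combine per-document weights of one keyword into weight(k, T)."""
    positive = [w for w in weights if w > 0]
    if not positive:
        return 0.0  # assumption: no match means no contribution
    max_wgt, sum_wgt = max(positive), sum(positive)
    return max_wgt * (1 + math.log(1 + math.log(sum_wgt / max_wgt)))


def weight_k_di(tf, dl, avgdl, n_g, df_g, nsize, s=0.2):
    """weight(k, Di) = ntf * idf^g / (ndl * Nsize(T))."""
    if tf == 0:
        return 0.0
    ntf = 1 + math.log(1 + math.log(tf))
    idf_g = math.log(n_g / (df_g + 1))
    ndl = ((1 - s) + s * dl / avgdl) * (1 + math.log(avgdl))
    return ntf * idf_g / (ndl * nsize)


def sim_qt(query, values, avgdl, n_g, df_g, size, avgsize, s=0.2):
    """Sim(Q, T) for a tuple tree.

    values: list of (column, tokens) pairs, one per Di in T.
    avgdl: dict mapping a column to its average document length.
    df_g: dict mapping a keyword to its df^g (taken as given here).
    size: number of tuples in T; avgsize: average tuple tree size.
    """
    nsize = (1 - s) + s * size / avgsize
    score = 0.0
    for k in set(query):
        per_value = [
            weight_k_di(tokens.count(k), len(tokens), avgdl[col],
                        n_g, df_g.get(k, 0), nsize, s)
            for col, tokens in values
        ]
        score += query.count(k) * comb(per_value)  # assumed weight(k, Q) = qtf
    return score


# Hypothetical statistics and two candidate answers for the query "off wall".
avgdl = {"album.title": 3.0, "artist.name": 2.0}
df_g = {"off": 50, "wall": 20}
album_only = [("album.title", ["off", "the", "wall"])]
album_and_artist = album_only + [("artist.name", ["michael", "jackson"])]

score_a = sim_qt(["off", "wall"], album_only, avgdl, 1000, df_g, size=1, avgsize=2.0)
score_b = sim_qt(["off", "wall"], album_and_artist, avgdl, 1000, df_g, size=2, avgsize=2.0)
print(score_a > score_b)  # the smaller tree with the same matches ranks higher
```

Both trees match the same keywords in the same album title, so the only difference is Nsize(T): the extra artist tuple adds size without adding matches and the larger tree scores lower.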
Our Ranking Strategy
- Schema Terms in Query
  - lyrics for How come by D12
  - lusher the singer's lyrics to burn
- Phrase-based Ranking
  - Using position information to boost phrase matching
- Concept-based Ranking
  - Can improve effectiveness
  - Can assign semantics to answers
Experiments – data set
- A lyrics database
- 50 queries from an AOL query log
- Relevance judgment: pooling + logs
Experiments: some queries
- to me lyrics by lionel richie
- inner smile texas lyrics
- lionel richie lyrics
- lionel richie lyrics you mean more to me
- avril lavigne lyrics for the album under this skin
- avril lavigne lyrics
Experiments – measure
- Reciprocal rank: 1 divided by the rank of the first relevant answer; it measures how well the system returns the first relevant answer.
- MAP (mean average precision): a precision value is computed after each relevant answer is retrieved; these values are then averaged into a single number that measures overall effectiveness.
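Both measures are easy to compute from a ranked list of relevance judgments. The sketch below follows the description on this slide; the optional `total_relevant` argument is an addition for the common variant that divides by the number of all relevant answers, including those never retrieved, and the sample judgment lists are invented.

```python
def reciprocal_rank(relevance):
    """1 / rank of the first relevant answer, or 0.0 if none is relevant."""
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def average_precision(relevance, total_relevant=None):
    """Average of the precision values taken at each relevant answer."""
    hits = 0
    precisions = []
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / rank)
    denominator = total_relevant if total_relevant is not None else hits
    return sum(precisions) / denominator if denominator else 0.0


def mean_average_precision(runs):
    """Mean of average precision over the ranked lists of several queries."""
    return sum(average_precision(r) for r in runs) / len(runs)


# Invented judgments for two queries (True marks a relevant answer).
run1 = [False, True, False, True]
run2 = [True, False, True]

print(reciprocal_rank(run1))    # 0.5
print(average_precision(run1))  # 0.5
print(round(mean_average_precision([run1, run2]), 4))  # 0.6667
```

For the first list the relevant answers sit at ranks 2 and 4, giving precisions 1/2 and 2/4, so both the reciprocal rank and the average precision are 0.5.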
Conclusions
- Effectiveness is as important as efficiency.
- The four new factors are critical to search effectiveness.
- Our strategy is significantly more effective than related work.
Future Work
- Utilize link analysis
- Combine non-text columns
- Efficiency problem
- More real-world data sets