Modern Information Retrieval
Chapter 2 Modeling
Can keywords be used to represent a document or a query?
  Using keywords as the query and keyword matching as query processing does not, in general, produce good results
  Ranking algorithm, document relevance, and the IR model
Taxonomy of IR models
Ad hoc and filtering retrieval
  Ad hoc retrieval: static document collection, queries submitted against it
  Filtering retrieval: static queries, streaming documents
  A user profile describes the user's preferences
  Keywords, relevance feedback, and dynamic keyword adjustment
Formal characterization of IR models
Classic IR Index terms
  Deciding on the importance of a term is difficult
  Consider a term's semantics as well as its distribution across all documents
  Weights are used to quantify the importance of the index terms for describing the document contents
  The mutual independence assumption (index terms are treated as uncorrelated) simplifies the task of computing rankings quickly
Boolean model
  Index term weights are binary
  The query is a Boolean expression with not, and, or as connectives
  Users might find it difficult to specify their information needs
  Advantages and disadvantages
  Each document is either relevant or non-relevant
  Given the binary weight vector (0,1,0), is document dj an answer?
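The question above can be checked mechanically. The sketch below is only an illustration: the query q = ka AND (kb OR NOT kc) is an assumed example that does not appear on the slide, and the names `query`, `dnf`, and `is_answer` are hypothetical. It expands the query into the binary weight vectors that satisfy it (its disjunctive normal form) and tests whether a document vector is among them.

```python
from itertools import product


def query(ka: bool, kb: bool, kc: bool) -> bool:
    """Assumed example query: q = ka AND (kb OR NOT kc)."""
    return ka and (kb or not kc)


# Disjunctive normal form of q: every binary weight vector (ka, kb, kc)
# for which the Boolean expression evaluates to true.
dnf = [v for v in product((1, 0), repeat=3) if query(*map(bool, v))]


def is_answer(doc_vector) -> bool:
    """A document is an answer only if its vector matches one DNF component."""
    return tuple(doc_vector) in dnf


print(dnf)                   # vectors that satisfy q
print(is_answer((0, 1, 0)))  # document containing only kb
```

Under this assumed query, the document with vector (0,1,0) is not an answer, because it lacks ka. There is no notion of a partial match: the document is simply excluded.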
Vector model
  Allows partial matching and ranking by a similarity measure
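As a minimal sketch of ranking by similarity, the code below assumes the cosine of the angle between the document and query vectors as the similarity measure, which is a common choice for the vector model. The weights and the names `cosine_similarity` and `rank` are made up for illustration.

```python
import math


def cosine_similarity(doc, query):
    """Cosine of the angle between two weight vectors of equal length."""
    dot = sum(d * q for d, q in zip(doc, query))
    norm_d = math.sqrt(sum(d * d for d in doc))
    norm_q = math.sqrt(sum(q * q for q in query))
    if norm_d == 0 or norm_q == 0:
        return 0.0  # an all-zero vector matches nothing
    return dot / (norm_d * norm_q)


def rank(docs, query):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(vec, query)) for name, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical index term weights over three terms.
docs = {
    "d1": [0.5, 0.8, 0.3],
    "d2": [0.0, 1.0, 0.0],
    "d3": [0.0, 0.0, 1.0],
}
q = [1.0, 1.0, 0.0]

for name, score in rank(docs, q):
    print(name, round(score, 3))
```

Here d2 contains only one of the two query terms yet still receives a nonzero score, which is the partial matching that the Boolean model cannot express.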
Computing index term weights
  Term frequency, tf factor: how well the term describes the document contents
  Inverse document frequency, idf factor: how well the term distinguishes the document from the others in the collection
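The two factors are usually multiplied to give a term's weight. The sketch below assumes one common variant, tf normalized by the most frequent term in the document and idf = log(N / n_i), where N is the number of documents and n_i the number of documents containing the term. The function name `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute tf-idf weights for a list of tokenized documents.

    tf  = raw frequency / frequency of the most common term in the document
    idf = log(N / n_i), N = number of documents, n_i = documents containing the term
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    # Iterating a Counter yields each distinct term once, so this counts documents.
    doc_freq = Counter(term for c in counts for term in c)

    weights = []
    for c in counts:
        max_freq = max(c.values(), default=1)
        weights.append({
            term: (freq / max_freq) * math.log(n_docs / doc_freq[term])
            for term, freq in c.items()
        })
    return weights


# Toy collection, already tokenized.
docs = [
    ["retrieval", "model", "model"],
    ["retrieval", "query"],
    ["retrieval", "index", "term"],
]

for w in tf_idf(docs):
    print(w)
```

A term that appears in every document, such as "retrieval" here, gets idf = log(1) = 0 and therefore contributes nothing to the ranking, while a term concentrated in one document is weighted most heavily.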
  The vector model is a popular retrieval model due to its simplicity and performance