Modern Information Retrieval Chapter 2 Modeling


Can keywords be used to represent a document or a query?

  • Using keywords as the query and keyword matching as query processing cannot, in general, generate good results
  • Central notions: ranking algorithm, document relevance, and IR model


Taxonomy of IR models


Ad hoc and filtering retrieval

  • Ad hoc retrieval: the document collection is static while new queries are submitted
  • Filtering retrieval: the queries are static while documents stream in
  • A user profile describes the user's preferences
  • The profile is built from keywords and refined through relevance feedback and dynamic keyword adjustment
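
The filtering side of this contrast can be sketched in a few lines. Everything below is an illustrative assumption rather than something the slides specify: the profile is a dict of keyword weights, a document passes when its summed weight reaches a hypothetical threshold, and feedback moves weights by a fixed step.

```python
# Sketch of filtering retrieval: the profile is static, documents stream in.
# The scoring rule, threshold, and update step are illustrative assumptions.

def score(profile, doc_terms):
    """Sum the profile weights of the keywords that occur in the document."""
    return sum(w for term, w in profile.items() if term in doc_terms)

def filter_stream(profile, stream, threshold=1.0):
    """Yield only the incoming documents whose score reaches the threshold."""
    for doc in stream:
        if score(profile, set(doc.lower().split())) >= threshold:
            yield doc

def feedback(profile, doc, relevant, step=0.5):
    """Relevance feedback: return a copy of the profile adjusted by one judgment."""
    updated = dict(profile)
    for term in set(doc.lower().split()):
        if relevant:
            updated[term] = updated.get(term, 0.0) + step
        elif term in updated:
            updated[term] = max(0.0, updated[term] - step)
    return updated

profile = {"retrieval": 1.0, "ranking": 0.5}
stream = [
    "Ranking functions for retrieval",
    "Cooking with garlic",
    "Ranking chess players",
]
passed = list(filter_stream(profile, stream))
```

With these made-up values only the first document passes. If the user then marks "Ranking chess players" as relevant, `feedback` raises the weights of its terms and the same document passes on the next run, which is the dynamic keyword adjustment in miniature.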


Formal characterization of IR models


Classic IR: index terms

  • Deciding on the importance of a term is difficult
  • Consider a term's semantics as well as its distribution across all documents
  • Weights are used to quantify the importance of the index terms for describing the document contents


  • Assuming that index term weights are mutually independent is a simplification, but it makes fast ranking computation much easier


Boolean model

  • Index term weights are binary
  • A query is a Boolean expression with not, and, or as connectives
  • Users might find it difficult to specify their information needs as Boolean expressions


Boolean model: advantages and disadvantages

  • Each document is either relevant or non-relevant; there is no partial match and no ranking
  • Given d_j = (0,1,0), is document d_j an answer?
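
As a hedged illustration of this all-or-nothing decision, the sketch below evaluates a Boolean query against binary term vectors. The query `ka and (kb or not kc)` and the term order (ka, kb, kc) are assumptions chosen for the example, since the slide's own query did not survive extraction; the function names are hypothetical.

```python
# Boolean model sketch: binary index term weights, query as a Boolean expression.
# The query ka AND (kb OR NOT kc) is an assumed example, not taken from the slide.

def query(ka, kb, kc):
    return ka and (kb or not kc)

def is_answer(doc_vector):
    """A document is either relevant (True) or non-relevant (False); no ranking."""
    ka, kb, kc = (bool(w) for w in doc_vector)
    return query(ka, kb, kc)

# Every binary vector that satisfies the query (its disjunctive normal form).
dnf = [
    (a, b, c)
    for a in (0, 1)
    for b in (0, 1)
    for c in (0, 1)
    if is_answer((a, b, c))
]
```

Under this assumed query, `is_answer((0, 1, 0))` is `False`: the document contains kb but lacks ka, and the model gives no credit for the partial match.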


Vector model

  • Allows partial matching and ranks documents by a similarity measure
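
One common choice of similarity measure is the cosine of the angle between the document and query vectors. The sketch below assumes that measure and uses made-up weights; `cosine_similarity` and `rank` are hypothetical helper names.

```python
import math

def cosine_similarity(doc, query):
    """Cosine of the angle between two term-weight vectors (0.0 if either is all zeros)."""
    dot = sum(d * q for d, q in zip(doc, query))
    norm = math.sqrt(sum(d * d for d in doc)) * math.sqrt(sum(q * q for q in query))
    return dot / norm if norm else 0.0

def rank(docs, query):
    """Return (name, score) pairs sorted by decreasing similarity to the query."""
    scored = [(name, cosine_similarity(vec, query)) for name, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy weights over three index terms; the values are illustrative only.
docs = {
    "d1": [1.0, 1.0, 0.0],
    "d2": [0.0, 1.0, 0.0],
    "d3": [0.0, 0.0, 1.0],
}
query = [1.0, 1.0, 0.0]
ranking = rank(docs, query)
```

Here d2 shares only one of the two query terms yet still receives a nonzero score and ranks between d1 and d3, which is the partial matching that the Boolean model cannot express.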


Computing index term weights

  • Term frequency, tf factor: how well the term describes the document contents
  • Inverse document frequency, idf factor: how well the term distinguishes the document from the rest of the collection
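
A widely used way to combine the two factors is normalized term frequency times the log of inverse document frequency. The sketch below assumes that particular formula (tf divided by the count of the most frequent term in the document, idf = log(N / n_i)); the slide text itself does not pin down a formula, and the toy collection is made up.

```python
import math
from collections import Counter

def tf_idf(collection):
    """Return one {term: weight} dict per document, with weight = tf * idf."""
    counts = [Counter(doc.lower().split()) for doc in collection]
    n_docs = len(collection)
    # n_i: number of documents in which term i appears
    doc_freq = Counter(term for c in counts for term in c)
    weights = []
    for c in counts:
        max_freq = max(c.values(), default=1)
        weights.append({
            term: (freq / max_freq) * math.log(n_docs / doc_freq[term])
            for term, freq in c.items()
        })
    return weights

collection = ["retrieval model retrieval", "vector model", "boolean model"]
weights = tf_idf(collection)
```

In this toy collection "model" occurs in every document, so its idf is log(3/3) = 0 and it gets zero weight everywhere, while "retrieval", frequent in one document and absent from the others, gets the highest weight.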


  • The vector model is a popular retrieval model because of its simplicity and its ranking performance