Scoring, term weighting and the vector space

Is a document simply a sequence of words?

Documents have many structural components, such as authors, title and date of publication …

Metadata – data about documents

Fields – document features whose possible values come from a finite set. Examples – dates, ISBN

Zones – document regions whose content can be arbitrary free text. Examples – title, abstract

A user may specify requirements on fields and zones

One parametric index for each zone/field

Dictionary comes from a fixed vocabulary

A separate inverted index is built for each zone of the document

The dictionary of each zone index holds whatever vocabulary stems from the text of that zone

Advantages:

Reduced size of the dictionary

Efficient query answering using weighted zone scoring
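A minimal Python sketch of a zone index, assuming the zone is recorded inside each posting (one dictionary entry per term, with each posting listing the zones of the document in which the term occurs) rather than kept in separate per-zone dictionaries. `build_zone_index`, the input format and the sample documents are hypothetical, and tokenization is a naive lowercase whitespace split.

```python
from collections import defaultdict


def build_zone_index(docs):
    """Build a term -> {doc_id: set of zones} index.

    docs maps a doc_id to a dict of zone name -> zone text (hypothetical format).
    Each term gets a single dictionary entry; the zones in which it occurs are
    stored in the postings rather than as separate per-zone dictionary entries.
    """
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, zones in docs.items():
        for zone, text in zones.items():
            for term in text.lower().split():  # naive tokenizer
                index[term][doc_id].add(zone)
    return index


# Hypothetical sample collection with author, title and body zones.
docs = {
    1: {"author": "William Shakespeare", "title": "Hamlet", "body": "to be or not to be"},
    2: {"author": "Charles Lamb", "title": "Tales from Shakespeare", "body": "stories retold"},
}
index = build_zone_index(docs)
print(dict(index["shakespeare"]))  # {1: {'author'}, 2: {'title'}}
```

Keeping the zones in the postings means one dictionary entry per term instead of one per term and zone, and a query can read off directly which zones matched when computing a weighted zone score.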

Different fields/zones differ in importance when evaluating how well a document matches a query

Given a query q and a document d, weighted zone scoring assigns to the pair (q, d) a score in the interval [0, 1] by computing a linear combination of zone scores

Suppose each document has l zones, and let $g_1, \ldots, g_l \in [0, 1]$ be weights such that $\sum_{i=1}^{l} g_i = 1$

Each field/zone contributes a Boolean value: let $s_i$ be the Boolean score denoting a match (1) or absence (0) of q in the i-th zone

The weighted zone score is $\sum_{i=1}^{l} g_i s_i$
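A minimal Python sketch of the weighted zone score, assuming the weights $g_i$ and the Boolean zone matches $s_i$ are passed as equal-length lists in the same zone order. `weighted_zone_score` is a hypothetical name, not part of any library.

```python
def weighted_zone_score(weights, matches):
    """Return sum_i g_i * s_i for one (query, document) pair.

    weights: zone weights g_1..g_l, required to lie in [0, 1] and sum to 1.
    matches: Boolean zone scores s_1..s_l (True if the query matches zone i).
    """
    if len(weights) != len(matches):
        raise ValueError("need exactly one weight per zone")
    if abs(sum(weights) - 1.0) > 1e-9 or any(not 0.0 <= g <= 1.0 for g in weights):
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    # Each matching zone contributes its weight; non-matching zones contribute 0.
    return sum(g * int(s) for g, s in zip(weights, matches))


# Three zones in a fixed order; the query matches the second and third zones.
print(weighted_zone_score([0.2, 0.3, 0.5], [False, True, True]))
```

Because the weights sum to 1 and each $s_i$ is 0 or 1, the result always stays in [0, 1].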

Consider the query Shakespeare in a collection in which each document has three zones: author, title and body

The Boolean score function for a zone takes the value 1 if the query term Shakespeare is present in that zone, and 0 otherwise

Weighted zone scoring then requires three weights $g_{\mathrm{body}}$, $g_{\mathrm{title}}$ and $g_{\mathrm{author}}$

Let $g_{\mathrm{body}} = 0.5$, $g_{\mathrm{title}} = 0.3$ and $g_{\mathrm{author}} = 0.2$

Here the author zone is the least important, the title zone somewhat more, and the body contributes the most
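As a worked illustration with these weights (the two documents are hypothetical): $d_1$ contains Shakespeare in its title and body but not in its author zone, while $d_2$ mentions Shakespeare only in its author zone.

```latex
\begin{align*}
\mathrm{score}(d_1, q) &= g_{\mathrm{author}} \cdot 0 + g_{\mathrm{title}} \cdot 1 + g_{\mathrm{body}} \cdot 1
  = 0.2 \cdot 0 + 0.3 \cdot 1 + 0.5 \cdot 1 = 0.8 \\
\mathrm{score}(d_2, q) &= g_{\mathrm{author}} \cdot 1 + g_{\mathrm{title}} \cdot 0 + g_{\mathrm{body}} \cdot 0
  = 0.2 \cdot 1 + 0.3 \cdot 0 + 0.5 \cdot 0 = 0.2
\end{align*}
```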

The weights could be specified by an expert

Or they can be learned from training examples that have been judged editorially

Each training example is a tuple consisting of a query q, a document d and a relevance judgment for d on q

The judgment can be binary (relevant or nonrelevant)

A graded judgment score can also be used

Compute the weights so that the learned scores approximate the relevance judgments as closely as possible

This is an optimization problem

Consider two zones: title and body

Compute Boolean variables $s_T(d, q)$ and $s_B(d, q)$, indicating whether the query q matches the title zone and the body zone of d, respectively

Compute a score between 0 and 1 by using the relation:

$\mathrm{score}(d, q) = g\, s_T(d, q) + (1 - g)\, s_B(d, q)$

The constant g is determined from a set of training examples $\mu_j = (d_j, q_j, r(d_j, q_j))$

In each training example, a human editor assesses the training document $d_j$ against the training query $q_j$ and delivers a relevance judgment $r(d_j, q_j)$, encoded as 1 (relevant) or 0 (nonrelevant).

For each training example $\mu_j$ we have Boolean values $s_T(d_j, q_j)$ and $s_B(d_j, q_j)$, which we use to compute a score $\mathrm{score}(d_j, q_j) = g\, s_T(d_j, q_j) + (1 - g)\, s_B(d_j, q_j)$

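The error of the scoring function with weight g on training example $\mu_j$ is

$\varepsilon(g, \mu_j) = (r(d_j, q_j) - \mathrm{score}(d_j, q_j))^2$

The total error is $\sum_j \varepsilon(g, \mu_j)$, and g is chosen to minimize it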
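A minimal Python sketch of the total error for the two-zone case, assuming each training example is reduced to a hypothetical `(s_title, s_body, r)` tuple with r = 1 for relevant and r = 0 for nonrelevant. The function names and the sample data are illustrative only; the grid search is a brute-force stand-in for the exact minimization.

```python
def score(g, s_title, s_body):
    """score(d, q) = g * s_T(d, q) + (1 - g) * s_B(d, q)."""
    return g * s_title + (1 - g) * s_body


def total_error(g, examples):
    """Total squared error: sum over examples of (r - score)^2.

    Each example is a hypothetical (s_title, s_body, r) tuple: the Boolean title
    and body matches for (d_j, q_j) and the judgment r (1 = relevant, 0 = nonrelevant).
    """
    return sum((r - score(g, s_t, s_b)) ** 2 for s_t, s_b, r in examples)


# Hypothetical training data, one tuple per training example.
examples = [(1, 0, 1), (1, 0, 1), (0, 1, 1), (1, 0, 0), (0, 1, 0), (1, 1, 1), (0, 0, 0)]

# Brute-force minimization over the grid g = 0.00, 0.01, ..., 1.00.
best_g = min((i / 100 for i in range(101)), key=lambda g: total_error(g, examples))
print(best_g)  # 0.6
```

A grid search only approximates the minimizer; because the total error is a quadratic in g, it can also be minimized exactly in closed form.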

Let $n_{01r}$ ($n_{01n}$) be the number of training examples with $s_T = 0$ and $s_B = 1$ whose judgment is relevant (nonrelevant); the other counts $n_{00r}, n_{10n}, \ldots$ are defined analogously. The contribution of the examples with $s_T = 0$ and $s_B = 1$ to the total error is

$[1 - (1 - g)]^2\, n_{01r} + [0 - (1 - g)]^2\, n_{01n}$

The total error is $(n_{01r} + n_{10n})\, g^2 + (n_{10r} + n_{01n})(1 - g)^2 + n_{00r} + n_{11n}$
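Spelling out the differentiation step, with $E(g)$ used here as a shorthand (not from the original) for the total error above:

```latex
\begin{align*}
E(g) &= (n_{01r} + n_{10n})\, g^2 + (n_{10r} + n_{01n})(1 - g)^2 + n_{00r} + n_{11n} \\
\frac{dE}{dg} &= 2(n_{01r} + n_{10n})\, g - 2(n_{10r} + n_{01n})(1 - g) = 0 \\
&\Rightarrow\; g\,(n_{10r} + n_{10n} + n_{01r} + n_{01n}) = n_{10r} + n_{01n}
\end{align*}
```

Since $d^2E/dg^2 = 2(n_{10r} + n_{10n} + n_{01r} + n_{01n}) \ge 0$, this stationary point is a minimum whenever that sum is nonzero.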

By differentiating with respect to g and setting the result to 0, the optimal value of g is

$g = \dfrac{n_{10r} + n_{01n}}{n_{10r} + n_{10n} + n_{01r} + n_{01n}}$
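A minimal Python sketch of the closed-form optimum, assuming the same hypothetical `(s_title, s_body, r)` encoding (r = 1 relevant, r = 0 nonrelevant), so that the count for key `(1, 0, 1)` is $n_{10r}$, `(0, 1, 0)` is $n_{01n}$, and so on. `optimal_g` is a hypothetical name, and the fallback value for the degenerate case is an arbitrary choice.

```python
from collections import Counter


def optimal_g(examples):
    """Closed form g = (n10r + n01n) / (n10r + n10n + n01r + n01n).

    examples: hypothetical (s_title, s_body, r) tuples, r = 1 relevant, r = 0 nonrelevant.
    """
    n = Counter(examples)  # missing keys count as 0
    numerator = n[(1, 0, 1)] + n[(0, 1, 0)]  # n10r + n01n
    denominator = numerator + n[(1, 0, 0)] + n[(0, 1, 1)]  # + n10n + n01r
    if denominator == 0:
        # No example has exactly one matching zone, so the total error does not
        # depend on g; 0.5 is an arbitrary choice in that case.
        return 0.5
    return numerator / denominator


# Hypothetical training data: n10r = 2, n10n = 1, n01r = 1, n01n = 1.
examples = [(1, 0, 1), (1, 0, 1), (0, 1, 1), (1, 0, 0), (0, 1, 0), (1, 1, 1), (0, 0, 0)]
print(optimal_g(examples))  # 0.6
```

Only examples in which exactly one zone matches influence g; the 00 and 11 cases add the constant $n_{00r} + n_{11n}$ to the total error.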
