By S. Dhivya (2014614007)
M.E. CSE (I Year)
Time and space
The shorter the response time and the smaller the space used, the better the system.
Performance evaluation (for data retrieval) considers:
 * the performance of the indexing structure
 * the interaction with the operating system
 * the delays in communication channels
 * the overheads introduced by software layers
Performance evaluation (for information retrieval): besides time and space, the quality of the retrieved answers (retrieval performance) is also an issue.
Factors of retrieval performance evaluation
1) Test reference collection
 * a collection of documents
 * a set of example information requests
 * a set of relevant documents (provided by specialists) for each example information request
2) Evaluation measure: the goodness of a retrieval strategy S is measured by the similarity between the set of documents retrieved by S and the set of relevant documents provided by the specialists.
Retrieval evaluation measures
1) Recall
2) Precision
Alternatives:
1) E-measure
2) Harmonic mean
3) Satisfaction
4) Frustration
Retrieval task: query processing
Batch mode
 The user submits a query and receives an answer back; how the answer set is generated does not matter, only the quality of the answer set is evaluated.
Interactive mode
 The user specifies his information need through a series of interactive steps with the system.
 Aspects considered:
 * user effort
 * characteristics of the interface design
 * guidance provided by the system
 * duration of the session
Recall
 the fraction of the relevant documents which have been retrieved
Precision
 the fraction of the retrieved documents which are relevant
$\text{recall} = \frac{|R_a|}{|R|}$    $\text{precision} = \frac{|R_a|}{|A|}$

where, within the collection:
 |R| : the relevant documents
 |A| : the answer set
 |Ra|: the relevant documents in the answer set (Ra = R ∩ A)
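As a concrete illustration, here is a minimal Python sketch of these two ratios (the helper name recall_precision is ours, not from the slides; the sets are taken from the example that follows):

```python
def recall_precision(relevant, answer):
    """Return (recall, precision) of an answer set A against a relevant set R."""
    ra = relevant & answer                     # Ra = R ∩ A
    return len(ra) / len(relevant), len(ra) / len(answer)

# |R| = 10, |A| = 15, |Ra| = 5 in the example below
R = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
A = {"d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129", "d187",
     "d25", "d38", "d48", "d250", "d113", "d3"}
print(recall_precision(R, A))                  # (0.5, 0.3333...)
```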
The user is not usually presented with all the documents in the answer set A at once
Example
Rq = {d3, d5, d9, d25, d39, d44, d56, d71, d89, d123}
Ranking for query q by a retrieval algorithm:
 1. d123    6. d9     11. d38
 2. d84     7. d511   12. d48
 3. d56     8. d129   13. d250
 4. d6      9. d187   14. d113
 5. d8     10. d25    15. d3
(precision, recall) points: (100%, 10%), (66%, 20%), (50%, 30%), (40%, 40%), (33%, 50%)
Precision versus recall is reported at 11 standard recall levels: 0%, 10%, 20%, ..., 100%.
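The (precision, recall) pairs above can be reproduced by walking down the ranking and recording a point each time a relevant document appears; a minimal sketch (function name ours):

```python
def pr_points(ranking, relevant):
    """Yield the (precision, recall) pair observed at each relevant document."""
    hits = 0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            yield hits / i, hits / len(relevant)

Rq = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
ranking = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129",
           "d187", "d25", "d38", "d48", "d250", "d113", "d3"]
for p, r in pr_points(ranking, Rq):
    print(f"({p:.0%}, {r:.0%})")   # (100%, 10%) ... (33%, 50%)
```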
[Figure: precision versus recall curve with interpolation; x-axis recall (%), y-axis precision (%)]
To evaluate an algorithm over all test queries, average the precision figures at each recall level:
 P(r) : the average precision at recall level r
 Nq   : the number of queries used
 Pi(r): the precision at recall level r for the i-th query
$\bar{P}(r) = \sum_{i=1}^{N_q} \frac{P_i(r)}{N_q}$
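A minimal sketch of this averaging step, assuming each query's precision at the given recall level has already been computed (the numbers are hypothetical):

```python
def avg_precision_at_level(per_query_precisions):
    """P(r) = sum of Pi(r) over the Nq queries, divided by Nq."""
    return sum(per_query_precisions) / len(per_query_precisions)

# Hypothetical precisions of Nq = 3 queries at recall level r = 10%
print(avg_precision_at_level([1.0, 0.5, 0.33]))   # ≈ 0.61
```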
Example
Rq = {d3, d56, d129}
 1. d123    6. d9     11. d38
 2. d84     7. d511   12. d48
 3. d56     8. d129   13. d250
 4. d6      9. d187   14. d113
 5. d8     10. d25    15. d3
(precision, recall) points: (33.3%, 33.3%), (25%, 66.6%), (20%, 100%)
Interpolated precision
rj (j ∈ {0, 1, 2, ..., 10}): the j-th standard recall level (e.g., r5 refers to the recall level 50%)
$P(r_j) = \max_{r_j \le r \le r_{j+1}} P(r)$
Example:
Observed points: d56 (33.3%, 33.3%), d129 (25%, 66.6%), d3 (20%, 100%)
Interpolated precision at the standard recall levels:
 r0: (33.33%, 0%)    r1: (33.33%, 10%)   r2: (33.33%, 20%)
 r3: (33.33%, 30%)   r4: (25%, 40%)      r5: (25%, 50%)
 r6: (25%, 60%)      r7: (20%, 70%)      r8: (20%, 80%)
 r9: (20%, 90%)      r10: (20%, 100%)
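A sketch of the interpolation step; in practice it is computed as the maximum precision observed at any recall level r ≥ rj, which reproduces the r0...r10 table above:

```python
def interpolate_11pt(points):
    """points: (precision, recall) pairs observed at each relevant document.
    Returns the interpolated precision at the 11 standard recall levels."""
    interp = []
    for j in range(11):
        rj = j / 10
        # highest precision among points whose recall is at least rj
        candidates = [p for p, r in points if r >= rj - 1e-9]
        interp.append(max(candidates, default=0.0))
    return interp

points = [(1/3, 1/3), (0.25, 2/3), (0.20, 1.0)]     # d56, d129, d3
for j, p in enumerate(interpolate_11pt(points)):
    print(f"r{j}: {p:.2%}")   # 33.33% up to r3, 25% for r4-r6, 20% for r7-r10
```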
The curve of precision versus recall which results from averaging the results for various queries
[Figure: average precision versus recall curve over several queries; x-axis recall (%), y-axis precision (%)]
Average precision at seen relevant documents
Generate a single-value summary of the ranking by averaging the precision figures obtained after each new relevant document is observed.
Example:
 1. d123 (1.0)   6. d9 (0.5)    11. d38
 2. d84          7. d511        12. d48
 3. d56 (0.66)   8. d129        13. d250
 4. d6           9. d187        14. d113
 5. d8          10. d25 (0.4)   15. d3 (0.33)
Average = (1 + 0.66 + 0.5 + 0.4 + 0.33) / 5 = 0.57
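A minimal sketch of this summary (the division is by the number of relevant documents actually seen, as in the example):

```python
def avg_precision_at_seen(ranking, relevant):
    """Average the precision values observed after each relevant document."""
    hits, precisions = 0, []
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(precisions) if precisions else 0.0

Rq = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
ranking = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129",
           "d187", "d25", "d38", "d48", "d250", "d113", "d3"]
# ≈ 0.58 with exact fractions; the slide's 0.57 uses rounded intermediate values
print(round(avg_precision_at_seen(ranking, Rq), 2))
```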
Measures that favor systems which retrieve relevant documents quickly:
Reciprocal Rank (RR)
 equals the precision at the first retrieved relevant document
 useful for tasks that need only one relevant document, e.g., question answering
Mean Reciprocal Rank (MRR)
 the mean of RR over several queries
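A minimal sketch of RR and MRR (the two example queries are hypothetical):

```python
def reciprocal_rank(ranking, relevant):
    """1/rank of the first retrieved relevant document, 0 if none appears."""
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1 / i
    return 0.0

def mean_reciprocal_rank(runs):
    """runs: (ranking, relevant_set) pairs, one per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)

# Hypothetical: first relevant document at rank 1 and rank 3 respectively
runs = [(["d123", "d84"], {"d123"}),
        (["d1", "d2", "d3"], {"d3"})]
print(mean_reciprocal_rank(runs))   # (1.0 + 1/3) / 2 ≈ 0.67
```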
R-Precision
Generate a single-value summary of the ranking by computing the precision at the R-th position in the ranking, where R is the total number of relevant documents for the current query.
Example 1:
 1. d123    6. d9
 2. d84     7. d511
 3. d56     8. d129
 4. d6      9. d187
 5. d8     10. d25
R = 10 and # relevant in the top 10 = 4, so R-precision = 4/10 = 0.4

Example 2:
 1. d123
 2. d84
 3. d56
R = 3 and # relevant in the top 3 = 1, so R-precision = 1/3 = 0.33
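A sketch of R-precision, checked against the two examples above:

```python
def r_precision(ranking, relevant):
    """Precision at position R, where R = total number of relevant documents."""
    R = len(relevant)
    return sum(1 for doc in ranking[:R] if doc in relevant) / R

ranking = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129",
           "d187", "d25", "d38", "d48", "d250", "d113", "d3"]
print(r_precision(ranking, {"d3", "d5", "d9", "d25", "d39",
                            "d44", "d56", "d71", "d89", "d123"}))  # 0.4
print(r_precision(ranking, {"d3", "d56", "d129"}))                 # 0.333...
```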
Precision histograms
An R-precision graph for several queries, used to compare the retrieval history of two algorithms A and B (the formula and a sketch follow below):
 RPA/B(i) = 0: both algorithms have equivalent performance for the i-th query
 RPA/B(i) > 0: A has better retrieval performance for query i
 RPA/B(i) < 0: B has better retrieval performance for query i
$RP_{A/B}(i) = RP_A(i) - RP_B(i)$

where $RP_A(i)$ and $RP_B(i)$ are the R-precision values of retrieval algorithms A and B for the i-th query.
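A sketch computing the per-query values that the histogram plots (all R-precision numbers here are hypothetical):

```python
# Hypothetical R-precision values for algorithms A and B over ten queries
rp_a = [0.4, 0.3, 0.6, 0.2, 0.8, 0.5, 0.1, 0.9, 0.4, 0.7]
rp_b = [0.3, 0.5, 0.6, 0.1, 0.4, 0.5, 0.4, 0.2, 0.4, 0.6]

for i, (a, b) in enumerate(zip(rp_a, rp_b), start=1):
    diff = a - b                                   # RP_A/B(i)
    verdict = "A better" if diff > 0 else "B better" if diff < 0 else "equivalent"
    print(f"query {i:2d}: RP_A/B = {diff:+.1f}  ({verdict})")
```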
[Figure: precision histogram of RPA/B(i) for ten queries; x-axis query number, y-axis RPA/B]
Statistical summary regarding the set of all the queries in a retrieval task:
 * the number of queries used in the task
 * the total number of documents retrieved by all queries
 * the total number of relevant documents which were effectively retrieved when all queries are considered
 * the total number of relevant documents which could have been retrieved by all queries
Limitations of recall and precision:
 * Estimation of maximal recall requires knowledge of all the documents in the collection.
 * Recall and precision capture different aspects of the set of retrieved documents.
 * Recall and precision measure effectiveness over queries in batch mode.
 * Recall and precision are defined under the assumption of a linear ordering of the retrieved documents.
Harmonic mean F(j) of recall and precision:
 R(j): the recall at the j-th document in the ranking
 P(j): the precision at the j-th document in the ranking
$F(j) = \frac{2}{\frac{1}{R(j)} + \frac{1}{P(j)}}$    i.e.    $F = \frac{2PR}{P + R}$
Example (Rq = {d3, d56, d129}):
 1. d123    6. d9     11. d38
 2. d84     7. d511   12. d48
 3. d56     8. d129   13. d250
 4. d6      9. d187   14. d113
 5. d8     10. d25    15. d3
(precision, recall) points: (33.3%, 33.3%), (25%, 66.6%), (20%, 100%)
$F(3) = \frac{2}{\frac{1}{0.33} + \frac{1}{0.33}} = 0.33$
$F(8) = \frac{2}{\frac{1}{0.25} + \frac{1}{0.67}} = 0.36$
$F(15) = \frac{2}{\frac{1}{0.20} + \frac{1}{1}} = 0.33$
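A sketch of F(j), checked against the three values above:

```python
def f_measure(recall, precision):
    """Harmonic mean of recall and precision; 0 when either is 0."""
    if recall == 0 or precision == 0:
        return 0.0
    return 2 / (1 / recall + 1 / precision)

# (precision, recall) at ranks 3, 8 and 15 of the example ranking
for p, r in [(1/3, 1/3), (0.25, 2/3), (0.20, 1.0)]:
    print(round(f_measure(r, p), 2))   # 0.33, 0.36, 0.33
```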
E evaluation measure
Allows the user to specify whether he is more interested in recall or in precision.
$E(j) = 1 - \frac{1 + b^2}{\frac{b^2}{R(j)} + \frac{1}{P(j)}}$    equivalently    $E = 1 - \frac{(1 + b^2)\,P\,R}{b^2 P + R}$
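A sketch of E(j); with b = 1 it equals 1 − F(j), and other values of b shift the weight between recall and precision:

```python
def e_measure(recall, precision, b=1.0):
    """E(j) = 1 - (1 + b^2) / (b^2/R(j) + 1/P(j)); lower values are better."""
    if recall == 0 or precision == 0:
        return 1.0
    return 1 - (1 + b * b) / (b * b / recall + 1 / precision)

# At rank 8 of the example: P = 0.25, R = 2/3
print(round(e_measure(2/3, 0.25, b=1.0), 2))   # 0.64, i.e. 1 - F(8)
print(round(e_measure(2/3, 0.25, b=2.0), 2))   # 0.50
```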
User-oriented measures
Basic assumption of the previous evaluation: the set of relevant documents for a query is the same, independent of the user.
User-oriented measures drop this assumption:
 * coverage ratio
 * novelty ratio
 * relative recall
 * recall effort
Definitions:
 |R| : the relevant documents
 |A| : the answer set
 |U| : the relevant documents known to the user
 |Rk|: the relevant documents known to the user which were retrieved
 |Ru|: the relevant documents previously unknown to the user which were retrieved
$\text{coverage} = \frac{|R_k|}{|U|}$    $\text{novelty} = \frac{|R_u|}{|R_u| + |R_k|}$
high coverage ratio: system finds most of the relevant documents the user expected to see
high novelty ratio: the system reveals many new relevant documents which were previously unknown
$\text{relative recall} = \frac{|R_u| + |R_k|}{|U|}$
 (relevant documents proposed by the system relative to those the user expected to find)

recall effort: the ratio of the number of relevant documents the user expected to find to the number of documents that had to be examined in order to find them
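A minimal sketch of the user-oriented ratios (the three sets are hypothetical; U is taken as both the relevant documents the user knows and those he expected to find):

```python
def user_measures(answer, relevant, known_to_user):
    """Coverage, novelty and relative recall from the user's point of view."""
    rk = answer & known_to_user                   # Rk: known relevant, retrieved
    ru = (answer & relevant) - known_to_user      # Ru: new relevant, retrieved
    coverage = len(rk) / len(known_to_user)
    novelty = len(ru) / (len(ru) + len(rk)) if (ru or rk) else 0.0
    relative_recall = (len(ru) + len(rk)) / len(known_to_user)
    return coverage, novelty, relative_recall

A = {"d1", "d2", "d3", "d7", "d9", "d11"}   # answer set
R = {"d1", "d2", "d3", "d4", "d7", "d9"}    # all relevant documents
U = {"d1", "d2", "d3", "d4"}                # relevant docs known to the user
print(user_measures(A, R, U))               # (0.75, 0.4, 1.25)
```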