By S. Dhivya (2014614007)
M.E. CSE (I Year)
Time and space
The shorter the response time and the smaller the space used, the better the system.
Performance evaluation (for data retrieval) considers:
 * the performance of the indexing structure
 * the interaction with the operating system
 * the delays in communication channels
 * the overheads introduced by software layers
Performance evaluation (for information retrieval): besides time and space, the quality of the retrieved answers (retrieval performance) is also an issue.
Factors of retrieval performance evaluation
1) Test reference collection
 * a collection of documents
 * a set of example information requests
 * a set of relevant documents (provided by specialists) for each example information request
2) Evaluation measure: the goodness of a retrieval strategy S is measured by the similarity between the set of documents retrieved by S and the set of relevant documents provided by the specialists.
Retrieval evaluation measures
1) Recall
2) Precision
Alternatives:
1) E-measure
2) Harmonic mean
3) Satisfaction
4) Frustration
Retrieval task: query processing
Batch mode
 The user submits a query and receives an answer back; how the answer set is generated does not matter, only the quality of the answer set is evaluated.
Interactive mode
 The user specifies his information need through a series of interactive steps with the system.
 Aspects considered:
 * user effort
 * characteristics of the interface design
 * guidance provided by the system
 * duration of the session
Recall
 the fraction of the relevant documents which have been retrieved
Precision
 the fraction of the retrieved documents which are relevant
$\text{recall} = \frac{|R_a|}{|R|}$    $\text{precision} = \frac{|R_a|}{|A|}$

where, within the collection:
 |R| : the relevant documents
 |A| : the answer set
 |Ra|: the relevant documents in the answer set (Ra = R ∩ A)
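As a concrete illustration, here is a minimal Python sketch of these two ratios (the helper name recall_precision is ours, not from the slides; the sets are taken from the example that follows):

```python
def recall_precision(relevant, answer):
    """Return (recall, precision) of an answer set A against a relevant set R."""
    ra = relevant & answer                     # Ra = R ∩ A
    return len(ra) / len(relevant), len(ra) / len(answer)

# |R| = 10, |A| = 15, |Ra| = 5 in the example below
R = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
A = {"d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129", "d187",
     "d25", "d38", "d48", "d250", "d113", "d3"}
print(recall_precision(R, A))                  # (0.5, 0.3333...)
```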
The user is not usually presented with all the documents in the answer set A at once
Example
Rq = {d3, d5, d9, d25, d39, d44, d56, d71, d89, d123}
Ranking for query q by a retrieval algorithm:
 1. d123    6. d9     11. d38
 2. d84     7. d511   12. d48
 3. d56     8. d129   13. d250
 4. d6      9. d187   14. d113
 5. d8     10. d25    15. d3
(precision, recall) points: (100%, 10%), (66%, 20%), (50%, 30%), (40%, 40%), (33%, 50%)
Precision versus recall is reported at 11 standard recall levels: 0%, 10%, 20%, ..., 100%.
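The (precision, recall) pairs above can be reproduced by walking down the ranking and recording a point each time a relevant document appears; a minimal sketch (function name ours):

```python
def pr_points(ranking, relevant):
    """Yield the (precision, recall) pair observed at each relevant document."""
    hits = 0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            yield hits / i, hits / len(relevant)

Rq = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
ranking = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129",
           "d187", "d25", "d38", "d48", "d250", "d113", "d3"]
for p, r in pr_points(ranking, Rq):
    print(f"({p:.0%}, {r:.0%})")   # (100%, 10%) ... (33%, 50%)
```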
[Figure: precision versus recall curve with interpolation; x-axis recall (%), y-axis precision (%)]
To evaluate an algorithm over all test queries, average the precision figures at each recall level:
 P(r) : the average precision at recall level r
 Nq   : the number of queries used
 Pi(r): the precision at recall level r for the i-th query
$\bar{P}(r) = \sum_{i=1}^{N_q} \frac{P_i(r)}{N_q}$
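A minimal sketch of this averaging step, assuming each query's precision at the given recall level has already been computed (the numbers are hypothetical):

```python
def avg_precision_at_level(per_query_precisions):
    """P(r) = sum of Pi(r) over the Nq queries, divided by Nq."""
    return sum(per_query_precisions) / len(per_query_precisions)

# Hypothetical precisions of Nq = 3 queries at recall level r = 10%
print(avg_precision_at_level([1.0, 0.5, 0.33]))   # ≈ 0.61
```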
Example
Rq = {d3, d56, d129}
 1. d123    6. d9     11. d38
 2. d84     7. d511   12. d48
 3. d56     8. d129   13. d250
 4. d6      9. d187   14. d113
 5. d8     10. d25    15. d3
(precision, recall) points: (33.3%, 33.3%), (25%, 66.6%), (20%, 100%)
Interpolated precision
rj (j ∈ {0, 1, 2, ..., 10}): the j-th standard recall level (e.g., r5 refers to the recall level 50%)
$P(r_j) = \max_{r_j \le r \le r_{j+1}} P(r)$
Example:
Observed points: d56 (33.3%, 33.3%), d129 (25%, 66.6%), d3 (20%, 100%)
Interpolated precision at the standard recall levels:
 r0: (33.33%, 0%)    r1: (33.33%, 10%)   r2: (33.33%, 20%)
 r3: (33.33%, 30%)   r4: (25%, 40%)      r5: (25%, 50%)
 r6: (25%, 60%)      r7: (20%, 70%)      r8: (20%, 80%)
 r9: (20%, 90%)      r10: (20%, 100%)
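A sketch of the interpolation step; in practice it is computed as the maximum precision observed at any recall level r ≥ rj, which reproduces the r0...r10 table above:

```python
def interpolate_11pt(points):
    """points: (precision, recall) pairs observed at each relevant document.
    Returns the interpolated precision at the 11 standard recall levels."""
    interp = []
    for j in range(11):
        rj = j / 10
        # highest precision among points whose recall is at least rj
        candidates = [p for p, r in points if r >= rj - 1e-9]
        interp.append(max(candidates, default=0.0))
    return interp

points = [(1/3, 1/3), (0.25, 2/3), (0.20, 1.0)]     # d56, d129, d3
for j, p in enumerate(interpolate_11pt(points)):
    print(f"r{j}: {p:.2%}")   # 33.33% up to r3, 25% for r4-r6, 20% for r7-r10
```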
The curve of precision versus recall which results from averaging the results for various queries
[Figure: average precision versus recall curve over several queries; x-axis recall (%), y-axis precision (%)]
Average precision at seen relevant documents
Generate a single-value summary of the ranking by averaging the precision figures obtained after each new relevant document is observed.
Example:
 1. d123 (1.0)   6. d9 (0.5)    11. d38
 2. d84          7. d511        12. d48
 3. d56 (0.66)   8. d129        13. d250
 4. d6           9. d187        14. d113
 5. d8          10. d25 (0.4)   15. d3 (0.33)
Average = (1 + 0.66 + 0.5 + 0.4 + 0.33) / 5 = 0.57
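A minimal sketch of this summary (the division is by the number of relevant documents actually seen, as in the example):

```python
def avg_precision_at_seen(ranking, relevant):
    """Average the precision values observed after each relevant document."""
    hits, precisions = 0, []
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(precisions) if precisions else 0.0

Rq = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
ranking = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129",
           "d187", "d25", "d38", "d48", "d250", "d113", "d3"]
# ≈ 0.58 with exact fractions; the slide's 0.57 uses rounded intermediate values
print(round(avg_precision_at_seen(ranking, Rq), 2))
```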
Measures that favor systems which retrieve relevant documents quickly:
Reciprocal Rank (RR)
 equals the precision at the first retrieved relevant document
 useful for tasks that need only one relevant document, e.g., question answering
Mean Reciprocal Rank (MRR)
 the mean of RR over several queries
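A minimal sketch of RR and MRR (the two example queries are hypothetical):

```python
def reciprocal_rank(ranking, relevant):
    """1/rank of the first retrieved relevant document, 0 if none appears."""
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1 / i
    return 0.0

def mean_reciprocal_rank(runs):
    """runs: (ranking, relevant_set) pairs, one per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)

# Hypothetical: first relevant document at rank 1 and rank 3 respectively
runs = [(["d123", "d84"], {"d123"}),
        (["d1", "d2", "d3"], {"d3"})]
print(mean_reciprocal_rank(runs))   # (1.0 + 1/3) / 2 ≈ 0.67
```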
R-Precision
Generate a single-value summary of the ranking by computing the precision at the R-th position in the ranking, where R is the total number of relevant documents for the current query.
Example 1:
 1. d123    6. d9
 2. d84     7. d511
 3. d56     8. d129
 4. d6      9. d187
 5. d8     10. d25
R = 10 and # relevant in the top 10 = 4, so R-precision = 4/10 = 0.4

Example 2:
 1. d123
 2. d84
 3. d56
R = 3 and # relevant in the top 3 = 1, so R-precision = 1/3 = 0.33
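A sketch of R-precision, checked against the two examples above:

```python
def r_precision(ranking, relevant):
    """Precision at position R, where R = total number of relevant documents."""
    R = len(relevant)
    return sum(1 for doc in ranking[:R] if doc in relevant) / R

ranking = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129",
           "d187", "d25", "d38", "d48", "d250", "d113", "d3"]
print(r_precision(ranking, {"d3", "d5", "d9", "d25", "d39",
                            "d44", "d56", "d71", "d89", "d123"}))  # 0.4
print(r_precision(ranking, {"d3", "d56", "d129"}))                 # 0.333...
```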
Precision histograms
An R-precision graph for several queries, used to compare the retrieval history of two algorithms A and B (the formula and a sketch follow below):
 RPA/B(i) = 0: both algorithms have equivalent performance for the i-th query
 RPA/B(i) > 0: A has better retrieval performance for query i
 RPA/B(i) < 0: B has better retrieval performance for query i
$RP_{A/B}(i) = RP_A(i) - RP_B(i)$

where $RP_A(i)$ and $RP_B(i)$ are the R-precision values of retrieval algorithms A and B for the i-th query.
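A sketch computing the per-query values that the histogram plots (all R-precision numbers here are hypothetical):

```python
# Hypothetical R-precision values for algorithms A and B over ten queries
rp_a = [0.4, 0.3, 0.6, 0.2, 0.8, 0.5, 0.1, 0.9, 0.4, 0.7]
rp_b = [0.3, 0.5, 0.6, 0.1, 0.4, 0.5, 0.4, 0.2, 0.4, 0.6]

for i, (a, b) in enumerate(zip(rp_a, rp_b), start=1):
    diff = a - b                                   # RP_A/B(i)
    verdict = "A better" if diff > 0 else "B better" if diff < 0 else "equivalent"
    print(f"query {i:2d}: RP_A/B = {diff:+.1f}  ({verdict})")
```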
[Figure: precision histogram of RPA/B(i) for ten queries; x-axis query number, y-axis RPA/B]
Statistical summary regarding the set of all the queries in a retrieval task:
 * the number of queries used in the task
 * the total number of documents retrieved by all queries
 * the total number of relevant documents which were effectively retrieved when all queries are considered
 * the total number of relevant documents which could have been retrieved by all queries
Limitations of recall and precision:
 * Estimation of maximal recall requires knowledge of all the documents in the collection.
 * Recall and precision capture different aspects of the set of retrieved documents.
 * Recall and precision measure effectiveness over queries in batch mode.
 * Recall and precision are defined under the assumption of a linear ordering of the retrieved documents.
Harmonic mean F(j) of recall and precision:
 R(j): the recall at the j-th document in the ranking
 P(j): the precision at the j-th document in the ranking
$F(j) = \frac{2}{\frac{1}{R(j)} + \frac{1}{P(j)}}$    i.e.    $F = \frac{2PR}{P + R}$
Example (Rq = {d3, d56, d129}):
 1. d123    6. d9     11. d38
 2. d84     7. d511   12. d48
 3. d56     8. d129   13. d250
 4. d6      9. d187   14. d113
 5. d8     10. d25    15. d3
(precision, recall) points: (33.3%, 33.3%), (25%, 66.6%), (20%, 100%)
$F(3) = \frac{2}{\frac{1}{0.33} + \frac{1}{0.33}} = 0.33$
$F(8) = \frac{2}{\frac{1}{0.25} + \frac{1}{0.67}} = 0.36$
$F(15) = \frac{2}{\frac{1}{0.20} + \frac{1}{1}} = 0.33$
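A sketch of F(j), checked against the three values above:

```python
def f_measure(recall, precision):
    """Harmonic mean of recall and precision; 0 when either is 0."""
    if recall == 0 or precision == 0:
        return 0.0
    return 2 / (1 / recall + 1 / precision)

# (precision, recall) at ranks 3, 8 and 15 of the example ranking
for p, r in [(1/3, 1/3), (0.25, 2/3), (0.20, 1.0)]:
    print(round(f_measure(r, p), 2))   # 0.33, 0.36, 0.33
```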
E evaluation measure
Allows the user to specify whether he is more interested in recall or in precision.
$E(j) = 1 - \frac{1 + b^2}{\frac{b^2}{R(j)} + \frac{1}{P(j)}}$    equivalently    $E = 1 - \frac{(1 + b^2)\,P\,R}{b^2 P + R}$
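A sketch of E(j); with b = 1 it equals 1 − F(j), and other values of b shift the weight between recall and precision:

```python
def e_measure(recall, precision, b=1.0):
    """E(j) = 1 - (1 + b^2) / (b^2/R(j) + 1/P(j)); lower values are better."""
    if recall == 0 or precision == 0:
        return 1.0
    return 1 - (1 + b * b) / (b * b / recall + 1 / precision)

# At rank 8 of the example: P = 0.25, R = 2/3
print(round(e_measure(2/3, 0.25, b=1.0), 2))   # 0.64, i.e. 1 - F(8)
print(round(e_measure(2/3, 0.25, b=2.0), 2))   # 0.50
```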
User-oriented measures
Basic assumption of the previous evaluation: the set of relevant documents for a query is the same, independent of the user.
User-oriented measures drop this assumption:
 * coverage ratio
 * novelty ratio
 * relative recall
 * recall effort
Definitions:
 |R| : the relevant documents
 |A| : the answer set
 |U| : the relevant documents known to the user
 |Rk|: the relevant documents known to the user which were retrieved
 |Ru|: the relevant documents previously unknown to the user which were retrieved
$\text{coverage} = \frac{|R_k|}{|U|}$    $\text{novelty} = \frac{|R_u|}{|R_u| + |R_k|}$
high coverage ratio: system finds most of the relevant documents the user expected to see
high novelty ratio: the system reveals many new relevant documents which were previously unknown
$\text{relative recall} = \frac{|R_u| + |R_k|}{|U|}$
 (relevant documents proposed by the system relative to those the user expected to find)

recall effort: the ratio of the number of relevant documents the user expected to find to the number of documents that had to be examined in order to find them
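A minimal sketch of the user-oriented ratios (the three sets are hypothetical; U is taken as both the relevant documents the user knows and those he expected to find):

```python
def user_measures(answer, relevant, known_to_user):
    """Coverage, novelty and relative recall from the user's point of view."""
    rk = answer & known_to_user                   # Rk: known relevant, retrieved
    ru = (answer & relevant) - known_to_user      # Ru: new relevant, retrieved
    coverage = len(rk) / len(known_to_user)
    novelty = len(ru) / (len(ru) + len(rk)) if (ru or rk) else 0.0
    relative_recall = (len(ru) + len(rk)) / len(known_to_user)
    return coverage, novelty, relative_recall

A = {"d1", "d2", "d3", "d7", "d9", "d11"}   # answer set
R = {"d1", "d2", "d3", "d4", "d7", "d9"}    # all relevant documents
U = {"d1", "d2", "d3", "d4"}                # relevant docs known to the user
print(user_measures(A, R, U))               # (0.75, 0.4, 1.25)
```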