Assessing a human mediated current awareness service

International Symposium of Information Science (ISI 2015)

Zadar, 2015-05-20

Zeljko Carevic1, Thomas Krichel2 and Philipp Mayr1

[email protected]@openlib.org

Outline

1. Introduction

2. RePEc and NEP

3. Results

3.1 Editing time

3.2 Indicators for report success

3.3 Editing effort

4. Conclusion and Outlook

Slide 2 / 31

Motivation

• Thomas Krichel, the founder of RePEc, visited GESIS – Cologne in Oct. 2014

• Sharing his Russian souvenir

• ~100 GB of XML log files

Slide 3 / 31

1. Introduction

• Current awareness in digital libraries

– To inform users / subscribers about new / relevant acquisitions in their libraries [1].

• Current awareness services allow subscribers to keep up to date with new additions in a certain area of research.

• Selection of relevant documents can be done (semi-)automatically or manually.

• For this work we focus on the intellectual editing process

• Aim of this work:

How do editors work when creating a subject specific report in Digital Libraries (DL)?

Slide 4 / 31

2. Use case: RePEc

• RePEc (Research Papers in Economics) is a DL for working papers in economics research.

• Covers metadata for working papers and journal articles.

• Usually document metadata contains links to full texts

Slide 5 / 31

2. RePEc statistics

[Figure: number of documents (y-axis, ticks 0 to 1800) by year (x-axis, 1996 to 2016)]

Contributing archives: ~1,700

Documents: 1.77 million

Full-text documents: 1.63 million

Registered authors: ~45,000

Abstract views (April 2015): >2 million

Slide 6 / 31

2. Current awareness service NEP

• NEP (New Economics Papers) is a current awareness service for new additions in RePEc.

• NEP offers subject-specific reports in over 90 fields, for example:

– Business, Economic and Financial History

– Public Economics

– Social Norms and Social Capital

• Issues are sent to subscribers via e-mail, RSS and Twitter.

• Reports on new additions are generated by subject-specific editors.

• Relevant document selection is done manually by the editor!

Slide 7 / 31

Nep-all

• Contains all new RePEc documents

• Created roughly on a weekly basis

• Contains on average 488 documents

[Diagram: the editors of the subject reports (shown: Nep-acc, Nep-afr, Nep-upt, Nep-ure) each select documents from Nep-all and send an issue to their subscribers.]

Manual selection of relevant documents is a time-consuming task.

Slide 8 / 31

ERNAD

• ERNAD (Editing Reports on New Academic Documents) is a purpose-built system.

• It re-ranks nep-all for each editor based on the specific report topic.

• It looks at past issues of a report to produce a ranked nep-all.

• If presorting works well, editors select highly ranked documents from nep-all.
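The slides do not describe how ERNAD computes its ranking. The sketch below is therefore not ERNAD's method. It swaps in a simple term-overlap score, only to illustrate the idea of presorting nep-all from past issues of a report. The function names and the use of titles as the only input are assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case a title and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def presort(nep_all_titles, past_issue_titles):
    """Return nep-all titles ordered by similarity to past issues.

    Assumption: similarity is the average frequency with which a
    title's terms occurred in titles selected for past issues.
    This is a stand-in score, not the ranking ERNAD actually uses.
    """
    profile = Counter(
        token for title in past_issue_titles for token in tokenize(title)
    )

    def score(title):
        tokens = tokenize(title)
        if not tokens:
            return 0.0
        return sum(profile[t] for t in tokens) / len(tokens)

    # sorted() is stable, so ties keep their original nep-all order.
    return sorted(nep_all_titles, key=score, reverse=True)
```

With a report whose past issues mostly concern Africa, titles that share those terms would move towards the top of the presorted list, which is the behaviour the NEP-AFR example on the next slide shows.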

Slide 9 / 31

ERNAD example for Nep-Africa

(NEP-AFR)

Nep-all unsorted

1. Tax compliance..

2. Mental accounting..

…

212. Ethnic ..in Africa

317. Sino-African relations:

Nep-all presorted

1. Ethnic ..in Africa

2. Sino-African relations:

…

50. Tax compliance..

51. Mental accounting..

Slide 10 / 31

Editing stages

Slide 11 / 31

Research questions

• RQ 1: How long is the editing duration?

• RQ 2: What influences the success of a report?

– Editing duration

– Issue size

• RQ 3: How much effort is invested for selecting and sorting papers per issue?

– Precision @ N

– Relative search length

Slide 12 / 31

RQ 1: Editing time

How much time do editors invest to

create a report?

Slide 13 / 31

Pre-selection

• Editing an issue can be interrupted

• This would distort the results

• Exclude interrupted issues by dividing the edit durations into 3-minute chunks

Slide 14 / 31

Pre-selection

[Figure: histogram of the number of issues (y-axis, 0 to 9000) per 3-minute chunk of editing time (x-axis, 3 to 90 and >90)]

Limit edit time < 90 min
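The pre-selection step can be sketched as follows. This is a minimal illustration, assuming that an edit duration in minutes is already available for each issue; how the durations are derived from the log files is not covered here. The function names are hypothetical.

```python
import math
from collections import Counter

CHUNK_MINUTES = 3   # chunk width used on the slide
LIMIT_MINUTES = 90  # issues at or above this are treated as interrupted


def chunk_label(duration_min):
    """Label of the 3-minute chunk a duration falls into.

    Labels are the upper edge of the chunk ("3", "6", ..., "90"),
    with ">90" for everything longer, matching the histogram axis.
    """
    if duration_min > LIMIT_MINUTES:
        return ">90"
    chunks = max(1, math.ceil(duration_min / CHUNK_MINUTES))
    return str(chunks * CHUNK_MINUTES)


def histogram(durations_min):
    """Count issues per 3-minute chunk."""
    return Counter(chunk_label(d) for d in durations_min)


def keep_uninterrupted(durations_min):
    """Keep only issues edited in less than 90 minutes."""
    return [d for d in durations_min if d < LIMIT_MINUTES]
```

The histogram is only used to choose the cut-off. The later averages are then computed on the durations that `keep_uninterrupted` returns.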

Slide 15 / 31

RQ 1: Editing time

[Figure: average editing time in minutes (y-axis, 0 to 60) per report (x-axis, from nep-ets to nep-mkt)]

Avg. 15.5 minutes (sd = 10.1)

Min. 2.5 minutes NEP-RES (Resource economics)

Max. 53 minutes NEP-ETS (Economic time series)

Slide 16 / 31

Summarize RQ 1

• Average editing time is comparatively low at 15.5 minutes

• Large variation between the reports:

– Min. 2.5 minutes

– Max. 53 minutes

Slide 17 / 31

RQ 2: Influences on successful reports

• Popularity of a report can be measured by the number of subscribers.

• Large variation in the number of subscribers per report:

– Max. 6859: NEP-HIS (Business, Economic and Financial History)

– Min. 75: NEP-CIS (Confederation of Independent States)

• Factors influencing report success include, for example, the topic and the age of a report.

• Does the issue size or the editing time influence the report success?

Slide 18 / 31

Editing time

[Figure: average number of subscribers (y-axis, 0 to 7000) plotted against average editing time (x-axis, 0 to 60)]

Highlighted reports:

– Education: 2198 subscribers (avg. 836)

– Project, Program and Portfolio Management: 43.5 min editing time (avg. 15.5)

Slide 19 / 31

Issue size

[Figure: average number of subscribers (y-axis, 0 to 7000) plotted against average issue size (x-axis, 0 to 60)]

Highlighted reports:

– Sports: issue size 2.5 (avg. 12.4)

– Demographic Economics: issue size 21 (avg. 12.4)

Slide 20 / 31

Summarize RQ 2

• There is no correlation between:

– Issue size and number of subscribers

– Editing time and number of subscribers

• We assume that the success of a report is

mainly driven by topic and age.
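The slides do not state which correlation measure was used. As one possible check, the sketch below computes the Pearson correlation coefficient between a per-report feature (average issue size or average editing time) and the number of subscribers. The choice of Pearson and the function name are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences.

    xs could hold the average editing time (or issue size) per report,
    ys the number of subscribers of the same reports.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

A value close to 0 would be in line with the finding that neither issue size nor editing time is related to the number of subscribers. Values near 1 or -1 would indicate a strong linear relationship.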

Slide 21 / 31

RQ 3: Effort in selecting and

sorting

How much effort is invested in selecting and

sorting relevant documents from nep-all?

Two measures are used:

Precision @N

Relative search length

Slide 22 / 31

Precision @ N

• How many of the top N documents from pre-sorted nep-all are selected for the issue?

• N set to: 5, 10, 15, 20

• We only consider issues where issue size > N

• A document is relevant if its index position in nep-all

is < N.

Slide 23 / 31

Example: P@ 5

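• M={(D1, 4), (D2, 1), (D3, 7), (D4, 3), (D5, 9)}, where each pair is a selected document and its index position in pre-sorted nep-all

• P@5 for issue I in report J = ⅗, because D1, D2 and D4 have positions < 5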

• Editors vary between using pre-sorted and

un-sorted nep-all. Therefore:

– Only consider issues with pre-sort usage > 50
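The P@N definition and the worked example above can be written as a short function. This is a minimal sketch: the function name is hypothetical, and the relevance condition "index position < N" is taken verbatim from the definition slide. The additional filters (issue size > N, pre-sort usage) are left out.

```python
def precision_at_n(positions, n):
    """P@N for one issue.

    positions: index positions in pre-sorted nep-all of the documents
               the editor selected for the issue.
    n:         cut-off (5, 10, 15 or 20 on the slides).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # A selected document counts as a hit if its position is < n.
    hits = sum(1 for position in positions if position < n)
    return hits / n


# Example from the slide: M = {(D1, 4), (D2, 1), (D3, 7), (D4, 3), (D5, 9)}
example_positions = [4, 1, 7, 3, 9]
p_at_5 = precision_at_n(example_positions, 5)  # 3 hits out of 5
```

Averaging this value over all issues of a report would give a per-report P@N of the kind shown in the results.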

Slide 24 / 31

Results for P@N

Avg. P@5 (82 rep): 0.77

Avg. P@10 (64 rep): 0.80

Avg. P@15 (50 rep): 0.80

Avg. P@20 (31 rep): 0.82

• Max. found for nep-env (Environmental Economics) with P@5 = 0.99

• Min. found for nep-cba (Central Bank) with P@5 = 0.35

Slide 25 / 31

Summarize P@N

• Editors work comfortably with the

presorting in nep-all.

• The number of papers per issue has no significant influence on the precision.

Slide 26 / 31

Relative Search Length

• We know how many of the top N documents from nep-all are selected.

• To what depth do editors inspect nep-all?

• RSL: ratio between the highest index position (hin) of the last relevant document in nep-all and the length of nep-all

Slide 27 / 31

Example RSL

• Editor is given a nep-all containing 300 documents.

• M={(D1, 4), (D2, 10), (D3, 7)}

• RSL = 10/300

• We assume that the editor has inspected nep-all to document 10.
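The RSL example above translates directly into code. This is a minimal sketch with a hypothetical function name. It assumes, as the slide does, that the editor inspected nep-all down to the lowest-ranked document that was selected.

```python
def relative_search_length(positions, nep_all_length):
    """RSL for one issue.

    positions:      index positions in nep-all of the selected documents.
    nep_all_length: number of documents in the nep-all given to the editor.
    """
    if not positions:
        raise ValueError("at least one selected document is required")
    if nep_all_length <= 0:
        raise ValueError("nep_all_length must be positive")
    # The highest index position marks how deep the editor went.
    return max(positions) / nep_all_length


# Example from the slide: M = {(D1, 4), (D2, 10), (D3, 7)}, 300 documents
example_rsl = relative_search_length([4, 10, 7], 300)  # 10 / 300
```

A small RSL means that the editor found everything needed near the top of nep-all.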

Slide 28 / 31

Relative Search Length

[Figure: average RSL per report (y-axis, 0 to 0.35), reports on the x-axis from nep-mac to nep-spo]

NEP-MAC (Macroeconomics): RSL = 0.35

NEP-SPO (Sports and Economics): RSL = 0.01

Avg. RSL = 0.08

Slide 29 / 31

Summarize RSL

• The relative search length is comparatively low at 0.08

• Editors select papers from the very top of nep-all.

Slide 30 / 31

Conclusion

• Focused on observable system features:

– Editing time

– Influences on report success

– Effort in creating an issue

• Summary: The system supports the editor well in creating an issue.

• A complete view requires a more user-centred observation.

• Future work:

– Why and under what conditions is a document relevant?

• NEP provides many opportunities for further research on data that is relatively easily available.

Slide 31 / 31

Thank you!

Questions?
