32
Indiana University Berkeley, CA - 2011 School of Informatics and Computing Attribution and Credit: beyond print & citations Johan Bollen Indiana University School of Informatics and Computing Center for Complex Networks and System Research [email protected] Acknowledgements: Herbert Van de Sompel (LANL), Marko A. Rodriguez (LANL), Ryan Chute (LANL), Lyudmila L. Balakireva (LANL), Aric Hagberg (LANL), Luis Bettencourt (LANL) Research supported by the NSF and the Andrew W. Mellon Foundation.

Attribution and Credit: beyond print & citations Johan Bollen Indiana University

  • Upload
    vin

  • View
    12

  • Download
    0

Embed Size (px)

DESCRIPTION

Attribution and Credit: beyond print & citations Johan Bollen Indiana University School of Informatics and Computing Center for Complex Networks and System Research jbollen@ indiana.edu Acknowledgements: Herbert Van de Sompel (LANL), Marko A. Rodriguez (LANL), Ryan Chute (LANL), - PowerPoint PPT Presentation

Citation preview

Page 1: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

Attribution and Credit: beyond print & citations

Johan BollenIndiana University

School of Informatics and Computing

Center for Complex Networks and System Research

[email protected]

Acknowledgements:

Herbert Van de Sompel (LANL), Marko A. Rodriguez (LANL), Ryan Chute (LANL),

Lyudmila L. Balakireva (LANL), Aric Hagberg (LANL), Luis Bettencourt (LANL)

Research supported by the NSF and the Andrew W. Mellon Foundation.

Page 2: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

Impact evaluation from citation data.

IF is part of Journal Citation Reports (JCR)

JCR = citation graph

2005 journal citation network• +- 8,560 journals• 6,370,234 weighted citation edges

Impact Factor = mean 2 year citation rate

Lowinfluence

Highinfluence

20032001 2002

Journal x All (2003)

Citation data:• Golden standard of scholarly evaluation• Citation = scholarly influences.• Extracted from published materials.• Main bibliometric data source for scholarly

evaluation.

Page 3: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

Clearly something’s wrong

Digital era: digital scholars

Technology has changed

Scholarly communication has changed

Habits and minds have changed

Scholars have changed

But scholarly review and assessment?Still stuck in the paper era:Peer review, print, citations,

journals…Party like it’s 1899!

Page 4: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

The scientific process: the importance of early indicators

(Egghe & Rousseau, 2000; Wouters, 1997)(Brody, Harnad, & Carr 2006),

Citation: final products

• Publication delays

• Focus on publications

• Focus on authors

Usage data

• Scale, cf. Elsevier downloads (+1B) vs. Wos citations (650M)

• Immediate, early stages

• Variety of resources and actors

Page 5: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

The Promise of Usage Data

Page 6: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

Challenges

1. Representativeness:• Recorded for particular service• Recorded for particular user community• Challenge: large-scale aggregation of representative data

2. Attribution and credit:• Citation is explicit, intentional expression of influence• Usage is behavioral, implicit measurement of “attention”• Challenge: turn clickstreams into attribution, credit, and influence

3. Community acceptance:• Unproven in terms of metrics, services, etc.• Lack of applications and community services• Challenge: creation of framework for community services

This presentation: overview of our work in (1), (2) and (3)

Page 7: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

Presentation structure

1. MESUR’s work on usage data aggregation2. Turning clickstreams into credit and

attribution3. Community-services4. Discussion

Page 8: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

Timeline and development

• 2006-2010 (Los Alamos National Laboratory and Indiana University):o MESUR project: funded by Andrew W. Mellon Foundationo Models of scholarly communicationo Large-scale aggregation of usage datao Large-scale survey of usage-based metrics

• 2010-present (Indiana University)o MESUR 2.0: funded by NSF and Andrew W. Mellon Foundationo Work on development of community-supported, open and sustainable

framework with NISO (Todd Carpenter)

Page 9: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

MESUR data flow

http://www.mesur.org/schemas/2007- 01/mesur/

Page 10: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

Modeling the scholarly communication process:the MESUR ontology.

Based on OntologyX5 frameworkdeveloped by Rights.com

Examples:An author (Agent) publishes

(Context:Event) an article (Document)

A user (Agent) uses (Context:Event) a journal (Document)

Marko. A Rodrguez, Johan Bollen and Herbert Van de Sompel. A Practical On- tology for the Large-Scale Modeling of Scholarly Artifacts and their Usage, In Proceedings of the Joint Conference on Digital Libraries 2007, Vancouver, June 2007

Page 11: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

MESUR usage DB

2006-2010: Collaborating publishers, aggregators and institutional consortia:

• BMC, Blackwell, UC, CSU (23), EBSCO, ELSEVIER, EMERALD, INGENTA, JSTOR, LANL, MIMAS/ZETOC, THOMSON, UPENN (9), UTEXAS

• Scale:o > 1,000,000,000 usage events, and growing…o +50M articles, +-100,000 serials

• Period: 2002-2010…

Page 12: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

Presentation structure

1. MESUR’s work on usage data aggregation2. Turning clickstreams into credit and

attribution3. Community-services4. Discussion

Page 13: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

From usage to clickstreams

Minimal requirements for all usage data

• Unique usage events (article level)

• Fields: unique session ID, date/time, unique document ID and/or metadata, request type

• Note difference with usage statistics

Page 14: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

How to generate a usage network.

Same session ~ documents relatedness• Same session, same user: common interest• Frequency of co-occurrence = degree of

relationship• Normalized: conditional probability (P(A|B) !=

P(B|A), directional• Clickstream, breadcrumbs, paths, flow…

Usage data is on article level:• Works for journals and articles• In fact, anything for which usage was

recorded

Note: not something we invented: association rule learning in data mining.

Beer and diapers!

Page 15: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

Johan Bollen, Herbert Van de Sompel, Aric Hagberg,Luis Bettencourt, Ryan Chute, Marko A. Rodriguez, Lyudmila Balakireva. Clickstream data yields high-resolution maps of science. PLoS One, February 2009.

Page 16: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

Network science for impact metrics.

: Number of geodesics between vi and vj

Betweenness centrality

PR(vi): PageRank of node vi

O(vj): out-degree of journal vj

N: number of nodes in networkL: dampening factor

PageRank

Page 17: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

Set of metrics calculated on MESUR data set

Page 18: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

The MESUR Metrics Map

RATE METRICS

TOTAL CITES

PAGERANK(S)

BETWEENNESS

USAGE METRICS

Johan Bollen, Herbert Van de Sompel, Aric Hagberg and Ryan Chute. A Principal Component Analysis of 39 Scientific Impact Measures. PLoS ONE, June 2009. URL: http://dx.plos.org/10.1371/journal.pone.0006022.

Page 19: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

Presentation structure

1. MESUR’s work on usage data aggregation2. Turning clickstreams into credit and

attribution3. Community-services4. Discussion

Page 20: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

MESUR Mapping and ranking services

Page 21: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

MESUR Mapping and ranking services

Page 22: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

MESUR Mapping and ranking services

Page 23: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

Presentation structure

1. MESUR’s work on usage data aggregation2. Turning clickstreams into credit and

attribution3. Community-services4. Discussion

Page 24: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

Registration is now open for "Scholarly Evaluation Metrics: Opportunities and Challenges", a one-day NSF-funded workshop that will take place in the Renaissance Washington DC Hotel on Wednesday, December 16th 2009. Participation in this workshop is limited to 50 people. Registration is free at http://informatics.indiana.edu/scholmet09/registration.html. 

The topic of the workshop is the future of scholarly assessment approaches, including organizational, infrastructural, and community issues. The overall goal is to identify requirements for novel assessment approaches, several of which have been proposed in recent years, to become acceptable to community stakeholders including scholars, academic and research institutions, and funding agencies. The impressive group of speakers and panelists for the workshop includes representatives from each of these constituencies.

Further details are available at http://informatics.indiana.edu/scholmet09/announcement.html

Workshop organizers:Johan Bollen ([email protected]),Herbert Van de Sompel ([email protected]) andYing Ding ([email protected])

Seeking increasing community involvement

Page 25: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

Planning process underway to establish sustainable, open, community supported infrastructure.New support from Andrew W. Mellon foundation to figure it all out.Close collaboration with NISO (Todd Carpenter)

Moving towards community involvement

Logistics:Data aggregationNormalizationData-related servicesData management

Science:MetricsAnalysisPrediction

ServicesRankingAssessmentMapping

=More than sum of parts:• Each component supports the other• Various business and funding models• Generate added value on all levels

Can fundamentally change scholarly communication

Page 26: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

- Microsoft/MSR: http://academic.research.microsoft.com/- Altmetrics: http://altmetrics.org/- Mendeley-based analytics- Publisher-driven initiatives: Elsevier’s SciVal, mapping of science- Google Scholar- Katy Borner at Indiana University: Science of Science Cyberinfrastructure

New, interesting initiatives

Page 27: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

Some relevant publications.

Johan Bollen, Herbert Van de Sompel, Aric Hagberg, Luis Bettencourt, Ryan Chute, Marko A. Rodriguez, Lyudmila Balakireva. Clickstream data yields high-resolution maps of science. PLoS One, March 2009 (In Press)

Johan Bollen, Herbert Van de Sompel, Aric HagBerg, Ryan Chute. A principal component analysis of 39 scientific impact measures. arXiv.org/abs/0902.2183

Johan Bollen, Marko A. Rodriguez, and Herbert Van de Sompel. Journal status. Scientometrics, 69(3), December 2006 (arxiv.org:cs.DL/0601030)

Johan Bollen, Herbert Van de Sompel, and Marko A. Rodriguez. Towards usage-based impact metrics: first results from the MESUR project. In Proceedings of the Joint Conference on Digital Libraries, Pittsburgh, June 2008

Marko A. Rodriguez, Johan Bollen and Herbert Van de Sompel. A Practical Ontology for the Large-Scale Modeling of Scholarly Artifacts and their Usage, In Proceedings of the Joint Conference on Digital Libraries, Vancouver, June 2007

Johan Bollen and Herbert Van de Sompel. Usage Impact Factor: the effects of sample characteristics on usage-based impact metrics. (cs.DL/0610154)

Johan Bollen and Herbert Van de Sompel. An architecture for the aggregation and analysis of scholarly usage data. In Joint Conference on Digital Libraries (JCDL2006), pages 298-307, June 2006.

Johan Bollen and Herbert Van de Sompel. Mapping the structure of science through usage. Scientometrics, 69(2), 2006.

Johan Bollen, Herbert Van de Sompel, Joan Smith, and Rick Luce. Toward alternative metrics of journal impact: a comparison of download and citation data. Information Processing and Management, 41(6):1419-1440, 2005.

Page 28: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

Local aggregation of usage data: linking servers

Full textDBs

Full textDBs

EhostEJS

EhostEJS

GoogleGoogle

ISIISI

BritishLibraryBritishLibrary

A&Iservice

A&Iservice

Publishersites

Publishersites

ingentaingenta

Link Resolver

• Linking servers can record activities across multiple OpenURL-enabled information sources of a specific digital library environment

• Linking server logs are representative of the activities of a particular user population

• Global scholarly information space compliant with linking servers

• Allows recording of clickstream data: other methods of log aggregation can not connect “same user, different system” streams

Page 29: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

Global aggregation of usage data

Aggregatedlogs

Log DB

Usage logs

LogRepository 1

Link Resolver

LogRepository 2

Link Resolver

LogRepository 3

Link Resolver

AggregatedUsage Data

Usage logs

Usage logs

• Aggregation of linking server logs leads to data set representative of large sample of scholarly community

• Global really means different samples of scholarly community• Can be finetuned

for local communities

• Possibility of truly global coverage

Page 30: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

bX project: standards-based aggregation of usage data

Aggregatedlogs

Log DB

OpenURLContextObjects

LogRepository 1

Link Resolver

LogRepository 2

Link Resolver

LogRepository 3

Link Resolver

Logharvester

COCOCO

COCOCO

COCOCO

Usage log aggregation via OAI-PMH

Log Repository properties:

• OAI-PMH metadata record: • linking server event log for specific document in specific session• expressed using OpenURL XML ContextObject Format

• OAI-PMH identifier: UUID for event • OAI-PMH datestamp: datetime the event was added to the Log Repository

AggregatedUsage Data

Page 31: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

bX project: OpenURL ContextObject to represent usage data

Referent * identifier * metadata

Requester* User or user proxy: IP, session, …

ServiceType

<?xml version=“1.0” encoding=“UTF-8”?><ctx:context-object timestamp=“2005-06-01T10:22:33Z” … identifier=“urn:UUID:58f202ac-22cf-11d1-b12d-002035b29062” …>…<ctx:referent> <ctx:identifier>info:pmid/12572533</ctx:identifier> <ctx:metadata-by-val> <ctx:format>info:ofi/fmt:xml:xsd:journal</ctx:format> <ctx:metadata> <jou:journal xmlns:jou=“info:ofi/fmt:xml:xsd:journal”> … <jou:atitle>Toward alternative metrics of journal impact… <jou:jtitle>Information Processing and manage…/jou:jtitle>…</ctx:referent>… <ctx:requester> <ctx:identifier>urn:ip:63.236.2.100</ctx:identifier> </ctx:requester>… <ctx:service-type> … <full-text>yes</full-text> … </ctx:service-type> … Resolver… Referrer… ….</ctx:context-object>

Resolver:* identifier of linking server

Event information: * event datetime * globally unique event ID

Page 32: Attribution and Credit: beyond print & citations Johan Bollen Indiana University

Indiana UniversityBerkeley, CA - 2011

School of Informatics and Computing

Implications for structural analysis of usage data

• Sequence preservation allows:o Reconstruction of user behavioro Usage graphs!

• Statistics do not allow this type of analysis BUT are useful for:

o validating resultso rankings