www.cineca.it

Integrate external services in DSpace submission process

How to make self-deposit easy and improve metadata quality and presence of full-text

Andrea Bollini – Susanna Mornati



Page 2

Topics

⁄ Some context:
  ⁄ CINECA: a brief overview
  ⁄ DSpace as part of a CRIS solution
⁄ Integration of external services:
  ⁄ Bibliographic databases: Scopus, PubMed, CrossRef, arXiv, etc.
  ⁄ Publisher policies: Sherpa/Romeo
⁄ Make the repository an active actor:
  ⁄ Discovering missing content
  ⁄ Improving full-text presence

www.cineca.it | Integrate external services in DSpace submission process | OR2013| July 2013

Page 3

The Company

⁄ Interuniversity Consortium
⁄ Non-profit
⁄ Founded in 1969
⁄ Headquarters in Bologna
⁄ 57 members (as of last week!):
  ⁄ 54 universities
  ⁄ 2 research institutes
  ⁄ MIUR
⁄ Owned companies: Kion, SCS
⁄ Employees: 400 (+150 Kion)
⁄ Total turnover: 70 M€

Page 4

The Merge

⁄ The “merging process” of the three Italian consortia started in September 2012
⁄ It was concluded on July 1st, 2013 (last week!)

2.0
⁄ 67 members
⁄ More than 700 employees (+150 Kion)
⁄ The only Italian interuniversity consortium

Page 5

What CINECA does

Higher Education
• Solutions & services for the university administration
• Services for the Ministry of Education, University and Research (MIUR)

Scientific Research
• High Performance Computing (FERMI: 2nd in the EU, 7th worldwide)
• Scientific Visualization & Interactive Virtual Environments

Technological Innovation
• Data Center
• Information and Knowledge Management Services
• Health Care Systems

Page 6

How we work with Universities

Diagram: Cineca Board of Directors; Product Managers Board; U-GOV & SURplus Restricted Board; Customer Service Board; Technical & Delivery Board; Apps Road Map and Tech Road Map; requirements from University Customers, Focus Groups and the Cineca Technical Board.

Page 7

Solutions for HE

Diagram legend: ERP; Best of Breed. Labels: Authentication, AU, GW Gateway.

Page 8

SURplus: CINECA’s CRIS System

⁄ An interoperable infrastructure made of different components
⁄ Ingestion of data from any legacy system adopted by an institution
⁄ Applications keep their specific functional requirements, data model and preferred technologies
⁄ Data warehouse and Business Intelligence tools to facilitate data aggregation and the application of measurement parameters and algorithms

Page 9

SURplus: Size

⁄ Activities started: 2004
⁄ 9 institutions
⁄ 22 institutional repositories
⁄ Total modules: 77


Page 11

DSpace: SURplus’ Open Archive Module

The OA Module, developed on DSpace:
⁄ Manages collection and dissemination of research results
⁄ Simplifies data collection processes
⁄ Provides service integration

CINECA is a registered service provider at DuraSpace

Long-term collaboration with the DSpace community, since 2003

Upgrades are periodically released to the open source community

Page 12

DSpace-CRIS: SURplus’ Expertise & Skills

DSpace-CRIS: designed together with the University of Hong Kong and released as open source

“dissemination of entities’ descriptions in the research environment which go beyond publications”

Page 13

IR as part of a CRIS system: what changes?

⁄ Benefits:
  ⁄ Strong deposit mandate
  ⁄ More funding
⁄ Issues to mitigate:
  ⁄ The IR becomes a critical application: professional support, HA infrastructure, dedicated team
  ⁄ Authors perceive deposit as a “requirement”: wasted time, late submissions; countered by advocacy and by making the submission process easy

The information already exists in other databases!


Page 15

New first submission step

Available providers: each provider is a Spring service

Free search form on the main metadata common to all publication types (article, book, etc.):
⁄ Title of the contribution
⁄ Year
⁄ Authors/Editors
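The slides state only that each provider is a Spring service. As a minimal plain-Java sketch (no Spring wiring), assuming a hypothetical `LookupProvider` interface whose `search` method takes the three free-search fields, the form could send one query to every enabled provider and tag each hit with its source. These names are illustrative, not the actual `SubmissionLookupProvider` API; the in-memory provider stands in for a remote service.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical provider contract; the real SubmissionLookupProvider API may differ.
interface LookupProvider {
    String name();

    // Returns matching records as simple field -> value maps.
    List<Map<String, String>> search(String title, Integer year, List<String> authors);
}

// Toy provider over an in-memory list, standing in for a remote service.
class InMemoryProvider implements LookupProvider {
    private final String name;
    private final List<Map<String, String>> records;

    InMemoryProvider(String name, List<Map<String, String>> records) {
        this.name = name;
        this.records = records;
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public List<Map<String, String>> search(String title, Integer year, List<String> authors) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> r : records) {
            // Null or empty criteria do not restrict the search.
            boolean titleOk = title == null
                    || r.getOrDefault("title", "").toLowerCase().contains(title.toLowerCase());
            boolean yearOk = year == null || String.valueOf(year).equals(r.get("year"));
            boolean authorsOk = authors == null || authors.isEmpty()
                    || authors.stream().anyMatch(a -> r.getOrDefault("authors", "").contains(a));
            if (titleOk && yearOk && authorsOk) {
                hits.add(r);
            }
        }
        return hits;
    }
}

class FreeSearch {
    // Sends the same query to every enabled provider and tags each hit with its source.
    static List<Map<String, String>> searchAll(List<LookupProvider> providers,
                                               String title, Integer year, List<String> authors) {
        List<Map<String, String>> all = new ArrayList<>();
        for (LookupProvider p : providers) {
            for (Map<String, String> r : p.search(title, year, authors)) {
                Map<String, String> tagged = new HashMap<>(r);
                tagged.put("provider", p.name());
                all.add(tagged);
            }
        }
        return all;
    }
}
```

Tagging each hit with its provider keeps the information needed later to show which services returned a given record.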

Page 16

New first submission step

Lookup by unique identifier

Each provider declares which identifiers it is able to handle
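A sketch of how a lookup by identifier might be routed, assuming each provider publishes a hypothetical `supportedIdentifiers` set and that DOIs, PubMed IDs and new-style arXiv IDs are recognized by simplified regular expressions (real identifier syntaxes have more variants than these patterns cover):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Pattern;

enum IdType { DOI, PMID, ARXIV }

// Hypothetical declaration: the identifier types a provider can resolve.
record ProviderDecl(String name, Set<IdType> supportedIdentifiers) {}

class IdentifierRouter {
    // Simplified patterns: DOIs start with "10.", PubMed IDs are plain digits,
    // new-style arXiv IDs look like 1307.1234 (optionally with a version suffix).
    private static final Pattern DOI = Pattern.compile("^10\\.\\d{4,9}/\\S+$");
    private static final Pattern PMID = Pattern.compile("^\\d{1,8}$");
    private static final Pattern ARXIV = Pattern.compile("^\\d{4}\\.\\d{4,5}(v\\d+)?$");

    static Optional<IdType> detect(String raw) {
        String id = raw.trim();
        if (DOI.matcher(id).matches()) return Optional.of(IdType.DOI);
        if (ARXIV.matcher(id).matches()) return Optional.of(IdType.ARXIV);
        if (PMID.matcher(id).matches()) return Optional.of(IdType.PMID);
        return Optional.empty();
    }

    // Names of the providers that declared support for the detected identifier type.
    static List<String> route(String raw, List<ProviderDecl> providers) {
        List<String> targets = new ArrayList<>();
        detect(raw).ifPresent(type -> {
            for (ProviderDecl p : providers) {
                if (p.supportedIdentifiers().contains(type)) targets.add(p.name());
            }
        });
        return targets;
    }
}
```

An unrecognized identifier yields no targets, which is the point where the user would fall back to the free search or manual submission.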

Page 17

New first submission step

For each result, the providers that returned a matching record are shown

Results are grouped by DOI
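Grouping by DOI can be sketched as follows. The `Hit` record is hypothetical; the sketch relies on DOIs being case-insensitive, strips an assumed `https://doi.org/` prefix, and keeps hits without a DOI as separate results.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// A search hit from one provider (hypothetical, simplified).
record Hit(String provider, String doi, String title) {}

class DoiGrouping {
    private static final String RESOLVER = "https://doi.org/";

    // DOIs are case-insensitive, so compare them in lower case without a resolver prefix.
    static String normalizeDoi(String doi) {
        String d = doi.trim().toLowerCase(Locale.ROOT);
        return d.startsWith(RESOLVER) ? d.substring(RESOLVER.length()) : d;
    }

    // Hits sharing a DOI form one group; hits without a DOI stay as single results.
    static List<List<Hit>> group(List<Hit> hits) {
        Map<String, List<Hit>> byDoi = new LinkedHashMap<>();
        List<List<Hit>> withoutDoi = new ArrayList<>();
        for (Hit h : hits) {
            if (h.doi() == null || h.doi().isBlank()) {
                withoutDoi.add(List.of(h));
            } else {
                byDoi.computeIfAbsent(normalizeDoi(h.doi()), k -> new ArrayList<>()).add(h);
            }
        }
        List<List<Hit>> groups = new ArrayList<>(byDoi.values());
        groups.addAll(withoutDoi);
        return groups;
    }
}
```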

Page 18

Modal box with publication details

Records from different providers are merged to get richer metadata

The system guesses a collection for the submission, but the user can change it if required
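One possible merge strategy, assumed here rather than taken from the slides: providers are ranked, and for each field the highest-ranked non-blank value wins while lower-ranked providers only fill gaps. Each merged value keeps the name of its provider, the kind of information a Describe step can use to show the metadata source. All types are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One provider's record: provider name plus field -> value map (hypothetical shape).
record ProviderRecord(String provider, Map<String, String> fields) {}

// A merged value remembers which provider supplied it.
record SourcedValue(String value, String provider) {}

class RecordMerger {
    // Records must be passed in priority order: for each field the first
    // non-blank value wins, and later providers only fill the gaps.
    static Map<String, SourcedValue> merge(List<ProviderRecord> byPriority) {
        Map<String, SourcedValue> merged = new LinkedHashMap<>();
        for (ProviderRecord rec : byPriority) {
            for (Map.Entry<String, String> e : rec.fields().entrySet()) {
                String v = e.getValue();
                if (v != null && !v.isBlank()) {
                    merged.putIfAbsent(e.getKey(), new SourcedValue(v, rec.provider()));
                }
            }
        }
        return merged;
    }
}
```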

Page 19

Manual submission

When the lookup fails, the user can always proceed manually

Page 20

Batch import from external source

Import data (identifiers or structured text) can be entered manually or uploaded as a file

The format/provider must be specified by the user

Page 21

Batch import from external source

⁄ Requests are processed:
  ⁄ Inline, for specific providers and/or within configured data limits: the submitter can immediately complete the pre-filled submissions
  ⁄ In a background process: the submitter receives a summary email with the import results, and the pre-filled submissions are available as in-progress submissions in MyDSpace

The legacy batch import feature for JSPUI has already been shared as a pull request on GitHub, see DS-1252
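A sketch of the inline/background decision, with a hypothetical configuration record. It requires both an inline-enabled provider and a record count within the limit; the slides leave open whether either condition alone suffices. `parseIdentifiers` assumes that manually entered or uploaded identifiers come one per line.

```java
import java.util.List;
import java.util.Set;

enum ImportMode { INLINE, BACKGROUND }

// Hypothetical configuration: providers allowed to run inline and a size limit.
record BatchImportConfig(Set<String> inlineProviders, int maxInlineRecords) {}

class BatchImportPlanner {
    // Identifiers typed in or uploaded as a file, assumed one per line.
    static List<String> parseIdentifiers(String input) {
        return input.lines().map(String::trim).filter(s -> !s.isEmpty()).toList();
    }

    // INLINE: the submitter completes the pre-filled submissions right away.
    // BACKGROUND: a summary email is sent and the submissions wait in MyDSpace.
    static ImportMode plan(BatchImportConfig cfg, String provider, int recordCount) {
        boolean providerOk = cfg.inlineProviders().contains(provider);
        boolean sizeOk = recordCount <= cfg.maxInlineRecords();
        return providerOk && sizeOk ? ImportMode.INLINE : ImportMode.BACKGROUND;
    }
}
```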

Page 22

Enhanced Describe step: showing metadata source

Page 23

Technical details

Diagram: each lookup provider (PubMed, arXiv, Scopus) turns its original record into a Java bean; a mapping file (translation logic) converts the bean into a normalized record; enhancer plugins split or aggregate fields and derive data (ISSN, journal title…); a second mapping file (translation logic) converts the normalized record into the repository format, a DSpace item.

<bean name="pubmedService" class="...service.PubmedService"/>
<bean name="pubmedLookupProvider" class="...lookup.PubmedLookupProvider">
    <property name="pubmedService" ref="pubmedService"/>
</bean>

public abstract class ConfigurableLookupProvider implements SubmissionLookupProvider

public class PubmedLookupProvider extends ConfigurableLookupProvider

public class PubmedItem {
    private String pubmedID;
    private String doi;
    private String issn;
    private String eissn;
    private String journalTitle;
    private String title;
    private String pubblicationModel;
    private String year;
    private String volume;
    private String issue;
    private String language;
    private List<String> type;
    private List<String> primaryKeywords;
    private List<String> secondaryKeywords;
    …
}
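The bean-to-normalized-record step can be sketched in plain Java under several assumptions: the provider bean is represented as a field/value map instead of a real `PubmedItem`, the mapping file is a Java properties file of `beanField=normalizedField` lines, and `Enhancer` is a hypothetical plugin interface. The example enhancer splits a semicolon-separated author string into single authors; none of these names come from the DSpace code base.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical enhancer plugin: refines a normalized record in place.
interface Enhancer {
    void enhance(Map<String, List<String>> record);
}

// Example enhancer: splits "Surname, Name; Surname, Name" into single authors.
class AuthorSplitter implements Enhancer {
    @Override
    public void enhance(Map<String, List<String>> record) {
        List<String> raw = record.remove("authors");
        if (raw == null) return;
        List<String> split = new ArrayList<>();
        for (String s : raw) {
            for (String a : s.split(";")) {
                if (!a.isBlank()) split.add(a.trim());
            }
        }
        record.put("author", split);
    }
}

class Normalizer {
    // Assumed mapping-file format: one "beanField=normalizedField" entry per line.
    static Properties loadMapping(String text) throws IOException {
        Properties mapping = new Properties();
        mapping.load(new StringReader(text));
        return mapping;
    }

    // Copies mapped bean fields into a normalized record, then runs the enhancers in order.
    // Bean fields without a mapping entry are dropped.
    static Map<String, List<String>> normalize(Map<String, String> bean, Properties mapping,
                                               List<Enhancer> enhancers) {
        Map<String, List<String>> record = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : bean.entrySet()) {
            String target = mapping.getProperty(e.getKey());
            if (target != null && e.getValue() != null) {
                record.computeIfAbsent(target, k -> new ArrayList<>()).add(e.getValue());
            }
        }
        for (Enhancer enhancer : enhancers) {
            enhancer.enhance(record);
        }
        return record;
    }
}
```

Keeping the translation in a data file and the refinements in small plugins means a new provider mostly needs a new bean and mapping, not new merge code.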


Page 25

Enhanced upload step

Using the ISSN or EISSN provided in the Describe step, the upload form shows the publisher policy from the Sherpa/Romeo database on the right side
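Before the policy lookup, the form has to pick an identifier. A sketch, assuming the print ISSN is preferred over the EISSN and that malformed values are skipped using the standard ISSN mod-11 check digit; the Sherpa/Romeo request itself is not shown, since the slides do not describe its API.

```java
import java.util.Optional;

class IssnSelector {
    // Validates an ISSN of the form NNNN-NNNC using the mod-11 check digit.
    static boolean isValid(String issn) {
        if (issn == null || !issn.matches("\\d{4}-\\d{3}[\\dXx]")) return false;
        String digits = issn.replace("-", "");
        int sum = 0;
        for (int i = 0; i < 7; i++) {
            sum += (digits.charAt(i) - '0') * (8 - i); // weights 8, 7, ..., 2
        }
        int check = (11 - sum % 11) % 11;
        char expected = check == 10 ? 'X' : (char) ('0' + check);
        return Character.toUpperCase(digits.charAt(7)) == expected;
    }

    // Prefers the print ISSN and falls back to the EISSN, skipping invalid values.
    static Optional<String> pick(String issn, String eissn) {
        if (isValid(issn)) return Optional.of(issn.toUpperCase());
        if (isValid(eissn)) return Optional.of(eissn.toUpperCase());
        return Optional.empty();
    }
}
```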

Page 26

Enhanced upload step

Access policy for the bitstream: open access, embargo, intranet, etc.

Deposit of the full text to the national database for individual CVs
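A sketch of how the access options could be evaluated when a file is requested. The enum values mirror the slide except `RESTRICTED`, a hypothetical stand-in for the "etc."; the embargo is assumed to end at the start of `embargoEnd`.

```java
import java.time.LocalDate;

// Access options for a bitstream; RESTRICTED is an illustrative extra value.
enum AccessPolicy { OPEN_ACCESS, EMBARGO, INTRANET, RESTRICTED }

class BitstreamAccess {
    // Decides whether a reader may download the file on a given day.
    static boolean canRead(AccessPolicy policy, LocalDate embargoEnd, LocalDate today,
                           boolean fromIntranet) {
        return switch (policy) {
            case OPEN_ACCESS -> true;
            // Readable from the embargo end date onwards.
            case EMBARGO -> embargoEnd != null && !today.isBefore(embargoEnd);
            case INTRANET -> fromIntranet;
            case RESTRICTED -> false;
        };
    }
}
```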


Page 28

What is the problem?

(Very) late submissions cause issues for the repository at both the technical and the organizational level:
⁄ The system is subject to periods of intense input activity: DSpace, like IR software in general, scales well for read operations but less well for write operations
⁄ IR staff involved in the workflow get a lot of tasks to perform in a short period

Get researchers aware

Remind researchers that the IR exists

Intercept new content early

Page 29

How do we plan to mitigate the problem?

Citation databases provide APIs to perform searches (we already use them for the lookup), and in some cases they provide additional APIs or search filters/indexes that allow more refined searches and scanning of the database. The interesting filters/indexes are:
⁄ Time based (much better if related to the date of insertion in the citation database)
⁄ Author ID (better if related to a «standard/common» identifier such as ORCID)
⁄ Affiliation
⁄ Subject category
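How the filters listed above could combine when scanning, sketched over records already fetched from a citation database (the query syntax of each API is not covered here). `CitationRecord` and `ScanCriteria` are hypothetical; the sketch requires a matching author ID and a load date on or after the last scan, while empty affiliation or subject sets are treated as "no filter".

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Set;

// Simplified citation-database record (hypothetical shape).
record CitationRecord(String id, LocalDate loadedOn, Set<String> authorIds,
                      Set<String> affiliations, Set<String> subjects) {}

// Scan criteria: records loaded since the last scan that match the researcher.
record ScanCriteria(LocalDate since, Set<String> authorIds, Set<String> affiliations,
                    Set<String> subjects) {}

class CitationScanner {
    static boolean matches(CitationRecord r, ScanCriteria c) {
        boolean recent = !r.loadedOn().isBefore(c.since());
        boolean byAuthor = r.authorIds().stream().anyMatch(c.authorIds()::contains);
        // Empty affiliation or subject criteria do not restrict the scan.
        boolean byAffiliation = c.affiliations().isEmpty()
                || r.affiliations().stream().anyMatch(c.affiliations()::contains);
        boolean bySubject = c.subjects().isEmpty()
                || r.subjects().stream().anyMatch(c.subjects()::contains);
        return recent && byAuthor && byAffiliation && bySubject;
    }

    // IDs of the records worth proposing to the researcher.
    static List<String> candidates(List<CitationRecord> records, ScanCriteria c) {
        return records.stream().filter(r -> matches(r, c)).map(CitationRecord::id).toList();
    }
}
```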

Page 30

Implementation idea

Allow researchers to store personal preferences about scanning:
⁄ Enabled providers (e.g. disable arXiv if you are not a physicist)
⁄ Frequency
⁄ Subject category filters

Author IDs will be stored in and retrieved from the researcher profile. Subject categories could be proposed from previous items or from the researcher profile.
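The stored preferences could drive a periodic scan along these lines. `ScanPreferences` and `ScanScheduler` are hypothetical names, the frequency is modelled as a `java.time.Period`, and the subject filters are only carried along here, to be passed on to the scan criteria.

```java
import java.time.LocalDate;
import java.time.Period;
import java.util.List;
import java.util.Set;

// Hypothetical per-researcher scanning preferences.
record ScanPreferences(Set<String> enabledProviders, Period frequency, Set<String> subjectFilters) {}

class ScanScheduler {
    // A scan is due when it never ran or a full period has elapsed since the last run.
    static boolean isDue(ScanPreferences prefs, LocalDate lastRun, LocalDate today) {
        return lastRun == null || !today.isBefore(lastRun.plus(prefs.frequency()));
    }

    // Providers to query today, in the order of the globally available list.
    static List<String> providersToScan(ScanPreferences prefs, List<String> available,
                                        LocalDate lastRun, LocalDate today) {
        if (!isDue(prefs, lastRun, today)) return List.of();
        return available.stream().filter(prefs.enabledProviders()::contains).toList();
    }
}
```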

Page 31

DSpace-CRIS: Researcher profile

Page 32

Who are the potential targets?

⁄ ORCID
⁄ Scopus
⁄ Web of Science
⁄ arXiv
⁄ PubMed Central
⁄ DBLP
⁄ RePEc

The Repository itself!

Page 33

The repository as source of missing content?

⁄ The submitter has to match the authors of a publication with university staff to highlight internal authors:
  ⁄ Sometimes matches are missing
  ⁄ Other times matches are wrong (homonyms)
⁄ External authors could become «internal» at some point in the future

Page 34

The repository as source of missing content?

⁄ Send an email to internal «co-authors» when a submission is made: prevents wrong attribution (and reduces duplication)
⁄ Allow researchers to unclaim publications from their profile: a last chance to fix wrong attribution
⁄ Allow researchers to claim publications: fixes missing attribution and/or engages new researchers

The last two features are included in the DSpace-CRIS add-on
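A sketch of the co-author notification, assuming authors have already been matched to staff IDs (null for external authors) and that a staff-ID-to-email map is available; the record types and message text are hypothetical, and actual sending is left out.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// An author entry in a submission; staffId is null for external authors (hypothetical).
record AuthorEntry(String name, String staffId) {}

record Notification(String email, String message) {}

class CoAuthorNotifier {
    // One notification per internal co-author, excluding the submitter.
    static List<Notification> notifications(String title, String submitterId,
                                            List<AuthorEntry> authors,
                                            Map<String, String> emailByStaffId) {
        Set<String> recipients = new LinkedHashSet<>(); // also removes duplicate entries
        for (AuthorEntry a : authors) {
            if (a.staffId() != null && !a.staffId().equals(submitterId)) {
                recipients.add(a.staffId());
            }
        }
        return recipients.stream()
                .filter(emailByStaffId::containsKey)
                .map(id -> new Notification(emailByStaffId.get(id),
                        "You were listed as co-author of \"" + title
                                + "\". Please check the attribution."))
                .toList();
    }
}
```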

Page 35

Current implementation: claim/unclaim publications in the repository

This is the current status of the publication: U = Unlinked

You can claim it:
⁄ A = Active, simple claim
⁄ S = Make it a selected publication
⁄ H = Claim it but hide it from your public profile
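The status codes can be sketched as an enum. The transitions are an assumption based on this slide and on the unclaim option (back to U): unlinked publications can be claimed as A, S or H, and claimed ones can switch mode or be unclaimed. This is not the DSpace-CRIS data model.

```java
import java.util.EnumSet;
import java.util.Set;

// Status codes shown next to each publication.
enum ClaimStatus {
    U("Unlinked"),
    A("Active, simple claim"),
    S("Selected publication"),
    H("Claimed but hidden from the public profile");

    final String label;

    ClaimStatus(String label) {
        this.label = label;
    }

    // Unlinked publications can be claimed in one of three ways;
    // claimed ones can switch mode or be unclaimed (back to U).
    Set<ClaimStatus> allowedNext() {
        return this == U ? EnumSet.of(A, S, H) : EnumSet.complementOf(EnumSet.of(this));
    }

    // Whether the publication appears on the researcher's public profile.
    boolean isPublic() {
        return this == A || this == S;
    }
}

class ClaimService {
    static ClaimStatus change(ClaimStatus from, ClaimStatus to) {
        if (!from.allowedNext().contains(to)) {
            throw new IllegalStateException("Cannot go from " + from + " to " + to);
        }
        return to;
    }
}
```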

Page 36

Current implementation: claim/unclaim publications in the repository

You can unclaim a publication: U = Unlink

Page 37

Current implementation: claim/unclaim publications in the repository


Page 39

Improve full-text presence

⁄ Use the Sherpa/Romeo policy database to analyze repository content
⁄ Use external database APIs to find an actual full text (arXiv, PubMed, ... why not the publisher version via library subscription?)
⁄ Send an email to the researcher to validate found PDFs or to ask for an «author» version
⁄ Use statistics to encourage uploads
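A sketch of the first three bullets combined, under stated assumptions: `RepoItem` is hypothetical, a PDF already located through an external API only needs the researcher's confirmation, and the author version is requested only for Sherpa/Romeo green or blue journals, whose policies allow archiving the post-print. Yellow (pre-print only) journals are left out for simplicity.

```java
import java.util.List;

enum RomeoColour { GREEN, BLUE, YELLOW, WHITE, NOT_IN_SHERPA }

enum Action { NONE, VALIDATE_FOUND_PDF, ASK_AUTHOR_VERSION }

// Minimal view of a repository item for a full-text campaign (hypothetical).
record RepoItem(String handle, boolean hasFulltext, RomeoColour colour, String foundPdfUrl) {}

class FulltextCampaign {
    static Action actionFor(RepoItem item) {
        if (item.hasFulltext()) return Action.NONE;
        // A PDF already located through an external API only needs confirming.
        if (item.foundPdfUrl() != null) return Action.VALIDATE_FOUND_PDF;
        // Green and blue journals allow archiving the author's post-print.
        return switch (item.colour()) {
            case GREEN, BLUE -> Action.ASK_AUTHOR_VERSION;
            default -> Action.NONE;
        };
    }

    // Handles of the items that need the given action.
    static List<String> handlesNeeding(List<RepoItem> items, Action action) {
        return items.stream().filter(i -> actionFor(i) == action).map(RepoItem::handle).toList();
    }
}
```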

Page 40

Sherpa/Romeo Statistics (Example)

⁄ 127,000+ items
⁄ 51% with ISSN: 65,000+ items
  ⁄ 36% not in Sherpa: 24,000 items
  ⁄ 32% green: 21,000 items
⁄ 7.3% have a full text…
⁄ 5.3% open access
⁄ 9.4% / 17.2%
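A sketch of how such a breakdown could be computed from repository items; the `Item` type and `Colour` enum are hypothetical. As an arithmetic check, 65,000 of 127,000 items is about 51.2%, consistent with the ISSN share above.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

enum Colour { GREEN, BLUE, YELLOW, WHITE, NOT_IN_SHERPA }

// Minimal item view: issn is null when the item has none (hypothetical).
record Item(String issn, Colour colour) {}

class SherpaStats {
    // Percentage rounded to one decimal place.
    static double percent(long part, long total) {
        return total == 0 ? 0.0 : Math.round(1000.0 * part / total) / 10.0;
    }

    // Share of all items that carry an ISSN.
    static double issnShare(List<Item> items) {
        long withIssn = items.stream().filter(i -> i.issn() != null).count();
        return percent(withIssn, items.size());
    }

    // Colour breakdown computed only over the items that have an ISSN.
    static Map<Colour, Double> colourShares(List<Item> items) {
        List<Item> withIssn = items.stream().filter(i -> i.issn() != null).toList();
        Map<Colour, Double> shares = new EnumMap<>(Colour.class);
        for (Colour c : Colour.values()) {
            long n = withIssn.stream().filter(i -> i.colour() == c).count();
            shares.put(c, percent(n, withIssn.size()));
        }
        return shares;
    }
}
```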

Page 41

www.cineca.it | Innovative Open Source Technologies for a CRIS: SURplus | euroCRIS | May 2013

SURplus: forecast for 2014

⁄ 50+ institutional repositories (DSpace)
⁄ 10 research portals (DSpace-CRIS)

Page 42

www.cineca.it

Thank you!

Andrea Bollini

SURplus - http://www.cineca.it/en/content/surplus

DSpace-CRIS - http://cilea.github.com/dspace-cris