Smarter Data for Smarter Libraries RACHEL FRICK, OCLC MEMBERSHIP & RESEARCH JEFF MIXTER, OCLC MEMBERSHIP & RESEARCH


Data wants to be freed

Collections as Data

• Recognizing collections data as a research asset

– About the collections

– Digital humanities

– Changing social norms

• Power in the aggregate

• Ben Schmidt, A Brief Visual History of MARC

– http://sappingattention.blogspot.com/2017/05/a-brief-visual-history-of-marc.html

Collections as Data:

Library of Congress

• National Digital Initiatives

• Experiments

• Tutorials and Data Sets

• https://labs.loc.gov/

Collections as Data: Always Already Computational

• IMLS-supported effort

• "foster a strategic approach to developing, describing, providing access to, and encouraging reuse of collections that support computationally-driven research and teaching"

• Team: T. Padilla (UNLV); L. Allen (UPenn); S. Varner (UNC-CH); S. Potvin (Texas A&M); E. Russey Roke (Emory); H. Frost (Stanford)

• Data Facets: https://collectionsasdata.github.io/facets/

Collections as Data: OCLC Research

Not Scotch But Rum: The Scope and Diffusion of the Scottish Presence in the Published Record

by Brian Lavoie

What is the most popular Irish Book?

by Lorcan Dempsey

Data & The Challenge of Discovery

• Leveraging the data we have

• Providing interfaces that support serendipity

Good Form and Spectacle

• DateRanger: tool to normalize date range data

– https://goodformandspectacle.wordpress.com/2017/08/14/dateranger-a-new-tool-to-share/

• MoMA Exhibit Spelunker

– http://spelunker.moma.org

Exploring MoMA

Measuring IMPACT

• Europeana Impact Playbook

https://pro.europeana.eu/what-we-do/impact

• Framework for Measuring Reuse of Digital Objects

– IMLS funded

– Digital Library Federation open working group

– https://reuse.diglib.org/

Data Science at OCLC

Evaluating Data

Analyzing Institutional Repository Data

Providing intelligence on how library materials are being used

Importance of Analytics

• Analytics can measure and highlight impact and importance

– Visitors

– Citations

– Downloads

– Users

Evaluating Institutional Repository Analytics

• OCLC Research partnered with Montana State University, University of New Mexico and ACRL in an IMLS-funded grant project to evaluate IR analytics

– “Measuring Up: Assessing Accuracy of Reported Use and Impact of Digital Repositories”

– http://scholarworks.montana.edu/xmlui/handle/1/8924

Initial Findings

• Institutional Repository usage analytics are often inaccurate

– Usage is either over-counted or under-counted

• It is very difficult to determine accurate Institutional Repository usage

| Page Type | Definition | Examples |
| --- | --- | --- |
| Citable Content Downloads | Non-HTML scholarly content that may be formally cited in the research process | Publication (.pdf); Presentation (.ppt); Data Sets (.csv) |
| Item Summary | HTML pages to help user decide to download the full publication | Title & Abstract; Item Metadata |
| Ancillary | HTML pages that provide general information or navigation | Search Results; Browse by Author; Statistics |
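The three page types can be illustrated with a small URL classifier. This is only a sketch, not the project's actual rules: it assumes DSpace-style paths like those on scholarworks.montana.edu, where files live under `/bitstream/` and item pages under `/handle/`, and the extension list and the `classify_page` name are illustrative assumptions.

```python
from urllib.parse import urlparse

# Extensions treated as citable content; an illustrative list, not an official one.
CITABLE_EXTENSIONS = (".pdf", ".ppt", ".pptx", ".csv")


def classify_page(url: str) -> str:
    """Return one of the three page types for a repository URL (assumed DSpace-style paths)."""
    path = urlparse(url).path.lower()
    # Drop any ";sequence=1" style path parameter before checking the extension.
    path = path.split(";", 1)[0]
    # Check file downloads first: bitstream URLs can also contain "/handle/".
    if "/bitstream/" in path or path.endswith(CITABLE_EXTENSIONS):
        return "Citable Content Downloads"
    if "/handle/" in path:
        return "Item Summary"
    # Everything else (search, browse, statistics pages) is navigation.
    return "Ancillary"
```

A real classifier would need per-platform rules; for example, browse pages that sit under a `/handle/` path would be misclassified by this sketch.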

Current Analytics methods

• Two classes of analytics

– Page Tagging

– Log File analysis

• Page Tagging misses clicks that do not originate from the hosted website (i.e. direct links to material)

– Google Scholar, Twitter, Email, Facebook, etc.

• Log File data is polluted by robot traffic

– IRUS-UK found that 85% of repository traffic is from robots
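To make the robot problem concrete, here is a minimal sketch of the kind of user-agent filtering that log file analysis depends on. The pattern list and function names are illustrative assumptions; production systems use maintained robot lists, and robots that present browser-like user agents slip through a filter like this.

```python
import re

# Substrings that commonly appear in crawler user agents.
# Illustrative and deliberately incomplete.
ROBOT_PATTERN = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def is_robot(user_agent: str) -> bool:
    """True when the user agent string matches a known crawler substring."""
    return bool(ROBOT_PATTERN.search(user_agent))


def robot_share(user_agents) -> float:
    """Fraction of requests whose user agent looks like a robot."""
    agents = list(user_agents)
    if not agents:
        return 0.0
    return sum(is_robot(ua) for ua in agents) / len(agents)
```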

Testing a Different Method

• We determined that the Google Search Console API can be used to accurately identify human traffic to IR material
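As a rough sketch of what such a query might look like, the code below builds a request for the Search Console Search Analytics endpoint and flattens one response row. It is not the project's code: the choice of dimensions, the `rowLimit` value, and the helper names are assumptions, OAuth authentication and the HTTP call itself are omitted, and field names should be checked against the current API reference.

```python
from urllib.parse import quote

API_ROOT = "https://www.googleapis.com/webmasters/v3"
# Dimensions chosen for illustration: one row per page, country, device and day.
DIMENSIONS = ["page", "country", "device", "date"]


def build_query(site_url: str, day: str):
    """Return (endpoint, JSON body) for a one-day Search Analytics query."""
    # The site URL is part of the path, so it must be fully percent-encoded.
    endpoint = f"{API_ROOT}/sites/{quote(site_url, safe='')}/searchAnalytics/query"
    body = {
        "startDate": day,
        "endDate": day,
        "dimensions": DIMENSIONS,
        "rowLimit": 5000,
    }
    return endpoint, body


def flatten_row(row: dict) -> dict:
    """Pair each dimension name with its key and keep the reported metrics."""
    record = dict(zip(DIMENSIONS, row["keys"]))
    record.update(
        clicks=row["clicks"],
        impressions=row["impressions"],
        position=row["position"],
    )
    return record
```

The body would be sent as an authenticated POST to the endpoint; each returned row carries clicks, impressions and average position for one combination of the requested dimensions.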

RAMP – Repository Analytics and Metrics Portal

• The initial findings from the project led to the development of RAMP

– Cloud-based Web service

– No installation

– Minimal training and configuration

– Consistent method and terminology

– Benchmarking across time and organization

Daily Statistics

| Citable Content | Click Through | URL | Country | Device | Position | Date | Impressions | Clicks |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| No | 0 | http://scholarworks.montana.edu/xmlui/handle/1/9348 | hrv | DESKTOP | 31 | 3/8/17 | 1 | 0 |
| Yes | 0 | http://scholarworks.montana.edu/xmlui/bitstream/handle/1/8705/WhitenS0814.pdf;sequence=1 | pan | MOBILE | 6 | 3/8/17 | 1 | 1 |
| Yes | 0 | http://scholarworks.montana.edu/xmlui/bitstream/handle/1/3670/31762001131281.pdf;sequence=1 | fra | DESKTOP | 24 | 3/8/17 | 1 | 0 |
| Yes | 0 | http://scholarworks.montana.edu/xmlui/bitstream/handle/1/7215/31762101989810.pdf?sequence=1 | chn | DESKTOP | 13 | 3/8/17 | 2 | 0 |
| Yes | 0 | http://scholarworks.montana.edu/xmlui/bitstream/handle/1/11518/15-002_Surface-attached_cells_biofilms_A1b.pdf?sequence=1 | gbr | DESKTOP | 10 | 3/8/17 | 1 | 1 |
| Yes | 0 | http://scholarworks.montana.edu/xmlui/bitstream/1/1091/1/ColemanT1212.pdf | kwt | MOBILE | 3 | 3/8/17 | 1 | 1 |
| No | 0 | http://scholarworks.montana.edu/xmlui/handle/1/9049 | gbr | DESKTOP | 9 | 3/8/17 | 1 | 0 |
| No | 0 | http://scholarworks.montana.edu/xmlui/handle/1/2567 | egy | DESKTOP | 44 | 3/8/17 | 1 | 0 |
| Yes | 0 | http://scholarworks.montana.edu/xmlui/bitstream/handle/1/7546/31762102468723.pdf;sequence=1 | twn | DESKTOP | 14 | 3/8/17 | 1 | 1 |
| No | 0 | http://scholarworks.montana.edu/xmlui/handle/1/1854 | tur | DESKTOP | 128 | 3/8/17 | 1 | 0 |
| No | 0 | http://scholarworks.montana.edu/xmlui/handle/1/11498 | usa | DESKTOP | 7 | 3/8/17 | 2 | 0 |
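As an illustration of how daily rows like these could be rolled up, the sketch below totals impressions and clicks by the Citable Content flag, using the eleven sample rows from the slide. Treating clicks on "Yes" rows as citable content downloads is an assumption made for the example, and `summarize` is a hypothetical helper, not part of RAMP.

```python
from collections import defaultdict

# (citable_content, country, device, impressions, clicks) from the sample rows.
ROWS = [
    ("No", "hrv", "DESKTOP", 1, 0),
    ("Yes", "pan", "MOBILE", 1, 1),
    ("Yes", "fra", "DESKTOP", 1, 0),
    ("Yes", "chn", "DESKTOP", 2, 0),
    ("Yes", "gbr", "DESKTOP", 1, 1),
    ("Yes", "kwt", "MOBILE", 1, 1),
    ("No", "gbr", "DESKTOP", 1, 0),
    ("No", "egy", "DESKTOP", 1, 0),
    ("Yes", "twn", "DESKTOP", 1, 1),
    ("No", "tur", "DESKTOP", 1, 0),
    ("No", "usa", "DESKTOP", 2, 0),
]


def summarize(rows):
    """Total impressions and clicks per Citable Content flag ("Yes" / "No")."""
    totals = defaultdict(lambda: {"impressions": 0, "clicks": 0})
    for citable, _country, _device, impressions, clicks in rows:
        totals[citable]["impressions"] += impressions
        totals[citable]["clicks"] += clicks
    return dict(totals)


summary = summarize(ROWS)
print(summary["Yes"])  # {'impressions': 7, 'clicks': 4}
print(summary["No"])   # {'impressions': 6, 'clicks': 0}
```

In this sample, all four clicks land on citable content files, while the item summary pages received impressions but no clicks.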

Cumulative Statistics

RAMP user activity

• 20 Institutional Repositories using RAMP

• Current support for 5 IR Application Stacks

• Tracking over 250,000 digital items

• Capturing 19,000 citable content downloads (CCD) per day that were previously invisible

Making Data Interoperable

Sharing Image Data

Community standards enable efficient, flexible, and interoperable image data

Shared Standards & Values

IIIF International Image Interoperability Framework™

• The IIIF is an emerging standard for sharing image data on the Web

• The IIIF standard normalizes technical and structural data to help improve interoperability across systems

http://iiif.io/

IIIF Application Programming Interfaces (APIs)

• The IIIF standard has 4 primary APIs:

– Image

– Presentation

– Search

– Authentication

IIIF Application Programming Interfaces (APIs)

• OCLC is actively supporting two IIIF APIs:

– Image

– Presentation

IIIF Image API

• A standard way to provide "technical" metadata about images

• A IIIF Image API compliant image server is used to transfer image files

• A compliant viewer application can understand and process the Image API data
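A minimal sketch of the Image API's URL pattern may help here. Under the published specification, an image request has the form `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`, and the technical metadata is served at `{base}/{identifier}/info.json`. The helper names, the example.org server, and the defaults are assumptions for illustration; note that `max` as a size value requires Image API 2.1 or later (version 2.0 servers use `full`).

```python
from urllib.parse import quote


def iiif_image_url(base: str, identifier: str, region: str = "full",
                   size: str = "max", rotation: str = "0",
                   quality: str = "default", fmt: str = "jpg") -> str:
    """Build an Image API request URL from its path segments."""
    # Identifiers must be percent-encoded so reserved characters stay in one segment.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base: str, identifier: str) -> str:
    """URL of the image information document (the "technical" metadata)."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


# Hypothetical server and identifier, for illustration only.
print(iiif_image_url("https://example.org/iiif", "collection:42"))
print(iiif_image_url("https://example.org/iiif", "collection:42",
                     region="0,0,1024,1024", size="512,"))
print(iiif_info_url("https://example.org/iiif", "collection:42"))
```

Because every compliant server answers the same URL pattern, a viewer can request regions and sizes without knowing anything about the system behind the server.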

IIIF Presentation API

• Provides structural data about images

• Managing annotations

• There is no need for systems to understand various metadata schemas
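To show what "structural data" means in practice, here is a sketch that assembles a minimal Presentation API 2.x manifest: a manifest holds a sequence, the sequence holds canvases (one per page or view), and each canvas carries an image annotation pointing at an Image API service. The `build_manifest` function, the input shape, and all example.org URIs are hypothetical; this is not CONTENTdm's conversion code.

```python
def build_manifest(manifest_id: str, label: str, pages: list) -> dict:
    """Assemble a minimal Presentation 2.x manifest from page descriptions.

    Each page is a dict with "label", "service" (Image API base URI),
    "width" and "height". This input shape is an assumption for the sketch.
    """
    canvases = []
    for number, page in enumerate(pages, start=1):
        canvas_id = f"{manifest_id}/canvas/{number}"
        canvases.append({
            "@id": canvas_id,
            "@type": "sc:Canvas",
            "label": page["label"],
            "width": page["width"],
            "height": page["height"],
            "images": [{
                "@type": "oa:Annotation",
                "motivation": "sc:painting",
                "on": canvas_id,  # the annotation paints the image onto this canvas
                "resource": {
                    "@id": f"{page['service']}/full/full/0/default.jpg",
                    "@type": "dctypes:Image",
                    "format": "image/jpeg",
                    "width": page["width"],
                    "height": page["height"],
                    "service": {
                        "@context": "http://iiif.io/api/image/2/context.json",
                        "@id": page["service"],
                        "profile": "http://iiif.io/api/image/2/level1.json",
                    },
                },
            }],
        })
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": manifest_id,
        "@type": "sc:Manifest",
        "label": label,
        "sequences": [{"@type": "sc:Sequence", "canvases": canvases}],
    }


# Hypothetical two-page item.
manifest = build_manifest(
    "https://example.org/iiif/item1/manifest",
    "Sample item",
    [
        {"label": "Page 1", "service": "https://example.org/iiif/img1", "width": 2000, "height": 3000},
        {"label": "Page 2", "service": "https://example.org/iiif/img2", "width": 2000, "height": 3000},
    ],
)
```

A viewer only needs to understand this structure, not the descriptive metadata schema of the source system, which is the point of the last bullet above.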

OCLC Research and IIIF

• OCLC Research started to experiment with IIIF in 2016

– Evaluating the standard

– Following development of the APIs

– Experimenting with producing IIIF data using CONTENTdm items

– Involvement in continued development of the APIs and IIIF standards

Initial experiments

• Set up an Image server that supports IIIF

• Created sample Image API data for CONTENTdm items

• Tested the Image API data and Image Server using an open source IIIF Image viewer

• This proof of concept led to the implementation of IIIF Image API support in CONTENTdm

Continued experimentation

• The Presentation API requires more complex understanding and processing of existing image data/metadata

• Produced a proof of concept that CONTENTdm data could be transformed into IIIF Presentation data and used in IIIF compliant systems

• Developed code to bulk convert CONTENTdm data into IIIF Presentation data

IIIF Support in CONTENTdm Today

• CONTENTdm has implemented the Image and Presentation APIs

– CONTENTdm serves as one of the largest IIIF Image servers in the IIIF Community: ~20 Million Images

– The Presentation API is currently being implemented for all CONTENTdm users: ~4 Million Presentation Manifests in production

• This work has been a close collaboration between OCLC GPM, GTECH and Research

OCLC Support of IIIF

• OCLC is a member of the IIIF Consortium and contributes to the continued development and promotion of the IIIF Community and the IIIF APIs

Thank you

Rachel Frick, OCLC

[email protected]

Jeff Mixter, OCLC

mixterj@oclc