COUNTER Point: Making the Most of Imperfect Data


cc: amphalon - https://www.flickr.com/photos/72427312@N00

Jeannie Gartenschlaeger-Castro, Lindsay Cronk

4/4/2016

Introduction: Who we are, what we do

Two Different eResource Perspectives
• Jeannie = Systems-Side
• Lindsay = Service-Side

Disclaimer: I studied international relations and studio art.

Photo by David Bygott - Creative Commons Attribution-NonCommercial-ShareAlike License - https://www.flickr.com/photos/86666094@N00

Statistical modeling is the application of a set of assumptions to data, typically paired data.

Photo by Biblioteca General Antonio Machado - Creative Commons Attribution License - https://www.flickr.com/photos/37667416@N04

All COUNTER Reports are Time Series Data Sets.
• Continuous time interval
• Successive measurements
• Equal spacing/time between data points
• Single measures within the report period

Decomposition of Time Series
• Segments a time series into components
• Estimates based on predictability
• Wold's theorem/decomposition: every covariance-stationary time series can be decomposed into a pair of uncorrelated processes, one deterministic and one stochastic (moving-average based)
– Imagine usage in two components, one trend oriented (COUNTER reporting periods) and one irregular (faculty recommendations, LibGuides, external drivers)
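The two-component picture above can be illustrated with a small Python sketch. This is not the Wold decomposition itself, and it is not the presenters' R code. It is a simple additive split of a series into a moving-average trend and an irregular remainder. The function names, the window size, and the usage figures are all made up for illustration.

```python
def moving_average_trend(series, window=3):
    """Centered moving average. Positions too close to either end get None."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        segment = series[i - half : i + half + 1]
        trend[i] = sum(segment) / window
    return trend


def decompose(series, window=3):
    """Split a series into (trend, irregular), where irregular = value - trend."""
    trend = moving_average_trend(series, window)
    irregular = [None if t is None else x - t for x, t in zip(series, trend)]
    return trend, irregular


# Hypothetical monthly full-text requests (not real COUNTER data).
monthly_usage = [120, 135, 150, 110, 160, 175]
trend, irregular = decompose(monthly_usage, window=3)
```

A large positive or negative value in `irregular` marks a month that departs from the local trend, which is where one might look for an external driver such as a course assignment.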

Exponential Smoothing
• Smooths time series data
• Reduces high-frequency noise and the influence of outliers
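As a rough illustration of the idea, here is simple (single) exponential smoothing in Python. The presenters worked in R, so this is only a sketch. The smoothing factor `alpha` and the usage numbers are assumptions chosen for the example. Each smoothed value is a weighted average of the current observation and the previous smoothed value.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s[0] = x[0]; s[t] = alpha*x[t] + (1-alpha)*s[t-1]."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    if not series:
        return []
    smoothed = [float(series[0])]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical monthly usage that swings between 100 and 200 (not real data).
monthly_usage = [100, 200, 100, 200]
smoothed = exponential_smoothing(monthly_usage, alpha=0.5)
# smoothed == [100.0, 150.0, 125.0, 162.5]
```

A smaller `alpha` gives a flatter line that reacts slowly to spikes. An `alpha` of 1 returns the original series unchanged.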

About the Approach

• Plays to COUNTER’s strengths
• Addresses reporting weaknesses
• Relatively straightforward analysis
• Opportunity to test predictive analysis
• Powerful visualizations

Context and Culture

cc: Misenus1 - https://www.flickr.com/photos/44075517@N00

Statistical Modeling in the Library

cc: Boston Public Library - https://www.flickr.com/photos/24029425@N06

Choosing Resources for Pilot

• Needed 4 year+ usage history for reverse predictive analysis

• Larger numbers make analysis easier (went aggregate)
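One way to read "reverse predictive analysis" is as a backtest: hold out the most recent year, forecast it from the earlier years, and compare the forecast with what actually happened. The Python sketch below assumes that reading. The forecasting rule (average of the same period in prior years), the function name, and the data are all hypothetical, not the presenters' method.

```python
def backtest_last_year(totals, periods_per_year=12):
    """Forecast the final year from earlier years and score it.

    Returns (forecast, mape), where mape is the mean absolute percentage
    error as a fraction, or None if every actual value in the final year is zero.
    """
    if len(totals) < 2 * periods_per_year or len(totals) % periods_per_year:
        raise ValueError("need at least two complete years of data")
    history = totals[:-periods_per_year]
    actual = totals[-periods_per_year:]
    years = len(history) // periods_per_year

    # Forecast each period as the mean of that same period in prior years.
    forecast = []
    for p in range(periods_per_year):
        same_period = [history[y * periods_per_year + p] for y in range(years)]
        forecast.append(sum(same_period) / years)

    errors = [abs(a - f) / a for a, f in zip(actual, forecast) if a != 0]
    mape = sum(errors) / len(errors) if errors else None
    return forecast, mape


# Toy data: four "years" of two periods each, to keep the example short.
usage = [10, 20, 14, 24, 12, 22, 12, 20]
forecast, mape = backtest_last_year(usage, periods_per_year=2)
```

With monthly COUNTER data, `periods_per_year` would stay at 12. Four years of history then gives three years to learn from and one to check against.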

Getting Started: JR1/DB1 – 2010 to 2013

• 4 JR datasets (Elsevier, Wiley, HighWire, and Cambridge)

• 4 DB datasets (EBSCO and ProQuest, separate sets for sessions and searches)

Applications
• Excel – Data collection/clean-up
• R – Data analysis
• Tableau – Data visualization

Excel

R

Learn About R
Resources/Tutorials I like:
• R for Beginners: https://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf
• Quick R: http://www.statmethods.net/
• Using R for Time Series Analysis: http://a-little-book-of-r-for-time-series.readthedocs.org/en/latest/src/timeseries.html
• R Time Series Quick Fix: http://www.stat.pitt.edu/stoffer/tsa3/R_toot.htm
• Ryan Womack’s excellent video series: https://www.youtube.com/watch?v=QHsmAM6nktY

Tableau

Findings and Next Steps: Trends, Implications, and Plans

cc: DirectDish - https://www.flickr.com/photos/13800911@N08

Usage is consistent across vendor platforms.

Usage trends manifest across vendor platforms.

Usage can be predicted.

What is a good search to session ratio?

Moving Forward

• Going micro with big platforms
• Heuristic examination of databases with low search to session ratios
• Developing trend reports for CMC/selectors
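A first pass at finding databases with low search to session ratios could look like the Python sketch below. The row layout (name, searches, sessions), the database names, and the threshold of 1.0 are assumptions made for illustration. The deck leaves open what a "good" ratio is, so the threshold is a placeholder and not a recommendation.

```python
def search_to_session_ratios(db1_rows):
    """Map each database name to searches / sessions (None when sessions is 0)."""
    ratios = {}
    for name, searches, sessions in db1_rows:
        ratios[name] = None if sessions == 0 else searches / sessions
    return ratios


def flag_low_ratios(ratios, threshold):
    """Return the names, sorted, whose ratio is defined and below the threshold."""
    return sorted(
        name for name, ratio in ratios.items()
        if ratio is not None and ratio < threshold
    )


# Hypothetical DB1-style totals: (database, searches, sessions).
rows = [
    ("Database A", 300, 100),
    ("Database B", 50, 100),
    ("Database C", 10, 0),
]
ratios = search_to_session_ratios(rows)
low = flag_low_ratios(ratios, threshold=1.0)
# low == ["Database B"]
```

Databases with zero sessions are left out of the flagged list and not treated as low, since a ratio cannot be computed for them.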

Thank you!

cc: USFWS Pacific - https://www.flickr.com/photos/52133016@N08

Questions

cc: Maëlick - https://www.flickr.com/photos/113604805@N04

Keep in touch

cc: tasslehoff84 - https://www.flickr.com/photos/23284841@N00

Jeannie Gartenschlaeger-Castro
jmcastro10@uh.edu
713-743-9346

Lindsay Cronk
lacronk@uh.edu
713-743-0519
@linds_bot
