Management of Data Collections

Data Collections

Bernadette Duffy and Abraham de Jesus

LIBR 580

Louise Broadley

October 5, 2011

What are Data Collections?

• Data from surveys, opinion polls, climate data

• Numeric data in machine-readable form

• To make use of the data files, users need codebooks and other supporting files
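
To illustrate why a data file is hard to use without its codebook, here is a minimal Python sketch that replaces numeric codes with their labels. The variable names, codes, labels, and sample rows are all hypothetical, made up for illustration; a real codebook would define its own.

```python
import csv
import io

# Hypothetical codebook: for each variable, the meaning of each numeric code.
CODEBOOK = {
    "sex": {"1": "Male", "2": "Female"},
    "region": {"3": "Ontario", "5": "British Columbia"},
}

# Hypothetical raw data file: numeric codes only, unreadable on their own.
RAW_DATA = """id,sex,region
101,2,5
102,1,3
103,2,9
"""


def apply_codebook(raw_text, codebook):
    """Return the rows with coded values replaced by codebook labels."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        labelled = {}
        for variable, value in row.items():
            labels = codebook.get(variable)
            if labels is None:
                # Variable is not coded (e.g. an identifier): keep it as-is.
                labelled[variable] = value
            else:
                # Flag codes the codebook does not document.
                labelled[variable] = labels.get(value, f"Unknown code {value}")
        rows.append(labelled)
    return rows


labelled_rows = apply_codebook(RAW_DATA, CODEBOOK)
for row in labelled_rows:
    print(row)
```

The third sample row uses a code that the hypothetical codebook does not list, which shows how undocumented values surface once labels are applied.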

Data Lifecycle from DataOne https://www.dataone.org/content/education

Libraries and Data Collections

• Important in academic and special libraries

• Used by researchers and policy analysts

• Academic libraries are starting to get involved in preserving research data from their own institutions

UBC Library Data Services http://data.library.ubc.ca/

Data suppliers - UBC

• Statistics Canada http://www.statcan.gc.ca/ Canadian Census, labour, health, income, trade

• The Roper Center for Public Opinion Research at the University of Connecticut http://www.ropercenter.uconn.edu/ Opinion polls

• Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan http://data.library.ubc.ca/gen/icpsr.html Social Sciences data

abacus

abacus - data set Part 1

abacus - data set Part 2

Data file

Challenge - Cost

Strategies to reduce cost for subscription data sets

• Collaborative purchase with several departments (UC Berkeley)

• University consortium (UBC, SFU, UVic, and UNBC combined to form the BC Research Libraries’ Data Services consortium – abacus, http://abacus.library.ubc.ca/)

Challenge - Selection

Decisions are based on:

• Collection policy

• Knowledge of what is available

• Understanding user need

• Cost

• Individual patron need

• If the data would be useful to multiple users

Challenge - Supporting Access

• Make visible in Library Catalogue

• Convert file formats for use in statistical programs

• Outreach / education in use of data collection and statistical tools

• Workshops on data literacy

• Create a Data Lab

• Become embedded in course requiring use of data collections
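
As one example of the format conversion mentioned above, this Python sketch turns a fixed-width data file into CSV, which most statistical programs can import. The column layout and the two sample records are hypothetical; in practice the layout would come from the data set's codebook.

```python
import csv
import io

# Hypothetical record layout, as a codebook might describe it:
# (variable name, start column, end column), 1-indexed and inclusive.
LAYOUT = [("id", 1, 3), ("age", 4, 5), ("income", 6, 11)]

# Hypothetical fixed-width data file contents (no delimiters between fields).
FIXED_WIDTH = "00134 52000\n00258 38500\n"


def fixed_width_to_csv(text, layout):
    """Slice each record by column position and write it out as CSV."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Header row taken from the variable names in the layout.
    writer.writerow([name for name, _, _ in layout])
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        writer.writerow([line[start - 1:end].strip() for _, start, end in layout])
    return out.getvalue()


csv_text = fixed_width_to_csv(FIXED_WIDTH, LAYOUT)
print(csv_text)
```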

Infrastructure

• Data sets can be highly variable in size.

• This creates infrastructural challenges for storage, the institution’s systems, and the institution itself.

Storage

• Scalability: “the ability of a system, network, or process, to handle growing amounts of work in a graceful manner or its ability to be enlarged to accommodate that growth.” (Wikipedia)

• Location: Does your institution expect to host the data produced by researchers at that institution?

Systems Support

• Network: Can the network handle downloading of large datasets?

• Hardware: Can the systems support computation over disparate data sets?

• Software: Do you have statistical programs (like SPSS or R) available for your users?

• Flexibility: Can your system handle the wide variety of data formats, sizes, and uses?

• Example of a good system: http://www.devinfo.info/genderinfo/

UN Gender Info

Institutional Support

• Workflows: Can your data collections be integrated into the larger collections management framework?

• Faculty Partnerships: Will faculty work with the library to create data management plans?

• Mandate: Does your institution consider data collections a priority?

Preservation

• Best practices for data preservation mean that preservation concerns enter in at the earliest point in the data management cycle: creation.

Criteria for Preservation

• Obligation

• Value

• Uniqueness

• Verification

• Other Cultural Reasons

Metadata

• Plagued by a lack of standards.

• No international metadata standard for data sets.

• Needs to give enough context for the data to be understandable.

• No clear citation practice has emerged for data sets.

• Data Documentation Initiative (DDI)
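
To make "enough context" concrete, here is a minimal Python sketch of a descriptive record for a data set. The field names, the sample values, and the citation layout are all assumptions chosen for illustration; they do not follow the DDI schema or any established citation style.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata for one data set (not a DDI record)."""
    title: str
    producer: str
    year: int
    distributor: str
    # Variable name -> plain-language description, as a codebook would give.
    variables: dict = field(default_factory=dict)

    def missing_context(self):
        """Return the names of descriptive fields that were left empty."""
        missing = [
            name
            for name in ("title", "producer", "distributor")
            if not getattr(self, name).strip()
        ]
        if not self.variables:
            missing.append("variables")
        return missing

    def citation(self):
        """Build a citation string in one illustrative layout."""
        return (
            f"{self.producer} ({self.year}). {self.title} "
            f"[data file]. {self.distributor}."
        )


# Made-up example values, not a real data set.
record = DatasetRecord(
    title="Example Household Survey",
    producer="Example Statistical Agency",
    year=2010,
    distributor="Example Data Archive",
    variables={"age": "Age of respondent in years"},
)
print(record.missing_context())
print(record.citation())
```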

Wrap-Up

• What is a data collection? A collection of the data resulting from research.

• They have unique challenges for selection, access, infrastructure, and preservation.

• Data curation is an up-and-coming field in librarianship.

• Librarians are uniquely poised to be involved in the recent surge of interest in data.
