14 August 2013 Data 101: A Gentle Introduction Presented by Kimberly Silk, MLS, Data Librarian, Martin Prosperity Institute, Rotman School of Management, University of Toronto

DESCRIPTION

This webinar was prepared for and hosted by the SLA Social Science and Transportation divisions and the Upstate New York chapter. Presented on August 14, 2013.

14 August 2013

Data 101: A Gentle Introduction

Presented by

Kimberly Silk, MLS, Data Librarian, Martin Prosperity Institute, Rotman School of Management, University of Toronto

Our Agenda

  • Defining data librarianship
  • Basic terminology
  • Common data sources
  • Our challenge: data management, preservation, discovery and access
  • What are big data?
  • What are data visualizations?
  • Sources
  • Q & A

Defining Data Librarianship

  • Data librarianship is a relatively new area of practice, emerging with the growth of digital media since the 1970s;
  • Data librarians are professional library staff engaged in managing research data as a resource, and supporting researchers in these activities;
  • We support our institutions and researchers in the areas of data management, metadata management, and teaching how to use data as a resource;
  • Many of us work in the social sciences, but there is growth in the natural sciences and humanities as well.

Basic Terminology

Data: plural! Think: Squirrels!!

  • Microdata: raw data; individual records consisting of rows of numbers (for example, in an Excel spreadsheet);
  • Statistics: summarized tables and cross-tabulations that have been formulated from the raw data;
  • Aggregate data: statistical summaries organized in a data file structure (such as Excel) that permits further analysis;
  • PUMF (Public Use Microdata File): raw data that are available for public use; some data may be filtered and geographies suppressed to ensure personal privacy;
  • Variables: a set of factors, traits or conditions that describes a unit of analysis; for instance, sex, age, marital status, etc.;
  • Frequencies: the number of times an observation occurs in the data.
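To make the terminology concrete, here is a minimal Python sketch of how microdata, variables, frequencies and a cross-tabulation relate to one another. The records and values are invented purely for illustration; they are not drawn from any real survey or from the presenter's data.

```python
from collections import Counter

# Hypothetical microdata: one record per respondent (values invented for illustration).
microdata = [
    {"sex": "F", "age": 34, "marital_status": "married"},
    {"sex": "M", "age": 29, "marital_status": "single"},
    {"sex": "F", "age": 41, "marital_status": "single"},
    {"sex": "M", "age": 52, "marital_status": "married"},
    {"sex": "F", "age": 23, "marital_status": "single"},
]

# Variables: the traits recorded for each unit of analysis (here, a respondent).
variables = sorted(microdata[0].keys())

# Frequencies: how many times each value of one variable occurs in the data.
sex_frequencies = Counter(record["sex"] for record in microdata)

# Cross-tabulation: counts for each combination of two variables.
crosstab = Counter(
    (record["sex"], record["marital_status"]) for record in microdata
)

print(variables)
print(dict(sex_frequencies))
print(dict(crosstab))
```

The list of records plays the role of microdata, while the two counters are summaries derived from it, which is roughly the distinction the slide draws between raw data and statistics.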

Common Data Sources

  • Government-collected surveys
  • US Census (American FactFinder)
  • Bureau of Labor Statistics, Bureau of Economic Analysis, Statistics Canada
  • International sources such as UK Data Archive, Swedish National Data Service, Australian Data Archive, etc.
  • OECD iLibrary
  • World DataBank
  • Pew Research Center
  • Gallup
  • Thomson

Other International Data Sources

  • Some countries do not gather data, have not been gathering data for very long, or else limit or filter available data;
  • For instance, Russia, India, China and other developing countries may not gather, preserve or release their data;
  • The BRICs (Brazil, Russia, India, China) will struggle with this issue as their economies grow.

Uncommon Data Sources

  • Data can come from everywhere;
  • Occasionally, the MPI acquires data from unusual sources, such as:
  • Rolling Stone magazine
  • MySpace: social media site for bands
  • CrunchBase: database of technology companies

Data Management, Preservation, Discovery & Access

  • We've conquered print collections, but data present a new challenge;
  • As with all digital files, metadata are necessary to describe data assets;
  • Like images, a single data set can mean many things to many people;
  • How do we manage these data to make sure they are discoverable, accessible, and preserved?
  • Traditionally, data files have been stored on network drives, and shared or restricted according to the groups who need to use them;
  • Network drives are difficult to search, can be hard to share and restrict, and don't deal with metadata well;
  • Web pages with links have been a common way to distribute data sets;
  • We needed new tools: a new kind of catalogue that is designed for the specialized needs of data.

Data Discovery Platforms

  • Nesstar: developed in Norway by Norwegian Social Science Data Services; used by Statistics Canada, UK Data Archive, NORC at the University of Chicago
  • ODESI: proprietary system developed and used by Scholars Portal
  • Dataverse: open source system developed by the Institute for Quantitative Social Science (IQSS) at Harvard; used by NBER and ICPSR

Dataverse

  • We installed an instance of Dataverse at the University of Toronto, in our cloud, and I manage my data collections myself;
  • As an open source solution, it's cost-effective, and my colleagues at Scholars Portal support it for me and other Ontario universities;
  • The data are associated with studies; several data sets can be associated with a single study;
  • The world can see the metadata for each data collection, but access to the data sets themselves is restricted to those who contact me to get permission.
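The study-and-data-set arrangement described above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names, fields, file name and e-mail address are all hypothetical, and none of this is Dataverse's actual data model or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataSet:
    """One data file belonging to a study (hypothetical structure)."""
    filename: str
    description: str


@dataclass
class Study:
    """A study with public metadata and access-restricted data sets."""
    title: str
    producer: str
    data_sets: list[DataSet] = field(default_factory=list)
    approved_users: set[str] = field(default_factory=set)

    def public_metadata(self) -> dict:
        # Anyone may see the descriptive metadata, but not the files themselves.
        return {
            "title": self.title,
            "producer": self.producer,
            "data_sets": [d.description for d in self.data_sets],
        }

    def can_download(self, user: str) -> bool:
        # Only users who have been granted permission may fetch the data.
        return user in self.approved_users


# Invented example values, for illustration only.
study = Study(title="Example survey", producer="Example producer")
study.data_sets.append(DataSet("survey_2012.csv", "Respondent-level file"))
study.data_sets.append(DataSet("survey_2012_tables.csv", "Summary tables"))
study.approved_users.add("researcher@example.org")

print(study.public_metadata())
print(study.can_download("researcher@example.org"))
print(study.can_download("stranger@example.org"))
```

Keeping the metadata in a separate, always-visible view is what lets a collection be discoverable even when the underlying files stay restricted.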

What are Big Data?

  • Big Data are data that are too large for the average database management tool (Access and Excel, for instance).
  • Examples come from meteorology, genomics and physics.
  • At MPI we wrestle with large GIS data sets (maps and satellite data), and deal with data at the terabyte (1 trillion bytes) level.
  • Larger data sets deal with petabytes (1 quadrillion bytes) and exabytes (1 quintillion bytes).
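The unit sizes quoted above can be checked with a few lines of Python. This is a minimal sketch that assumes the decimal definitions given on the slide (powers of ten); binary units such as the tebibyte are slightly larger. The function names are made up for this example.

```python
# Decimal unit sizes, as quoted on the slide.
UNITS = {
    "terabyte": 10**12,  # 1 trillion bytes
    "petabyte": 10**15,  # 1 quadrillion bytes
    "exabyte": 10**18,   # 1 quintillion bytes
}


def to_bytes(amount: float, unit: str) -> int:
    """Convert an amount in the given unit to a number of bytes."""
    return int(amount * UNITS[unit])


def describe(num_bytes: int) -> str:
    """Express a byte count in the largest unit that fits."""
    for name, size in sorted(UNITS.items(), key=lambda kv: kv[1], reverse=True):
        if num_bytes >= size:
            return f"{num_bytes / size:.1f} {name}s"
    return f"{num_bytes} bytes"


print(to_bytes(1.5, "terabyte"))
print(describe(3 * 10**12))
print(describe(2_500_000_000_000_000))
```

Each step up the scale is a factor of one thousand, so an exabyte is a million terabytes.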

Data Visualizations

  • The visual representation of data: literally, a picture can say a thousand [numbers]
  • Edward Tufte is a key pioneer: http://www.edwardtufte.com/tufte/
  • Fantastic examples at Flowing Data: http://flowingdata.com/
  • RSA Animate: http://www.thersa.org/

Sources

  • International Association for Social Science Information Services & Technology (IASSIST) - http://www.iassistdata.org/
  • OECD iLibrary - http://www.oecd-ilibrary.org/
  • World Bank Data - http://data.worldbank.org/
  • UK Data Archive - http://data-archive.ac.uk/
  • Nesstar - http://www.nesstar.com/
  • Dataverse - http://thedata.org/

Q & A

(and, Thank You!)

Kimberly Silk, MLS, Data Librarian, Martin Prosperity Institute, University of Toronto
