The Data Liberation Initiative Orientation Session
Statistics Canada / Statistique Canada
University of Alberta
December 5, 2001
Chuck Humphrey
Products and Services
Establishing Perspectives
– statistical information: statistics and data
– statistics & data sources: national and international
– continuum of access: DLI
Statistical Information
Statistics
• numeric facts/figures
• created from data, i.e., already processed
• presentation-ready
Data
• numeric files organized for analysis
• requires processing
• not ready for display
Statistical Information
The lines are blurring ...
– the past
  if it was on paper, it was statistics
  if it was digital, it was data
– the present
  dynamic tables retrievable from online databases
  e-journal publications with tables
Statistical Information
Statistics ... and a map!
Statistical Information
Product Implications
± won’t have a ‘published’ product but will rather be forced to work with dynamically generated tables from databases
± toward this end, will see more Web retrieval of statistics and processing of data
• examples: STC Community Profiles and ICPSR Data Analysis System
Statistical Information
Product Implications
± may only see graphical displays of statistics or data without the numbers or data
• example: Web map servers
Statistical Information
Service Implications
+ spend less time providing technical services and more time doing extended reference and consulting
± the move to disintermediate products, that is, make them self-serve
Statistical Information
Service Implications
- need to deal with an even wider variety of retrieval or software tools and possibly formats
- may be more difficult to get at the actual statistics or data that are wanted (especially historical data)
Statistics & Data Sources
– Statistics Canada
– Other Canadian Gov’t & Non-gov’t Sources
– Academic Research Data
– Financial & Stock Data
Statistics & Data Sources
Statistics Canada
– Surveys: cross-sectional & longitudinal
– Aggregate databases: time-series & cross-classified
– Geography files
– Supporting documentation: SIC, SOC
Statistics & Data Sources
Other Governmental & Non-Governmental
– Health Canada: HBSC & Heart Health
– CIC: LIDS & IMDB
– CIHI
– GDSourcing
– Statistical Universe
Statistics & Data Sources
Academic Research Data
– ICPSR: ISSP, World Values, Eurobarometers
– ISR-York: CNES
– Data Libraries: AAS
Statistics & Data Sources
Financial & Stock Data
– Datastream
– Financial Post Corporate Database
– Compustat
– CRSP
– DRI Basic Economics
Statistics & Data Sources
Statistics Canada is an important source for statistics and data, but not the only source.
Continuum of Access
Turning to Statistics Canada, access to statistics and data is through a variety of services and initiatives.
Think of this as a continuum along which levels of access are provided.
Continuum of Access
Characteristics of this continuum are:
– cost : which runs from free to expensive
– restrictions : which runs from open to very restricted
– information : which runs from statistics to data
Statistical Information Available through Statistics Canada
Different Services
(listed along the continuum of ACCESS, from Open / Free / Statistics to Restricted / Expensive / Data)

Service: Statistics Canada Website
– Who is Eligible & Conditions: General Public: available on the Internet at www.statcan.ca
– Products: The Daily; Canadian Statistics; Census; Statistical profiles of Canadian communities; Downloadable publications
– Notes: Warning: some parts of the Website are fee-based

Service: Depository Service Program
– Who is Eligible & Conditions: Designated DSP Libraries & their Users: available on site
– Products: Paper publications; Electronic publications, which include priced downloadable publications & select CD-ROMs
– Notes: Some DSP libraries provide off-site access to authenticated users

Service: Data Liberation Initiative
– Who is Eligible & Conditions: Post-secondary Academic: restricted to teaching and research purposes
– Products: Standard data products: aggregate databases, microdata files and geography files
– Notes: Interface to CANSIM I and Trade Analyzer available through CHASS (University of Toronto) by subscription

Service: Cu$tomized Tabulations & Pay per View
– Who is Eligible & Conditions: Individuals: contract between STC and individual
– Products: Tables from confidential files that are specially produced by Statistics Canada for a fee and access to specialized databases
– Notes: Specialized databases include CANSIM II and Trade Analyzer

Service: Remote Job Submission
– Who is Eligible & Conditions: Approved Researchers: contract between STC and individual
– Products: “Dummy” or synthetic files to build analysis setups that must then be submitted to Stats Can for processing
– Notes: Services available for selected titles. Remote job submission is the most developed for NPHS.

Service: Research Data Centres
– Who is Eligible & Conditions: Approved Researchers: SSHRC peer review & deemed STC employee
– Products: Confidential data files from the longitudinal surveys begun in the 1990’s
– Notes: Applications can now be submitted through the SSHRC Web site.
Products and Services
Summary
– statistical information: traditional ways of handling print statistics are now challenged by online statistics and data
– statistics & data sources: Statistics Canada is an important source but not the only source
– continuum of access: several points of access may be needed when dealing with Statistics Canada
Product Types
The DLI license provides post-secondary institutions with access to “standard data products”, which consist of
public use microdata, aggregate databases, and geography files
listed in the Statistics Canada Catalogue.
Product Types
Think of this as the stuff that is sold, excluding publications and services.
STC Online Catalogue – Medium Categories
• Tape
• CD-ROM
• Diskette
Product Types
Aggregate data
– statistics organized in databases or as data files
– tabulations structured by time, geography, and social content
Aggregate Data
Structure
– Time
– Geography
– Social Content
Examples: CANSIM, Census, Small Area Statistics, HID
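As a rough illustration of this structure, an aggregate table can be thought of as cells keyed by time, geography, and social content. The sketch below is only that: the figures, place names, and category labels are invented for the example and are not taken from CANSIM, the Census, or any other product named above.

```python
# Hypothetical aggregate table: each cell is keyed by
# (time, geography, social content). All values are made up.
aggregate = {
    (1996, "Alberta", "employed"): 1400,
    (1996, "Alberta", "unemployed"): 100,
    (1996, "Ontario", "employed"): 5300,
    (1996, "Ontario", "unemployed"): 500,
    (2001, "Alberta", "employed"): 1600,
    (2001, "Alberta", "unemployed"): 80,
}


def time_series(table, geography, category):
    """Return the (year, value) pairs for one geography and one category."""
    return sorted(
        (year, value)
        for (year, geo, cat), value in table.items()
        if geo == geography and cat == category
    )


def cross_classification(table, year):
    """Return the geography-by-category cells for a single year."""
    return {
        (geo, cat): value
        for (yr, geo, cat), value in table.items()
        if yr == year
    }
```

Holding geography and social content fixed gives a time series; holding time fixed gives a cross-classified table.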
Product Types
Microdata
– raw data organized in a file where the records or lines in the file are observations of a specific unit of analysis and the information on each line consists of the values of variables
– requires some form of processing or analysis to be used
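A minimal sketch of what this means in practice, assuming a small comma-delimited file: each line is one respondent (the unit of analysis), each column is a variable, and a statistic only appears after the file is processed. The variable names and values are hypothetical and do not come from any Statistics Canada file.

```python
import csv
import io
from collections import Counter

# Hypothetical microdata: one line per respondent, one column per variable.
RAW = """id,province,age,employed
1,AB,34,1
2,AB,51,0
3,ON,28,1
4,ON,45,1
"""


def load_records(text):
    """Parse the file into a list of dicts, one per observation."""
    return list(csv.DictReader(io.StringIO(text)))


def employed_by_province(records):
    """Process the microdata into a statistic: employed count per province."""
    counts = Counter()
    for rec in records:
        counts[rec["province"]] += int(rec["employed"])
    return dict(counts)
```

Calling `employed_by_province(load_records(RAW))` turns the raw records into a small presentation-ready table.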
Public Use Microdata
Anonymized Microdata
– these are microdata prepared to minimize the possibility of disclosing or identifying any of the cases or observations
– the original data (or master file) are edited to create a public use microdata file
Public Use Microdata
Steps in Anonymizing Microdata
– remove all personal identification information (names, addresses, etc.)
– include only gross levels of geography
– collapse detailed information into a smaller number of general categories
– suppress the values of a variable
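The steps above can be sketched on a single record. This is an illustration under stated assumptions, not Statistics Canada's actual procedure: the field names, the age bands, and the choice of income as the suppressed variable are all hypothetical.

```python
def anonymize(record):
    """Apply the four steps to one master-file record (a dict)."""
    public = dict(record)  # work on a copy; the master record is untouched
    # 1. Remove personal identification information.
    for field in ("name", "address"):
        public.pop(field, None)
    # 2. Keep only a gross level of geography (province, not city).
    public.pop("city", None)
    # 3. Collapse detailed information into a few general categories.
    age = public.pop("age")
    if age < 25:
        public["age_group"] = "under 25"
    elif age < 65:
        public["age_group"] = "25-64"
    else:
        public["age_group"] = "65+"
    # 4. Suppress the value of a variable (hypothetical choice: income).
    public["income"] = None
    return public


# Hypothetical master-file record, invented for the example.
master = {"name": "A. Person", "address": "1 Main St", "city": "Edmonton",
          "province": "AB", "age": 42, "income": 58000}
```

Every step discards detail, which is the trade-off between disclosure risk and analytic usefulness.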
Public Use Microdata
Statistics Canada PUMFs
– only available for select social surveys that undergo a review by the Data Release Committee, an internal Statistics Canada committee
– no enterprise public use microdata
Public Use Microdata
Statistics Canada PUMFs
– almost all are cross-sectional, that is, represent data collected at one point in time
– longitudinal data are difficult to anonymize while still retaining useful information
Public Use Microdata
Statistics Canada PUMFs
– how do you recognize a PUMF?
  Statistics Canada calls them public use microdata files in the Daily.
Statistics Canada Microdata
Other Microdata in Statistics Canada
– Master files: these are the confidential files from which public use microdata are created. They contain the fullness of the data captured about the unit of observation.
Statistics Canada Microdata
Other Microdata in Statistics Canada
– Share files: these are confidential files in which the respondents have signed a consent form permitting Statistics Canada to allow access for approved research to their information.
Product Types
Geography Files
– Census digital boundary and cartographic files in two proprietary formats: ArcView and MapInfo
– correspondence tables for linking between Postal Code geography and Census geography
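To show how a correspondence table is used, here is a minimal sketch that attaches a census area to records carrying a postal code. The postal codes, the area labels, and the one-to-one mapping are assumptions made for the example; a real correspondence file may relate one postal code to several census areas.

```python
# Hypothetical correspondence table: postal code -> census area label.
# The area labels are placeholders, not real census codes.
CORRESPONDENCE = {
    "T6G2E1": "AREA-A",
    "T5J0K1": "AREA-A",
    "M5S1A1": "AREA-B",
}


def normalize(postal_code):
    """Upper-case a postal code and strip spaces so 't6g 2e1' matches."""
    return postal_code.replace(" ", "").upper()


def attach_census_geography(records, table):
    """Add a 'census_area' to each record; None when no match exists."""
    linked = []
    for rec in records:
        area = table.get(normalize(rec["postal_code"]))
        linked.append({**rec, "census_area": area})
    return linked
```

Records with no match keep a `None` value rather than being dropped, so unmatched postal codes stay visible.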
Product Types
Digital Copies of Standardized Code Lists and Concordances
– Files containing standardized codes for industry, goods, and occupations
– correspondence tables between versions of standardized codes for industry and occupations
Data Service Models
Service models were presented as a continuum during the 1997 DLI workshop:
– “Order & Pass-through” Service
– Install Data and Provide Access
– Treat as a Collection and Provide Reference
Data Service Models
Choose a model that matches your staff and computing resources
Acquisition
– Fill a Request
  Locate data
  Order data & documentation
– Collection Development
  Select & Locate data
  Order data & documentation
  Catalogue data & documentation
  Install & Store (data & documentation)
Reference
– Search for data
– Interpret documentation
– Retrieve or download data
– Process data
  change formats
  subset cases or variables
  aggregate cases
  merge files
  analyze data
Find a referral partner on campus
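For services that go as far as "Process data", the listed operations can be sketched in a few lines. This is a minimal illustration on in-memory records, assuming invented variable names and values; in practice the work would more likely be done in a statistical package.

```python
from collections import defaultdict


def subset_cases(records, predicate):
    """Keep only the cases (records) for which predicate is true."""
    return [rec for rec in records if predicate(rec)]


def subset_variables(records, names):
    """Keep only the named variables in every record."""
    return [{name: rec[name] for name in names} for rec in records]


def aggregate_cases(records, key, value):
    """Sum one variable over all cases sharing the same key."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec[key]] += rec[value]
    return dict(totals)


def merge_files(left, right, key):
    """Merge two record lists on a shared key (right side assumed unique)."""
    lookup = {rec[key]: rec for rec in right}
    return [{**rec, **lookup[rec[key]]} for rec in left if rec[key] in lookup]


# Hypothetical records, invented for the example.
people = [
    {"id": 1, "prov": "AB", "age": 34, "hours": 40},
    {"id": 2, "prov": "AB", "age": 51, "hours": 20},
    {"id": 3, "prov": "ON", "age": 28, "hours": 35},
]
regions = [
    {"prov": "AB", "region": "Prairies"},
    {"prov": "ON", "region": "Central"},
]
```

For example, `aggregate_cases(people, "prov", "hours")` collapses the three cases into one total per province, and `merge_files(people, regions, "prov")` adds the region variable to each person.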
The Inventory Model
In the traditional inventory model, roughly half of the support goes to putting items on the shelf, while the other half goes to finding and getting the items off the shelf.
Source: Darlene Fichter
The Access Model
With the access model, support is split between getting information into a deliverable state and finding appropriate ways of retrieving and disseminating the information.
Access Models
The access models for data and statistics are not really that different from the models employed with bibliographic and full-text databases.
stand-alone workstation
local area network CD-server
campus network server
Internet server
Examples of Access Models
Let’s look at some technology-based examples of access models divided between:
– statistics and aggregate data, and
– microdata.
Stand-alone Workstation
Advantages
– install once with usually fewer problems
– usually fewer license issues
Disadvantages
– patron must come to the service
– queues may develop to use the workstation
Stand-alone Workstation
DLI Examples
– Statistics and Aggregate Data
  1996 Census CD-ROMs, Industrial Monitor, Inter-corporate Ownership, Canadian Business Patterns
– Microdata
  1996 Census Public Use Microdata Files
  a download station for data services staff to write files onto removable media
LAN CD Server
Advantages
– access for a larger number of concurrent users
– products not as ghettoized
Disadvantages
– patron may still have to come to the service
– LANs increase installation difficulties
LAN CD Server
DLI Examples
– Statistics and Aggregate Data
  1996 Census CD-ROMs, Industrial Monitor, Inter-corporate Ownership, Canadian Business Patterns (same examples)
– Microdata
  place copies of microdata files on a shared disk drive for patrons to analyze or to write onto removable media
Campus Network Server
Advantages
– access to largest number of concurrent users
– patron does not have to come to the service
Disadvantages
– licensing issues tend to increase
– helper apps must be widely installed
Campus Network Server
DLI Examples
– Statistics and Aggregate Data
  Beyond 20/20 files from the 1996 Census or Health Indicators (serve files, not necessarily applications)
– Microdata
  place copies of microdata files on an institutional file server for patrons to analyze or to write onto removable media
  use of data extraction tools
Internet Server
Advantages
– possible to integrate local and remote services through a common (seemingly seamless) point of access
– increases flexibility in the use of local hardware & storage
– creates sharing opportunities between institutions
Internet Server
Disadvantages
– increases dependence on the agenda of others to enhance and fix problems
– often must pay a subscription fee to use
– may increase licensing obligations
Internet Server
DLI Examples
– Statistics and Aggregate Data
  access to Internet database applications such as E-STAT and CHASS CANSIM II
– Microdata
  access to Internet data extraction tools such as IDSL, LANDRU, ISLAND, QWIFS, Sherlock, TDR
A Mixed Access Model
Many of us employ a mix of the above access methods. This depends upon:
– our institution’s technology mix
– our access to technology on our campus
– ways that we’ve handled different formats
Access/Dissemination Issues
Regardless of the access method used, certain issues apply in all instances.
– managing licenses
– determining dissemination options
Managing Licenses
What are the conditions of use specified in the license?
What type of identification or authentication is required?
Managing Licenses
DLI License
– must be an authorized user
  need to identify type of user
– has only conditional use of material
  need to restrict to non-commercial uses of material
– permits sharing among DLI member institutions
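One way to picture these conditions is as a check made before a file is released. The sketch below is purely illustrative: the user categories, the purpose labels, and the `may_release` function are hypothetical and are not the wording of the DLI license, which should be consulted directly.

```python
from dataclasses import dataclass

# Hypothetical user categories; the actual license wording may differ.
AUTHORIZED_TYPES = {"student", "faculty", "staff"}

# Hypothetical labels for non-commercial purposes.
PERMITTED_PURPOSES = {"teaching", "research"}


@dataclass
class Request:
    user_type: str                   # who is asking
    institution_is_dli_member: bool  # sharing is limited to member institutions
    purpose: str                     # what the material will be used for


def may_release(request):
    """Return True when the request passes these illustrative checks."""
    if not request.institution_is_dli_member:
        return False
    if request.user_type not in AUTHORIZED_TYPES:
        return False
    return request.purpose in PERMITTED_PURPOSES
```

Whatever form the check takes, it depends on the service being able to identify the type of user and the intended use.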
Managing Licenses
Product Licenses
– may restrict the use of the product
  e.g., Beyond 20/20: educational use only
– may restrict the number of copies that can be disseminated
– may prevent the distribution of a specific format for a product
  e.g., Oracle & World Trade Analyzer
Managing Licenses
Special Vendor Licenses
– may require a content license separate from the access method
  e.g., CHASS’ CANSIM access is based on the DLI license to provide access to the content in CANSIM, while the CHASS license is required to use their Internet access tool
Dissemination Options
Determining how to disseminate DLI products
– what are finding tools for locating DLI products at your institution?
– what are the access formats needed for your institution?
Dissemination Options
Finding Tools
– will the product be catalogued?
– will the product be associated with a specific service and/or workstation?
  e.g., located in Data Services or Reference
– will the product be listed on the library web site?
Dissemination Options
Access formats
– is there a format that is commonly requested at your institution?
  e.g., do most patrons want microdata in SPSS .sav files?
– is there a dissemination format that is required as part of your service?
  e.g., a format for a data extractor
Products, Service, Access
This concludes the discussion on DLI products, data service models, and access models. More will be said about reference and technical services for data later today.