Data for secondary analysis: the experience of the UK Data Archive
Hilary Beedham, UK Data Archive
2
Overview
Introduction to the UK Data Archive
What data do we disseminate?
What are the constraints and how do we manage them?
Technological solutions
3
Introduction to the UK Data Archive (1)
30 years of making data available for secondary analysis
Service primarily for education
Resource discovery
Data delivery can be:
  Data browsing (on-screen tables via Nesstar)
  Via the WWW (download service)
  On computer media such as CD
4
Introduction to the UK Data Archive (2)
Preservation for future use & support for depositors
On-line catalogue – HASSET thesaurus
Links to similar organisations world-wide
Integrated data catalogue searches across other archives – ELSST thesaurus
Training material and data workshops for users
5
What data do we disseminate?
Anonymised data
  Person
  Organisation
From a variety of sources:
  Government
  Research councils
  Charitable organisations & foundations
  Private organisations
6
What are the constraints?
Intellectual Property - copyright
Data protection and data
7
Intellectual Property
Data producers, either organisational or individual, have IP in their data: all methods of dissemination must acknowledge ownership
Data Archives (and other disseminators who are not owners of data) operate on the basis of licences which require ownership to be acknowledged by secondary users
8
Data Protection and Respondent Confidentiality
Clearly this is a critical area of concern for data producers and has implications for users of data
It is often cited as a reason for restricting access to microdata
However, under licence, the UK Data Archive has been making microdata available for over 30 years, latterly using web technologies
9
Managing intellectual property rights
Because we maintain structured catalogue records about the ownership of each dataset, our search and data browsing software creates on-screen copyright statements in Nesstar
For downloaded data, the appropriate citation is provided with the data file at the time of downloading
This also serves to reassure users that data are from a reliable source
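As a rough illustration of how one structured catalogue record could drive both outputs, the Python sketch below builds an on-screen copyright statement and a download citation from the same record. The field names, the example record and the wording of both statements are hypothetical; they are not taken from the UK Data Archive catalogue or from Nesstar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueRecord:
    # Hypothetical fields; a real catalogue record holds far more metadata.
    study_number: int
    title: str
    owner: str
    year: int
    distributor: str = "UK Data Archive"


def copyright_statement(rec):
    """Short statement that could be shown alongside on-screen tables."""
    return f"Copyright {rec.owner}. Data supplied under licence by {rec.distributor}."


def citation(rec):
    """Citation text that could accompany a downloaded data file."""
    return (
        f"{rec.owner} ({rec.year}). {rec.title} [data file]. "
        f"{rec.distributor} [distributor]. Study number {rec.study_number}."
    )


# Invented example record, for illustration only.
example = CatalogueRecord(
    study_number=1234,
    title="Example Household Survey",
    owner="Example Statistics Office",
    year=1999,
)
print(copyright_statement(example))
print(citation(example))
```

Keeping ownership in one record means the on-screen statement and the download citation cannot drift apart.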
10
Protecting respondent confidentiality (1)
We disseminate only anonymised data
Our data suppliers are responsible for ensuring that we only receive anonymised data
Nevertheless, our procedures for processing data for preservation and dissemination include checks for disclosive information
  Variable content
  Dataset and documentation combinations
11
Protecting respondent confidentiality (2)
By legal means: through licences with data providers and legally binding undertakings with data users
Users are required to agree:
  Not to attempt to identify individuals
  Not to attempt to gain information about individuals by combining with data from other sources
By careful & consistent recording of relevant information – about users, datasets and uses
12
Technological solutions
In the past, we relied on pen and paper to manage the undertakings with users.
It was a time-consuming process and meant users often waited weeks to gain access to data.
As a result of technological developments, users can register, print an access agreement from the web, sign and fax it to us and, subject to status, have access to datasets within hours of their request
13
Access control
For our web-based dissemination service we apply a sophisticated Access Control System:
  It can block access to unregistered users
  It can permit a user to browse the catalogue records but not the data
  Or it can permit a user to browse the catalogue and the data (create ad hoc tables on screen) but not download the data
  Or it will permit a user to download the data for use on their own machine
  In theory it will also permit differential access to individual variables within a file – we haven’t applied this but may in future
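The tiers listed above can be pictured as an ordered set of access levels. The Python sketch below is only an illustration of that idea: the level names, the action names and the is_permitted function are hypothetical and do not describe the Archive's actual Access Control System.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    """Hypothetical tiers, ordered from least to most access."""
    UNREGISTERED = 0  # blocked entirely
    CATALOGUE = 1     # may browse catalogue records, not the data
    BROWSE = 2        # may also create ad hoc tables on screen
    DOWNLOAD = 3      # may also download the data


# Minimum tier that each (hypothetical) action requires.
REQUIRED_LEVEL = {
    "view_catalogue": AccessLevel.CATALOGUE,
    "tabulate": AccessLevel.BROWSE,
    "download": AccessLevel.DOWNLOAD,
}


def is_permitted(level, action, variable=None, restricted_variables=frozenset()):
    """Return True if a user at `level` may perform `action`.

    `restricted_variables` models differential access to individual
    variables: any variable named in it is refused for this user,
    whatever their tier.
    """
    if action not in REQUIRED_LEVEL:
        raise ValueError(f"unknown action: {action!r}")
    if level < REQUIRED_LEVEL[action]:
        return False
    return variable is None or variable not in restricted_variables


print(is_permitted(AccessLevel.BROWSE, "tabulate"))   # True
print(is_permitted(AccessLevel.BROWSE, "download"))   # False
```

Ordering the tiers lets a single comparison decide most requests, while the per-variable set covers the differential access case separately.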
14
On-line statistical disclosure control
Real-time SDC systems are in existence in specialist fields such as medical statistics.
UKDA was a partner in a project which explored the development of such a system for the Nesstar software.
Further work is needed to develop this – feasibility was demonstrated but a number of problems need to be overcome before a service could be implemented
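To give a flavour of what a real-time check might do, the Python sketch below applies a minimum cell count to an ad hoc cross-tabulation and suppresses any cell that falls below it. A threshold rule is one common disclosure control technique; it is used here purely as an assumption and is not a description of the method explored in the Nesstar project. The function name, the variable names and the threshold of 5 are all hypothetical.

```python
from collections import Counter


def tabulate_with_suppression(records, row_var, col_var, min_count=5):
    """Cross-tabulate two variables and suppress small cells.

    Cells with fewer than `min_count` respondents are returned as None
    so that the count itself is never shown to the user.
    """
    counts = Counter((rec[row_var], rec[col_var]) for rec in records)
    return {
        cell: (n if n >= min_count else None)
        for cell, n in counts.items()
    }


# Invented records, for illustration only.
records = (
    [{"region": "A", "sex": "F"}] * 6
    + [{"region": "B", "sex": "M"}] * 2
)
table = tabulate_with_suppression(records, "region", "sex")
print(table)
```

Suppressing small cells alone is not always enough: if row and column totals are also published, a suppressed value can sometimes be recovered by subtraction, which is one reason a production service needs more than this.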
15
Contact details
Hilary Beedham
UK Data Archive
University of Essex
http://www.nesstar.com/