ASIS&T DATA RESEARCH ACCESS AND PRESERVATION SUMMIT
-LEGAL AND SOCIAL IMPLICATIONS OF SHARED COLLECTIONS-
PHOENIX, AZ, APRIL 10, 2010
MELISSA CRAGIN
CENTER FOR INFORMATICS RESEARCH IN SCIENCE AND SCHOLARSHIP
GSLIS – UIUC
Small science research and the data sharing strata
What are the social contexts under which research communities assemble to share and manage data?
Small Science
Investigating Data Communities
Data Curation Profiles
Implications for Data Management Systems
20-80 Rule: The small are big!
Total grants: 9,347
Total dollars: $2,137,636,716

                  20%                    80%
Number of grants  1,869                  7,478
Total dollars     $1,199,088,125         $938,548,595
Range             $6,892,810-$350,000    $350,000-$831
Heidorn, P. B. (2008).
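The arithmetic behind the 20-80 slide can be checked with a short sketch. The figures are taken from the table above; the variable and function names are illustrative assumptions, not part of the original analysis. Shares are computed against the stated totals, and the result shows that the smaller 80% of grants account for roughly 44% of the dollars.

```python
# Figures copied from the 20-80 table; names here are illustrative only.
total_grants = 9347
total_dollars = 2_137_636_716

large = {"grants": 1869, "dollars": 1_199_088_125}  # the "20%" column
small = {"grants": 7478, "dollars": 938_548_595}    # the "80%" column


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


small_grant_share = share(small["grants"], total_grants)
small_dollar_share = share(small["dollars"], total_dollars)
large_dollar_share = share(large["dollars"], total_dollars)

print(f"Small grants: {small_grant_share}% of grants, "
      f"{small_dollar_share}% of dollars")
print(f"Large grants: {large_dollar_share}% of dollars")
```

Read this way, "the small are big" means that small awards, taken together, represent a share of funding comparable to that of the large awards.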
Small science in flux
Traditional features
- single PI (often)
- often dependent on graduate students
- ad hoc data management systems
- idiosyncratic sharing practices
- “success” dependent on using one’s own data
But…
- may be producing all digital data
- may be working at community level
- may be conducting “data-driven” science
- may be producing very large data sets
Reference data sets needed even in small, specialized data communities
See, for example, Borgman, Wallis, & Enyedi (2007); Cragin et al. (in press)
Data Communities and Collections
institutional repository across fields
scholar created primary source collections
disciplinary resource
local, cross-departmental
geographically based, cross-disciplinary resource
national cultural heritage collection aggregation
national data cyberinfrastructure paradigm
Data Communities
While the conceptualization of ‘data community’ is still in a developmental phase, we have
identified at least three kinds quite clearly:
1.) sub-disciplines focused on particular kinds of data that support specific
measurements or analyses;
2.) scientists focused on a specific research problem (often interdisciplinary
in nature);
3.) researchers working to develop and use a shared, community-level data collection (i.e.,
a “Resource Collection,” NSB, 2005)
Data Curation Profiles – Data Conservancy
The Data Curation Profile
- supports the ongoing development of systems and policy
- assists in the facilitation of workflows and services for particular data types focal to use within the data community or sub-discipline
- provides crucial information; can serve as documentation for collection policy, including selection, appraisal, and retention guidelines
A template is currently in draft form and will be tested and revised in two stages of “use” tests:
- Production
- Application
Salient Features
- Description of Data
- Required Contextual Information
- Applicable Standards: links to formal, area-specific metadata, ontologies, etc.
- Intellectual Property & Access “Rules”: data owners, terms of use, attribution
- Anticipated User Support
- Re-appraisal Schedule: data provenance, version control, format migration
- Workflow for Ingest & Maintenance of this “kind” of data
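One way to picture the salient features is as a structured record with one slot per feature. The sketch below is a hypothetical illustration only: the class name, field names, and the `missing_sections` helper paraphrase the list above and are not the project's actual template. The single sample value is taken from the traffic-movement example elsewhere in this deck.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class DataCurationProfile:
    """Hypothetical record; field names paraphrase the slide's feature list."""
    description_of_data: str = ""
    required_contextual_information: str = ""
    applicable_standards: List[str] = field(default_factory=list)
    data_owners: List[str] = field(default_factory=list)
    terms_of_use: str = ""
    attribution: str = ""
    anticipated_user_support: str = ""
    reappraisal_schedule: str = ""
    ingest_and_maintenance_workflow: str = ""


def missing_sections(profile):
    """Return the names of profile sections that are still empty."""
    return [f.name for f in fields(profile) if not getattr(profile, f.name)]


# A partially completed draft profile.
draft = DataCurationProfile(
    description_of_data="cleaned and normalized sensor data",
)
print(missing_sections(draft))
```

A helper like `missing_sections` suggests how a draft profile could be checked for completeness during the planned “use” tests, though the real template may be organized quite differently.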
Data Curation Profiles Project
- research data management / metadata workflow
- policies for archiving and access
- system requirements for managing data in a repository
- librarians’ roles and skill sets to support archiving and sharing
Biochemistry
Biology
Civil Engineering
Electrical Engineering
Food Sciences
Earth and Atmospheric Sciences
Soil Science
Anthropology
Geology
Plant Sciences
Kinesiology
Speech and Hearing
Purdue University Libraries, D. Scott Brandt, PI; IMLS # LG-06-070032-07
Examples of what, and when

Field: Atmospheric science
  Specific research area: severe weather modeling
  Form to be shared: compressed output of the model
  Formats: Vis5D
  Type of data set: 1 file / dataset
  Size: 10-100 MB
  Shared when?: 4-6 month embargo

Field: Agronomy
  Specific research area: water quality, drainage, and plant growth
  Form to be shared: cleaned and reviewed sensor and hand-collected sample data
  Formats: .xls
  Type of data set: approx. 100 files
  Size: ~1 MB each, up to 20 MB
  Shared when?: After publication

Field: Geology
  Specific research area: rock, water and microbes
  Form to be shared: averaged sensor and hand-collected sample data; photographs
  Formats: .xls; jpg
  Type of data set: 1 file; images
  Size: < 1 MB
  Shared when?: After publication

Field: Civil Engineering
  Specific research area: traffic movement
  Form to be shared: cleaned and normalized sensor data
  Formats: MySQL (postgresql)
  Type of data set: 1 database
  Size: approx. 1000 K/day
  Shared when?: 1 month to 1 year embargo
Profiling data complexities & differences
Data Characteristics: Crystallography vs. Geobiology

Type
  Crystallography:
    1. “Raw data”: most information rich, long-term value for re-use
    …
    4. “CIF file” – crystallography exchange: most commonly shared data type
  Geobiology:
    1. “Reduced spreadsheet” – table with average values for multiple observations: most often requested by others

Intellectual Property/Data Owners
  Crystallography:
    Service model: provide a service to chemists by solving crystal structures
    Ownership of the data is ambiguous and requires negotiation before data “hand-off”
  Geobiology:
    Depends on source of funding: governmental and private grants, gov. institutions, industry
    Ownership of and rights to the data range from full to very limited; some long-term “embargoes”

Accessibility
  Crystallography:
    Field-wide repositories
    Many journals require deposit of CIF files
    OAI-PMH tools becoming available for CIF files
  Geobiology:
    Difficult and ad hoc
    Well-known researchers receive direct requests for data, often based on publications
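For readers unfamiliar with OAI-PMH, the following minimal sketch shows what a harvesting request looks like. `ListRecords`, `metadataPrefix`, `set`, and the `oai_dc` format are defined by the protocol; the base URL, the set name `cif`, and the helper function are assumptions made for illustration and do not refer to any actual crystallography repository.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    base_url and set_spec are hypothetical inputs; a real repository
    publishes its own endpoint and set names.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional selective-harvesting argument
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and set name, for illustration only.
url = list_records_url("https://repository.example.org/oai", set_spec="cif")
print(url)
```

The point of the contrast in the table is that a field with this kind of shared harvesting infrastructure can expose data systematically, whereas geobiology relies on direct, person-to-person requests.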
Sharing Practices
Distinguishing private exchange from open sharing
- exchange: sharing amongst collaborators is a primary concern, often with significant barriers
- (more) open access: limited by need for control and reward system, but also
Sharing with wider “publics” is conditioned by both data management pressures and personal experience
- the “known person – cost” algorithm
- incidents of misuse
What is most easily or willingly shared is not always the data that has the most re-use value
Distinguishing Public from Private Sharing
“Supplying” data – functional privacy
- targeted transfer of data to and from current collaborators or close colleagues
- distribution on request
- evaluation
Data exposure – “publication”
- general dissemination activity
- makes data accessible to the wider public
Implications for concerns with misuse
Misuse incidents experienced by the scientists in this study:
- influenced their views on the appeal of data sharing
- decreased their willingness to share
- increased cynicism about data sharing initiatives
- had a real impact on their behavior

- misappropriation
- lack of enforcement of institutional agreements
THANK YOU