Software as a Well-Formed Research Object
DLF 2017 Forum, Pittsburgh, PA
October 24, 2017
Yasmin AlNoamany, John Borghi, Alexandra Chassanoff, Katherine Thornton
Who we are
Yasmin AlNoamany, University of California, Berkeley
John Borghi, California Digital Library
Alex Chassanoff, MIT Libraries
Katherine Thornton, Yale University Library
Background
1st cohort of Software Curation Postdoctoral Fellows at CLIR
Spread across 2 coasts, 5 institutions
Wide range of areas being explored
Software Curation: Conceptual Challenges
What is software?
What is curation?
Software Curation: Social Challenges
Social
Software in Scholarly Communications
Software and Academic Incentives
Software Curation: Technical Challenges
● Identifying
  ○ execution environment
  ○ dependencies and integrated libraries
  ○ data
  ○ metadata
  ○ individual components
● Evolution
● Compatibility
● Migration
Image source: https://www.slideshare.net/robertodicosmo3/scilabtec-2015-48643729
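The identification challenge listed on this slide (execution environment and dependencies) can be illustrated with a minimal sketch. It assumes a Python-based research project and uses only the standard library; the function names and the shape of the record are hypothetical choices for illustration, not a method described by the presenters.

```python
import json
import platform
from importlib import metadata


def installed_packages():
    """Return installed distributions as sorted 'name==version' strings."""
    packages = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip entries with incomplete metadata
            packages.add(f"{name}=={version}")
    return sorted(packages, key=str.lower)


def describe_environment():
    """Collect a minimal record of the current execution environment."""
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "operating_system": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "dependencies": installed_packages(),
    }


if __name__ == "__main__":
    # Print the record as JSON so it can be stored alongside the code.
    print(json.dumps(describe_environment(), indent=2))
```

A record like this covers only part of the list above: it says nothing about data, metadata, or how the environment changes over time.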
Software Curation: Current Work
Survey of Researcher Practices and Perceptions: UC Berkeley and California Digital Library
Research Questions
1. How are researchers using software?
2. How do researchers share their software?
3. What do researchers value about their software?
Areas of interest
1. Software and reproducible research practices
2. Metrics for software
Background
1. Increasing agreement that software and research-related code are important scholarly products
2. Research into how research software is mentioned and cited
3. Surveys of practices and perceptions around other research products (e.g., data)
Survey Design
1. Goal was to capture as broad a view of researcher practices and perceptions as possible.
2. 56 questions
   a. 53 multiple choice
   b. 3 open response
Distribution
1. Approved by UC Berkeley IRB
2. Distributed via Qualtrics
Inclusion Criteria
1. Participant had to consent, be over the age of 18, and say that they use software during the course of their research
2. Participant had to complete at least the demographic section.
215 researchers responded
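The inclusion criteria above amount to a simple filter over survey responses. The sketch below shows one way such a filter could be expressed; the `Response` fields are hypothetical names chosen for illustration and do not reflect the actual Qualtrics export or the presenters' analysis code.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One survey response (hypothetical field names)."""
    consented: bool
    over_18: bool
    uses_software: bool
    completed_demographics: bool


def is_included(response: Response) -> bool:
    """Apply both inclusion criteria: eligibility and a completed demographic section."""
    eligible = response.consented and response.over_18 and response.uses_software
    return eligible and response.completed_demographics


def filter_responses(responses):
    """Keep only the responses that satisfy every criterion."""
    return [r for r in responses if is_included(r)]


# Illustrative records only, not survey data.
sample = [
    Response(True, True, True, True),    # meets all criteria
    Response(True, True, True, False),   # stopped before finishing demographics
    Response(True, True, False, True),   # does not use software in research
]
included = filter_responses(sample)
```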
Software Practices in Scientific Research
Overview of Software Practices in Scientific Research
Use of Research Software
Open Source versus Commercial
Coding Languages and Purpose
55.7% of researchers selected all five purposes
86.4% of all languages
Code Sharing Practices
Most of the time, researchers share source code via email
In what format do you typically share your code?
How do you share your code?
Some reasons:
● “Not elegant”
● “Licensing issues”
● “Time pressure, time it takes to tidy up and document code”
● “require 'cleanup' and better commenting”
Reproducibility Practices
CS researchers tend to provide information about dependencies more often than researchers in other disciplines
Do you share related files (e.g. datasets) with your code?
Do you provide information about dependencies?
Preservation Practices
76.2% of researchers use GitHub to preserve their code
Where do you save your code or software so that it is preserved over the long term?
How long do you typically save your code or software?
How do you use software or code in your research?
“Software is the main driver of my research and development program. I use it for everything from exploratory data analysis, to writing papers. Most of my research activities include the writing of code specifically aimed at the implementation of particular analytic methods.”
“I use code to document in a reproducible manner all steps of data analysis, from collecting data from where they are stored (databases, spreadsheets, plain text files, etc.) to preparing the final reports (i.e. a set of scripts can fully reproduce a report or manuscript given the raw data, with little human intervention).”
How do you define “sharing” and “preserving”?
“I think of sharing code as making it publicly accessible, but not necessarily advertising it. I think of preserving code as depositing it somewhere remotely, where I can't accidentally delete it. I realize that GitHub should not be the end goal of code preservation, but as of yet I have not taken steps to preserve my code anywhere more permanently than GitHub.”
“..."Sharing", to me, means that somebody else can discover and obtain the code, probably (but not necessarily) along with sufficient documentation to use it themselves. "Preserve" has stronger connotations. It implies a higher degree of documentation, both about the software itself, but also its history, requirements, dependencies, etc., and also feels more "official"- so my university's data repository feels more "preserve"-ish than my group's Github page.”
Conclusion
● Researchers consider software to be as important as data
● Most researchers do differentiate sharing from preservation, but they need tools and guidance on how to preserve their code
● Time and licensing are the main constraints on sharing software
Software Curation: Current Work
MIT Libraries
● Iterative approach
● Consider software
  ○ as an artifact with characteristics
  ○ as a research process
→ Software as a scholarly object in a digital scholarship ecosystem
● Software Curation Profiles
● Software Intake Form
Strategic thinking for institutions
● Define communities of practice
● Identify boundaries for software as a scholarly object
● Identify preservation outcomes + curation activities
----------------------------------------------------------
● Don’t Let Perfect Be the Enemy of Good
Software Curation: Current Work at Yale
Legacy software in library collections
CD-ROMs and floppy disks at risk of deterioration
Library might not have relevant computing platform
Cataloged according to principles of traditional MARC-based description
Emulation as a Service
http://bw-fla.uni-freiburg.de/
Developed by Albert-Ludwigs-Universität Freiburg
EaaS and Wikidata
Wikidata for Digital Preservation
Describing software, file formats, and configured environments in Wikidata
Proposing necessary properties to extend data models
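As a rough illustration of what describing software in Wikidata makes possible, the sketch below builds a SPARQL query for the public Wikidata Query Service. It uses existing Wikidata identifiers (P31 "instance of", Q7397 "software", P306 "operating system", P277 "programmed in"); the function names are hypothetical, and the query is an assumption about how such descriptions might be retrieved, not the data model or properties proposed by the presenters.

```python
from urllib.parse import urlencode

WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_software_query(limit: int = 10) -> str:
    """Return a SPARQL query listing software items with their operating system."""
    return f"""
SELECT ?software ?softwareLabel ?osLabel ?languageLabel WHERE {{
  ?software wdt:P31 wd:Q7397 .                  # instance of: software
  ?software wdt:P306 ?os .                      # operating system
  OPTIONAL {{ ?software wdt:P277 ?language . }}   # programmed in
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {int(limit)}
""".strip()


def build_request_url(query: str) -> str:
    """Encode the query as a GET request URL that asks for JSON results."""
    params = urlencode({"query": query, "format": "json"})
    return f"{WIKIDATA_SPARQL_ENDPOINT}?{params}"


if __name__ == "__main__":
    # Print the URL rather than sending the request, so the sketch runs offline.
    print(build_request_url(build_software_query(limit=5)))
```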
Thank you!
Yasmin yasminal@berkeley.edu
John john.borghi@ucop.edu
Alex achass@mit.edu
Katherine katherine.thornton@yale.edu