Software as a Well-Formed Research Object

DLF 2017 Forum, Pittsburgh, PA

October 24, 2017

Yasmin AlNoamany, John Borghi, Alexandra Chassanoff, Katherine Thornton

Who we are

Yasmin AlNoamany, University of California, Berkeley

John Borghi, California Digital Library

Alex Chassanoff, MIT Libraries

Katherine Thornton, Yale University Library

Background

1st cohort of Software Curation Postdoctoral Fellows at CLIR

Spread across 2 coasts, 5 institutions

Wide range of areas being explored

Software Curation: Conceptual Challenges

What is software?

Software Curation: Conceptual Challenges

What is curation?

Software Curation: Social Challenges

Social

Software in Scholarly Communications

Software and Academic Incentives

Software Curation: Technical Challenges

● Identifying
  ○ execution environment
  ○ dependencies and integrated libraries
  ○ data
  ○ metadata
  ○ individual components
● Evolution
● Compatibility
● Migration
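
As an illustration of the "Identifying" items above, the following is a minimal sketch of how a researcher might record an execution environment and its installed dependencies for a Python project. It is not taken from the talk: the function name `describe_environment` and the chosen fields are assumptions, and it uses only the Python standard library.

```python
import json
import platform
import sys
from importlib import metadata


def describe_environment():
    """Collect a minimal record of the current execution environment.

    Hypothetical helper for illustration; the field names are assumptions.
    """
    # Map each installed distribution name to its version string.
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with missing metadata
            packages[name] = dist.version

    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "operating_system": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "executable": sys.executable,
        "dependencies": [
            {"name": name, "version": packages[name]}
            for name in sorted(packages, key=str.lower)
        ],
    }


if __name__ == "__main__":
    # Print the record as JSON so it can be stored alongside the code.
    print(json.dumps(describe_environment(), indent=2))
```

A record like this covers only the environment and dependency items; data, metadata, and individual components would still need to be described separately.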

Image source: https://www.slideshare.net/robertodicosmo3/scilabtec-2015-48643729

Software Curation: Current Work

Survey of Researcher Practices and Perceptions: UC Berkeley and California Digital Library

Software Curation: Current Work

Research Questions

1. How are researchers using software?
2. How do researchers share their software?
3. What do researchers value about their software?

Areas of interest

1. Software and reproducible research practices
2. Metrics for software

Software Curation: Current Work

Background

1. Increasing agreement that software and research-related code are important scholarly products
2. Research into how research software is mentioned, cited
3. Surveys into practices and perceptions around other research products (e.g. Data)

Software Curation: Current Work

Survey Design

1. Goal was to capture as broad a view of researcher practices and perceptions as possible.
2. 56 questions
   a. 53 Multiple Choice
   b. 3 Open Response

Software Curation: Current Work

Distribution

1. Approved by UC Berkeley IRB
2. Distributed via Qualtrics

Inclusion Criteria

1. Participant had to consent, be over the age of 18, and say that they use software during the course of their research

2. Participant had to complete at least the demographic section.
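
The inclusion criteria above can be sketched as a simple filter over survey responses. This is a hypothetical illustration, not the authors' analysis code: the `Response` fields and the function name are assumptions, and the sample records are invented solely to show the filter working.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One survey response; all field names are assumptions for illustration."""
    consented: bool
    over_18: bool
    uses_software_in_research: bool
    completed_demographics: bool


def meets_inclusion_criteria(response: Response) -> bool:
    """Return True only if every stated inclusion criterion is satisfied."""
    return (
        response.consented
        and response.over_18
        and response.uses_software_in_research
        and response.completed_demographics
    )


# Invented sample records, used only to exercise the filter.
sample = [
    Response(True, True, True, True),    # included
    Response(True, True, False, True),   # excluded: does not use software
    Response(True, True, True, False),   # excluded: demographics incomplete
]
included = [r for r in sample if meets_inclusion_criteria(r)]
```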

215 researchers responded

Software Practices in Scientific Research

Overview of Software Practices in Scientific Research

Use of Research Software

Open Source versus Commercial

Coding Languages and Purpose

55.7% of researchers selected all five purposes

86.4% of all languages

Code Sharing Practices

Most of the time, researchers share source code via email

In what format do you typically share your code?

How do you share your code?

Some reasons:
● “Not elegant”
● “Licensing issues”
● “Time pressure, time it takes to tidy up and document code”
● “require 'cleanup' and better commenting”

Reproducibility Practices

CS researchers tend to provide information about dependencies more often than researchers in other disciplines

Do you share related files (e.g. datasets) with your code?

Do you provide information about dependencies?

Preservation Practices

76.2% of researchers use GitHub to preserve their code

Where do you save your code or software so that it is preserved over the long term?

How long do you typically save your code or software?

How do you use software or code in your research?

“Software is the main driver of my research and development program. I use it for everything from exploratory data analysis, to writing papers. Most of my research activities include the writing of code specifically aimed at the implementation of particular analytic methods.”

“I use code to document in a reproducible manner all steps of data analysis, from collecting data from where they are stored (databases, spreadsheets, plain text files, etc.) to preparing the final reports (i.e. a set of scripts can fully reproduce a report or manuscript given the raw data, with little human intervention).”

How do you define “sharing” and “preserving”?

“I think of sharing code as making it publicly accessible, but not necessarily advertising it. I think of preserving code as depositing it somewhere remotely, where I can't accidentally delete it. I realize that GitHub should not be the end goal of code preservation, but as of yet I have not taken steps to preserve my code anywhere more permanently than GitHub.”

“..."Sharing", to me, means that somebody else can discover and obtain the code, probably (but not necessarily) along with sufficient documentation to use it themselves. "Preserve" has stronger connotations. It implies a higher degree of documentation, both about the software itself, but also its history, requirements, dependencies, etc., and also feels more "official"- so my university's data repository feels more "preserve"-ish than my group's Github page.”

Conclusion

● Researchers consider software to be as important as data

● Most researchers do differentiate sharing from preservation, but they need tools and guidance on how to preserve their code

● Time and licensing are the main constraints on sharing software

Software Curation: Current Work

MIT Libraries

● Iterative approach
● Consider software
  ○ as an artifact with characteristics
  ○ as a research process

→ Software as a scholarly object in a digital scholarship ecosystem

Software Curation: Current Work

MIT Libraries

● Software Curation Profiles

● Software Intake Form

Software Curation: Current Work

Strategic thinking for institutions

● Define communities of practice
● Identify boundaries for software as a scholarly object
● Identify preservation outcomes + curation activities

----------------------------------------------------------

● Don’t Let Perfect Be the Enemy of Good

Software Curation: Current Work at Yale

Legacy software in library collections

CD-ROMs and floppy disks at risk of deterioration

Library might not have relevant computing platform

Cataloged according to principles of traditional MARC-based description

Emulation as a Service

http://bw-fla.uni-freiburg.de/

Developed by Albert-Ludwigs-Universität Freiburg

EaaS and Wikidata

Wikidata for Digital Preservation

Describing software, file formats, and configured environments in Wikidata

Proposing necessary properties to extend data models
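
To give a sense of what software descriptions in Wikidata look like when queried, here is a minimal sketch that builds a SPARQL query string for the Wikidata Query Service. It is an illustration only, not the data model discussed in the talk: the function name `build_software_query` is hypothetical, and the identifiers (Q7397 "software", P31 "instance of", P348 "software version identifier", P306 "operating system", P277 "programmed in") are existing Wikidata entities chosen for this example, not the newly proposed properties. The sketch does not send the query anywhere.

```python
def build_software_query(limit: int = 10) -> str:
    """Build a SPARQL query listing software items with optional details.

    Hypothetical helper; the wd:/wdt: prefixes are predefined by the
    Wikidata Query Service, so no PREFIX lines are included.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return "\n".join([
        "SELECT ?software ?softwareLabel ?version ?osLabel ?languageLabel WHERE {",
        "  ?software wdt:P31 wd:Q7397 .",                  # instance of: software
        "  OPTIONAL { ?software wdt:P348 ?version . }",    # software version identifier
        "  OPTIONAL { ?software wdt:P306 ?os . }",         # operating system
        "  OPTIONAL { ?software wdt:P277 ?language . }",   # programmed in
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }',
        "}",
        f"LIMIT {limit}",
    ])


if __name__ == "__main__":
    print(build_software_query(5))
```

Matching only direct instances of "software" is a simplification; many items are instances of more specific classes and would need a subclass path to be included.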

Thank you!

Yasmin yasminal@berkeley.edu

John john.borghi@ucop.edu

Alex achass@mit.edu

Katherine katherine.thornton@yale.edu

