Software as a Well-Formed Research Object DLF 2017 Forum Pittsburgh, PA October 24, 2017 Yasmin AlNoamany, John Borghi, Alexandra Chassanoff, Katherine Thornton



Software as a Well-Formed Research Object

DLF 2017 Forum, Pittsburgh, PA

October 24, 2017

Yasmin AlNoamany, John Borghi, Alexandra Chassanoff, Katherine Thornton


Who we are

Yasmin AlNoamany, University of California, Berkeley

John Borghi, California Digital Library

Alex Chassanoff, MIT Libraries

Katherine Thornton, Yale University Library


Background

1st cohort of Software Curation Postdoctoral Fellows at CLIR

Spread across 2 coasts, 5 institutions

Wide range of areas being explored


Software Curation: Conceptual Challenges

What is software?


Software Curation: Conceptual Challenges

What is curation?


Software Curation: Social Challenges

Social

Software in Scholarly Communications

Software and Academic Incentives


Software Curation: Technical Challenges

● Identifying
  ○ execution environment
  ○ dependencies and integrated libraries
  ○ data
  ○ metadata
  ○ individual components
● Evolution
● Compatibility
● Migration
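
As one illustration of the "identifying" challenge above, the following minimal sketch records the execution environment and installed dependencies of a Python-based research project. It is not taken from the talk: the function name `describe_environment` and the field names in its output are hypothetical, and it only covers Python packages visible to the running interpreter, not data, metadata, or system-level libraries.

```python
import json
import platform
import sys
from importlib import metadata


def describe_environment():
    """Return a dictionary describing the current Python execution environment.

    Hypothetical helper for illustration; the field names are assumptions.
    """
    # Collect name/version pairs for every installed distribution.
    dependencies = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata lacks a name
            dependencies[name] = dist.metadata.get("Version")

    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "operating_system": platform.system(),
        "machine": platform.machine(),
        "executable": sys.executable,
        "dependencies": dict(sorted(dependencies.items())),
    }


if __name__ == "__main__":
    # Print the description as JSON so it can be saved alongside the code.
    print(json.dumps(describe_environment(), indent=2))
```

Saving this output next to a script gives a later curator a starting point for rebuilding a compatible environment, although it does not by itself address evolution or migration.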

Image source: https://www.slideshare.net/robertodicosmo3/scilabtec-2015-48643729


Software Curation: Current Work

Survey of Researcher Practices and Perceptions: UC Berkeley and California Digital Library


Software Curation: Current Work

Research Questions

1. How are researchers using software?
2. How do researchers share their software?
3. What do researchers value about their software?

Areas of interest

1. Software and reproducible research practices
2. Metrics for software


Software Curation: Current Work

Background

1. Increasing agreement that software and research-related code are important scholarly products
2. Research into how research software is mentioned and cited
3. Surveys of practices and perceptions around other research products (e.g., data)


Software Curation: Current Work

Survey Design

1. Goal was to capture as broad a view of researcher practices and perceptions as possible.

2. 56 questions
   a. 53 Multiple Choice
   b. 3 Open Response


Software Curation: Current Work

Distribution

1. Approved by UC Berkeley IRB
2. Distributed via Qualtrics

Inclusion Criteria

1. Participant had to consent, be over the age of 18, and say that they use software during the course of their research

2. Participant had to complete at least the demographic section.


215 researcher respondents


Software Practices in Scientific Research


Overview of Software Practices in Scientific Research


Use of Research Software


Open Source versus Commercial


Coding Languages and Purpose


55.7% of researchers selected all five purposes

86.4% of all languages


Code Sharing Practices


Most of the time, researchers share source code via email

In what format do you typically share your code?

How do you share your code?


Some reasons:
● “Not elegant”
● “Licensing issues”
● “Time pressure, time it takes to tidy up and document code”
● “require 'cleanup' and better commenting”


Reproducibility Practices


CS researchers tend to provide information about dependencies more often than researchers in other disciplines

Do you share related files (e.g. datasets) with your code?

Do you provide information about dependencies?


Preservation Practices


76.2% of researchers use GitHub to preserve their code

Where do you save your code or software so that it is preserved over the long term?

How long do you typically save your code or software?


How do you use software or code in your research?

“Software is the main driver of my research and development program. I use it for everything from exploratory data analysis, to writing papers. Most of my research activities include the writing of code specifically aimed at the implementation of particular analytic methods.”

“I use code to document in a reproducible manner all steps of data analysis, from collecting data from where they are stored (databases, spreadsheets, plain text files, etc.) to preparing the final reports (i.e. a set of scripts can fully reproduce a report or manuscript given the raw data, with little human intervention).”


How do you define “sharing” and “preserving”?

“I think of sharing code as making it publicly accessible, but not necessarily advertising it. I think of preserving code as depositing it somewhere remotely, where I can't accidentally delete it. I realize that GitHub should not be the end goal of code preservation, but as of yet I have not taken steps to preserve my code anywhere more permanently than GitHub.”

“..."Sharing", to me, means that somebody else can discover and obtain the code, probably (but not necessarily) along with sufficient documentation to use it themselves. "Preserve" has stronger connotations. It implies a higher degree of documentation, both about the software itself, but also its history, requirements, dependencies, etc., and also feels more "official"- so my university's data repository feels more "preserve"-ish than my group's Github page.”


Conclusion

● Researchers consider software to be as important as data

● Most researchers do differentiate sharing from preservation, but they need tools and guidance on how to preserve their code

● Time and licensing are the main constraints on sharing software


Software Curation: Current Work

MIT Libraries

● Iterative approach
● Consider software
  ○ as an artifact with characteristics
  ○ as a research process

→ Software as a scholarly object in a digital scholarship ecosystem


Software Curation: Current Work

MIT Libraries

● Software Curation Profiles

● Software Intake Form


Software Curation: Current Work

Strategic thinking for institutions

● Define communities of practice
● Identify boundaries for software as a scholarly object
● Identify preservation outcomes + curation activities

----------------------------------------------------------

● Don’t Let Perfect Be the Enemy of Good


Software Curation: Current Work at Yale

Legacy software in library collections

CD-ROMs and floppy disks at risk of deterioration

Library might not have relevant computing platform

Cataloged according to principles of traditional MARC-based description


Emulation as a Service

http://bw-fla.uni-freiburg.de/

Developed by Albert-Ludwigs-Universität Freiburg


EaaS and Wikidata


Wikidata for Digital Preservation

Describing software, file formats, and configured environments in Wikidata

Proposing necessary properties to extend data models
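
To make the idea of software descriptions in Wikidata more concrete, here is a minimal sketch that builds a SPARQL query for items described as software, along with the language each is programmed in. It is an illustrative assumption rather than the data model discussed in the talk: it uses only existing Wikidata identifiers (`P31` "instance of", `Q7397` "software", `P277` "programmed in"), the helper names are hypothetical, and none of the newly proposed properties are represented.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_software_query(limit=10):
    """Build a SPARQL query listing software items and their programming languages.

    P31 = "instance of", Q7397 = "software", P277 = "programmed in".
    The choice of properties is an illustrative assumption.
    """
    return f"""
SELECT ?software ?softwareLabel ?languageLabel WHERE {{
  ?software wdt:P31 wd:Q7397 .
  OPTIONAL {{ ?software wdt:P277 ?language . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {int(limit)}
""".strip()


def build_request_url(query):
    """Return a GET URL that asks the endpoint for JSON results."""
    params = urlencode({"query": query, "format": "json"})
    return f"{WIKIDATA_SPARQL_ENDPOINT}?{params}"


if __name__ == "__main__":
    # No network request is made here; the URL can be opened in a browser
    # or passed to any HTTP client.
    print(build_request_url(build_software_query(limit=5)))
```

The sketch stops at building the request so that it stays runnable offline; the same pattern could be pointed at file formats or configured environments once the relevant properties are known.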

