Bram Zandbelt - Best Practices for Scientific Computing: Principles & MATLAB Examples

Journal Club - Best Practices for Scientific Computing


Page 1: Journal Club - Best Practices for Scientific Computing

Bram Zandbelt

Best Practices for Scientific Computing: Principles & MATLAB Examples

Page 2: Journal Club - Best Practices for Scientific Computing

Community Page

Best Practices for Scientific Computing

Greg Wilson1*, D. A. Aruliah2, C. Titus Brown3, Neil P. Chue Hong4, Matt Davis5, Richard T. Guy6¤, Steven H. D. Haddock7, Kathryn D. Huff8, Ian M. Mitchell9, Mark D. Plumbley10, Ben Waugh11, Ethan P. White12, Paul Wilson13

1 Mozilla Foundation, Toronto, Ontario, Canada; 2 University of Ontario Institute of Technology, Oshawa, Ontario, Canada; 3 Michigan State University, East Lansing, Michigan, United States of America; 4 Software Sustainability Institute, Edinburgh, United Kingdom; 5 Space Telescope Science Institute, Baltimore, Maryland, United States of America; 6 University of Toronto, Toronto, Ontario, Canada; 7 Monterey Bay Aquarium Research Institute, Moss Landing, California, United States of America; 8 University of California Berkeley, Berkeley, California, United States of America; 9 University of British Columbia, Vancouver, British Columbia, Canada; 10 Queen Mary University of London, London, United Kingdom; 11 University College London, London, United Kingdom; 12 Utah State University, Logan, Utah, United States of America; 13 University of Wisconsin, Madison, Wisconsin, United States of America

Introduction

Scientists spend an increasing amount of time building and using software. However, most scientists are never taught how to do this efficiently. As a result, many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. We describe a set of best practices for scientific software development that have solid foundations in research and experience, and that improve scientists' productivity and the reliability of their software.

Software is as important to modern scientific research as telescopes and test tubes. From groups that work exclusively on computational problems, to traditional laboratory and field scientists, more and more of the daily operation of science revolves around developing new algorithms, managing and analyzing the large amounts of data that are generated in single research projects, combining disparate datasets to assess synthetic problems, and other computational tasks.

Scientists typically develop their own software for these purposes because doing so requires substantial domain-specific knowledge. As a result, recent studies have found that scientists typically spend 30% or more of their time developing software [1,2]. However, 90% or more of them are primarily self-taught [1,2], and therefore lack exposure to basic software development practices such as writing maintainable code, using version control and issue trackers, code reviews, unit testing, and task automation.
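(Aside, in the spirit of this deck's "MATLAB Examples": a minimal sketch of one practice named above, unit testing. The helper zscoreColumns and its test are hypothetical illustrations written for this journal club, not code from the paper.)

% Contents of a file test_zscoreColumns.m: a tiny test that fails loudly
% if the helper stops doing what we claim it does.
function test_zscoreColumns
    X = [1 2; 3 4; 5 6];
    Z = zscoreColumns(X);
    assert(all(abs(mean(Z)) < 1e-12), 'columns should have mean 0');
    assert(all(abs(std(Z) - 1) < 1e-12), 'columns should have standard deviation 1');
    disp('test_zscoreColumns passed');
end

function Z = zscoreColumns(X)
    % Hypothetical helper under test: z-score each column of X.
    % Uses implicit expansion, so it assumes MATLAB R2016b or later.
    Z = (X - mean(X)) ./ std(X);
end

Calling test_zscoreColumns at the command prompt turns the informal claim "this helper standardizes columns" into a check that can be re-run after every change.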

We believe that software is just another kind of experimental apparatus [3] and should be built, checked, and used as carefully as any physical apparatus. However, while most scientists are careful to validate their laboratory and field equipment, most do not know how reliable their software is [4,5]. This can lead to serious errors impacting the central conclusions of published research [6]: recent high-profile retractions, technical comments, and corrections because of errors in computational methods include papers in Science [7,8], PNAS [9], the Journal of Molecular Biology [10], Ecology Letters [11,12], the Journal of Mammalogy [13], Journal of the American College of Cardiology [14], Hypertension [15], and The American Economic Review [16].

In addition, because software is often used for more than a single project, and is often reused by other scientists, computing errors can have disproportionate impacts on the scientific process. This type of cascading impact caused several prominent retractions when an error from another group's code was not discovered until after publication [6]. As with bench experiments, not everything must be done to the most exacting standards; however, scientists need to be aware of best practices both to improve their own approaches and for reviewing computational work by others.

This paper describes a set of practices that are easy to adopt and have proven effective in many research settings. Our recommendations are based on several decades of collective experience both building scientific software and teaching computing to scientists [17,18], reports from many other groups [19–25], guidelines for commercial and open source software development [26,27], and on empirical studies of scientific computing [28–31] and software development in general (summarized in [32]). None of these practices will guarantee efficient, error-free software development, but used in concert they will reduce the number of errors in scientific software, make it easier to reuse, and save the authors of the software time and effort that can be used for focusing on the underlying scientific questions.

Our practices are summarized in Box 1; labels in the main text such as "(1a)" refer to items in that summary. For reasons of space, we do not discuss the equally important (but independent) issues of reproducible research, publication and citation of code and data, and open science. We do believe, however, that all of these will be much easier to implement if scientists have the skills we describe.

The Community Page is a forum for organizations and societies to highlight their efforts to enhance the dissemination and value of scientific knowledge.

Citation: Wilson G, Aruliah DA, Brown CT, Chue Hong NP, Davis M, et al. (2014) Best Practices for Scientific Computing. PLoS Biol 12(1): e1001745. doi:10.1371/journal.pbio.1001745

Academic Editor: Jonathan A. Eisen, University of California Davis, United States of America

Published January 7, 2014

Copyright: © 2014 Wilson et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: Neil Chue Hong was supported by the UK Engineering and Physical Sciences Research Council (EPSRC) Grant EP/H043160/1 for the UK Software Sustainability Institute. Ian M. Mitchell was supported by NSERC Discovery Grant #298211. Mark Plumbley was supported by EPSRC through a Leadership Fellowship (EP/G007144/1) and a grant (EP/H043101/1) for SoundSoftware.ac.uk. Ethan White was supported by a CAREER grant from the US National Science Foundation (DEB 0953694). Greg Wilson was supported by a grant from the Sloan Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The lead author (GVW) is involved in a pilot study of code review in scientific computing with PLOS Computational Biology.

* E-mail: [email protected]

¤ Current address: Microsoft, Inc., Seattle, Washington, United States of America


Best Practices for Scientific Computing

Wilson et al. (2014) Best Practices for Scientific Computing. PLoS Biol, 12(1), e1001745.

Page 3: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

Why is this important?

We rely heavily on research software: no software, no research

http://www.software.ac.uk/blog/2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers

2014 survey among 417 UK researchers

Page 4: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

https://innoscholcomm.silk.co/

Tools that researchers use in practice

Page 5: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

Why is this important?

We rely heavily on research software: no software, no research

Many of us write software ourselves: from single command-line scripts to multi-gigabyte code repositories

Page 6: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

adapted from Leek & Peng (2015) Nature, 520 (7549), 612.

P values are just the tip of the iceberg

Ridding science of shoddy statistics will require scrutiny of every step, not merely the last one, say Jeffrey T. Leek and Roger D. Peng.

There is no statistic more maligned than the P value. Hundreds of papers and blog posts have been written about what some statisticians deride as 'null hypothesis significance testing' (NHST; see, for example, go.nature.com/pfvgqe). NHST deems whether the results of a data analysis are important on the basis of whether a summary statistic (such as a P value) has crossed a threshold. Given the discourse, it is no surprise that some hailed as a victory the banning of NHST methods (and all of statistical inference) in the journal Basic and Applied Social Psychology in February [1].

Such a ban will in fact have scant effect on the quality of published science. There are many stages to the design and analysis of a successful study (see 'Data pipeline'). The last of these steps is the calculation of an inferential statistic such as a P value, and the application of a 'decision rule' to it (for example, P < 0.05). In practice, decisions that are made earlier in data analysis have a much greater impact on results: from experimental design to batch effects, lack of adjustment for confounding factors, or simple measurement error. Arbitrary levels of statistical significance can be achieved by changing the ways in which data are cleaned, summarized or modelled [2].
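(Aside, with a MATLAB sketch on synthetic data: the point above is that an upstream cleaning decision, not the final decision rule, can determine which side of P < 0.05 a result lands on. The data, the outlier rule, and the normal-approximation P value below are illustrative assumptions, not taken from the article.)

% Thirty noisy measurements of a true effect of 0.4, with one corrupted reading.
rng(1);                                  % fix the seed so the example is repeatable
x    = 0.4 + randn(30, 1);
x(1) = -4;                               % a single contaminated data point

% Two-sided P value for "mean differs from 0", via a normal approximation
% (base MATLAB only; no Statistics Toolbox needed).
pval = @(v) erfc(abs(mean(v)) ./ (std(v) ./ sqrt(numel(v))) ./ sqrt(2));

keep      = abs(x - median(x)) < 3;      % one plausible outlier rule among many
p_all     = pval(x);                     % analysis with no cleaning step
p_cleaned = pval(x(keep));               % analysis after the cleaning step

fprintf('P with all data:         %.3f\n', p_all);
fprintf('P after outlier removal: %.3f\n', p_cleaned);

Both P values come from the same raw measurements; only the cleaning rule differs, and with a contaminant this large the two analyses will typically land on opposite sides of the 0.05 threshold.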

P values are an easy target: being widely used, they are widely abused. But, in practice, deregulating statistical significance opens the door to even more ways to game statistics, intentionally or unintentionally, to get a result. Replacing P values with Bayes factors or another statistic is ultimately about choosing a different trade-off of true positives and false positives. Arguing about the P value is like focusing on a single misspelling, rather than on the faulty logic of a sentence.

Better education is a start. Just as anyone who does DNA sequencing or remote sensing has to be trained to use a machine, so too anyone who analyses data must be trained in the relevant software and concepts. Even investigators who supervise data analysis should be required by their funding agencies and institutions to complete training in understanding the outputs and potential problems with an analysis.

There are online courses specifically designed to address this crisis. For example, the Data Science Specialization, offered by Johns Hopkins University in Baltimore, Maryland, and Data Carpentry, can easily be integrated into training and research. It is increasingly possible to learn to use the computing tools relevant to specific disciplines; training in Bioconductor, Galaxy and Python is included in Johns Hopkins' Genomic Data Science Specialization, for instance.

But education is not enough. Data analysis is taught through an apprenticeship model, and different disciplines develop their own analysis subcultures. Decisions are based on cultural conventions in specific communities rather than on empirical evidence. For example, economists call data measured over time 'panel data', to which they frequently apply mixed-effects models. Biomedical scientists refer to the same type of data structure as 'longitudinal data', and often go at it with generalized estimating equations.

Statistical research largely focuses on mathematical statistics, to the exclusion of the behaviour and processes involved in data analysis. To solve this deeper problem, we must study how people perform data analysis in the real world. What sets them up for success, and what for failure? Controlled experiments have been done in visualization [3] and risk interpretation [4] to evaluate how humans perceive and interact with data and statistics. More recently, we and others have been studying the entire analysis pipeline. We found, for example, that recently trained data analysts do not know how to infer P values from plots of data [5], but they can learn to do so with practice.

The ultimate goal is evidence-based data analysis [6]. This is analogous to evidence-based medicine, in which physicians are encouraged to use only treatments for which efficacy has been proved in controlled trials. Statisticians and the people they teach and collaborate with need to stop arguing about P values, and prevent the rest of the iceberg from sinking science.

Jeffrey T. Leek and Roger D. Peng are associate professors of biostatistics at the Johns Hopkins Bloomberg School of Public Health in Baltimore, Maryland, USA. e-mail: [email protected]

1. Trafimow, D. & Marks, M. Basic Appl. Soc. Psych. 37, 1–2 (2015).

2. Simmons, J. P., Nelson, L. D. & Simonsohn, U. Psychol. Sci. 22, 1359–1366 (2011).

3. Cleveland, W. S. & McGill, R. Science 229, 828–833 (1985).

4. Kahneman, D. & Tversky, A. Econometrica 47, 263–291 (1979).

5. Fisher, A., Anderson, G. B., Peng, R. & Leek, J. PeerJ 2, e589 (2014).

6. Leek, J. T. & Peng, R. D. Proc. Natl Acad. Sci. USA 112, 1645–1646 (2015).

DATA PIPELINE (figure): The design and analysis of a successful study has many stages, all of which need policing. The pipeline runs: experimental design, data collection, raw data, data cleaning, tidy data, exploratory data analysis, potential statistical models, statistical modelling, summary statistics, inference, P value. The P value at the end receives extreme scrutiny; the earlier stages receive little debate.


Page 7: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

Why is this important?

2014 survey among 417 UK researchers

http://www.software.ac.uk/blog/2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers

We rely heavily on research software: no software, no research

Many of us write software ourselves: from single command-line scripts to multi-gigabyte code repositories

Page 8: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

Why is this important?

2014 survey among 417 UK researchers

http://www.software.ac.uk/blog/2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers

We rely heavily on research software: no software, no research

Many of us write software ourselves: from single command-line scripts to multi-gigabyte code repositories

Not all of us have had formal training: most of us have learned it "on the street"

Page 9: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

Why is this important?


Until recently, Geoffrey Chang's career was on a trajectory most young scientists only dream about. In 1999, at the age of 28, the protein crystallographer landed a faculty position at the prestigious Scripps Research Institute in San Diego, California. The next year, in a ceremony at the White House, Chang received a Presidential Early Career Award for Scientists and Engineers, the country's highest honor for young researchers. His lab generated a stream of high-profile papers detailing the molecular structures of important proteins embedded in cell membranes.

Then the dream turned into a nightmare. In September, Swiss researchers published a paper in Nature that cast serious doubt on a protein structure Chang's group had described in a 2001 Science paper. When he investigated, Chang was horrified to discover that a homemade data-analysis program had flipped two columns of data, inverting the electron-density map from which his team had derived the final protein structure. Unfortunately, his group had used the program to analyze data for other proteins. As a result, on page 1875, Chang and his colleagues retract three Science papers and report that two papers in other journals also contain erroneous structures.

"I've been devastated," Chang says. "I hope people will understand that it was a mistake, and I'm very sorry for it." Other researchers don't doubt that the error was unintentional, and although some say it has cost them time and effort, many praise Chang for setting the record straight promptly and forthrightly. "I'm very pleased he's done this because there has been some confusion" about the original structures, says Christopher Higgins, a biochemist at Imperial College London. "Now the field can really move forward."

The most influential of Chang's retracted publications, other researchers say, was the 2001 Science paper, which described the structure of a protein called MsbA, isolated from the bacterium Escherichia coli. MsbA belongs to a huge and ancient family of molecules that use energy from adenosine triphosphate to transport molecules across cell membranes. These so-called ABC transporters perform many essential biological duties and are of great clinical interest because of their roles in drug resistance. Some pump antibiotics out of bacterial cells, for example; others clear chemotherapy drugs from cancer cells. Chang's MsbA structure was the first molecular portrait of an entire ABC transporter, and many researchers saw it as a major contribution toward figuring out how these crucial proteins do their jobs. That paper alone has been cited by 364 publications, according to Google Scholar.

Two subsequent papers, both now being retracted, describe the structure of MsbA from other bacteria, Vibrio cholerae (published in the Journal of Molecular Biology in 2003) and Salmonella typhimurium (published in Science in 2005). The other retractions, a 2004 paper in the Proceedings of the National Academy of Sciences and a 2005 Science paper, described EmrE, a different type of transporter protein.

Crystallizing and obtaining structures of five membrane proteins in just over 5 years was an incredible feat, says Chang's former postdoc adviser Douglas Rees of the California Institute of Technology in Pasadena. Such proteins are a challenge for crystallographers because they are large, unwieldy, and notoriously difficult to coax into the crystals needed for x-ray crystallography. Rees says determination was at the root of Chang's success: "He has an incredible drive and work ethic. He really pushed the field in the sense of getting things to crystallize that no one else had been able to do." Chang's data are good, Rees says, but the faulty software threw everything off.

Ironically, another former postdoc in Rees's lab, Kaspar Locher, exposed the mistake. In the 14 September issue of Nature, Locher, now at the Swiss Federal Institute of Technology in Zurich, described the structure of an ABC transporter called Sav1866 from Staphylococcus aureus. The structure was dramatically, and unexpectedly, different from that of MsbA. After pulling up Sav1866 and Chang's MsbA from S. typhimurium on a computer screen, Locher says he realized in minutes that the MsbA structure was inverted. Interpreting the "hand" of a molecule is always a challenge for crystallographers, Locher notes, and many mistakes can lead to an incorrect mirror-image structure. Getting the wrong hand is "in the category of monumental blunders," Locher says.

On reading the Nature paper, Chang quickly traced the mix-up back to the analysis program, which he says he inherited from another lab. Locher suspects that Chang would have caught the mistake if he'd taken more time to obtain a higher resolution structure. "I think he was under immense pressure to get the first structure, and that's what made him push the limits of his data," he says. Others suggest that Chang might have caught the problem if he'd paid closer attention to biochemical findings that didn't jibe well with the MsbA structure. "When the first structure came out, we and others said, 'We really ...

SCIENTIFIC PUBLISHING

A Scientist's Nightmare: Software Problem Leads to Five Retractions

[Figure] Flipping fiasco. The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right). Credit: R. J. P. Dawson and K. P. Locher, Nature 443, 180 (2006).

Miller (2006) Science, 314 (5807), 1856-1857.

We rely heavily on research software: no software, no research

Many of us write software ourselves: from single command-line scripts to multi-gigabyte code repositories

Not all of us have had formal training: most of us have learned it "on the street"

Poor coding practices can lead to serious problems. Worst case, perhaps: one coding error led to five retractions (3 in Science, 1 in PNAS, 1 in the Journal of Molecular Biology); a minimal sketch of that class of error follows below.
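A minimal MATLAB sketch of the class of error behind the retractions described above: swapping two columns of a coordinate matrix is a reflection, so it silently turns a structure into its mirror image. The helix data are synthetic and purely illustrative; this is not the crystallography code that was actually involved.

% A right-handed helix as a stand-in 3D structure, one point per row (x, y, z).
t   = linspace(0, 4*pi, 200)';
xyz = [cos(t), sin(t), t/(2*pi)];

% The bug: two data columns get swapped somewhere in the pipeline.
xyzBug = xyz(:, [2 1 3]);

% Swapping two columns multiplies the coordinates by a permutation matrix
% with determinant -1, i.e. a reflection, which inverts handedness.
P = [0 1 0; 1 0 0; 0 0 1];
fprintf('det(P) = %g\n', det(P));        % prints -1

figure; hold on; grid on; axis equal;
plot3(xyz(:,1),    xyz(:,2),    xyz(:,3),    'b-');
plot3(xyzBug(:,1), xyzBug(:,2), xyzBug(:,3), 'r--');
legend('original (right-handed)', 'columns swapped (mirror image)');
view(3);

No error message is raised and every downstream computation still runs, which is exactly why this kind of mistake is hard to catch without explicit checks.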

Page 10: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

Why now?

Call for increased scrutiny and openness: following cases of scientific misconduct, widespread questionable research practices, and reproducibility crises

Page 11: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

Why now?

Ince et al. (2012) Nature, 482 (7386), 485-488.

PERSPECTIVE doi:10.1038/nature10836

The case for open computer programs
Darrel C. Ince1, Leslie Hatton2 & John Graham-Cumming3

Scientific communication relies on evidence that cannot be entirely included in publications, but the rise of computational science has added a new layer of inaccessibility. Although it is now accepted that data should be made available on request, the current regulations regarding the availability of software are inconsistent. We argue that, with some exceptions, anything less than the release of source programs is intolerable for results that depend on computation. The vagaries of hardware, software and natural language will always ensure that exact reproducibility remains uncertain, but withholding code increases the chances that efforts to reproduce results will fail.

The rise of computational science has led to unprecedented opportunities for scientific advance. Ever more powerful computers enable theories to be investigated that were thought almost intractable a decade ago, robust hardware technologies allow data collection in the most inhospitable environments, more data are collected, and an increasingly rich set of software tools are now available with which to analyse computer-generated data.

However, there is the difficulty of reproducibility, by which we mean the reproduction of a scientific paper's central finding, rather than exact replication of each specific numerical result down to several decimal places. We examine the problem of reproducibility (for an early attempt at solving it, see ref. 1) in the context of openly available computer programs, or code. Our view is that we have reached the point that, with some exceptions, anything less than release of actual source code is an indefensible approach for any scientific results that depend on computation, because not releasing such code raises needless, and needlessly confusing, roadblocks to reproducibility.

At present, debate rages on the need to release computer programs associated with scientific experiments [2-4], with policies still ranging from mandatory total release to the release only of natural language descriptions, that is, written descriptions of computer program algorithms. Some journals have already changed their policies on computer program openness; Science, for example, now includes code in the list of items that should be supplied by an author [5]. Other journals promoting code availability include Geoscientific Model Development, which is devoted, at least in part, to model description and code publication, and Biostatistics, which has appointed an editor to assess the reproducibility of the software and data associated with an article [6].

In contrast, less stringent policies are exemplified by statements such as [7] "Nature does not require authors to make code available, but we do expect a description detailed enough to allow others to write their own code to do similar analysis." Although Nature's broader policy states that "...authors are required to make materials, data and associated protocols promptly available to readers...", and editors and referees are fully empowered to demand and evaluate any specific code, we believe that its stated policy on code availability actively hinders reproducibility.

Much of the debate about code transparency involves the philosophy of science, error validation and research ethics [8,9], but our contention is more practical: that the cause of reproducibility is best furthered by focusing on the dissection and understanding of code, a sentiment already appreciated by the growing open-source movement [10]. Dissection and understanding of open code would improve the chances of both direct and indirect reproducibility. Direct reproducibility refers to the recompilation and rerunning of the code on, say, a different combination of hardware and systems software, to detect the sort of numerical computation [11,12] and interpretation [13] problems found in programming languages, which we discuss later. Without code, direct reproducibility is impossible. Indirect reproducibility refers to independent efforts to validate something other than the entire code package, for example a subset of equations or a particular code module. Here, before time-consuming reprogramming of an entire model, researchers may simply want to check that incorrect coding of previously published equations has not invalidated a paper's result, to extract and check detailed assumptions, or to run their own code against the original to check for statistical validity and explain any discrepancies.

Any debate over the difficulties of reproducibility (which, as we will show, are non-trivial) must of course be tempered by recognizing the undeniable benefits afforded by the explosion of internet facilities and the rapid increase in raw computational speed and data-handling capability that has occurred as a result of major advances in computer technology [14]. Such advances have presented science with a great opportunity to address problems that would have been intractable in even the recent past. It is our view, however, that the debate over code release should be resolved as soon as possible to benefit fully from our novel technical capabilities. On their own, finer computational grids, longer and more complex computations and larger data sets, although highly attractive to scientific researchers, do not resolve underlying computational uncertainties of proven intransigence and may even exacerbate them.

Although our arguments are focused on the implications of Nature's code statement, it is symptomatic of a wider problem: the scientific community places more faith in computation than is justified. As we outline below and in two case studies (Boxes 1 and 2), ambiguity in its many forms and numerical errors render natural language descriptions insufficient and, in many cases, unintentionally misleading.

The failure of code descriptions

The curse of ambiguity

Ambiguity in program descriptions leads to the possibility, if not the certainty, that a given natural language description can be converted into computer code in various ways, each of which may lead to different numerical outcomes. Innumerable potential issues exist, but might include mistaken order of operations, reference to different model versions, or unclear calculations of uncertainties. The problem of ambiguity has haunted software development from its earliest days.

Ambiguity can occur at the lexical, syntactic or semantic level [15] and is not necessarily the result of incompetence or bad practice. It is a natural consequence of using natural language [16] and is unavoidable.

1 Department of Computing, Open University, Walton Hall, Milton Keynes MK7 6AA, UK. 2 School of Computing and Information Systems, Kingston University, Kingston KT1 2EE, UK. 3 83 Victoria Street, London SW1H 0HW, UK.


Call for increased scrutiny and openness: following cases of scientific misconduct, widespread questionable research practices, and reproducibility crises

Page 12: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

Why now?

Peng (2011) Science, 334 (6060), 1226-1227.

PERSPECTIVE

Reproducible Research in Computational Science
Roger D. Peng

Computational science has led to exciting new developments, but the nature of the work has exposed limitations in our ability to evaluate published findings. Reproducibility has the potential to serve as a minimum standard for judging scientific claims when full independent replication of a study is not possible.

The rise of computational science has led to exciting and fast-moving developments in many scientific areas. New technologies, increased computing power, and methodological advances have dramatically improved our ability to collect complex high-dimensional data (1, 2). Large data sets have led to scientists doing more computation, as well as researchers in computationally oriented fields directly engaging in more science. The availability of large public databases has allowed for researchers to make meaningful scientific contributions without using the traditional tools of a given field. As an example of this overall trend, the Sloan Digital Sky Survey, a large publicly available astronomical survey of the Northern Hemisphere, was ranked the most cited observatory (3), allowing astronomers without telescopes to make discoveries using data collected by others. Similar developments can be found in fields such as biology and epidemiology.

Replication is the ultimate standard by which scientific claims are judged. With replication, independent investigators address a scientific hypothesis and build up evidence for or against it. The scientific community's "culture of replication" has served to quickly weed out spurious claims and enforce on the community a disciplined approach to scientific discovery. However, with computational science and the corresponding collection of large and complex data sets the notion of replication can be murkier. It would require tremendous resources to independently replicate the Sloan Digital Sky Survey. Many studies, for example in climate science, require computing power that may not be available to all researchers. Even if computing and data size are not limiting factors, replication can be difficult for other reasons. In environmental epidemiology, large cohort studies designed to examine subtle health effects of environmental pollutants can be very expensive and require long follow-up times. Such studies are difficult to replicate because of time and expense, especially in the time frame of policy decisions that need to be made regarding regulation (2).

Researchers across a range of computational science disciplines have been calling for reproducibility, or reproducible research, as an attainable minimum standard for assessing the value of scientific claims, particularly when full independent replication of a study is not feasible (4-8). The standard of reproducibility calls for the data and the computer code used to analyze the data be made available to others. This standard falls short of full replication because the same data are analyzed again, rather than analyzing independently collected data. However, under this standard, limited exploration of the data and the analysis code is possible and may be sufficient to verify the quality of the scientific claims. One aim of the reproducibility standard is to fill the gap in the scientific evidence-generating process between full replication of a study and no replication. Between these two extreme end points, there is a spectrum of possibilities, and a study may be more or less reproducible than another depending on what data and code are made available (Fig. 1). A recent review of microarray gene expression analyses found that studies were either not reproducible, partially reproducible with some discrepancies, or reproducible. This range was largely explained by the availability of data and metadata (9).

The reproducibility standard is based on the fact that every computational experiment has, in theory, a detailed log of every action taken by the computer. Making these computer codes available to others provides a level of detail regarding the analysis that is greater than the analogous non-computational experimental descriptions printed in journals using a natural language.

A critical barrier to reproducibility in many cases is that the computer code is no longer available. Interactive software systems often used for exploratory data analysis typically do not keep track of users' actions in any concrete form. Even if researchers use software that is run by written code, often multiple packages are used, and the code that combines the different results together is not saved (10). Addressing this problem will require either changing the behavior of the software systems themselves or getting researchers to use other software systems that are more amenable to reproducibility. Neither is likely to happen quickly; old habits die hard, and many will be unwilling to discard the hours spent learning existing systems. Non-open source software can only be changed by their owners, who may not perceive reproducibility as a high priority.

In order to advance reproducibility in computational science, contributions will need to come from multiple directions. Journals can play a role here as part of a joint effort by the scientific community. The journal Biostatistics, for which I am an associate editor, has implemented a policy for encouraging authors of accepted papers to make their work reproducible by others (11). Authors can submit their code or data to the journal for posting as supporting online material and can additionally request a "reproducibility review," in which the associate editor for reproducibility runs the submitted code on the data and verifies that the code produces the results published in the article. Articles with data or code receive a "D" or "C" kite-mark, respectively, printed on the first page of the article itself. Articles that have passed the reproducibility review receive an "R." The policy was implemented in July 2009, and as of July 2011, 21 of 125 articles have been published with a kite-mark, including five articles with an "R." The articles have reflected a range of topics from biostatistical methods, epidemiology, and genomics. In this admittedly small sample, we ...

Data Replication & Reproducibility

Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore MD 21205, USA. To whom correspondence should be addressed. E-mail: [email protected]

[Figure] Fig. 1. The spectrum of reproducibility, running from 'publication only' (not reproducible), through 'publication + code' and 'publication + code and data', to 'publication + linked and executable code and data', and finally 'full replication' (the gold standard).


Call for increased scrutiny and openness: following cases of scientific misconduct, widespread questionable research practices, and reproducibility crises

Page 13: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

Why now?

SCIENTIFIC STANDARDS | POLICY

Promoting an open research culture
By B. A. Nosek,* G. Alter, G. C. Banks, D. Borsboom, S. D. Bowman, S. J. Breckler, S. Buck, C. D. Chambers, G. Chin, G. Christensen, M. Contestabile, A. Dafoe, E. Eich, J. Freese, R. Glennerster, D. Goroff, D. P. Green, B. Hesse, M. Humphreys, J. Ishiyama, D. Karlan, A. Kraut, A. Lupia, P. Mabry, T. Madon, N. Malhotra, E. Mayo-Wilson, M. McNutt, E. Miguel, E. Levy Paluck, U. Simonsohn, C. Soderberg, B. A. Spellman, J. Turitto, G. VandenBos, S. Vazire, E. J. Wagenmakers, R. Wilson, T. Yarkoni

Author guidelines for journals could help to promote transparency, openness, and reproducibility.

Transparency, openness, and reproducibility are readily recognized as vital features of science (1, 2). When asked, most scientists embrace these features as disciplinary norms and values (3). Therefore, one might expect that these valued features would be routine in daily practice. Yet, a growing body of evidence suggests that this is not the case (4-6).

A likely culprit for this disconnect is an academic reward system that does not sufficiently incentivize open practices (7). In the present reward system, emphasis on innovation may undermine practices that support verification. Too often, publication requirements (whether actual or perceived) fail to encourage transparent, open, and reproducible science (2, 4, 8, 9). For example, in a transparent science, both null results and statistically significant results are made available and help others more accurately assess the evidence base for a phenomenon. In the present culture, however, null results are published less frequently than statistically significant results (10) and are, therefore, more likely inaccessible and lost in the "file drawer" (11). The situation is a classic collective action problem. Many individual researchers lack [...]

[...] and neutral resource that supports and complements efforts of the research enterprise and its key stakeholders.

Universities should insist that their faculties and students are schooled in the ethics of research, their publications feature neither honorific nor ghost authors, their public information offices avoid hype in publicizing findings, and suspect research is promptly and thoroughly investigated. All researchers need to realize that the best scientific practice is produced when, like Darwin, they persistently search for flaws in their arguments. Because inherent variability in biological systems makes it possible for researchers to explore different sets of conditions until the expected (and rewarded) result is obtained, the need for vigilant self-critique may be especially great in research with direct application to human disease. We encourage each branch of science to invest in case studies identifying what went wrong in a selected subset of nonreproducible publications, enlisting social scientists and experts in the respective fields to interview those who were involved (and perhaps examining lab notebooks or redoing statistical analyses), with the hope of deriving general principles for improving science in each field.

Industry should publish its failed efforts to reproduce scientific findings and join scientists in the academy in making the case for the importance of scientific work. Scientific associations should continue to communicate science as a way of knowing, and educate their members in ways to more effectively communicate key scientific findings to broader publics. Journals should continue to ask for higher standards of transparency and reproducibility.

We recognize that incentives can backfire. Still, because those such as enhanced social image and forms of public recognition (10, 11) can increase productive social behavior (12), we believe that replacing the stigma of retraction with language that lauds reporting of unintended errors in a publication will increase that behavior. Because sustaining a good reputation can incentivize cooperative behavior (13), we anticipate that our proposed changes in the review process will not only increase the quality of the final product but also expose efforts to sabotage independent review. To ensure that such incentives not only advance our objectives but above all do no harm, we urge that each be scrutinized and evaluated before being broadly implemented.

Will past be prologue? If science is to enhance its capacities to improve our understanding of ourselves and our world, protect the hard-earned trust and esteem in which society holds it, and preserve its role as a driver of our economy, scientists must safeguard its rigor and reliability in the face of challenges posed by a research ecosystem that is evolving in dramatic and sometimes unsettling ways. To do this, the scientific research community needs to be involved in an ongoing dialogue. We hope that this essay and the report The Integrity of Science (14), forthcoming in 2015, will serve as catalysts for such a dialogue.

Asked at the close of the U.S. Constitutional Convention of 1787 whether the deliberations had produced a republic or a monarchy, Benjamin Franklin said "A Republic, if you can keep it." Just as preserving a system of government requires ongoing dedication and vigilance, so too does protecting the integrity of science.

"Instances in which scientists detect and address flaws in work constitute evidence of success, not failure."

REFERENCES AND NOTES
1. Trouble at the lab, The Economist, 19 October 2013; www.economist.com/news/briefing/21588057-scientists-think-science-self-correcting-alarming-degree-it-not-trouble.
2. R. Merton, The Sociology of Science: Theoretical and Empirical Investigations (University of Chicago Press, Chicago, 1973), p. 276.
3. K. Popper, Conjectures and Refutations: The Growth of Scientific Knowledge (Routledge, London, 1963), p. 293.
4. Editorial Board, Nature 511, 5 (2014); www.nature.com/news/stap-retracted-1.15488.
5. B. A. Nosek et al., Science 348, 1422 (2015).
6. Institute of Medicine, Discussion Framework for Clinical Trial Data Sharing: Guiding Principles, Elements, and Activities (National Academies Press, Washington, DC, 2014).
7. B. Nosek, J. Spies, M. Motyl, Perspect. Psychol. Sci. 7, 615 (2012).
8. C. Franzoni, G. Scellato, P. Stephan, Science 333, 702 (2011).
9. National Academy of Sciences, National Academy of Engineering, and Institute of Medicine, Responsible Science, Volume I: Ensuring the Integrity of the Research Process (National Academies Press, Washington, DC, 1992).
10. N. Lacetera, M. Macis, J. Econ. Behav. Organ. 76, 225 (2010).
11. D. Karlan, M. McConnell, J. Econ. Behav. Organ. 106, 402 (2014).
12. R. Thaler, C. Sunstein, Nudge: Improving Decisions About Health, Wealth and Happiness (Yale Univ. Press, New Haven, CT, 2009).
13. T. Pfeiffer, L. Tran, C. Krumme, D. Rand, J. R. Soc. Interface 2012, rsif20120332 (2012).
14. Committee on Science, Engineering, and Public Policy of the National Academy of Sciences, National Academy of Engineering, and Institute of Medicine, The Integrity of Science (National Academies Press, forthcoming). http://www8.nationalacademies.org/cp/projectview.aspx?key=49387.

10.1126/science.aab3847


Nosek et al. (2015) Science, 348 (6242), 1422-1425.

Call for increased scrutiny and openness: following cases of scientific misconduct, widespread questionable research practices, and reproducibility crises

Page 14: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

Why now?

|

PERSPECTIVES

1422 26 JUNE 2015 • VOL 348 ISSUE 6242 sciencemag.org SCIENCE

and neutral resource that supports and

complements efforts of the research enter-

prise and its key stakeholders.

Universities should insist that their fac-

ulties and students are schooled in the eth-

ics of research, their publications feature

neither honorific nor ghost authors, their

public information offices avoid hype in

publicizing findings, and suspect research

is promptly and thoroughly investigated.

All researchers need to realize that the

best scientific practice is produced when,

like Darwin, they persistently search for

flaws in their arguments. Because inherent

variability in biological systems makes it

possible for researchers to explore differ-

ent sets of conditions until the expected

(and rewarded) result is obtained, the need

for vigilant self-critique may be especially

great in research with direct application to

human disease. We encourage each branch

of science to invest in case studies identify-

ing what went wrong in a selected subset

of nonreproducible publications—enlisting

social scientists and experts in the respec-

tive fields to interview those who were

involved (and perhaps examining lab note-

books or redoing statistical analyses), with

the hope of deriving general principles for

improving science in each field.

Industry should publish its failed efforts

to reproduce scientific findings and join

scientists in the academy in making the

case for the importance of scientific work.

Scientific associations should continue to

communicate science as a way of know-

ing, and educate their members in ways to

more effectively communicate key scien-

tific findings to broader publics. Journals

should continue to ask for higher stan-

dards of transparency and reproducibility.

We recognize that incentives can backfire.

Still, because those such as enhanced social

image and forms of public recognition ( 10,

11) can increase productive social behavior

( 12), we believe that replacing the stigma of

retraction with language that lauds report-

ing of unintended errors in a publication will

increase that behavior. Because sustaining a

good reputation can incentivize cooperative

behavior ( 13), we anticipate that our pro-

posed changes in the review process will not

only increase the quality of the final product

but also expose efforts to sabotage indepen-

dent review. To ensure that such incentives

not only advance our objectives but above

all do no harm, we urge that each be scru-

tinized and evaluated before being broadly

implemented.

Will past be prologue? If science is to

enhance its capacities to improve our un-

derstanding of ourselves and our world,

protect the hard-earned trust and esteem

in which society holds it, and preserve its

role as a driver of our economy, scientists

must safeguard its rigor and reliability in

the face of challenges posed by a research

ecosystem that is evolving in dramatic and

sometimes unsettling ways. To do this, the

scientific research community needs to be

involved in an ongoing dialogue. We hope

that this essay and the report The Integrity

of Science ( 14), forthcoming in 2015, will

serve as catalysts for such a dialogue.

Asked at the close of the U.S. Consti-

tutional Convention of 1787 whether the

deliberations had produced a republic or

a monarchy, Benjamin Franklin said “A

Republic, if you can keep it.” Just as pre-

serving a system of government requires

ongoing dedication and vigilance, so too

does protecting the integrity of science. ■

REFERENCES AND NOTES

1. Trouble at the lab, The Economist, 19 October 2013; www.economist.com/news/briefing/21588057-scientists-think-science-self-correcting-alarming-degree-it-not-trouble.

2. R. Merton, The Sociology of Science: Theoretical and Empirical Investigations (University of Chicago Press, Chicago, 1973), p. 276.

3. K. Popper, Conjectures and Refutations: The Growth of Scientific Knowledge (Routledge, London, 1963), p. 293.

4. Editorial Board, Nature 511, 5 (2014); www.nature.com/news/stap-retracted-1.15488.

5. B. A. Nosek et al., Science 348, 1422 (2015). 6. Institute of Medicine, Discussion Framework for Clinical

Trial Data Sharing: Guiding Principles, Elements, and Activities (National Academies Press, Washington, DC, 2014).

7. B. Nosek, J. Spies, M. Motyl, Perspect. Psychol. Sci. 7, 615 (2012).

8. C. Franzoni, G. Scellato, P. Stephan, Science 333, 702 (2011).

9. National Academy of Sciences, National Academy of Engineering, and Institute of Medicine, Responsible Science, Volume I: Ensuring the Integrity of the Research Process (National Academies Press, Washington, DC, 1992).

10. N. Lacetera, M. Macis, J. Econ. Behav. Organ. 76, 225 (2010).

11. D. Karlan, M. McConnell, J. Econ. Behav. Organ. 106, 402 (2014).

12. R. Thaler, C. Sunstein, Nudge: Improving Decisions About Health, Wealth and Happiness (Yale Univ. Press, New Haven, CT, 2009).

13. T. Pfeiffer, L. Tran, C. Krumme, D. Rand, J. R. Soc. Interface 2012, rsif20120332 (2012).

14. Committee on Science, Engineering, and Public Policy of the National Academy of Sciences, National Academy of Engineering, and Institute of Medicine, The Integrity of Science (National Academies Press, forthcoming). http://www8.nationalacademies.org/cp/projectview.aspx?key=49387.

10.1126/science.aab3847

“Instances in which

scientists detect and

address flaws in work

constitute evidence

of success, not failure.”

POLICY | SCIENTIFIC STANDARDS

Promoting an open research culture

Author guidelines for journals could help to promote transparency, openness, and reproducibility

By B. A. Nosek,* G. Alter, G. C. Banks, D. Borsboom, S. D. Bowman, S. J. Breckler, S. Buck, C. D. Chambers, G. Chin, G. Christensen, M. Contestabile, A. Dafoe, E. Eich, J. Freese, R. Glennerster, D. Goroff, D. P. Green, B. Hesse, M. Humphreys, J. Ishiyama, D. Karlan, A. Kraut, A. Lupia, P. Mabry, T. Madon, N. Malhotra, E. Mayo-Wilson, M. McNutt, E. Miguel, E. Levy Paluck, U. Simonsohn, C. Soderberg, B. A. Spellman, J. Turitto, G. VandenBos, S. Vazire, E. J. Wagenmakers, R. Wilson, T. Yarkoni

Transparency, openness, and reproducibility are readily recognized as vital features of science (1, 2). When asked, most scientists embrace these features as disciplinary norms and values (3). Therefore, one might expect that these valued features would be routine in daily practice. Yet, a growing body of evidence suggests that this is not the case (4–6).

A likely culprit for this disconnect is an academic reward system that does not sufficiently incentivize open practices (7). In the present reward system, emphasis on innovation may undermine practices that support verification. Too often, publication requirements (whether actual or perceived) fail to encourage transparent, open, and reproducible science (2, 4, 8, 9). For example, in a transparent science, both null results and statistically significant results are made available and help others more accurately assess the evidence base for a phenomenon. In the present culture, however, null results are published less frequently than statistically significant results (10) and are, therefore, more likely inaccessible and lost in the “file drawer” (11).

The situation is a classic collective action problem. Many individual researchers lack …


PERSPECTIVE | doi:10.1038/nature10836

The case for open computer programs
Darrel C. Ince, Leslie Hatton & John Graham-Cumming

Scientific communication relies on evidence that cannot be entirely included in publications, but the rise of computational science has added a new layer of inaccessibility. Although it is now accepted that data should be made available on request, the current regulations regarding the availability of software are inconsistent. We argue that, with some exceptions, anything less than the release of source programs is intolerable for results that depend on computation. The vagaries of hardware, software and natural language will always ensure that exact reproducibility remains uncertain, but withholding code increases the chances that efforts to reproduce results will fail.

The rise of computational science has led to unprecedented opportunities for scientific advance. Ever more powerful computers enable theories to be investigated that were thought almost intractable a decade ago, robust hardware technologies allow data collection in the most inhospitable environments, more data are collected, and an increasingly rich set of software tools are now available with which to analyse computer-generated data.

However, there is the difficulty of reproducibility, by which we mean the reproduction of a scientific paper's central finding, rather than exact replication of each specific numerical result down to several decimal places. We examine the problem of reproducibility (for an early attempt at solving it, see ref. 1) in the context of openly available computer programs, or code. Our view is that we have reached the point that, with some exceptions, anything less than release of actual source code is an indefensible approach for any scientific results that depend on computation, because not releasing such code raises needless, and needlessly confusing, roadblocks to reproducibility.

At present, debate rages on the need to release computer programs associated with scientific experiments [2–4], with policies still ranging from mandatory total release to the release only of natural language descriptions, that is, written descriptions of computer program algorithms. Some journals have already changed their policies on computer program openness; Science, for example, now includes code in the list of items that should be supplied by an author [5]. Other journals promoting code availability include Geoscientific Model Development, which is devoted, at least in part, to model description and code publication, and Biostatistics, which has appointed an editor to assess the reproducibility of the software and data associated with an article [6].

In contrast, less stringent policies are exemplified by statements such as [7] "Nature does not require authors to make code available, but we do expect a description detailed enough to allow others to write their own code to do similar analysis." Although Nature's broader policy states that "...authors are required to make materials, data and associated protocols promptly available to readers...", and editors and referees are fully empowered to demand and evaluate any specific code, we believe that its stated policy on code availability actively hinders reproducibility.

Much of the debate about code transparency involves the philosophy of science, error validation and research ethics [8, 9], but our contention is more practical: that the cause of reproducibility is best furthered by focusing on the dissection and understanding of code, a sentiment already appreciated by the growing open-source movement [10]. Dissection and understanding of open code would improve the chances of both direct and indirect reproducibility. Direct reproducibility refers to the recompilation and rerunning of the code on, say, a different combination of hardware and systems software, to detect the sort of numerical computation [11, 12] and interpretation [13] problems found in programming languages, which we discuss later. Without code, direct reproducibility is impossible. Indirect reproducibility refers to independent efforts to validate something other than the entire code package, for example a subset of equations or a particular code module. Here, before time-consuming reprogramming of an entire model, researchers may simply want to check that incorrect coding of previously published equations has not invalidated a paper's result, to extract and check detailed assumptions, or to run their own code against the original to check for statistical validity and explain any discrepancies.

Any debate over the difficulties of reproducibility (which, as we will show, are non-trivial) must of course be tempered by recognizing the undeniable benefits afforded by the explosion of internet facilities and the rapid increase in raw computational speed and data-handling capability that has occurred as a result of major advances in computer technology [14]. Such advances have presented science with a great opportunity to address problems that would have been intractable in even the recent past. It is our view, however, that the debate over code release should be resolved as soon as possible to benefit fully from our novel technical capabilities. On their own, finer computational grids, longer and more complex computations and larger data sets (although highly attractive to scientific researchers) do not resolve underlying computational uncertainties of proven intransigence and may even exacerbate them.

Although our arguments are focused on the implications of Nature's code statement, it is symptomatic of a wider problem: the scientific community places more faith in computation than is justified. As we outline below and in two case studies (Boxes 1 and 2), ambiguity in its many forms and numerical errors render natural language descriptions insufficient and, in many cases, unintentionally misleading.

The failure of code descriptions

The curse of ambiguity
Ambiguity in program descriptions leads to the possibility, if not the certainty, that a given natural language description can be converted into computer code in various ways, each of which may lead to different numerical outcomes. Innumerable potential issues exist, but might include mistaken order of operations, reference to different model versions, or unclear calculations of uncertainties. The problem of ambiguity has haunted software development from its earliest days.

Ambiguity can occur at the lexical, syntactic or semantic level [15] and is not necessarily the result of incompetence or bad practice. It is a natural consequence of using natural language [16] and is unavoidable. The …

1 Department of Computing, Open University, Walton Hall, Milton Keynes MK7 6AA, UK. 2 School of Computing and Information Systems, Kingston University, Kingston KT1 2EE, UK. 3 83 Victoria Street, London SW1H 0HW, UK.


Nosek et al. (2015) Science, 348 (6242) 1422-1425.

Call for increased scrutiny and openness
Following cases of scientific misconduct, widespread questionable research practices, and reproducibility crises

Page 15: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

Why now?

Call for increased scrutiny and openness
Following cases of scientific misconduct, widespread questionable research practices, and reproducibility crises

Page 16: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

Why now?

Call for increased scrutiny and openness
Following cases of scientific misconduct, widespread questionable research practices, and reproducibility crises

Netherlands Organisation for Scientific Research

Open where possible, protected where needed: NWO and Open Access

Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020 (Version 1.0, 11 December 2013)

http://www.nwo.nl/binaries/content/assets/nwo/documents/nwo/open-science/open-access-flyer-2012-def-eng-scherm.pdf http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf

Page 17: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

What should we (try to) do?

1. Write programs for people, so you and others quickly understand the code

2. Let the computer do the work, to increase speed and reproducibility

3. Make incremental changes, to increase productivity and reproducibility

4. Don’t repeat yourself (or others): repetitions are vulnerable to errors and a waste of time

5. Plan for mistakes: almost all code has bugs, but not all bugs cause errors

6. Optimize software only after it works, and perhaps only if it is worth the effort

7. Document design and purpose, not implementation, to clarify understanding and usage

8. Collaborate, to maximize reproducibility, clarity, and learning

Page 18: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

1. Write programs for people, not computers… so you and others quickly understand the code

A program should not require its readers to hold more than a handful of facts in memory at once
Break up programs into easily understood functions, each of which performs an easily understood task

Page 19: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

1. Write programs for people, not computers… so you and others quickly understand the code

[Diagram: a monolithic SCRIPT X consisting of one long block of statements, contrasted with a refactored SCRIPT X that calls functions a, b, and c, each containing a short, self-contained block of statements.]
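A minimal MATLAB sketch of this refactoring (the file name, simulated data, and helper functions are hypothetical; local functions in scripts require R2016b or later): the top-level code reads as a summary of the analysis, and each step lives in a small, well-named function.

% analyzeRT.m - the top-level code summarizes the analysis in a few readable calls
rt      = simulateRT(100);             % 100 simulated reaction times (s)
rtClean = removeOutliers(rt);          % drop implausible trials
fprintf('Mean RT: %.3f s\n', mean(rtClean));

function rt = simulateRT(n)
% Generate n uniform reaction times between 0.3 and 0.5 s
rt = 0.3 + 0.2 * rand(1, n);
end

function rtClean = removeOutliers(rt)
% Keep only reaction times within a plausible range
rtClean = rt(rt > 0.1 & rt < 1.0);
end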

Page 20: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

1. Write programs for people, not computers… so you and others quickly understand the code

A program should not require its readers to hold more than a handful of facts in memory at once
Break up programs into easily understood functions, each of which performs an easily understood task

Make names consistent, distinctive, and meaningful: e.g. fileName, meanRT rather than tmp, result; pick one style, e.g. camelCase, pothole_case, or UPPERCASE
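A short illustration with made-up trial data: descriptive names in a consistent style (here camelCase) carry more meaning than tmp or result.

% Meaningful, consistently styled names instead of tmp/result
trialData = table([1; 2; 3], [0.42; NaN; 0.38], ...
    'VariableNames', {'trialNumber', 'rt'});      % toy data table
meanRT = mean(trialData.rt, 'omitnan');           % ignore missed trials (NaN)
fprintf('Mean RT: %.3f s\n', meanRT);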

Page 21: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

http://nl.mathworks.com/matlabcentral/fileexchange/46056-matlab-style-guidelines-2-0 http://nl.mathworks.com/matlabcentral/fileexchange/48121-matlab-coding-checklist

MATLAB Guidelines 2.0

Richard Johnson

A program should not require its readers to hold more than a handful of facts in memory at once
Break up programs into easily understood functions, each of which performs an easily understood task

Make names consistent, distinctive, and meaningful: e.g. fileName, meanRT rather than tmp, result; pick one style, e.g. camelCase, pothole_case, or UPPERCASE

Make code style and formatting consistent; use existing guidelines and checklists

MATLAB® @Work

www.datatool.com

CODING CHECKLIST

GENERAL

DESIGN

Do you have a good design?

Is the code traceable to requirements?

Does the code support automated use and testing?

Does the code have documented test cases?

Does the code adhere to your designated coding style and standard?

Is the code free from Code Analyzer messages?

Does each function perform one well-defined task?

Is each class cohesive?

Have you refactored out any blocks of code repeated unnecessarily?

Are there unnecessary global constants or variables?

Does the code have appropriate generality?

Does the code make naïve assumptions about the data?

Does the code use appropriate error checking and handling?

If performance is an issue, have you profiled the code?

UNDERSTANDABILITY

Is the code straightforward and does it avoid “cleverness”?

Has tricky code been rewritten rather than commented?

Do you thoroughly understand your code? Will anyone else?

Is it easy to follow and understand?

CODE CHANGES

Have the changes been reviewed as thoroughly as initial development would be?

Do the changes enhance the program’s internal quality rather than degrading it?

Have you improved the system’s modularity by breaking functions into smaller functions?

Have you improved the programming style--names, formatting, comments?

Have you introduced any flaws or limitations?

Page 22: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

2. Let the computer do the work… to increase speed and reproducibility

Make the computer repeat tasks
Go beyond the point-and-click approach

Page 23: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

2. Let the computer do the work… to increase speed and reproducibility

Make the computer repeat tasks
Go beyond the point-and-click approach

Save recent commands in a file for re-use
Log a session's activity to a file; in MATLAB: diary
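A minimal example of diary (the log file name is arbitrary): everything typed and printed in the Command Window is appended to a text file, giving a re-usable record of the session.

diary('session_log.txt')     % start logging commands and output to a file
x = rand(1, 10);
disp(mean(x))
diary off                    % stop logging; the file now documents the session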

Page 24: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

2. Let the computer do the work… to increase speed and reproducibility

Make the computer repeat tasks
Go beyond the point-and-click approach

Save recent commands in a file for re-use
Log a session's activity to a file; in MATLAB: diary

Use a build tool to automate workflows, e.g. SPM's matlabbatch, IPython/Jupyter notebooks, SPSS syntax (see also the sketch below)

SPM’s matlabbatch
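Even a plain MATLAB script can automate a repetitive workflow end to end. A hedged sketch, assuming hypothetical data files sub01.mat, sub02.mat, sub03.mat that each contain a variable rt, and a made-up results file name:

subjects = {'sub01', 'sub02', 'sub03'};          % hypothetical subject IDs
meanRT   = nan(1, numel(subjects));
for i = 1:numel(subjects)
    data      = load([subjects{i} '.mat']);      % assumes each file holds a variable rt
    meanRT(i) = mean(data.rt, 'omitnan');
end
save('groupResults.mat', 'subjects', 'meanRT');  % store results; re-running the script repeats the whole workflow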

Page 25: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

2. Let the computer do the work

http://jupyter.org/

Page 26: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

3. Make incremental changes

Work in small steps with frequent feedback and course correction
Agile development: add/change one thing at a time, aim for working code at completion

… to increase productivity and reproducibility

Page 27: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

3. Make incremental changes… to increase productivity and reproducibility

Work in small steps with frequent feedback and course correction
Agile development: add/change one thing at a time, aim for working code at completion

Use a version control system
Documents who made what contributions and when
Provides access to the entire history of code + metadata
Many free options to choose from: e.g. git, svn

http://www.phdcomics.com/comics/archive.php?comicid=1531

Page 28: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

http://swcarpentry.github.io/git-novice/01-basics.html

Different versions can be saved
Multiple versions can be merged

Changes are saved sequentially (“snapshots”)

Page 29: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

3. Make incremental changes

http://www.phdcomics.com/comics/archive.php?comicid=1531

Work in small steps with frequent feedback and course correction
Agile development: add/change one thing at a time, aim for working code at completion

Use a version control system
Documents who made what contributions and when
Provides access to the entire history of code + metadata
Many free options to choose from: e.g. git, svn

Put everything that has been created manually in version control
The advantages of version control also apply to manuscripts, figures, and presentations

… to increase productivity and reproducibility

Page 30: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

https://github.com/smathot/materials_for_P0001/commits/master/manuscript

Page 31: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

4. Don’t repeat yourself (or others)

Every piece of data must have a single authoritative representation in the system
Raw data should have a single canonical version

… repetitions are vulnerable to errors and a waste of time

http://swcarpentry.github.io/v4/data/mgmt.pdf

Page 32: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

4. Don’t repeat yourself (or others)… repetitions are vulnerable to errors and a waste of time

http://swcarpentry.github.io/v4/data/mgmt.pdf

Page 33: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

4. Don’t repeat yourself (or others)… repetitions are vulnerable to errors and a waste of time

Every piece of data must have a single authoritative representation in the system
Raw data should have a single canonical version

Modularize code rather than copying and pasting
Avoid “code clones”: they make it difficult to change code, as you have to find and update multiple fragments

Page 34: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

4. Don’t repeat yourself (or others)… repetitions are vulnerable to errors and a waste of time

[Diagram: SCRIPT X and SCRIPT Y each contain the same duplicated block of statements (a code clone); in the refactored version, both scripts call a shared function a, alongside script-specific functions b and c.]
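A hypothetical MATLAB sketch of removing such a clone: the duplicated statements move into one shared function, so a change or fix is made in a single place and both scripts pick it up.

% zscoreRT.m - shared function replacing statements duplicated across scripts
function z = zscoreRT(rt)
% Standardize reaction times; missed trials (NaN) stay NaN in the output
valid = ~isnan(rt);
z = (rt - mean(rt(valid))) ./ std(rt(valid));
end

% In SCRIPT X and SCRIPT Y the clone becomes a single call, for example:
%   zA = zscoreRT(rtConditionA);
%   zB = zscoreRT(rtConditionB);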

Page 35: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

4. Don’t repeat yourself (or others)… repetitions are vulnerable to errors and a waste of time

Every piece of data must have a single authoritative representation in the system
Raw data should have a single canonical version

Modularize code rather than copying and pasting
Avoid “code clones”: they make it difficult to change code, as you have to find and update multiple fragments

Re-use code instead of rewriting it
Chances are somebody else has already written it: e.g. look at SPM Extensions (~200), NITRC (810), CRAN (7395), MATLAB File Exchange (25297), PyPI (68247)

Page 36: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

5. Plan for mistakes

Add assertions to programs to check their operation
Assertions are internal self-checks that verify whether inputs, outputs, etc. are in line with the programmer’s expectations
In MATLAB: assert

… almost all code has bugs, but not all bugs cause errors
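A small example of assert with made-up reaction-time data: each check documents an expectation and stops execution with a clear message if that expectation is violated.

rt = [0.42 0.38 0.51 0.47];                                   % toy reaction times (s)
assert(isnumeric(rt) && isvector(rt), 'rt must be a numeric vector')
assert(all(rt > 0 & rt < 10), 'rt values must be positive and plausible (in s)')
meanRT = mean(rt);
assert(isfinite(meanRT), 'meanRT should be a finite number')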

Page 37: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

5. Plan for mistakes

Add assertions to programs to check their operation
Assertions are internal self-checks that verify whether inputs, outputs, etc. are in line with the programmer’s expectations
In MATLAB: assert

Use an off-the-shelf unit testing library
Unit tests check whether code returns correct results
In MATLAB: runtests

… almost all code has bugs, but not all bugs cause errors
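A hedged sketch of a function-based MATLAB unit test for a hypothetical meanRT function (shown under practice 7): saved as test_meanRT.m, the tests run with results = runtests('test_meanRT').

function tests = test_meanRT
% Function-based unit tests; each local function starting with 'test' is one test
tests = functiontests(localfunctions);
end

function testSimpleMean(testCase)
verifyEqual(testCase, meanRT([0.4 0.5 0.6]), 0.5, 'AbsTol', 1e-12)
end

function testIgnoresMissedTrials(testCase)
% Missed trials are coded as NaN and should not affect the mean
verifyEqual(testCase, meanRT([0.4 NaN 0.6]), 0.5, 'AbsTol', 1e-12)
end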

Page 38: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

5. Plan for mistakes

Add assertions to programs to check their operation
Assertions are internal self-checks that verify whether inputs, outputs, etc. are in line with the programmer’s expectations
In MATLAB: assert

Use an off-the-shelf unit testing library
Unit tests check whether code returns correct results
In MATLAB: runtests

Turn bugs into test cases, to ensure that they will not happen again

… almost all code has bugs, but not all bugs cause errors
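Continuing the hypothetical test_meanRT.m file above: once a bug is fixed, it can be pinned down as an extra test function so that it cannot silently reappear.

function testEmptyInputGivesNaN(testCase)
% Regression test for a (hypothetical) bug: an empty RT vector once caused
% an error; the intended behaviour is to return NaN
verifyTrue(testCase, isnan(meanRT([])))
end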

Page 39: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

5. Plan for mistakes

Add assertions to programs to check their operation
Assertions are internal self-checks that verify whether inputs, outputs, etc. are in line with the programmer’s expectations
In MATLAB: assert

Use an off-the-shelf unit testing library
Unit tests check whether code returns correct results
In MATLAB: runtests

Turn bugs into test cases, to ensure that they will not happen again

Use a symbolic debugger
Symbolic debuggers make it easy to improve code
They can detect bugs and sub-optimal code, run code step-by-step, and track the values of variables

… almost all code has bugs, but not all bugs cause errors
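In MATLAB the debugger is available from the Editor (breakpoints) and from the command line; a minimal sketch, assuming a hypothetical failing script analyzeRT:

dbstop if error     % pause in the workspace where any error is thrown
analyzeRT           % run the (hypothetical) code that fails
% ...inspect variables, step with dbstep, continue with dbcont, leave with dbquit
dbclear all         % remove all breakpoints when done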

Page 40: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

Page 41: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

Page 42: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

Page 43: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

6. Optimize software only after it works correctly

Use a profiler to identify bottlenecks
Useful if your code takes a long time to run
In MATLAB: profile

… and perhaps only if it is worth the effort
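A minimal use of the MATLAB profiler (the long-running function is hypothetical): profile the code once it is correct, then spend optimization effort only on the reported bottlenecks.

profile on                       % start collecting timing data
results = runSimulation(1e4);    % hypothetical long-running analysis
profile viewer                   % open the report: time spent per function and line
profile off                      % stop the profiler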

Page 44: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

6. Optimize software only after it works correctly

Use a profiler to identify bottlenecks
Useful if your code takes a long time to run
In MATLAB: profile

Write code in the highest-level language possible
Programmers produce roughly the same number of lines of code per day, regardless of language
High-level languages achieve complex computations in few lines of code

… and perhaps only if it is worth the effort
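An illustration with toy data: one line of high-level, vectorized MATLAB expresses the same computation as an explicit loop over trials.

rt = 0.3 + 0.2 * rand(1, 1000);          % 1000 simulated reaction times (s)

% Loop version: many lines, more places to make a mistake
total = 0; n = 0;
for i = 1:numel(rt)
    if rt(i) < 0.4
        total = total + rt(i);
        n = n + 1;
    end
end
meanFastLoop = total / n;

% Vectorized version: one line
meanFastVec = mean(rt(rt < 0.4));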

Page 45: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

7. Document design and purpose, not mechanics

Document interfaces and reasons, not implementations
Write structured and informative function headers
Write a README file for each set of functions

… good documentation helps people understand code

see: http://www.mathworks.com/matlabcentral/fileexchange/4908-m-file-header-template/content/template_header.m
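A hypothetical example of a structured function header: the help text documents the interface (inputs, outputs, an example) rather than the implementation, and is displayed by typing help meanRT.

function m = meanRT(rt)
% MEANRT Mean reaction time, ignoring missed trials.
%
%   m = MEANRT(rt) returns the mean of the numeric vector rt, skipping
%   NaN entries that mark missed responses.
%
%   Example:
%       m = meanRT([0.4 NaN 0.6]);   % returns 0.5
%
%   See also MEAN, RUNTESTS.

m = mean(rt(~isnan(rt)));
end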

Page 46: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

7. Document design and purpose, not mechanics

Document interfaces and reasons, not implementations
Write structured and informative function headers
Write a README file for each set of functions

Refactor code in preference to explaining how it works
Restructuring provides clarity, reducing space/time needed for explanation

… good documentation helps people understand code

Page 47: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

7. Document design and purpose, not mechanics

Document interfaces and reasons, not implementations
Write structured and informative function headers
Write a README file for each set of functions

Refactor code in preference to explaining how it works
Restructuring provides clarity, reducing space/time needed for explanation

Embed the documentation for a piece of software in that software
Embed in function headers, README files, but also interactive notebooks (e.g. knitr, IPython/Jupyter)
Increases likelihood that code changes will be reflected in the documentation

… good documentation helps people understand code
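In MATLAB, one way to embed narrative with code is the publish command (assuming a hypothetical script analyzeRT.m whose sections start with '%%' comments): it renders the code, comments, and output into a single HTML report, so updating the script updates its documentation.

% Render analyzeRT.m (code, '%%' section comments, and output) as an HTML report
publish('analyzeRT.m', 'html');   % by default writes html/analyzeRT.html next to the script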

Page 48: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

8. Collaborate

Use (pre-merge) code reviews
Like proof-reading manuscripts, but for code
Aims to identify bugs and improve clarity; other benefits are spreading knowledge and learning to code

Page 49: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

8. Collaborate

Use (pre-merge) code reviews
Like proof-reading manuscripts, but for code
Aims to identify bugs and improve clarity; other benefits are spreading knowledge and learning to code

Use pair programming when bringing someone new up to speed and when tackling tricky problems
Intimidating, but really helpful

Page 50: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

8. Collaborate

Use (pre-merge) code reviews
Like proof-reading manuscripts, but for code
Aims to identify bugs and improve clarity; other benefits are spreading knowledge and learning to code

Use pair programming when bringing someone new up to speed and when tackling tricky problems
Intimidating, but really helpful

Use an issue tracking tool
Perhaps mainly useful for large, collaborative coding efforts, but many code sharing platforms (e.g. GitHub, Bitbucket) come with these tools

Page 51: Journal Club - Best Practices for Scientific Computing

Best Practices for Scientific Computing

Useful resources

Page 52: Journal Club - Best Practices for Scientific Computing

www.ru.nl/donders
www.bramzandbelt.com