21
On Reproducible AI: Towards Reproducible Research, Open Science, and Digital Scholarship in AI Publications Odd Erik Gundersen, Norwegian University of Science and Technology Yolanda Gil, University of Southern California David W. Aha, US Naval Research Laboratory Abstract: Background: Artificial intelligence, like any science, must rely on reproducible experiments to validate results. Objective: To give practical and pragmatic recommendations for how to document AI research so that results are reproducible. Method: Our analysis of the literature shows that AI publications currently fall short of providing enough documentation to facilitate reproducibility. Our suggested best practices are based on a framework for reproducibility and recommendations for best practices given by scientific organizations, scholars, and publishers. Results: We have made a reproducibility checklist based on our investigation and described how every item in the checklist can be documented by authors and examined by reviewers. Conclusion: We encourage authors and reviewers to use the suggested best practices and author checklist when considering submissions for AAAI publications and conferences. 1. Introduction Reproducibility is a cornerstone of the scientific method. The ability and effort required from other researchers to replicate experiments and explore variations depends heavily on the information provided when the original work was published. Reproducibility is challenging for many sciences, for example when the variability of physical samples and reagents can significantly affect the outcome (Begley and Ellis 2012; Lithgow et al. 2017). In computer science, a large portion of the experiments are fully conducted on computers, making the experiments more straightforward to document (Braun and Ong 2014). Most AI and machine learning research also fall under this category of computational experimentation. However, reproducibility in AI is not easily accomplished (Hunold and Träff 2013; Fokkens et al. 2013; Hunold 2015). This may be because AI research has its own unique reproducibility challenges. Ioannidis (2005) suggests that the use of analytical methods which are still a focus of active investigation is one reason it is comparatively difficult to ensure that computational research is reproducible. For example, Henderson et al. (2017) show that problems due to non-determinism in standard benchmark environments and variance intrinsic to AI methods require proper experimental techniques and reporting procedures. Acknowledging these difficulties, computational research should be documented properly so that the experiments and results are clearly described. The AI research community should strive to facilitate reproducible research, following sound scientific methods and proper documentation in publications. Concomitant with reproducibility is open science. This involves sharing data, software, and other science resources in public repositories using permissive licenses. Open science is increasingly associated with FAIR principles to ensure that science resources have the necessary metadata to make them findable, accessible, AI Magazine, Vol. 39, No. 3, Fall 2018. http://doi.org/10.1609/aimag.v39i3.2816

On Reproducible AI: Towards Reproducible Research, Open Science, and Digital ...gil/papers/gundersen-etal-ai... · 2019-01-21 · On Reproducible AI: Towards Reproducible Research,

  • Upload
    others

  • View
    11

  • Download
    0

Embed Size (px)

Citation preview

Page 1: On Reproducible AI: Towards Reproducible Research, Open Science, and Digital ...gil/papers/gundersen-etal-ai... · 2019-01-21 · On Reproducible AI: Towards Reproducible Research,

OnReproducibleAI:TowardsReproducibleResearch,OpenScience,andDigitalScholarshipinAIPublications

OddErikGundersen,NorwegianUniversityofScienceandTechnologyYolandaGil,UniversityofSouthernCaliforniaDavidW.Aha,USNavalResearchLaboratory

Abstract:Background:Artificialintelligence,likeanyscience,mustrelyonreproducibleexperimentstovalidateresults.Objective:TogivepracticalandpragmaticrecommendationsforhowtodocumentAIresearchsothatresultsarereproducible.Method:OuranalysisoftheliteratureshowsthatAIpublicationscurrentlyfallshortofprovidingenoughdocumentationtofacilitatereproducibility.Oursuggestedbestpracticesarebasedonaframeworkforreproducibilityandrecommendationsforbestpracticesgivenbyscientificorganizations,scholars,andpublishers.Results:Wehavemadeareproducibilitychecklistbasedonourinvestigationanddescribedhoweveryiteminthechecklistcanbedocumentedbyauthorsandexaminedbyreviewers.Conclusion:WeencourageauthorsandreviewerstousethesuggestedbestpracticesandauthorchecklistwhenconsideringsubmissionsforAAAIpublicationsandconferences.

1.Introduction

Reproducibilityisacornerstoneofthescientificmethod.Theabilityandeffortrequiredfromotherresearcherstoreplicateexperimentsandexplorevariationsdependsheavilyontheinformationprovidedwhentheoriginalworkwaspublished.Reproducibilityischallengingformanysciences,forexamplewhenthevariabilityofphysicalsamplesandreagentscansignificantlyaffecttheoutcome(BegleyandEllis2012;Lithgowetal.2017).Incomputerscience,alargeportionoftheexperimentsarefullyconductedoncomputers,makingtheexperimentsmorestraightforwardtodocument(BraunandOng2014).MostAIandmachinelearningresearchalsofallunderthiscategoryofcomputationalexperimentation.However,reproducibilityinAIisnoteasilyaccomplished(HunoldandTräff2013;Fokkensetal.2013;Hunold2015).ThismaybebecauseAIresearchhasitsownuniquereproducibilitychallenges.Ioannidis(2005)suggeststhattheuseofanalyticalmethodswhicharestillafocusofactiveinvestigationisonereasonitiscomparativelydifficulttoensurethatcomputationalresearchisreproducible.Forexample,Hendersonetal.(2017)showthatproblemsduetonon-determinisminstandardbenchmarkenvironmentsandvarianceintrinsictoAImethodsrequireproperexperimentaltechniquesandreportingprocedures.Acknowledgingthesedifficulties,computationalresearchshouldbedocumentedproperlysothattheexperimentsandresultsareclearlydescribed.

TheAIresearchcommunityshouldstrivetofacilitatereproducibleresearch,followingsoundscientificmethodsandproperdocumentationinpublications.Concomitantwithreproducibilityisopenscience.Thisinvolvessharingdata,software,andotherscienceresourcesinpublicrepositoriesusingpermissivelicenses.OpenscienceisincreasinglyassociatedwithFAIRprinciplestoensurethatscienceresourceshavethenecessarymetadatatomakethemfindable,accessible,

AI Magazine, Vol. 39, No. 3, Fall 2018. http://doi.org/10.1609/aimag.v39i3.2816

Page 2: On Reproducible AI: Towards Reproducible Research, Open Science, and Digital ...gil/papers/gundersen-etal-ai... · 2019-01-21 · On Reproducible AI: Towards Reproducible Research,

interoperable,andreusable(Wilkinsonetal.2016).Moderndigitalscholarshippromotespropercredittoscientistswhodocumentandsharetheirresearchproductsthroughcitationsofdatasets,software,andinnovativecontributionstothescientificenterprise.

ThefocusinthisarticleisonbestpracticesfordocumentationanddisseminationofAIresearchtofacilitatereproducibility,supportopenscience,andembracedigitalscholarship.WebeginwithananalysisofrecentAIpublicationsthathighlightsthelimitationsoftheirdocumentationinsupportofreproducibility.

2.StateoftheArt:HowAIResearchisCurrentlyDocumented

GundersenandKjensmo(2018)analyzedhowwellempiricalAIresearchisdocumentedtofacilitatereproducibility.EmpiricalAIresearchinvolvesevaluatinghowwellcomputationalAImethodssolveaproblem.AnAImethodreferstoanabstractmethodforsolvingsuchproblems.Examplesincludeagentsystemsthatperformcollaborativetasksandneuralnetworkarchitecturestrainedusingbackpropagation.

Table1:Descriptionofallvariablesandtheirfactors.

Factor Variable Description

Method

Problem Isthereanexplicitmentionoftheproblemtheresearchseekstosolve?

Objective Istheresearchobjectiveexplicitlymentioned? Researchmethod Isthereanexplicitmentionoftheresearchmethodused

(empirical,theoretical)?Researchquestions

Isthereanexplicitmentionoftheresearchquestion(s)addressed?

Pseudocode IstheAImethoddescribedusingpseudocode?

Data

Trainingdata Isthetrainingsetshared? Validationdata Isthevalidationsetshared?Testdata Isthetestsetshared?Results Aretherelevantintermediateandfinalresultsoutputbythe

AIprogramshared?

Experiment

Hypothesis Isthereanexplicitmentionofthehypothesesbeinginvestigated?

Prediction Isthereanexplicitmentionofpredictionsrelatedtothehypotheses?

Methodsourcecode

IstheAIsystemcodeavailableopensource?

Hardware Isthehardwareusedforconductingtheexperimentspecified?

Softwaredependencies

Aresoftwaredependenciesspecified?

Experimentsetup Arethevariablesettingsshared,suchashyperparameters? Experimentsourcecode

Istheexperimentcodeavailableopensource?

Page 3: On Reproducible AI: Towards Reproducible Research, Open Science, and Digital ...gil/papers/gundersen-etal-ai... · 2019-01-21 · On Reproducible AI: Towards Reproducible Research,

Figure1:Percentageofpapersdocumentingeachvariableforthethreefactors:Method(left),Data(middle),andExperiment(right)(GundersenandKjensmo2018).

Figure2:ThenumberofvariablesforthethreefactorsasdocumentedforallthepapersdescribingempiricalresearchinthestudybyGundersenandKjensmo(2018):Method(left),Data(middle),andExperiment(right).

Figure3:ChangeovertimeofthethreereproducibilitymetricsforthetwoconferencesAAAIandIJCAI(GundersenandKjensmo2018).

Page 4: On Reproducible AI: Towards Reproducible Research, Open Science, and Digital ...gil/papers/gundersen-etal-ai... · 2019-01-21 · On Reproducible AI: Towards Reproducible Research,

TheanalysisbyGundersenandKjensmo(2018)isbasedonaliteraturereviewandaframeworkforreproducibility.Theirframeworkdividesdocumentationintothreefactors:(1)Method,whichspecifiestheAImethodunderinvestigationandtheproblemtobesolved;(2)Data,whichdescribesthedatausedforconductingtheempiricalresearch;and(3)Experiment,whichdocumentshowtheexperimentwasconducted.Howwellthesethreefactorsaredocumentedisindicatedby16yes/novariables(seeTable1)thataredirectlyrelevantforfacilitatingreproducibility.

Apublicationthatdocumentsanempiricalresearchstudycanbescoredusingthesevariables.Threereproducibilitymetricsareproposed.Thethreemetricsare:(1)R1D,whichcalculatestheaverageofallvariablesforallthreefactors(Method,Data,andExperiment),(2)R2D,whichcomputestheaverageofthevariablesfortheMethodandDatafactors,and(3)R3D,whichcalculatestheaverageofallvariablesfortheMethodfactor.Thesethreemetricsprovideanindicationofhowwellthescoredpapersdocumenttheresearchforthreedifferentdegreesofreproducibility(weprovidemoredetailonthesedegreesofreproducibilityinSection3).

Intotal,GundersenandKjensmosampled400papersfromtheAAAI2014,AAAI2016,IJCAI2013andIJCAI2016conferences.Amongthese,325papersdescribeempiricalstudies,whiletheremaining75papersdonot.Figure1displaysthepercentageofthesurveyedpapersthatdocumentedthedifferentvariableswhileFigure2summarizeshowmanyofthevariablesweredocumentedforeachfactorperpaper.

Wemakeafewobservations.AsseeninFigure1,fewofthepapersexplicitlymentiontheresearchmethodthatisused,andonlyaroundhalfexplicitlymentionwhichproblemisbeingsolved.Onlyaboutathirdofthepaperssharethetestdatasetandonly4%sharetheresultproducedbytheAIprogram.Only8%ofthepaperssharethesourcecodeoftheAImethodthatisbeinginvestigatedwhileonly5%explicitlyspecifythehypothesisand1%specifytheirprediction.Figure2showsthat:67papersdonotexplicitlydocumentanyofthevariablesforthefactorMethod;onlyonepaperdocumentsandsharestraining,validationandtestsetsaswellastheresults;andapproximately90%ofthepapersdocumentnomorethanthreeofthesevenvariablesofthefactorExperiment.

AsseeninFigure3,thetrendsareunclear.Statisticalanalysisshowedthatonlytwoofthemetrics,R1DandR2D,forIJCAIhadastatisticallysignificantincreaseovertime.WhileR2DandR3DforAAAIdecreaseovertime,thedecreaseisnotstatisticallysignificant.

ThestudybyGundersenandKjensmo(2018)hassomelimitations.Forexample,thestudyrequiredthatforthevariableproblemtobesettoyes(true),thepapermustexplicitlystatetheproblemthatisbeingsolved.AnothershortcomingisthatalltheAImethodsthataredocumentedintheresearchpapersarenotnecessarilydescribedbetterwithpseudocodethanwithout,butthisfactwasnotgivenanyconsideration.IfapaperdescribedanAImethodandpseudocodewasnotprovided,

Page 5: On Reproducible AI: Towards Reproducible Research, Open Science, and Digital ...gil/papers/gundersen-etal-ai... · 2019-01-21 · On Reproducible AI: Towards Reproducible Research,

thepseudocodevariablewassettofalseforthatpaper.Finally,someofthevariablesmightberedundant(e.g.,problem,goal,orresearchquestions).Still,despitetheseshortcomings,thefindingsindicatethatcomputationalAIresearchisnotdocumentedsystematicallyandwithenoughinformationtosupportreproducibility.

3.DegreesofReproducibility

GundersenandKjensmo(2018)distinguishbetweenthreedegreesofreproducibility,whicharedefinedasfollows:

R1:ExperimentReproducibleTheresultsofanexperimentareexperimentreproduciblewhentheexecutionofthesameimplementationofanAImethodproducesthesameresultswhenexecutedonthesamedata.Thisisoftencalledrepeatability.R2:DataReproducibleTheresultsofanexperimentaredatareproduciblewhenanexperimentisconductedthatexecutesanalternativeimplementationoftheAImethodthatproducesthesameresultswhenexecutedonthesamedata.Thisisoftencalledreplicability.R3:MethodReproducibleTheresultsofanexperimentaremethodreproduciblewhentheexecutionofanalternativeimplementationoftheAImethodproducesconsistentresultswhenexecutedondifferentdata.Thisisoftencalledreproducibility.

EmpiricalresearchthatisR1(ExperimentReproducible)mustdocumenttheAImethod,thedatausedtoconducttheexperiment,andtheexperimentitselfincludingthesourcecodefortheAImethodandtheexperimentsetup,whileR2(DataReproducible)researchmustonlydocumenttheAImethodandthedata.R3(MethodReproducible)researchmustonlydocumenttheAImethod.Figure4illustratesthedifferentfactorsthatmustbedocumentedforthethreereproducibilitydegrees.

Ifanindependentteamreproducesresearchandgetsresultsthatareconsistentwiththeoriginalresults,thegeneralityoftheresultsdependsonthelevelofdocumentationthatwasprovidedtotheindependentteam.IftheoriginalresearchwasR1(ExperimentReproducible),theindependentteamhasconfirmedthatthespecificimplementationoftheAImethodprovidedbytheoriginalresearchteamachievedthereportedresultsonthespecificdatathatalsowasprovidedbytheoriginalresearchteam.Hence,thegeneralityoftheresultsislimitedtothatspecificimplementationandthatspecificdata.However,iftheindependentteamreproducestheresultsofsomeresearchthatwasR3(MethodReproducible)andgetsconsistentresults,theresultsaremoregeneral,astheyapplytoare-implementationandotherdata.Thisleadstodifferentincentivesfortheresearcherswhoconductedtheinitialempiricalstudyandtheindependentresearcherswhoattempttoreproducetheresults.

Page 6: On Reproducible AI: Towards Reproducible Research, Open Science, and Digital ...gil/papers/gundersen-etal-ai... · 2019-01-21 · On Reproducible AI: Towards Reproducible Research,

Figure4:Thethreedegreesofreproducibilityaredefinedbywhichdocumentationisusedtoreproducetheresults.Thethreedegreesofreproducibilityeachrequireadifferentsetoffactorstobedocumented.

Figure5:Effectsofdocumentationasseenfromtheperspectiveofindependentresearchers.

Figure6:Effectsofdocumentationasseenfromtheperspectiveoftheoriginalresearchers.

Page 7: On Reproducible AI: Towards Reproducible Research, Open Science, and Digital ...gil/papers/gundersen-etal-ai... · 2019-01-21 · On Reproducible AI: Towards Reproducible Research,

Independentresearcherstrustanempiricalstudy’sresultsincreasinglywiththeamountofdocumentationthatissharedwiththem,whiletheefforttoreproducetheresultsincreaseswhentheamountofdocumentationisreduced.ThissituationisillustratedinFigure5.Hence,independentresearcherswouldpreferR1(ExperimentReproducible)research.

Ontheotherhand,theefforttodocumenttheresearchincreasesfortheoriginalresearcherswiththeamountofdocumentationthatneedstobeshared,whilethegeneralityofthemethodisincreasedifindependentresearchersreproducetheresultsgivenlessdocumentation.Hence,theoriginalresearchersmayprefertodocumenttheirresearchtobeR3(MethodReproducible).

Combinethisconflictofincentivesfortheoriginalandindependentresearcherswiththepressuretopublishanditiseasytoseehowthiscanleadtoresearchbeingdocumentedlessvigorously.However,byfollowingtherecommendationsgivenherethetrustworthinessandreproducibilityofresearchresultscanbeincreasedwithlittleeffortrequiredfromauthors.Still,changescannotbeexpectedsolelyfromindividualresearchersalone.Theresearchcommunity,fundingsponsors,employersofresearchers,andpublishersshould,intheirrespectiveroles,withincentivizeandrewardreproducibleresearch.

4.BestPracticesandRecommendationsTherecommendationsweintroduceinSections4.1-4.4arebasedonbestpracticesputforwardbyscientificorganizations(RDA2015;CODATA2013;DataCite2015;FORCE112014;ESIP2012),scholars(Wilkinsonetal.2016;Stoddenetal.2016;Giletal.2016;Noseketal.2015;Starretal.2015;Uhliretal.2012;Downsetal.2015;BallandDuke2012;MooneyandNewton2012;Goodmanetal.2014;Garijoetal.2013;AltmanandKing2007),andpublishers(Hansonetal.2015;COPDESS2015).StrongmomentumisbuildinginsupportofFAIRpractices,i.e.,tomakedataFindable,Accessible,Interoperable,andReusableWilkinsonetal.2016).OurrecommendationssupportFAIRprinciplesandextendthemtopromotereproducibleresearch,openscience,anddigitalscholarship.Implementingtheserecommendationsrequiresextraspaceinpublications.Wesuggestincludingthisadditionalcontentinappendicesthattechnicalreviewerswillnotberequiredtoassessbutcanquicklycheck.Forelectronicpublications,thereshouldnotbeanyspacelimitationsimposedforsuchappendices.Whentheserecommendationscannotbemet,abriefexplanationshouldbeincludedaboutthereasons.Possiblereasonsmayberestrictedaccess(e.g.,proprietaryorsensitivedata),ownershipbyclosecollaboratorswhodonotwishtodisclosecertaindetails,inadequateresources(e.g.,tohouselargedatasets),oranunreasonableburdenonauthors.

Page 8: On Reproducible AI: Towards Reproducible Research, Open Science, and Digital ...gil/papers/gundersen-etal-ai... · 2019-01-21 · On Reproducible AI: Towards Reproducible Research,

Webeginwithrecommendationsfordata(Section4.1)andsourcecode(Section4.2)asthebasicingredientsofacomputationalexperiment.ThenwedescriberecommendationstodocumentAImethods(Section4.3)andtheexperimentsthemselves(Section4.4).IfallrecommendationsforAImethods(Table4)areimplemented,thenthepublicationshouldintheorybeR3(MethodReproducible),whileifallrecommendationsfordata(Table2)arealsoimplemented,thentheresearchshouldbeR2(DataReproducible).Finally,allfoursetsofrecommendations(Tables2-5)mustbeimplementedfortheresearchtobefullyR1(ExperimentReproducible).Wewillrefertothecompletesetof20recommendationsasanauthorchecklist,weprovideexamplestodemonstratethattheyaresynergistic,andwearguethattheycanbeeasilyimplemented.4.1RecommendationsforDataTable2summarizesourrecommendationsfordocumentingdata,whichconcern:(1)repositoryuse,(2)metadata,(3)licenses,(4)persistentuniqueidentifiers,and(5)citations.Theycanbeeasilyimplementedifresearchersusecommunitydatarepositoriesthatsupportrecommendedbestpractices.Table2.Authorchecklist(PartI):Recommendationsfordatainpublications.RECOMMENDATIONS1-5:Datamentionedinapublicationshould:

1. Beavailableinasharedcommunityrepository,soanyonecanaccessit2. Includebasicmetadata,sootherscansearchandunderstanditscontents3. Havealicense,soanyonecanunderstandtheconditionsforreuseofthedata4. Haveanassociateddigitalobjectidentifier(DOI)orpersistentURL(PURL)so

thatthedataisavailablepermanently5. Becitedproperlyintheproseandlistedaccuratelyamongthereferences,so

readerscanidentifythedatasetsunequivocallyanddatacreatorscanreceivecreditfortheirwork

DataRepositoriesData repositoriesexistformanydomains,andassuchtheyareavailabletotheAIcommunity.ExamplesofthesegeneralrepositoriesareZenodo(Zenodo2018),figshare(figshare2018),andDataverse(Dataverse2018).TheserepositorieswillautomaticallyassignaDOItoanyuploadeddataandwillalsoacceptsoftware,figures,movies,andslidepresentations.Theywillalsoinquireaboutchoosingalicense,andwithspecifyingadescriptivenameandauthorsforasubmitteddataset.AAAIcould,asaservice,providealistofrecommendeddatarepositories.ThiscouldbemodelledonaserviceprovidedbyCOPDESS,whichisalargecoalitionforpublishingdataintheearthandspacesciences(COPDESS2015).Universitiesalsooffergeneralrepositories,whetherdevelopedinhouseorasinstallationsofgeneral

Page 9: On Reproducible AI: Towards Reproducible Research, Open Science, and Digital ...gil/papers/gundersen-etal-ai... · 2019-01-21 · On Reproducible AI: Towards Reproducible Research,

infrastructuresuchasDataverse.Universityrepositoriesaretypicallymaintainedbylibrarydepartments,andalwaysofferDOIs,licenses,andcitations.WeencouragemaintainersofdatarepositoriesthatservetheAIcommunitytoadoptmechanismsforassigningDOIsorpersistentURLs(PURLs)todatasetsthattheyprovide.ThemanagementofPURLsorDOIscanbecomplex.WesuggestconsultingwithorganizationssuchasFORCE11andtheResearchDataAlliance,whichhaveworkinggroupswithextensiveanddetailedrecommendationsonthistopic.MetadataBasicmetadataincludesadescriptivetitle,thedataset’sauthors,andcreationdate.Additionalmetadataisalwaysvaluabletoothersintermsofunderstandingandreusingthedataset.LicensesforDataRecommendedlicensesfordataareCreativeCommonslicenses(Creative2018),preferablyCC-BY(unlimitedreuseaslongasthereisattribution)orCC0(unlimitedreusewithoutconditions).PermanentUniqueIdentifiersforDataManyauthorsmakedataavailablebyprovidingaURLtotheirpersonalorlabpages.Thesereferencesmaynotlastlongduetochangesinsitesandinauthoraffiliations(Kleinetal.2014).Instead,weencourageauthorstousepersistentuniqueidentifierssothattheirdataisalwaysavailable.DOIsaremanagedbydatarepositoriesandgiventoindividualdatasetsortocollections(DeRisietal.2013).MostdatarepositoriesprovideDOIs,andforthistheyforgeanagreementwithaDOIauthority.AnotheroptionthatanyonecanuseisPURLS.PURLscanbeassignedbyanyonetoanywebresourceusingatrustedservicesuchastheW3C’sw3id(w3id2018).DatarepositoriesalsohavetheoptionofusingPURLs.DataCitationAdatacitationcanbedirectlyprovidedbyadatarepository,oritcanbeconstructedbyhand.Acitationforadatasetconsistsofadescriptivename(ortitle)forthedataset,itscreators,thenameoftherepositorywhereitcanbeaccessed,andthepermanentURL.Forexample,acitationforadatasetin(Giletal.2017)is:Adusumilli,Ravali.(2016).Sampledatasetsusedin(Giletal.2017)forAAAI2017(Dataset).Zenodo.http://doi.org/10.5281/zenodo.180716.

NotethatbysimplyuploadingthedatasettotheZenodorepositoryweobtainedtheDOIandthecitation.Specifyingtheauthors,thename,andthelicensetakenegligibleeffort.Theauthorchecklistfordatarequiredlittletimetoimplement.4.2RecommendationsforSourceCode

Page 10: On Reproducible AI: Towards Reproducible Research, Open Science, and Digital ...gil/papers/gundersen-etal-ai... · 2019-01-21 · On Reproducible AI: Towards Reproducible Research,

Werefertosourcecodeasthehumanreadablecomputerinstructionswritteninplaintextandsoftwareascomputerprogramsthatareexecutablebyacomputer.Typically,sourcecodeiscompiledtosoftwareforacomputertorunit.OurrecommendationsforsourcecodearesummarizedinTable3.Table3.Authorchecklist(PartII):RecommendationsforsourcecodeimplementingAImethodsandexperimentsinpublications.RECOMMENDATIONS6-10:SourcecodeusedforimplementinganAImethodandexecutinganexperimentshould:

6. Beavailableinasharedcommunityrepository,soanyonecanaccessit7. Includebasicmetadata,sootherscansearchandunderstanditscontents8. Includealicense,soanyonecanunderstandtheconditionsforuseand

extensionofthesoftware9. Haveanassociateddigitalobjectidentifier(DOI)orpersistentURL(PURL)for

theversionusedintheassociatedpublicationsothatthesourcecodeispermanentlyavailable

10. Becitedandreferencedproperlyinthepublicationsothatreaderscanidentifytheversionunequivocallyanditscreatorscanreceivecreditfortheirwork

SourceCodeRepositoriesSourcecoderepositoriescanbeusedbyanyscientiststosharecode,andassuchtheyareavailabletotheAIcommunity.TheseincludegeneralrepositoriessuchasGitHubandBitBucket,andlanguage-specificrepositoriessuchasCRANforRcodeorFileExchangeinMATLABCentral.Generaldatarepositoriessuchasthosementionedaboveacceptsourcecodeasanentry,andaswithanydatasettheyalwaysofferDOIs,licenses,andcitations.Foraspecificpublication,theversionofthesourcecodethatisbeingusedshouldbeclearlyspecified,andthesourcecoderepositoryshouldsupporttheidentificationandfutureaccessofspecificversions.SourceCodeMetadataBasicmetadataincludesadescriptivetitle,thesourcecode’sauthors,andthecreationdate.Additionalmetadataisalwaysvaluabletoothersintermsofunderstandingandreusingthesourcecode.LicensesforSourceCodeRecommendedlicensesforsourcecodearethestandardlicensesfromtheOpenSourceInitiative.LicensessuchasApachev2orMITarerecommendedbecausetheyprovideunlimitedreuse(aslongasthereisattribution).Othermorerestrictivelicensesareavailabletolimitcommercialuseorimposelicensingconditionsonextensionsoftheoriginalsourcecode.

Page 11: On Reproducible AI: Towards Reproducible Research, Open Science, and Digital ...gil/papers/gundersen-etal-ai... · 2019-01-21 · On Reproducible AI: Towards Reproducible Research,

PermanentUniqueIdentifiersforSourceCodeAseparateDOIshouldbeassignedtomeaningfulversionsofthesourcecode,suchasaversionusedforapublication.GitHuboffersanoptiontoobtainaDOIforasourcecodeversion,whichisdonebystoringthatversionpermanentlyintheZenododatarepository.AnysourcecodecanbeuploadedmanuallytocommunitydatarepositoriessuchasZenodo,figshare,andDataverse.PURLScanbeassignedbyanyonetoanysourcecodeversionthathasaURLontheWeb,usingatrustedservicesuchasw3id.org.SourceCodeCitationSourcecodecitationcanbedirectlyprovidedbyasourcecoderepository,oritcanbeconstructedbyhand.Acitationforasourcecodeversionconsistsofadescriptivename(ortitle)forthesourcecode,itscreators,thenameoftherepositorywhereitcanbeaccessed,theversion,andthepermanentURL.Forexample,acitationforGitHubcodein(Giletal.2017)is:Ratnakar,Varun.“DISKsoftware”(v1.0.0).Zenodo.2016.http://doi.org/10.5281/zenodo.168079ByuploadingthesourcecodeintotheGitHubcoderepository,weobtainedapersistentidentifierfortheversionusedinthepublicationaswellasthecitation.Specifyingtheauthors,thename,andthelicensetakenegligibleeffort.Implementingtheauthorchecklistforsourcecoderequiredlittletime.4.3RecommendationsforAIMethodsOurrecommendationsforAImethodsarelistedinTable4.Table4.Authorchecklist(PartIII):RecommendationsforAImethodsinpublications.RECOMMENDATIONS(11-13):AImethodsusedinapublicationshouldbe:

11. Presentedinthecontextofaproblemdescriptionthatclearlyidentifieswhatproblemtheyareintendedtosolve

12. Outlinedconceptuallysothatanyonecanunderstandtheirfoundationalconcepts

13. Describedinpseudocodesothatotherscanunderstandthedetailsofhowtheywork

ProblemDescriptionTheproblemthataconceptualAImethodsolvesshouldbeexplicitlydescribedinthepublication.In(DeWeerdtetal.2013)thefollowingexamplecanbefound:”Toaddressthisproblem,weproposeanovelnavigationsystem...”.Theauthors

Page 12: On Reproducible AI: Towards Reproducible Research, Open Science, and Digital ...gil/papers/gundersen-etal-ai... · 2019-01-21 · On Reproducible AI: Towards Reproducible Research,

explicitlydescribetheproblemthattheyaddress.Anothergoodexampleofthispracticecanbefoundin(Heetal.2016).Heretheauthorsstatetheproblemexplicitly:”Inthispaper,weaddressthedegradationproblembyintroducingadeepresiduallearningframework.”Thedegradationproblemisalsoproperlydescribedintheirpublication.ConceptualMethodAhigh-level,textualdescriptionoftheAImethodshouldbeprovidedtoreaderstoallowthemtogainanunderstandingofit.ThisdescriptionshouldincludeabroadoverviewofhowtheAImethodworksandspecifyinputvariablesandtheresultingoutput.Ingeneral,theAIresearchcommunityexcelsatprovidingthisinformationinpublications.PseudocodePseudocodefortheAImethodshouldalsobeprovided.Incaseswheredetailedpseudocodecannotbeprovidedduetothecomplexityoftheproposedalgorithmorsystem,amoreabstractpseudocodesummarycanbeprovidedthatillustratestheAImethod’sflow.Bothahigh-leveldescriptionandpseudocodehelpindependentresearcherstodecidewhethertheirownimplementationofthemethodiscorrect.Ifthesearenotpresentedcarefully,thentheempiricalstudycannotalwaysbeeasilyreproduced.4.4RecommendationsforExperimentsAuthorsshould,tothedegreepossible,detailhowtheirexperimentsaredesigned,andindicatetherationalefortheirdesign.OurrecommendationsfordocumentingexperimentsaresummarizedinTable5.Table5.Authorchecklist(PartIV):Recommendationsforexperimentsdescribedinpublications.RECOMMENDATIONS(14-23):Descriptionsofexperimentsinapublicationshould:

14. Explicitlypresentthehypothesestobeassessed,beforeotherdetailsconcerningtheempiricalstudyarepresented

15. Presentthepredictedoutcomeoftheexperiment,basedonbeliefsabouttheAImethodanditsapplication

16. Includetheexperimentalsetup(parametersandtheconditionstobetested)anditsmotivation,suchaswhyaspecificnumberoftestsordatapointsareusedbasedonthedesiredstatisticalsignificanceofresultsandtheavailabilityofdata

17. Presenttheresults(i.e.,measuresandmetrics)andtheanalysis18. Anexplicitindicationofwhethertheanalysessupportthehypotheses

Page 13: On Reproducible AI: Towards Reproducible Research, Open Science, and Digital ...gil/papers/gundersen-etal-ai... · 2019-01-21 · On Reproducible AI: Towards Reproducible Research,

19. Justifywhythedatasetsusedareappropriatefortheexperiment,whythechosenempiricaldesignisappropriateforassessingthehypothesis,andwhythemetricsandmeasuresareappropriateforassessingtheresults

20. Bedescribedasaworkflowthatsummarizeshowtheexperimentisexecutedandconfigured

21. Includedocumentationonworkflowexecutionsorexecutiontracesthatprovideparametersettingsandinitial,intermediate,andfinaldata

22. Specifythehardwareusedtoruntheexperiments23. Becitedandpublishedseparatelywhencomplex,sothatotherscan

unequivocallyrefertotheindividualportionsofthemethodthattheyreuseorextend

HypothesesandpredictionsHypothesesandpredictionsshouldbestatedexplicitlybeforedescribingtheothercomponentsofanempiricalstudytoensurethattheresultsanalysisismeaningful(Baker2016).

EmpiricaldesignAtextualdescriptionandjustificationoftheexperiment’sdesignshouldbeprovided,toincludeadescriptionofeachtestcondition.Thisshouldalsodescribe,forexample,whyaspecificnumberoftestsordatapointsareusedbasedonthedesiredstatisticalsignificanceoftheresultsandtheavailabilityofdata.DatasetsResearchersshouldjustifytheuseoftheirselecteddatasets.EvaluationprotocolAjustificationshouldbeprovidedforthechosenprotocolwhendocumentinganexperiment.Toavoidhypothesismyopia,thisshouldnotbedesignedtocollectonlyevidencethatisguaranteedtosupportthestatedhypotheses.Instead,toencourageaninsightfulstudy,thisshouldincludeconditionsthecouldleadtotherejectionofthesehypotheses.Themeasuresandmetricstobeusedtoevaluatetheresearchmustbedescribed,andsoshouldtheanalysisprocedure(s)(e.g.,forassessingstatisticalsignificance)beaswell.ResultsandanalysisTheresultsandtheanalysisshouldbepresentedindetail.WorkflowThisworkflowshoulddescribe,inamachine-readableway,howsoftwareanddataareusedtoimplementtheevaluationprotocol.Aworkflowstepisaninvocationofthesoftware.Eachstephasinputdataandparametersaswellasoutputdata.Inputdataandtheoutputofanystepcanbeusedasinputtosubsequentsteps.Thesimplestworkflowlanguagescapturemethodsthataredirectedacyclicgraphs,

Page 14: On Reproducible AI: Towards Reproducible Research, Open Science, and Digital ...gil/papers/gundersen-etal-ai... · 2019-01-21 · On Reproducible AI: Towards Reproducible Research,

whileotherlanguagescanrepresentiterationsandconditionals.Apublicationthatsimplymentionswhatsoftwarewasusedusuallyleavesoutcriticalinformationabouthowthesoftwarewasconfiguredorinvoked.Scriptsorelectronicnotebookscanbeaneffectivewaytodocumentworkflows,althoughtheorganizationofsourcecodeismoremodularinaworkflowstructure.ExecutionsAgeneralworkflowcanberunmanytimeswithdifferentdatasetsorparametersettingsandgeneratedifferentresults.Executiontracesofexecutedworkflowsprovideacompleteprovenancetrailofhoweachresultwasgenerated.HardwarespecificationThehardwarethatisusedshouldbespecifiedifthisisimportantfortheexperiment.Thismayincludespecificationoftheprocessortype,thenumberofcoresandprocessors,RAMandharddiskrequirements.Also,theproviderofthecloudsolutionthatisused,ifany,shouldbespecified.Themachinearchitectureandoperatingsystemmayneedtobespecified,sothatanydiscrepanciesinresultscanbeproperlydiagnosed.Librarydependenciesshouldalsobedescribed.Virtualizationtechnologies,suchasdockerandKubernetes,facilitatethesespecificationsthroughartifactscalledcontainers.Containerscanbeprovidedasappropriatetosharetheexperimenthardwaresetup.WorkflowcitationCitingapublicationdoesnotmakeexplicitwhetherthecitationistoitsAImethod,sourcecode,data,empiricaldesign,workflows,executiontraces,results,orageneralbodyofworkorcontributions.Ifitisimportantthatothersareexplicitaboutwhataspectsoftheworkarebeingreused,thenseparatecitationsshouldbegiventoeach,asappropriate.Althoughworkflowrepositoriesarenotascommonasdataandsoftwarerepositories,manygeneraldatarepositoriesacceptanyresearchproductandcanbeusedforthispurpose.Forexample,acitationforabundlecontainingworkflowsandexecutiondetailsfor(Giletal.2017)is:

Adusumilli,Ravali,Ratnakar,Varun,Garijo,Daniel,Gil,Yolanda,andMallick,Parag.(2016).Additionalmaterialsusedinthepaper"TowardsContinuousScientificDataAnalysisandHypothesisEvolution"ontheProceedingsoftheThirty-FirstAAAIConferenceonArtificialIntelligence(AAAI-17)(Dataset).Zenodo.http://doi.org/10.5281/zenodo.190374

Byorganizingtheworkflowsandexecutionsdescribedintothispublicationandbundlingthemtouploadtoageneraldatarepository,theseauthorsobtainedapersistentidentifieraswellasacitation.Theauthorchecklistforexperimentswasimplementedquickly.

Page 15: On Reproducible AI: Towards Reproducible Research, Open Science, and Digital ...gil/papers/gundersen-etal-ai... · 2019-01-21 · On Reproducible AI: Towards Reproducible Research,

5.BenefitstoAuthorsRecognizingthatourrecommendationswillrequireeffortfromauthors,wewanttohighlightthefollowingbenefits:

1. Practiceopenscienceandreproducibleresearch.Thisensuresthekindsofchecksandbalancesthatleadtobetterscience.

2. Receivecreditforallyourresearchproducts(i.e.,throughcitationsforsoftware,datasets,andotherproducts).

3. Increasethenumberofcitationstoyourpublications.Studieshaveshownthatwell-documentedarticlesreceivemorecitations(Piwowaretal.2007).

4. Improveyourchancesofbeingfunded(i.e.,bywritingcoherentandwell-motivatedempiricalstudyanddatamanagementplans).

5. ExtendyourCV.Includedataandsoftwaresectionswithcitations.MaintainingdatasetsandwritingcodeareimportantcontributionstothefieldofAI.

6. Improvethemanagementofyourresearchassets(e.g.,soyournewstudents,andothers,canmoreeasilylocatematerialsgeneratedbyyourearlierstudents).

7. Allowforthereproductionofyourwork(e.g.,soyouandotherscanleverageitinnewstudies,evenifitwasconductedmanyyearsago).

8. Addressnewsponsorandjournalrequirements.Theyaresteadfastlydrivingresearchtowardsincreasedreproducibilityandopenscience.

9. Attracttransformativestudents.Theystriveforarigorousresearchmethodology.

10. Demonstrateleadership.Stepintothefuture.Byexplicitlycitingdatasetsandsourcecode,andbyprovidingworkflowsthataremachinereadable,wecreatethestructureneededthatcanallowforthedevelopmentofAIsystemsthatcananalyzeandreasonaboutourliterature(Gil2017).TheseAIsystemswouldhaveaccesstoavastamountofstructuredscientificknowledgewithcomprehensivedetailsaboutexperimentaldesignandresults.Thiscouldrevolutionizehowweapproachthescientificresearchprocess.6.DiscussionItisreasonabletoexpectalimitedreleaseofdataandsourcecodeuntilthecreatorhascompletedtheresearchforwhichthedatawascollected,orforwhichthesourcecodewaswritten,oruntiltheirdraftispublished.Manyjournalsimposethis,suchasScienceandNature.See(Jolyetal.2012)forareviewofdataretentionpolicies.Thecreationanddocumentationofadditionalinformationwerecommendshouldbedonebyresearcherswhopublishtheirstudies.Bydocumentingand

Page 16: On Reproducible AI: Towards Reproducible Research, Open Science, and Digital ...gil/papers/gundersen-etal-ai... · 2019-01-21 · On Reproducible AI: Towards Reproducible Research,

sharingcodeanddatainsuchawaythattheycanbeeasilyusedandcitedbyothersgivesresearcherscreditforalargerportionoftheirresearcheffort.Foracademicresearchers,weadvocatethattenurecommitteesgiveweighttothepublicationofdataandsourcecodewhenevaluatingcandidatesfortenure.Thus,thepublicationvelocityshouldnotbereduced,butincluderesearchproductsotherthanpublications.Therecommendationswesuggestshouldbeapartofdailyresearchpractices.AccordingtoIrakliLoladze,despiteincreasingworkloadby30%,“Reproducibilityislikebrushingyourteeth.Itisgoodforyou,butittakestimeandeffort.Onceyoulearnit,itbecomesahabit”(Baker2016).Anotherrecommendationforimprovingthereadabilityandcomparabilityofresearchpapersistorequirestructuredabstracts,whicharecommonlyusedinmedicaljournals.Structuredabstractscanbeusedtoefficientlycommunicatearesearchobjective,themotivationforandprocessbywhichanempiricalstudywasconducted,andwhatresultswereachieved.Structuredabstractsalsorequireresearcherstostructuretheirownthoughtsabouttheirresearch.Wesuggestafive-partstructuredabstractcontaining(1)theresearchmotivation,(2)theresearchobjective,(3)themethodusedtoconductanyempiricalstudies,(4)theresultsoftheresearch,and(5)theconclusion.Thisstructureenforcesacoherentresearchnarrative,whichisnotalwaysthecaseforunstructuredabstracts.Theabstractforthisarticleisanexampleoftheproposedstructure,while(GundersenandKjensmo2018)providesanabstractforempiricalresearchthatfollowstheserecommendationsandincludesanexplicitdescriptionofthehypothesisandaninterpretationoftheresults.

7.CalltoarmsAsacommunity,weshouldensurethattheresearchthatweconductisproperlydocumented.TomakeAIresearchreproducibleandmoretrustworthy,weproposedbestpracticesthatshouldbeadoptedbyeditorsandprogramchairsandincorporatedintothereviewformsofAAAIpublicationvenues.Publishersshouldprovideextraspacetodocumentandcitedata,sourcecode,andempiricalstudydesigns.AAAIleadershipshouldencourageAIresearcherstoincreasethereproducibilityoftheirpublishedwork.Thiscouldincludeprovidingstructuredtemplatestoorganizeappendicesandextraspaceinpublicationstoaccommodatetheneededdocumentation.ForAIresearchtobecomeopenandmorereproducible,theresearchcommunityandpublishershavetoestablishsuitablepractices.Authorsneedtoadoptthesepractices,disseminatethemtocolleaguesandstudents,andhelpdevelopmechanismsandtechnologytomakeiteasierforotherstoadoptthem.

Page 17: On Reproducible AI: Towards Reproducible Research, Open Science, and Digital ...gil/papers/gundersen-etal-ai... · 2019-01-21 · On Reproducible AI: Towards Reproducible Research,

Ourobjectivewiththisarticleistohighlightthebenefitsofreproduciblescience,andproposeinitial,modestchangesthatcanincreasethereproducibilityofAIresearchresults.Therearemanyadditionalactionsthatcouldandshouldbetaken,andwelookforwardtofurtherdialoguewiththeAIresearchcommunityonhowtoincreasethereproducibilityandscientificvalueofAIpublications.ACKNOWLEDGEMENTSThisresearchwasfundedinpartbytheNationalScienceFoundationundergrantICER-1440323.ThisworkhasinpartbeencarriedoutattheTelenor-NTNUAILab,NorwegianUniversityofScienceandTechnology,Trondheim,Norway.TherecommendationsproposedarebasedontheGeosciencePaperoftheFutureandtheScientificPaperoftheFuturebestpracticesdevelopedunderthataward.ThankstoSigbjørnKjensmoforalltheeffortputintosurveyingthestateoftheartofreproducibilityofAI.REFERENCES(AltmanandKing2007)“Aproposedstandardforthescholarlycitationof

quantitativedata.”Altman,M.,andKing,G.D-LibMagazine,13(3/4).doi:10.1045/march2007-altman

(Baker,2016)"Isthereareproducibilitycrisis?."MonyaBaker.Nature,533.May2016.DOI:doi:10.1038/533452a(BallandDuke2012)“HowtoCiteDatasetsandLinktoPublications”.DCC

How-toGuides.Edinburgh:DigitalCurationCentre.Availableonline:http://www.dcc.ac.uk/resources/how-guides-Seemoreat:http://www.dcc.ac.uk/resources/how-guides/cite-datasets#sthash.MJQjNn3i.dpuf

(BegleyandEllis2012)"Drugdevelopment:Raisestandardsforpreclinical

cancerresearch."Begley,C.G.,andEllis,L.M.Nature531.March2012.DOI:10.1038/483531a

(BraunandOng2014)Braun,M.L.andOng,C.S.Openscienceinmachine

learning.InImplementingReproducibleResearch,page343.CRCPress.2014.

(CODATA2013)“OutofCite,OutofMind:TheCurrentStateofPractice,Policy,

andTechnologyfortheCitationofData.”CODATA-ICSTITaskGrouponData

Page 18: On Reproducible AI: Towards Reproducible Research, Open Science, and Digital ...gil/papers/gundersen-etal-ai... · 2019-01-21 · On Reproducible AI: Towards Reproducible Research,

CitationStandardsandPractOutofCite,OutofMind:TheCurrentSices.DataScienceJournal,2013.DOI:10.2481/dsj.OSOM13-043

(COPDESS2015)“StatementofCommitmentfromEarthandSpaceScience

PublishersandDataFacilities.”CoalitiononPublishingDataintheEarthandSpaceSciences(COPDESS).January14,2015.http://www.copdess.org/statement-of-commitment/

(Creative2018)CreativeCommons.Availablefrom

https://creativecommons.org.Lastaccessed18May2018.(DataCite2015)DataCite.Availablefromhttps://www.datacite.org/.Last

accessed3August2015.(Dataverse2018)TheDataverseproject.Availablefromhttps://dataverse.org.

Lastaccessed18May2018.(DeWeerdtetal.2013)"Intention-awareroutingtominimisedelaysatelectric

vehiclechargingstations."DeWeerdt,M.M.,Gerding,E.H.,Stein,S.,Robu,V.,andJennings,N.R.InProceedingsoftheTwenty-ThirdinternationaljointconferenceonArtificialIntelligence,pages83–89.AAAIPress,2013

(DeRisietal2013)“TheWhatandWhysofDOIs.”SusanneDeRisi,Rebecca

Kennison,NickTwyman.PLoSBiology1(2):e57,2013.(Downsetal.2015)“DataStewardshipintheEarthSciences.”RobertR.Downs,

RuthDuerr,DeniseJ.Hills,andH.K.Ramapriyan.D-LibMagazine,21(7/8).doi:10.1045/july2015-downs

(ESIP2012)“InteragencyDataStewardship/Citations/providerguidelines.”

FederationofEarthScienceInformationPartners(ESIP),2January2012.Availablefromhttp://wiki.esipfed.org/index.php/Interagency_Data_Stewardship/Citations/provider_guidelin

(figshare2018)figshare.Availablefromhttps://figshare.com.Lastaccessed18May

2018.(Fokkensetal.2013)"Offspringfromrepro-ductionproblems:What

replicationfailureteachesus".Fokkens,A.,ErpM.V.,Postma,M.,Pedersen,M.,Vossen,P.,andFreire,N.InProceedingsofthe51stAnnualMeetingoftheAssociationforComputationalLinguistics,pages1691–1701.Associa-tionforComputationalLinguistics(ACL),2013.

Page 19: On Reproducible AI: Towards Reproducible Research, Open Science, and Digital ...gil/papers/gundersen-etal-ai... · 2019-01-21 · On Reproducible AI: Towards Reproducible Research,

(FORCE112014)JointDeclarationofDataCitationPrinciples.MartoneM.(ed.)andtheDataCitationSynthesisGroup,SanDiegoCA:FORCE112014.Availablefromhttps://www.force11.org/datacitation.

(Garijoetal.2013)“QuantifyingReproducibilityinComputationalBiology:The

CaseoftheTuberculosisDrugome”DanielGarijo,SarahKinnings,LiXie,LeiXie,YinliangZhang,PhilipE.Bourne,andYolandaGil.PLOSONE,27November2013.

(Gil2017)“ThoughtfulArtificialIntelligence:ForgingANewPartnershipfor

DataScienceandScientificDiscovery.”YolandaGil.DataScience(1):1-2,2017.DOI:10.3233/DS-170011

(Giletal.2016)“TowardstheGeosciencePaperoftheFuture:BestPracticesfor

DocumentingandSharingResearchfromDatatoSoftwaretoProvenance.”Gil,Y.;David,C.H.;Demir,I.;Essawy,B.T.;Fulweiler,R.W.;Goodall,J.L.;Karlstrom,L.;Lee,H.;Mills,H.J.;Oh,J.;Pierce,S.A;Pope,A.;Tzeng,M.W.;Villamizar,S.R.;andYu,X.EarthandSpaceScience,3.2016.

(Giletal.2017)“TowardsContinuousScientificDataAnalysisandHypothesis

Evolution.”YolandaGil,DanielGarijo,VarunRatnakar,RajivMayani,RavaliAdusumilli,HunterBoyce,ArunimaSrivastava,andParagMallick.ProceedingsoftheThirty-FirstAAAIConferenceonArtificialIntelligence(AAAI-17),SanFrancisco,CA,2017.

(Goodmanetal.2014)Goodman,A.,Pepe,A.,Blocker,A.W.,Borgman,C.L.,

Cranmer,K.,Crosas,M.,Stefano,R.D.,Gil,Y.,Groth,P.,Hedstrom,M.,Hogg,D.W.,Kashyap,V.,Mahabal,A.,Siemiginowska,A.,andA.Slavkovic(2014),Tensimplerulesforthecareandfeedingofscientificdata,PLOSComputationalBiology,10,April24,2014,doi:10.1371/journal.pcbi.1003542.

(GundersenandKjensmo2018)“StateoftheArt:ReproducibilityinArtificial

Intelligence.”OddErikGundersenandSigbjørnKjensmo.ProceedingsoftheThirty-SecondAAAIConferenceonArtificialIntelligence(AAAI-18),NewOrleans,LA,2018.

(Hansonetal.2015)“CommittingtoPublishingDataintheEarthandSpace

Sciences.”BrooksHanson,KerstinLehnert,andJoelCutcher-Gershenfeld.EOS95,15January2015.doi:10.1029/2015EO022207.https://eos.org/agu-news/committing-publishing-data-earth-space-sciences

(Heetal.2016)"Deepresiduallearningforimagerecognition."He,K.,Zhang,X.,

Ren,S.,andSun,J.InProceedingsoftheIEEEconferenceoncomputervisionandpatternrecognition,pages770–778,2016.

Page 20: On Reproducible AI: Towards Reproducible Research, Open Science, and Digital ...gil/papers/gundersen-etal-ai... · 2019-01-21 · On Reproducible AI: Towards Reproducible Research,

(Hendersonetal.2017)Henderson,P.,Islam,R.,Bachman,P.,Pineau,J.,Precup,

D.,&Meger,D.(2017).Deepreinforcementlearningthatmatters.arXivpreprintarXiv:1709.06560.

(Hunold2015)"Asurveyonreproducibilityinparallelcomputing".Hunold,S.

CoRR,abs/1511.04217,2015.(HunoldandTräff2013)"Onthestateandimportanceofreproducible

experimentalresearchinparallelcomputing".Hunold,S.andTräff,J.S.CoRR,abs/1308.3648,2013.

(Ioannidis2005)“Whymostpublishedresearchfindingsarefalse.”Ioannidis,J.P.PLoSMedicine.August2005.DOI:10.1371/journal.pmed.0020124(Jolyetal.2012)"Openscienceandcommunitynorms:Dataretentionand

publicationmoratoriapoliciesingenomicsproject."YannJoly,EdwardS.Dove,KarenL.Kennedy,MartinBobrow,B.F.FrancisOuellette,StephanieO.M.Dyke,KazutoKato,andBarthaM.Knoppers.MedicalLawInternational.Vol12,Issue2,pp.92-120.October9,2012DOI:10.1177/0968533212458431

(Kleinetal2014)MartinKlein,HerbertVandeSompel,RobertSanderson,

HariharShankar,LyudmilaBalakireva,KeZhou,andRichardTobin.(2014)“ScholarlyContextNotFound:OneinFiveArticlesSuffersfromReferenceRot.”PLoSONE9(12):e115253.doi:10.1371/journal.pone.0115253

(Lithgowetal.2017)."Alongjourneytoreproducibleresults."Lithgow,G.J.,

Driscoll,M.,andPhillips,P.NatureNews.August2017.DOI:10.1038/548387a

(MooneyandNewton2012)Mooney,H,Newton,MP.(2012).TheAnatomyofa

DataCitation:Discovery,Reuse,andCredit.JournalofLibrarianshipandScholarlyCommunication1(1):eP1035.http://dx.doi.org/10.7710/2162-3309.1035

(Noseketal.2015)“Promotinganopenresearchculture.”B.A.Nosek,G.Alter,G.C.Banks,D.Borsboom,S.D.Bowman,S.J.Breckler,S.Buck,C.D.Chambers,G.Chin,G.Christensen,M.Contestabile,A.Dafoe,E.Eich,J.Freese,R.Glennerster,D.Goroff,D.P.Green,B.Hesse,M.Humphreys,J.Ishiyama,D.Karlan,A.Kraut,A.Lupia,P.Mabry,T.Madon,N.Malhotra,E.Mayo-Wilson,M.McNutt,E.Miguel,E.LevyPaluck,U.Simonsohn,C.Soderberg,B.A.Spellman,J.Turitto,G.VandenBos,S.Vazire,E.J.Wagenmakers,R.Wilson,T.Yarkoni.Science348,1422-1425,26June2015.DOI:10.1126/science.aab2374

Page 21: On Reproducible AI: Towards Reproducible Research, Open Science, and Digital ...gil/papers/gundersen-etal-ai... · 2019-01-21 · On Reproducible AI: Towards Reproducible Research,

(Piwowaretal2007)“SharingDetailedResearchDataIsAssociatedwithIncreasedCitationRate.”HeatherA.Piwowar,RogerS.Day,DouglasB.Fridsma.PLoSONE,March21,2007.DOI:10.1371/journal.pone.0000308

(RDA2015)OutcomesoftheResearchDataAlliance(RDA).Availablefrom

https://rd-alliance.org/outcomes.LastaccessedJuly30,2015.(Starretal.2015)“Achievinghumanandmachineaccessibilityofciteddatain

scholarlypublications.”StarrJ,CastroE,CrosasM,DumontierM,DownsRR,DuerrR,HaakLL,HaendelM,HermanI,HodsonS,HourcléJ,KratzJE,LinJ,NielsenLH,NurnbergerA,ProellS,RauberA,SacchiS,SmithA,TaylorM,ClarkT.PeerJComputerScience1:e1,2015.DOI:10.7717/peerj-cs.1

(Stoddenetal.2016)“Enhancingreproducibilityforcomputationalmethods.”

VictoriaStodden,MarciaMcNutt,DavidH.Bailey,EwaDeelman,YolandaGil,BrooksHanson,MichaelA.Heroux,JohnP.A.Ioannidis,MichelaTaufer.Science354,1240(2016)DOI:10.1126/science.aah6168

(Uhliretal.2012)“ForAttribution:DevelopingDataAttributionandCitation

PracticesandStandards.”PaulF.Uhlir,Rapporteur;BoardonResearchDataandInformation;PolicyandGlobalAffairs;NationalResearchCouncil.ReportofCODATADataCitationWorkshop.NationalAcademiesPress,2012.Availablefromhttp://www.nap.edu/catalog/13564/for-attribution-developing-data-attribution-and-citation-practices-and-standards.

(Wilkinsonetal.2016)“TheFAIRGuidingPrinciplesforscientificdatamanagement

andstewardship.”MarkD.Wilkinson,MichelDumontier,IJsbrandJanAalbersberg,GabrielleAppleton,MylesAxton,ArieBaak,NiklasBlomberg,Jan-WillemBoiten,LuizBoninodaSilvaSantos,PhilipE.Bourne,JildauBouwman,AnthonyJ.Brookes,TimClark,MercèCrosas,IngridDillo,OlivierDumon,ScottEdmunds,ChrisT.Evelo,RichardFinkers,AlejandraGonzalez-Beltran,AlasdairJ.G.Gray,PaulGroth,CaroleGoble,JeffreyS.Grethe,JaapHeringa,PeterA.C’tHoen,RobHooft,TobiasKuhn,RubenKok,JoostKok,ScottJ.Lusher,MaryannE.Martone,AlbertMons,AbelL.Packer,BengtPersson,PhilippeRocca-Serra,MarcoRoos,RenevanSchaik,Susanna-AssuntaSansone,ErikSchultes,ThierrySengstag,TedSlater,GeorgeStrawn,MorrisA.Swertz,MarkThompson,JohanvanderLei,ErikvanMulligen,JanVelterop,AndraWaagmeester,PeterWittenburg,KatherineWolstencroft,JunZhao,andBarendMons.NatureScientificData3,2016.doi:10.1038/sdata.2016.18

(w3id2018)”PermanentIdentifiersfortheWeb.”WorldWideWebConsortium

(W3C),Availablefromhttp://www.w3id.org(Zenodo2018)Zenodo.Availablefromhttps://zenodo.org.Lastaccessed18May

2018.