40
Open Access to Research Artifacts: Implementing the Next Generation Data Management Plan Victoria Stodden School of Information Sciences University of Illinois at Urbana-Champaign ASIS&T Melbourne, Australia October 21, 2019 Funding provided by NSF EAGER awards 1649545 (HMB), 1649703 (KL), and 1649555 (VCS)

Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management

OpenAccesstoResearchArtifacts:ImplementingtheNextGenerationDataManagementPlan

VictoriaStoddenSchoolofInformationSciences

UniversityofIllinoisatUrbana-Champaign

ASIS&TMelbourne,AustraliaOctober21,2019

FundingprovidedbyNSFEAGERawards1649545(HMB),1649703(KL),and1649555(VCS)

Page 2: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management

Agenda1. WhyaDataManagementPlanforGrantProposals?

TheU.S.NationalScienceFoundationDataManagementPlanGuidelines

2. TheNationalAcademiesofScience,Engineering,andMedicine“ReproducibilityandReplicationinScience”ReportDMPRecommendations

3. ezDMP:MachineReadabilityandInclusiveDMPs

Page 3: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management

WhyaDataManagementPlan?CaseStudy:USNationalScienceFoundation:

DisseminationandSharingofResearchResultsNSFDATASHARINGPOLICYInvestigatorsareexpectedtosharewithotherresearchers,atnomorethanincrementalcostandwithinareasonabletime,theprimarydata,samples,physicalcollectionsandothersupportingmaterialscreatedorgatheredinthecourseofworkunderNSFgrants.Granteesareexpectedtoencourageandfacilitatesuchsharing.NSFDATAMANAGEMENTPLANREQUIREMENTSProposalssubmittedordueonorafterJanuary18,2011,mustincludeasupplementarydocumentofnomorethantwopageslabeled"DataManagementPlan".ThissupplementarydocumentshoulddescribehowtheproposalwillconformtoNSFpolicyonthedisseminationandsharingofresearchresults.

https://www.nsf.gov/bfa/dias/policy/dmp.jsp

Page 4: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management
Page 5: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management

“ReproducibilityandReplicationinScience”ConsensusReport,May2019

NationalAcademiesofScience,Engineering,andMedicine

Page 6: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management

Definitions• Theterms,“reproducibility”and“replicability”havedifferentmeaningsandusesacrossscienceandengineering,whichhasledtoconfusionincollectivelyunderstandingproblemsinreproducibilityandreplicability.Thecommitteeadoptedspecificdefinitionsforthepurposeofthisreporttoclearlydifferentiatebetweentheterms,whichareotherwiseinterchangeableineverydaydiscourse.

• Reproducibilityisobtainingconsistentresultsusingthesameinputdata,computationalsteps,methods,andcode,andconditionsofanalysis.Thisdefinitionissynonymouswith“computationalreproducibility,”andthetermsareusedinterchangeablyinthisreport.

• Replicabilityisobtainingconsistentresultsacrossstudiesaimedatansweringthesamescientificquestion,eachofwhichhasobtaineditsowndata.Twostudiesmaybeconsideredtohavereplicatediftheyobtainconsistentresultsgiventhelevelofuncertaintyinherentinthesystemunderstudy.

Page 7: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management

KeyRecommendation1RECOMMENDATION4-1:Tohelpensurethereproducibilityofcomputationalresults,researchersshouldconveyclear,specific,andcompleteinformationaboutanycomputationalmethodsanddataproductsthatsupporttheirpublishedresultsinordertoenableotherresearcherstorepeattheanalysis,unlesssuchinformationisrestrictedbynon-publicdatapolicies.Thatinformationshouldincludethedata,studymethods,andcomputationalenvironment:• theinputdatausedinthestudyeitherinextension(e.g.,atextfileorabinary)orinintension(e.g.,ascripttogeneratethedata),aswellasintermediateresultsandoutputdataforstepsthatarenondeterministicandcannotbereproducedinprinciple;• adetaileddescriptionofthestudymethods(ideallyinexecutableform)togetherwithitscomputationalstepsandassociatedparameters;and• informationaboutthecomputationalenvironmentwherethestudywasoriginallyexecuted,suchasoperatingsystem,hardwarearchitecture,andlibrarydependencies(whicharerelationshipsdescribedinandmanagedbyasoftwaredependencymanagertooltomitigateproblemsthatoccurwheninstalledsoftwarepackageshavedependencieson specificversionsofothersoftwarepackages).

Page 8: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management

KeyRecommendation3RECOMMENDATION6-5:Inordertofacilitatethetransparentsharingandavailabilityofdigitalartifacts,suchasdataandcode,foritsstudies,theNationalScienceFoundation(NSF)should:• Developasetofcriteriafortrustedopenrepositoriestobeusedbythescientificcommunityforobjectsofthescholarlyrecord.

• Seektoharmonizewithotherfundingagenciestherepositorycriteriaanddata-managementplansforscholarlyobjects.

• Endorseorconsidercreatingcodeanddatarepositoriesforlong-termarchivingandpreservationofdigitalartifactsthatsupportclaimsmadeinthescholarlyrecordbasedonNSF-fundedresearch.Thesearchivescouldbebasedattheinstitutionallevelorbepartof,andharmonizedwith,theNSF-fundedPublicAccessRepository.

• ConsiderextendingNSF’scurrentdata-managementplantoincludeotherdigitalartifacts,suchassoftware.

• Workwithcommunitiesreliantonnon-publicdataorcodetodevelopalternativemechanismsfordemonstratingreproducibility.Throughtheserepositorycriteria,NSFwouldenablediscoverabilityandstandardsfordigitalscholarlyobjectsanddiscourageanundueproliferationofrepositories,perhapsthroughendorsingorprovidingonego-towebsitethatcouldaccessNSF-approvedrepositories.

Page 9: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management

OpenAccesstoResearchArtifacts:ImplementingtheNextGenerationDataManagementPlanVictoriaC.Stodden,KerstinLehnert,VickiFerrini,MargaretJ.Gabanyi,JohnMorton,andHelenM.Berman

FundingprovidedbyNSFEAGERawards1649545(HMB),1649703(KL),and1649555(VCS)

Page 10: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management

ProjectGoals1.Forusers:TomakeDMPcreationlesstime-consuming,morecomplete,andconformwithfundingagencyguidelines.

2.Foragencies:ToobtainmachinereadableandmineableDMPs.

Infuture:▪Enableanunderstandingofcurrentcommunitypractices▪Leverageinformationtogenerateasetof“effectivepractices”

3.Fortheresearchcommunity:greaterexposureofresearchproducts,enablingincreasedre-useandreproducibility.

Page 11: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management

PublicSite:ezdmp.org

Page 12: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management

HowtoCreateaDMP

1. Enterproposaloverviewinformation2. Enterinproducts,one-by-one3. Connectrelatedproductstogether4. OutputDMPinPDFformat

Page 13: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management

CreatingaDMPforaCISEProposal

Page 14: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management
Page 15: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management
Page 16: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management
Page 17: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management
Page 18: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management
Page 19: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management
Page 20: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management
Page 21: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management
Page 22: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management
Page 23: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management
Page 24: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management
Page 25: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management
Page 26: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management
Page 27: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management

ExampleBIODMPsOnceloggedin,theuserispresentedwithalistoftheirpreviousDMPsandtheoptiontocreateanewone.

Page 28: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management

Step1:EnterProposalOverview

Formcontinued…

ThiswilllinktolatestNSFpolicyforreview

Page 29: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management

Step1:EnterProposalOverview

Toaddanotherresearchproduct,clickhere

Formcontinued…

Page 30: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management

Step2:EnterResearchProducts

Page 31: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management
Page 32: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management

Step3:ConnectRelatedProductsUserscanconnecttheirresearchdatasetstothenovelworkfloworsoftwareapplicationsthatproducedthemConnectionswillthenappearinproductdetailsandwithinreport.

Page 33: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management

Step4:OutputDMPinPDFformat

“toolong”warningscreenclip

Page 34: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management

DMPOutputandAdvice

Page 35: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management
Page 36: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management

Really Reproducible Research“Really Reproducible Research” (1992) inspired by Stanford Professor Jon Claerbout:

“The idea is: An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete ... set of instructions [and data] which generated the figures.” David Donoho, 1998

Note: reproducing the computational steps vs re-implementing the experiment independently (both types needed).

Page 37: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management

AAAS / Arnold Foundation Reproducibility Workshop III: Code and Modeling

• This workshop will consider ways to make code and modeling information more readily available, and include a variety of stakeholders.

• The computational steps that produce scientific findings are increasingly considered a crucial part of the scholarly record, permitting transparency, reproducibility, and re-use. Important information about data preparation and model implementation, such as parameter settings or the treatment of outliers and missing values, is often expressed only in code. Such decisions can have substantial impacts on research outcomes, yet such details are rarely available with scientific findings.

• http://www.aaas.org/event/iii-arnold-workshop-modeling-and-code Feb 16-17, 2016

Page 38: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management

1. Share data, software, workflows, and details of the computational environment that generate published findings in open trusted repositories.

2. Persistent links should appear in the published article and include a permanent identifier for data, code, and digital artifacts upon which the results depend.

3. To enable credit for shared digital scholarly objects, citation should be standard practice.

4. To facilitate reuse, adequately document digital scholarly artifacts.

Workshop Recommendations: “Reproducibility Enhancement Principles”

Page 39: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management

5. Use Open Licensing when publishing digital scholarly objects.

6. Journals should conduct a reproducibility check as part of the publication process and should enact the TOP standards at level 2 or 3.

7. To better enable reproducibility across the scientific enterprise, funding agencies should instigate new research programs and pilot studies.

Workshop Recommendations: “Reproducibility Enhancement Principles”

Page 40: Open Access to Research Artifacts: Implementing … › ~vcs › talks › ASIST-2019-STODDEN.pdfOpen Access to Research Artifacts: Implementing the Next Generation Data Management

TOP Standards