Open Access Indicator for 2015
Part 2
Technical Description of Data Foundation, Processes and Output

0 Preface
1 Introduction and Main Processes
2 Process 1: Collection of The Data
  2.1 The Universities' Publication Data
    2.1.1 Requirements on Universities – Metadata Format and Method of Collection
    2.1.2 This Year's Universities and Their Research Databases
  2.2 Authority and Auxiliary Data
    2.2.1 Directory of Open Access Journals (DOAJ)
    2.2.2 Sherpa/Romeo (Sh/Ro)
    2.2.3 The Danish Bibliometric Research Indicator (BFI)
    2.2.4 Authority List: Accepted External Repositories ("The Whitelist")
    2.2.5 Authority List: Journals with extended Embargo ("The Blacklist")
  2.3 This Year's Complete Data Collection
3 Process 2: Defining the Set of In-Scoped Publications
  3.1 The Set of Scoped Records Including Duplicates
  3.2 The Set of Scoped Records Excluding Duplicates
  3.3 This Year's Sets of Scoped Records
4 Process 3: Calculation of OA Realization and Potential
  4.1 Open Access Classification – University Level
    4.1.1 Checking for Golden Open Access Potential
    4.1.2 Checking for Green Open Access Potential
    4.1.3 Checking for Unused & Unclear Potential
    4.1.4 Checking Open Access Potential – Combined
  4.2 Open Access Classification – National and Main Research Area Level
5 Process 4: Quality Assurance
6 Process 5: Output
  6.1 Data Reports for download
  6.2 Web Dissemination via The Danish Research Database
7 Appendix A: The Fulltext Download Sub Process

Revision 3 of 11 April 2017

Open Access Indicator 2015 Technical en - Danish National … · 2020-05-05 · 2 0 Preface The National Steering Group for Open Access1 has proposed the Danish Agency for Science,




0 Preface

The National Steering Group for Open Access [1] has proposed that the Danish Agency for Science, Technology and Innovation and Denmark's Electronic Research Library develop a Danish Open Access Indicator. The intention is to support the implementation of the national Open Access strategy [2], cf. the strategy's statement on monitoring: "The implementation of Open Access is to be monitored on an ongoing basis to ensure that all parties make a maximum effort to develop and disseminate free accessibility to Danish research findings."

The Open Access Indicator is calculated once per year with the target field: scientific and peer-reviewed articles and conference contributions in journals and proceedings with ISSN.

In the context of Horizon 2020 [3], the EU requires that Open Access be established within at most 6 months after publication for the areas of science, technology and health, and within at most 12 months for the social sciences and humanities. This delay is caused by many journals maintaining so-called embargo periods, during which they prevent researchers from establishing Open Access to the articles.

As the OA Indicator is calculated once annually for all publications within its target field, it is designed to accept a one-year delay in Open Access to the publications. Consequently, the OA Indicator for 2015 is calculated in early March 2017 in order to accommodate a full-year embargo period also for publications from December 2015. In practice this means that publications from January 2015 could have embargo periods of up to 24 months and still be credited by the OA Indicator.

The description of the Open Access Indicator is organized in two parts:

• Part 1: Overview of data foundation, processes and output
• Part 2: Technical description of data foundation, processes and output

Note: In Part 2, the technical description, the notion of the indicator's "target field" is expressed using the term "set of scoped records".

Queries regarding the indicator may be directed to

Adam Baden / Hanne-Louise Kirkegaard
Danish Agency for Science and Higher Education
Ministry of Higher Education and Science
Bredgade 40
DK-1260 København K
Email: [email protected] / [email protected]

[1] http://ufm.dk/en/research-and-innovation/cooperation-between-research-and-innovation/open-access
[2] http://ufm.dk/en/research-and-innovation/cooperation-between-research-and-innovation/open-access/Publications/denmarks-national-strategy-for-open-access
[3] https://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf


1 Introduction and Main Processes

The activities of the OA Indicator can be broken down into these five main processes:

1) Collection of the data
2) Defining the set of in-scoped publications
3) Calculation of OA realization and potential
4) Quality assurance
5) Output

The five main processes are described in further detail in the sections below.

This description of the Open Access Indicator is aimed at a technically inclined audience and describes in depth how the Indicator works, overall as well as in detail. The description assumes that the reader is familiar with basic XML [4] and the basic parts of the XPath [5] notation for referring to XML elements of an XML document conforming to a certain XML Schema. It also assumes that the reader is familiar with the visualisation of processes as workflow diagrams [6].

[4] https://www.w3.org/TR/xml/
[5] https://www.w3.org/TR/xpath-30/
[6] https://en.wikipedia.org/wiki/Flowchart


2 Process 1: Collection of The Data

The first activity in the OA Indicator is the collection of the complete data foundation used by the indicator. This includes importing six national and international sources. The data foundation is composed of metadata describing the publications of the universities, as well as authority and auxiliary data.

2.1 The Universities' Publication Data

Metadata describing the publications of the universities are used to establish the set of publications in scope of the OA Indicator. These metadata are collected for the OA Indicator once annually. Collection is done directly from the universities, using a nationally agreed XML-based exchange format and a nationally agreed exchange protocol. For fulltexts registered in the collected publication metadata, collection (download) is attempted.

2.1.1 Requirements on Universities – Metadata Format and Method of Collection

A university can be included in the OA Indicator if it meets the following minimum requirements:

• Publications published by researchers employed at the university are collected in a university research database containing publication data, person data, project data etc. of that particular university only.
• This research database of the university must expose its publication data using OAI-PMH (http://www.openarchives.org/OAI/openarchivesprotocol.html).
• The research database must support OAI-PMH selective harvesting using Sets, characterised by their setSpec (code), to harvest only parts of the database.
• A dedicated OAI-PMH Set exposing all publication data held in the research database must exist.
• For this dedicated set, OAI-PMH metadataPrefix "ddf_mxd" must be supported.
• When an OAI-PMH client harvests this dedicated set using metadataPrefix "ddf_mxd", metadata records must be valid DDF-MXD (http://mx.forskningsdatabasen.dk/mxd/).
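To make the harvesting requirements concrete, the following minimal sketch builds the kind of OAI-PMH request a harvester could send to a university research database. The request arguments `verb`, `metadataPrefix` and `set` come from the OAI-PMH protocol. The function name `list_records_url` is hypothetical, and the server address and setSpec in the example are the ones this document lists for AAU. The sketch only constructs the URL; it does not reproduce how the OA Indicator harvester is actually implemented.

```python
from urllib.parse import urlencode


def list_records_url(base_url: str, set_spec: str,
                     metadata_prefix: str = "ddf_mxd") -> str:
    """Build an OAI-PMH ListRecords request for one set and one metadata format.

    Hypothetical helper for illustration only.
    """
    query = urlencode({
        "verb": "ListRecords",              # OAI-PMH verb for harvesting records
        "metadataPrefix": metadata_prefix,  # format required by the OA Indicator
        "set": set_spec,                    # selective harvesting by setSpec
    })
    return f"{base_url}?{query}"


# Example with the AAU server and setSpec listed in this document.
url = list_records_url("http://vbn.aau.dk/ws/oai", "publications:all")
print(url)
```

A real harvester would additionally follow the protocol's resumption tokens until the full set has been retrieved; that part is left out here.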


2.1.2 This Year's Universities and Their Research Databases

The following 8 universities – and associated research databases – are included in the OA Indicator for 2015:

University  Research Database - OAI-PMH server  OAI-PMH setSpec
AAU         http://vbn.aau.dk/ws/oai            publications:all
AU          https://pure.au.dk/ws/oai           publications:all
CBS         http://research.cbs.dk/ws/oai       publications:all
DTU         http://orbit.dtu.dk/ws/oai          publications:all
ITU         https://pure.itu.dk/ws/oai          publications:all
KU          http://curis.ku.dk/ws/oai           publications:all
RUC         http://rucforsk.ruc.dk/ws/oai       publications:all
SDU         http://heinz.sdu.dk:8080/ws/oai     publications:all

2.2 Authority and Auxiliary Data

Authority and auxiliary data are collected for the OA Indicator from various sources. For each of these sources, the collection is done once annually. Collection method and data formats vary across sources.

2.2.1 Directory of Open Access Journals (DOAJ)

DOAJ is used by the OA Indicator as an authoritative list of Golden Open Access journals. Parameters of the data collection:

• Protocol: OAI-PMH (server http://www.doaj.org/oai/)
• metadataPrefix: oai_dc
• Data format: Dublin Core (http://dublincore.org/documents/dces/)

2.2.2 Sherpa/Romeo (Sh/Ro)

Sh/Ro is used by the OA Indicator to determine the policy for Green Open Access by journals, and thereby the Open Access potential of individual journal articles. Parameters of the data collection:

• Protocol: HTTP (GET from http://www.sherpa.ac.uk/downloads/)
• Data format: Proprietary XML-based format (http://sherpa.ac.uk/news/2012-10-08-RoMEO-API-News.html)

2.2.3 The Danish Bibliometric Research Indicator (BFI)

Data from BFI are used by the OA Indicator for three purposes:

• To identify duplicate publication data across universities (these exist for collaborative publications with coauthors employed at different universities and therefore registered in multiple research databases)
• To resolve potential conflicts with respect to the Main Research Areas registered in the metadata for the publications
• To ensure that articles published in DOAJ-validated journals can be considered scientific and peer-reviewed (BFI level 1 or 2).

Parameters of the data collection:

• Protocol: HTTPS (GET from https://bfi.fi.dk/AnnualReport)
• Format: Compressed Excel spreadsheet – undocumented template


2.2.4 Authority List: Accepted External Repositories ("The Whitelist")

For fulltexts deposited in external repositories, this authority list is used by the OA Indicator to ensure that only fulltexts deposited in accepted external repositories can demonstrate Realised Open Access potential.

• Protocol: Mail (from authority list maintainers)
• Format: Excel spreadsheet – undocumented template

2.2.5 Authority List: Journals with extended Embargo ("The Blacklist")

This authority list is used by the OA Indicator to reclassify records from Unused to Unclear Open Access potential for journals registered on the list.

• Protocol: Mail (from authority list maintainers)
• Format: Excel spreadsheet – undocumented template

2.3 This Year's Complete Data Collection

Summary of the data collection for the OA Indicator for 2015:

Source     Protocol  Ver.  Format       Ver.   Collection Date  Records
AAU        OAI-PMH   2.0   DDF-MXD      1.3.0  6/3-2017         7248*
AU         OAI-PMH   2.0   DDF-MXD      1.3.0  6/3-2017         13221*
CBS        OAI-PMH   2.0   DDF-MXD      1.3.0  6/3-2017         2118*
DTU        OAI-PMH   2.0   DDF-MXD      1.3.0  6/3-2017         7740*
ITU        OAI-PMH   2.0   DDF-MXD      1.3.0  6/3-2017         280*
KU         OAI-PMH   2.0   DDF-MXD      1.3.0  6/3-2017         13845*
RUC        OAI-PMH   2.0   DDF-MXD      1.3.0  6/3-2017         1550*
SDU        OAI-PMH   2.0   DDF-MXD      1.3.0  6/3-2017         7327*
DOAJ       OAI-PMH   2.0   DC           %      6/3-2017         13515
Sh/Ro      HTTP      %     Proprietary  %      6/3-2017         27032
BFI        HTTPS     %     Proprietary  %      6/3-2017         25044
Whitelist  Mail      %     Proprietary  %      26/1-2017        15
Blacklist  Mail      %     Proprietary  %      14/12-2016       2945

* With Submission Year 2015

3 Process 2: Defining the Set of In-Scoped Publications

After the collection of all data for the OA Indicator, a number of activities are initiated in order to isolate the publication records which are in scope for the OA Indicator. Not all publications are in scope – only a subset of the publications of the universities.


The scope is defined as:

• Scientific, peer-reviewed articles and conference contributions published in journals or proceedings with ISSN

Thus, the subset of publication metadata records representing this scope must be isolated from the total set of publication metadata collected. This is done in two ways, in order to facilitate statistics on the national level and on the university level:

• Scoped records including duplicates – for statistics on the university level. For collaborative articles across universities, all registrations from all participating universities are kept.
• Scoped records excluding duplicates – for statistics on the national level. For collaborative articles across universities, only one registration is kept.

3.1 The Set of Scoped Records Including Duplicates

Each of the requirements in the definition of the scope maps directly to a corresponding rule regarding DDF-MXD data elements and their content. The set of scoped publication metadata records is therefore the set that complies with all the rules. The rules are described below.

First of all, the set of scoped records must represent records with a given submission year. The initial rule is therefore:

0) The submission year (indberetningsår) must be marked up in the publication metadata record with the given value.
   Rule applied: Attribute /ddf_doc/@doc_year has the value (year) for the OA Indicator calculation.

Subsequently, the following four rules are applied to all records:

1) The type of the publication must be marked up in the publication metadata record as "Journal Article", "Review article" or "Conference Contribution" (same definition of "article" as used by BFI).
   Rule applied: Attribute /ddf_doc/@doc_type has value "dja", "djr" or "dcp".
2) The review status of the publication must be marked up in the publication metadata record as "Peer-review" (similar demand as for BFI).
   Rule applied: Attribute /ddf_doc/@doc_review has value "pr".
3) The scientific level of the publication must be marked up in the publication metadata record as "Scientific" (similar demand as for BFI).
   Rule applied: Attribute /ddf_doc/@doc_level has value "sci".
4) The publication channel of the publication must be marked up in the publication metadata record with an ISSN.
   Rule applied: Element /ddf_doc/publication/*/issn has a value.
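Rules 0 to 4 can be sketched as a single predicate over one record. The sketch below is illustrative only and rests on stated assumptions: XML namespaces are ignored (real DDF-MXD records are namespaced), the function name `in_scope` is hypothetical, and the child element `in_journal` in the sample record is invented for the example. Only the attribute names, the code values and the path `publication/*/issn` are taken from the rules above.

```python
import xml.etree.ElementTree as ET

ARTICLE_TYPES = {"dja", "djr", "dcp"}  # rule 1: journal article, review article, conference contribution


def in_scope(ddf_doc: ET.Element, year: str) -> bool:
    """Return True if one <ddf_doc> element satisfies rules 0-4 (namespaces ignored)."""
    # Rule 4: at least one non-empty issn element below /ddf_doc/publication/*
    has_issn = any((e.text or "").strip()
                   for e in ddf_doc.findall("publication/*/issn"))
    return (
        ddf_doc.get("doc_year") == year               # rule 0: submission year
        and ddf_doc.get("doc_type") in ARTICLE_TYPES  # rule 1: publication type
        and ddf_doc.get("doc_review") == "pr"         # rule 2: peer-reviewed
        and ddf_doc.get("doc_level") == "sci"         # rule 3: scientific
        and has_issn                                  # rule 4: ISSN present
    )


# Hypothetical, heavily simplified record; "in_journal" is an assumed element name.
SAMPLE = """<ddf_doc doc_year="2015" doc_type="dja" doc_review="pr" doc_level="sci">
  <publication><in_journal><issn>1234-5678</issn></in_journal></publication>
</ddf_doc>"""

record = ET.fromstring(SAMPLE)
print(in_scope(record, "2015"))
```

Applying the predicate to every harvested record of a university would yield that university's part of the set of scoped records including duplicates.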


3.2 The Set of Scoped Records Excluding Duplicates

For collaborative publications between the universities, multiple publication metadata records may represent the same publication. As this is impractical when producing statistics on the national level, a set of scoped records without duplicates is produced. This set is produced by exposing the set of scoped records with duplicates to a deduplication process. The ambition of this process is to ensure that, for each publication in the scope of the OA Indicator for which there is at least one record in the set of scoped records including duplicates, there is exactly one record in the set of scoped records excluding duplicates.

The deduplication process creates clusters of records. A cluster contains records that represent the same publication. The full set of scoped records excluding duplicates is ultimately established by producing one record per cluster. The algorithm for producing clusters is:

1) Records that were part of the BFI calculation for the same submission year and were identified by the BFI process as being duplicates are added to the same cluster.
2) Records for which significant metadata elements (DOI, title, subtitle, ISSN, publication year, etc.) match sufficiently well are considered to represent the same publication and are added to the same cluster.

This algorithm respects BFI's deduplication algorithm: Rule (1) ensures that any records identified by BFI as duplicates are also identified by the OA Indicator as duplicates. The scope of BFI and the scope of the OA Indicator differ. This makes it realistic that other, non-BFI-scoped records are part of the OA Indicator scope and are indeed duplicates of other records. Rule (2) ensures that these records are in fact (best effort) gathered into clusters as well. Thus, clusters may include

a. Only records which were part of BFI,
b. Both records which were part of BFI and records which were not, or
c. Only records which were not part of BFI.
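The two clustering rules can be sketched as a union-find merge. This is an illustration under assumptions, not the indicator's actual implementation: the document does not specify what "match sufficiently well" means, so the sketch stands in a plain equality test on a precomputed match key (for example a normalised DOI). The function name, the record identifiers and the keys are all hypothetical.

```python
from collections import defaultdict


def cluster_records(records, bfi_groups):
    """Cluster record ids.

    records:    dict mapping record id -> match key (or None if no usable key)
    bfi_groups: iterable of sets of record ids that BFI identified as duplicates
    """
    parent = {rid: rid for rid in records}

    def find(x):
        # Find the cluster representative, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Rule 1: records that BFI identified as duplicates share a cluster.
    for group in bfi_groups:
        ids = [rid for rid in group if rid in parent]
        for other in ids[1:]:
            union(ids[0], other)

    # Rule 2 (assumed stand-in): equal match keys mean "same publication".
    first_with_key = {}
    for rid, key in records.items():
        if key is None:
            continue
        if key in first_with_key:
            union(first_with_key[key], rid)
        else:
            first_with_key[key] = rid

    clusters = defaultdict(set)
    for rid in records:
        clusters[find(rid)].add(rid)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical records: id -> match key.
records = {"aau:1": "10.1/x", "ku:7": "10.1/x", "dtu:3": "10.1/y",
           "au:9": "10.1/z", "sdu:2": "10.1/y", "ruc:4": None}
bfi_groups = [{"dtu:3", "au:9"}]  # duplicates according to BFI

clusters = cluster_records(records, bfi_groups)
print(clusters)
```

In the example, "sdu:2" joins the BFI pair through rule 2, which produces a cluster of type (b): BFI records together with a non-BFI record.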

A subtle but important remark: For clusters containing BFI records – (a) and (b) above – the BFI records clustered by rule (2) above may stem from different BFI clusters. OA Indicator clusters may contain BFI records which were not joined by the BFI deduplication algorithm.

Conflict Resolution

The results of the OA Indicator are distributed on Main Research Area (MRA). In order to be able to do this distribution, each cluster must have a unique Main Research Area. BFI's definition of MRA is used by the OA Indicator:

• Science (sci)
• Social Science (soc)
• Humanities (hum)


• Medicine (med)

All DDF-MXD records contain a unique MRA. For records in the set of scoped records including duplicates, these MRAs are used. For records in the set of scoped records excluding duplicates, records in the underlying clusters may disagree on MRA. Using BFI terminology, such a situation is called an MRA-conflict. Such MRA-conflicts must be resolved so that each cluster has a unique MRA. The algorithm for resolving MRA-conflicts in a cluster is:

1) If all the records in a cluster have the same MRA, this is used for the cluster (no conflict).
2) Otherwise, if one or more of the records in the cluster were part of a BFI cluster, the BFI MRA for that cluster is used.
3) If none of the records in the cluster were part of the BFI calculation – or if multiple records were part of different BFI clusters disagreeing on their BFI MRA for those BFI clusters – majority wins: The MRA of the cluster is the MRA represented by most of the records in the cluster.
4) If two or more MRAs are represented by the same number of records in the cluster, the MRA with the highest representation in the entire set of scoped records is chosen for the cluster.

This algorithm ensures that the OA Indicator resolves potential MRA-conflicts while respecting, to the largest extent possible, the corresponding MRA-conflict resolutions done by BFI.
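The four resolution rules can be sketched as one function. The sketch assumes that the caller has already collected, for one cluster, the MRA of each record, the MRAs of the BFI clusters its records belonged to, and the MRA counts over the entire set of scoped records. The function and parameter names are hypothetical. How a tie in rule 4 itself would be broken is not stated in this document; the sketch simply takes the first candidate in that case.

```python
from collections import Counter


def resolve_mra(record_mras, bfi_mras, global_counts):
    """Pick the unique MRA for one cluster.

    record_mras:   list with the MRA of every record in the cluster
    bfi_mras:      list with the BFI MRA of every BFI cluster touched (may be empty)
    global_counts: Counter of MRAs over the entire set of scoped records
    """
    # Rule 1: no conflict.
    if len(set(record_mras)) == 1:
        return record_mras[0]
    # Rule 2: BFI has resolved this, and all involved BFI clusters agree.
    if len(set(bfi_mras)) == 1:
        return bfi_mras[0]
    # Rule 3: no BFI records, or disagreeing BFI clusters: majority wins.
    counts = Counter(record_mras)
    top = max(counts.values())
    tied = [mra for mra, n in counts.items() if n == top]
    if len(tied) == 1:
        return tied[0]
    # Rule 4: tie, so fall back to the MRA most common in the whole scoped set.
    return max(tied, key=lambda mra: global_counts[mra])


overall = Counter(sci=10, med=8, soc=5, hum=3)  # hypothetical global distribution
print(resolve_mra(["sci", "med", "med"], [], overall))
```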

3.3 This Year's Sets of Scoped Records

Dataset                                                                Records
Total number of publication records collected from the universities   53,429
Set of scoped records including duplicates                             25,070
Set of scoped records excluding duplicates                             22,666

For further details, see section on Data reports.

4 Process 3: Calculation of OA Realization and Potential


The calculation of OA realisation and potential is done with respect to Green and Golden Open Access. The calculation is done nationally, distributed on Main Research Area (MRA) and distributed on universities.

The Open Access potential, and the realisation of that potential, is initially calculated per university, using a per-publication approach based on the set of scoped records including duplicates. Subsequently, it is also calculated for the national level and MRA level, also using a per-publication approach, but based on the set of scoped records excluding duplicates.

For both sets, each record/publication belonging to the set is classified according to how the publication realises its Open Access potential. There are three values for this classification, and they are color coded using green, yellow and red (traffic light):

• Realised Open Access potential
• Unused Open Access potential, and
• Unclear Open Access potential

For some in-scoped records, the classification includes attempting a download of a fulltext registered in the record. For technical reasons, the actual download attempts of all potential fulltexts are the first sub process. Please refer to Appendix A for technical details on how this is done.

4.1 Open Access Classification – University Level

For any record in the set of scoped records including duplicates, the Open Access potential is established through a number of validation steps. As an overview, the classification process can be illustrated as follows:

[Workflow diagram: overview of the classification process]


Please note that although the diagram above indicates that validation for Golden and Green Open Access takes place in parallel, the actual implementation is that Golden is validated before Green. Each of the steps illustrated above is a workflow of its own. They are described individually below.

4.1.1 Checking for Golden Open Access Potential

First, the journal registered in the publication metadata record is checked against DOAJ. If present, and if the publication record achieved a level 1 or level 2 BFI classification, the publication is considered one with a (Golden) Open Access potential, and the potential is considered to be Realised. The associated, simple workflow can be depicted as follows:

[Workflow diagram: check for Golden Open Access potential]
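The Golden check amounts to two conditions that must both hold. A minimal sketch follows, assuming that the DOAJ data have been reduced to a set of ISSNs and that the BFI level of the record is available as an integer (or None when the record has no BFI classification). The function name and the sample ISSNs are hypothetical.

```python
def golden_realised(issn, doaj_issns, bfi_level):
    """Return True if the record counts as Realised through Golden Open Access.

    issn:       ISSN of the journal registered in the record
    doaj_issns: set of ISSNs harvested from DOAJ (assumed representation)
    bfi_level:  BFI level of the record, or None if it has none
    """
    in_doaj = issn in doaj_issns            # journal is listed in DOAJ
    bfi_validated = bfi_level in (1, 2)     # scientific and peer-reviewed per BFI
    return in_doaj and bfi_validated


doaj = {"1111-2222", "3333-4444"}  # hypothetical DOAJ extract
print(golden_realised("1111-2222", doaj, 1))
```

A record that fails this check is not yet classified; it continues to the Green check described in the next section.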

4.1.2 Checking for Green Open Access Potential

Green Open Access validation of a publication record involves inspecting the element /ddf_doc/oa_link. Below, it will be referred to with the shorthand notation //oa_link. Records may contain zero, one or more //oa_link elements. The combined workflow for validating Green Open Access is as follows:

[Workflow diagram: combined validation of Green Open Access]


Three decisions in this workflow have to do with qualification. These three decisions are made following sub-workflows:

Decision: Does the //oa_link element qualify?

A qualified //oa_link element is a //oa_link element

• with attribute @type having an acceptable value ("loc" for local or "rem" for remote – not "doi" for DOI), and
• with a @url attribute that has a value.

Checking for qualification can be illustrated with the following workflow:

[Workflow diagram: qualification of the //oa_link element]
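The element qualification can be sketched as a filter over the //oa_link elements of one record. As before, XML namespaces are ignored for brevity, the function name is hypothetical, and the URLs in the sample record are invented for illustration.

```python
import xml.etree.ElementTree as ET

ACCEPTED_TYPES = {"loc", "rem"}  # local or remote; "doi" does not qualify


def qualified_links(ddf_doc: ET.Element):
    """Return the oa_link elements of a record that pass the element check."""
    return [
        link for link in ddf_doc.findall("oa_link")
        if link.get("type") in ACCEPTED_TYPES      # acceptable @type
        and (link.get("url") or "").strip()        # @url present and non-empty
    ]


# Hypothetical record with three links; only the first one qualifies.
SAMPLE = """<ddf_doc>
  <oa_link type="loc" url="http://repo.example.dk/files/1.pdf"/>
  <oa_link type="doi" url="https://doi.org/10.1000/example"/>
  <oa_link type="rem" url=""/>
</ddf_doc>"""

links = qualified_links(ET.fromstring(SAMPLE))
print([link.get("url") for link in links])
```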


Decision: Does the URL qualify?

A qualified URL is either a URL to a local repository or a URL to an external repository that has a prefix (domain name and potentially also path) registered for a repository on the list of accepted external (remote) repositories (the Whitelist). Checking for qualification can be illustrated with the following workflow:

[Workflow diagram: qualification of the URL]
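The URL decision can be sketched as a prefix test against the Whitelist. Several points are assumptions made for the example only: a link is treated as local when its @type is "loc", Whitelist entries are taken to be strings of the form domain name plus optional path without a scheme, and the entries shown are invented (the actual Whitelist is distributed as one of the data reports).

```python
def url_qualifies(link_type, url, whitelist_prefixes):
    """Return True if the URL points to a local or an accepted external repository."""
    if link_type == "loc":
        return True  # assumed: local repository links always qualify
    # Drop the scheme so that http and https links compare equally.
    bare = url.split("://", 1)[-1]
    # Entries should end in "/" so that "arxiv.org/" cannot match "arxiv.org.example.com".
    return any(bare.startswith(prefix) for prefix in whitelist_prefixes)


whitelist = ["arxiv.org/", "www.ncbi.nlm.nih.gov/pmc/"]  # hypothetical entries
print(url_qualifies("rem", "https://arxiv.org/abs/1501.00001", whitelist))
```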

Decision: Does the file qualify?

A qualified file is a file

• that can be downloaded by a computer, and
• where the content of the downloaded file has a size bigger than zero.

Checking for qualification can be illustrated with the following workflow:

[Workflow diagram: qualification of the file]


4.1.3 Checking for Unused & Unclear Potential

If the record has no Realised Open Access potential, the record is examined to determine if the potential is Unused or Unclear. The Open Access potential of the publication is derived from the Open Access potential of the journal registered in the publication metadata record, as registered in the Sherpa/Romeo dataset (cf. http://www.sherpa.ac.uk/romeoinfo.html).

Rules applied:

• If the ISSN of the journal is registered in Sherpa/Romeo with color code green, blue or yellow, the journal is considered one with Open Access potential, and the publication metadata record is considered one with an Unused Open Access potential.
  o An exception to this rule applies if the ISSN is registered on the list of accepted journals with extended embargo periods (the Blacklist). If so, the record is reclassified to Unclear.
• If the journal is registered in Sherpa/Romeo with a different color code or not registered at all, the journal does not have a clear Open Access potential, and the publication metadata record is considered to be one with an Unclear Open Access potential.

This validation can be depicted as follows:

[Workflow diagram: check for Unused and Unclear potential]
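The rules for records without Realised potential can be sketched as a lookup followed by one exception. The sketch assumes the Sherpa/Romeo data have been reduced to a mapping from ISSN to color code and the Blacklist to a set of ISSNs. The function name and the sample ISSNs are hypothetical.

```python
OA_COLOURS = {"green", "blue", "yellow"}  # color codes that signal Open Access potential


def unused_or_unclear(issn, romeo_colours, blacklist):
    """Classify a record that has no Realised Open Access potential.

    romeo_colours: dict mapping ISSN -> Sherpa/Romeo color code (assumed representation)
    blacklist:     set of ISSNs of journals with extended embargo periods
    """
    colour = romeo_colours.get(issn)  # None if the journal is not registered
    if colour in OA_COLOURS and issn not in blacklist:
        return "Unused"
    # Other color code, not registered, or on the Blacklist.
    return "Unclear"


romeo = {"1111-2222": "green", "3333-4444": "white", "5555-6666": "blue"}  # hypothetical
blacklisted = {"5555-6666"}
print(unused_or_unclear("1111-2222", romeo, blacklisted))
```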

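4.1.4 Checking Open Access Potential – Combined

Thus, the combined decision workflow for determining the Open Access potential of a record is:

[Workflow diagram: combined decision workflow for the Open Access potential of a record]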


4.2 Open Access Classification – National and Main Research Area Level

Publication metadata records in the set of scoped records excluding duplicates correspond to clusters of one or more records from the set of scoped records including duplicates. After classifying each of the records of the set of scoped records including duplicates according to Open Access potential and its realization, clusters inherit classifications according to a "best-classification-wins" algorithm, using the following decision workflow:

[Workflow diagram: best-classification-wins decision for clusters]
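A "best-classification-wins" rule can be sketched as taking the maximum over a ranking of the three classifications. The ranking used here (Realised over Unused over Unclear) is an assumption based on the green, yellow, red traffic light order given earlier in this document; the decision workflow itself is not reproduced in this text, so the sketch should be read with that caveat. The names are hypothetical.

```python
# Assumed ranking: higher value means better classification.
RANK = {"Realised": 2, "Unused": 1, "Unclear": 0}


def cluster_classification(record_classes):
    """Return the classification a cluster inherits from its records."""
    return max(record_classes, key=RANK.__getitem__)


# A cluster where one university has deposited a fulltext and another has not.
print(cluster_classification(["Unused", "Realised", "Unclear"]))
```

Under this assumption, a collaborative publication counts as Realised on the national level as soon as any one of the participating universities has realised its Open Access potential.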


5 Process 4: Quality Assurance

The results of the Open Access Indicator have been subjected to the following quality assurance measures:

• Data foundation. The collected data and the registered links to fulltexts, and their resolvability back to the universities' research databases, have been tested. The tests have been based on sampling across the universities.
• Downloaded fulltext files. The collected data and the registered links, and their resolvability back to the universities' research databases, have been tested. A selection of the downloaded fulltext files has been inspected to ensure that they can indeed be considered files representing the scientific article in a complete and readable fashion. The tests have focused on files that, based on simple computer-based analysis, seemed to deviate suspiciously from the metadata registered for the publication (page number, file sizes, etc.).


• Links to external OA repositories. All files realized through links to recognized external OA repositories have been inspected in order to ensure that the links lead to a fulltext file representing the scientific article.
• Random sample. A random sample of 5% from the total set of realized Open Access potential, from each university, has been inspected with the aim of validating the overall data quality.

6 Process 5: Output

As output, the Open Access Indicator produces a number of data reports as well as web-friendly visualisations of the summations of these. The Danish Research Database (http://forskningsdatabasen.dk/) is used as dissemination platform for the visualisations and the reports.

6.1 Data Reports for download

Five data reports are produced:

1) Summations: The sets of scoped records, aggregated and distributed on Realized, Unused and Unclear Open Access potential
   a. Nationally (set of scoped records excluding duplicates)
   b. Distributed on Main Research Area (set of scoped records excluding duplicates)
   c. Distributed on the universities (set of scoped records including duplicates)
2) Detailed foundation for (a) and (b): Total list of publication records in the set of scoped records excluding duplicates
3) Detailed foundation for (c): Total list of publication records in the set of scoped records including duplicates
4) The list of accepted external repositories (The Whitelist) used for the calculation
5) The list of accepted journals with extended embargoes (The Blacklist) used for the calculation


6.2 Web Dissemination via The Danish Research Database

The summations of the Open Access Indicator are visualised on http://forskningsdatabasen.dk/en/open_access/overview, from where data reports can be downloaded as well.


7 Appendix A: The Fulltext Download Sub Process

Downloads of all the fulltexts registered (by their URL) in the scoped set of publication metadata records are attempted in a single subprocess. This subprocess is implemented in the following way:

• Fulltexts are downloaded one by one (serial; not in parallel)
• Fulltexts are downloaded in a "University Round Robin" fashion:
  o one fulltext from university 1,
  o one fulltext from university 2,
  o one fulltext from university 3,
  o …,
  o one fulltext from university N,
  o one fulltext from university 1,
  o one fulltext from university 2,
  o …,
  o one fulltext from university N,
  o …
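The "University Round Robin" ordering can be sketched as a generator that interleaves one download queue per university. The sketch only produces the order in which URLs would be visited; it performs no downloads. What happens once a university has no fulltexts left is not described in this document, so the sketch assumes that such a university is simply skipped in later rounds. The function name and the queue contents are hypothetical.

```python
from itertools import zip_longest


def round_robin(queues):
    """Yield URLs one university at a time, round after round.

    queues: dict mapping university -> list of fulltext URLs (assumed representation)
    """
    exhausted = object()  # marker for universities that have run out of URLs
    for one_round in zip_longest(*queues.values(), fillvalue=exhausted):
        for url in one_round:
            if url is not exhausted:
                yield url


queues = {"AAU": ["a1", "a2", "a3"], "AU": ["b1"], "CBS": ["c1", "c2"]}  # hypothetical
order = list(round_robin(queues))
print(order)
```

Processing the resulting sequence strictly one URL at a time matches the serial requirement above and spreads the load evenly across the universities' servers.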

All downloads are done automatically by the OA Indicator download robot. Any repository holding the fulltexts (either the research databases of the universities or external repositories) can identify a download by the OA Indicator robot by:

• IP address: 192.38.67.38