Bridging the gap between data centres and publishers


DESCRIPTION

Bridging the gap between data centres and publishers. J. Brase, ICSTI Workshop “Interactive Publications and the Record of Science”, February 8th, 2010. A Gap: a widening gap in the scientific record between published research and the data that underlies it. Published work held by libraries

Citation preview

  • Bridging the gap between data centres and publishers
    J. Brase
    ICSTI Workshop “Interactive Publications and the Record of Science”
    February 8th, 2010

  • A Gap
    A widening gap in the scientific record between published research and the data that underlies it
      Published work held by libraries
      Datasets held by data centres
    No effective way to link between datasets and articles
    No widely used method to identify datasets
    No widely used method to cite datasets
    As a result, datasets are
      Difficult to discover
      Difficult to access
      Second-class citizens in the scientific record

  • Datasets first class citizens?
    Datasets
      Data is difficult to manage after project funding ceases
      Informal networks provide the primary means of sharing
      Only 21% use a national or international facility
      Datasets are not included in impact analysis
      Good luck finding it or getting permission to use it (your discipline may vary)
      [Source: UKRDS Study]
    Published articles
      Libraries ensure long-term storage and management
      Established funded services provide the primary means of access
      Nearly all published articles are held in multiple national libraries
      Articles and citations form the backbone of impact analysis
      Catalogues and full-text search support discovery

  • Dataset citation using the DOI system
    The DOI system offers an easy way to connect the article with the underlying data:

    The dataset:
    G. Yancheva, N. R. Nowaczyk et al. (2007) Rock magnetism and X-ray fluorescence spectrometry analyses on sediment cores of the Lake Huguang Maar, Southeast China, PANGAEA, doi:10.1594/PANGAEA.587840

    Is supplement to the article:
    G. Yancheva, N. R. Nowaczyk et al. (2007) Influence of the intertropical convergence zone on the East Asian monsoon, Nature 445, 74-77, doi:10.1038/nature05431
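    The dataset-to-article link on this slide can be sketched in a few lines of Python. This is a minimal illustration, not DataCite's software: the helper names (normalize_doi, doi_url, DataArticleLink) are hypothetical, the relation label simply mirrors the slide's "is supplement to" wording, and the only external fact assumed is that a DOI resolves through the public proxy at https://doi.org/.

```python
from dataclasses import dataclass

# Public DOI proxy; appending a DOI name gives a resolvable URL.
DOI_RESOLVER = "https://doi.org/"


def normalize_doi(raw: str) -> str:
    """Strip common prefixes ("doi:", resolver URLs) and return the bare DOI name."""
    doi = raw.strip()
    for prefix in ("doi:", "https://doi.org/", "http://dx.doi.org/"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Very loose sanity check: DOI names start with "10." and contain a slash.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI name: {raw!r}")
    return doi


def doi_url(raw: str) -> str:
    """Build the resolver URL for a DOI written in any of the accepted forms."""
    return DOI_RESOLVER + normalize_doi(raw)


@dataclass(frozen=True)
class DataArticleLink:
    """Hypothetical record tying a dataset DOI to the article it supplements."""
    dataset_doi: str
    article_doi: str
    relation: str = "IsSupplementTo"  # illustrative label taken from the slide wording

    def describe(self) -> str:
        return f"{doi_url(self.dataset_doi)} {self.relation} {doi_url(self.article_doi)}"


# The example from the slide: PANGAEA dataset and the Nature article.
link = DataArticleLink("doi:10.1594/PANGAEA.587840", "doi:10.1038/nature05431")
print(link.describe())
```

    Because both identifiers resolve through the same proxy, a reader of the article can follow the link to the dataset and a user of the dataset can follow it back to the article, without either side knowing where the other is hosted.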

  • DataCite

  • DataCite
    Global consortium carried by local institutions
    Focused on improving the scholarly infrastructure around datasets and other non-textual information
    Focused on working with data centres and organisations that hold data
    Providing standards, workflows and best practice
    Initially, but not exclusively, based on the DOI system
    Founded December 1st, 2009 in London

  • Members

    Technische Informationsbibliothek (TIB), Germany
    Canada Institute for Scientific and Technical Information (CISTI)
    California Digital Library, USA
    Purdue University, USA
    Library of TU Delft, The Netherlands
    Technical Information Center of Denmark
    The British Library
    ZB Med, Germany
    Gesis, Germany
    Library of the ETH Zürich
    L'Institut de l'Information Scientifique et Technique (INIST), France
    Australian National Data Service (ANDS)

  • DataCite
    The DataCite registration agency
      Maintains the resolution infrastructure
      Maintains a searchable database of metadata
      Manages the identifiers over the long term
      Establishes and shares best practice

    Publishing agents (data centres, research institutes, data publishers) are responsible for
      Quality assurance
      Content storage and access
      Creating the identifier
      Creating and updating metadata

  • DataCite Structure
    [Diagram. Labels: Carries; International DOI Foundation; DataCite; Works with; Managing Agent (TIB); Member; Associate Stakeholder]

  • Another way to see it
    [Diagram. Labels: Publishers; Data centres]

  • Examples

  • Bridging the gap
    DataCite supports researchers by enabling them to locate, identify, and cite research datasets with confidence

    DataCite supports data centres by providing workflows and standards for data publication

    DataCite supports publishers by enabling linking from articles to the underlying data

    http://www.datacite.org

    Let's take a moment to contrast the services that we see today around datasets with the ones available for published articles. Most research datasets are produced during funded research projects with a well-defined end. Access to the datasets becomes much worse once the project comes to an end. Can you imagine having to worry about whether a researcher was still funded for the same work just to get a copy of one of her articles?

    Many researchers are willing to share their data, but informal networks are the predominant way of doing so. This is like grey literature in the 80s and early 90s, before the web. If you wanted a conference paper, you either had to show up in person or know someone who was there. This split research institutes and universities into haves and have-nots. You had to be on the secret paper-passing network to learn about the latest research.

    Only 21% use a national or international facility. This means that almost 80% of researchers put their datasets where? On their laptops, desktops, desk drawers, departmental servers. This is not a serious way to run a serious business! Management practices vary tremendously: some will have good practices, but many will not, placing the data at risk.

    88% said they would make data available and 43% expressed the need to access others' data.

    Datasets are not included in impact analysis. This means that the researchers who produce the essential data that drive new science are often unrewarded, and that data centres have considerable challenges justifying their budget and existence.

    And what is the state of resource discovery for datasets?

    UKRDS Study
      Data is difficult to retain and manage once project funding ceases (compare grey and published articles)
      Only 12% do not make their data available, but informal networks are predominant
      43% expressed the need to access others' data

    DataCite is building on national, disciplinary, and institutional best practices.