A new approach to organization of research data Ya-Ning Chen Tamkang University 6 June, 2016


Page 1: A new approach to organization of research data

A new approach to organization of research data

Ya-Ning Chen, Tamkang University

6 June, 2016

Page 2: A new approach to organization of research data

Outline

• Background: research data, DMP, RDM, and data
• Why research data is so important
• Why metadata is not appropriate
• What is a data journal
• What is a data paper
• Concept map of data journal and data paper
• Current status of data journals and data papers
• A survey
  • Methodology, results and discussion
• Conclusion

Page 3: A new approach to organization of research data

Research data-Definition-01

• Research data means data in the form of facts, observations, images, computer program results, recordings, measurements or experiences on which an argument, theory, test or hypothesis, or another research output is based. Data may be numerical, descriptive, visual or tactile. It may be raw, cleaned or processed, and may be held in any format or media. (Queensland University of Technology, http://www.mopp.qut.edu.au/D/D_02_08.jsp)

Page 4: A new approach to organization of research data

Research data-Definition-02

• Data that are used as primary sources to support technical or scientific enquiry, research, scholarship, or artistic activity, and that are used as evidence in the research process and/or are commonly accepted in the research community as necessary to validate research findings and results. All other digital and non-digital content have the potential of becoming research data. Research data may be experimental data, observational data, operational data, third party data, public sector data, monitoring data, processed data, or repurposed data (RDC Glossary).

Page 5: A new approach to organization of research data

Research data-Definition-03

• Data are facts, observations or experiences on which an argument, theory or test is based. Data may be numerical, descriptive or visual. Data may be raw or analysed, experimental or observational. Data includes: laboratory notebooks; field notebooks; primary research data (including research data in hardcopy or in computer readable form); questionnaires; audiotapes; videotapes; models; photographs; films; test responses. Research collections may include slides; artefacts; specimens; samples. Provenance information about the data might also be included: the how, when, where it was collected and with what (for example, instrument). The software code used to generate, annotate or analyse the data may also be included. (University of Melbourne, https://policy.unimelb.edu.au/MPF1242)

Page 6: A new approach to organization of research data

DMP

• Data Management Plan: A formal statement describing how research data will be managed and documented throughout a research project and the terms regarding the subsequent deposit of the data with a data repository for long-term management and preservation. (RDC Glossary)

Page 7: A new approach to organization of research data

RDM

• Research Data Management: Data management refers to the storage, access and preservation of data produced from a given investigation. Data management practices cover the entire lifecycle of the data, from planning the investigation to conducting it, and from backing up data as it is created and used to long-term preservation of data deliverables after the research investigation has concluded. Specific activities and issues that fall within the category of data management include: file naming (the proper way to name computer files); data quality control and quality assurance; data access; data documentation (including levels of uncertainty); metadata creation and controlled vocabularies; data storage; data archiving & preservation; data sharing and re-use; data integrity; data security; data privacy; data rights; notebook protocols (lab or field). (RDC Glossary)

Page 8: A new approach to organization of research data

Research data-Importance

Source: http://research.microsoft.com/en-us/collaboration/fourthparadigm/4th_paradigm_book_complete_lr.pdf

Page 9: A new approach to organization of research data

Data-Definition and Type-01

• Primary sources
• Secondary sources
• Tertiary sources

Page 10: A new approach to organization of research data

Primary sources

• Primary sources allow researchers to get as close as possible to original ideas, events and empirical studies. Such sources may include expositions of creative ideas, first-hand or contemporary accounts of events, publication of the results of empirical observations or studies, and other items that may form the basis of further research.
• Examples include:
  • Novels, plays, poems, works of art, popular culture
  • Diaries, narratives, autobiographies, memoirs, speeches
  • Government documents, patents
  • Data sets, technical reports, experimental research results

Source: Univ. of Penn. Libraries, 2013

Page 11: A new approach to organization of research data

Secondary sources

• Secondary sources analyze, review or restate information in primary resources or other secondary resources. Even sources presenting facts or descriptions about events are secondary unless they are based on direct participation or observation. Moreover, secondary sources often rely on other secondary sources and standard disciplinary methods to reach results, and they provide the principal sources of analysis about primary sources.
• Examples include:
  • Biographies
  • Review articles and literature reviews
  • Scholarly articles that don't present new experimental research results
  • Historical studies

Source: Univ. of Penn. Libraries, 2013

Page 12: A new approach to organization of research data

Tertiary sources

• Tertiary resources provide overviews of topics by synthesizing information gathered from other resources. Tertiary resources often provide data in a convenient form or provide information with context by which to interpret it.
• Examples include:
  • Encyclopedias
  • Chronologies
  • Almanacs
  • Textbooks

Source: Univ. of Penn. Libraries, 2013

Page 13: A new approach to organization of research data

Data-Definition and Type-01

Source: Lavoie et al. (2014). The evolving scholarly record.

Page 14: A new approach to organization of research data

Data-Definition and Type-02

Source: Lavoie et al. (2014). The evolving scholarly record.

Figure labels: formal scholarly output; social media; research data

Page 15: A new approach to organization of research data

Significance of documentation for research data

• The cost of re-collecting or reproducing data is much higher than the cost of documenting it, although data documentation is time-consuming and costly (Kratz & Strasser, 2014).
• Some data cannot be re-collected or reproduced.

Page 16: A new approach to organization of research data

Why is metadata not appropriate for RD

• “metadata may not be sufficient to enable them to use the data” (Costello, 2009, p. 4).
• The reason may be that a “research methods” description is not included in most metadata standards or guidelines (Chao, 2015).

Page 17: A new approach to organization of research data

Data journal

• Data journals
  • “Data publication” concept: mirroring the scientific publication model.
  • They promote the publication of data papers, “scholarly publication of a searchable metadata document describing a particular online accessible data set, or a group of data sets, published in accordance to the standard academic practices” (Chavan & Penev, 2011),
  • the final aim being to provide “information on the what, where, why, how, and who of the data” (Callaghan et al., 2012, p. 112).

Page 18: A new approach to organization of research data

Data papers-01

• Data papers are concisely written articles that describe publicly available data/data sets. (Earthquake Spectra)
• A data paper is essentially a one or two page description of a publicly available dataset which spells out its re-use potential. In addition to the dataset itself receiving a digital object identifier (DOI), an associated data paper receives a DOI of its own. This means either can be cited in an academic context. (Gray, 2015)
• A data paper resembles a traditional article except that instead of forming an argument or drawing conclusions from data, it provides a detailed description of a dataset, including how the data were collected, processed, and/or analyzed. (Akers, 2013)

Page 19: A new approach to organization of research data

Data papers-02

• A data paper is a publication whose primary purpose is to expose and describe data, as opposed to analyze and draw conclusions from it. The data paper enables a division of labor in which those possessing the resources and skills can perform the experiments and observations needed to collect potentially interesting data sets, so that many parties, each with a unique background and ability to analyze the data, may make use of it as they see fit. (Rees, 2010)
• “A data paper, which would describe the dataset, providing information on the what, where, why, how and who of the data. The data paper would contain a link back (a DOI) to the dataset in its repository, and the journal publishers would not actually host the data. This means that even in situations where the data paper might be restricted access, the dataset could still be open.” (Candela et al., 2015, p. 1748)

Page 20: A new approach to organization of research data

Concept map of data journal and data paper

Source: Candela et al., 2015, p. 1752

Page 21: A new approach to organization of research data

Silos

• Some data journals have also defined structural categories in their templates or guidelines for data papers.
• There is no common standard for all data papers across various communities (Candela et al. 2015, 1753-4; Callaghan et al. 2014; Chavan and Penev 2011, 3; Smith 2009, 2).

Page 22: A new approach to organization of research data

A survey of data papers

• 16 publishers
• 26 data journals
• Content analysis
  • Templates or guidelines, online websites, and their instances
• Variant terms
  • including database article, data paper, data note, data article, data descriptor, data in brief, data original article, database paper, dataset paper, and genome database (Candela et al. 2015)
• Software papers
• Domain: checked from Ulrichsweb
  • Archaeology
  • Biology
  • Computers
  • Computing
  • Earth Sciences
  • Ecology
  • Education
  • Engineering
  • Humanities
  • Geography
  • Medical Sciences
  • Psychology
  • Publishing and Book Trade
  • Social Sciences

Page 23: A new approach to organization of research data

Subjects-01

Publisher (no. of journals): Data journals
BMC (3): BMC Medical Education, BMC Research Notes, BMC Psychiatry
Brill (1): Research Data Journal of the Humanities and Social Sciences
Copernicus (2): Earth System Science Data, Geoscientific Model Development
Earthquake Engineering Research Institute (1): Earthquake Spectra
Ecological Society of America (1): Ecological Archives
Elsevier (2): Data in Brief, Genomics Data
Faculty of 1000 (1): F1000Research
Hindawi (1): Dataset Papers in Science

Page 24: A new approach to organization of research data

Subjects-02

Publisher (no. of journals): Data journals
Nature (1): Scientific Data
Pensoft (1): Biodiversity Data Journal
Procon (1): Biomedical Data Journal
Sage (1): International Journal of Robotics Research
Springer (1): GigaScience
Ubiquity (6): Journal of Open Archaeology Data, Journal of Open Humanities Data, Journal of Open Psychology Data, Journal of Open Research Software, Open Health Data, Open Journal of Bioresources
University of York in UK (1): Internet Archaeology
Wiley (2): British Journal of Educational Technology, Geoscience Data Journal

Page 25: A new approach to organization of research data

Results

• A framework for, and the embedded characteristics and structures in, data papers
• A crosswalk between the proposed common framework of this study and a concept map (CP) for data papers (Candela et al., 2015)

Page 26: A new approach to organization of research data

A framework-01

• Title page
• Description of datasets
• Relationships

Page 27: A new approach to organization of research data

A framework-02

Category: Subcategory
Title Page: Title, Authors, Affiliation, Email address, Abstract, Keywords, Citations, and Dates (received, revised, accepted, and published).
Description of Dataset: Collection, Description, Coverage, Identifier, Competing interest, Ethics approval, Consent for publication, Funding statement, Copyright, Reuse, Availability, Author’s contribution, Authors’ information, References, and Acknowledgements
Relationships: Data papers, derived journal papers and deposited repository
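The three categories and their subcategories can be read as a checklist for a draft data paper. The following is a minimal Python sketch under stated assumptions, not part of the original study: the names `FRAMEWORK` and `missing_subcategories` are hypothetical, and a draft is assumed to be a dict mapping each category to the set of subcategories it already fills in.

```python
# Hypothetical checklist transcribed from the framework table above.
FRAMEWORK = {
    "Title page": [
        "Title", "Authors", "Affiliation", "Email address",
        "Abstract", "Keywords", "Citations", "Dates",
    ],
    "Description of dataset": [
        "Collection", "Description", "Coverage", "Identifier",
        "Competing interest", "Ethics approval", "Consent for publication",
        "Funding statement", "Copyright", "Reuse", "Availability",
        "Author's contribution", "Authors' information", "References",
        "Acknowledgements",
    ],
    "Relationships": [
        "Data papers", "Derived journal papers", "Deposited repository",
    ],
}


def missing_subcategories(draft):
    """Return, per category, the framework subcategories absent from the draft."""
    report = {}
    for category, subcategories in FRAMEWORK.items():
        present = draft.get(category, set())
        gaps = [name for name in subcategories if name not in present]
        if gaps:
            report[category] = gaps
    return report


# Illustrative draft with made-up content, for demonstration only.
example_draft = {
    "Title page": {"Title", "Authors", "Affiliation", "Abstract", "Dates"},
    "Description of dataset": {"Collection", "Description", "Copyright"},
}
example_report = missing_subcategories(example_draft)
print(example_report["Title page"])
```

A dict of lists keeps the order of the subcategories as listed in the framework, so the report reads in the same order as the table.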

Page 28: A new approach to organization of research data

A framework-Title page

• 1 journal does not offer authors’ affiliations.
• 4 journals do not offer keywords.
• 1 journal assigns specific identifiers using the journal data platform, rather than a DOI or URL.
• 2 journals do not have citation data for users.
• 4 journals do not indicate the dates of data papers. Most data papers offer four kinds of dates (received, revised, accepted and published online) to illustrate the publishing process.
• Licensing: 1 journal reserves all rights and 25 use Creative Commons licenses, of which CC-BY is the most popular.
• HTML (24/26) is the most popular format provided by data journals, followed by PDF (23/26). 12 of the 26 data journals offered two formats (HTML and PDF). XML was the third most popular format. Nine data journals offered three formats (HTML, PDF and XML) and one offered four formats (HTML, PDF, XML, and EPub). Three data journals offered HTML only, and one journal offered PDF only.

Page 29: A new approach to organization of research data

A framework-Description of datasets-01

• Collection
  • Focuses on how data is captured or created and reflects the significance of “research method” (Chao 2015).
  • Most of the content of data papers provides information describing the methodology through which data is collected or produced to answer specific research problems in a certain context.
  • Other important information is included within the description of methodology, such as background, ideas of project, experimental design, factors, features, analyzing methods, and quality control.
• Description
  • Structured categories similar to structured metadata elements
  • Textual based description
  • A hybrid approach of the first two, with a structured category name and accompanying textual statements

Page 30: A new approach to organization of research data

A framework-Description of datasets-02

• Coverage
  • Data papers related to the disciplinary domains of medicine, biology, and archaeology are inclined to provide temporal and spatial keywords.
  • Data papers indicate the spatial coverage by tagging with longitude and latitude.
  • Coverage is also used to indicate the taxonomy in terms of biological classification of species.
• Competing interest
  • The majority of data papers provide this subcategory, which clarifies potential factors that might affect the results of the dataset.
• Ethical approval and consent for publication
  • Data papers whose target subjects involve individual privacy or human and animal rights are required to indicate whether researchers have received approval from subjects for the public release and use of data.

Page 31: A new approach to organization of research data

A framework-Description of datasets-03

• Funding statement
  • Indicates whether production of data was supported by a funding grant.
  • Embedded in the Acknowledgements subcategory.
• Copyright
  • Tends towards adoption of open licensing terms and conditions such as CC, CC0 and PDDL (Public Domain Dedication and License).
• Author’s contribution
  • Offers information to clarify the contribution of each author to the data paper or dataset.

Page 32: A new approach to organization of research data

A framework-Relationships

• Versions of data papers
• Datasets and their derived journal articles
• Datasets and the data repository

Page 33: A new approach to organization of research data

A crosswalk between proposed framework and concept map-component level

Components of CP    Proposed Framework
Identifier          Identifier of title page
Content             Description of dataset; Relationships
Metadata            Title page and its subcategories
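A crosswalk of this kind is a lookup table that can be queried in both directions. The sketch below is illustrative only: `CROSSWALK`, `to_framework`, and `to_concept_map` are hypothetical names, and it assumes that “Relationships” belongs to the “Content” row, which the slide layout suggests but does not state explicitly.

```python
# Component-level crosswalk, transcribed from the slide.
# Assumption: "Relationships" is mapped from the "Content" component.
CROSSWALK = {
    "Identifier": ["Identifier of title page"],
    "Content": ["Description of dataset", "Relationships"],
    "Metadata": ["Title page and its subcategories"],
}


def to_framework(component):
    """Framework entries that a concept-map (CP) component maps to."""
    return CROSSWALK.get(component, [])


def to_concept_map(framework_entry):
    """CP components that map to a given framework entry (reverse lookup)."""
    return [
        component
        for component, targets in CROSSWALK.items()
        if framework_entry in targets
    ]


print(to_framework("Content"))
print(to_concept_map("Relationships"))
```

Storing the mapping once and deriving the reverse direction on demand avoids keeping two tables in sync.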

Page 34: A new approach to organization of research data

A crosswalk between proposed framework and concept map-category level-01

Category of CP Common Framework

Quality Collection

Provenance

Project

Funding statement

Coverage Coverage

Reuse Copyright

License

Competing Interest Competing Interest

Page 35: A new approach to organization of research data

A crosswalk between proposed framework and concept map-category level-02

Category of CP Common Framework

Microcontribution Authors’ contribution

Availability Identifier

Format Description

Ethical approval

Consent to publication

Relationships

Page 36: A new approach to organization of research data

Discussion

• Granularity described by data papers
  • Single dataset
  • Multiple datasets
  • Database
• Type of data journal
  • Data paper (13/26)
  • Hybrid (10/26): regular academic journal papers and data papers.
  • Special issue (3/26)
• Publication model
  • Data paper (20/26)
  • Software paper (2/26)
  • Data and software paper (3/26)
  • Overlay journal (1/26)
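The counts above can be turned into percentage shares of the 26 surveyed journals. This is a small Python sketch for the arithmetic only: the names `JOURNAL_TYPE`, `PUBLICATION_MODEL`, and `shares` are hypothetical, and the counts are copied from the slide.

```python
TOTAL_JOURNALS = 26

# Counts as reported on the slide.
JOURNAL_TYPE = {"Data paper": 13, "Hybrid": 10, "Special issue": 3}
PUBLICATION_MODEL = {
    "Data paper": 20,
    "Software paper": 2,
    "Data and software paper": 3,
    "Overlay journal": 1,
}


def shares(counts, total=TOTAL_JOURNALS):
    """Convert counts to percentages, checking that they cover every journal."""
    if sum(counts.values()) != total:
        raise ValueError("counts do not add up to the number of journals")
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


for label, pct in shares(JOURNAL_TYPE).items():
    print(f"{label}: {pct}%")
for label, pct in shares(PUBLICATION_MODEL).items():
    print(f"{label}: {pct}%")
```

The sum check confirms that both breakdowns account for all 26 journals before any percentage is reported.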

Page 37: A new approach to organization of research data

Conclusion

• Extends the CP into more concrete categories and offers new categories
• Future work
  • More subjects (about 100 data journals)
  • Core and optional characteristics and structures of proposed categories
  • A comparison with a metadata approach for RDM

Page 38: A new approach to organization of research data

References-01

• Akers, K. (12 December, 2013). Data journals: Incentivizing research data dissemination. CLIR Blog. Accessed November 19, 2015. http://connect.clir.org/blogs/katherine-akers/2013/12/12/data-journals-incentivizing-research-data-dissemination
• Callaghan, S., Donegan, S., Pepler, S., Thorley, M., Cunningham, N., Kirsch, P., Ault, L., Bell, P., Bowie, R., Leadbetter, A., Moncoiffé, G., Harrison, K., Smith-Haddon, B., Weatherby, A., & Wright, D. (2012). Making data a first class scientific output: Data citation and publication by NERC’s environmental data centres. International Journal of Digital Curation, 7(1), 107-13. doi: 10.2218/ijdc.v7i1.218
• Callaghan, S., Tedds, J., Lawrence, R., Murphy, F., Roberts, T., & Wilcox, W. (2014). Cross-linking between journal publications and data repositories: A selection of examples. International Journal of Digital Curation, 9(1), 164-75.
• Candela, L., Castelli, D., Manghi, P., & Tani, A. (2015). Data journals: A survey. Journal of the Association for Information Science and Technology, 66(9), 1747-62.
• Chao, T.C. (2015). Mapping methods metadata for research data. International Journal of Digital Curation, 10(1), 82-94.

Page 39: A new approach to organization of research data

References-02

• Chavan, V., & Penev, L. (2011). The data paper: A mechanism to incentivize data publishing in biodiversity science. BMC Bioinformatics, 12(S15), S2. doi: 10.1186/1471-2105-12-S15-S2
• Costello, M.J. (2009). Motivating online publication of data. BioScience, 59(5), 418-427. doi: 10.1525/bio.2009.59.5.9
• Gray, S. (2015). Case study: Publishing a data paper. Accessed December 28, 2015. https://data.bris.ac.uk/files/2015/05/Publishing-a-data-paper.pdf
• RDC Glossary. http://www.rdc-drc.ca/glossary/
• Rees, J. (2010). Recommendations for independent scholarly publication of data sets. Accessed October 06, 2015. http://neurocommons.org/report/data-publication.pdf
• Smith, V.S. (2009). Data publication: Towards a database of everything. BMC Research Notes, 2(113), 1-3. doi: 10.1186/1756-0500-2-113
• Univ. of Penn. Libraries. (2013). Primary, secondary and tertiary sources. http://gethelp.library.upenn.edu/PORT/sources/primary_secondary_tertiary.html