
Box 1. FORCE11 Joint Declaration of Data Citation Principles · 2015-03-18 · 4. Unique Identification: A data citation should include a persistent method for identification that



Ushering in a New Era of Scientific Reproducibility

Volume 2 Issue 3

©2014 DataONE · 1312 Basehart SE, University of New Mexico, Albuquerque, NM 87106

Science progresses when new discoveries are made and when others can subsequently build upon these discoveries by both reproducing results and creating new work based on prior research. Reproducibility of research results implies that the data, code, algorithms, and workflows are accessible, understandable, and consistently lead to the same results when re-analyzed by others. Unfortunately, it appears that reproducibility may be the exception rather than the norm in many, if not most, instances. For example, nearly three-quarters (73%) of the data associated with 249 psychology data sets were not made available over a 6-month period.¹ In 2012, it was initially reported in Nature that most (47 out of 53) cancer research papers were irreproducible,² a finding that was subsequently reproduced in PLOS ONE.³ Such reports are profoundly disturbing and indicate serious inattention to good data stewardship and scientific practice.

The community, however, is waking up and new policies are bringing attention to

Box 1. FORCE11 Joint Declaration of Data Citation Principles

Preamble: Sound, reproducible scholarship rests upon a foundation of robust, accessible data. For this to be so in practice as well as theory, data must be accorded due importance in the practice of scholarship and in the enduring scholarly record. In other words, data should be considered legitimate, citable products of research. Data citation, like the citation of other evidence and sources, is good research practice and is part of the scholarly ecosystem supporting data reuse.

In support of this assertion, and to encourage good practice, we offer a set of guiding principles for data within scholarly literature, another dataset, or any other research object. These principles are the synthesis of work by a number of groups. As we move into the next phase, we welcome your participation and endorsement of these principles.

Principles: The Data Citation Principles cover purpose, function and attributes of citations. These principles recognize the dual necessity of creating citation practices that are both human understandable and machine-actionable.

These citation principles are not comprehensive recommendations for data stewardship. And, as practices vary across communities and technologies will evolve over time, we do not include recommendations for specific implementations, but encourage communities to develop practices and tools that embody these principles. The principles are grouped so as to facilitate understanding, rather than according to any perceived criteria of importance.

1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.¹

2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.²

3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.³

4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.⁴

5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.⁵

6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.⁶

7. Specificity and Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.⁷

8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.⁸

¹ CODATA/ITSCI Task Force on Data Citation, 2013, "Out of cite, out of mind: The Current State of Practice, Policy and Technology for Data Citation," Data Science Journal 12: 1-75, http://dx.doi.org/10.2481/dsj.OSOM13-043, sec. 3.2.1; Uhlir (ed.), 2012, Developing Data Attribution and Citation Practices and Standards. National Academies. http://www.nap.edu/download.php?record_id=13564, ch. 14; Altman, Micah, and Gary King, 2007, "A proposed standard for the scholarly citation of quantitative data," D-Lib Magazine 13 (3/4).
² CODATA 2013, sec. 3.2, 7.2.3; Uhlir (ed.), 2012, ch. 14
³ CODATA 2013, sec. 3.1, 7.2.3; Uhlir (ed.), 2012, ch. 14
⁴ Altman-King 2007; CODATA 2013, sec. 3.2.3, ch. 5; Ball, A., and Duke, M., 2012, "Data Citation and Linking," DCC Briefing Papers. Edinburgh: Digital Curation Centre. http://www.dcc.ac.uk/resources/briefing-papers/introduction-curation/data-citation-and-linking
⁵ CODATA 2013, sec. 3.2.4, 3.2.5, 3.2.8
⁶ Altman-King 2007; Ball & Duke 2012; CODATA 2013, sec. 3.2.2
⁷ Altman-King 2007; CODATA 2013, sec. 3.2.7, 3.2.8
⁸ CODATA 2013, sec. 3.2.10


Spring 2014 · Page 2

the matter. In a recent editorial in Science, Editor-in-Chief Marcia McNutt announced several author guidelines that were designed to increase confidence in the studies published in Science.⁴ Likewise, all authors who submit to a PLOS journal must now provide a Data Availability Statement that describes where and how the data that underlie reported findings can be accessed.⁵

The efforts of publishers are laudable, as they foreshadow broader adoption of proactive policies by research sponsors, scientific societies, and others. DataONE, for example, was among the first organizations to endorse the recently released FORCE11 guiding principles for citing data within the scholarly literature, another dataset, or other research objects.⁶ The FORCE11 principles are exemplary in that they are comprehensive but succinct, covering attribution, unique identifiers, persistence, verifiability, and interoperability (see Box 1).

Adoption of policies and procedures such as those being promulgated by FORCE11, PLOS, and Nature will ultimately benefit science in a multitude of ways, but especially by enabling new discovery and innovation and by building and restoring trust in science. DataONE's principal objective over the next five years will be to continue building out the cyberinfrastructure that can best enable data discovery and access, and that can better support openness and reproducibility of research results. ■

— Bill Michener
Project Director, DataONE

¹ Wicherts, J.M., D. Borsboom, J. Kats, and D. Molenaar. 2006. The poor availability of psychological research data for reanalysis. American Psychologist 61 (7): 726–728. doi:10.1037/0003-066X.61.7.726.
² Begley, C.G. and L.M. Ellis. 2012. Drug development: Raise standards for preclinical cancer research. Nature 483 (7391): 531–533. doi:10.1038/483531a.
³ Mobley, A., S.K. Linder, R. Braeuer, L.M. Ellis, and L. Zwelling. 2013. A survey on data reproducibility in cancer research provides insights into our limited ability to translate findings from the laboratory to the clinic. PLoS ONE 8 (5): e63221. doi:10.1371/journal.pone.0063221.
⁴ McNutt, M. 2014. Reproducibility. Science 343 (6168): 229. doi:10.1126/science.1250475.
⁵ http://blogs.plos.org/everyone/2014/02/24/plos-new-data-policy-public-access-data/
⁶ https://www.force11.org/datacitation

Each Member Node within the DataONE federation completes a description document summarizing the content, technical characteristics, and policies of its resources. These documents can be found on the DataONE.org site at bit.ly/D1CMNs. In each newsletter issue we will highlight one of our current Member Nodes.

The USA National Phenology Network (USA-NPN) http://www.usanpn.org/home

The USA National Phenology Network (USA-NPN) is one of several Member Nodes who have joined DataONE this winter. The USA-NPN serves science and society by promoting broad understanding of plant and animal phenology and its relationship with environmental change. The Network is a consortium of individuals and organizations that collect, share, and use phenology data, models, and related information.

Through the DataONE infrastructure, the USA-NPN delivers phenology data from its Nature's Notebook program collected at 2,000 sites across the United States from 2009 through 2013. Over 2.5 million records are currently discoverable via DataONE. Nature's Notebook is an off-the-shelf program appropriate for scientists and non-scientists alike, engaging observers across the nation to collect phenology observations on both plants and animals. Like fellow DataONE Member Node eBird, USA-NPN is an example of public participation in scientific research (citizen science); its partnership with DataONE makes data collected by volunteers and individual researchers available to a much wider audience.

At the USA-NPN website, a user may:
• Explore the data collected via Nature's Notebook with an interactive visualization tool.
• Download the data collected via Nature's Notebook.
• Search for other phenology data sets.
• Share existing data.

One might also want to explore some preliminary findings and peer-reviewed results from the USA-NPN data.

Observational data, such as when leaf buds broke on a red maple or how many Chinook salmon were seen migrating, can be combined with other relevant datasets to document environmental changes. Phenology affects nearly all aspects of the environment, including the abundance, distribution, and diversity of organisms, ecosystem services, food webs, and the global cycles of water and carbon.

MemberNodeDESCRIPTION


Spring 2014 · Page 3

CyberSPOT
CyberInfrastructure Update

The DataONE infrastructure continues production operations with an increasing number of Member Nodes and volume of data available through the DataONE service interfaces. New Member Nodes that have come online since the last newsletter include the USA National Phenology Network (https://www.usanpn.org/), the Dryad Digital Repository (http://datadryad.org/), The University of Kansas Biodiversity Institute (http://biodiversity.ku.edu/), and the Gulf of Alaska Data Portal (http://data.aoos.org/maps/search/gulf-of-alaska.php).

Counting Data

As of this writing, there are 462,676 "objects" accessible through DataONE. An object is the most granular unit of content in DataONE that is assigned a persistent, globally unique identifier and is tracked by the DataONE Coordinating Nodes. While individual objects are useful entities in themselves, a combination of three types of objects will typically form a data package, that is, a minimal unit of scientifically meaningful information. These three types of objects are labeled as data, metadata, and resource maps. Metadata objects describe one or more data objects, and resource map objects document the relationship between particular data and metadata objects. Thus a fully defined data package is composed of data, metadata, and at least one resource map. From a UML class perspective, data, metadata, and resource maps would be considered classes that specialize the object class, and a data package would be an aggregation of those. From an entity-relation perspective, a data package is composed of one or more data or metadata objects, one or more metadata objects describe a data object, and a resource map defines which data are described by which metadata, and vice versa. To further complicate the situation, some types of metadata may include data within the same object (e.g., EML with embedded CSV). Of the 462,676 objects, 147,625 are data objects, 203,348 are metadata objects, and 112,703 are resource maps.

Determining the number of datasets is not a trivial operation because a dataset may be composed of an arbitrary number of each type of object. Examining each resource map and determining the total number of discrete entities that emerge when all associations are considered might yield an accurate accounting. However, a single large data package might be composed of many smaller entities that might also be considered useful data packages in themselves, so a count of discrete entities would likely undercount the number of meaningful packages. Alternative approximations can be found by simply considering the number of resource maps (since a data package will have at least one resource map), the number of data objects (since a minimally useful data package will have at least one data object), or the sum of data and metadata objects (since data or metadata alone might be considered minimal scientifically meaningful entities).
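The object model and the counting approximations described above can be sketched in a few lines of Python. This is a hypothetical illustration only: the class and field names are invented for clarity and are not the actual DataONE API.

```python
# Hypothetical sketch of the DataONE object model described above.
# Class and field names are illustrative, not the real DataONE API.
from dataclasses import dataclass


@dataclass
class DataObject:
    pid: str  # persistent, globally unique identifier


@dataclass
class MetadataObject:
    pid: str
    describes: list  # pids of the data objects this metadata describes


@dataclass
class ResourceMap:
    pid: str
    aggregates: list  # pids of the data and metadata objects in the package


# A fully defined data package: data + metadata + at least one resource map.
data = [DataObject("data.1"), DataObject("data.2")]
meta = MetadataObject("meta.1", describes=["data.1", "data.2"])
rmap = ResourceMap("rmap.1", aggregates=["data.1", "data.2", "meta.1"])

# The approximations from the text, applied to this toy inventory:
counts = {"data": len(data), "metadata": 1, "resource_maps": 1}
lower_bound = counts["resource_maps"]                # >= 1 resource map per package
alt_bound = counts["data"]                           # >= 1 data object per package
upper_bound = counts["data"] + counts["metadata"]    # each object minimally meaningful
print(lower_bound, alt_bound, upper_bound)           # 1 2 3
```

Applied to the real inventory above, the same three estimates would give 112,703, 147,625, and 350,973 packages respectively, which shows how wide the spread between the approximations can be.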

It is also important to consider that any type of object may be obsoleted by a newer version, and from one perspective the counts should include only objects that have not been obsoleted. However, obsoleting an object in DataONE does not mean that the object is no longer scientifically meaningful. In fact, repeat analyses require that content be immutable, so only the exact same version of an object can support a reliable repeat analysis. New analyses will likely take a different approach and consider only the latest versions of content as relevant.

These nuances in defining what is a meaningful data package complicate answering an apparently simple question such as "How much data can be accessed through DataONE?" To simplify, we report the number of discrete objects being tracked by DataONE and how they are categorized into the three basic types: data, metadata, and resource maps. This breakdown is shown in Figure 1, which tracks the different object types over time since the start of DataONE production operations. ■

Figure 1: Counts of data/metadata/resource maps uploaded to DataONE since release in July 2012


Spring 2014 · Page 4

DataONE and many cyberinfrastructure projects are turning their attention to managing Big Data, a challenge that has arisen in the past few years from sensors that collect observations at high rates, high-performance computers that generate large-volume outputs, and storage systems large enough to hold the results. Much of the Big Data cyberinfrastructure effort focuses on developing systems that facilitate discovery and access.

The Exploration, Visualization, and Analysis (EVA) Working Group in DataONE has been looking into end-user aspects of Big Data: what tools and services can aid the efforts of researchers to explore these vast stores of data? During the past year, EVA examined the types of analyses that researchers would do when Big Data first become available. In the past, for example, when researchers began their analyses they might run hundreds of correlation analyses and print hundreds of plots.

For this EVA activity, we brought in visualization experts from New York University and terrestrial biosphere modelers from a large model intercomparison project funded by NASA. We wanted to apply well-established visualization theories that would benefit the exploration done by the modeling group.

The modeling project¹ is examining photosynthesis and respiration to better understand the role of the terrestrial biosphere in the global carbon cycle. This role is important because the amount of carbon pulled out of the atmosphere by the terrestrial biosphere, and how that uptake will change in the future, are poorly known.

To test methods, EVA used output from 7 models for 1980-2010, a portion of the total 1900-2010 model output generated by the intercomparison project.

EVA decided to develop a tool, SimilarityExplorer (Figure 1), that quickly explores similarity at several levels of detail among different models and quantitatively and visually compares spatial and temporal patterns of output data. For example, we wanted to explore in which regions models agree with each other the most and the least.

A Multidimensional Scaling (MDS) method is used to quickly explore overall similarity among different model outputs. Because all models are intercompared in one plot, this can be thought of as a many-to-many comparison. In the MDS view displayed in Fig. 1c, the shorter the distance between a pair of model outputs, the more similar they are; a dot far from all others represents an outlier model. In the example shown in Fig. 1c, output from the models DLEM and CLM4VIC are similar to each other, and the model VISIT differs from each of the other model outputs.
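The many-to-many MDS comparison can be sketched with a few lines of numpy. This is an illustrative toy only: the "model outputs" below are random stand-ins (not actual MsTMIP results), the dissimilarity measure (pairwise RMSE) is an assumption, and a classical (Torgerson) MDS is used rather than whatever projection SimilarityExplorer actually implements.

```python
# Illustrative sketch of a many-to-many MDS model comparison.
# The model outputs are random stand-ins, not actual MsTMIP results.
import numpy as np

rng = np.random.default_rng(0)
base = rng.normal(size=500)                       # shared "true" signal
outputs = {
    "DLEM":    base + 0.1 * rng.normal(size=500),
    "CLM4VIC": base + 0.1 * rng.normal(size=500),
    "VISIT":   rng.normal(size=500),              # deliberate outlier model
}
names = list(outputs)
n = len(names)

# Pairwise RMSE between model outputs -> symmetric dissimilarity matrix.
D = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        D[i, j] = np.sqrt(np.mean((outputs[names[i]] - outputs[names[j]]) ** 2))

# Classical (Torgerson) MDS: double-center the squared distances and embed
# with the top eigenvectors; nearby points correspond to similar models.
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
evals, evecs = np.linalg.eigh(B)
order = np.argsort(evals)[::-1][:2]
coords = evecs[:, order] * np.sqrt(np.maximum(evals[order], 0))

for name, (x, y) in zip(names, coords):
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

In the 2-D embedding, the two models built around the shared signal land close together while the outlier sits apart, mirroring the DLEM/CLM4VIC vs. VISIT pattern described for Fig. 1c.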

Using the metaviews in SimilarityExplorer, scientists simultaneously examine similarity in space and time (Fig. 1b, lower-left triangle for space and top-right triangle for time), in multiple variables (Gross Primary Productivity (GPP), Net Primary Productivity (NPP), and Net Ecosystem Exchange (NEE)), and across multiple models (each row/column of the matrix). This type of comparison is a one-to-one output comparison, but every comparison is displayed in the one matrix view. Our experience was that scientists tend to prefer either spatial or temporal analysis, and SimilarityExplorer lowers the barrier for these scientists to view the other side of the "fence" and examine both spatial and temporal analyses simultaneously. By visualizing model-model spatial and temporal similarity in Fig. 1b, we observe that the models VISIT and BIOME agree poorly with each other in July, and that BIOME agrees best with the models DLEM, LPJ, and CLM4VIC.

Another functionality in SimilarityExplorer allows examining similarity and differences by drilling down from a global scale to a regional view. In Fig. 2, we can explore similarities and differences between DLEM and BIOME and between DLEM and CLM within different regions. The highest temporal correlations and best agreements are between BIOME and DLEM in the boreal regions of North America and Europe. For spatial correlations, DLEM and CLM show good agreement across all the regions (Fig. 2), while BIOME and DLEM show poorer agreement in temperate Europe.
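The spatial and temporal correlations behind Figures 1b and 2 reduce, in essence, to correlating two models' gridded fields across space (at a fixed time) or across time (at a fixed grid cell). A minimal sketch, using random stand-ins shaped (months, lat, lon) rather than real model output:

```python
# Minimal sketch of spatial vs. temporal correlation between two models.
# The arrays are random stand-ins shaped (months, lat, lon), not real output.
import numpy as np

rng = np.random.default_rng(1)
shape = (12, 10, 20)                               # 12 months on a 10x20 grid
model_a = rng.normal(size=shape)
model_b = model_a + 0.5 * rng.normal(size=shape)   # correlated by construction

# Spatial correlation: for each month, correlate the two grids cell-by-cell.
spatial_r = [np.corrcoef(model_a[t].ravel(), model_b[t].ravel())[0, 1]
             for t in range(shape[0])]

# Temporal correlation: for each grid cell, correlate the 12-month series.
temporal_r = np.empty(shape[1:])
for i in range(shape[1]):
    for j in range(shape[2]):
        temporal_r[i, j] = np.corrcoef(model_a[:, i, j], model_b[:, i, j])[0, 1]

print(f"mean spatial r = {np.mean(spatial_r):.2f}, "
      f"mean temporal r = {np.mean(temporal_r):.2f}")
```

Restricting the grid indices to a region's cells before correlating gives the per-region drill-down view of Fig. 2; averaging over months or cells gives the summary values shown in the matrix view of Fig. 1b.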

These are some preliminary analyses

WorkingGroupFOCUS
EVA Working Group: Tools for Exploratory Analysis of Outputs from Terrestrial Biosphere Models


Figure 1: SimilarityExplorer is composed of a set of filters (a) for models, parameters, time periods and regions, and meta views (b, c, d). The meta views are b) a matrix view for showing pairwise similarity, c) a projection view for showing multi-way similarity, and d) a small multiples view for showing region-wise spatio-temporal similarity.

¹ MsTMIP, the Multi-scale Synthesis and Terrestrial Model Intercomparison Project, led by Anna Michalak, Stanford University, and Debbie Huntzinger, Northern Arizona University, two members of DataONE's EVA Working Group.


Spring 2014 · Page 5

from the use of SimilarityExplorer. Modelers who have seen the tool are eager to try it out themselves. Christopher Schwalm from Northern Arizona University thought the tool provided a new view into the Big Data of model output: "after being able to use the region/time filters to address the when and the where, one can address the why."

EVA's next steps are to add more drill-down functionality to look within regions at how well models agree. An important functionality EVA is examining is adding a one-to-many comparison to the one-to-one and many-to-many comparisons of the current tool. The "one" in this case could be a reference model or even observations. Use of observations would allow the models to be benchmarked against measurements, and how well the models do against the benchmark would provide a sense of confidence in them. EVA will also add model structural characteristics information into SimilarityExplorer, enabling model similarity or difference to be traced back to model structures and providing more insight into why models are similar or different. ■

Figure 2: SimilarityExplorer allows users to drill down from an overall view in the top frame to look at similarity in model representation in space and time for individual regions (Europe, Eurasia Temperate, Eurasia Boreal, NA Temperate, NA Boreal). In this case we see that the BIOME and DLEM models are most similar (most highly correlated in June) for the boreal areas (Eurasia and NA Boreal).

— Bob Cook
— Yaxing Wei
Oak Ridge National Laboratory, Oak Ridge, TN

— Jorge Poco
— Aritra Dasgupta
— Enrico Bertini
New York University, Brooklyn, NY

Members of the DataONE Team will be at the following events. Full information on training activities can be found at bit.ly/D1Training and our calendar is available at bit.ly/D1Events.

May 27-30 Qualitative and Quantitative Methods in Libraries Istanbul, Turkey http://www.isast.org/qqml2014.html

Jun. 9-13 Open Repositories Helsinki, Finland http://or2014.helsinki.fi/

Jul. 6-7 DataONE Users Group Frisco, CO http://www.dataone.org/dataone-users-group

Jul. 8-11 Federation of Earth Science Information Partners Frisco, CO http://commons.esipfed.org/2014SummerMeeting

UpcomingEVENTS

OutreachUPDATE

We are looking forward to a busy season of education, outreach, and community building activities. As mentioned in the previous newsletter, and highlighted on our website and via Twitter, we have recently been engaged in the process of recruiting interns for our 2014 Summer Internship Program. Notifications have been sent out and, once all the paperwork is in place, we will be updating our website to profile our new summer team. In the meantime, I can tell you that we have some exciting projects planned. Many of these concern cyberinfrastructure developments: semantics, provenance, metadata standards, and the SimilarityExplorer highlighted in this issue's Working Group Focus (page 4). However, on the Education and Engagement side of things, we aim to create a suite of online tutorials in support of our software and services, in addition to bringing some of our Data Stories (http://bit.ly/D1Stories) to life in video format. You can expect some of this work to be showcased at the DUG meeting in July (see "TheDUGout") and we will be soliciting feedback during screencast development. These projects will kick-start increased use of media and online engagement to reach the broader DataONE community.

We are also pleased to announce the launch of a new section of our public website, 'For Librarians'. Librarians are custodians of the scholarly record with well-established relationships to researchers within their institutions, and we recognize their contribution as providers of expertise in research data management services. To support librarians in this role, we have developed a Librarian Outreach Kit by way of introduction to some of the most relevant DataONE products, services, and resources of value to institutional librarians. Discover more at: http://bit.ly/D1librarians. ■


Spring 2014 · Page 6

1312 Basehart SE
University of New Mexico
Albuquerque, NM 87106

Fax: 505.246.6007

DataONE is a collaboration among many partner organizations, and is funded by the US National Science Foundation (NSF) under

a Cooperative Agreement.

Project Director:

William Michener
[email protected]

505.814.7601

Executive Director:

Rebecca Koskela
[email protected]

505.382.0890

Director of Community Engagement and Outreach:

Amber Budden
[email protected]

505.205.7675

Director of Development and Operations:

Dave Vieglais
[email protected]

FeaturedITEM

Dryad Celebrates its 5th Anniversary

The Dryad Digital Repository, a collaborator and partner of DataONE since 2009, is celebrating its 5th anniversary this April.

Dryad (http://datadryad.org/) is a curated resource that makes the data underlying scientific and medical publications discoverable, freely reusable, and citable.

Dryad originated out of the need for an easy-to-use, sustainable, community-governed, general-purpose data repository to support the joint data archiving policy that has been adopted by many of the leading journals and scientific societies in evolutionary biology and ecology. The first of the nearly fifty journals to integrate data and manuscript submission with Dryad was The American Naturalist, in April 2009.

Today, Dryad provides a general-purpose home for many different data types, adding to the breadth and diversity of Earth and environmental science data exposed through DataONE. There are approximately 15,000 data files from 5,000 individual publications, associated with 300 different journal titles, books and theses. All content is persistently linked to the associated publication. By preserving and making available the data underlying the scientific literature, Dryad provides benefits to individual researchers, educators, students, and research supporting organizations.

Dryad has been an active member of the DataONE community since inception, participating in DataONE’s cyberinfrastructure design and implementation, community engagement activities, and research into sociocultural aspects of data sharing. Dryad is one of the first DataONE Member Nodes to use the popular DSpace repository platform and paves the way for future DSpace Member Nodes.

As a data repository supporting a wide variety of scientific journals, Dryad encourages researchers' growth and development as data managers, promoting use of the DMPTool (a service of the California Digital Library). Many funders require that a data management plan be submitted with proposals, and part of that plan is to identify and establish long-term preservation of research data. Dryad provides that preservation utility itself, as well as the potential for replication at other DataONE Member Nodes.

TheDUGout

Dear DUG Members:

We are pleased to share with you the upcoming plans for the Summer 2014 DUG Meeting. This meeting, as usual, will be held adjacent to the Summer ESIP Meeting. This year's location is Frisco, Colorado, at the Copper Mountain Resort, and the dates will be July 6-7, 2014. Mark your calendar now! We are hopeful that this wonderful venue, located near many major scientific research organizations (e.g., NEON, NSIDC, NOAA, NCAR), will yield a great crowd and series of conversations about DataONE tools and services.

The DUG Steering Committee has met several times over recent months to initiate planning activities for this meeting. We have determined that the format will once again include the popular poster sessions and roundtable discussions which were introduced in Summer 2013, along with the more typical updates and forums from various arms of DataONE. By popular demand, we also plan to offer a half-day workshop on the DMPTool (http://dmptool.org), which will be highly relevant following a May release of version 2.0.

To help us further refine the meeting agenda, please take a few moments to respond to a few brief questions in the following survey: https://www.surveymonkey.com/s/DUG2014.

We ask that you complete it by May 15, 2014. We look forward to hearing your thoughts in the survey and hope that we'll see you in Colorado in July! ■

— Andrew Sallans
Chair, DataONE Users Group
University of Virginia Library

— Chris Eaker
Vice-Chair, DataONE Users Group
University of Tennessee Library