
Abstract

Data sharing practices within the Earth Sciences vary between disciplines. Each discipline,

indeed each institution, has its own policies and practices to facilitate data sharing. Two major

impediments to data sharing that every discipline must contend with are ensuring data quality

and sharing behaviors among researchers and institutions. This paper will examine data sharing

practices among the Earth Sciences by exploring the data life cycle, in terms of ensuring data

quality, as well as examine motivations and incentives behind data sharing behaviors.

Introduction

Data sharing among the Earth Sciences is invaluable. Sharing research effectively with other

scientists, as well as with the laity, could be argued to be as important as the research itself. If no

one has access to the research or can properly interpret the meaning of findings, then the data

essentially has no meaning. Therefore, data sharing policies and practices are becoming more

common among the scientific community in order to encourage and facilitate the process of data

sharing. This paper will explore the impediments and behaviors of data sharing practices in the

Earth Sciences discipline, and will discuss more specifically programs and institutions designed

to facilitate data sharing in the atmospheric sciences, oceanographic sciences, and astronomy.

Importance of Data Sharing

Before exploring barriers and behaviors of data sharing, it is important to briefly emphasize the

importance of sharing data. Providing accessible, high-quality data encourages “open scientific

inquiry” allowing research to be “validated or refuted” by the scientific community [1]. Data

sharing encourages debate among scientists and prompts further scrutiny of research conclusions

[2]. This further scrutiny then fosters a new level of integrity of the data. Sharing data also

allows other researchers to draw new conclusions from the work and can provide users with a

basis “for new research and new methods of data analysis” [1]. Forming new collaborations

between researchers is often a result of drawing new conclusions between another’s data and

your own [2]. Maintaining large repositories of data offers researchers access to a wealth

of information greater than could be generated by an individual or even one institution [1].

Sharing information also prevents the “unnecessary duplication of effort” and thus promotes

greater and faster strides in scientific discovery [1]. This wasted effort is measured not only in

cost, but also in wasted time. Lastly, providing open access to data encourages learning and

discovery among the public and involves the public in the scientific process. By sharing data

with the public, the scientific community has encouraged a movement toward citizen science, which is

quickly becoming a popular method of sharing data between scientific disciplines. Encouraging

the public to become involved in the data sharing process will become increasingly important as

scientists and the public alike attempt to wade through the age of the data deluge.

Impediments to Data Sharing

There are many impediments to effective data sharing. One of the most basic issues is data

quality. There is no way to discuss data sharing without discussing data quality first. If a

researcher cannot trust the data they are seeking, then sharing the data is useless. The level of

data quality can be measured in many different ways. Following the life cycle of data from

production to management to use/re-use is one way to maintain data quality [3].

In the production phase, two key components are the calibration of the instruments used to

collect the data and the methodology chosen to collect the information [3]. Instruments must be

continually checked and recalibrated in order to maintain a high level of data quality.

Researchers must also choose the most appropriate method of collecting the data, from gathering

general human observations to using a wireless sensing system. The Center for Embedded

Network Sensing realizes the value of collecting trustworthy data and facilitating the reuse of

that data, especially if that data is impossible to reproduce [4]. For example, CENS uses

dynamic sensors which can adjust monitoring conditions in real time [4]. CENS also deploys

scientists into the field with the sensors allowing the scientists to fine tune the instruments;

however, this also brings up other issues with data integrity such as differences in the setup of

the equipment between different teams of scientists. In the past, CENS relied heavily on oral

exchange in terms of equipment usage, calibration, and methodology, but more recently has

discovered the need for consistent documentation to ensure data quality [4], which is the last

component of the production phase. According to CENS scientists, one of their most important

needs is confidence in their measurements. This confidence rests on equipment selection,

equipment calibration, and human reliability [4]. The proper documentation of these

components, whether it be in paper or digital form, is essential to enhancing trust [4]. CENS

applications in the Earth Sciences include their seismic research area. This area implements

network technology to monitor aftershock and volcanic zones [5]. They use a wireless network

with a signal to noise ratio in order to accurately monitor seismic events [5]. However, though

the concept of their Wireless Linked Seismic Network has worked well, there are still problems.

The concept has worked well in that the network, though located in Mexico, has been managed

almost completely from the United States. But many problems including hardware failures,

software bugs, weather related failures, and poor design have led to “significant loss of data” [5].

Though most problems were recognized by logging into the network and “probing the sites,” a

field engineer had to be deployed to Mexico to oversee the problems. This illustrates the need

for constant monitoring of instruments and networks in order to safeguard data quality.

The second phase of the data life cycle is data management. This refers to long-term

accessibility of the data [3]. Data is often stored in data archives or repositories specific to

particular disciplines. There are many different archives specific to the area of Earth Sciences

including the National Oceanic and Atmospheric Administration, the British Oceanographic

Data Centre, the National Aeronautics and Space Administration, and the Australian Antarctic

Data Centre. NOAA’s National Climatic Data Center archives its data in the Hierarchical Data

Storage System (HDSS). This system is “the robotic tape assembly used to store large datasets

at NCDC” [6]. The data is then transferred from the tapes onto the public ftp site [6]. The

British Oceanographic Data Centre uses the relational model of database design in order to store

information [7]. This allows tables to have “relationships or links” to other tables [7]. They also

use the National Oceanographic Database to store metadata associated with the datasets. NASA

uses the Distributed Active Archive Centers or DAAC. These centers each serve a particular

discipline in the Earth Sciences to “process, archive, document, and distribute data” from

different satellites and programs [8]. The Australian Antarctic Data Centre (AADC) uses several

databases and a SCAR Feature Catalogue for spatial data [9]. The AADC also has a system

specifically for cataloguing metadata associated with a dataset called the Catalogue of Australian

Antarctic and Subantarctic Metadata [9]. Simply looking at these few institutions shows that there

is no one perfect way of storing and sharing data. The system of storing information in a

database for easy retrieval is obviously a commonality, but the specific implementation of

practices and uses varies across institutions.

Another component to data management is “retrievability” [3]. Retrievability refers to the

metadata that accompanies the data as well as the data format. Research data is available in

many different formats including XML, spreadsheet files, database schemas, HTML, Word

documents, PDF format, and many more [10]. The Australian Antarctic Data Centre offers

researchers the option of collecting data in many different formats including TXT, HTML, XML,

MS Excel, CSV, MS Access, JPEG, MPEG, and MP3 just to name a few [9]. The British

Oceanographic Data Centre requires the use of standard formats such as the BODC request

(ASCII) format, Ocean Data View format, a netCDF format, and an AXF format [7]. These

standard formats are described in explicit detail on their website, and their researchers are required to

format their data by these specific standards. The Global Observing Systems Information Center

(GOSIC) portal, discussed later in greater detail, facilitates data sharing because it returns data

regardless of the format [11]. Again, there is no one specific format for easy data sharing and

each institution maintains their data in a way that works best for them and their researchers. The

UK Data Archive does make recommendations of data formats to use for long-term preservation

of research data [2]. This archive makes recommendations for quantitative tabular data with

extensive metadata, quantitative tabular data with minimal metadata, geospatial data, qualitative

data, digital image data, digital audio data, digital video data, and documentation and scripts. It

is important to understand that the main goal of collecting data in specific formats is to ensure

long-term preservation, accessibility, and usability.

The second issue surrounding retrievability is providing sufficient metadata to support the

understanding and management of data. The three different types of metadata include

descriptive metadata, administrative metadata, and structural metadata [12]. Descriptive

metadata is information regarding the content of the dataset [12]. This type of metadata helps

users to properly interpret the datasets and extrapolate from the datasets or data collections.

Administrative metadata is information which is needed to allow proper management of the data

[12]. This type of metadata is used by those responsible for maintaining the datasets. Structural

metadata describes how different components of associated datasets relate to each other [12].

All these types of metadata are crucial to maintain management of datasets and thus ensure

quality control. GOSIC’s three main systems, the Global Climate Observing System, the Global

Ocean Observing System, and the Global Terrestrial Observing System are required to have

“directory level” and “archive level” metadata associated with their datasets [13]. Directory

level metadata refers to “general descriptive information” needed by a user to identify the dataset

[13]. This includes information about the location of the dataset and contact information.

Archive level metadata refers to the information needed to understand the dataset [13]. At the

Australian Antarctic Data Centre a data record is not complete until all associated metadata has

been submitted [9]. Metadata is essential in enabling data sharing and allowing users to

effectively use data.
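The rule that a record is not complete until its metadata has been submitted can be illustrated with a short sketch. It groups metadata into the three types described above and reports which required fields are still missing. The class name, the field names, and the list of required fields are assumptions made for illustration only; they are not taken from the AADC or GOSIC systems.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical record holding the three metadata types from [12]."""
    descriptive: dict = field(default_factory=dict)     # what the data contains
    administrative: dict = field(default_factory=dict)  # who manages it and how
    structural: dict = field(default_factory=dict)      # how parts relate


# Assumed required fields per metadata type (illustrative, not a real standard).
REQUIRED = {
    "descriptive": ("title", "summary"),
    "administrative": ("custodian", "contact"),
    "structural": ("related_datasets",),
}


def missing_metadata(record):
    """Return the required fields that are absent or blank, as 'type.field'."""
    missing = []
    for group, keys in REQUIRED.items():
        values = getattr(record, group)
        for key in keys:
            if values.get(key) in (None, ""):
                missing.append(f"{group}.{key}")
    return missing


def is_complete(record):
    """A record counts as complete only when no required metadata is missing."""
    return not missing_metadata(record)
```

A data centre following this kind of rule would hold back a submission until `is_complete` returns true, and could send the output of `missing_metadata` back to the researcher as a checklist.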

Another area to consider when discussing accessibility of research data is data policy issues.

According to a questionnaire posed to a sample of Dutch professors and senior lecturers, open

access to datasets [perhaps after an embargo period] in the field of physical sciences, which also

encompasses Earth Science, is popular [3]. Nonetheless, popularity of open access remains

varied across specific Earth Sciences disciplines. In the field of atmospheric sciences, the

NOAA/National Climatic Data Center Open Access to Physical Climate Data Policy is

essentially full and open data access [14]. According to the policy, all raw data collected from

their many climate observing systems and output from their climate models are all “openly

available in as timely a manner as possible” [14]. Additionally, NOAA makes its derived

datasets available to the public as well as access to climate-related model simulations [14].

NOAA’s National Climatic Data Center also operates the Global Observing Systems Information

Center (GOSIC). GOSIC allows people to access international climate related datasets from the

Global Climate Observing System, the Global Ocean Observing System, and the Global

Terrestrial Observing System [11]. The goal of GOSIC is to provide full and open exchange of

data, data products, and metadata for all of these systems at the lowest cost to the user. The

easiest way for users to access information is through the GOSIC portal. This portal does not

contain the datasets, but rather serves as a “single entry point for users” [11]. The portal

maintains information about the datasets and provides users easy access to the data without the

user having to navigate through hundreds of confusing websites trying to find the information

they seek [11].

NASA’s data sharing policy promotes the “full and open sharing of all data” with those in

academia, private industry, and the public [15]. One goal of the policy is to

create a National Information Infrastructure to foster an Environmental Information Economy to

promote a “routine exchange of environmental data” [15]. However, NASA does have the right

to protect data first produced by NASA or by a recipient that contains trade secrets, commercial

or financial information, and other confidential information for a period of two years [15].

BODC also promotes the use of their data for the advancement of industry, education, science,

and public knowledge [7]. BODC follows the Natural Environment Research Council (NERC)

Data Policy in which environmental data will be made available to any person or organization

who wants the data [16]. There are a few restrictions on open access, but those are specifically

explained by the Environmental Information Regulations [16]. Also, in order to protect ongoing

research projects, NERC allows researchers exclusive rights to data they have collected for a

maximum of two years from the end of the data collection period [16]. NERC also requires the

development of a formal data management plan, much like the requirements of the National

Science Foundation [16].

AADC releases submitted data to the public after embargo periods specific to the kind of data

being submitted [9]. For example, ship-sourced observations and measurements are released by

a project’s end date, while data on threatened species has an unlimited embargo period [9]. All of

these different institutions illustrate the differences in data sharing policies among the Earth

Sciences disciplines. Even though these institutions all fall under Earth Sciences, they each

maintain specific data policy practices. This is why it is impossible to discuss overall data

sharing practices in the Earth Sciences without discussing different disciplines.

A final consideration regarding accessibility of data, under the management phase of the data life cycle, is

copyright considerations. One proposed solution to copyright issues is the Creative Commons

licenses [3]. Creative Commons promotes “universal access” to research in order to achieve “an

Internet full of open content” [17]. Their mission is to give individuals, institutions, and

companies the ability to keep their copyright but also allow others “certain uses” of their work

[17]. Basically this allows for a “some rights reserved” approach to information sharing rather

than an “all rights reserved” mentality. AADC is one Earth Science institution which uses the

Creative Commons license. Under the Creative Commons Attribution 3.0 License, users are able

to share or to remix the work. The only condition to using the data is that users must attribute

content to the AADC, or, more specifically, to the original creator [17].

The last phase of the data life cycle is use or re-use. In terms of data quality, the use or re-use of

data has to do with the peer review process [3]. This process involves not only the review of

paper publications, but also the quality assurance of datasets. The peer review process is

accomplished in many different ways. One way is to guarantee scientific quality assurance and

formal quality assurance. In scientific quality assurance, the reviewer needs to have significant

knowledge of the topic in order to properly review the information [10]. Since papers are limited

in length, scientific quality assurance on paper publications by a human is feasible [10].

Conversely, since datasets can be huge in scope, reviews on datasets sometimes rely on the help

of computers [10]. The review of the dataset requires the “quality assurance of the data and the

metadata” [10]. On the other hand, formal quality assurance involves reviewing technical

features rather than content. These features include word count, typesetting, and structure [10].

In this aspect of the review process, reviewers do not need significant knowledge of the topic and

so this process is much quicker than scientific quality assurance. An example of this method of

scientific and formal quality assurance is the research project Publication of Environmental Data

[10]. This project involves the use of a software program that examines meteorological data

[10]. This software looks for “outliers and other deviations” based on the parameters set by

researchers and then produces an XML report on which researchers can make annotations

describing the deviations in the data [10]. There is even separate software for monitoring

metadata. This project hopes to ensure data reliability, and thus increased data sharing, by

implementing these review practices.
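The kind of automated check described for meteorological data can be sketched briefly. The example below flags values that fall outside limits chosen by the researcher and writes them to a small XML report with an empty annotation slot for each deviation. It is a minimal illustration under stated assumptions: the function names, the simple range test, and the XML element names are hypothetical and do not reproduce the software used in the Publication of Environmental Data project.

```python
import xml.etree.ElementTree as ET


def find_outliers(values, lower, upper):
    """Return (index, value) pairs for values outside the range [lower, upper]."""
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


def build_report(parameter, values, lower, upper):
    """Build an XML report listing each deviation with an empty annotation.

    The element and attribute names are assumptions made for this sketch.
    """
    root = ET.Element(
        "qa_report", parameter=parameter, lower=str(lower), upper=str(upper)
    )
    for index, value in find_outliers(values, lower, upper):
        deviation = ET.SubElement(
            root, "deviation", index=str(index), value=str(value)
        )
        # Left empty so a researcher can later explain the deviation.
        ET.SubElement(deviation, "annotation")
    return ET.tostring(root, encoding="unicode")
```

A fixed range is the simplest possible parameter a researcher could set. A real quality check would likely combine several tests, but the flow is the same: the software flags deviations, and the researcher annotates them.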

The peer review process is becoming more important in data sharing as more

researchers are publishing datasets along with their articles [3]. An evolving practice consists of

the use of special journals for “data publications,” which involve “separate articles describing the

data collection” [3]. An example of this practice is the journal Earth System Science Data. For

this journal, the peer review guidelines include reading the manuscript, checking the data quality,

considering the article and dataset, checking the presentation quality, and checking the publication. The

reviewer then rates the data on significance (with the sub-criteria of uniqueness,

usefulness, and completeness of the dataset), data quality, and data presentation. The most

important question comes at the end of the process, where the reviewer answers the question “By

reading the article and downloading the dataset would you be able to understand and [re-]use the

dataset in the future?” [18]. This question encapsulates the importance of data quality in

understanding and re-using data and consequently facilitating data sharing.

Another emerging technique of peer review is subsequent peer review. Subsequent peer review,

rather than advance peer review, is a more logical option for peer review of datasets because

reviewing quality of datasets, which can be huge in size, can lead to a significant loss of time

between data gathering and data publication. In the subsequent method of peer review, those

who re-use the data are supposed to write a review about the dataset which is then linked to the

dataset [3]. In the physical sciences discipline, comments on quality by re-users were generally

favored, whereas peer review of the dataset as part of peer review of the publication was seen as

not needed due to concerns about feasibility [3]. Since datasets can become so large, most feel

that requiring peer review of datasets puts an “excessive burden on peer reviewers” which is

viewed as unnecessary [3]. Though peer review of datasets is not widely popular, the overall

importance of peer review in assuring data quality remains a key factor in the data life cycle.

The behavior of researchers and scientists is another impediment to effective data sharing in the

Earth Sciences. A willingness and openness to sharing data is the first step in the process of

effective data sharing. Even though every piece of data and metadata is documented, formatted

correctly, stored, and reviewed, if a researcher is not open to sharing data then having a perfect

data sharing system is useless. Some say that scientists are generally open to the idea of sharing

data and collaborating with their peers, but, when it comes down to actual implementation,

scientists sometimes fall short.

To begin the discussion of sharing behaviors, it is important to first distinguish between the types

of data that researchers are sharing (or not sharing). Raw data is data that has not yet been

processed for use. Raw data includes everything from first hand observational data to the data

that is captured by reading an instrument. Derived data is data that has been “processed or

reduced in some way” and now is meaningful [19]. On a basic level, sharing raw data is difficult

because of its huge scope and unwieldiness [19]. The amount of raw data generated by scientists

is so great that, at times, it cannot practically be shared, and in some cases is not even maintained

for long-term preservation. There are also personal reasons that scientists choose not to share

raw data, such as the desire to build upon the data in future work. Sharing derived data is a more

common practice because it is easier for others to work with and “build on previous findings”

[19]. However, there are concerns that a lack of access to raw data leads to issues about data

integrity. Without raw data, scientists attempting to replicate the results will not know if they

were successful. Replication of results is an integral step in the scientific process and without

the sharing of raw data a system of “checks and balances” in the scientific community disappears

[19]. By sharing both derived and raw data scientists can work to ensure the integrity of the data

as well as increase trust in the quality of the data.

Data sharing behaviors, in terms of publishing datasets, are influenced by a number of different

factors. One of these factors is a lack of time and resources [19]. Many researchers believe that

the cost associated with proper data management is too high. Often there is also a need for

someone else to expertly curate and manage their data, which just adds to the cost of

management. Many scientists believe that even if there are funds provided to pay for data

management, these funds should be used to conduct more research and not to manage and

preserve data [19]. This mentality is similar to many researchers’ opinions on the amount of

time needed to manage data. There is a good deal of time needed to effectively manage data and

researchers feel that the benefits of managing their data do not justify this time. Also, if

researchers do publish their datasets, there is a concern about the amount of time spent dealing

with requests for additional information [19]. Processing requests for information and providing

explanations of their datasets requires valuable time that researchers believe could be better used

in actually conducting and analyzing research. Researchers fear that valuable time will be

wasted on gathering and transmitting explanations such as what tools were used or associated

metadata [19].

Another constraint in publishing datasets is a lack of experience or skill in managing data,

“especially making it accessible and usable” [19]. Many researchers find the data management

process to be an “unfamiliar and daunting prospect” [19]. However, in the field of Earth

Science, many data centres and institutions do provide assistance in managing data. NERC data

centres provide “support and guidance in data management to those funded by NERC” [16].

AADC ensures that data is easily accessible and “managed for the long-term” [9]. BODC has an

international reputation for its proficiency in handling data. BODC’s main objective is to

“ensure good data management practices” and to facilitate data sharing [7]. Nevertheless, though

data centres may have good policies and practices in place, ultimately they depend on

“willingness to share” [20]. Even with assistance from some data centers and institutions, some

scientists still choose not to seek help and thus struggle with proper data management [19]. In

addition to a lack of skill surrounding data management, scientists also at times have a lack of

knowledge as to where to archive their data [19].

Competitive factors and fear of being “scooped” are major concerns for most researchers [19].

Some scientists fear that sharing their data too soon is a significant risk and that anyone could

then take credit for their research and work. Researchers sometimes wish to have exclusive use

of their data collections and control who has access to them in order to reduce the risk of being

scooped [19]. As previously discussed, many data centers and institutions have embargo periods

on data use, such as NASA and the AADC, but many researchers still do not want to share their

data. Another fear is that of exploitation and that perhaps others might misrepresent their data or

that “unwarranted conclusions may be drawn” [19].

An additional consideration in sharing datasets is the uncertainty that they are useful [19]. Some

find it hard to believe that “anyone will want access to their datasets” [19]. This is not

particularly applicable to most datasets produced by the Earth Sciences, but some researchers do

feel that model-run data or datasets produced from small scale projects are not necessarily in

high demand [19]. Though this uncertainty is not a main concern in the Earth Sciences

discipline, it is still a consideration for researchers when deciding to share data.

Lastly, data sharing behaviors are influenced by a lack of incentives or rewards [19]. As David

Carlson, former director of the International Polar Year International Programme Office, puts it,

“Earth scientists need better incentives, rewards and mechanisms to achieve free and open data

exchange” [20]. He believes that though issues like data quality, data management, and

technical impediments do hinder data sharing, the real problem is behavior [20]. Carlson

believes that practical solutions such as the development of a ‘Polar Information Commons,’ data

centers’ willingness to share, and the establishment of journals such as the Earth System Science

Data journal, mentioned previously, are necessary steps in changing the behavior of scientists

involved in the International Polar Year Programme. However, many researchers still believe

that there is not much benefit to them for spending the time and effort required to effectively

share data or datasets.

The Research Information Network (RIN), in association with the Joint Information Systems

Committee (JISC) and the Natural Environment Research Council, conducted a study to analyze

the sharing behaviors of different scientific disciplines. Their study discovered underlying

motivations to publish datasets as well as benefits and incentives to sharing datasets. One reason

scientists fail to share data is that there are “no career-related rewards for sharing or publishing

datasets” [21]. If sharing is not recognized in terms of increased funding, then scientists feel

little reason to share. Also, according to this study, data publishing is a distant second to

publishing papers [21]. Of course, there are also positive motivations for sharing datasets.

Altruism, or acting for the good of the progression of science, is one such

motivation [21]. The study found that researchers who commonly share data are much more

likely to have the behavior reciprocated and also feel freer to ask peers for use of their data [21].

Another motivation is “greater visibility” for the researcher or for the institution or group they

represent [21]. The more published datasets that a researcher shares the more recognized and

visible they become, which could lead to more opportunities for collaborations among

researchers and institutions [21]. These collaborations could be between those within the same

discipline or they could be interdisciplinary collaborations. Sharing data can also lead to

opportunities to co-author papers with other researchers [21]. The study also found that

encouragement among peers is a motivation to sharing datasets [21]. Thus simple

encouragement can sometimes foster data sharing among different disciplines. Finally, a

personal interest in data sharing and data preservation is often an incentive to effective data

sharing [21].

According to this same study, researchers elaborated on specific incentives that would

“encourage them to devote more attention to publishing or sharing their data” [21]. One

incentive is to give evidence, through case studies, that there are benefits to publishing datasets

[21]. Researchers want there to be some value in taking the time to manage and preserve their

datasets for future use. Another incentive researchers cited is more defined rewards involving

“career progression” [21]. Researchers believe this can be accomplished in many ways including

“closing the gap” between the rewards gained for publishing a paper and those for publishing a

dataset and “taking account” of past data sharing behaviors when determining grant funding. A

final incentive that researchers cited to foster sharing data is a “standard, workable” method of

citing their datasets [21].

A brief summary of data sharing in the fields of Astronomy and Climate Science in the

United Kingdom

According to the same report commissioned by the Research Information Network, there are

certain conclusions that can be drawn about data sharing practices based on the results of their

interviews with researchers in different disciplines. In the field of astronomy, the cost of

curating and preserving data is very high [21]. Both the cost of storage space, which according

to the report is decreasing, and the cost of training those who will care for

the data long-term are substantial [21]. Another issue is accessibility to older data because the software

required to support this is “no longer adequately supported” [21]. In terms of motivations and

constraints in publishing data, astronomers will almost always provide requested

datasets as long as the embargo period is past [21]. However, astronomers are much more

reluctant to share information that has not yet been published [21]. According to the report, if

there are no mandatory policies for data sharing, the motivation to share is likely “career reward”

from publishing journal articles [21]. In general, the culture of data sharing in the field of

astronomy is strong, with few infrastructure-related barriers to publishing data, and the tendency to

publish datasets, with metadata and appropriate documentation, is high [21].

In the climate sciences discipline, opinions about data preservation seem to depend on the type of

data being preserved. Many believe that the lifespan for model run data is five years since it

becomes obsolete quickly, while raw data, especially observations, “should be curated for the

long-term” [21]. The value of these two types of data differs as well, in that many believe raw

model run data is only valuable to the creator while raw observational data is valuable to all [21].

Of course, value is added to the raw model run data after it is processed [21]. Researchers in the

climate sciences tend to “respond positively” to requests for access to datasets, though low

demand for model run data discourages researchers from properly managing their data to make it

understandable to all [21].

Conclusion

As Carlson stated in his article A Lesson in Sharing, “A perfect data sharing system is

science’s ‘unobtainium’” [20].

As Albert Einstein said, “The restriction of knowledge to an elite group destroys the spirit of

society and leads to its intellectual impoverishment.” In the case of data sharing in the Earth

Sciences, this is most assuredly true.

References

1. Ball CA, Sherlock G, Brazma A: Funding high-throughput data sharing. Nature Biotechnology 2004, 22:1179 – 1183[http://dx.doi.org/10.1038/nbt0904-1179], CiTO: cites for information, Journal Article, peer reviewed.

2. Eynden VV, Corti L, Woollard M, Bishop L: Managing and Sharing Data: a best practice guide for researchers. UK Data Archive 2009 [http://www.esds.ac.uk/news/publications/managingsharing.pdf], CiTO: cites for information, as authority, Report, peer reviewed.

3. Waaijers L, Graff M: Quality of Research Data, an Operational Approach. D-Lib Magazine 2011, 17:No. 1/2[http://dx.doi.org/10.1045/january2011-waaijers], CiTO: cites for information, uses data from, Magazine Article, not peer reviewed.

4. Wallis J, Borgman C, Mayernik M, Pepe A, Ramanathan N, Hansen M: Know Thy Sensor: Trust, Data Quality, and Data Integrity in Scientific Digital Libraries. LNCS 2007, 4675:380-391[http://dx.doi.org/10.1007/978-3-540-74851-9_32], CiTO: cites for information, Book Chapter, not peer reviewed.

5. Center for Embedded Network Sensing. [http://research.cens.ucla.edu/projects/2007/Seismic/mexico/field_maintenance/].

6. National Oceanic and Atmospheric Administration. [http://www.cio.noaa.gov/Policy_Programs/info_quality.html].

7. British Oceanographic Data Centre. [https://www.bodc.ac.uk/data/where_to_find_data/].

8. National Aeronautics and Space Administration. [http://science.nasa.gov/earth-science/earth-science-data/data-information-policy/].

9. Australian Antarctic Data Centre. [http://data.aad.gov.au/aadc/about/data_policy.cfm].

10. Hense A, Quadt F: Acquiring High Quality Research Data. D-Lib Magazine 2011, 17:No. 1/2[http://dx.doi.org/10.1045/january2011-hense], CiTO: cites for information, Magazine Article, not peer reviewed.

11. Global Observing Systems Information Center. [http://gosic.org/].

12. National Information Standards Organization: Understanding Metadata. NISO Press 2004, [http://www.niso.org/publications/press/UnderstandingMetadata.pdf], CiTO: cites for information, Report Document, not peer reviewed.

13. Joint Data and Information Management Panel: GCOS/GOOS/GTOS Joint Data and Information Management Plan. JDIMP 2000, [http://www.wmo.int/pages/prog/gcos/Publications/gcos-60.pdf], CiTO: cites for information, as authority, Report, not peer reviewed.

14. NOAA: NOAA/National Climatic Data Center Open Access to Physical Climate Data Policy. NOAA 2009, [http://www.ncdc.noaa.gov/oa/about/open-access-climate-data-policy.pdf], CiTO: cites as authority, for information, Report, not peer reviewed.

15. NASA Data Rights and Related Issues Preamble. [http://science.nasa.gov/earth-science/earth-science-data/data-information-policy/data-rights-related-issues/].

16. Natural Environment Research Council. [http://www.nerc.ac.uk/research/sites/data/policy2011.asp].

17. Creative Commons. [http://creativecommons.org/licenses/by/3.0/].

18. Earth System Science Data. [http://www.earth-system-science-data.net/review/ms_evaluation_criteria.html].

19. Griffiths A: The Publication of Research Data: Researcher Attitudes and Behaviour. The International Journal of Digital Curation 2009, 4: No. 1[http://www.ijdc.net/index.php/ijdc/article/viewFile/101/76], CiTO: cites for information, Journal Item, peer reviewed.

20. Carlson D: A Lesson in Sharing. Nature 2011, 469:293[http://dx.doi.org/10.1038/469293a], CiTO: cites for information, Journal Article, peer reviewed.

21. Research Information Network: To Share or not to share: Publication and quality assurance of research data outputs. RIN 2008, [http://www.rin.ac.uk/our-work/data-management-and-curation/share-or-not-share-research-data-outputs], CiTO: cites for information, as authority, Report, not peer reviewed.