Unless otherwise noted, the slides in this presentation are licensed by Mark A. Parsons under a Creative Commons Attribution-Share Alike 3.0 License. Identity, Location, and Citation. Mark A. Parsons with help from Ruth Duerr and Peter Fox. NEON, 7 February 2014.

Identity, Location, and Citation at NEON


Page 1: Identity, Location, and Citation at NEON

Unless otherwise noted, the slides in this presentation are licensed by Mark A. Parsons under a Creative Commons Attribution-Share Alike 3.0 License

Identity, Location, and Citation

Mark A. Parsons with help from Ruth Duerr and Peter Fox

NEON, 7 February 2014

Page 2: Identity, Location, and Citation at NEON

First let’s talk about metaphor

Page 3: Identity, Location, and Citation at NEON

Metaphor is for most people a device of the poetic imagination and the rhetorical flourish— a matter of extraordinary rather than ordinary language. Moreover, metaphor is typically viewed as characteristic of language alone, a matter of words rather than thought or action. For this reason, most people think they can get along perfectly well without metaphor.

We have found, on the contrary, that metaphor is pervasive in everyday life, not just in language but in thought and action. Our ordinary conceptual system, in terms of which we both think and act, is fundamentally metaphorical in nature.

Page 4: Identity, Location, and Citation at NEON

Is data publication the right metaphor? M. A. Parsons & P. A. Fox Data Science Journal, 2013 http://dx.doi.org/10.2481/dsj.WDS-042

Page 5: Identity, Location, and Citation at NEON

Purpose of Data Citation

• Aid scientific reproducibility through direct, unambiguous connection to the precise data used

• Credit for data authors and stewards

• Accountability for creators and stewards

• Track impact of data set

• Help identify data use (e.g., trackbacks)

• Data authors can verify how their data are being used.

• Users can better understand the application of the data.


• A locator/reference mechanism not a discovery mechanism per se

Page 6: Identity, Location, and Citation at NEON

Identifier vs. Locator

• Human ID: Mark Alan Parsons (son of Robert A. and Ann M., etc.)
  • every term defined independently (only unique in context/provenance)
  • Alternative like a social security number (or ORCID) requires a well controlled central authority and unchanging objects.

• Human Locator: Amos Eaton Hall, Room 209, 110 8th St., Troy, NY 12180.
  • every term has a naming authority, i.e. a type of registry

• Data Set IDs: data set title, filename, database key, object id code (e.g. UUID), etc.

• Data set Locators: URL, directory structure, catalog number, registered locator (e.g. DOI), etc.

Page 7: Identity, Location, and Citation at NEON

One of the main purposes of assigning DOI names (or any persistent identifier) is to separate the location information from any other metadata about a resource. Changeable location information is not considered part of the resource description. Once a resource has been registered with a persistent identifier, the only location information relevant for this resource from now on is that identifier, e.g., http://dx.doi.org/10.xx.

Source: DataCite Metadata Scheme for the Publication and Citation of Research Data, Version 2.2, July 2011 (my emphasis).
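The separation described in the quote can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DataCite's implementation: the `Registry` class, its method names, and the `archive.example.org` URLs are all hypothetical, and `10.xx/example` simply extends the placeholder used in the quote.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Registry:
    """Toy resolver (hypothetical): persistent identifier -> current location."""
    _locations: Dict[str, str] = field(default_factory=dict)

    def register(self, identifier: str, url: str) -> None:
        self._locations[identifier] = url

    def move(self, identifier: str, new_url: str) -> None:
        # Only the registry entry changes. Anything that cites the
        # identifier keeps working without being edited.
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_url

    def resolve(self, identifier: str) -> str:
        return self._locations[identifier]


registry = Registry()
registry.register("10.xx/example", "http://archive.example.org/data/v1/")
cited_as = "10.xx/example"  # what appears in the literature
registry.move("10.xx/example", "http://archive.example.org/collections/42/")
current_url = registry.resolve(cited_as)
```

The point of the design is that the changeable location lives only in the registry, so the resource description and every published citation carry the identifier alone.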

Page 8: Identity, Location, and Citation at NEON

How data citation is currently done

• Citation of traditional publication that actually contains the data, e.g. a parameterization value.

• Not mentioned, just used, e.g., in tables or figures

• Reference to name or source of data in text

• URL in text (with variable degrees of specificity)

• Citation of related paper (e.g. CRU Temp. records recommend citing two old journal articles which do not contain the actual data or full description of methods.)

• Citation of actual data set typically using recommended citation given by data center

• Citation of data set including a persistent identifier/locator, typically a DOI

Page 9: Identity, Location, and Citation at NEON

[Figure: bar chart of “MODIS Snow Cover Data” entries in Google Scholar by year, 2002 to 2009 (axis 0 to 600), comparing Formal Citation with Total Entries. Formal-citation shares as labeled on the chart, in extracted order: 1.3%, 1.0%, 0.7%, 0.7%, 1.3%, 0.9%, 1.3%, 1.7%.]

Page 10: Identity, Location, and Citation at NEON

Data Citation Guidelines

• Federation of Earth Science Information Partners. 2012. http://commons.esipfed.org/node/308 and related guidelines for the Group on Earth Observations (GEO)
  • Best available for Earth system science. Not yet widely adopted.

• Digital Curation Centre. 2011. http://www.dcc.ac.uk/resources/how-guides/cite-datasets
  • Best overall guide. Not yet widely adopted.

• Digital Mapping Techniques '00 -- Workshop Proceedings. USGS Open-File Report 00-325 “Proposal for Authorship and Citation Guidelines for Geologic Data Sets and Map Images in the Era of Digital Publication.” By Stephen M. Richard http://pubs.usgs.gov/of/2000/of00-325/richard.html
  • Detailed treatment of map-based data, but seemingly not well recognized. Does not address location.

• DataCite: a well-recognized consortium of libraries and related organizations working to define a citation approach and assign DOIs. Also working to get data citations included in citation indices.

• CODATA/ICSTI and NAS Task Group conducted an excellent workshop that summarizes approaches and issues: http://www.nap.edu/catalog.php?record_id=13564. A final report summarizing the state of the art will be out soon.

• DataVerse Network Project: a standard from the social science community using a Handle locator and “Universal Numerical Fingerprint” as a unique identifier.

• NASA DAACs, some NOAA and NSF centers adopting ESIP-based approaches.

• Various life and social science centers have standardized approaches with increasing adoption, e.g. Dryad.

Page 11: Identity, Location, and Citation at NEON

The Evolution of Data Citation—Then

• Data were part of the literature (tables, maps, monographs, etc.) and we cited accordingly. (Some data were still hoarded.)

• Digital data becomes the norm. It’s messier and we forget how to cite it routinely.

• Initial efforts to define digital data citation in the late 90s - early 00s
  • Right idea, little traction
  • Partially conflated with the citing URLs issue

• A blossoming in the mid-late 00s.
  • Multiple disciplines start developing approaches and guidelines
  • DOI a big driver, especially for DataCite, but other identifiers used too (Handles, LSIDs, UNFs, ARKs and good ol’ URI/Ls)
  • A slightly competitive atmosphere

Page 12: Identity, Location, and Citation at NEON

• Now a consensus phase
  • Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data. 2013. http://dx.doi.org/10.2481/dsj.OSOM13-043
  • Draft Global Joint Declaration of Data Citation Principles. 2013. http://www.force11.org/datacitation

The Evolution of Data Citation—Now

Page 13: Identity, Location, and Citation at NEON

• Implementation phase just begun
  • ESIP Guidelines adopted by a variety of NASA and NOAA data centers and internationally by GEOSS.
  • AGU Publishing Committee is developing author guidelines based on ESIP.
  • ESA Data Policy requires data deposit but not citation
  • Other disciplines, notably social science, have relationships with publishers.
  • Several data centers partnering with publishers, e.g. Elsevier’s “article of the future”.

• It happens locally and requires culture change so debates will continue

The Evolution of Data Citation—Next

Page 14: Identity, Location, and Citation at NEON

• Everything needs an identifier. Most things need locators. Intellectual content needs citation.

• Different versions of things may need different identifiers/locators

• Subsets may need identifiers or clear reference to sub-setting process (e.g. space and time).

• Different representations (conceptual models) may need different identifiers/locators, e.g. maps.

What needs an identifier/locator? What needs to be cited?

Page 15: Identity, Location, and Citation at NEON

News Flash! September 2011: Greenland ice shrinks 15% since 1999 according to new edition of The Times Atlas

Page 16: Identity, Location, and Citation at NEON

• UPDATED: Atlas Shrugged? 'Outraged' Glaciologists Say Mappers Misrepresented Greenland Ice Melt

• Mapmakers' claim on shape of Greenland suddenly melts away

• A greener Greenland? Times Atlas 'error' overstates global warming

• Row over how much Greenland has shrunk

• Times Atlas is 'wrong on Greenland climate change'

• Times Atlas accused of 'absurd' climate change ice error

Headlines a couple days later

Page 17: Identity, Location, and Citation at NEON

The Culprit?

Page 18: Identity, Location, and Citation at NEON

• Maurer, J. 2007. Atlas of the Cryosphere. Boulder, Colorado USA: National Snow and Ice Data Center. Digital media.

• Bamber, J.L., R.L. Layberry, S.P. Gogineni. 2001. A new ice thickness and bed data set for the Greenland ice sheet 1: Measurement, data reduction, and errors. Journal of Geophysical Research. 106(D24): 33773-33780. Data provided by the National Snow and Ice Data Center DAAC, University of Colorado, Boulder, Colorado USA. Available at http://nsidc.org/data/nsidc-0092.html. 25 October 2006.

• Bamber, J.L., R.L. Layberry, S.P. Gogineni. 2001. A new ice thickness and bed data set for the Greenland ice sheet 2: Relationship between dynamics and basal topography. Journal of Geophysical Research. 106(D24): 33781-33788. Data provided by the National Snow and Ice Data Center DAAC, University of Colorado, Boulder, Colorado USA. Available at http://nsidc.org/data/nsidc-0092.html. 25 October 2006.

Page 19: Identity, Location, and Citation at NEON

Basic data citation form and content

Per DataCite: Creator. PublicationYear. Title. [Version]. Publisher. [ResourceType]. Identifier.

Per ESIP: Author(s). ReleaseDate. Title, [version]. [editor(s)]. Archive and/or Distributor. Locator. [date/time accessed]. [subset used].
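A citation form like this is easy to assemble from structured metadata. The following is a minimal sketch, assuming a hypothetical `DataCitation` record whose field names are invented for illustration; the element order and the phrasing "Data set accessed ... at ..." follow the Cline et al. example used in this deck rather than any official ESIP template, and the subset element is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Hypothetical record holding the ESIP-style citation elements."""
    authors: str
    release_date: str
    title: str
    archive: str
    locator: str
    version: Optional[str] = None
    editors: Optional[str] = None
    accessed: Optional[str] = None

    def format(self) -> str:
        title = self.title
        if self.version:
            title += f", ver. {self.version}"
        # Avoid a doubled period if the author string already ends in one.
        parts = [
            f"{self.authors.rstrip('.')}.",
            f"{self.release_date}.",
            f"{title}.",
        ]
        if self.editors:
            parts.append(f"Edited by {self.editors}.")
        parts.append(f"{self.archive}.")
        if self.accessed:
            parts.append(f"Data set accessed {self.accessed} at {self.locator}.")
        else:
            parts.append(f"{self.locator}.")
        return " ".join(parts)


citation = DataCitation(
    authors="Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston",
    release_date="2002, Updated 2003",
    title="CLPX-Ground: ISA snow depth transects and related measurements",
    version="2.0",
    editors="M. Parsons and M. J. Brodzik",
    archive="Boulder, CO: National Snow and Ice Data Center",
    locator="http://nsidc.org/data/nsidc-0175.html",
    accessed="2008-05-14",
)
text = citation.format()
```

Keeping the elements as separate fields means the same record can be rendered in the DataCite order as well, or updated with a new locator without retyping the citation.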

Page 20: Identity, Location, and Citation at NEON

An Example Citation

Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated 2003. CLPX-Ground: ISA snow depth transects and related measurements, ver. 2.0. Edited by M. Parsons and M. J. Brodzik. Boulder, CO: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://nsidc.org/data/nsidc-0175.html.

Page 21: Identity, Location, and Citation at NEON

Author

Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated 2003. CLPX-Ground: ISA snow depth transects and related measurements, ver. 2.0. Edited by M. Parsons and M. J. Brodzik. Boulder, CO: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://nsidc.org/data/nsidc-0175.html.

Page 22: Identity, Location, and Citation at NEON

Release Date

Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated 2003. CLPX-Ground: ISA snow depth transects and related measurements, ver. 2.0. Edited by M. Parsons and M. J. Brodzik. Boulder, CO: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://nsidc.org/data/nsidc-0175.html.

Page 23: Identity, Location, and Citation at NEON

Title

Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated 2003. CLPX-Ground: ISA snow depth transects and related measurements, ver. 2.0. Edited by M. Parsons and M. J. Brodzik. Boulder, CO: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://nsidc.org/data/nsidc-0175.html.

Page 24: Identity, Location, and Citation at NEON

Version

Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated 2003. CLPX-Ground: ISA snow depth transects and related measurements, ver. 2.0. Edited by M. Parsons and M. J. Brodzik. Boulder, CO: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://nsidc.org/data/nsidc-0175.html.

Page 25: Identity, Location, and Citation at NEON

Editor

Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated 2003. CLPX-Ground: ISA snow depth transects and related measurements, ver. 2.0. Edited by M. Parsons and M. J. Brodzik. Boulder, CO: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://nsidc.org/data/nsidc-0175.html.

Page 26: Identity, Location, and Citation at NEON

Archive and/or Distributor

Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated 2003. CLPX-Ground: ISA snow depth transects and related measurements, ver. 2.0. Edited by M. Parsons and M. J. Brodzik. Boulder, CO: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://nsidc.org/data/nsidc-0175.html.

Page 27: Identity, Location, and Citation at NEON

Locator

Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated 2003. CLPX-Ground: ISA snow depth transects and related measurements, ver. 2.0. Edited by M. Parsons and M. J. Brodzik. Boulder, CO: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://nsidc.org/data/nsidc-0175.html.

Page 28: Identity, Location, and Citation at NEON

Locator

Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated 2003. CLPX-Ground: ISA snow depth transects and related measurements, ver. 2.0. Edited by M. Parsons and M. J. Brodzik. Boulder, CO: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://dx.doi.org/10.5060/D4H41PBP.
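The two "Locator" slides differ only in swapping a site URL for a DOI-based one. A small sketch of moving between a DOI name and its resolver URL follows; the helper names are hypothetical, the resolver prefix is the one used in these slides, and a real implementation would also need to percent-encode unusual characters in the suffix.

```python
from typing import Tuple

DOI_RESOLVER = "http://dx.doi.org/"  # resolver prefix as written in the slides
KNOWN_RESOLVERS = ("http://dx.doi.org/", "https://doi.org/")


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI name into (prefix, suffix) at the first slash."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI name: {doi!r}")
    return prefix, suffix


def doi_to_url(doi: str) -> str:
    """Turn a DOI name into an actionable locator."""
    split_doi(doi)  # validates the shape, raises ValueError otherwise
    return DOI_RESOLVER + doi


def url_to_doi(url: str) -> str:
    """Recover the DOI name from a resolver URL."""
    for resolver in KNOWN_RESOLVERS:
        if url.startswith(resolver):
            return url[len(resolver):]
    raise ValueError(f"not a DOI resolver URL: {url!r}")


prefix, suffix = split_doi("10.5060/D4H41PBP")
url = doi_to_url("10.5060/D4H41PBP")
```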

Page 29: Identity, Location, and Citation at NEON

[Table: An assessment of identification schemes for digital Earth science data. Rows (ID schemes, grouped as Locators and Identifiers): URL/N/I, PURL, XRI, Handle, DOI, ARK, LSID, OID, UUID. Columns, each rated separately for Data Set and Item: Unique Identifier, Unique Locator, Citable Locator, Scientifically Unique ID. Ratings were color-coded Good, Fair, or Poor; the individual cell values did not survive extraction.]

Adapted from Duerr, R. E., et al. 2011. On the utility of identification schemes for digital Earth science data: An assessment and recommendations. Earth Science Informatics. 4:139-160. http://dx.doi.org/10.1007/s12145-011-0083-6

Page 30: Identity, Location, and Citation at NEON

Suggested identifier practice for now

• DOIs (or ARKs, maybe Handles) for data sets

• UUIDs (perhaps ARKs) for data items

• Systems need to be prepared to support multiple identifiers and locators over time

• ESIP data citation guidelines (http://commons.esipfed.org/node/308)

• Assign identifiers/locators to associated provenance and contextual materials, as much as you can really.

• Continue to explore adding identifiers to people, organizations, instruments, etc.

• Participate in RDA and ESIP communities addressing these issues
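One way to read the first three suggestions is as a data model: a fixed UUID per item, plus a growing list of locators that can be retired but not erased. The sketch below assumes exactly that; the `DataItem` and `Locator` classes, the file name, and the `archive.example.org` URLs are hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Locator:
    scheme: str            # e.g. "DOI", "ARK", "URL"
    value: str
    retired: bool = False  # kept on record so old citations can be traced


@dataclass
class DataItem:
    name: str
    # The identifier is assigned once and never changes.
    item_id: uuid.UUID = field(default_factory=uuid.uuid4)
    locators: List[Locator] = field(default_factory=list)

    def add_locator(self, scheme: str, value: str) -> None:
        self.locators.append(Locator(scheme, value))

    def retire(self, value: str) -> None:
        for loc in self.locators:
            if loc.value == value:
                loc.retired = True

    def current(self, scheme: str) -> Optional[str]:
        """Most recently added locator of this scheme that is still live."""
        for loc in reversed(self.locators):
            if loc.scheme == scheme and not loc.retired:
                return loc.value
        return None


item = DataItem("transects.shp")
item.add_locator("URL", "http://archive.example.org/old/transects.shp")
item.add_locator("URL", "http://archive.example.org/new/transects.shp")
item.retire("http://archive.example.org/old/transects.shp")
```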

Page 31: Identity, Location, and Citation at NEON

Other topics

• Identifying/locating versions
• Identifying/locating subsets: “Micro-citation”
• Landing pages and content negotiation
• Identifiers/locators for more than data
• Choosing between EZID and CrossRef

Page 32: Identity, Location, and Citation at NEON

Why the DOI?

• Not perfect but well understood by publishers
• Thomson Reuters collaborating with DataCite to get data citations in their index.

But...
• What is the citable unit?
• How do we handle different versions?
• What about “retired” data?
• When is a DOI assigned?

Page 33: Identity, Location, and Citation at NEON

Issues largely resolved by...

• A defined versioning scheme
• Good tracking and documentation of the versions
• Due diligence in archive and release practices

Page 34: Identity, Location, and Citation at NEON

When to assign a DOI?

• First principle: Data should be citable as soon as they are available for use by anyone other than the original authors.

• But...
  • Most people (falsely) believe that a DOI implies permanence, so how do we cite transient data?
  • Some believe that a DOI should not be assigned until the data have undergone some level of review (e.g. Lawrence et al. 2010). So how do we cite data used before the review?

• Data are often used by friends and collaborators in a raw, “unpublished” state. Should this use be cited with a DOI?

• Near real time or preliminary data may only be available for a short, uncurated period, and there may not be a good match between the submission package and the distribution package. What gets the DOI? When?

Page 35: Identity, Location, and Citation at NEON

Versioning approach recommended by DCC

• “As DOIs are used to cite data as evidence, the dataset to which a DOI points should also remain unchanged, with any new version receiving a new DOI.”

• “There are two possible approaches the data repository can take: time slices and snapshots.”

Page 36: Identity, Location, and Citation at NEON

Versioning and locators: some suggestions from NSIDC

• major version.minor version.[archive version]
• Individual stewards need to determine which are major vs. minor versions and describe the nature and file/record range of every version.
• Assign DOIs to major versions.
• Old DOIs should be maintained and point to some appropriate page that explains what happened to the old data if they were not archived.
• A new major version leads to the creation of a new collection-level metadata record that is distributed to appropriate registries. The older metadata record should remain with a pointer to the new version and with explanation of the status of the older version data.
• Major and minor version (after the first version) should be exposed with the data set title and recommended citation.
• Minor versions should be explained in documentation, ideally in file-level metadata.
• Applying UUIDs to individual files upon ingest aids in tracking minor versions and historical citations.
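The versioning suggestions above can be sketched as a small parser plus a rule for when a new DOI is called for. This is an illustration under assumptions, not NSIDC code: the `Version` type, the function names, and the choice to treat any change in the major number as "new DOI" are hypothetical readings of the bullets.

```python
import re
from typing import NamedTuple, Optional


class Version(NamedTuple):
    major: int
    minor: int
    archive: Optional[int] = None  # optional third component


_PATTERN = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?$")


def parse_version(text: str) -> Version:
    """Parse 'major.minor' or 'major.minor.archive'."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, archive = match.groups()
    return Version(
        int(major),
        int(minor),
        int(archive) if archive is not None else None,
    )


def needs_new_doi(old: str, new: str) -> bool:
    """DOIs go to major versions, so only a major change triggers one."""
    return parse_version(new).major != parse_version(old).major


minor_bump = needs_new_doi("2.0", "2.1")
major_bump = needs_new_doi("2.1", "3.0")
```

Minor and archive changes would then be recorded in documentation and file-level metadata rather than in a new locator.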

Page 37: Identity, Location, and Citation at NEON

Basic data citation form and content

Author(s). ReleaseDate. Title, Version. [editor(s)]. Archive and/or Distributor. Locator. [date/time accessed]. [subset used].

The best solution is to have unique identifiers or query IDs for subsets, but that won’t be available for most data sets for a long time, so we need alternative solutions...

Page 38: Identity, Location, and Citation at NEON

February 8, 2011, 4:45 PM Page Numbers for Kindle Books an Imperfect Solution

Amazon’s Kindle will have page numbers that correspond to real books and locations by passage.

Neither solution is perfect—‘locations’ or page numbers—because the problem is unsolvable. The best we can hope for is a choice...

http://pogue.blogs.nytimes.com/2011/02/08/page-numbers-for-kindle-books-an-imperfect-solution/

Page 39: Identity, Location, and Citation at NEON

• Bible

• Koran

• Bhagavad-Gita and Ramayana

• other sacred texts


• A “structural index”

Chapter and Verse

Page 40: Identity, Location, and Citation at NEON

The “Archival Information Unit”

“An Archival Information Package whose Content Information is not further broken down into other Content Information components, each of which has its own complete Preservation Description Information. It can be viewed as an ‘atomic’ AIP”

“From an Access viewpoint, new subsetting and manipulation capabilities are beginning to blur the distinction between AICs and AIUs. Content objects which used to be viewed as atomic can now be viewed as containing a large variation of contents based on the subsetting parameters chosen. In a more extreme example, the Content Information of an AIU may not exist as a physical entity. The Content Information could consist of several input files (or pointers to the AIPs containing these data files) and an algorithm which uses these files to create the data object of interest.”

• CCSDS. 2002. Reference Model for An Open Archival Information System (OAIS) CCSDS

Page 41: Identity, Location, and Citation at NEON

Citation scenarios and production patterns

• What kind of “atomic” item is being cited, i.e. the “Archival Information Unit (AIU)” (e.g., a data file, a data element within a file, a relational (or other) database, a job “residue”)?

• How many AIU items are in a typical citation for the scenario being considered?

• What other digital or physical objects need to be available to make the unit usable, i.e. the “Preservation Description Information (PDI)”?

Key Question:

• What structure or structures can we use to organize data collections that might be common across Earth sciences?

Page 42: Identity, Location, and Citation at NEON

An example production pattern for Cline et al. (2003).

Page 43: Identity, Location, and Citation at NEON

Page 44: Identity, Location, and Citation at NEON

Page 45: Identity, Location, and Citation at NEON

Page 46: Identity, Location, and Citation at NEON

Page 47: Identity, Location, and Citation at NEON

Page 48: Identity, Location, and Citation at NEON

Page 49: Identity, Location, and Citation at NEON

Page 50: Identity, Location, and Citation at NEON

Page 51: Identity, Location, and Citation at NEON

Page 52: Identity, Location, and Citation at NEON

Page 53: Identity, Location, and Citation at NEON

Page 54: Identity, Location, and Citation at NEON

A production pattern for Cline et al., 2003

Sample of the distributed ASCII data:

TRANSECT,IOP ,DATE ,TIME,UTME ,UTMN ,DEPTH ,SWET,SRUF,CNPY, TEMP,SURVEYOR ,QC ,COMMENTS
 , , , , , ,cm , , , , deg-F, , ,
FAA01.1 ,iop4,2003-03-25,1017,425941,4410860, 104, d, y, n, -999,"Fitzgerald, Matous, Dundas ","QC(000) "," "
FAA01.2 ,iop4,2003-03-25,1017,425956,4410860, 13, d, n, n, -999,"Fitzgerald, Matous, Dundas ","QC(000) "," "
...
FAA04.1 ,iop4,2003-03-25,1221,425938,4411193, 325, d, y, n, -999,"Fitzgerald, Matous, Dundas ","QC(000) ","Couldn't find post, used GPS 5940 1197; FAA4.4 and FAA4.5 unsafe, avalanche area

[Figure: production pattern diagram. Labels: camera, interim jpgs, collated and named jpgs (born digital); field notebook, Excel v1, printout, Excel v2 (analog to digital w/ QC); shapefiles; ascii files; HTML Doc.; tarball; distributed data set. Counts shown: 100s, 100s, 1000s.]
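The data sample on this slide has a header row, a units row, and padded, partly quoted fields, which is worth handling explicitly when a cited file is read back. The sketch below uses only the two complete rows from the sample; treating `-999` as a missing-value sentinel is an assumption made for illustration, as is the `read_transects` helper itself.

```python
import csv
import io

# First rows of the sample shown on the slide (padding inside fields kept).
SAMPLE = '''TRANSECT,IOP ,DATE ,TIME,UTME ,UTMN ,DEPTH ,SWET,SRUF,CNPY, TEMP,SURVEYOR ,QC ,COMMENTS
 , , , , , ,cm , , , , deg-F, , ,
FAA01.1 ,iop4,2003-03-25,1017,425941,4410860, 104, d, y, n, -999,"Fitzgerald, Matous, Dundas ","QC(000) "," "
FAA01.2 ,iop4,2003-03-25,1017,425956,4410860, 13, d, n, n, -999,"Fitzgerald, Matous, Dundas ","QC(000) "," "
'''

MISSING = "-999"  # assumption: sentinel for "not measured"


def read_transects(text):
    """Return (units, records) from a header row, a units row, and data rows."""
    rows = csv.reader(io.StringIO(text))
    header = [name.strip() for name in next(rows)]
    units = dict(zip(header, (unit.strip() for unit in next(rows))))
    records = []
    for row in rows:
        if not row:
            continue  # skip blank lines
        record = {key: value.strip() for key, value in zip(header, row)}
        if record["TEMP"] == MISSING:
            record["TEMP"] = None
        record["DEPTH"] = int(record["DEPTH"])
        records.append(record)
    return units, records


units, records = read_transects(SAMPLE)
```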

Page 55: Identity, Location, and Citation at NEON

[Figure: Crude, inaccurate production pattern for MODIS/Aqua Snow Cover Daily L3 Global 500m Grid V005 (Hall et al., 2007). Labels: MODAPs Processing; Archives (NSIDC, GSFC, EDC); 1 file/day/tile (grid cell); each file contains metadata describing previous inputs and detailed versioning. Count shown: 1,000,000s.]

Page 56: Identity, Location, and Citation at NEON

Doing it as best we can...?

• Hall, Dorothy K., George A. Riggs, and Vincent V. Salomonson. 2007, updated daily. MODIS/Aqua Snow Cover Daily L3 Global 500m Grid V005.3, Oct. 2007- Sep. 2008, 84°N, 75°W; 44°N, 10°W. Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-11-01 at http://dx.doi.org/10.1234/xxx.

• Hall, Dorothy K., George A. Riggs, and Vincent V. Salomonson. 2007, updated daily. MODIS/Aqua Snow Cover Daily L3 Global 500m Grid V005.3, Oct. 2007- Sep. 2008, Tiles (15,2;16,0;16,1;16,2;17,0;17,1). Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-11-01 at http://dx.doi.org/10.1234/xxx.

• Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated 2003. CLPX-Ground: ISA snow depth transects and related measurements, Version 2.0, shapefiles. Edited by M. Parsons and M. J. Brodzik. Boulder, CO: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://dx.doi.org/10.5060/D4H41PBP.
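The subset element in the citations above is structured enough to be generated rather than typed. A minimal sketch follows, assuming hypothetical helper names, assuming the two coordinate pairs are upper-left and lower-right corners, and treating tiles as plain integer pairs.

```python
from typing import Iterable, Tuple


def corner(lat: float, lon: float) -> str:
    """Format one corner as degrees with hemisphere letters."""
    ns = "N" if lat >= 0 else "S"
    ew = "E" if lon >= 0 else "W"
    return f"{abs(lat):g}°{ns}, {abs(lon):g}°{ew}"


def bbox_subset(upper_left: Tuple[float, float],
                lower_right: Tuple[float, float]) -> str:
    """Spatial subset given as two (lat, lon) corners."""
    return f"{corner(*upper_left)}; {corner(*lower_right)}"


def tiles_subset(tiles: Iterable[Tuple[int, int]]) -> str:
    """Spatial subset given as a list of tile index pairs."""
    return "Tiles (" + ";".join(f"{a},{b}" for a, b in sorted(tiles)) + ")"


def with_subset(title_and_version: str, period: str, spatial: str) -> str:
    """Append the temporal and spatial subset to the title element."""
    return f"{title_and_version}, {period}, {spatial}."


title = "MODIS/Aqua Snow Cover Daily L3 Global 500m Grid V005.3"
by_box = with_subset(title, "Oct. 2007- Sep. 2008",
                     bbox_subset((84, -75), (44, -10)))
by_tiles = with_subset(title, "Oct. 2007- Sep. 2008",
                       tiles_subset([(16, 0), (15, 2), (16, 1),
                                     (16, 2), (17, 0), (17, 1)]))
```

Sorting the tiles gives the same text no matter what order a user selected them in, which makes two citations of the same subset comparable as strings.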

Page 57: Identity, Location, and Citation at NEON

“Landing Pages” For humans and machines

Page 58: Identity, Location, and Citation at NEON

Long form: http://data.rpi.edu/repository/handle/10833/24?show=full

Page 59: Identity, Location, and Citation at NEON

Conneg

• Many examples, but what follows is roughly from: http://www.crosscite.org/cn/

• What is it?

– Est-ce que vous parlez français?
– Do you speak html or JSON or RDF?
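In HTTP terms, "which language do you speak" is the `Accept` header: the same DOI locator is requested, and the client states which representation it wants. The sketch below only builds the requests and does not send them. The helper name is hypothetical, and the media types are examples believed to be among those listed in the CrossCite documentation, so check that page before relying on them.

```python
from urllib.request import Request

DOI_RESOLVER = "http://dx.doi.org/"  # resolver prefix as used in these slides


def conneg_request(doi: str, media_type: str) -> Request:
    """Build (but do not send) a content-negotiated request for a DOI."""
    return Request(DOI_RESOLVER + doi, headers={"Accept": media_type})


# Same locator, three different "languages" (media types are assumptions).
as_html = conneg_request("10.5060/D4H41PBP", "text/html")
as_json = conneg_request("10.5060/D4H41PBP",
                         "application/vnd.citationstyles.csl+json")
as_rdf = conneg_request("10.5060/D4H41PBP", "application/rdf+xml")

# Sending would look like this (left commented out, needs network access):
# from urllib.request import urlopen
# with urlopen(as_json) as response:
#     body = response.read()
```

A human following the DOI in a browser gets the landing page, while a machine asking for JSON or RDF can get metadata from the same locator.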

Page 60: Identity, Location, and Citation at NEON

Conneg

Page 61: Identity, Location, and Citation at NEON

Supported content types...

Page 62: Identity, Location, and Citation at NEON

Metadata

• Title
• Author
• Author Email
• Licence
• Subject
• Keyword
• Data Type

[Figure: DCO Object Registration and Deposit, and DCO Research Community Network. Labels: Dataset; CDF; DCO Object Deposit; DCO Research Network; DCO-ID Request; Share Knowledge; Join Network; Allocate a universal accessible DCO-ID; Register Metadata; Upload Raw Data.]

Page 63: Identity, Location, and Citation at NEON

Further integration...

Page 64: Identity, Location, and Citation at NEON

Preparing Data for Ingest, presented 10/27/09 by R. Duerr LID590DCL Foundations of Data Curation

DataCite/EZID vs CrossRef/PILA

DataCite/EZID
• Primarily meant for data
  • Numerical data
  • Other research data outputs
• Support for common metadata needed for finding or understanding data
• Supports point, bounding box, and place names for geoLocations
• Can link metadata record to DataCite record
• Built in support for many types of relationships to other resources including physical objects
• Built in support for versioning
• Working to be included in citation indexes, etc.
• Schema and services actively evolving

CrossRef/PILA
• Primarily meant for publications
  • Only registers metadata for works not individual manifestations of a work
  • Schema allows multiple resolution
  • Can register data associated with a work
  • Must also provide linking information about references in the work
• Support for common metadata needed for finding publications or parts thereof
  • Books
  • Journals
  • Conference proceedings
• Well integrated into existing citation metrics, indexing, etc. providers using a pay for query model

Presented to the USGS Digital Object Identifiers Focus Group By Ruth Duerr, June 12, 2013

Page 65: Identity, Location, and Citation at NEON


EZID vs CrossRef/PILA

EZID
• EZID supports both DataCite’s DOIs and ARKs (lower cost)
• EZID suggests using ARKs prior to decision to support an object in perpetuity
• ARKs can be deleted
• An ARK can be the suffix of a DOI
• ARKs can be used at the granule level using a single registration and a “pass through” suffix
• Annual fee for up to 1 million IDs/yr based on
  • Profit/non-profit status
  • Size/status of organization
• Considering development of single DOI purchase capability

CrossRef/PILA
• DOIs are the only locator supported
• Annual fee based on publishing revenue
• Additional fee for each DOI assigned
• Additional fee if linkage information is not provided for most content within 18 months


Page 66: Identity, Location, and Citation at NEON


EZID vs CrossRef considerations

• Do you have mostly publications or mostly data or both?

• Do you want/need locators prior to making a decision about long-term availability?

• Are citation indexes, citation metrics, or the ability to support full-text access currently important to you?

• Which is more important to you - library or data concepts?

• What kind of metadata do you have about the things you need identifiers for?


Page 67: Identity, Location, and Citation at NEON

Get involved!

• RDA proposed Working Group on citing dynamic data.

• http://rd-alliance.org/working-groups/data-citation-wg.html

• RDA WG on identifier “types”

• https://rd-alliance.org/working-groups/pid-information-types-wg.html

• ESIP Preservation and Stewardship Committee

• http://wiki.esipfed.org/index.php/Preservation_and_Stewardship