Data Citation Principles Workshop
May 16 – 17, 2011, IQSS at Harvard University

Deep Data Citation Mechanism and Service for Scientific Data: Defining Framework for Biodiversity Data Publishers

Vishwas Chavan
Global Biodiversity Information Facility (GBIF) Secretariat

May 16, 2011

Outlines

1. About GBIF

2. GBIF Data Publishing Framework

3. Data Citation

4. Data Citation formulations

5. Waterfall Model of Data Citation

About GBIF

GBIF: Vision and Mission

Vision: A world in which biodiversity information is freely and universally available for science, society, and a sustainable future

Mission: To be the foremost global resource for biodiversity information, and engender smart solutions for environmental and human well-being

GBIF Country Participants 2011

Voting Participants: 33
Associate Participants: 23
Associate Participating Organisations: 46

Last updated: 2011-04-11

Growth in Data Records

[Chart: millions of primary biodiversity records; y-axis ticks run from 80 to 290]

Last updated: 2011-04-12

Data Publishing Framework

Why should I publish data?

• Recognition
• Opportunities
• Investment

Data Publishing Framework

• Cultural change towards ‘free and open access’ to biodiversity data
• Addresses social, technical, and policy concerns
• Answers ‘What is there for me?’ for ALL

The Data Publishing Framework is defined as an environment conducive to enabling free and open access to the world’s primary biodiversity data. The core purpose of the framework is to overcome socio-political, technical-infrastructural, policy-political, legal and economic investment barriers or impediments affecting the discovery and publishing of data.

[Diagram: components of the Data Publishing Framework]

• Infrastructure and Technical
• Policy and Political
• Legal and Economic
• Socio-Cultural

Chavan and Ingwersen (2009), BMC Bioinformatics, 10 (Suppl. 14): S2

DPF: Core Technical Components

• Persistent Identifiers
• Data Citation Mechanisms
• Data Usage Index

Chavan and Ingwersen (2009), BMC Bioinformatics, 10 (Suppl. 14): S2

Data Citation

Data Citations today: Example

Source: GBIF Data Portal, data.gbif.net
Search string: Panthera tigris
Search results: 696 records, from 37 datasets, published by 31 Data Publishers
Date: Thursday, 4 November 2010, Time: 10.03.30

Existing Data Citation style

Please cite this data as follows:
(accessed through GBIF data portal, Mammal specimens, http://data.gbif.org/datasets/resource/559)
(accessed through GBIF data portal, Vertebrate specimens, http://data.gbif.org/datasets/resource/541)
(accessed through GBIF data portal, Natural History Museum Rotterdam, http://data.gbif.org/datasets/resource/693)
(accessed through GBIF data portal, Database Schema for UC Davis Wildlife museum, http://data.gbif.org/datasets/resource/736)
(accessed through GBIF data portal, UNSM Vertebrate Specimens, http://data.gbif.org/datasets/resource/812)
......

Un-answered facts

• What was the search string?
• How many records were retrieved?
• How many Data Publishers contributed to the data?
• When was the search carried out?
• Who is the original contributor of the data?
• Who played what role from collection to publishing?
• How can I retrieve the same result?

Data Citation: What is needed?

Deep data citation mechanism
• Recognise ALL with their roles
• Multilayer citation: producer, publisher, aggregator, curator
• Cascading citations: citations within citations

Data Citation Service
• Resolve citation any time
• Discover the underlying data

Data Citation: Challenges

• Dealing with dynamic streaming data?
• Resolving to human or machine interpretable description of object?
• Need for registry of name spaces?
• Can metadata standards support multiple GUIDs?
• Failure to enforce data citation as mandatory step in publishing cycle

Waterfall model of data citation

Data Citation formulations

Types of Publishers
• Publisher (individual)
• Publisher (group of individuals)
• Institution or Research Group or Consortium

Release / Update frequency
• One time release
• Frequent updates

Data Citation formulations.....

Publisher (individual) one time data release

Publisher (YEAR), <Title of the data resource>, <total nos. of records>, published <modes of publishing>, <Primary access point>, released on <release date>, <Persistent Identifier>.

Rumble KJ (1998). Cephalopods of North America. 10023 records, published online, http://www.rumblejk.org/CephNA/, released on 31/12/1998, doi:10.4000/iisc.0.00.36.
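The template above can be read as a simple record-to-string mapping. The following is a minimal sketch of that idea, not part of the original slides: the class name `OneTimeRelease`, its field names, and the function `format_citation` are hypothetical, and the punctuation follows the Rumble example rather than any formal style guide.

```python
from dataclasses import dataclass


@dataclass
class OneTimeRelease:
    """Fields of the 'Publisher (individual), one time data release' template.

    All names here are illustrative assumptions, not a published schema.
    """
    publisher: str
    year: int
    title: str
    record_count: int
    mode: str           # mode of publishing, e.g. "online"
    access_point: str   # primary access point (URL)
    release_date: str   # kept as dd/mm/yyyy text, as on the slide
    persistent_id: str


def format_citation(r: OneTimeRelease) -> str:
    # Punctuation mirrors the worked example on the slide.
    return (
        f"{r.publisher} ({r.year}). {r.title}. {r.record_count} records, "
        f"published {r.mode}, {r.access_point}, released on {r.release_date}, "
        f"{r.persistent_id}."
    )


rumble = OneTimeRelease(
    publisher="Rumble KJ",
    year=1998,
    title="Cephalopods of North America",
    record_count=10023,
    mode="online",
    access_point="http://www.rumblejk.org/CephNA/",
    release_date="31/12/1998",
    persistent_id="doi:10.4000/iisc.0.00.36",
)
citation = format_citation(rumble)
print(citation)
```

Keeping the fields separate from the rendered string means the same record could later be rendered in a different citation style without re-entering the data.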

Data Citation formulations.....

Publisher (group of individuals) frequent updates

Publisher 1, ..... and Publisher n (YEAR). <Title of the data resource>, <total nos. of records>, published <modes of publishing>, <Primary access point>, first released on <release date>, <current version no. or last updated/released on (date)>, <Persistent Identifier>.

Remsen D, Bello J, Sheldon S, Raymond M, and AJK Arino (2005 -). Fishes of the Cape Cod Region, MA, USA. 70089 records, published online, http://www.remsen.net/capecodfishes/, first released on 17/05/2005, last updated on 10/10/2010, doi: 11.3389/mbl.1.11.131.

Data Citation formulations.....

Institute/Research Group/Consortium – frequent updates

<Publisher as Institution / Research Group / Consortium> <YEAR (Year first published / released -)>, <Title of the data resource>, <total nos. of records>, <Contributed by contributor 1 (role), contributor 2 (role)..... contributor n (role)>, <published (modes of publishing)>, <Primary access point>, <Version no., or last updated/released on (date)>, <Persistent Identifier>.

Smithsonian National Museum of Natural History (2002 -), Museum Collection Records: Mammals. 579257 records. Contributed by Helgen KM (Principal Investigator, curator, author), Gordon LK (manager, author, curator), Peurach SC (author, manager), Potter CW (manager, author), Carleton MD (curator), Maldonado JE (author, developer), Wilson DE (curator, author), Thorington Jr RW (curator, author, validator), Ludwig CA (manager, developer, author), Lunde DP (author). Published online, http://collections.nmnh.si.edu/search/mammals/, first released on 12/02/2002, last updated on 15/09/2010, doi:17.3377/smi.8.57.965.
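The institutional formulation adds one structural element to the simpler templates: a list of contributors, each with one or more roles. A minimal sketch of how that list might be modelled follows. The names `Contributor`, `format_contributors`, and `format_institutional_citation` are hypothetical, and the example uses only two of the ten Smithsonian contributors to stay short, so its output is deliberately shorter than the citation on the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Contributor:
    name: str
    roles: List[str]  # e.g. ["curator", "author"]


def format_contributors(contributors: List[Contributor]) -> str:
    # Renders "Name (role, role), Name (role)" as in the slide's example.
    return ", ".join(f"{c.name} ({', '.join(c.roles)})" for c in contributors)


def format_institutional_citation(publisher, first_year, title, record_count,
                                  contributors, access_point,
                                  first_released, last_updated, persistent_id):
    # Layout follows the Smithsonian example; it is an illustration only.
    return (
        f"{publisher} ({first_year} -), {title}. {record_count} records. "
        f"Contributed by {format_contributors(contributors)}. "
        f"Published online, {access_point}, first released on {first_released}, "
        f"last updated on {last_updated}, {persistent_id}."
    )


# Two of the contributors named on the slide, chosen for brevity.
contributors = [
    Contributor("Helgen KM", ["Principal Investigator", "curator", "author"]),
    Contributor("Carleton MD", ["curator"]),
]
nmnh = format_institutional_citation(
    "Smithsonian National Museum of Natural History", 2002,
    "Museum Collection Records: Mammals", 579257, contributors,
    "http://collections.nmnh.si.edu/search/mammals/",
    "12/02/2002", "15/09/2010", "doi:17.3377/smi.8.57.965",
)
print(nmnh)
```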

Waterfall model for Data Citation

• Cascading citations
• Citations within citations
• Recognising roles in data management life cycle
• Three types of citations
  • Publisher determined citations
  • User driven citations
  • Composite citations
• Use of Persistent Identifiers (PI)
  • Persistent Identifiers for each citation type
  • Support multiple types of PIs: Handles, ARK, PURL, URN, LSID, DOI etc.
• Data Citation service
  • Registration service: assign PI to citations
  • Resolver service: resolve citations from PIs
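Supporting several identifier schemes at once implies that a citation service must first recognise which scheme a given string belongs to. The sketch below shows one possible way to do that. The function name `classify_pi` and its prefix rules are assumptions, modelled only on the illustrative identifiers used elsewhere in this deck. They are not a validated parser for any of these schemes.

```python
def classify_pi(identifier: str) -> str:
    """Guess the persistent identifier scheme from the string's shape.

    The rules are simple heuristics based on the deck's example
    identifiers and will misclassify real-world edge cases.
    """
    s = identifier.strip().lower()
    if s.startswith("doi:"):
        return "DOI"
    if s.startswith("urn:lsid:"):
        return "LSID"      # checked before the generic URN rule
    if s.startswith("urn:"):
        return "URN"
    if "ark:/" in s:
        return "ARK"
    if s.startswith(("http://hdl.", "https://hdl.")):
        return "Handle"
    if s.startswith(("http://purl.", "https://purl.")):
        return "PURL"
    return "unknown"


examples = [
    "doi:09.1111/lsu.9.11.559",
    "urn:lsid:msu.org:observation:541",
    "http://nhmr.nl/ark:/1205/693xz693",
    "http://hdl.loc.gov/ucd/736",
    "http://purl.unsm.org/unsm/812",
    "urn:gnhm:0-486-1047",
]
schemes = [classify_pi(e) for e in examples]
print(list(zip(examples, schemes)))
```

The LSID rule is tested before the generic URN rule because every LSID is also a URN; reversing the order would hide the more specific scheme.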

Waterfall model for Data Citation: exemplification

User Search

Source: GBIF Data Portal, data.gbif.net
Search string: Panthera tigris
Search results: 696 records, from 37 datasets, published by 31 Data Publishers
Date: Thursday, 4 November 2010, Time: 10.03.30

User driven citation using Persistent Identifier

http://data.gbif.net (2010). user doi:09.1111/gbif.9.11.444.

• resolve to full composite citation and/or
• snapshot of resultant data can be accessed

Medium sized composite citation

http://data.gbif.net (2010). Search string: Panthera tigris, 696 records, contributed by 37 data resources, user doi: 09.1111/gbif.9.11.444, accessed on 04/11/2010, 10:03:30. (data resources: doi: 09.1111/lsu.9.11.559, urn:lsid:msu.org:observation:541, http://nhmr.nl/ark:/1205/693xz693, http://hdl.loc.gov/ucd/736, http://purl.unsm.org/unsm/812, urn:gnhm:0-486-1047, ......).

• resolve to full composite citation including detailed publisher determined citation in cascading manner

Waterfall model for Data Citation: exemplification

Full length composite citation

User driven citation

http://data.gbif.net (2010). Search string: Panthera tigris, 696 records, contributed by 37 data resources, user doi: 09.1111/gbif.9.11.444, accessed on 04/11/2010, 10:03:30.

[Institutional dataset, onetime release, doi]
1. Louisiana State University (2007), Museum of Natural Science: Collection of Mammal, 36000 records. Contributed by Patterson DN (Principal Investigator, architect, author), Sandeep PK (author, curator), Fieldman LN (author, developer), Remsen D (curator, validator), published online http://www.museum.lsu.edu/MNS/mammcoll.hml, released on October 2007, doi:09.1111/lsu.9.11.559.

[Institutional dataset, frequent update, lsid]
2. Michigan State University (2001 -), MSU Vertebrate Collection, 76523 records. Contributed by Cook DK (Principal Investigator, author, curator, validator), Hirsh L (author, architect, developer), Lane MP (manager, author, curator)............, Morris JH (curator), published online http://musuem.msu.edu/ResearchandCollections/DVNH, first released on 01/10/2001, last updated on 18/01/2010, urn:lsid:msu.org:observation:541.

[Multiple authors, frequent update, ARK]
3. Cursada PK, Bello J, and AJK Moelicker (2006), Natural History Museum Rotterdam: Mammal collection, 1123 records, published online, http://www.nlbif.nl/nhmr_mc/, released on 7 July 2006, http://nhmr.nl/ark:/1205/693xz693.

......

[Single author, frequent update, handle]
37. Rumble KJ (1998 -). Vertebrate collection of Rumble 1960-1999. 786 records, published online, http://www.sbnature.org/rumble_collection/, first released on 13/09/1998, last updated on 27/01/2010, http://hdl.oclc.gov/sbnature/5678.
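The three levels of the waterfall (short user driven citation, medium sized composite, full length composite) can all be derived from one underlying record. The sketch below illustrates that relationship under stated assumptions: the class names `DatasetCitation` and `SearchCitation` and their methods are hypothetical, and only two of the 37 data resources are included, so the rendered counts differ from the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetCitation:
    text: str           # full publisher determined citation
    persistent_id: str  # identifier that resolves to that citation


@dataclass
class SearchCitation:
    portal: str
    year: int
    search_string: str
    record_count: int
    user_pi: str
    accessed: str
    datasets: List[DatasetCitation]

    def short(self) -> str:
        # Level 1: user driven citation carrying only the user PI.
        return f"{self.portal} ({self.year}). user {self.user_pi}."

    def _head(self) -> str:
        return (
            f"{self.portal} ({self.year}). Search string: {self.search_string}, "
            f"{self.record_count} records, contributed by {len(self.datasets)} "
            f"data resources, user {self.user_pi}, accessed on {self.accessed}."
        )

    def medium(self) -> str:
        # Level 2: head plus the PIs of the contributing data resources.
        ids = ", ".join(d.persistent_id for d in self.datasets)
        return f"{self._head()} (data resources: {ids})."

    def full(self) -> str:
        # Level 3: head plus every publisher determined citation, numbered.
        lines = [self._head()]
        lines += [f"{i}. {d.text}" for i, d in enumerate(self.datasets, start=1)]
        return "\n".join(lines)


search = SearchCitation(
    portal="http://data.gbif.net", year=2010, search_string="Panthera tigris",
    record_count=696, user_pi="doi:09.1111/gbif.9.11.444",
    accessed="04/11/2010, 10:03:30",
    datasets=[
        DatasetCitation(
            "Cursada PK, Bello J, and AJK Moelicker (2006), Natural History "
            "Museum Rotterdam: Mammal collection, 1123 records, published "
            "online, http://www.nlbif.nl/nhmr_mc/, released on 7 July 2006, "
            "http://nhmr.nl/ark:/1205/693xz693.",
            "http://nhmr.nl/ark:/1205/693xz693",
        ),
        DatasetCitation(
            "Rumble KJ (1998 -). Vertebrate collection of Rumble 1960-1999. "
            "786 records, published online, "
            "http://www.sbnature.org/rumble_collection/, first released on "
            "13/09/1998, last updated on 27/01/2010, "
            "http://hdl.oclc.gov/sbnature/5678.",
            "http://hdl.oclc.gov/sbnature/5678",
        ),
    ],
)
print(search.short())
print(search.medium())
print(search.full())
```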

Waterfall model for Data Citation: How will this happen?

Publisher determined citations
• Detailed citation as part of metadata document, and/or
• Register citation at ‘Citation Service’
• Persistent Identifier is assigned to metadata document and/or citations alone

User driven citations
• Search data through Publisher access point
  • Single dataset: search result together with Publisher determined citation
  • Multiple datasets: search result together with all datasets
• User writes ‘user driven’ string of citation
• User registers citation with ‘Citation Service’
• User archives snapshot of search linked to ‘user driven citation’
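The register-then-resolve steps listed above can be sketched as a small in-memory service. Everything here is a hypothetical illustration: the `CitationService` class, its `register` and `resolve` methods, and the `example-pi` identifier prefix are assumptions, and a real service would mint identifiers through an actual registration agency and store snapshots durably.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class RegisteredCitation:
    citation: str
    snapshot: Optional[Tuple[str, ...]]  # archived record ids, if any


class CitationService:
    """Toy registration and resolver service (in-memory, illustrative only)."""

    def __init__(self, prefix: str = "example-pi") -> None:
        self._prefix = prefix
        self._store: Dict[str, RegisteredCitation] = {}

    def register(self, citation: str,
                 snapshot: Optional[Iterable[str]] = None) -> str:
        # Registration service: assign a new PI to the citation.
        pi = f"{self._prefix}:{len(self._store) + 1}"
        frozen = tuple(snapshot) if snapshot is not None else None
        self._store[pi] = RegisteredCitation(citation, frozen)
        return pi

    def resolve(self, pi: str) -> RegisteredCitation:
        # Resolver service: return the citation (and snapshot) for a PI.
        return self._store[pi]


service = CitationService()
user_citation = (
    "http://data.gbif.net (2010). Search string: Panthera tigris, 696 records, "
    "contributed by 37 data resources, accessed on 04/11/2010, 10:03:30."
)
# The record ids below are made-up placeholders standing in for a snapshot.
pi = service.register(user_citation, snapshot=["record-1", "record-2"])
resolved = service.resolve(pi)
print(pi, resolved.citation)
```

Storing the snapshot alongside the citation is what would let a later reader retrieve the same result, even if the portal's data have changed since the search.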

Process for Publisher determined citations

[Diagram labels: Database Administrators; Citation Service(s); Persistent Identifier; Metadata Catalogue(s); Discovery Platforms; Aggregators; Networks; Information Systems]

Process for User driven citations

[Diagram labels: User; access data; Finalise dataset for use; Archive dataset used; Use ‘Citation Service’; Citation Service returns PI]

Wish List for Data Citation

• Best practice guide for data citation
• Persistent identifiers to datasets
• Credit to all players from data producers to publishers, aggregators etc.
• All levels of granularity and combinations
• With or without annotations
• Link between traditional literature and data
• Coordinated citation support for ALL
• Research metrics for datasets

Impact of Data Citation

Data Citation

Data Use

Data Discovery

Data Preservation

Data Publishing

Data Publishing = Scholarly Publishing!

Email: vchavan@gbif.org