Globally Unique Identifiers in Biodiversity Informatics
Kevin Richards, Landcare Research NZ
TDWG 2008
Introduction
• GUID (Globally Unique IDentifier)
– What, Why, Which, How
– LSIDs
– Issues
What are GUIDs
Globally Unique IDentifier
• A short name for a complex entity on the web
• Each name identifies only one entity
• Examples:
– UUID e.g. 3E9D6B68-A08C-4F15-BC8A-1265F15D30E2
– DOI e.g. doi:10.1006/jmbi.1998.2354
– Handle e.g. hdl:123.456/abc
– LSID e.g. urn:lsid:indexfungorum.org:names:213645
– PURL e.g. http://purl.oclc.org/abc/123
What is a GUID
• Properties
– Persistent
– Opaque
– Resolvable, sometimes - useful for locating information about the entity
Why use GUIDs
Without shared IDs, a consumer cannot tell that two providers hold the same book:
• Data at Provider 1: BOOK “The three little pigs”, 3 copies
• Data at Provider 2: BOOK “Three little pigs”, 2 copies
• Data Consumer: BOOKS “Three little pigs” … (2), “The three little pigs” … (3)
… but with GUIDs …
• Data at Provider 1 (ID = P1): BOOK “The three little pigs”, ID (e.g. ISBN) = A123, 3 copies
• Data at Provider 2 (ID = P2): BOOK “Three little pigs”, ID (e.g. ISBN) = A123, 2 copies
• Data Consumer: BOOKS ID A123 “The three little pigs” … (5)
• BOOK Titles:
– ID A123 : Provider P1 : “The three little pigs”
– ID A123 : Provider P2 : “Three little pigs”
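The book example above can be sketched in a few lines of Python. This is only an illustration: the record layout and the function name merge_by_id are hypothetical, and the values are the ones shown on the slide.

```python
from collections import defaultdict

# Hypothetical records mirroring the slide: (provider, book ID, title, copies).
records = [
    ("P1", "A123", "The three little pigs", 3),
    ("P2", "A123", "Three little pigs", 2),
]


def merge_by_id(rows):
    """Group rows by their shared ID, summing copies and keeping each provider's title."""
    merged = defaultdict(lambda: {"copies": 0, "titles": {}})
    for provider, book_id, title, copies in rows:
        entry = merged[book_id]
        entry["copies"] += copies
        entry["titles"][provider] = title
    return dict(merged)


catalogue = merge_by_id(records)
```

Because both providers use the ID A123, the consumer ends up with one entry holding five copies, while the differing title spellings remain attributed to their providers.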
Example in our domain
ConsensusId : urn:lsid:compositae.org:names:45240C9B-D419-4B6F-93A5-D0A6DEAB4C81
Name : Anthemis gaudium-solis Velen.
Provider   Id                                       Taxon Name
IPNI       urn:lsid:ipni.org:names:177325-1:1.1     Anthemis gaudium-solis Vel.
Tropicos   50163035                                 Anthemis goudium-solis Velen.
Euro+Med   133202                                   Anthemis gaudium-solis Velen.
Govaerts   {29FFBEDC-19F5-4899-BCB3-05EE2C7816C8}   Anthemis gaudiumsolis Velen.
GUIDs are vital to TDWG architecture
Which GUID
• GUID Subgroup Recommendations:
• Use LSIDs for identifying biodiversity data
• Reuse GUIDs where they already exist
– GUID type
– Existing assignments
• See GUID Report - http://wiki.gbif.org/guidwiki/wikka.php?wakka=GUID2Report&show_comments=1
• Also Canberra LSID Workshop report - http://www.tdwg.org/fileadmin/subgroups/guid/LSID_policy_workshop_Report_Canberra.pdf
What is an LSID?
• Life Science IDentifier
• Developed by The Object Management Group & W3C
• Implemented by the team at IBM
• Used for data objects, datasets, images, files
LSID Format
urn:lsid:bioguid.org:taxon:1122:v1
• Prefix (urn) - indicates that this is a URN
• URN type (lsid) - indicates that it’s an LSID-type URN
• Authority (bioguid.org) - the authority who issued the LSID
• Namespace (taxon) - internal to that authority
• Object identifier (1122) - within that authority
• Version (v1) - optional
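The parts listed above can be pulled apart with a small parser. The following is a minimal sketch, not a reference implementation: the names Lsid and parse_lsid are hypothetical, and it assumes that no part contains a colon.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    version: Optional[str]


def parse_lsid(text: str) -> Lsid:
    """Split an LSID into authority, namespace, object identifier and optional version."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts: {text!r}")
    prefix, urn_type, authority, namespace, object_id = parts[:5]
    if prefix.lower() != "urn" or urn_type.lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if not (authority and namespace and object_id):
        raise ValueError(f"empty component in: {text!r}")
    version = parts[5] if len(parts) == 6 else None
    return Lsid(authority, namespace, object_id, version)


example = parse_lsid("urn:lsid:bioguid.org:taxon:1122:v1")
```

For the slide's example, this yields the authority bioguid.org, the namespace taxon, the object identifier 1122 and the version v1. An LSID without a version, such as urn:lsid:indexfungorum.org:names:213645, parses with version set to None.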
LSID Rules
• Data doesn’t change (byte identical)
• Always available for resolution
– Hand over to another authority if necessary
• At least some basic metadata
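One way a provider might guard the "byte identical" rule is to store a digest when the LSID is issued and compare against it later. This is a sketch under assumptions: the function names are hypothetical, and the choice of SHA-256 is ours, not something the LSID rules prescribe.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest of the bytes served for an LSID."""
    return hashlib.sha256(data).hexdigest()


def has_changed(stored_digest: str, current_data: bytes) -> bool:
    """True if the data now served differs from what was fingerprinted at issue time."""
    return fingerprint(current_data) != stored_digest


original = b"Anthemis gaudium-solis Velen."
digest_at_issue = fingerprint(original)
```

If has_changed reports a difference, the data no longer matches what the LSID originally identified, which is the situation the rule forbids.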
Pros of LSIDs
• Not tied to physical addresses (as URLs are)
• Comparison can be done without resolving the ID
– e.g. for cases like “does object a = object b”
• Do not require any central registration or central service
• Quick to adopt
• Encourage thought and planning before they are allocated
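The "does object a = object b" check needs only string handling. The sketch below is illustrative and rests on one assumption that should be checked against the LSID specification: that the urn and lsid prefixes and the authority compare case-insensitively, while the remaining parts are case-sensitive. The function names are hypothetical.

```python
def normalise_lsid(lsid: str) -> str:
    """Lower-case the parts assumed to be case-insensitive; leave the rest untouched."""
    parts = lsid.strip().split(":")
    if len(parts) < 5:
        raise ValueError(f"not a complete LSID: {lsid!r}")
    # Assumption: 'urn', 'lsid' and the authority are case-insensitive.
    head = [part.lower() for part in parts[:3]]
    return ":".join(head + parts[3:])


def same_object(a: str, b: str) -> bool:
    """Compare two LSIDs without resolving either of them."""
    return normalise_lsid(a) == normalise_lsid(b)
```

No network access is involved, so the comparison works even when the issuing authority is offline.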
Cons of LSIDs
However …
• Requires DNS SRV record
• Requires specialised software to resolve an LSID (not built in to most software)
• The restriction “LSID data cannot change” can be difficult
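To make the DNS SRV requirement concrete, the sketch below derives the record name a resolver would look up for an LSID's authority. It assumes the _lsid._tcp service label used for LSID authority discovery. The function name is hypothetical, and the DNS query itself is omitted because Python's standard library has no SRV lookup.

```python
def srv_query_name(lsid: str) -> str:
    """Build the DNS SRV record name for the authority part of an LSID."""
    parts = lsid.strip().split(":")
    if len(parts) < 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    authority = parts[2].lower()
    # Assumption: the '_lsid._tcp' service label is used for authority discovery.
    return f"_lsid._tcp.{authority}"


name = srv_query_name("urn:lsid:indexfungorum.org:names:213645")
```

The authority's DNS administrator has to publish a record under this name, which is why the requirement appears as a cost of adopting LSIDs.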
How
• Decide what data/objects to apply IDs to
• Decide on
– Authority
– Namespace
– Local IDs (new vs existing)
• Issue LSIDs
• Set up a resolver
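The "decide, then issue" steps above might look like the following sketch. The authority example.org, the namespace names and the function issue_lsid are all hypothetical. Existing local IDs are reused where available, and a UUID is generated otherwise.

```python
import uuid
from typing import Optional

AUTHORITY = "example.org"  # hypothetical authority (a domain you control)
NAMESPACE = "names"        # hypothetical namespace within that authority


def issue_lsid(local_id: Optional[str] = None, version: Optional[str] = None) -> str:
    """Build an LSID from an existing local ID, or mint a new UUID-based one."""
    object_id = local_id if local_id is not None else str(uuid.uuid4()).upper()
    if not object_id or ":" in object_id or any(ch.isspace() for ch in object_id):
        raise ValueError(f"unsuitable object identifier: {object_id!r}")
    lsid = f"urn:lsid:{AUTHORITY}:{NAMESPACE}:{object_id}"
    if version is not None:
        lsid += f":{version}"
    return lsid


reused = issue_lsid("213645")
minted = issue_lsid()
```

Reusing an existing local ID keeps the LSID traceable to the source database, while a UUID keeps it opaque. Which is preferable is part of the planning the slide refers to.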
LSID Code
• Current Code Stacks
– Open Source (sourceforge.net)
– Java, C++, Perl (IBM)
– Microsoft .NET (myself)
– TAPIR LSID configuration
LSID Tools
• IBM LSID Launchpad
• Firefox LSID Browser
• LSID Tester (Rod Page)
• Web based resolver - http://lsid.tdwg.org/
– http://lsid.tdwg.org/urn:lsid... to get LSID metadata
– http://lsid.tdwg.org/summary/urn:lsid... to get summary info of LSID object
• Example LSID servers:
– Index Fungorum - urn:lsid:indexfungorum.org:names:213649
– IPNI - urn:lsid:ipni.org:names:30000959-2:1.1.2.1
– uBio - urn:lsid:ubio.org:namebank:11815
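The two URL patterns of the web based resolver can be built by simple concatenation. This sketch only constructs the URLs as described on the slide. The function names are hypothetical, and it makes no request, since the service may not be available in the same form today.

```python
RESOLVER = "http://lsid.tdwg.org/"  # web based resolver named on the slide


def metadata_url(lsid: str) -> str:
    """URL that returns the LSID metadata via the web resolver."""
    return RESOLVER + lsid


def summary_url(lsid: str) -> str:
    """URL that returns summary info for the LSID object via the web resolver."""
    return RESOLVER + "summary/" + lsid


ubio = "urn:lsid:ubio.org:namebank:11815"
urls = (metadata_url(ubio), summary_url(ubio))
```

A proxy of this kind lets ordinary HTTP clients reach LSID data without LSID-specific software.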
Issues to think about
• Who assigns new LSIDs?
• Who maintains LSID resolvers?
• What to assign LSIDs to:
– Physical or Digital
– Granularity
– Only objects that need to be resolved / identified externally
– Is there any data, or only metadata?
Issues to think about (continued)
• When to resolve LSIDs
– Every time an LSID is encountered, or only when a client requests it?
• TDWG standards for metadata
– Which ones?
– Consistent application
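For the "when to resolve" question above, one option is to resolve only when a client asks, and to cache the result. The sketch below assumes a caller-supplied fetch function (hypothetical) that performs the actual resolution. The class and method names are illustrative.

```python
from typing import Callable, Dict


class LazyResolver:
    """Resolve an LSID only on request and remember the result."""

    def __init__(self, fetch: Callable[[str], str]):
        self._fetch = fetch            # hypothetical function doing the real resolution
        self._cache: Dict[str, str] = {}
        self.fetch_count = 0

    def resolve(self, lsid: str) -> str:
        """Return cached metadata if present; otherwise fetch once and cache it."""
        if lsid not in self._cache:
            self._cache[lsid] = self._fetch(lsid)
            self.fetch_count += 1
        return self._cache[lsid]


# Stand-in fetch function for illustration only; it performs no network access.
resolver = LazyResolver(lambda lsid: f"metadata for {lsid}")
first = resolver.resolve("urn:lsid:ubio.org:namebank:11815")
second = resolver.resolve("urn:lsid:ubio.org:namebank:11815")
```

Caching suits the rule that LSID data does not change, although metadata may need refreshing. Resolving every LSID as soon as it is encountered is the alternative, at the cost of more traffic to the resolvers.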
References
• LSID Source Forge - http://lsids.sourceforge.net/
• LSID .NET Source Forge - http://sourceforge.net/projects/lsid-dotnet
• LSID Tutorial - http://www-128.ibm.com/developerworks/opensource/library/os-lsid/
• LSID Specification - http://www.omg.org/cgi-bin/doc?dtc/04-05-01
• LSID Tester - http://linnaeus.zoology.gla.ac.uk/~rpage/lsid/tester/
• LSID Launchpad - http://www-124.ibm.com/developerworks/downloads/detail.php?group_id=124&what=rele&id=553
• GUID Subgroup - http://www.tdwg.org/activities/guid/
• GUID Subgroup Reports
– http://wiki.gbif.org/guidwiki/wikka.php?wakka=GUID2Report&show_comments=1
– http://wiki.tdwg.org/twiki/pub/TIP/TipDocuments/GUID1Report.pdf
• Firefox LSID developer site - http://lsid.mozdev.org/