Presentation of the Global Biodiversity Information Facility (GBIF) and GBIF Norway for the Department of Technical and Scientific Conservation (CONSERV) at the Natural History Museum, University of Oslo. Tøyen, Oslo, 7 November 2012.
Section meeting: Department of Technical and Scientific Conservation (CONSERV)
Global Biodiversity Information Facility GBIF Norway
Dag Endresen and Christian Svindseth GBIF Norway, NHM-UiO Natural History Museum, University of Oslo (NHM-UiO) Global Biodiversity Information Facility (GBIF) 7 November 2012
Topics
• What is GBIF?
• GBIF data portal
• Darwin Core (DwC), DwC archive
• Persistent identifiers (UUID)
• Data paper, citation of data sets
GBIF enables free and open access to biodiversity data online. We are an international government-initiated and funded initiative focused on making biodiversity data available to all and anyone, for scientific research, conservation and sustainable development.
Status data portal October 2012
OECD Global Science Forum recommendation (1999): “[E]stablish and support a distributed system of interlinked and interoperable modules (databases, software and networking tools, search engines, analytical algorithms, etc.) that together will form a Global Biodiversity Information Facility (GBIF)”.
1. Information infrastructure – an Internet-based index of a globally distributed network of interoperable databases that contain primary biodiversity data.
2. Community-developed tools, standards and protocols – the tools data providers need to format and share their data.
3. Capacity-building and training – and access to a global expert community.
http://data.gbif.org/
GBIF portal: 16,064,074 records with coordinates from a total of 17,268,452 records. GBIF Norway: 11,777,738 records are provided FROM Norwegian data publishers.
GBIF contributes species occurrence data to “Artskart”.
GBIF’s unique role
• Registry of biodiversity data resources
• Tools and support for biodiversity data publication
• Network development at national, regional and global levels
• Global virtual natural history collection
• Cross-domain linkage between data from collections, ecology and genomics
• Access to biodiversity data for GIS analysis and environmental monitoring
  – Aggregated presence data
  – Site-based survey data (samples, presence/absence)
Slide developed by Donald Hobern, 2012
Improving fitness-for-use
(Diagram labels: Aggregate; Data Indexes; Data Quality; Expert Curation)
• Progressive improvement
  – Data indexes
    • Centralised discovery
    • Standardisation of persistent identifiers
    • Consistent metadata
  – Data quality
    • Inconsistencies within records
    • Validation against metadata
    • Outlier detection
    • Metrics per record and per data set
  – Expert curation
    • Interface with taxon expert groups
    • Incorporate findings of data users
    • Need efficient researcher-friendly tools
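The "data quality" bullets above can be illustrated with a minimal Python sketch. It is only an assumption about what such checks might look like, not a description of GBIF's actual processing: the function names `check_record` and `dataset_metrics` are hypothetical, and the only rules applied are simple coordinate sanity checks on the Darwin Core terms `decimalLatitude` and `decimalLongitude`.

```python
def check_record(record):
    """Return a list of problems found within a single occurrence record.

    Hypothetical helper: only basic coordinate checks are applied.
    """
    problems = []
    try:
        lat = float(record.get("decimalLatitude", ""))
        lon = float(record.get("decimalLongitude", ""))
    except (TypeError, ValueError):
        return ["coordinates missing or not numeric"]
    if not -90.0 <= lat <= 90.0:
        problems.append("latitude outside -90..90")
    if not -180.0 <= lon <= 180.0:
        problems.append("longitude outside -180..180")
    if lat == 0.0 and lon == 0.0:
        problems.append("coordinates are exactly 0,0 (often a placeholder)")
    return problems


def dataset_metrics(records):
    """Summarise the per-record checks into one simple per-data-set metric."""
    total = len(records)
    flagged = sum(1 for r in records if check_record(r))
    return {
        "records": total,
        "flagged": flagged,
        "flagged_share": flagged / total if total else 0.0,
    }
```

Per-record results could be shown to the data publisher, while the per-data-set share gives a rough quality indicator for a whole resource.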
Slide developed by Donald Hobern, 2012
Organisational partnerships
• Some potential data collaborations
  – GBIF-mediated occurrence data
    • Maps, lists of countries recorded
    • Localise content in EOL, etc.
  – BHL literature
    • User annotations to extract occurrence records
    • Link original (and other) descriptions to taxonomy
  – EOL species information
    • Support EOL as global species information aggregator
    • Include EOL summary box on each GBIF species page
  – Catalogue of Life
    • IPT to publish global and regional species databases
    • GBIF infrastructure to support construction of CoL
Slide developed by Donald Hobern, 2012
Unifying species data
Integrated access for records of the occurrence of any species:
• What?
• When?
• Where?
• What evidence?
• Data owner?
• Link to full record
(Diagram labels: Presence only; Collections; Ecological Monitoring; Genomics; Darwin Core)
Slide developed by Donald Hobern, 2012
Unifying species data
Fully compatible with existing Darwin Core data, plus:
• Which species were recorded together?
• Which sets of data are directly comparable?
• Which species were most abundant in each sample?
(Diagram labels: Presence/absence; Darwin Core + Core Survey Fields: Sample Id, Method Id, Relative abundance ...)
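The three questions above can be answered from a flat table once each record carries a sample identifier, a method identifier and a relative abundance. The following Python sketch shows the idea. The key names `sample_id`, `method_id`, `species` and `relative_abundance` and all values are made up for illustration, and treating samples with the same method as "directly comparable" is an assumption.

```python
from collections import defaultdict

# Illustrative rows only; none of these values come from a real data set.
ROWS = [
    {"sample_id": "S1", "method_id": "M1", "species": "Species A", "relative_abundance": 0.6},
    {"sample_id": "S1", "method_id": "M1", "species": "Species B", "relative_abundance": 0.4},
    {"sample_id": "S2", "method_id": "M1", "species": "Species B", "relative_abundance": 0.9},
    {"sample_id": "S3", "method_id": "M2", "species": "Species C", "relative_abundance": 1.0},
]


def species_by_sample(rows):
    """Which species were recorded together? Group species per sample."""
    grouped = defaultdict(set)
    for row in rows:
        grouped[row["sample_id"]].add(row["species"])
    return dict(grouped)


def samples_by_method(rows):
    """Which sets of data are comparable? Group samples per method (assumption)."""
    grouped = defaultdict(set)
    for row in rows:
        grouped[row["method_id"]].add(row["sample_id"])
    return dict(grouped)


def most_abundant(rows):
    """Which species was most abundant in each sample?"""
    best = {}
    for row in rows:
        current = best.get(row["sample_id"])
        if current is None or row["relative_abundance"] > current["relative_abundance"]:
            best[row["sample_id"]] = row
    return {sample: row["species"] for sample, row in best.items()}
```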
Slide developed by Donald Hobern, 2012
Darwin Core – a vocabulary of terms
Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, De Giovanni R, Robertson T, and Vieglais D (2012) Darwin Core: An Evolving Community-Developed Biodiversity Data Standard. PLoS ONE 7(1): e29715. doi:10.1371/journal.pone.0029715
http://rs.tdwg.org/dwc/terms/
Semantic MediaWiki: a forum for discussion and development of terminology.
http://terms.gbif.org/
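To make "a vocabulary of terms" concrete, the sketch below writes one occurrence record as a Python dictionary keyed by Darwin Core term names and expands each key to its full term URI under http://rs.tdwg.org/dwc/terms/. The term names are real Darwin Core terms, but the record values, the `EXAMPLE` institution code and the helper `to_term_uris` are invented for illustration.

```python
DWC_NS = "http://rs.tdwg.org/dwc/terms/"

# Illustrative values only; the keys are Darwin Core term names.
RECORD = {
    "occurrenceID": "urn:uuid:c37e3f9b-bcaf-4479-8eb7-3346a2db2373",  # link to the record
    "scientificName": "Parus major Linnaeus, 1758",                   # What?
    "eventDate": "2012-11-07",                                        # When?
    "decimalLatitude": "59.92",                                       # Where?
    "decimalLongitude": "10.77",
    "basisOfRecord": "PreservedSpecimen",                             # What evidence?
    "institutionCode": "EXAMPLE",                                     # Data owner? (hypothetical code)
}


def to_term_uris(record, namespace=DWC_NS):
    """Replace short term names with fully qualified term URIs."""
    return {namespace + name: value for name, value in record.items()}
```

Because every data publisher uses the same term URIs, records from collections, monitoring and genomics sources can be indexed together.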
Darwin Core Archive (DwC-A)
• A DwC-A publishes DwC records, including terms from DwC-A extensions.
• Simple text-based format.
• Zipped single-file archive.
Germplasm.txt
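The three properties above (text-based, zipped, a single file) can be sketched with the Python standard library. This is a simplified illustration, not the output of any GBIF tool: it writes one tab-delimited core file plus a `meta.xml` descriptor that maps columns to Darwin Core terms, and it leaves out the metadata document and any extension files such as a germplasm extension. The function names and the file name `occurrence.txt` are assumptions.

```python
import csv
import io
import zipfile

DWC = "http://rs.tdwg.org/dwc/terms/"


def build_meta_xml(fields, data_file="occurrence.txt"):
    """Describe the core data file: one <field> per column, mapped to a DwC term."""
    lines = [
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">',
        '  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"',
        f'        ignoreHeaderLines="1" rowType="{DWC}Occurrence">',
        f"    <files><location>{data_file}</location></files>",
        '    <id index="0"/>',
    ]
    for index, name in enumerate(fields):
        lines.append(f'    <field index="{index}" term="{DWC}{name}"/>')
    lines += ["  </core>", "</archive>"]
    return "\n".join(lines) + "\n"


def build_archive(fields, rows):
    """Return the bytes of a zip file holding occurrence.txt and meta.xml."""
    text = io.StringIO()
    writer = csv.writer(text, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)  # header line, skipped by ignoreHeaderLines="1"
    for row in rows:
        writer.writerow([row.get(name, "") for name in fields])

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("occurrence.txt", text.getvalue().encode("utf-8"))
        archive.writestr("meta.xml", build_meta_xml(fields).encode("utf-8"))
    return buffer.getvalue()
```

Calling `build_archive(["occurrenceID", "scientificName"], rows)` returns bytes that can be written to a `.zip` file. An extension would add a further data file and a matching `<extension>` element in `meta.xml`.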
Darwin Core Archive extensions
• Global Names Architecture (GNA)
• Audubon Core (multimedia)
• Invasive species (GISIN)
• Genetic Resources (Germplasm)
• EOL species profile
• Taxonomic Concept Schema (TCS)
• Genomics Standards Consortium (GSC)
• Meta-genomics (?)
• ABCD (?)
• …
Controlled value vocabularies
• Country codes
• Language
• Basis of record
• Taxonomic rank
• Nomenclatural status
• Life form
• Life stage
• Geological time periods
  – chronostratigraphy
  – magnetostratigraphy
• Species interactions
  – saproxylic interactions
  – pollinators
• …
• Persistent identifiers (UUID, QR code)
• Data set metadata descriptions (data paper)
• Data rescue, scientific reports and student work
• Continue digitization efforts
• Biodiversity literature (BHL)
• Persistent Identifier (PID)
• Globally Unique Identifier (GUID)
• Uniform Resource Identifier (URI)
• Persistent Uniform Resource Locator (PURL)
• Digital Object Identifier (DOI)
• Handle system (Handle)
• Life Science Identifier (LSID)
• Archival Resource Key (ARK)
• Universally Unique Identifier (UUID)
• Scalability, number of IDs
• Community acceptance
• Long-term life-cycle
• Resolvable, resolution service(s)
• Cost per identifier
• People-friendly or machine-friendly
• Generation of IDs
  – Central generation, PID issuer
  – Distributed generation at source
• A UUID is a 16-octet (128-bit) number.
• Example: C37E3F9B-BCAF-4479-8EB7-3346A2DB2373
• The probability of one duplicate would be about 50% if every person on earth owns 600 million UUIDs.
• Allows for easy generation at source in a distributed network.
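Generation at source needs no central issuer. As a minimal sketch, Python's standard `uuid` module can parse the example above and mint new random (version 4) identifiers locally. The duplicate-probability estimate uses the standard birthday approximation. The helper names, the default of 122 random bits (the number in a version 4 UUID) and any population figure passed in are assumptions, not values taken from the slide.

```python
import math
import uuid

# The example identifier from the slide, parsed into a UUID object.
EXAMPLE = uuid.UUID("C37E3F9B-BCAF-4479-8EB7-3346A2DB2373")


def new_identifier():
    """Mint a random (version 4) UUID locally, with no central issuer."""
    return str(uuid.uuid4())


def collision_probability(n_ids, random_bits=122):
    """Birthday approximation: chance of at least one duplicate among n_ids.

    p is approximately 1 - exp(-n(n-1) / (2 * 2**random_bits)).
    """
    space = 2 ** random_bits
    return -math.expm1(-n_ids * (n_ids - 1) / (2 * space))
```

`EXAMPLE.bytes` is 16 octets long and `EXAMPLE.urn` gives a URI form that could be encoded on a label. The estimate from `collision_probability` depends strongly on the assumed number of people and random bits, so the figure on the slide is best read as an order-of-magnitude statement.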
• Quick Response Code (QR code).
• A type of matrix barcode (or two-dimensional code).
• Popular due to its fast readability and large storage capacity.
• The use of QR Codes is free of any license.
• The QR Code is clearly defined and published as an ISO standard.
• Invented in Japan by the Toyota subsidiary Denso Wave in 1994.
QR codes for all museum objects at NHM-UiO would provide:
• Labels that are machine-readable with an ordinary smartphone (or PDA).
• New and efficient workflows for collection management.
• Deployment of stable identifiers appropriate for databasing.
UUID: C37E3F9B-BCAF-4479-8EB7-3346A2DB2373
• Peer review option for biodiversity data.
• Authors get credit for data publication.
• Meeting concerns over data quality.
• Meeting concerns over data citation mechanism.
• Metadata formats: Ecological Metadata Language (EML), Dublin Core, Darwin Core, Natural Collections Descriptions (NCD)…
• Towards → Each data set published through GBIF accompanied by a data paper…?
Data rescue activity: Many species occurrence data are “hidden” in reports and documents produced by universities, research institutes, public agencies and the university museums. Collaboration project with Artsdatabanken
Photo by: Niklas Bildhauer
270 years of literature since Carl Linnaeus and his Systema Naturae (1735), and a potential source of biodiversity data.
Biodiversity Heritage Library: a consortium of natural history and botanical libraries. http://www.biodiversitylibrary.org/
→ BHL Norway…?
Photo by: Dvortygirl
A book scanner at the Internet Archive headquarters in San Francisco, California
The Millennium Ecosystem Assessment showed that human actions often lead to irreversible losses in the diversity of life, and these losses have been more rapid in the past 50 years than ever before in human history. Biological diversity is key to resilience – the ability of natural and social systems to adapt to change, and is essential for nearly every aspect of human well-being. Because human threats to biodiversity occur across large spatial and temporal scales, biodiversity and ecosystem monitoring, forecasting, and risk assessments require data to be organised in a globally-accessible, integrated infrastructure. GBIF’s Data Portal provides this infrastructure.
Furthermore, I think that we need persistent identifiers!
Cato the Elder ended all his speeches in the senate of Rome with: "Ceterum autem censeo Carthaginem esse delendam" (English: "Furthermore, I think Carthage must be destroyed").