Presentation of the Global Biodiversity Information Facility (GBIF), GBIF-Norway and the Norwegian Biodiversity Information Centre (NBIC, Artsdatabanken) at the Norwegian Forest and Landscape Institute (Skog og Landskap) at Ås, outside Oslo, on 17 October 2013. The seminar was held together with NBIC.
Seminar at the Norwegian Forest and Landscape Institute
Global Biodiversity Information Facility (GBIF)
A global infrastructure for publishing biodiversity data
Dag Endresen and Christian Svindseth
GBIF Norway, Natural History Museum, University of Oslo (NHM-UiO)
Global Biodiversity Information Facility (GBIF)
17 October 2013
Topics
• What is GBIF?
• International partners
• Darwin Core terminology
• GBIF data portal and services
• Norwegian collection portals
• Persistent identifiers (PID)
• Data paper
Status GBIF data-portal
October 2013
GBIF enables free and open access to biodiversity data online.
We are an international government-initiated and funded initiative focused on making biodiversity data available to all and anyone, for scientific research, conservation and sustainable development.
GBIF’s unique role
• Registry of biodiversity data resources.
• Tools and support for biodiversity data publication.
• Network development at national, regional and global levels.
• Global virtual natural history collection.
• Cross-domain linkage between data from collections, ecology and genomics.
• Access to global biodiversity data for GIS analysis and environmental monitoring.
  – Aggregated presence data
  – Site-based survey data (samples, presence/absence)
Slide by Donald Hobern, 2012
Norway joined GBIF in February 2004.
The low membership coverage in Africa and Asia is an important gap!
OECD Global Science Forum (1999): “establish and support a distributed system of interlinked and interoperable modules (databases, software and networking tools, search engines, analytical algorithms, etc.) that together will form a Global Biodiversity Information Facility (GBIF)”.
The Millennium Ecosystem Assessment showed that human actions often lead to irreversible losses in the diversity of life, and these losses have been more rapid in the past 50 years than ever before in human history. Biological diversity is key to resilience – the ability of natural and social systems to adapt to change, and is essential for nearly every aspect of human well-being. Because human threats to biodiversity occur across large spatial and temporal scales, biodiversity and ecosystem monitoring, forecasting, and risk assessments require data to be organised in a globally-accessible, integrated infrastructure. GBIF’s Data Portal provides this infrastructure.
Organisational partnerships
• Some potential data collaborations
  – Taxon names and nomenclature
    • Catalogue of Life (CoL)
    • IPT to publish global and regional species databases
    • GBIF infrastructure to support construction of CoL
  – Biodiversity literature
    • Biodiversity Heritage Library (BHL)
    • User annotations to extract occurrence records
    • Link original (and other) descriptions to taxonomy
  – Species information and traits
    • Encyclopedia of Life (EOL)
    • Support EOL as global species information aggregator
    • Include EOL summary box on each GBIF species page
Based on slide by Donald Hobern, 2012
GBIF and GEO
Intergovernmental Group on Earth Observations
Data Integration & Interoperability
GBIF provides the infrastructure delivering species occurrence data.
GEO BON: Biodiversity Observation Network
GIASIP: Global Invasive Alien Species Information Partnership
GBIF provides the infrastructure delivering species occurrence data.
Launched at CBD COP11 October 2012 in Hyderabad, India.
GBIF and IPBES (Naturpanelet)
Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES)
IPBES provides information to support policy decisions and scientific research on biodiversity. GBIF operates within the data, information and knowledge domain of biodiversity informatics. GBIF provides the infrastructure delivering species occurrence data in IPBES.
[Diagram labels: Science; Policy; Biodiversity; Data, information and knowledge; IPBES; GBIF]
1. Information infrastructure: an Internet-based index of a globally distributed network of interoperable databases that contain primary biodiversity data.
2. Community-developed tools, standards and protocols: the tools data providers need to format and share their data.
3. Capacity-building and training, and access to a global expert community.
Common discovery system http://gbrds.gbif.org
Based on slide by David Remsen, GBIF, January 2012
gbrds.gbif.org www.gbif.org
Architecture
• Global Registry for resource discovery.
• Common and documented data standards.
  – Metadata
  – Data
  – Vocabularies
• Data sharing tools.
• Common web service methods.
• Resolvable identifiers.
Slide by David Remsen, GBIF, November 2011
Darwin Core – a vocabulary of terms
Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, De Giovanni R, Robertson T, and Vieglais D (2012) Darwin Core: An Evolving Community-Developed Biodiversity Data Standard. PLoS ONE 7(1): e29715. (doi:10.1371/journal.pone.0029715)
http://rs.tdwg.org/terms/
Unifying species data
Integrated access for records of the occurrence of any species:
• What?
• When?
• Where?
• What evidence?
• Data owner?
• Link to full record
[Diagram labels: Presence only; Collections; Ecological Monitoring; Genomics; Darwin Core]
Slide by Donald Hobern, 2012
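The questions on this slide (what, when, where, what evidence, data owner) map fairly directly onto Darwin Core terms. The Python sketch below is only an illustration of that mapping: the term names and the namespace http://rs.tdwg.org/dwc/terms/ come from the Darwin Core standard, while the record values, the catalog number and the helper name expand_terms are hypothetical and not taken from the presentation.

```python
DWC_NS = "http://rs.tdwg.org/dwc/terms/"

# Hypothetical specimen record keyed by short Darwin Core term names.
record = {
    "scientificName": "Amanita phalloides",   # What?
    "eventDate": "2012-09-15",                # When?
    "decimalLatitude": "59.66",               # Where?
    "decimalLongitude": "10.77",
    "basisOfRecord": "PreservedSpecimen",     # What evidence?
    "institutionCode": "O",                   # Data owner?
    "catalogNumber": "EXAMPLE-0001",          # Key for a link to the full record
}


def expand_terms(rec):
    """Return a copy of rec keyed by full Darwin Core term URIs."""
    return {DWC_NS + term: value for term, value in rec.items()}


expanded = expand_terms(record)
for uri, value in expanded.items():
    print(uri, "=", value)
```

Keying records by full term URIs rather than local column names is what lets data from collections, monitoring and genomics be read with the same vocabulary.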
Unifying species data: presence/absence
Darwin Core + Core Survey Fields
Fully compatible with existing Darwin Core data, plus:
• Which species were recorded together?
• Which sets of data are directly comparable?
• Which species were most abundant in each sample?
Fields: Sample Id, Method Id, Relative abundance, ...
Slide by Donald Hobern, 2012
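A small Python sketch can make the three survey questions concrete. It assumes each row carries a sample id, a method id, a species name and an individual count; these column choices, the sample values and the function names are assumptions for illustration, and treating "same method id" as "directly comparable" is a simplification rather than a rule from the slides.

```python
from collections import defaultdict

# Hypothetical survey rows: (sample_id, method_id, scientific_name, individual_count)
rows = [
    ("S1", "M1", "Amanita phalloides", 3),
    ("S1", "M1", "Hygrocybe vitellina", 1),
    ("S2", "M1", "Amanita phalloides", 2),
    ("S2", "M1", "Marasmius siccus", 6),
]


def relative_abundance(rows):
    """Return {sample_id: {species: share of that sample's total count}}."""
    counts = defaultdict(dict)
    for sample_id, _method_id, species, count in rows:
        counts[sample_id][species] = counts[sample_id].get(species, 0) + count
    result = {}
    for sample_id, by_species in counts.items():
        total = sum(by_species.values())
        result[sample_id] = {sp: n / total for sp, n in by_species.items()}
    return result


def comparable_samples(rows):
    """Group sample ids by method id; samples sharing a method are treated as comparable."""
    by_method = defaultdict(set)
    for sample_id, method_id, _species, _count in rows:
        by_method[method_id].add(sample_id)
    return {method: sorted(samples) for method, samples in by_method.items()}


abundance = relative_abundance(rows)
# Species recorded together are simply the keys of each sample's dictionary.
recorded_together = {s: sorted(sp) for s, sp in abundance.items()}
most_abundant = {s: max(sp, key=sp.get) for s, sp in abundance.items()}
print(recorded_together)
print(most_abundant)
print(comparable_samples(rows))
```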
Darwin Core Archive (DwC-A)
• DwC-A publishes DwC records including terms from DwC-A extensions.
• Simple text-based format.
• Zipped single-file archive.
Germplasm.txt
Darwin Core Archive Assistant (GBIF, 2010) The Darwin Core Archive Assistant is a web application that presents a simple interface for describing the data elements a data publisher wishes to serve to the GBIF network as basic text files and composes the appropriate XML descriptor file as defined in the Darwin Core Text Guidelines to accompany them. It communicates with the GBIF registry to provide an up-to-date listing of all relevant Darwin Core terms and available extensions and presents these in a simple checklist format.
http://tools.gbif.org/dwca-assistant/
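To show roughly what such an archive contains, the sketch below writes a tab-separated data file and a minimal meta.xml descriptor into a single zip file, using only the Python standard library. The descriptor layout follows the Darwin Core Text Guidelines as far as the author of this sketch understands them; the file name occurrence.txt, the three chosen columns, the example row and the function names are assumptions, and a real archive would normally also carry a metadata document.

```python
import io
import zipfile

DWC = "http://rs.tdwg.org/dwc/terms/"

# Hypothetical column layout for a tab-separated core data file.
FIELDS = ["occurrenceID", "scientificName", "eventDate"]


def build_meta_xml(data_file, fields):
    """Compose a minimal meta.xml descriptor; column 0 doubles as the record id."""
    lines = [
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">',
        '  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"'
        ' ignoreHeaderLines="1" rowType="' + DWC + 'Occurrence">',
        "    <files><location>" + data_file + "</location></files>",
        '    <id index="0"/>',
    ]
    for index, term in enumerate(fields):
        lines.append('    <field index="%d" term="%s%s"/>' % (index, DWC, term))
    lines += ["  </core>", "</archive>"]
    return "\n".join(lines) + "\n"


def build_archive(rows):
    """Return the bytes of a zip archive holding occurrence.txt and meta.xml."""
    table = "\t".join(FIELDS) + "\n"
    table += "".join("\t".join(row) + "\n" for row in rows)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("occurrence.txt", table)
        archive.writestr("meta.xml", build_meta_xml("occurrence.txt", FIELDS))
    return buffer.getvalue()


archive_bytes = build_archive([("urn:example:1", "Amanita phalloides", "2012-09-15")])
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
    print(archive.namelist())
```

The descriptor is what the Assistant composes for the publisher: it tells a consumer which column holds which Darwin Core term, so the data file itself can stay a plain text table.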
http://tools.gbif.org/spreadsheet-processor/
Fitness for use: definition
"The general intent of describing the quality of a particular dataset or record is to describe the fitness of that dataset or record for a particular use that one may have in mind for the data."
Chrisman, 1991
Slide by Laura Russell, VertNet, September 2011
Improving fitness-for-use
[Diagram labels: Aggregate; Data indexes; Data quality; Expert curation]
• Progressive improvement
  – Data indexes
    • Centralised discovery
    • Standardisation of persistent identifiers
    • Consistent metadata
  – Data quality
    • Inconsistencies within records
    • Validation against metadata
    • Outlier detection
    • Metrics per record and per data set
  – Expert curation
    • Interface with taxon expert groups
    • Incorporate findings of data users
    • Need efficient researcher-friendly tools
Slide by Donald Hobern, 2012
Taxonomic data
Names are often the first point of entry to biodiversity databases.
=> Risk of error propagation
Possible errors:
• Wrong identification
• Wrong format
• Spelling errors
Slide by Laura Russell, VertNet, September 2011
The problem with scientific names
• No comprehensive catalog of species
• Names ≠ species
• The species problem: species concepts
• Competing classifications / phylogenies
• Many names for one taxon
• One name for many taxa
• ‘Names’ are more than code-compliant scientific names
Slide by David Shorthouse, Canadensys, January 2013
Proposed solution
• Inclusive
  – Accommodate alternate perspectives
• Reconciliation
  – Map names among and between each other
• Disambiguation
  – Context to assign homonymic names to rightful place
Slide by David Shorthouse, Canadensys, January 2013
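Reconciliation and disambiguation can be sketched with two lookup tables. This is a toy illustration, not the approach used by Canadensys or GBIF: the tables, the function names and the use of kingdom as the only context are assumptions, and a real service would draw on a full taxonomic checklist.

```python
# Illustrative lookup tables only; a real service would use a taxonomic checklist.
SYNONYMS = {
    "Picea excelsa": "Picea abies",  # many names for one taxon
}
HOMONYMS = {
    # one name for many taxa: the genus name Morus is used for plants and for birds
    "Morus": {"Plantae": "Morus (mulberries)", "Animalia": "Morus (gannets)"},
}


def reconcile(name):
    """Map a name to its accepted name after collapsing repeated whitespace."""
    cleaned = " ".join(name.split())
    return SYNONYMS.get(cleaned, cleaned)


def disambiguate(name, kingdom=None):
    """Resolve a homonym using kingdom as context; None means context was insufficient."""
    candidates = HOMONYMS.get(name)
    if candidates is None:
        return name  # not a known homonym, nothing to resolve
    return candidates.get(kingdom)


print(reconcile("Picea  excelsa"))
print(disambiguate("Morus", kingdom="Animalia"))
print(disambiguate("Morus"))
```

Returning None when the context is missing, rather than guessing, keeps an ambiguous name from being silently assigned to the wrong taxon.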
Improving data quality
[Maps: the data set as indexed by GBIF on 14 January 2013 and on 3 May 2013]
The fish collection at NHM had some longitude and latitude columns swapped. This was noticed and corrected in April 2013.
(dataset 8102)
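A swapped latitude/longitude pair of this kind can often be caught automatically. The sketch below tests each coordinate pair against a bounding box and, if it falls outside, tests the swapped pair. The box values are rough assumptions for mainland Norway chosen for illustration, and the function names are hypothetical; this is not the check that was actually run on the NHM data.

```python
# Rough bounding box for mainland Norway (assumed values, for illustration only).
NORWAY_BOX = {"min_lat": 57.0, "max_lat": 72.0, "min_lon": 4.0, "max_lon": 32.0}


def in_box(lat, lon, box=NORWAY_BOX):
    """True if the point lies inside the bounding box."""
    return (box["min_lat"] <= lat <= box["max_lat"]
            and box["min_lon"] <= lon <= box["max_lon"])


def check_coordinates(lat, lon, box=NORWAY_BOX):
    """Classify a coordinate pair as 'ok', 'swapped' or 'outside'."""
    if in_box(lat, lon, box):
        return "ok"
    if in_box(lon, lat, box):
        return "swapped"  # the pair only fits the box with the values exchanged
    return "outside"


print(check_coordinates(59.9, 10.7))
print(check_coordinates(10.7, 59.9))
print(check_coordinates(0.0, 0.0))
```

The test works here because the latitude and longitude ranges of the box do not overlap; for regions where they do, a swapped pair could still pass as valid.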
http://www.gbif.org/
New portal launched 9 October 2013
Data published through GBIF
Last updated: 2013-10-02
A modest decline in the total number of data records in January 2013 resulted from deletion of duplicates and withdrawn data, identified through software and processing upgrades.
[Figure: primary biodiversity records (millions) over time; vertical axis from 80 to 440]
GBIF data publishers
Last updated: 2013-10-02
A sharp rise in the number of data publishers in September 2013 results from institutions choosing to register as separate entities rather than sharing datasets through a single publisher at their national node institution. This helps to raise the visibility and branding of the institutions, and provides more accurate attribution, especially in the new GBIF portal coming online shortly.
[Figure: number of institutions registered as GBIF data publishers over time; vertical axis from 200 to 580]
GBIF citation in research
Last updated: October 2013
[Figure: number of peer-reviewed publications per year, 2008 to 2013 (Jan-Sep), in three series: GBIF mentioned; GBIF discussed; GBIF-mediated data used. Bar labels in extraction order: 57, 43, 61, 66, 90, 64; 17, 35, 48, 66, 63, 25; 52, 89, 148, 170, 232, 197. Vertical axis from 0 to 250.]
GBIF portal:
13.3 million occurrences are located in Norway, published from 30 countries worldwide.
GBIF portal:
12.5 million occurrences are published from Norwegian institutes, covering 180 countries worldwide.
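The two figures answer different questions: where a record was observed versus which country published it. As a hedged sketch, the snippet below builds the two corresponding query URLs against the public GBIF occurrence search API. The endpoint https://api.gbif.org/v1/occurrence/search and the country and publishingCountry filters are assumptions in the sense that the slides do not name any web service; the helper count_query is hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumed endpoint; the presentation itself does not name one.
BASE = "https://api.gbif.org/v1/occurrence/search"


def count_query(**filters):
    """Build a search URL that asks for zero records, so only the total count matters."""
    params = dict(filters, limit=0)
    return BASE + "?" + urlencode(sorted(params.items()))


# Records located in Norway, whoever published them.
located_in_norway = count_query(country="NO")
# Records published from Norway, wherever they were collected.
published_from_norway = count_query(publishingCountry="NO")

print(located_in_norway)
print(published_from_norway)
```

Fetching either URL would return a JSON document whose total is expected in a count field; the numbers will differ from the October 2013 figures quoted on the slide.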
Status Nordic GBIF data sets (data hosted by…)
October 2013
Country    Data sets    Occurrences
Denmark    45           9,311,741
Finland    57           14,666,474
Iceland    4            458,705
Norway     85           12,531,207
Sweden     47           43,374,550
“Artskart” provides the national “GBIF” portal to species occurrences and specimens in Norway.
The site at http://gbif.no provides an overview of the Norwegian data sets published to GBIF.
• Custom data portals for Norwegian collections.
• Upgrade to Darwin Core archives across Norway.
• Persistent identifiers (UUID, QR code).
• Data set metadata descriptions (data paper).
• GIS data server for spatial environment data.
Custom collection portals
• Software from GBIF to implement online data portals for biodiversity data.
  – National, thematic or regional.
  – Based on data published using GBIF standards.
Different data portals will implement very different modules and functionality to meet their own needs.
Slide by David Remsen (2011)
Opportunities with Darwin Core: data portal for institute, region, or theme?
Collections and data sets are published by the data owner as one single Darwin Core Archive (DwC-A). Different data types from the same DwC-A can be included in different data portals.
[Diagram labels: Darwin Core Archive; GBIF Portal; Artskart; UiT; UiB; S&L]
The purpose of identifiers is to name things, making it possible to refer to them.
What is an identifier:
• “Each identifier refers to one and only one thing” (Coyle 2006).
• “An association between a string and a thing” (Kunze 2003).
• “A stated association between a symbol and a thing; that the symbol may be used to unambiguously refer to the thing within a given context” (Campbell 2007).
UUID QR codes for all museum objects at NHM-UiO would provide:
• Machine-readable labels that can be scanned with an ordinary smartphone (or PDA).
• New and efficient workflows for collection management.
• Stable identifiers appropriate for databasing.
Catalog number: O-L-000014, http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3
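The identifier above combines a PURL prefix with a UUID. The sketch below shows how such identifiers could be minted and checked with Python's standard uuid module. Only the prefix and the example identifier come from the slide; the function names are hypothetical, the use of random (version 4) UUIDs for new objects is an assumption, and rendering the string as a QR code would need a third-party library that is not shown here.

```python
import uuid

PURL_PREFIX = "http://purl.org/nhmuio/id/"  # prefix as shown on the slide


def mint_identifier():
    """Create a new random (version 4) UUID and return it as a PURL string."""
    return PURL_PREFIX + str(uuid.uuid4())


def parse_identifier(purl):
    """Return the UUID embedded in a PURL; raise ValueError if it is malformed."""
    if not purl.startswith(PURL_PREFIX):
        raise ValueError("unexpected prefix: " + purl)
    return uuid.UUID(purl[len(PURL_PREFIX):])


# The identifier printed next to catalog number O-L-000014.
existing = parse_identifier(
    "http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3"
)
print(existing, "version", existing.version)
print(mint_identifier())
```

Because the UUID carries no meaning of its own, it can stay stable even if the catalog number or the hosting institution's web address changes; only the PURL redirect needs updating.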
http://purl.org/nhmuio/id/d91e8253-0ac1-4681-ac69-e50070af86a2
• Peer review option for biodiversity data.
• Authors get scientific credit for data publication.
• Meeting concerns over data quality.
• Meeting concerns over data citation mechanism.
• Metadata formats: Ecological Metadata Language (EML), Dublin Core, Darwin Core, Natural Collections Descriptions (NCD)…
• Towards: each data set published through GBIF accompanied by a data paper…?
Why publish your data
• Citable publication
• Establish scientific priority
• Increase collaboration
• Link data to bigger network
• Re-use and multiply effect
• Respond to funding requirements
http://biodiversitydatajournal.com/
Smith V, Georgiev T, Stoev P, Biserkov J, Miller J, Livermore L, Baker E, Mietchen D, Couvreur T, Mueller G, Dikow T, Helgen K, Frank J, Agosti D, Roberts D, Penev L (2013) Beyond dead trees: integrating the scientific process in the Biodiversity Data Journal. Biodiversity Data Journal 1: e995. DOI: 10.3897/BDJ.1.e995
Data rescue activity: Many species occurrence data are “hidden” in reports and documents produced by universities, research institutes, public agencies and the university museums. Project with Artsdatabanken
Photo by: Niklas Bildhauer
Scientists from Norwegian institutes using GBIF-mediated data:
“PCA Norway”
Norwegian Vegetation Atlas (Moen 1999): PCA analysis of 54 environmental variables across Norway versus the National Vegetation Atlas.
[Figure panels: Sections (Moen 1999); PCA component 1; Zones (Moen 1999); PCA component 2]
Bakkestuen, V., Erikstad, L., and Økland, R.H. (2008). Step-less models for regional environmental variation in Norway. J. Biogeography 35: 1906-1922.
Based on a slide by Vegar Bakkestuen
Modeling Norwegian fungi
• 83 fungal species.
• 10,500 occurrences from the GBIF portal.
• Predictive modeling of species distribution.
Wollan, A. K., Bakkestuen, V., Kauserud, H., Gulden, G. and Halvorsen, R. 2008. Modelling and predicting fungal distribution patterns using herbarium data. J. Biogeography 35: 2298-2310.
Slide by Vegar Bakkestuen
[Figure panels: Amanita phalloides; Catathelasma imperiale; Hygrocybe vitellina; Marasmius siccus]
Node Personnel
Dag Endresen, Node Manager
Christian Svindseth, Database manager
Fridtjof Mehlum, Research Director
Einar Timdal, Associate Professor
Vegar Bakkestuen, Researcher
Geir Søli, Associate Professor
Nils Valland, Artsdatabanken
Wouter Koch, Artsdatabanken