Section meeting: Department of Technical and Scientific Conservation (CONSERV). Global Biodiversity Information Facility, GBIF Norway. Dag Endresen and Christian Svindseth, GBIF Norway, Natural History Museum, University of Oslo (NHM-UiO). Global Biodiversity Information Facility (GBIF), 7 November 2012.

Global Biodiversity Information Facility (GBIF) - 2012


DESCRIPTION

Presentation of the Global Biodiversity Information Facility (GBIF) and GBIF Norway for the Department of Technical and Scientific Conservation (CONSERV) at the Natural History Museum, University of Oslo. Tøyen, Oslo, 7 November 2012.


Page 1: Global Biodiversity Information Facility (GBIF) - 2012

   

Section meeting: Department of Technical and Scientific Conservation (CONSERV)

Global Biodiversity Information Facility, GBIF Norway

Dag Endresen and Christian Svindseth, GBIF Norway, Natural History Museum, University of Oslo (NHM-UiO), Global Biodiversity Information Facility (GBIF), 7 November 2012

Page 2: Global Biodiversity Information Facility (GBIF) - 2012

Topics

• What is GBIF?
• GBIF data portal
• Darwin Core (DwC), DwC archive
• Persistent identifiers (UUID)
• Data paper, citation of data sets


Page 3: Global Biodiversity Information Facility (GBIF) - 2012

GBIF enables free and open access to biodiversity data online. We are an international, government-initiated and government-funded initiative focused on making biodiversity data available to anyone for scientific research, conservation and sustainable development.

Status of the data portal, October 2012


Page 4: Global Biodiversity Information Facility (GBIF) - 2012

OECD Global Science Forum recommendation (1999): “[E]stablish and support a distributed system of interlinked and interoperable modules (databases, software and networking tools, search engines, analytical algorithms, etc.) that together will form a Global Biodiversity Information Facility (GBIF)”.

Page 5: Global Biodiversity Information Facility (GBIF) - 2012

1. Information infrastructure: an Internet-based index of a globally distributed network of interoperable databases that contain primary biodiversity data.

2. Community-developed tools, standards and protocols: the tools data providers need to format and share their data.

3. Capacity-building and training, and access to a global expert community.


Page 6: Global Biodiversity Information Facility (GBIF) - 2012

   

http://data.gbif.org/

Page 7: Global Biodiversity Information Facility (GBIF) - 2012

GBIF portal: 16,064,074 of a total of 17,268,452 records have coordinates. GBIF Norway: 11,777,738 records are provided by Norwegian data publishers.

Page 8: Global Biodiversity Information Facility (GBIF) - 2012


Page 9: Global Biodiversity Information Facility (GBIF) - 2012

GBIF contributes species occurrence data to “Artskart”, the species map service of the Norwegian Biodiversity Information Centre (Artsdatabanken).


Page 10: Global Biodiversity Information Facility (GBIF) - 2012

GBIF’s unique role
• Registry of biodiversity data resources
• Tools and support for biodiversity data publication
• Network development at national, regional and global levels
• Global virtual natural history collection
• Cross-domain linkage between data from collections, ecology and genomics
• Access to biodiversity data for GIS analysis and environmental monitoring
  – Aggregated presence data
  – Site-based survey data (samples, presence/absence)

Slide developed by Donald Hobern, 2012


Page 11: Global Biodiversity Information Facility (GBIF) - 2012

Improving fitness-for-use
Diagram stages: Aggregate, Data Indexes, Data Quality, Expert Curation

• Progressive improvement
  – Data indexes
    • Centralised discovery
    • Standardisation of persistent identifiers
    • Consistent metadata
  – Data quality
    • Inconsistencies within records
    • Validation against metadata
    • Outlier detection
    • Metrics per record and per data set
  – Expert curation
    • Interface with taxon expert groups
    • Incorporate findings of data users
    • Need efficient researcher-friendly tools

Slide developed by Donald Hobern, 2012
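The record-level data quality checks listed above (inconsistencies within records, metrics per record and per data set) can be illustrated with a minimal Python sketch. It assumes records are dictionaries keyed by Darwin Core term names; the rules, the messages and the functions check_record and metrics_per_data_set are hypothetical and are not GBIF's actual validation routines.

```python
from datetime import date

# Illustrative checks only, assuming records are dicts keyed by Darwin Core terms.

def check_record(record: dict) -> list:
    """Return a list of human-readable problems found in one record."""
    problems = []

    # Inconsistencies within records: coordinates given as a pair and in range.
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    if (lat is None) != (lon is None):
        problems.append("only one of latitude/longitude is given")
    if lat is not None and not -90 <= lat <= 90:
        problems.append(f"latitude out of range: {lat}")
    if lon is not None and not -180 <= lon <= 180:
        problems.append(f"longitude out of range: {lon}")

    # This sketch only accepts single YYYY-MM-DD dates, so valid Darwin Core
    # date ranges such as "2012-11-01/2012-11-07" would be flagged as well.
    event_date = record.get("eventDate")
    if event_date is not None:
        try:
            parsed = date.fromisoformat(event_date)
        except ValueError:
            problems.append(f"unparseable eventDate: {event_date!r}")
        else:
            if parsed > date.today():
                problems.append(f"eventDate in the future: {event_date}")
    return problems


def metrics_per_data_set(records: list) -> dict:
    """Summarise how many records in a data set have at least one problem."""
    flagged = sum(1 for r in records if check_record(r))
    share = flagged / len(records) if records else 0.0
    return {"records": len(records), "flagged": flagged, "share_flagged": share}


# Placeholder records: the second has an impossible latitude and month.
records = [
    {"decimalLatitude": 59.92, "decimalLongitude": 10.77, "eventDate": "2012-11-07"},
    {"decimalLatitude": 95.0, "decimalLongitude": 10.77, "eventDate": "2012-13-01"},
]
print(metrics_per_data_set(records))  # {'records': 2, 'flagged': 1, 'share_flagged': 0.5}
```

Per-record problem lists feed directly into per-data-set metrics, which matches the slide's pairing of the two.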

Page 12: Global Biodiversity Information Facility (GBIF) - 2012

Organisational partnerships
• Some potential data collaborations
  – GBIF-mediated occurrence data
    • Maps, lists of countries recorded
    • Localise content in EOL, etc.
  – BHL (Biodiversity Heritage Library) literature
    • User annotations to extract occurrence records
    • Link original (and other) descriptions to taxonomy
  – EOL (Encyclopedia of Life) species information
    • Support EOL as global species information aggregator
    • Include EOL summary box on each GBIF species page
  – Catalogue of Life (CoL)
    • IPT (GBIF Integrated Publishing Toolkit) to publish global and regional species databases
    • GBIF infrastructure to support construction of CoL

Slide developed by Donald Hobern, 2012


Page 13: Global Biodiversity Information Facility (GBIF) - 2012

Unifying species data

Integrated access for records of the occurrence of any species:
• What?
• When?
• Where?
• What evidence?
• Data owner?
• Link to full record

Diagram labels: Presence only; Collections; Ecological Monitoring; Genomics; Darwin Core

Slide developed by Donald Hobern, 2012

Page 14: Global Biodiversity Information Facility (GBIF) - 2012

Unifying species data

Integrated access for records of the occurrence of any species:
• What?
• When?
• Where?
• What evidence?
• Data owner?
• Link to full record

Diagram labels: Presence only; Collections; Ecological Monitoring; Genomics; Darwin Core

Fully compatible with existing Darwin Core data, plus:
• Which species were recorded together?
• Which sets of data are directly comparable?
• Which species were most abundant in each sample?

Diagram labels: Presence/absence; Darwin Core + Core Survey Fields (Sample Id, Method Id, Relative abundance, ...)

Slide developed by Donald Hobern, 2012
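The three survey questions on this slide become simple grouping operations once survey fields accompany the Darwin Core terms. A minimal sketch, assuming the hypothetical field names sampleId, methodId and relativeAbundance for the slide's "Sample Id", "Method Id" and "Relative abundance", with placeholder species names and values:

```python
from collections import defaultdict

# Hypothetical survey records: Darwin Core terms plus assumed survey fields.
records = [
    {"sampleId": "S1", "methodId": "pitfall-trap", "scientificName": "Species A", "relativeAbundance": 0.6},
    {"sampleId": "S1", "methodId": "pitfall-trap", "scientificName": "Species B", "relativeAbundance": 0.4},
    {"sampleId": "S2", "methodId": "pitfall-trap", "scientificName": "Species A", "relativeAbundance": 0.2},
    {"sampleId": "S2", "methodId": "pitfall-trap", "scientificName": "Species C", "relativeAbundance": 0.8},
    {"sampleId": "S3", "methodId": "sweep-net", "scientificName": "Species C", "relativeAbundance": 1.0},
]

def species_by_sample(records):
    """Which species were recorded together? Group names by sample."""
    groups = defaultdict(set)
    for r in records:
        groups[r["sampleId"]].add(r["scientificName"])
    return dict(groups)

def comparable_samples(records):
    """Which sets of data are directly comparable? Group samples by method."""
    groups = defaultdict(set)
    for r in records:
        groups[r["methodId"]].add(r["sampleId"])
    return dict(groups)

def most_abundant(records):
    """Which species was most abundant in each sample?"""
    best = {}
    for r in records:
        current = best.get(r["sampleId"])
        if current is None or r["relativeAbundance"] > current["relativeAbundance"]:
            best[r["sampleId"]] = r
    return {sample: r["scientificName"] for sample, r in best.items()}

print(most_abundant(records))  # {'S1': 'Species A', 'S2': 'Species C', 'S3': 'Species C'}
```

Presence-only records lack the sample and method keys, which is why none of these questions can be answered from them alone.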

Page 15: Global Biodiversity Information Facility (GBIF) - 2012

Darwin Core – a vocabulary of terms

Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, De Giovanni R, Robertson T, and Vieglais D (2012) Darwin Core: An Evolving Community-Developed Biodiversity Data Standard. PLoS ONE 7(1): e29715. doi:10.1371/journal.pone.0029715

Page 16: Global Biodiversity Information Facility (GBIF) - 2012

http://rs.tdwg.org/dwc/terms/
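A Darwin Core term is a short name qualified by the namespace above. A minimal sketch of one occurrence record that maps the questions from the "Unifying species data" slide to Darwin Core terms. The term names are Darwin Core terms; the values (including the coordinates) and the helper to_term_uris are placeholders for illustration:

```python
DWC_NS = "http://rs.tdwg.org/dwc/terms/"

# One occurrence record; term names are Darwin Core terms, values are placeholders.
record = {
    "occurrenceID": "urn:uuid:c37e3f9b-bcaf-4479-8eb7-3346a2db2373",  # link to full record
    "scientificName": "Species A",         # What?
    "eventDate": "2012-11-07",             # When?
    "decimalLatitude": 59.92,              # Where?
    "decimalLongitude": 10.77,
    "basisOfRecord": "PreservedSpecimen",  # What evidence?
    "institutionCode": "EXAMPLE",          # Data owner?
}

def to_term_uris(rec: dict) -> dict:
    """Qualify each short term name with the Darwin Core namespace URI."""
    return {DWC_NS + term: value for term, value in rec.items()}

for uri, value in to_term_uris(record).items():
    print(uri, value)
```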

Page 17: Global Biodiversity Information Facility (GBIF) - 2012

Semantic MediaWiki: a forum for discussion and development of terminology.

http://terms.gbif.org/


Page 18: Global Biodiversity Information Facility (GBIF) - 2012

Darwin Core Archive (DwC-A)
• DwC-A publishes DwC records, including terms from DwC-A extensions.
• Simple text-based format.
• Zipped single-file archive.

Germplasm.txt


Page 19: Global Biodiversity Information Facility (GBIF) - 2012

Darwin Core Archive extensions

• Global Names Architecture (GNA)
• Audubon Core (multimedia)
• Invasive species (GISIN)
• Genetic Resources (Germplasm)
• EOL species profile
• Taxonomic Concept Schema (TCS)
• Genomics Standards Consortium (GSC)
• Meta-genomics (?)
• ABCD (?)
• …

Page 20: Global Biodiversity Information Facility (GBIF) - 2012

Controlled value vocabularies

• Country codes
• Language
• Basis of record
• Taxonomic rank
• Nomenclatural status
• Life form
• Life stage
• Geological time periods
  – chronostratigraphy
  – magnetostratigraphy
• Species interactions
  – saproxylic interactions
  – pollinators
• …
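Controlled vocabularies let free-text values be normalised before publication. A minimal sketch for Basis of record, using a subset of the values recommended for the Darwin Core basisOfRecord term; the normalisation rule (ignore case, spaces and underscores) and the function name are assumptions:

```python
from typing import Optional

# A subset of the values recommended for the Darwin Core basisOfRecord term.
BASIS_OF_RECORD = (
    "PreservedSpecimen", "FossilSpecimen", "LivingSpecimen",
    "MaterialSample", "HumanObservation", "MachineObservation",
)
# Lookup table keyed by a simplified (lower-case) form of each canonical value.
_LOOKUP = {value.lower(): value for value in BASIS_OF_RECORD}

def normalise_basis_of_record(raw: str) -> Optional[str]:
    """Return the canonical vocabulary value for raw, or None if unknown."""
    key = raw.strip().replace(" ", "").replace("_", "").lower()
    return _LOOKUP.get(key)

print(normalise_basis_of_record(" preserved specimen "))  # PreservedSpecimen
print(normalise_basis_of_record("HUMAN_OBSERVATION"))     # HumanObservation
print(normalise_basis_of_record("herbarium sheet"))       # None
```

Values that map to None are the ones to route to a human for curation rather than guess.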


Page 21: Global Biodiversity Information Facility (GBIF) - 2012

• Persistent identifiers (UUID, QR code)
• Data set metadata descriptions (data paper)
• Data rescue, scientific reports and student work
• Continue digitization efforts
• Biodiversity literature (BHL)


Page 22: Global Biodiversity Information Facility (GBIF) - 2012

• Persistent Identifier (PID)
• Globally Unique Identifier (GUID)
• Uniform Resource Identifier (URI)
• Persistent Uniform Resource Locator (PURL)
• Digital Object Identifier (DOI)
• Handle system (Handle)
• Life Science Identifier (LSID)
• Archival Resource Key (ARK)
• Universally Unique Identifier (UUID)


Page 23: Global Biodiversity Information Facility (GBIF) - 2012

• Scalability, number of IDs
• Community acceptance
• Long-term life-cycle
• Resolvable, resolution service(s)
• Cost per identifier
• People-friendly or machine-friendly
• Generation of IDs
  – Central generation, PID issuer
  – Distributed generation at source


Page 24: Global Biodiversity Information Facility (GBIF) - 2012

• A UUID is a 16-octet (128-bit) number.
• Example: C37E3F9B-BCAF-4479-8EB7-3346A2DB2373
• For randomly generated (version 4) UUIDs, the probability of one duplicate would be about 50% if every person on earth owned 600 million UUIDs.
• Allows for easy generation at source in a distributed network.
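Distributed generation at source needs nothing more than a standard UUID library. A minimal sketch using Python's standard uuid module: it parses the example identifier above (a version 4, i.e. random, UUID, as the leading 4 of its third group shows) and estimates collision risk with the usual birthday approximation p ≈ 1 − exp(−n²/2N), where N = 2^122 is the number of possible random version 4 UUIDs. The function name collision_probability is illustrative.

```python
import math
import uuid

# Distributed generation at source: any client can mint an identifier
# without asking a central issuer.
new_id = uuid.uuid4()

# The example identifier from the slide, parsed and normalised.
example = uuid.UUID("C37E3F9B-BCAF-4479-8EB7-3346A2DB2373")
print(example)          # c37e3f9b-bcaf-4479-8eb7-3346a2db2373
print(example.version)  # 4

# Version 4 UUIDs carry 122 random bits; the other 6 encode version and variant.
N = 2 ** 122

def collision_probability(n: int) -> float:
    """Birthday approximation of P(at least one duplicate) among n random UUIDs."""
    return -math.expm1(-n * n / (2 * N))

# Solving 1 - exp(-n^2 / 2N) = 0.5 for n gives n = sqrt(2N ln 2).
n_half = math.sqrt(2 * N * math.log(2))
print(f"{n_half:.2e}")  # 2.71e+18
```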


Page 25: Global Biodiversity Information Facility (GBIF) - 2012

• Quick Response Code (QR code).
• A type of matrix barcode (or two-dimensional code).
• Popular due to its fast readability and large storage capacity.
• The use of QR Codes is free of any license.
• The QR Code is clearly defined and published as an ISO standard.
• Invented in Japan by the Toyota subsidiary Denso Wave in 1994.


Page 26: Global Biodiversity Information Facility (GBIF) - 2012

QR codes for all museum objects at NHM-UiO would provide:
• Machine-readability with an ordinary smartphone (or PDA).
• New and efficient workflows for collection management.
• Stable identifiers suitable for databasing.

UUID: C37E3F9B-BCAF-4479-8EB7-3346A2DB2373
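Encoding the QR image itself requires a barcode library and is not shown here. A minimal sketch of the surrounding workflow, assuming the label carries the object's UUID in RFC 4122 URN form and that a scan returns that text; CATALOGUE, label_payload and lookup_scanned are hypothetical stand-ins for a collection database:

```python
import uuid
from typing import Optional

# Hypothetical in-memory catalogue standing in for a collection database.
CATALOGUE = {
    uuid.UUID("C37E3F9B-BCAF-4479-8EB7-3346A2DB2373"): {"catalogNumber": "EXAMPLE-0001"},
}

def label_payload(object_id: uuid.UUID) -> str:
    """Text to encode in the QR label: the UUID in URN form (urn:uuid:...)."""
    return object_id.urn

def lookup_scanned(text: str) -> Optional[dict]:
    """Resolve scanned label text (URN or bare UUID) to a catalogue entry."""
    try:
        # uuid.UUID accepts both "urn:uuid:..." and plain hyphenated hex.
        object_id = uuid.UUID(text.strip())
    except ValueError:
        return None  # not a UUID label
    return CATALOGUE.get(object_id)

payload = label_payload(uuid.UUID("C37E3F9B-BCAF-4479-8EB7-3346A2DB2373"))
print(payload)                  # urn:uuid:c37e3f9b-bcaf-4479-8eb7-3346a2db2373
print(lookup_scanned(payload))  # {'catalogNumber': 'EXAMPLE-0001'}
```

Because the identifier is minted once and printed on the label, the same UUID can serve as the database key and as the occurrenceID when the record is published.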


Page 27: Global Biodiversity Information Facility (GBIF) - 2012

• Peer review option for biodiversity data.
• Authors get credit for data publication.
• Meeting concerns over data quality.
• Meeting concerns over data citation mechanisms.
• Metadata formats: Ecological Metadata Language (EML), Dublin Core, Darwin Core, Natural Collections Descriptions (NCD)…
• Towards each data set published through GBIF being accompanied by a data paper…?
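A data paper builds on structured metadata such as EML. A minimal Python sketch that emits an EML skeleton with the standard library's ElementTree. The namespace assumes EML 2.1.1, the element selection is deliberately incomplete (a valid document needs more, such as contact information), and eml_skeleton, the system value and all field values are placeholders:

```python
import xml.etree.ElementTree as ET

# Assumed namespace for EML 2.1.1.
EML_NS = "eml://ecoinformatics.org/eml-2.1.1"
ET.register_namespace("eml", EML_NS)

def eml_skeleton(package_id: str, title: str, given_name: str,
                 surname: str, abstract: str) -> str:
    """Build a minimal, incomplete EML document describing one data set."""
    # "system" names the system that issued packageId; the value is a placeholder.
    root = ET.Element(f"{{{EML_NS}}}eml",
                      {"packageId": package_id, "system": "http://example.org"})
    dataset = ET.SubElement(root, "dataset")  # child elements are unqualified
    ET.SubElement(dataset, "title").text = title
    creator = ET.SubElement(dataset, "creator")
    person = ET.SubElement(creator, "individualName")
    ET.SubElement(person, "givenName").text = given_name
    ET.SubElement(person, "surName").text = surname
    ET.SubElement(ET.SubElement(dataset, "abstract"), "para").text = abstract
    return ET.tostring(root, encoding="unicode")

xml_text = eml_skeleton("00000000-0000-0000-0000-000000000000", "Example data set",
                        "Given", "Surname", "Placeholder abstract.")
print(xml_text)
```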


Page 28: Global Biodiversity Information Facility (GBIF) - 2012
Page 29: Global Biodiversity Information Facility (GBIF) - 2012

Data rescue activity: many species occurrence records are “hidden” in reports and documents produced by universities, research institutes, public agencies and the university museums. A collaboration project with Artsdatabanken.

Photo by: Niklas Bildhauer

Page 30: Global Biodiversity Information Facility (GBIF) - 2012

270 years of literature, since Carl Linnaeus and his Systema Naturae (1735), and a potential source of biodiversity data.

Biodiversity Heritage Library (BHL): a consortium of natural history and botanical libraries. http://www.biodiversitylibrary.org/

→ BHL Norway…?


Page 31: Global Biodiversity Information Facility (GBIF) - 2012

Photo by: Dvortygirl

A book scanner at the Internet Archive headquarters in San Francisco, California.

Page 32: Global Biodiversity Information Facility (GBIF) - 2012

   

The Millennium Ecosystem Assessment showed that human actions often lead to irreversible losses in the diversity of life, and these losses have been more rapid in the past 50 years than ever before in human history. Biological diversity is key to resilience – the ability of natural and social systems to adapt to change, and is essential for nearly every aspect of human well-being. Because human threats to biodiversity occur across large spatial and temporal scales, biodiversity and ecosystem monitoring, forecasting, and risk assessments require data to be organised in a globally-accessible, integrated infrastructure. GBIF’s Data Portal provides this infrastructure.


Page 33: Global Biodiversity Information Facility (GBIF) - 2012


Page 34: Global Biodiversity Information Facility (GBIF) - 2012

Furthermore, I think that we need persistent identifiers!

Cato the Elder ended all his speeches in the Senate of Rome with: "Ceterum autem censeo Carthaginem esse delendam" (English: "Furthermore, I think Carthage must be destroyed").
