GLOBAL BIODIVERSITY INFORMATION FACILITY

GBIF efforts in digitizing and mobilising primary biodiversity data

Vishwas Chavan and Nicholas King

February 12, 2008

[email protected]

WWW.GBIF.ORG


GBIF’s Mission

…to make the world’s biodiversity data freely and universally available via the Internet

What is biodiversity?

GBIF follows the broadly outlined CBD recognition of levels of biological diversity:

• Molecules / genes
• Species
• Ecosystems / ecology

Who needs primary biodiversity data!

• Scientists, experts, consultants
• Government officials at all levels
• Farmers, foresters, indigenous communities
• Education at all levels
• NGOs and the general public

These needs are highly varied, but can be met by open access to the same datasets

The same data can be analysed differently for different uses

But this needs easy access to (digitised) data

Screen shot: 26 Oct 2007

As of the end of 2007, GBIF facilitates access to 142 million primary data records

GBIF Data Portal: Dispelling Myths!

Searches
• Taxonomic
• Geographic (by country or bounding-box)
• By dataset

Taxonomic browse navigation using choice of classification

Integration of data: DiGIR-Darwin Core & BioCASe-ABCD (new versions), TAPIR, tab-delimited, TCS, SDD

Search and download by one to many species, geography, dataset (or combination)

Web services

Distributed, Decentralised, Data Discovery and Access through a network of heterogeneous and multicultural partners is possible!

Countries are organised alphabetically on the left-hand side, with the number of national records shown on the right-hand side.

Here we can see that more than 3.2 million records are available for South Africa (2.8 million with coordinates), referring to nearly 41,500 species

Example of a country summary page. This map provides an overview of the density of records currently available.

Sample of records available for South Africa as of September 2007. The GBIF portal offers a range of options for further use of the data…

It is also possible to get the full list of organisations providing data collected in a specific country or region

In this case 68 collections from all over the world are making data for South Africa available through GBIF, a good exemplar of the data repatriation activities promoted and facilitated by GBIF

South African institutions are also providing data relevant to other countries and regions in the world, as demonstrated in this example from the Shark Collection at the Iziko South African Museum

The GBIF data portal also allows for more detailed views of regions, datasets, taxonomic groups, etc.

Here it is possible to see nearly 100,000 records from the Linefish dataset collected in 1989 by Marine and Coastal Management (MCM) at the Department of Environmental Affairs and Tourism in South Africa

Exporting data from the GBIF data portal to other applications such as Google Earth takes a single click!

Coverage for Africa

>5m records currently for Africa

> 1m from EU country institutions

Estimated >100m not yet digitised

Within Google Earth overlays it is also possible to go down to the level of individual primary records, getting back to the original data provider

With the filter functionality it is possible to perform complex queries on the data.

In this example we are looking for all records on Lepidoptera (butterflies and moths) collected or observed in South Africa from 1950 to 2000.
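
The kind of combined filter described above (taxon, country and year range) can be sketched in a few lines of Python. This is only an illustration of the idea, not the portal's implementation: the `OccurrenceRecord` class, its field names, the `filter_records` helper and the sample records are all hypothetical and invented for this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class OccurrenceRecord:
    """Hypothetical, simplified occurrence record (not a GBIF schema)."""
    scientific_name: str
    order: str            # taxonomic order, e.g. "Lepidoptera"
    country: str
    year: Optional[int]   # None when the collection year is unknown


def filter_records(
    records: Iterable[OccurrenceRecord],
    order: str,
    country: str,
    year_from: int,
    year_to: int,
) -> List[OccurrenceRecord]:
    """Keep records that satisfy all three criteria at once."""
    return [
        r for r in records
        if r.order == order
        and r.country == country
        and r.year is not None
        and year_from <= r.year <= year_to
    ]


# Invented placeholder records, for illustration only.
SAMPLE_RECORDS = [
    OccurrenceRecord("Lepidoptera sp. A", "Lepidoptera", "South Africa", 1975),
    OccurrenceRecord("Lepidoptera sp. B", "Lepidoptera", "South Africa", 2004),
    OccurrenceRecord("Lepidoptera sp. C", "Lepidoptera", "Kenya", 1980),
    OccurrenceRecord("Coleoptera sp. A", "Coleoptera", "South Africa", 1960),
    OccurrenceRecord("Lepidoptera sp. D", "Lepidoptera", "South Africa", None),
]

matches = filter_records(SAMPLE_RECORDS, "Lepidoptera", "South Africa", 1950, 2000)
for record in matches:
    print(record.scientific_name, record.year)
```

Records with an unknown year are excluded here; a real query interface might treat them differently.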

Proteaceae in the Cape Floral Kingdom

Range changes due to Climate Change

           Distances moved (km)   Average altitude (m)   Average latitude (°S)
Present           0                      88.57                  33.21
20%              25.3                   113.83                  33.43
40%              20.0                   137.93                  33.59
60%              17.2                   194.85                  33.72
80%              46.4                   269.91                  33.98
100%             17.4                   296.06                  34.09

Leucospermum tomentosum: range centres in 10 year time slices

But, this is just a beginning.......

We need to cover much beyond imagination, and much much faster than we think?

Biological Data Domain - challenges

Sub-domain: Ecological & Ecosystem Data
Digital Status: 80% ? digital
Greatest Informatics Problems: Migration of legacy data, metadata generation, taxonomy (species)
Data Status: Persistent digital and physical data stores, moderately accessible

Sub-domain: Species- & Specimen Data
Digital Status: <5% digital
Greatest Informatics Problems: Digitisation, migration of legacy data, indexing
Data Status: Persistent physical data stores, accessible with difficulty

Sub-domain: Molecular Sequence & Gene/Genome Data
Digital Status: 95% digital
Greatest Informatics Problems: Data migration, cleansing, vouchering, taxonomy (gene & species)
Data Status: Persistent digital, universally accessible data stores

Primary Biodiversity Data

• Both biodiversity and biodiversity data are unevenly distributed around the world:

[Diagram labels: Developing World, Biodiversity, Biodiversity Data, Developed World]

Digital Divide, Content Divide, Lingual Divide, Knowledge Divide

Emerging catastrophe…

Primary Biodiversity Data

[Diagram labels: Observations / Monitoring, Multimedia Resources, Biological Collections, NAMES]

Growth rate of GBIF data sharing

[Chart: Growth in Data Sharing, Oct 2003 - Oct 2007. Left axis: Data Providers (0 to 250). Right axis: Data Records (in millions) (0.0 to 160.0). Series: Providers, Records.]

1 Billion Records by 2008 – We need to expedite!

Many specimens remain to have their data digitised

Many records are already digital...

… but are not yet being shared

[Chart: Goal for Growth in Occurrence Data* by End 2008. Time axis: Oct-03, Jan-04, Jan-05, Jan-06, Jan-07, Oct-07, Feb-08, Dec-08. Left axis: Data Providers (0 to 1800). Right axis: Data Records (in millions) (0.0 to 1,000.0). Series: Providers, Records.]

* data useful in analyses that contribute to sustainable management of biodiversity

GBIF is all about our shared vision and partnership

28 Voting Country Participants

15 Associate country Participants

35 International Organisations and Economies

GBIF Working Principles

Collaboration and sharing, not compilation

Ownership of data (specimens or names) remains entirely with providers

Standardised schemata for data sharing, with software free to providers

Worldwide network of collaborating institutions that share data (data providers)

GBIF’s Participants’ Nodes promote and coordinate activities of data providers

GBIF Working Principles

Procedures for interoperability and data integration

Web services (mostly for machines, but for people too)

Global registry for advertisement of shared data

Vision and coordination

GBIF has a unique global mandate in both Informatics and Content

GBIF is a multi-purpose, open-ended cyber-infrastructure that facilitates biologists serving biodiversity and society in new ways

GBIF Strategic Areas 2007 – 2011

Informatics
• Data portal powerful and friendly
• Consolidated infrastructure and standards
• Tools and support for Nodes and providers

Content
• Data quantity and richness in priority areas
• Data integration and discovery
• Documented data quality

Participation
• Nodes' expertise shared across the network
• Guidance on setting up and maintaining Nodes

Data: Fitness for Use

• In a database, the data have no actual quality or value; they only have potential value. That value is realized only when someone uses the data to do something useful (English 1999).

• The quality of data cannot be assessed independently of the uses of that data (Strong et al. 1997).

• Data are of high quality if they are fit for their intended use in operations, decision-making, and planning (Juran 1964).

Data standards / protocols used by GBIF

Darwin Core (TDWG data standard)
• Simple XML data model to represent taxon occurrence records (only core attributes)
• Extensions to handle e.g. curation details, geospatial data, microbial specimens

ABCD - Access to Biological Collection Data (TDWG data standard)
• More complex XML data model to represent collection or observation data
• Detailed document structure including features for different communities

Taxon Concept Schema (TDWG data standard)
• XML data model for exchange of nomenclatural/taxonomic data
• Will be supported in new GBIF data portal

Tab-delimited links to species information
• Lists of scientific names, URLs and key words
• Will be supported in order to establish links to external resources from the new GBIF data portal

DiGIR / BioCASe / TAPIR (TDWG access protocols)
• XML protocols for searching remote data resources
• Suitable for use with a wide range of different data models
• TAPIR (latest version) supports flexible views and simple URLs

SPICE protocol (Species 2000 access protocol)
• Web service interfaces for exploring taxonomic data (hierarchies, synonymy, common names)
• Will be supported for connecting data resources to new GBIF data portal

LSIDs – Life Science Identifiers (TDWG-adopted GUID mechanism)
• Globally unique identifiers to simplify tracking data records
• Include protocol for resolving data for any LSID
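
To make two of these ideas concrete, the sketch below reads a small XML occurrence record and splits an LSID into its parts. It is a minimal illustration under stated assumptions: the `<record>` wrapper, the element names and the sample values are hypothetical and do not reproduce the official Darwin Core schema, and `example.org` is a placeholder authority. The only format relied on is the general LSID shape `urn:lsid:authority:namespace:object[:revision]`.

```python
import xml.etree.ElementTree as ET
from typing import Dict, NamedTuple, Optional

# Hypothetical record: element names and values are illustrative only,
# not the official Darwin Core schema.
RECORD_XML = """<record>
  <ScientificName>Leucospermum tomentosum</ScientificName>
  <Country>South Africa</Country>
  <GlobalUniqueIdentifier>urn:lsid:example.org:specimens:0001</GlobalUniqueIdentifier>
</record>"""


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> Lsid:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into parts."""
    parts = text.strip().split(":")
    well_formed = (
        len(parts) in (5, 6)
        and parts[0].lower() == "urn"
        and parts[1].lower() == "lsid"
        and all(parts[2:5])
    )
    if not well_formed:
        raise ValueError(f"not a well-formed LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


def parse_record(xml_text: str) -> Dict[str, str]:
    """Flatten a one-level XML record into a tag -> text dictionary."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


record = parse_record(RECORD_XML)
lsid = parse_lsid(record["GlobalUniqueIdentifier"])
print(record["ScientificName"], "|", lsid.authority, lsid.namespace, lsid.object_id)
```

Resolving an LSID to its data requires the resolution protocol mentioned above, which this sketch does not attempt.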

Examples of resources provided by GBIF

free

GBIF Training Manual 1: Digitisation of Natural History Collections

CONTENTS
• Introduction
• The Uses of Primary Species Occurrence Data
• Initiating a Natural History Collection Digitisation Project
• Principles of Data Quality
• Principles and Methods of Data Cleaning
• BioGeomancer Guide to Best Practices for Georeferencing
• Guide to Best Practices for Generalizing
• Glossary and Acronym Expansion

To be released by end February 2007.

Observational Data Task Force

The quantum of observational data is unprecedented

Over 60% of GBIF-mediated data is observational

Observational Data Task Group
• Advise GBIF on mobilisation of observational data
• Criteria for Observational Data Sharing Infrastructure
• Metadata Schema for Observational Data
• Protocols / Standards for observational data exchange / sharing
• Best Practices Guide for observational data management
• Encourage participation of potential data providers

Report by September 2008

Enhanced support for data providers

Broader range of supported import formats and protocols

Occurrence data
• Darwin Core (original v1.2, MaNIS, OBIS, new v2.0 with extensions)
• ABCD (v1.20, v2.06)

Taxonomic data
• Catalogue of Life CD-ROM (moving to dynamic checklist)
• Nomenclators via tab-delimited lists of LSIDs (work under way)
• Data from ECAT projects (models and tools under way)

Other resources
• Discussions under way with other resources (GenBank, BOLD, ARKive)
• General support for handling XML and tab-delimited formats

Validation and annotation of data during indexing
• Presence of required fields
• Consistency between country name and coordinates
• Reports for data providers

Clear separation between “raw” and “processed” index data
• Scientific name string versus interpreted taxon
• Country name string versus interpreted country

“Home page” for each data resource
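
The indexing-time checks listed above can be sketched as follows. This is a hedged illustration, not GBIF's indexing code: the field names, the `validate` and `index_record` helpers, the alias table and the bounding box (a rough approximation for mainland South Africa) are all assumptions made for this example.

```python
from typing import Dict, List, Optional

# All names and values below are assumptions made for this sketch.
REQUIRED_FIELDS = ("scientific_name", "country")
COUNTRY_ALIASES = {"south africa": "South Africa", "za": "South Africa"}
# Rough box: (min_lat, max_lat, min_lon, max_lon) in decimal degrees.
COUNTRY_BOXES = {"South Africa": (-35.0, -22.0, 16.0, 33.0)}


def interpret_country(raw_value: str) -> Optional[str]:
    """Map a raw country string to an interpreted country, if known."""
    return COUNTRY_ALIASES.get(raw_value.strip().lower())


def validate(raw: Dict[str, str]) -> List[str]:
    """Return a list of issues that could be reported to the data provider."""
    issues = []
    for field in REQUIRED_FIELDS:
        if not raw.get(field, "").strip():
            issues.append(f"missing required field: {field}")
    box = COUNTRY_BOXES.get(interpret_country(raw.get("country", "")))
    lat_text, lon_text = raw.get("latitude"), raw.get("longitude")
    if box and lat_text and lon_text:
        try:
            lat, lon = float(lat_text), float(lon_text)
        except ValueError:
            issues.append("coordinates are not numeric")
        else:
            min_lat, max_lat, min_lon, max_lon = box
            if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
                issues.append("coordinates fall outside the country bounding box")
    return issues


def index_record(raw: Dict[str, str]) -> Dict[str, object]:
    """Keep the raw strings untouched and store interpretations separately."""
    return {
        "raw": dict(raw),
        "processed": {"country": interpret_country(raw.get("country", ""))},
        "issues": validate(raw),
    }


report = index_record({
    "scientific_name": "Example species",
    "country": "ZA",
    "latitude": "55.7",
    "longitude": "12.6",
})
print(report["issues"])
```

Keeping the provider's original strings under `raw` and the interpreted values under `processed` mirrors the separation described in the list; a bounding box is only a coarse consistency test, and a real check would likely use country polygons.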


Training, Capacity Building, Mentoring

• Training programs on how to share data
• Training on Ecological Niche Modeling
• Mentoring to developing countries
• Help Desk services

Call for Action!

With GBIF’s decentralised approach of NBIFs, RBIFs, and ThBIFs, Africa has lots to contribute.....

Individual, institutional, national, regional and global level!

How to contact GBIF:

Web site: www.gbif.org
Data portal: www.gbif.net

GBIF Secretariat
Universitetsparken 15
2100 Copenhagen
Denmark

E-mail: [email protected]
Tel: +45 3532 1470
Fax: +45 3532 1480

GBIF Secretariat building, supported by a grant from the Aage V. Jensens Fonde

Merci beaucoup / Thank you

Questions?