II Course on GBIF Node Management
Arusha, Tanzania, 31st October and 1st November 2008

Biodiversity and Biodiversity Informatics

Juan C. Bello, Senior Programme Officer for NODES, GBIF Secretariat

Based on a presentation by F. Pando

Photo by Scyza, stock.xchng




Biodiversity and Biodiversity Informatics

SUMMARY

1. Introduction: about this talk

2. Definition of biodiversity

3. Biodiversity information and its accessibility

a) Primary biodiversity data

b) Profiles, standards and protocols

c) Names

d) Literature

e) Species-level information

4. Challenges

5. Conclusions


INTRODUCTION

The GBIF Secretariat, Denmark

The Global Biodiversity Information Facility

This lecture is presented by Juan C. Bello, Senior Programme Officer for NODES at the GBIF Secretariat in Copenhagen, Denmark.

The lecture is a broad introduction to biodiversity, biodiversity information and biodiversity informatics. It describes what counts as biodiversity information, its levels of organization and its sources, and it presents the challenges of working with biodiversity data together with the solutions available to date.

It also touches on other topics relevant to managing biodiversity information, such as data quality and intellectual property rights.


BIODIVERSITY


Biodiversity levels, a caveat

• Genetic variability: refers to the genetic differences that occur within a particular species that can be passed along to offspring.

• Species diversity: refers to the variety of species that occur within a particular area. Collectively, all of the individuals of a particular species in a particular area form a population.

• Community diversity: refers to the associations of species within an area. These associations, also called biological communities, are the living components of ecosystems.

• Landscape/regional diversity: refers to the variety of ecosystems and communities that can be found within the landscape.


After Noss & Cooperrider (1994), Decker et al. (1991) and Riley & Mohr (1994)


The nature of biodiversity information

Primary data
• Specimens
• Observations
• Literature records

Names
• Accepted names & synonyms
• Type information
• Taxonomic schemas

Taxa
• Descriptions, identification keys, conservation, uses, distribution, habitat, etc.

Literature
• References
• TL2 & BPH
• Keywords

Adapted from: Leenhouts, Regnum Veg. 58. 1968.


Accessibility of biodiversity information

Biodiversity information was/is not easily accessible:

• It is not in digital form

• It is scattered

This is a problem, not just for scientists, but for society


Biodiversity linkages

Developed by Martin Sharman, European Commission


INFORMATION ACCESSIBILITY


Biodiversity Information

Where is the information?

• In a myriad of places
• Access hampered
• Limited use (outside the scientific community)
• Substitutes sought

Picture of the past?


1.5-3 × 10^9 specimens

3000-6000 institutions

…and this is just GBIF members!

600,000,000 specimens


Primary data: collections


Primary data: collections

[Figure labels: people; habitat / ecological; historical / phenological; taxonomic; spatial / geographic; molecular]


Digitised vs. non-digitised

• One database: multiple indexing, multiple uses
• One (card) index: n-1 difficult tasks


Scattered sources

118,809 records in 155 data sets from 20 countries (marked in red on the map)


Access to scattered sources: a workable solution

Precursors:

• REMIB - CONABIO (México): http://www.conabio.gob.mx/remib/doctos/remib_esp.html

• TSA - Univ. Kansas: http://speciesanalyst.net/index.html

Unified access, distributed information: The GBIF Network - data.gbif.org

Image from the Spanish GBIF Node (www.gbif.es)


Access to scattered sources: a workable solution

It is difficult to integrate data: the GBIF experience

• Huge variation in ScientificName, Classification, Country
• Over 150 million occurrence records integrated
• 4 versions of Darwin Core and 4 of ABCD
• 4 standard data provider packages
• The range of alternatives for a content item, e.g. date, makes integration very hard
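The date problem can be made concrete with a minimal sketch. It assumes three hypothetical provider formats (illustrative only, not a list taken from GBIF) and normalises each to ISO 8601. Values that match none of the formats are returned as None rather than guessed.

```python
from datetime import datetime
from typing import Optional

# Hypothetical formats that different data providers might use.
# These are assumptions for illustration, not a GBIF specification.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y"]


def normalise_date(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            # This format does not fit; try the next candidate.
            continue
        return parsed.date().isoformat()
    return None


if __name__ == "__main__":
    for value in ["2008-10-31", "31/10/2008", "31.10.2008", "October 2008"]:
        print(value, "->", normalise_date(value))
```

A string such as 01/11/2008 parses under the day-first rule assumed here, but a provider that writes month-first dates would mean a different day by it. No amount of parsing resolves that without knowing the provider's convention, which is one reason integration is hard.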


Unified access, distributed information: common profile + standards (1/3)

A. Common profile:

Each particular database structure is translated into a “profile”: a table with a common list of fields that can be accessed in a uniform manner.

B. Standards: Biodiversity Information Standards (TDWG), www.tdwg.org
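The idea of a common profile can be sketched as follows. The two source databases, their local field names and the three profile fields are all hypothetical, chosen only to show how differently structured records end up in one uniform table.

```python
from typing import Dict, List

# Hypothetical common profile: the shared list of fields.
PROFILE_FIELDS: List[str] = ["ScientificName", "Country", "Collector"]

# Hypothetical mappings from each database's local field names
# to the profile field names.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "herbarium_db": {
        "taxon": "ScientificName",
        "pais": "Country",
        "leg": "Collector",
    },
    "museum_db": {
        "species_name": "ScientificName",
        "country_name": "Country",
        "collected_by": "Collector",
    },
}


def to_profile(source: str, record: Dict[str, str]) -> Dict[str, str]:
    """Translate one local record into the common profile."""
    mapping = MAPPINGS[source]
    # Start with every profile field empty so the output shape is uniform.
    profiled = {field: "" for field in PROFILE_FIELDS}
    for local_name, value in record.items():
        profile_name = mapping.get(local_name)
        if profile_name is not None:
            profiled[profile_name] = value
    return profiled


if __name__ == "__main__":
    print(to_profile("herbarium_db", {"taxon": "Dryopteris filix-mas", "pais": "Spain"}))
    print(to_profile("museum_db", {"species_name": "Dryopteris filix-mas", "shelf": "B12"}))
```

Local fields with no profile equivalent (the made-up `shelf` above) are simply dropped, so a profile trades completeness for uniform access.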

PROFILES & STANDARDS


Unified access, distributed information: common profile + standards (2/3)

http://www.tdwg.org/standards.html


• To enable interoperability;

• To avoid the Tower of Babel effect (building isolated, non-communicating data silos);

• Without standards, sharing data between any two databases would require mapping their schemas pair by pair (many to many);

• With a federation schema, each database is mapped just once, to the shared schema (many to one).
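The difference between many-to-many and many-to-one mapping can be put in numbers. The sketch below assumes one mapping per pair of databases in the pairwise case (counting each direction separately would double that figure) and one mapping per database in the federated case.

```python
def pairwise_mappings(n_databases: int) -> int:
    """Mappings needed if every pair of databases is mapped directly."""
    return n_databases * (n_databases - 1) // 2


def federated_mappings(n_databases: int) -> int:
    """Mappings needed if every database maps once to a shared schema."""
    return n_databases


if __name__ == "__main__":
    for n in (2, 10, 155):
        print(n, pairwise_mappings(n), federated_mappings(n))
```

For the 155 data sets quoted on an earlier slide, that is 11,935 pairwise mappings against 155 mappings to a shared schema.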


Unified access, distributed information: common profile + standards (3/3)


• Primary occurrence records: Darwin Core (DwC), Access to Biological Collections Data (ABCD)
• Natural history collections: Natural Collections Descriptions (NCD)
• Taxon-level information: Taxon Concept Schema (TCS), Species Profile Model (SPM), Plinian Core
• Ecological data: Ecological Metadata Language (EML)
• Geospatial data: Geography Markup Language (GML)


Darwin Core (DwC)

“a standard designed to facilitate the exchange of information about the geographic occurrence of species and the existence of specimens in collections”

http://wiki.tdwg.org/DarwinCore/

• Provides a flat set of 46 elements grouped in 7 element sets (record-level, taxonomic, identification, locality, collecting event, biological, reference).

• Provides an extension mechanism to allow further, more specialised element sets to be included (geospatial, curatorial, palaeontological, interaction).

• Best used with simple data that fits into a flat spreadsheet.
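What "fits into a flat spreadsheet" means can be shown with a small sketch that writes occurrence records as CSV rows, one row per record and one column per element. The four column names are illustrative assumptions, not the official Darwin Core element list.

```python
import csv
import io
from typing import Dict, List

# Illustrative column names only; not the official Darwin Core elements.
FIELDS: List[str] = ["ScientificName", "Country", "Locality", "CollectionDate"]


def records_to_csv(records: List[Dict[str, str]]) -> str:
    """Serialise flat records: one row per record, one column per field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    # Fields missing from a record are written as empty cells.
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [
        {"ScientificName": "Dryopteris filix-mas", "Country": "Spain",
         "CollectionDate": "2008-10-31"},
    ]
    print(records_to_csv(sample))
```

Each record carries exactly one value per column, so anything that repeats, such as several identifications of the same specimen, has no natural place in this layout.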


ABCD

“ABCD Schema is a common data specification for biological collection units, including living and preserved specimens, along with field observations that did not produce voucher specimens. It is intended to support the exchange and integration of detailed primary collection and observation data.”

• Originally developed under the EU BioCASE project (The Biological Collections Access Service for Europe);

• More complex and comprehensive than DwC;

• Nearly 1200 concepts in a hierarchical structure supporting repeating elements and complex types;

• DwC and ABCD strive for compatibility: it is possible to map ABCD elements to their DwC equivalents;

• Best used for detailed records with, e.g., multiple identifications or linked images.
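Mapping a hierarchical record onto a flat one can be sketched as below. The nested keys (`identifications`, `gathering`, `images`, `preferred`) are made-up stand-ins, not real ABCD element names, and the rule of keeping only the preferred identification is an assumption chosen for illustration.

```python
from typing import Any, Dict


def flatten_unit(unit: Dict[str, Any]) -> Dict[str, str]:
    """Reduce a nested collection-unit record to a flat record.

    Repeating elements cannot survive in a flat layout, so only one
    identification is kept and linked images are reduced to a count.
    """
    identifications = unit.get("identifications", [])
    # Prefer the identification flagged as preferred, else the last one.
    preferred = next((i for i in identifications if i.get("preferred")), None)
    if preferred is None and identifications:
        preferred = identifications[-1]
    gathering = unit.get("gathering", {})
    return {
        "ScientificName": preferred["name"] if preferred else "",
        "Country": gathering.get("country", ""),
        "ImageCount": str(len(unit.get("images", []))),
    }


if __name__ == "__main__":
    unit = {
        "identifications": [
            {"name": "Name A", "preferred": False},
            {"name": "Name B", "preferred": True},
        ],
        "gathering": {"country": "Spain"},
        "images": ["sheet_front.jpg", "sheet_label.jpg"],
    }
    print(flatten_unit(unit))
```

The mapping works in this direction, but it is lossy: the earlier identification and the image links themselves are not recoverable from the flat record.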


NAMES & TAXONOMIC CONCEPTS

Data integration & interoperability: taxonomic concepts

Access to information: one collection / source

Access to information: multiple collections

Biodiversity information users must confront an impediment: one species may be addressed under different names; a name may refer to different species or concepts of a species.

Primary data

Names

Concepts

Information, as it is currently presented, only makes sense to the specialist.
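The name/concept problem described on this slide can be modelled with a small lookup. Every name, treatment and concept identifier below is a made-up placeholder, not real taxonomy. The point is only that the relation between names and concepts runs many to many.

```python
from typing import List, Tuple

# (name, taxonomic treatment, concept id): all placeholder values.
ASSERTIONS: List[Tuple[str, str, str]] = [
    ("Aus bus", "Treatment X", "concept-1"),
    ("Aus cus", "Treatment X", "concept-2"),
    # Treatment Y uses "Aus bus" in a broader sense (a different concept).
    ("Aus bus", "Treatment Y", "concept-3"),
    # Treatment Y calls the species of concept-1 by another name.
    ("Aus dus", "Treatment Y", "concept-1"),
]


def concepts_for_name(name: str) -> List[str]:
    """All concepts a given name has been used for, across treatments."""
    return sorted({concept for n, _, concept in ASSERTIONS if n == name})


def names_for_concept(concept: str) -> List[str]:
    """All names under which a given concept has been addressed."""
    return sorted({n for n, _, c in ASSERTIONS if c == concept})


if __name__ == "__main__":
    print(concepts_for_name("Aus bus"))
    print(names_for_concept("concept-1"))
```

A record labelled only with a name is therefore ambiguous until the treatment it follows is known, which is why integrating multiple collections is harder than querying a single one.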


An example: the male fern


Example: names & concepts

[Figure labels: D. filix-mas, D. affinis, D. oreades; ssp. filix-mas, ssp. affinis, ssp. oreades, ssp. stilluppensis, ssp. borreri; Fl. iberica]


Access to information & integration. How big is the problem?

Koperski & al. 2000. Referenzliste der Moose Deutschlands: 45% of treated taxa are unstable

Current alternative classifications (e.g. Spain):

• Fl. iberica
• Fl. Països Catalans
• Fl. Andalucía Occidental
• Fl. Europaea
• Med. Check List


Accessibility of biodiversity information: names, taxonomic views

• Taxonomic views of names provide names arranged according to a specific taxonomic treatment:

Species 2000: http://www.sp2000.org

ITIS: http://www.itis.usda.gov

UBIO: http://www.ubio.org

• Many others at regional level, e.g.:

Index Synonymique de la Flore de France: http://www.dijon.inra.fr/flore-france/

Anthos (flora of Spain): http://www.programanthos.org/


LITERATURE

As a way to enable access to scientific knowledge, “digital libraries” are becoming commonplace.

One of the first and biggest: Bibliothèque numérique de la Bibliothèque nationale de France: http://gallica.bnf.fr/

Developments in the area: a project to join current initiatives by big institutions, the “Biodiversity Heritage Library” project: http://www.bhl.org/

The challenge is open access and, again, integration.


SPECIES

Accessibility of biodiversity information

Species level information (Species bank):

Information on taxa specifically:

• not associated with specimens
• independent of taxonomic schemas

A survey commissioned by GBIF in 2005 gathered 298 resources. Report & database available at:

http://circa.gbif.net/Public/irc/gbif/pr/library?l=/speciesbank_workshop/database_speciesbanks


Species level information (Species bank):

Examples:

• INBio's UBIs - http://darnis.inbio.ac.cr/ubis/
• Fishbase - http://www.fishbase.org
• efloras - http://www.efloras.org
• Encyclopedia of Life - http://www.eol.org


Plinian Core

www.pliniancore.org

www.gbif.es/pliniancore


CHALLENGES

• To cross the digital divide

From paper to digital form; moving our science into e-taxonomy (maybe from digital to paper too!)

• To integrate resources

Within and between areas (names, specimens, species, literature)

• To cross the science-society divide

Make scientific knowledge accessible to society and used to make sound political decisions


CONCLUSIONS


• Any collection (resource) is an important piece for understanding biodiversity on Earth at multiple levels.

• The way ahead is to provide that “understanding of biodiversity” to society.

• Data providers (collection managers, scientific database administrators, project data managers, …) are in the best position to make the best use of data.


Biodiversity and Biodiversity Informatics

Juan C. Bello
Senior Programme Officer for NODES
GBIF Secretariat
Universitetsparken 15
DK-2100 Copenhagen, Denmark
Tel: +45 3532 1489
Fax: +45 3532 1480
Email: [email protected]
www.gbif.org
