
EIA Biodiversity Data Mobilisation



Page 1: EIA Biodiversity Data Mobilisation

GLOBAL BIODIVERSITY INFORMATION FACILITY

WWW.GBIF.ORG

Publishing EIA Biodiversity Data:

Technology and Infrastructure

Vishwas Chavan, Nick King and Francois Rogers
Global Biodiversity Information Facility

[email protected]

Scoping Workshop on Developing an EIA Biodiversity Data Publishing Framework in South Africa
2-4 March 2010, Cape Town, South Africa

Page 2: EIA Biodiversity Data Mobilisation

Contents

• EIA Biodiversity Data: Types and formats
• Data Capture & Digitisation tools
• Data Discovery
• Data Publishing
• Data Quality & fitness-for-use
• Data Hosting Centers
• Community Building Platforms

Page 3: EIA Biodiversity Data Mobilisation

What are the challenges?

• More data types
• Richer user interface
• Better management
• Richer content
• Better synchronisation
• Improved discovery

Page 4: EIA Biodiversity Data Mobilisation

EIA BIODIVERSITY DATA: TYPES AND FORMATS

Page 5: EIA Biodiversity Data Mobilisation

Evidence

Metadata

Taxon names

Taxon concepts

Indices, Nomenclators, Namebanks

Biology, Conservation, Ecology, Distribution, Phylogenies ...

Geolocation, Country, Collector, Date …

Observation

Voucher specimen, Blood sample, DNA Barcode, Image, Audio, Video ...

Literature

Species banks, BHL, Plazi.org ...

EIA Biodiversity data are very diverse

Page 6: EIA Biodiversity Data Mobilisation

DATA CAPTURE AND DIGITISATION TOOLS

Page 7: EIA Biodiversity Data Mobilisation

Data Capture and Digitisation Tools

Florin, Pandora, Taxis, Cassia, FieldNote, Mandala, ATTA, BirdRecorder

Page 8: EIA Biodiversity Data Mobilisation

uBio Tools

• Name recognition tool (FindIT)
• Author abbreviation resolver
• Checking classification (TSN name mapper)
• Deconstruct scientific name (ParseIT)
• Find scientific name (CrawlIT)
• etc…

http://www.ubio.org
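To illustrate what "deconstructing" a scientific name involves, here is a minimal sketch of a name parser. It is not uBio's ParseIT code or API: the regular expression is a hypothetical simplification that only handles names shaped like Genus [species] [rank] [infraspecific epithet] [authorship], and it ignores hybrids, subgenera and the many other cases a real parser must cover.

```python
import re
from typing import Optional, TypedDict


class ParsedName(TypedDict):
    genus: str
    species: Optional[str]
    rank: Optional[str]
    infraspecific: Optional[str]
    authorship: Optional[str]


# Hypothetical, simplified pattern: Genus [species] [rank] [infraspecific] [Authorship].
NAME_PATTERN = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)"
    r"(?:\s+(?P<species>[a-z][a-z-]+))?"
    r"(?:\s+(?P<rank>subsp\.|ssp\.|var\.|f\.)?\s*(?P<infraspecific>[a-z][a-z-]+))?"
    r"(?:\s+(?P<authorship>[A-Z(].*))?$"
)


def parse_name(name: str) -> Optional[ParsedName]:
    """Split a scientific name into its parts; return None if it does not fit the pattern."""
    normalised = " ".join(name.split())  # collapse runs of whitespace
    match = NAME_PATTERN.match(normalised)
    if match is None:
        return None
    return ParsedName(**match.groupdict())


print(parse_name("Panthera leo persica (Meyer, 1826)"))
# -> {'genus': 'Panthera', 'species': 'leo', 'rank': None, 'infraspecific': 'persica', 'authorship': '(Meyer, 1826)'}
```

Returning None instead of guessing keeps unparseable strings visible for manual review.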

Page 9: EIA Biodiversity Data Mobilisation

GBIF Templates

• Capture data in Darwin Core (DwC) compatible format: Occurrence Data Template, Names Data Template
• Facilitate authoring ‘resource metadata’: Occurrence template, Documentation for occurrence template
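A DwC-compatible template is essentially a table whose column headers are Darwin Core term names. The sketch below writes such a table as CSV. The term names are real Darwin Core terms, but this column selection is an assumption and not the exact layout of the GBIF occurrence template; the identifier in the example is hypothetical.

```python
import csv
import io
from typing import Dict, List

# Real Darwin Core term names; this particular selection is an assumption,
# not the exact column set of the GBIF occurrence template.
OCCURRENCE_COLUMNS = [
    "occurrenceID", "basisOfRecord", "scientificName", "eventDate",
    "decimalLatitude", "decimalLongitude", "country", "recordedBy",
]


def write_occurrences(records: List[Dict[str, str]]) -> str:
    """Serialise occurrence records as CSV text whose headers are DwC terms."""
    buffer = io.StringIO()
    # extrasaction="raise" rejects keys that are not template columns (e.g. a
    # misspelt term); restval="" leaves cells for absent terms empty.
    writer = csv.DictWriter(buffer, fieldnames=OCCURRENCE_COLUMNS,
                            restval="", extrasaction="raise")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


print(write_occurrences([{
    "occurrenceID": "urn:example:occ:1",  # hypothetical identifier
    "basisOfRecord": "HumanObservation",
    "scientificName": "Protea cynaroides",
    "eventDate": "2010-03-02",
    "decimalLatitude": "-33.92",
    "decimalLongitude": "18.42",
    "country": "South Africa",
}]))
```

Rejecting unknown column names at capture time is a cheap way to stop misspelt terms from reaching the published dataset.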

Page 10: EIA Biodiversity Data Mobilisation

GBIF Informatics Architecture

Improved access to Names, Metadata and Primary Biodiversity Data

Distributed GBIF informatics architecture

Faster and easier publishing of data

Page 11: EIA Biodiversity Data Mobilisation

DATA DISCOVERY

• GBRDS REGISTRY

• METADATA CATALOGUE

GBRDS: Global Biodiversity Resources Discovery System

Page 12: EIA Biodiversity Data Mobilisation

DATA DISCOVERY: GBRDS REGISTRY

Page 13: EIA Biodiversity Data Mobilisation

GBRDS, a Discovery System

Consumers → Discovery System: Discovering, Searching, Retrieving

Data Publishers, Service Publishers, Others… → Discovery System: Registering
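The roles on this slide (publishers register, consumers discover, search and retrieve) can be sketched as a tiny in-memory registry. This is only an illustration of the pattern: the real GBRDS is a distributed web service whose interface is not described here, and the class names, fields and example.org URLs below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class RegistryEntry:
    """A registered resource; the fields are illustrative, not the GBRDS schema."""
    title: str
    publisher: str
    kind: str           # e.g. "dataset" or "service"
    access_point: str   # where a consumer can retrieve the resource
    keywords: Set[str] = field(default_factory=set)


class DiscoveryRegistry:
    """Publishers register entries; consumers discover them and follow access points."""

    def __init__(self) -> None:
        self._entries: List[RegistryEntry] = []

    def register(self, entry: RegistryEntry) -> None:
        self._entries.append(entry)

    def search(self, keyword: str, kind: Optional[str] = None) -> List[RegistryEntry]:
        wanted = keyword.casefold()
        return [
            entry for entry in self._entries
            if wanted in {k.casefold() for k in entry.keywords}
            and (kind is None or entry.kind == kind)
        ]


registry = DiscoveryRegistry()
registry.register(RegistryEntry("Fynbos plant survey", "Example Institute", "dataset",
                                "http://example.org/fynbos.zip", {"plants", "EIA"}))
registry.register(RegistryEntry("Name lookup", "Example Names Service", "service",
                                "http://example.org/names", {"names"}))
for hit in registry.search("eia"):
    print(hit.title, "->", hit.access_point)
```

The registry stores descriptions and access points rather than the data itself; retrieval happens at the publisher's access point.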

Page 14: EIA Biodiversity Data Mobilisation

That links to resources…

• Who? Institutions, Collections…
• What? Data, Services, GUID/LSID…
• Where? Location, Access points…
• When? Temporal Scope…
• How? Formats, protocols, qualities

A distributed service which resolves to information resources
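The "What?" row mentions GUIDs such as LSIDs, which have the form `urn:lsid:authority:namespace:object` with an optional trailing revision. The sketch below parses that form and maps it to a URL. The resolver table and its URLs are hypothetical; real LSID resolution locates the authority's resolution service over the network rather than reading a local dictionary.

```python
from typing import Dict, NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(identifier: str) -> LSID:
    """Split 'urn:lsid:<authority>:<namespace>:<object>[:<revision>]' into parts."""
    parts = identifier.strip().split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {identifier!r}")
    revision = parts[5] if len(parts) == 6 else None
    # The authority is a domain name, so compare it case-insensitively.
    return LSID(parts[2].lower(), parts[3], parts[4], revision)


# Hypothetical table standing in for the network lookup a real resolver performs.
RESOLVERS: Dict[str, str] = {"ubio.org": "http://resolver.example.org/ubio/"}


def resolve(identifier: str) -> str:
    """Map an LSID to the URL of an information resource (illustrative only)."""
    lsid = parse_lsid(identifier)
    base = RESOLVERS.get(lsid.authority)
    if base is None:
        raise LookupError(f"no resolver registered for {lsid.authority!r}")
    return f"{base}{lsid.namespace}/{lsid.object_id}"


print(resolve("urn:lsid:ubio.org:namebank:11815"))
# -> http://resolver.example.org/ubio/namebank/11815
```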

Page 15: EIA Biodiversity Data Mobilisation

Global Biodiversity Resources Discovery System

• Institutions/Collections
• LSIDs/DOI/GUIDs
• Standards
• Protocols
• Resources
• Services/Applications
• etc…

Page 16: EIA Biodiversity Data Mobilisation


GBRDS Registry Release: April 2010

Page 17: EIA Biodiversity Data Mobilisation

DATA DISCOVERY: METADATA CATALOGUES

Page 18: EIA Biodiversity Data Mobilisation

Two perspectives on metadata

Data Producer Perspective
• Document data with minimum effort
• Assess the value of the data for others
• Bridge the gap between data owners and users
• Educate users about the characteristics of the data

User Perspective
• Discover if data exists
• Identify source, provenance
• Make judgement about data quality and usability before getting it
• Minimise costs involved in the search, retrieval, integration and use of the data

Craglia: http://www.ec-gis.org/Workshops/6ec-gis/papers/craglia-metadata.doc

Page 19: EIA Biodiversity Data Mobilisation

Two levels of metadata

Discovery Metadata
Discover if a resource exists; get information on:
• Ownership
• Location
• How to get further information

Full Metadata
Provides a full description of the resource, including:
• Data quality
• Data lineage
• Full access and exploitation
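The two levels can be treated as nested sets of required fields: a record that covers ownership, location and contact details is discoverable, and one that also documents quality, lineage and access is full. The field names below are hypothetical stand-ins, not taken from any of the metadata standards that follow.

```python
from typing import Dict

# Hypothetical field names for the two levels of metadata described above.
DISCOVERY_FIELDS = {"owner", "location", "contact"}
FULL_FIELDS = DISCOVERY_FIELDS | {"data_quality", "lineage", "access_conditions"}


def metadata_level(record: Dict[str, str]) -> str:
    """Return 'full', 'discovery' or 'incomplete' depending on which fields are filled in."""
    filled = {key for key, value in record.items() if value.strip()}
    if FULL_FIELDS <= filled:
        return "full"
    if DISCOVERY_FIELDS <= filled:
        return "discovery"
    return "incomplete"


record = {"owner": "Example Consultancy", "location": "http://example.org/data",
          "contact": "data@example.org"}
print(metadata_level(record))  # -> discovery
```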

Page 20: EIA Biodiversity Data Mobilisation

Metadata Standards

• Natural Collections Descriptions (NCD)
• Ecological Metadata Language (EML)
• ISO 19115/19139
• FGDC Biological Data Profile
• Dublin Core
• MRTG Multimedia Metadata Schema
• IPT 1.1 Metadata Profile
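Of these standards, Dublin Core is the simplest: fifteen elements (title, creator, date, coverage, rights and so on) in the `http://purl.org/dc/elements/1.1/` namespace. A minimal sketch of a discovery-level record built with Python's standard library follows; the `metadata` wrapper element and the example values are assumptions, not part of any GBIF profile.

```python
import xml.etree.ElementTree as ET
from typing import Dict

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core Metadata Element Set namespace
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher", "contributor", "date",
    "type", "format", "identifier", "source", "language", "relation", "coverage", "rights",
}
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: Dict[str, str]) -> str:
    """Serialise element-name -> value pairs as a simple Dublin Core XML record."""
    root = ET.Element("metadata")  # the wrapper element name is an assumption
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not a Dublin Core element")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(dublin_core_record({
    "title": "Vegetation survey for a proposed development site",  # illustrative values
    "creator": "Example Consultancy",
    "coverage": "Western Cape, South Africa",
    "date": "2010-03-02",
}))
```

Richer profiles such as EML add the quality and lineage detail that Dublin Core leaves out.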

Page 21: EIA Biodiversity Data Mobilisation

DATA PUBLISHING

Page 22: EIA Biodiversity Data Mobilisation

Key Components: the IPT

The Integrated Publishing Toolkit (IPT) is a state-of-the-art tool to simplify the mobilisation of biodiversity information resources such as Names, Metadata and primary biodiversity data.

Data Publisher: Registration (GBRDS) + Publishing of Names, Metadata, Primary biodiversity data etc…

Page 23: EIA Biodiversity Data Mobilisation

Simple process!

The Integrated Publishing Toolkit (IPT) is designed to simplify the mapping, indexing and harvesting of Names, Metadata and Primary Biodiversity Data!

Page 24: EIA Biodiversity Data Mobilisation

GBIF Integrated Publishing Toolkit (IPT)

• Open source Java web application
• Bypasses limitations of traditional wrapper tools in publishing large amounts of data by publishing whole datasets as DwC-Archive dumps (especially useful for small data publishers or those with little or no internet access)
• Has a richer environment than current wrapper tools, providing some data cleaning, visualisation capabilities, and the ability to publish dataset metadata
• Documentation and download: http://code.google.com/p/gbif-providertoolkit/
• Demo site: http://ipt.gbif.org
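A DwC-Archive "dump" packages a whole dataset as one file: a zip holding a delimited data file plus a `meta.xml` descriptor that maps each column to a Darwin Core term. The sketch below assumes the layout described in the Darwin Core text guidelines; it is not the IPT's own code, and the file name `occurrence.txt` and the chosen terms are illustrative.

```python
import io
import zipfile
from typing import Dict, List

DWC = "http://rs.tdwg.org/dwc/terms/"
TERMS = ["occurrenceID", "basisOfRecord", "scientificName", "eventDate", "country"]


def meta_xml(terms: List[str]) -> str:
    """Descriptor mapping each column of occurrence.txt to a Darwin Core term."""
    fields = "\n".join(
        f'    <field index="{i}" term="{DWC}{term}"/>' for i, term in enumerate(terms)
    )
    return (
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">\n'
        '  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"\n'
        '        fieldsEnclosedBy="" ignoreHeaderLines="1"\n'
        f'        rowType="{DWC}Occurrence">\n'
        "    <files><location>occurrence.txt</location></files>\n"
        '    <id index="0"/>\n'
        f"{fields}\n"
        "  </core>\n"
        "</archive>\n"
    )


def build_archive(rows: List[Dict[str, str]]) -> bytes:
    """Return the bytes of a zip holding a tab-delimited data file plus meta.xml."""
    lines = ["\t".join(TERMS)]  # header line, skipped via ignoreHeaderLines="1"
    for row in rows:
        values = [row.get(term, "") for term in TERMS]
        if any("\t" in v or "\n" in v for v in values):
            raise ValueError("remove tabs and newlines from values before export")
        lines.append("\t".join(values))
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("occurrence.txt", "\n".join(lines) + "\n")
        archive.writestr("meta.xml", meta_xml(TERMS))
    return buffer.getvalue()
```

Because the whole dataset travels as one file, a publisher with poor connectivity can upload or hand over the archive once instead of keeping a live wrapper service online.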

Page 25: EIA Biodiversity Data Mobilisation

IPT Publishes Through…

More to come…

* Darwin Core (Text-Archive) based on standard submitted to TDWG for review Feb 2009

Page 26: EIA Biodiversity Data Mobilisation

IPT Demo

• Screencast of IPT demo
• GBIF Help Desk ([email protected])

IPT 1.1 Release: April 2010

Page 27: EIA Biodiversity Data Mobilisation

NAMES DATA

Page 28: EIA Biodiversity Data Mobilisation

Scope of the Global Names Architecture

Referencing names in Checklists to a common Nomenclatural Index
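At its simplest, referencing checklist names to a common index means normalising each name into a comparison key and looking it up. This sketch does exact matching after normalisation only; real name services also handle authorship, spelling variants and fuzzy matches. The index entries and `NI:` identifiers are hypothetical, and unmatched names come back as None for a person to review.

```python
import unicodedata
from typing import Dict, List, Optional


def name_key(name: str) -> str:
    """Comparison key: Unicode NFC, single spaces, case-folded."""
    return " ".join(unicodedata.normalize("NFC", name).split()).casefold()


def link_to_index(checklist: List[str], index: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each checklist name to an identifier in a nomenclatural index, or None."""
    keyed = {name_key(name): identifier for name, identifier in index.items()}
    return {name: keyed.get(name_key(name)) for name in checklist}


# Hypothetical index entries and identifiers.
index = {"Protea cynaroides": "NI:0001", "Vachellia karroo": "NI:0002"}
print(link_to_index(["Protea  cynaroides", "Aloe ferox"], index))
# -> {'Protea  cynaroides': 'NI:0001', 'Aloe ferox': None}
```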

Page 29: EIA Biodiversity Data Mobilisation

Checklist Bank – A Name Services brokerage

Global broker of taxonomic data

Index of Taxonomic Catalogues and Annotated Checklists

Extends the GBIF network to support publishing Species-level data

Page 30: EIA Biodiversity Data Mobilisation

Publishing Checklists to GBIF

• Using the Integrated Publishing Toolkit
• Via pre-composed Spreadsheet templates
• Exporting according to DwC Archive format and registering a local data file (self-serve)
• GBIF desktop publishing tool
• Other taxonomic editors (EDIT/ITIS) that support DwC Archive format

Page 31: EIA Biodiversity Data Mobilisation

Desktop Annotated Checklist Builder

Create, manage, publish

• Synonymised checklists
• Vernacular Names
• Distribution data
• Bibliography
• Type/Specimen data

Mac OS / Windows

Publishes “GBIF-ready” format

DwC Archive – simple, extensible Text-based format

Q3 2010
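In a synonymised checklist, each synonym points at its accepted name. The sketch below follows those links using the Darwin Core terms `taxonID`, `scientificName`, `taxonomicStatus` and `acceptedNameUsageID`. It is not the Checklist Builder's code; the status values and example records are illustrative, and a hop limit guards against circular links in badly formed checklists.

```python
from typing import Dict, Optional, TypedDict


class Taxon(TypedDict):
    taxonID: str
    scientificName: str
    taxonomicStatus: str                 # assumed values: "accepted" or "synonym"
    acceptedNameUsageID: Optional[str]   # points from a synonym to its accepted taxon


def accepted_name(taxa: Dict[str, Taxon], taxon_id: str, max_hops: int = 10) -> str:
    """Follow acceptedNameUsageID links until an accepted name is reached."""
    current = taxa[taxon_id]
    for _ in range(max_hops):
        if current["taxonomicStatus"] == "accepted" or not current["acceptedNameUsageID"]:
            return current["scientificName"]
        current = taxa[current["acceptedNameUsageID"]]
    raise ValueError(f"synonym chain starting at {taxon_id!r} is too long or circular")


checklist: Dict[str, Taxon] = {
    "1": {"taxonID": "1", "scientificName": "Acacia karroo",
          "taxonomicStatus": "synonym", "acceptedNameUsageID": "2"},
    "2": {"taxonID": "2", "scientificName": "Vachellia karroo",
          "taxonomicStatus": "accepted", "acceptedNameUsageID": None},
}
print(accepted_name(checklist, "1"))  # -> Vachellia karroo
```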

Page 32: EIA Biodiversity Data Mobilisation

Controlled Vocabularies Server

• ISO: Countries
• ISO: Language
• DwC: Basis of Record
• DwC: Nomenclatural Status
• DwC: Sex (Gender)
• DwC: Taxonomic Status
• IUCN: Threat Status
• …

vocabularies.gbif.org

Vocabularies publishing platform – Internationalise all GBIF vocabularies

Page 33: EIA Biodiversity Data Mobilisation

Controlled Vocabularies Server

Create, manage, publish

Extensions to Darwin Core

Extend Occurrence Data

Extend Species Data

vocabularies.gbif.org

Tie to vocabularies that are also drafted and published to this system, then translate them into your native language.
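A controlled vocabulary does two jobs here: it maps free-text input onto agreed terms, and it carries labels in several languages. The three `basisOfRecord` terms below are real Darwin Core values, but this vocabulary excerpt, its labels and the lookup rules are a hypothetical sketch, not the vocabularies.gbif.org format. Missing translations fall back to English.

```python
from typing import Dict, Optional

# Hypothetical excerpt of a controlled vocabulary: DwC basisOfRecord terms with
# per-language labels. Only some terms carry translations.
BASIS_OF_RECORD: Dict[str, Dict[str, str]] = {
    "PreservedSpecimen": {"en": "Preserved specimen"},
    "HumanObservation": {"en": "Human observation", "fr": "Observation humaine"},
    "MachineObservation": {"en": "Machine observation"},
}


def to_term(value: str, vocabulary: Dict[str, Dict[str, str]] = BASIS_OF_RECORD) -> Optional[str]:
    """Map free text such as 'human observation' onto a vocabulary term, or None."""
    key = value.replace(" ", "").casefold()
    for term in vocabulary:
        if term.casefold() == key:
            return term
    return None


def label(term: str, language: str, vocabulary: Dict[str, Dict[str, str]] = BASIS_OF_RECORD) -> str:
    """Return the label in the requested language, falling back to English."""
    labels = vocabulary[term]
    return labels.get(language, labels["en"])


term = to_term("human observation")
if term is not None:
    print(term, "|", label(term, "fr"))  # -> HumanObservation | Observation humaine
```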

Page 34: EIA Biodiversity Data Mobilisation

DATA QUALITY & FITNESS-FOR-USE

Page 35: EIA Biodiversity Data Mobilisation

Fitness-for-use

• Primary biodiversity data can be used for multiple purposes by various user communities worldwide.

• Assessing and enhancing fitness-for-use of data is therefore critical for the scientific and social relevance of biodiversity science.

Fitness-for-use varies from one use case to another.

Data quality assessment and quality control are important components of a ‘fitness-for-use’ regime.

Page 36: EIA Biodiversity Data Mobilisation

Loss of Data Quality

At the time of collection

During digitisation

During documentation

During storage and archiving

During analysis and manipulation

During dissemination and presentation

Through the use to which they are put

Page 37: EIA Biodiversity Data Mobilisation

Issues influencing data quality

• Accuracy and precision
• Completeness
• Currency and Timeliness
• Update frequency
• Consistency
• Flexibility
• Transparency
• Performance measures and targets
• Data cleaning
• Outliers
• Setting targets for improvement
• Truth in labelling

• Error and bias
• Uncertainty
• Auditability
• Edit Controls
• Minimise duplication and reworking of data
• Maintenance of original (or verbatim) data
• Categorisation can lead to loss of data and quality
• Documentation
• Feedback
• Education and Training
• Accountability

Page 38: EIA Biodiversity Data Mobilisation

Data quality: Responsible Players

Collectors

Custodian or Curator

Aggregator

Publisher

Users

Page 39: EIA Biodiversity Data Mobilisation

Data Cleaning: definition & framework

A process used to identify inaccurate, incomplete, or unreasonable data and then improve its quality by correcting the detected errors and omissions.

General framework for data cleaning

1. Define and determine error types
2. Search and identify error instances
3. Correct the errors
4. Document error instances and error types
5. Modify data entry procedures to reduce future errors
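Steps 1 to 4 of this framework map naturally onto code: named checks define the error types, a scan finds instances, only safe fixes are applied automatically, and a tally documents what was found so that step 5 (changing data entry procedures) can target the most frequent problems. The rules, field names and the choice of which errors count as auto-fixable are hypothetical; the original records are left unchanged, in line with keeping verbatim data.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]


def _in_range(value: str, low: float, high: float) -> bool:
    try:
        return low <= float(value) <= high
    except ValueError:  # empty or non-numeric values count as errors
        return False


def _tidy(name: str) -> str:
    return " ".join(name.split())


# Step 1: define error types as named checks (hypothetical rules).
ERROR_TYPES: Dict[str, Callable[[Record], bool]] = {
    "latitude_invalid": lambda r: not _in_range(r.get("decimalLatitude", ""), -90, 90),
    "longitude_invalid": lambda r: not _in_range(r.get("decimalLongitude", ""), -180, 180),
    "scientific_name_missing": lambda r: not r.get("scientificName", "").strip(),
    "scientific_name_whitespace": lambda r: r.get("scientificName", "") != _tidy(r.get("scientificName", "")),
}

# Error types considered safe to correct automatically (an assumption).
FIXES: Dict[str, Callable[[Record], Record]] = {
    "scientific_name_whitespace": lambda r: {**r, "scientificName": _tidy(r["scientificName"])},
}


def find_errors(records: List[Record]) -> List[Tuple[int, str]]:
    """Step 2: search for and identify error instances as (record index, error type)."""
    return [(i, name) for i, record in enumerate(records)
            for name, check in ERROR_TYPES.items() if check(record)]


def clean(records: List[Record]) -> Tuple[List[Record], List[Tuple[int, str]], Counter]:
    """Steps 2-4: find errors, apply safe corrections to copies, and tally error types."""
    errors = find_errors(records)
    cleaned = [dict(r) for r in records]  # the original records are not modified
    for i, error_type in errors:          # Step 3: correct what can be corrected safely
        if error_type in FIXES:
            cleaned[i] = FIXES[error_type](cleaned[i])
    log = Counter(error_type for _, error_type in errors)  # Step 4: document
    return cleaned, errors, log
```

Errors without a safe fix (such as an impossible latitude) are reported rather than guessed at, leaving the correction to someone who can check the source.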

Page 40: EIA Biodiversity Data Mobilisation

Tools and Best Practices

http://mapstedi.colorado.edu/
http://manisnet.org/GeorefGuide.html

Page 41: EIA Biodiversity Data Mobilisation

Tools and Best Practices

GBIF Templates

Page 42: EIA Biodiversity Data Mobilisation

Best Practice Guidelines

All freely available

Page 43: EIA Biodiversity Data Mobilisation

Best resource…

Chapters on:
• Data Quality
• Data Cleaning
• Geo-referencing
• Generalising sensitive data

http://www2.gbif.org/TM1.pdf
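One common way to generalise a sensitive location before publishing is to replace the exact point with the centre of a coarser grid cell. This sketch assumes a regular latitude/longitude grid and a 0.1° default cell, both hypothetical choices; the appropriate precision for a given species or site should follow the guidance referenced above, which this code does not implement.

```python
import math
from typing import Tuple


def generalise(lat: float, lon: float, cell_degrees: float = 0.1) -> Tuple[float, float]:
    """Replace a point with the centre of the grid cell that contains it."""
    if cell_degrees <= 0:
        raise ValueError("cell size must be positive")

    def snap(value: float) -> float:
        # Values lying exactly on a cell edge may fall into the neighbouring cell
        # because of floating-point division; rounding only tidies the output.
        cell_start = math.floor(value / cell_degrees) * cell_degrees
        return round(cell_start + cell_degrees / 2, 6)

    return snap(lat), snap(lon)


print(generalise(-33.9247, 18.4241))  # -> (-33.95, 18.45)
```

All points inside the same cell publish identical coordinates, so the record still supports coarse distribution mapping without revealing the exact site.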

Page 44: EIA Biodiversity Data Mobilisation

DATA HOSTING CENTERS

Page 45: EIA Biodiversity Data Mobilisation

Data Hosting Centers

• Cater to data publishers without the skills and resources to publish themselves
• Facilitate long-term archiving and publishing

GBIF plans for Data Hosting Centers (DHC):
• Criteria for establishing DHC
• Criteria for endorsement of DHC
• Tools and Best Practices for DHC

Page 46: EIA Biodiversity Data Mobilisation

Data Hosting Centers

Page 47: EIA Biodiversity Data Mobilisation

COMMUNITY BUILDING PLATFORMS

Page 48: EIA Biodiversity Data Mobilisation

http://community.gbif.org

Page 49: EIA Biodiversity Data Mobilisation

??

Email: [email protected]

Skype: vishwaschavan