GLOBAL BIODIVERSITY INFORMATION FACILITY
WWW.GBIF.ORG
Publishing EIA Biodiversity Data:
Technology and Infrastructure
Vishwas Chavan, Nick King and Francois Rogers
Global Biodiversity Information Facility
[email protected]@gbif.org
Scoping Workshop on Developing an EIA Biodiversity Data Publishing Framework in South Africa
2-4 March 2010, Cape Town, South Africa
Contents
• EIA Biodiversity Data: Types and formats
• Data Capture & Digitisation tools
• Data Discovery
• Data Publishing
• Data Quality & fitness-for-use
• Data Hosting Centers
• Community Building Platforms
What are the challenges?
• More data types
• Richer user interface
• Better management
• Richer content
• Better synchronisation
• Improved discovery
EIA BIODIVERSITY DATA: TYPES AND FORMATS
Evidence
Metadata
Taxon names
Taxon concepts
Indices, Nomenclators, Namebanks
Biology, Conservation, Ecology, Distribution, Phylogenies ...
Geolocation, Country, Collector, Date …
Observation
Voucher specimen, Blood sample, DNA Barcode, Image, Audio, Video ...
Literature
Species banks, BHL, Plazi.org ...
EIA Biodiversity data are very diverse
DATA CAPTURE AND DIGITISATION TOOLS
Florin, Pandora, Taxis, Cassia, FieldNote, Mandala, ATTA, BirdRecorder
Data Capture and Digitisation Tools
uBio Tools
• Name recognition tool (FindIT)
• Author abbreviation resolver
• Checking classification (TSN name mapper)
• Deconstruct scientific name (ParseIT)
• Find scientific name (CrawlIT)
• etc…
http://www.ubio.org
GBIF Templates
• Capture data in DwC compatible format
  Occurrence Data Template
  Names Data Template
• Facilitate authoring ‘resource metadata’
  Occurrence template
  Documentation for occurrence template
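As a rough illustration of what capturing occurrence data in a Darwin Core (DwC) compatible format can look like, the sketch below writes two records to CSV using Darwin Core term names as column headers. The choice of columns, the record values and the helper name `to_dwc_csv` are assumptions made for this example. This is not the actual layout of the GBIF Occurrence Data Template.

```python
import csv
import io

# Column headers are Darwin Core term names. The selection is an assumption
# for illustration, not the column set of the GBIF Occurrence Data Template.
DWC_COLUMNS = [
    "occurrenceID", "basisOfRecord", "scientificName", "eventDate",
    "countryCode", "decimalLatitude", "decimalLongitude", "recordedBy",
]

# Hypothetical EIA survey records, invented for this example.
RECORDS = [
    {"occurrenceID": "eia-demo-0001", "basisOfRecord": "HumanObservation",
     "scientificName": "Protea cynaroides", "eventDate": "2010-03-02",
     "countryCode": "ZA", "decimalLatitude": "-33.98",
     "decimalLongitude": "18.42", "recordedBy": "A. Surveyor"},
    {"occurrenceID": "eia-demo-0002", "basisOfRecord": "HumanObservation",
     "scientificName": "Spheniscus demersus", "eventDate": "2010-03-03",
     "countryCode": "ZA", "decimalLatitude": "-34.20",
     "decimalLongitude": "18.45", "recordedBy": "A. Surveyor"},
]


def to_dwc_csv(records):
    """Serialise records to CSV text with Darwin Core terms as the header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_dwc_csv(RECORDS))
```

Keeping the headers identical to the DwC term names means no separate mapping step is needed when the file is later published.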
GBIF Informatics Architecture
Improved access to Names, Metadata and Primary Biodiversity Data
Distributed GBIF informatics architecture
Faster and easier publishing of data
DATA DISCOVERY
• GBRDS REGISTRY
• METADATA CATALOGUE
GBRDS: Global Biodiversity Resources Discovery System
DATA DISCOVERY: GBRDS REGISTRY
GBRDS, a Discovery System
[Diagram: Discovery System]
Registering: Data Publishers, Service Publishers, Others…
Discovering, Searching, Retrieving: Consumers
That links to resources…
Who?    Institutions, Collections …
What?   Data, Services, GUID/LSID…
Where?  Location, Access points…
When?   Temporal Scope…
How?    Formats, protocols, qualities
A distributed service … which resolves to information resources
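To make the Who / What / Where / When / How description more concrete, here is a minimal sketch of a registry entry as a plain data structure with a keyword lookup over it. The class `RegistryEntry`, its field names, the function `find_entries` and the example values (including the `example.org` URL) are all hypothetical. This is not the GBRDS data model or its API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RegistryEntry:
    """Hypothetical registry record; not the actual GBRDS schema."""
    who: str          # institution or collection
    what: List[str]   # data, services, identifiers
    where: str        # location or access point
    when: str         # temporal scope
    how: List[str]    # formats and protocols


def find_entries(entries, keyword):
    """Return entries whose 'what' descriptions mention the keyword."""
    keyword = keyword.lower()
    return [e for e in entries
            if any(keyword in item.lower() for item in e.what)]


# Invented example entries.
EXAMPLE_ENTRIES = [
    RegistryEntry(
        who="Example Herbarium",
        what=["Occurrence data", "Specimen images"],
        where="https://example.org/herbarium/archive.zip",
        when="1950-2010",
        how=["DwC Archive"],
    ),
    RegistryEntry(
        who="Example Names Project",
        what=["Annotated checklist"],
        where="https://example.org/names/checklist.zip",
        when="2009",
        how=["DwC Archive"],
    ),
]

if __name__ == "__main__":
    for entry in find_entries(EXAMPLE_ENTRIES, "occurrence"):
        print(entry.who, "->", entry.where)
```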
Global Biodiversity Resources Discovery System
Institutions/Collections, LSIDs/DOI/GUIDs, Standards, Protocols, Resources, Services/Applications, etc…
GBRDS Registry Release: April 2010
DATA DISCOVERY: METADATA CATALOGUES
Two perspectives on metadata
User Perspective
• Discover if data exists
• Identify source, provenance
• Make judgement about data quality and usability before getting it
• Minimise costs involved in the search, retrieval, integration and use of the data
Data Producer Perspective
• Document data with minimum effort
• Assess the value of the data for others
• Bridge the gap between data owners and users
• Educate users about the characteristics of the data
Craglia: http://www.ec-gis.org/Workshops/6ec-gis/papers/craglia-metadata.doc
Two levels of metadata
Discovery Metadata
Discover if a resource exists; get information on:
• Ownership
• Location
• How to get further information
Full Metadata
Provides a full description of the resource, including:
• Data quality
• Data lineage
• Full access and exploitation
Metadata Standards
• Natural Collections Descriptions (NCD)
• Ecological Metadata Language (EML)
• ISO 19115/19139
• FGDC Biological Data Profile
• Dublin Core
• MRTG Multimedia Metadata Schema
• IPT 1.1 Metadata Profile
DATA PUBLISHING
Key Components: the IPT
The Integrated Publishing Toolkit is a state-of-the-art tool to simplify the mobilisation of biodiversity information resources such as Names, Metadata and primary biodiversity data
Data Publisher
Registration (GBRDS) + Publishing of Names, Metadata, Primary biodiversity data etc…
Simple process!
The Integrated Publishing Toolkit (IPT) is designed to simplify the mapping, indexing and harvesting of Names, Metadata and Primary Biodiversity Data!
GBIF Integrated Publishing Toolkit (IPT)
• Open source Java web application
• Bypasses limitations of traditional wrapper tools in publishing large amounts of data by publishing whole datasets in DwC-Archive dumps (especially useful for small data publishers or those with little or no internet access)
• Has a richer environment than current wrapper tools, providing some data cleaning, visualisation capabilities, and the ability to publish dataset metadata
• Documentation and download: http://code.google.com/p/gbif-providertoolkit/
• Demo site: http://ipt.gbif.org
* Darwin Core (Text-Archive) based on standard submitted to TDWG for review Feb 2009
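To show what "publishing a whole dataset as a DwC-Archive dump" amounts to, the sketch below packs a delimited data file and a `meta.xml` descriptor into a single zip file. The descriptor follows the general shape of the Darwin Core text guidelines as understood here, and because the standard was still under review at the time of this talk, details may differ from what a given IPT release expects. The file names, the record and the function name `build_archive` are assumptions for illustration. This is not how the IPT itself builds archives.

```python
import io
import zipfile

# One invented occurrence record with a header row.
OCCURRENCE_CSV = (
    "occurrenceID,scientificName,eventDate\n"
    "eia-demo-0001,Protea cynaroides,2010-03-02\n"
)

# Descriptor mapping CSV columns to Darwin Core terms (assumed layout).
META_XML = """<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="," linesTerminatedBy="\\n"
        ignoreHeaderLines="1"
        rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.csv</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/eventDate"/>
  </core>
</archive>
"""


def build_archive():
    """Return the bytes of a zip holding the data file and its descriptor."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("occurrence.csv", OCCURRENCE_CSV)
        archive.writestr("meta.xml", META_XML)
    return buffer.getvalue()


if __name__ == "__main__":
    with open("dwca-demo.zip", "wb") as handle:
        handle.write(build_archive())
```

Because the result is a single self-describing file, it can be produced offline and transferred once, which is the property that matters for publishers with limited connectivity.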
IPT Publishes Through…
More to come….
IPT Demo
• Screencast of IPT demo
• GBIF Help Desk ([email protected])
IPT 1.1 Release: April 2010
NAMES DATA
Scope of the Global Names Architecture
Referencing names in Checklists to a common Nomenclatural Index
Checklist Bank – A Name Services brokerage
Global broker of taxonomic data
Index of Taxonomic Catalogues and Annotated Checklists
Extends the GBIF network to support publishing Species-level data
Publishing Checklists to GBIF
• Using Integrated Publishing Toolkit
• Via pre-composed Spreadsheet templates
• Exporting according to DwC Archive format and registering a local data file (self-serve)
• GBIF desktop publishing tool
• Other taxonomic editors (EDIT/ITIS) that support DwC Archive format
Desktop Annotated Checklist Builder
Create, manage, publish
• Synonymised checklists
• Vernacular Names
• Distribution data
• Bibliography
• Type/Specimen data
Mac OS/ Windows
Publishes “GBIF-ready” format
DwC Archive – simple, extensible Text-based format
Q3 2010
Controlled Vocabularies Server
• ISO: Countries
• ISO: Language
• DwC: Basis of Record
• DwC: Nomenclatural Status
• DwC: Sex (Gender)
• DwC: Taxonomic Status
• IUCN: Threat Status
• …
vocabularies.gbif.org
Vocabularies publishing platform – Internationalise all GBIF vocabularies
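A controlled vocabulary earns its keep when records are checked against it. The sketch below tests two fields of a record against small excerpts of a basis-of-record vocabulary and a country-code list. The excerpts are deliberately incomplete and hard-coded, not retrieved from vocabularies.gbif.org, and the function name `check_record` is an assumption for this example.

```python
# Hard-coded excerpts for illustration only. Real vocabularies are longer and
# would be obtained from a vocabulary service rather than typed in by hand.
VOCABULARIES = {
    "basisOfRecord": {
        "PreservedSpecimen", "FossilSpecimen", "LivingSpecimen",
        "HumanObservation", "MachineObservation",
    },
    "countryCode": {"ZA", "NA", "BW"},  # a few ISO 3166-1 alpha-2 codes
}


def check_record(record):
    """Return (field, value) pairs whose value is not in the vocabulary."""
    problems = []
    for field, allowed in VOCABULARIES.items():
        if field in record and record[field] not in allowed:
            problems.append((field, record[field]))
    return problems


if __name__ == "__main__":
    record = {"basisOfRecord": "Sighting", "countryCode": "ZA"}
    for field, value in check_record(record):
        print(f"{field}: '{value}' is not a recognised term")
```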
Controlled Vocabularies Server
Create, manage, publish
Extensions to Darwin Core
Extend Occurrence Data
Extend Species Data
vocabularies.gbif.org
Tie to vocabularies that are also drafted and published to this system. Then translate to your native language.
DATA QUALITY & FITNESS-FOR-USE
Fitness-for-use
• Primary biodiversity data can be used for multiple purposes by various user communities worldwide.
• Assessing and enhancing fitness-for-use of data is therefore critical for the scientific and social relevance of biodiversity science.
Fitness-for-use varies from one use case to another.
Data quality assessment and quality control are important components of a ‘fitness-for-use’ regime.
Loss of Data Quality
At the time of collection
During digitisation
During documentation
During storage and archiving
During analysis and manipulation
During dissemination and presentation
Through the use to which they are put
Issues influencing data quality
• Accuracy and precision
• Completeness
• Currency and Timeliness
• Update frequency
• Consistency
• Flexibility
• Transparency
• Performance measures and targets
• Data cleaning
• Outliers
• Setting targets for improvement
• Truth in labelling
• Error and bias
• Uncertainty
• Auditability
• Edit Controls
• Minimise duplication and reworking of data
• Maintenance of original (or verbatim) data
• Categorisation can lead to loss of data and quality
• Documentation
• Feedback
• Education and Training
• Accountability
Data quality: Responsible Players
Collectors
Custodian or Curator
Aggregator
Publisher
Users
Data Cleaning: definition & framework
A process used to identify inaccurate, incomplete, or unreasonable data and then to improve its quality by correcting the detected errors and omissions
General framework for data cleaning
• Define and determine error types
• Search and identify error instances
• Correct the errors
• Document error instances and error types
• Modify data entry procedures to reduce future errors
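The first, second and fourth steps of this framework (define error types, find instances, document them) can be sketched in a few lines. The two error types below, a blank scientific name and coordinates outside the valid range, are assumptions chosen for illustration, as are the field names and function names. Correcting the errors and revising data entry procedures remain human decisions and are left out.

```python
from collections import Counter


def missing_name(record):
    """Error type: scientific name absent or blank."""
    return not (record.get("scientificName") or "").strip()


def invalid_coordinates(record):
    """Error type: coordinates missing, non-numeric or out of range."""
    try:
        lat = float(record["decimalLatitude"])
        lon = float(record["decimalLongitude"])
    except (KeyError, TypeError, ValueError):
        return True
    return not (-90 <= lat <= 90 and -180 <= lon <= 180)


# Step 1: define the error types to look for.
ERROR_TYPES = {
    "missing_name": missing_name,
    "invalid_coordinates": invalid_coordinates,
}


def find_errors(records):
    """Step 2: return (record index, error type) for every detected instance."""
    log = []
    for index, record in enumerate(records):
        for name, check in ERROR_TYPES.items():
            if check(record):
                log.append((index, name))
    return log


def summarise(log):
    """Step 4: document how often each error type occurred."""
    return Counter(name for _, name in log)


if __name__ == "__main__":
    sample = [
        {"scientificName": "Protea cynaroides",
         "decimalLatitude": "-33.9", "decimalLongitude": "18.4"},
        {"scientificName": " ",
         "decimalLatitude": "-33.9", "decimalLongitude": "18.4"},
        {"scientificName": "Protea cynaroides",
         "decimalLatitude": "-339.0", "decimalLongitude": "18.4"},
    ]
    log = find_errors(sample)
    print(log)
    print(summarise(log))
```

The per-type counts from `summarise` are the kind of evidence that can feed the last step, since a recurring error type points at the data entry procedure that needs changing.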
Tools and Best Practices
http://mapstedi.colorado.edu/
http://manisnet.org/GeorefGuide.html
GBIF Templates
Best Practice Guidelines
All freely available
Best resource…
Chapters on:
• Data Quality
• Data Cleaning
• Geo-referencing
• Generalising sensitive data
http://www2.gbif.org/TM1.pdf
DATA HOSTING CENTERS
Data Hosting Centers
• Cater to data publishers without skills and resources
• Facilitate long-term archival and publishing
• GBIF plans:
  Criteria for establishing DHC
  Criteria for endorsement of DHC
  Tools and Best Practices for DHC
COMMUNITY BUILDING PLATFORMS
http://community.gbif.org
Email: [email protected]
Skype: vishwaschavan