GLOBAL BIODIVERSITY INFORMATION FACILITY

Designing a Global Network to Accommodate Contributions from all Sources and Technical Abilities

Tim Robertson, GBIF Secretariat
Content
• How the GBIF index is built
• Joining the GBIF network
  • Technical requirements
  • Documentation on services and standards
• The use of current protocols for data harvesting
• Simplified full dataset harvesting
• The new GBIF integrated publishing toolkit
• Extending the model – Simple Transfer Schema task group
Today: How the network is structured
Today: Entry requirements
Basis of Record: Data served
(Source: GBIF Data Portal October 2008)
Basis of Record: What the standards say
International Standards Organisation 2 digit country codes (ISO 3166)
• Multilingual (English, French + external translations)
• Simple tab-delimited file format
• Loads straight into a database for reuse
• As simple as it needs to be…
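A minimal sketch of why a plain tab-delimited vocabulary is "as simple as it needs to be": the file loads straight into a lookup table with no schema tooling. The two-column excerpt below is illustrative, not the actual ISO 3166 distribution file.

```python
import csv
import io

# Illustrative excerpt of a controlled vocabulary published as a
# simple tab-delimited file (code <TAB> name), one entry per line.
tsv = "DK\tDenmark\nFR\tFrance\nAU\tAustralia\n"

# Loads straight into a dictionary for reuse - no XML schema needed.
countries = {code: name
             for code, name in csv.reader(io.StringIO(tsv), delimiter="\t")}

print(countries["DK"])  # Denmark
```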
Comparison: International Standards Organisation
For controlled vocabularies, could this approach be adopted?
Could removing complex technical schemas allow for easier contribution?
Harvesting: Using existing protocols
• Provider has a TAPIR wrapper
• Wrapper allows 200 records per request
• 260,000 records to harvest
• 1,300 request/responses
• 9 hours total
• 500 MB of XML transferred
• Extracted to a 32 MB delimited file for the index
• Compressed to 3 MB
Why not produce this on the provider?
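The cost of paged harvesting follows directly from the figures on the slide; a quick sketch of the arithmetic (the variable names are mine, not GBIF's):

```python
# Cost model for page-based TAPIR harvesting, using the slide's figures.
RECORDS = 260_000
PAGE_SIZE = 200          # records returned per wrapper request

requests = RECORDS // PAGE_SIZE   # round trips needed for a full harvest

xml_mb = 500             # XML actually transferred over the wire
index_mb = 32            # same content as a delimited file
compressed_mb = 3        # the delimited file after compression

print(requests)                      # 1300 request/response cycles
print(xml_mb / compressed_mb)        # the XML is >100x larger than the
                                     # compressed delimited equivalent
```

Producing the compressed delimited file on the provider would replace 1,300 round trips with a single file download.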
Harvesting: Streamlining the process
Benefits
• Indexes can be more up-to-date
  • better for the user
  • benefits the provider
• Provider systems can be left to answer specific real queries
  • the original purpose of the wrapper software
• Easy for small data publishers to produce
Already done in an ad-hoc manner for very large providers
Not dissimilar to the Sitemaps protocol
If this is already being done in an ad-hoc manner, should it be defined as a standard?
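A sketch of what provider-side production could look like: the whole dataset written once as a compressed tab-delimited file for the indexer to fetch, much as a crawler fetches a Sitemap. The field names, file name, and record source below are illustrative assumptions, not a GBIF-defined format.

```python
import csv
import gzip

# Illustrative Darwin Core-style field names (assumed, not prescribed).
FIELDS = ["occurrenceID", "scientificName", "country", "eventDate"]

def dump_dataset(records, path="occurrences.txt.gz"):
    """Write the full dataset as one compressed tab-delimited file."""
    with gzip.open(path, "wt", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS, delimiter="\t")
        writer.writeheader()
        for rec in records:
            writer.writerow(rec)

# Example: a one-record dataset dumped for harvesting.
dump_dataset([{"occurrenceID": "1",
               "scientificName": "Puma concolor",
               "country": "DK",
               "eventDate": "2008-10-01"}])
```

The harvester then downloads a single small file on a schedule instead of issuing thousands of paged protocol requests against the live system.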
GBIF: The integrated publishing toolkit (IPT)
Publishing of
• Occurrence data
• Checklist data
• Taxonomic data
• Dataset descriptive data (metadata)
Key features
• Embedded data cache
  • takes load off the "LIVE" system
  • allows for file-based importing
• Web application to search and browse data
• TAPIR, WFS, WMS, TCS, EML, RSS, "Local DwC Index"
• Simple extensions – the "star schema"
• Can be used in a hosting environment
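A minimal sketch of the "star schema" idea: one core file of records, plus extension files that each point back at the core via its ID. The file contents and field names here are illustrative, not an IPT-defined layout.

```python
import csv
import io

# Core file: one row per occurrence record (illustrative content).
core = "id\tscientificName\n1\tPuma concolor\n2\tAlces alces\n"
# Extension file: rows link back to the core record via coreid.
images = "coreid\turl\n1\thttp://example.org/puma.jpg\n"

def read(text):
    """Parse a tab-delimited file into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

# Build the star: core records in the middle, extensions attached.
records = {row["id"]: dict(row, images=[]) for row in read(core)}
for row in read(images):
    records[row["coreid"]]["images"].append(row["url"])

print(records["1"]["images"])  # ['http://example.org/puma.jpg']
```

Each new extension is just another flat file joined on the core ID, which keeps publishing simple while still allowing richer data.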
GBIF: The integrated publishing toolkit (IPT)
Ready for "alpha" testing – please enquire!
Demonstrations by Markus Döring and Tim Robertson all week
Poster
Lunchtime session Tuesday
Extending the model: More data types
The data being mobilised is largely "single core entity"
• the "Occurrence Record"
Integrating with other areas?
• Earth observation networks
• Ecological networks
Task group to investigate specific use cases to determine a Common Transfer Schema:
• Primarily data modeling experience
• Technical implementation
• Presentation to TDWG community
Perhaps multiple core entities, each extensible?
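One way to picture "multiple core entities, each extensible" is a single record shape parameterised by entity type, with an open-ended list of extensions hanging off each core record. This is purely a speculative sketch of the idea, not the task group's schema.

```python
from dataclasses import dataclass, field

@dataclass
class CoreRecord:
    """A core record of any entity type, with attached extensions."""
    entity: str                                 # e.g. "Occurrence", "Taxon"
    id: str
    terms: dict = field(default_factory=dict)   # the core fields
    extensions: list = field(default_factory=list)  # star-schema extensions

# Two different core entity types coexisting in one transfer model.
occ = CoreRecord("Occurrence", "1", {"scientificName": "Puma concolor"})
taxon = CoreRecord("Taxon", "t1", {"scientificName": "Puma"})

# Each core record carries its own extensions.
occ.extensions.append({"type": "Image", "url": "http://example.org/p.jpg"})

print(occ.entity, len(occ.extensions))  # Occurrence 1
```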