GLOBAL BIODIVERSITY INFORMATION FACILITY

Designing a Global Network to Accommodate Contributions from all Sources and Technical Abilities

Tim Robertson, GBIF Secretariat


Page 1: GLOBAL BIODIVERSITY

GLOBAL BIODIVERSITY INFORMATION FACILITY

Designing a Global Network to Accommodate Contributions from all Sources and Technical Abilities

Tim Robertson
GBIF Secretariat

Page 2: GLOBAL BIODIVERSITY

How the GBIF index is built

Joining the GBIF network
• Technical requirements
• Documentation on services and standards

The use of current protocols for data harvesting

Simplified full dataset harvesting

The new GBIF integrated publishing toolkit

Extending the model – Simple Transfer Schema task group

Content

Page 3: GLOBAL BIODIVERSITY

Today: How the network is structured

Page 4: GLOBAL BIODIVERSITY

Today: Entry requirements

Page 5: GLOBAL BIODIVERSITY

Basis of Record: Data served

(Source: GBIF Data Portal October 2008)

Page 6: GLOBAL BIODIVERSITY

Basis of Record: What the standards say

Page 7: GLOBAL BIODIVERSITY

International Organization for Standardization (ISO) two-letter country codes (ISO 3166)

• Multilingual (English, French + external translations)
• Simple tab-delimited file format
• Loads straight into a database for reuse
• As simple as it needs to be…
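To illustrate how little machinery a tab-delimited vocabulary needs, here is a minimal sketch in Python. The column names (`code`, `name_en`, `name_fr`) and the sample rows are assumptions made for illustration; they are not the layout of the actual ISO 3166 file.

```python
import csv
import io

# Hypothetical sample of a tab-delimited country code vocabulary.
# The column names and layout are assumptions, not the real ISO file.
SAMPLE = (
    "code\tname_en\tname_fr\n"
    "DK\tDenmark\tDanemark\n"
    "FR\tFrance\tFrance\n"
    "KE\tKenya\tKenya\n"
)


def load_vocabulary(text):
    """Parse a tab-delimited vocabulary into a dict keyed by code."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["code"]: row for row in reader}


countries = load_vocabulary(SAMPLE)
print(countries["DK"]["name_fr"])
```

The same rows could be bulk-loaded into a database table without any schema-aware parser, which is the property the slide highlights.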

Comparison: International Organization for Standardization

For controlled vocabularies, could this approach be adopted?

Could removing complex technical schemas allow for easier contribution?

Page 8: GLOBAL BIODIVERSITY

Provider has TAPIR wrapper
• Wrapper allows for 200 records per request
• 260,000 records to harvest
• 1300 requests / responses

• 9 hours total
• 500MB XML transferred
• Extracted to a 32MB delimited file for the index
• Compressed to 3MB
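The figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the derived values (seconds per request, size ratios) are computed here and do not appear in the original slides.

```python
import math

# Figures quoted on the slide.
records = 260_000
page_size = 200
total_hours = 9
xml_mb = 500
delimited_mb = 32
compressed_mb = 3

# Derived values (not stated on the slide).
requests = math.ceil(records / page_size)
seconds_per_request = total_hours * 3600 / requests
xml_vs_delimited = xml_mb / delimited_mb
xml_vs_compressed = xml_mb / compressed_mb

print(f"requests: {requests}")
print(f"average seconds per request: {seconds_per_request:.1f}")
print(f"XML is {xml_vs_delimited:.1f}x the delimited file")
print(f"XML is {xml_vs_compressed:.0f}x the compressed file")
```

On these numbers each request/response cycle averages roughly 25 seconds, and the XML transferred is more than two orders of magnitude larger than the compressed delimited file.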

Why not produce this on the provider?
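As a rough illustration of what producing the file on the provider could look like, the sketch below writes records to a gzip-compressed tab-delimited dump. The function name `write_dump`, the field names, and the sample records are hypothetical; the slides do not prescribe a particular layout or tool.

```python
import csv
import gzip
import io


def write_dump(records, fieldnames):
    """Return gzip-compressed, tab-delimited bytes for the given records."""
    text = io.StringIO()
    writer = csv.DictWriter(
        text, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return gzip.compress(text.getvalue().encode("utf-8"))


# Hypothetical field names and records, for illustration only.
fields = ["id", "scientificName", "countryCode"]
sample = [
    {"id": "1", "scientificName": "Puma concolor", "countryCode": "US"},
    {"id": "2", "scientificName": "Quercus robur", "countryCode": "DK"},
]

dump = write_dump(sample, fields)

# A harvester would fetch the file once and decompress it locally.
restored = gzip.decompress(dump).decode("utf-8")
print(restored)
```

A harvester could then retrieve the whole dataset in a single download instead of paging through the wrapper 200 records at a time.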

Harvesting: Using existing protocols

Page 9: GLOBAL BIODIVERSITY

Benefits
• Indexes can be more up-to-date
  • better for the user
  • benefits provider
• Provider systems can be left to answer specific real queries
  • the original purpose for the wrapper software

• Easy for small data publishers to produce

Already done in an ad-hoc manner for very large providers

Not dissimilar to Sitemaps protocol

Harvesting: Streamlining the process

Page 10: GLOBAL BIODIVERSITY

If this is already being done in an ad-hoc manner, should it be defined as a standard?

Harvesting: Streamlining the process

Page 11: GLOBAL BIODIVERSITY

Publishing of
• Occurrence data
• Checklist data
• Taxonomic data
• Dataset descriptive data (metadata)

Key features
• Embedded data cache
  • takes load off "LIVE" system
  • allows for file based importing
• Web application to search and browse data
• TAPIR, WFS, WMS, TCS, EML, RSS, "Local DwC Index"
• Simple extensions – the "star schema"
• Can be used in a hosting environment
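One way to read the "star schema" idea is a central core record with extension rows that point back to it by identifier. The sketch below is an interpretation under that assumption, not the IPT's actual data model; the names `coreid`, `attach_extensions`, and the sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical core records (the centre of the star).
core_rows = [
    {"id": "occ-1", "scientificName": "Puma concolor"},
    {"id": "occ-2", "scientificName": "Quercus robur"},
]

# Hypothetical extension rows, each pointing at a core record.
extension_rows = [
    {"coreid": "occ-1", "type": "image", "value": "puma-01.jpg"},
    {"coreid": "occ-1", "type": "vernacularName", "value": "cougar"},
]


def attach_extensions(core, extensions):
    """Group extension rows under the core record they reference."""
    by_core = defaultdict(list)
    for row in extensions:
        by_core[row["coreid"]].append(row)
    return [dict(c, extensions=by_core.get(c["id"], [])) for c in core]


joined = attach_extensions(core_rows, extension_rows)
for record in joined:
    print(record["id"], len(record["extensions"]))
```

Under this reading the core table stays simple, and new kinds of data are added as extra extension tables without changing it.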

GBIF: The integrated publishing toolkit (IPT)

Page 12: GLOBAL BIODIVERSITY

GBIF: The integrated publishing toolkit (IPT)

Page 13: GLOBAL BIODIVERSITY

Ready for "alpha" testing – please enquire!

Demonstrations by Markus Döring and Tim Robertson all week

Poster

Lunchtime session Tuesday

GBIF: The integrated publishing toolkit (IPT)

Page 14: GLOBAL BIODIVERSITY

The data being mobilised is largely “single core entity”
• the “Occurrence Record”

Integrating with other areas?
• Earth observation networks
• Ecological networks

Task group to investigate specific use cases to determine a Common Transfer Schema:

• Primarily data modeling experience
• Technical implementation
• Presentation to TDWG community

Perhaps multiple core entities, each extensible?

Extending the model: More data types

Page 15: GLOBAL BIODIVERSITY

Extending the model: More data types

Page 16: GLOBAL BIODIVERSITY

Extending the model: More data types

Page 17: GLOBAL BIODIVERSITY

Tim Robertson

GBIF Secretariat
Universitetsparken 15
2100 Copenhagen
Denmark

[email protected]

Contact