GLOBAL BIODIVERSITY INFORMATION FACILITY

Designing a Global Network to Accommodate Contributions from all Sources and Technical Abilities

Tim Robertson, GBIF Secretariat
Content
• How the GBIF index is built
• Joining the GBIF network
  • Technical requirements
  • Documentation on services and standards
• The use of current protocols for data harvesting
• Simplified full dataset harvesting
• The new GBIF integrated publishing toolkit
• Extending the model – Simple Transfer Schema task group
Today: How the network is structured
Today: Entry requirements
Basis of Record: Data served
(Source: GBIF Data Portal October 2008)
Basis of Record: What the standards say
International Standards Organisation 2 digit country codes (ISO 3166)
• Multilingual (English, French + external translations)
• Simple tab-delimited file format
• Loads straight into a database for reuse
• As simple as it needs to be…
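A minimal sketch of why a plain tab-delimited vocabulary is "as simple as it needs to be": the file loads straight into a lookup table with no schema tooling. The two-column excerpt below is illustrative, not the actual ISO 3166 distribution file.

```python
import csv
import io

# Illustrative excerpt of a controlled vocabulary published as a
# simple tab-delimited file (code <TAB> name), one entry per line.
tsv = "DK\tDenmark\nFR\tFrance\nAU\tAustralia\n"

# Loads straight into a dictionary for reuse - no XML schema needed.
countries = {code: name
             for code, name in csv.reader(io.StringIO(tsv), delimiter="\t")}

print(countries["DK"])  # Denmark
```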
Comparison: International Standards Organisation
For controlled vocabularies, could this approach be adopted?
Could removing complex technical schemas allow for easier contribution?
Harvesting: Using existing protocols
• Provider has a TAPIR wrapper
• Wrapper allows 200 records per request
• 260,000 records to harvest
• 1,300 request/responses
• 9 hours total
• 500 MB of XML transferred
• Extracted to a 32 MB delimited file for the index
• Compressed to 3 MB
Why not produce this on the provider?
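The cost of paged harvesting follows directly from the figures on the slide; a quick sketch of the arithmetic (the variable names are mine, not GBIF's):

```python
# Cost model for page-based TAPIR harvesting, using the slide's figures.
RECORDS = 260_000
PAGE_SIZE = 200          # records returned per wrapper request

requests = RECORDS // PAGE_SIZE   # round trips needed for a full harvest

xml_mb = 500             # XML actually transferred over the wire
index_mb = 32            # same content as a delimited file
compressed_mb = 3        # the delimited file after compression

print(requests)                      # 1300 request/response cycles
print(xml_mb / compressed_mb)        # the XML is >100x larger than the
                                     # compressed delimited equivalent
```

Producing the compressed delimited file on the provider would replace 1,300 round trips with a single file download.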
Harvesting: Streamlining the process
Benefits
• Indexes can be more up-to-date
  • better for the user
  • benefits the provider
• Provider systems can be left to answer specific real queries
  • the original purpose of the wrapper software
• Easy for small data publishers to produce
Already done in an ad-hoc manner for very large providers
Not dissimilar to the Sitemaps protocol
If this is already being done in an ad-hoc manner, should it be defined as a standard?
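A sketch of what provider-side production could look like: the whole dataset written once as a compressed tab-delimited file for the indexer to fetch, much as a crawler fetches a Sitemap. The field names, file name, and record source below are illustrative assumptions, not a GBIF-defined format.

```python
import csv
import gzip

# Illustrative Darwin Core-style field names (assumed, not prescribed).
FIELDS = ["occurrenceID", "scientificName", "country", "eventDate"]

def dump_dataset(records, path="occurrences.txt.gz"):
    """Write the full dataset as one compressed tab-delimited file."""
    with gzip.open(path, "wt", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS, delimiter="\t")
        writer.writeheader()
        for rec in records:
            writer.writerow(rec)

# Example: a one-record dataset dumped for harvesting.
dump_dataset([{"occurrenceID": "1",
               "scientificName": "Puma concolor",
               "country": "DK",
               "eventDate": "2008-10-01"}])
```

The harvester then downloads a single small file on a schedule instead of issuing thousands of paged protocol requests against the live system.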
GBIF: The integrated publishing toolkit (IPT)
Publishing of
• Occurrence data
• Checklist data
• Taxonomic data
• Dataset descriptive data (metadata)
Key features
• Embedded data cache
  • takes load off the "LIVE" system
  • allows for file-based importing
• Web application to search and browse data
• TAPIR, WFS, WMS, TCS, EML, RSS, "Local DwC Index"
• Simple extensions – the "star schema"
• Can be used in a hosting environment
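A minimal sketch of the "star schema" idea: one core file of records, plus extension files that each point back at the core via its ID. The file contents and field names here are illustrative, not an IPT-defined layout.

```python
import csv
import io

# Core file: one row per occurrence record (illustrative content).
core = "id\tscientificName\n1\tPuma concolor\n2\tAlces alces\n"
# Extension file: rows link back to the core record via coreid.
images = "coreid\turl\n1\thttp://example.org/puma.jpg\n"

def read(text):
    """Parse a tab-delimited file into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

# Build the star: core records in the middle, extensions attached.
records = {row["id"]: dict(row, images=[]) for row in read(core)}
for row in read(images):
    records[row["coreid"]]["images"].append(row["url"])

print(records["1"]["images"])  # ['http://example.org/puma.jpg']
```

Each new extension is just another flat file joined on the core ID, which keeps publishing simple while still allowing richer data.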
GBIF: The integrated publishing toolkit (IPT)
Ready for "alpha" testing – please enquire!
Demonstrations by Markus Döring and Tim Robertson all week
Poster
Lunchtime session Tuesday
Extending the model: More data types
The data being mobilised is largely "single core entity"
• the "Occurrence Record"
Integrating with other areas?
• Earth observation networks
• Ecological networks
Task group to investigate specific use cases to determine a Common Transfer Schema:
• Primarily data modeling experience
• Technical implementation
• Presentation to TDWG community
Perhaps multiple core entities, each extensible?
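One way to picture "multiple core entities, each extensible" is a single record shape parameterised by entity type, with an open-ended list of extensions hanging off each core record. This is purely a speculative sketch of the idea, not the task group's schema.

```python
from dataclasses import dataclass, field

@dataclass
class CoreRecord:
    """A core record of any entity type, with attached extensions."""
    entity: str                                 # e.g. "Occurrence", "Taxon"
    id: str
    terms: dict = field(default_factory=dict)   # the core fields
    extensions: list = field(default_factory=list)  # star-schema extensions

# Two different core entity types coexisting in one transfer model.
occ = CoreRecord("Occurrence", "1", {"scientificName": "Puma concolor"})
taxon = CoreRecord("Taxon", "t1", {"scientificName": "Puma"})

# Each core record carries its own extensions.
occ.extensions.append({"type": "Image", "url": "http://example.org/p.jpg"})

print(occ.entity, len(occ.extensions))  # Occurrence 1
```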