An introduction to data exchange protocols in TDWG
Renato De Giovanni
TDWG 2008
Overview of the presentation
History and context: When and how protocols started to be discussed in TDWG
The basic idea behind distributed queries
Main features of TAPIR
Other protocols
Current status of TAPIR
TDWG Standards
Historically, TDWG has concentrated efforts in:
creating controlled vocabularies, indexes, guidelines and best practices (1985 until ~2000):
– Index Herbariorum, Authors of plant names, Floristic regions of the world, etc.
creating standards to represent different types of biodiversity data (2000 until today):
– SDD (descriptions), ABCD (specimens), TCS (names and concepts). More being created.
First networks in our community
[Diagram: first networks and their protocols. Network labels: REMIB, ENHSIN. Protocol labels: Z39.50, custom, HISPID, custom.]
More networks followed...
Australia’s Virtual Herbarium (1999)
• Included a data abstraction layer (HISPID) and a simple protocol to return records.
• HISPID became a TDWG standard.
• This approach was only used by the AVH.
The Species Analyst (1999)
• Z39.50 was created and maintained by the Library of Congress.
• Pre-Web technology (no HTTP).
• Protocol is bound to data abstraction layer.
• Limited support for XML and Unicode.
[Timeline of protocols: Z39.50, DiGIR, BioCASe, TAPIR]
MaNIS, speciesLink, OBIS... (2002)
• DiGIR was funded by an NSF project.
• Motivation was to replace Z39.50 with a new protocol without the Z39.50 limitations and then split The Species Analyst (TSA) into multiple thematic networks.
BioCASE Network (2003)
• Created after many unsuccessful attempts to reach an agreement with the DiGIR community.
• Can be used with more complex data abstraction layers like ABCD.
TAPIR protocol (2004)
• Initial study contracted by GBIF to eliminate interoperability problems and duplication of efforts.
• TDWG was the venue for discussions (currently an official task group).
Protocols, Networks & TDWG
[Diagram labels: HISPID; operating systems: Windows, GNU/Linux, Mac OS X; databases: MS Access, PostgreSQL, MySQL; access points: http://...; protocol + data abstraction layer (e.g. DarwinCore); Client]
Main scenario: Distributed queries
[Diagram labels: provider 1, provider 2, provider 3, other providers]
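The distributed-query scenario can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the providers are in-memory lists standing in for remote services, and the record fields and function names are hypothetical rather than part of any TDWG protocol.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory providers standing in for remote services.
PROVIDERS = {
    "provider 1": [{"name": "Puma concolor", "country": "BR"}],
    "provider 2": [
        {"name": "Puma concolor", "country": "AR"},
        {"name": "Tapirus terrestris", "country": "BR"},
    ],
    "provider 3": [],
}


def query_provider(provider, name):
    """Return one provider's records matching the name, tagged with their source."""
    return [
        dict(rec, source=provider)
        for rec in PROVIDERS[provider]
        if rec["name"] == name
    ]


def distributed_query(name):
    """Send the same query to every provider in parallel and merge the answers."""
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(lambda p: query_provider(p, name), PROVIDERS))
    return [rec for batch in batches for rec in batch]


results = distributed_query("Puma concolor")
```

The client sees a single merged result list, while each record keeps a note of which provider returned it.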
Reasons for the existence of TAPIR
• TAPIR can potentially be used to exchange data encoded in most (if not all) XML standards defined by the other TDWG groups. TAPIR is one of the main components of the new TDWG Architecture.
• When integrating DiGIR and BioCASe, the other existing protocol alternatives were not considered suitable.
• Changing the existing DiGIR and BioCASe networks to use a completely different protocol would have a major impact on existing tools. TAPIR keeps many similarities with DiGIR and BioCASe to avoid this.
Main features of TAPIR
• Uses the Web (HTTP) to communicate with providers.
• Responses are always structured in XML.
• Can be used with different data abstraction layers.
• Can return different types of search responses.
• Tries to address the basic needs of federated networks through 5 operations.
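As a rough illustration of how a client might address the five operations over HTTP, the sketch below builds a GET URL for each one. The access point URL is hypothetical, and the use of a query parameter called `op` is an assumption of this sketch; the protocol specification is the authority on the actual request format.

```python
from urllib.parse import urlencode

# The five operations named in this presentation.
OPERATIONS = ("metadata", "capabilities", "inventory", "search", "ping")


def build_request_url(access_point, operation, **params):
    """Build a GET URL for one operation.

    The parameter name 'op' is an assumption of this sketch.
    """
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    query = urlencode({"op": operation, **params})
    return f"{access_point}?{query}"


# Hypothetical access point, used only for illustration.
ACCESS_POINT = "http://example.org/tapir"
urls = [build_request_url(ACCESS_POINT, op) for op in OPERATIONS]
```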
Metadata operation (default)
1- Need to identify providers and get basic information about the service
• Who is responsible for the service?
• How can the owner be contacted?
• What kind of data is being served?
• In which language is the data?
• Are there any IPR restrictions?
Capabilities operation
2- Need to get technical information about the service
• What data abstraction layer is being used?
• What operations are available?
• Does the provider only understand specific query templates? (which ones?)
• Does the provider support custom (on-the-fly) filters?
• Does the provider support custom (on-the-fly) response types?
Inventory operation
3- Need to inspect existing content
• How many records are available?
• For what species is there any data?
• For what countries/regions is there any data?
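One way to picture what an inventory answers is as a list of the distinct values of a field, each with a record count. The sketch below computes that over a small local list; the records and field names are hypothetical, and a real provider would compute this against its own database.

```python
from collections import Counter

# Hypothetical records held by one provider.
RECORDS = [
    {"species": "Tapirus terrestris", "country": "BR"},
    {"species": "Tapirus terrestris", "country": "PE"},
    {"species": "Puma concolor", "country": "BR"},
]


def inventory(records, field):
    """Return each distinct value of a field with the number of matching records."""
    return dict(Counter(rec[field] for rec in records))


by_species = inventory(RECORDS, "species")
by_country = inventory(RECORDS, "country")
```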
Search operation
4- Need to search content
• What records satisfy these parameters or filter conditions?
• Networks are free to define their own response types.
• Responses can be paged.
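Paged responses imply a client-side loop that keeps asking for the next page until nothing is left. The sketch below shows one such loop. The `start` and `limit` parameter names are assumptions, and `fake_fetch_page` is a hypothetical stand-in that slices a local list instead of calling a provider.

```python
def fetch_all(fetch_page, limit=2):
    """Collect every record by requesting pages until a short page arrives."""
    records, start = [], 0
    while True:
        page = fetch_page(start=start, limit=limit)
        records.extend(page)
        if len(page) < limit:
            # A page shorter than the limit means there is nothing more to fetch.
            return records
        start += limit


# Hypothetical provider: slices a local list instead of answering over HTTP.
DATA = ["rec1", "rec2", "rec3", "rec4", "rec5"]


def fake_fetch_page(start, limit):
    return DATA[start:start + limit]


all_records = fetch_all(fake_fetch_page)
```

When the total is an exact multiple of the page size, this loop makes one extra request that returns an empty page before stopping.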
Ping operation
5- Need to monitor providers
• Is the service ready to receive requests?
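A monitoring tool could treat a provider as available when its ping response is well-formed XML with the expected content. The sketch below checks a response body that has already been fetched. The element name `pong` is an assumption of this sketch, and the sample responses are invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_alive(response_text):
    """Return True if the text is well-formed XML containing a 'pong' element.

    The 'pong' element name is an assumption of this sketch.
    """
    try:
        root = ET.fromstring(response_text)
    except ET.ParseError:
        return False
    # Compare local names so the check works with or without a namespace.
    return any(el.tag.rsplit("}", 1)[-1] == "pong" for el in root.iter())


ok = is_alive("<response><pong/></response>")
down = is_alive("<html>Service unavailable")
```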
Other protocols
• WFS (Web Feature Service, from the Open Geospatial Consortium)
• Spatial queries
• Additional operations: update/insert/remove/lock feature
• TDWG Geospatial Interest Group
Other protocols
• OAI-PMH (Open Archives Initiative – Protocol for Metadata Harvesting)
• Data harvesting = gathering of data from several sources to store into a single database.
• TAPIR itself can be used directly for this purpose, including incremental harvesting (depends only on the abstraction layer).
• An OAI-PMH service can be set up on top of a TAPIR service.
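Incremental harvesting can be sketched as copying only the records changed since the previous run. This works only if each record carries a last-modified date, which is one reading of why it depends on the abstraction layer. The field names `id` and `modified` and the in-memory source below are hypothetical.

```python
from datetime import datetime


def harvest(source_records, store, last_harvest):
    """Copy records modified after last_harvest into store; return the new checkpoint."""
    newest = last_harvest
    for rec in source_records:
        if rec["modified"] > last_harvest:
            store[rec["id"]] = rec
            newest = max(newest, rec["modified"])
    return newest


# Hypothetical source records with a last-modified date.
SOURCE = [
    {"id": "a", "modified": datetime(2008, 1, 10)},
    {"id": "b", "modified": datetime(2008, 3, 5)},
]

store = {}
checkpoint = harvest(SOURCE, store, datetime.min)  # first run copies everything
copied_before = len(store)

SOURCE.append({"id": "c", "modified": datetime(2008, 6, 1)})
checkpoint = harvest(SOURCE, store, checkpoint)  # second run copies only "c"
```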
TAPIR - current status
• Provider software available.
• ~75 providers registered in GBIF's UDDI (> 100).
• Client libraries available.
• Online service validator.
• Online tools to help build TAPIR documents.
• Documentation:
– protocol specification
– network guide
– executive summary
• TDWG resources: mailing list, Wiki, Subversion, etc.
• Final version of the specification should be submitted this year to the TDWG standards track (minor changes still being considered before submitting).
Thank you
renato (at) cria . org . br