An introduction to data exchange protocols in TDWG Renato De Giovanni TDWG 2008

Overview of the presentation

History and context: When and how protocols started to be discussed in TDWG

The basic idea behind distributed queries

Main features of TAPIR

Other protocols

Current status of TAPIR

TDWG Standards

Historically, TDWG has concentrated its efforts on:

creating controlled vocabularies, indexes, guidelines and best practices (1985 until ~2000):

– Index Herbariorum, Authors of plant names, Floristic regions of the world, etc.

creating standards to represent different types of biodiversity data (2000 until today):

– SDD (descriptions), ABCD (specimens), TCS (names and concepts). More being created.

First networks in our community

[Diagram: the first networks (REMIB, ENHSIN) and the protocol each one used. Protocol labels on the slide: Z39.50, custom, HISPID, custom.]

More networks followed...

Australia’s Virtual Herbarium (1999)

• Included a data abstraction layer (HISPID) and a simple protocol to return records.

• HISPID became a TDWG standard.

• This approach was only used by the AVH.

The Species Analyst (1999)

• Z39.50 was created and maintained by the Library of Congress.

• Pre-Web technology (no HTTP).

• The protocol is bound to the data abstraction layer.

• Limited support for XML and Unicode.

[Timeline labels: Z39.50, DiGIR, BioCASe, TAPIR]

MaNIS, speciesLink, OBIS... (2002)

• DiGIR was funded by an NSF project.

• Motivation was to replace Z39.50 with a new protocol without the Z39.50 limitations and then split TSA (The Species Analyst) into multiple thematic networks.

BioCASE Network (2003)

• Created after many unsuccessful attempts to reach an agreement with the DiGIR community.

• Can be used with more complex data abstraction layers like ABCD.

TAPIR protocol (2004)

• Initial study contracted by GBIF to eliminate interoperability problems and duplication of efforts.

• TDWG was the venue for discussions (currently an official task group).

Protocols, Networks & TDWG

[Diagram label: HISPID]

Main scenario: Distributed queries

[Diagram: a client sends the same query over the Web (http://...) to provider 1, provider 2, provider 3 and other providers, using a protocol + data abstraction layer (e.g. DarwinCore). Labels on the provider side: Windows, GNU/Linux, Mac OS X; MS Access, PostgreSQL, MySQL.]
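The distributed query scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `distributed_query`, `fake_fetch` and the `*.example` URLs are hypothetical names that do not come from the slides, and the network call is replaced by a stand-in function so the sketch runs without any real provider.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def distributed_query(
    providers: List[str],
    query: str,
    fetch: Callable[[str, str], List[dict]],
) -> Dict[str, object]:
    """Send the same query to every provider and merge the answers.

    `fetch(provider_url, query)` is a hypothetical callable that would
    perform the HTTP request and return a list of records.
    """
    records: List[dict] = []
    failures: Dict[str, str] = {}

    def ask(url: str):
        try:
            return url, fetch(url, query), None
        except Exception as exc:  # one broken provider must not stop the rest
            return url, [], str(exc)

    # Providers are contacted in parallel; results come back in input order.
    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        for url, found, error in pool.map(ask, providers):
            if error is not None:
                failures[url] = error
            for record in found:
                # Remember which provider each record came from.
                records.append({**record, "provider": url})

    return {"records": records, "failures": failures}


def fake_fetch(url: str, query: str) -> List[dict]:
    """Stand-in for a real network call, used only for this demonstration."""
    if "provider2" in url:
        raise TimeoutError("no answer")
    return [{"name": query}]


result = distributed_query(
    [
        "http://provider1.example",
        "http://provider2.example",
        "http://provider3.example",
    ],
    "Puma concolor",
    fake_fetch,
)
```

Collecting failures separately, instead of raising on the first error, reflects the fact that in a federated network some providers are likely to be offline at any given time.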

Reasons for the existence of TAPIR

• TAPIR can potentially be used to exchange data encoded in most (if not all) XML standards defined by the other TDWG groups. TAPIR is one of the main components of the new TDWG Architecture.

• When integrating DiGIR and BioCASe, the other existing protocol alternatives were not considered suitable.

• Changing the existing DiGIR and BioCASe networks to use a completely different protocol would have a major impact on existing tools. TAPIR keeps many similarities with DiGIR and BioCASe to avoid such impacts.

Main features of TAPIR

• Uses the Web (HTTP) to communicate with providers.

• Responses are always structured in XML.

• Can be used with different data abstraction layers.

• Can return different types of search responses.

• Tries to address the basic needs of federated networks through 5 operations.
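As a rough illustration of how a client might address the five operations over HTTP, the sketch below builds request URLs. The operation names are the ones listed in this presentation; the `op` query parameter, the `build_request` helper and the example access point are assumptions made for the sketch, so the protocol specification should be consulted for the actual request syntax.

```python
from urllib.parse import urlencode

# Operation names as listed in the presentation.
OPERATIONS = ("metadata", "capabilities", "inventory", "search", "ping")


def build_request(access_point: str, operation: str = "metadata", **params: str) -> str:
    """Build a GET URL for one operation (hypothetical helper).

    Metadata is used as the default because the slides describe it as the
    default operation. The `op` parameter name is an assumption.
    """
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    query = urlencode({"op": operation, **params})
    # Append to an existing query string if the access point already has one.
    separator = "&" if "?" in access_point else "?"
    return f"{access_point}{separator}{query}"


ping_url = build_request("http://provider.example/tapir", "ping")
search_url = build_request("http://provider.example/tapir", "search", limit="10")
```

Keeping the list of operations in one place makes it easy for a client to reject requests that the protocol does not define before any network traffic happens.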

Metadata operation (default)

1- Need to identify providers and get basic information about the service

• Who is responsible for the service?

• How can the owner be contacted?

• What kind of data is being served?

• In which language is the data?

• Are there any IPR restrictions?

Capabilities operation

2- Need to get technical information about the service

• What data abstraction layer is being used?

• What operations are available?

• Does the provider only understand specific query templates? (which ones?)

• Does the provider support custom (on-the-fly) filters?

• Does the provider support custom (on-the-fly) response types?

Inventory operation

3- Need to inspect existing content

• How many records are available?

• For which species is there data?

• For which countries/regions is there data?
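The questions above amount to listing the distinct values of one concept, with a count for each. The sketch below shows that idea on a small in-memory list; it is not TAPIR code, and the `inventory` function, the record layout and the field names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def inventory(records: Iterable[dict], concept: str) -> Dict[str, int]:
    """Return each distinct value of `concept` with its number of records."""
    counts = Counter(r[concept] for r in records if r.get(concept))
    return dict(sorted(counts.items()))


# Hypothetical records, used only for this demonstration.
RECORDS = [
    {"species": "Puma concolor", "country": "Brazil"},
    {"species": "Puma concolor", "country": "Argentina"},
    {"species": "Tapirus terrestris", "country": "Brazil"},
    {"species": "Tapirus terrestris", "country": None},
]

species_inventory = inventory(RECORDS, "species")
country_inventory = inventory(RECORDS, "country")
total_records = len(RECORDS)
```

Records with a missing value for the concept are skipped, so the counts in an inventory do not necessarily add up to the total number of records.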

Search operation

4- Need to search content

• What records satisfy these parameters or filter conditions?

• Networks are free to define their own response types.

• Responses can be paged.
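Paged responses mean that a client usually has to loop until the provider reports that no more records are left. A minimal sketch of such a loop follows, assuming a hypothetical `fetch_page(start, limit)` callable that returns one page of records plus the index of the next page (or `None` at the end). The real request parameters and response structure are defined by the protocol specification, not by this example.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# One page: the records it contains and where the next page starts (None = done).
Page = Tuple[List[dict], Optional[int]]


def iter_records(fetch_page: Callable[[int, int], Page], limit: int = 100) -> Iterator[dict]:
    """Yield every record of a paged search, one page at a time."""
    start: Optional[int] = 0
    while start is not None:
        records, start = fetch_page(start, limit)
        yield from records


# Hypothetical data set and page function standing in for a provider.
DATA = [{"id": i} for i in range(7)]


def fake_fetch_page(start: int, limit: int) -> Page:
    chunk = DATA[start:start + limit]
    next_start = start + limit if start + limit < len(DATA) else None
    return chunk, next_start


all_records = list(iter_records(fake_fetch_page, limit=3))
```

Using a generator keeps memory use flat: the client never holds more than one page at a time unless it chooses to collect the results.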

Ping operation

5- Need to monitor providers

• Is the service ready to receive requests?
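A network monitor built on this operation could look roughly like the sketch below. The `check_providers` function and the `ping` callable are hypothetical: `ping(url)` stands for whatever code sends the ping request and returns a true value when the service answers.

```python
import time
from typing import Callable, Dict, Iterable


def check_providers(
    urls: Iterable[str],
    ping: Callable[[str], bool],
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, dict]:
    """Ping each provider and record whether it answered and how long it took."""
    status: Dict[str, dict] = {}
    for url in urls:
        started = clock()
        try:
            up = bool(ping(url))
            error = None
        except Exception as exc:  # any failure counts as "not ready"
            up, error = False, str(exc)
        status[url] = {"up": up, "seconds": clock() - started, "error": error}
    return status


def fake_ping(url: str) -> bool:
    """Stand-in for a real ping request, used only for this demonstration."""
    if "down" in url:
        raise ConnectionError("refused")
    return True


status = check_providers(
    ["http://up.example/tapir", "http://down.example/tapir"], fake_ping
)
```

Because the ping callable is passed in, the same monitoring loop can be tested offline and later pointed at real providers.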

Other protocols

• WFS (Open Geospatial Consortium Web Feature Service)

• Spatial queries

• Additional operations: update/insert/remove/lock feature

• TDWG Geospatial Interest Group

Other protocols

• OAI-PMH (Open Archives Initiative – Protocol for Metadata Harvesting)

• Data harvesting = gathering data from several sources to store in a single database.

• TAPIR itself can be used directly for this purpose, including incremental harvesting (depends only on the abstraction layer).

• An OAI-PMH service can be set up on top of a TAPIR service.
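Incremental harvesting can be illustrated with a small sketch: after the first full harvest, the harvester asks only for records changed since its previous run. This assumes that the data abstraction layer exposes a last-modified timestamp, which is what "depends only on the abstraction layer" suggests. The `harvest` function, the `fetch_since` callable and the `id` / `modified` field names are hypothetical.

```python
from typing import Callable, Dict, List, Optional


def harvest(
    store: Dict[str, dict],
    fetch_since: Callable[[Optional[str]], List[dict]],
    last_harvest: Optional[str],
) -> Optional[str]:
    """Copy new or changed records into `store`; return the newest timestamp seen.

    Timestamps are ISO 8601 strings in a single format, so plain string
    comparison orders them correctly.
    """
    newest = last_harvest
    for record in fetch_since(last_harvest):
        store[record["id"]] = record  # insert, or overwrite an older copy
        if newest is None or record["modified"] > newest:
            newest = record["modified"]
    return newest


# Hypothetical provider content, used only for this demonstration.
SOURCE = [
    {"id": "a", "modified": "2008-01-10T00:00:00Z", "name": "Puma concolor"},
    {"id": "b", "modified": "2008-03-05T00:00:00Z", "name": "Tapirus terrestris"},
]


def fake_fetch_since(since: Optional[str]) -> List[dict]:
    return [r for r in SOURCE if since is None or r["modified"] > since]


database: Dict[str, dict] = {}
first_run = harvest(database, fake_fetch_since, None)        # full harvest
SOURCE.append({"id": "c", "modified": "2008-06-01T00:00:00Z", "name": "Panthera onca"})
second_run = harvest(database, fake_fetch_since, first_run)  # only record "c"
```

The value returned by one run is stored and passed to the next, so each run transfers only what changed in between.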

TAPIR - current status

• Provider software available.

• ~75 providers registered in GBIF's UDDI (> 100).

• Client libraries available.

• Online service validator.

• Online tools to help build TAPIR documents.

• Documentation:

– protocol specification

– network guide

– executive summary

• TDWG resources: mailing list, Wiki, Subversion, etc.

• Final version of the specification should be submitted this year to the TDWG standards track (minor changes still being considered before submitting).

Thank you

renato (at) cria . org . br
