An introduction to data exchange protocols in TDWG Renato De Giovanni TDWG 2008

Page 1

An introduction to data exchange protocols in TDWG

Renato De Giovanni

TDWG 2008

Page 2

Overview of the presentation

History and context: When and how protocols started to be discussed in TDWG

The basic idea behind distributed queries

Main features of TAPIR

Other protocols

Current status of TAPIR

Page 3

TDWG Standards

Historically, TDWG has concentrated its efforts on:

creating controlled vocabularies, indexes, guidelines and best practices (1985 until ~2000):

– Index Herbariorum, Authors of plant names, Floristic regions of the world, etc.

creating standards to represent different types of biodiversity data (2000 until today):

– SDD (descriptions), ABCD (specimens), TCS (names and concepts). More being created.

Page 4

First networks in our community

[Diagram: the first networks, including REMIB and ENHSIN, each labelled with its protocol: Z39.50, custom, HISPID, custom.]

More networks followed...

Page 5

Protocols, Networks & TDWG

[Timeline diagram: HISPID, Z39.50, DiGIR, BioCASe, TAPIR]

Australia’s Virtual Herbarium (1999)

• Included a data abstraction layer (HISPID) and a simple protocol to return records.

• HISPID became a TDWG standard.

• This approach was only used by the AVH.

The Species Analyst (1999)

• Z39.50 was created and maintained by the Library of Congress.

• Pre-Web technology (no HTTP).

• The protocol is bound to the data abstraction layer.

• Limited support for XML and Unicode.

MaNIS, speciesLink, OBIS... (2002)

• DiGIR was funded by an NSF project.

• Motivation was to replace Z39.50 with a new protocol without the Z39.50 limitations and then split TSA into multiple thematic networks.

BioCASE Network (2003)

• Created after many unsuccessful attempts to reach an agreement with the DiGIR community.

• Can be used with more complex data abstraction layers like ABCD.

TAPIR protocol (2004)

• Initial study contracted by GBIF to eliminate interoperability problems and duplication of efforts.

• TDWG was the venue for discussions (currently an official task group).

Page 6

Main scenario: Distributed queries

[Diagram: a client sends queries, using a protocol plus a data abstraction layer (e.g. DarwinCore), to the HTTP endpoints (http://...) of provider 1, provider 2, provider 3 and other providers. The providers run different operating systems (Windows, GNU/Linux, Mac OS X) and different databases (MS Access, PostgreSQL, MySQL).]
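The distributed query scenario above can be sketched roughly as follows. This is only an illustration under stated assumptions: the provider URLs are hypothetical, and `fetch` is a stand-in for the protocol layer (HTTP request plus response parsing), injected so the example runs without a network.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical provider endpoints; real networks publish their own URLs.
PROVIDERS = [
    "http://provider1.example.org/service",
    "http://provider2.example.org/service",
    "http://provider3.example.org/service",
]


def distributed_query(
    providers: List[str],
    query: str,
    fetch: Callable[[str, str], List[str]],
) -> Dict[str, List[str]]:
    """Send the same query to every provider and collect results per provider.

    A provider that raises is recorded with an empty result, so that one
    failing provider does not abort the whole distributed query.
    """
    def safe_fetch(url: str) -> List[str]:
        try:
            return fetch(url, query)
        except Exception:
            return []

    # Query all providers concurrently; pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        results = list(pool.map(safe_fetch, providers))
    return dict(zip(providers, results))


def fake_fetch(url: str, query: str) -> List[str]:
    """Stand-in for a real request: provider 2 is simulated as unreachable."""
    if "provider2" in url:
        raise ConnectionError("provider unreachable")
    return [f"{query} record from {url}"]


merged = distributed_query(PROVIDERS, "Puma concolor", fake_fetch)
```

Because the client only depends on the shared protocol and data abstraction layer, it does not need to know which operating system or database each provider runs.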

Page 7

Reasons for the existence of TAPIR

• TAPIR can potentially be used to exchange data encoded in most (if not all) XML standards defined by the other TDWG groups. TAPIR is one of the main components of the new TDWG Architecture.

• When integrating DiGIR and BioCASe, the other existing protocol alternatives were not considered suitable.

• Changing the existing DiGIR and BioCASe networks to use a completely different protocol would have a major impact on existing tools. TAPIR keeps many similarities with DiGIR and BioCASe to avoid this.

Page 8

Main features of TAPIR

• Uses the Web (HTTP) to communicate with providers.

• Responses are always structured in XML.

• Can be used with different data abstraction layers.

• Can return different types of search responses.

• Tries to address the basic needs of federated networks through 5 operations.
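As a rough illustration of "HTTP plus five operations", the sketch below builds GET request URLs for a provider. The endpoint is hypothetical, and passing the operation as an `op` query parameter (as well as the `limit` parameter in the usage note) is an assumption made here for illustration; the protocol specification is the authority on the exact parameter names.

```python
from urllib.parse import urlencode

# The five operations named above.
OPERATIONS = ("metadata", "capabilities", "inventory", "search", "ping")


def build_request_url(endpoint: str, operation: str, **params: str) -> str:
    """Build a GET URL for one operation.

    Assumption: the operation travels in an `op` query parameter; check the
    protocol specification for the real parameter names.
    """
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    query = {"op": operation, **params}
    return f"{endpoint}?{urlencode(query)}"


ENDPOINT = "http://provider.example.org/tapir"  # hypothetical endpoint
ping_url = build_request_url(ENDPOINT, "ping")
```

A call such as `build_request_url(ENDPOINT, "search", limit="10")` appends further parameters in the same way; the response to any of these requests would be an XML document.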

Page 9

Metadata operation (default)

1- Need to identify providers and get basic information about the service

• Who is responsible for the service?

• How can the owner be contacted?

• What kind of data is being served?

• In which language is the data?

• Are there any IPR restrictions?

Page 10

Capabilities operation

2- Need to get technical information about the service

• What data abstraction layer is being used?

• What operations are available?

• Does the provider only understand specific query templates? (which ones?)

• Does the provider support custom (on-the-fly) filters?

• Does the provider support custom (on-the-fly) response types?
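The questions above suggest how a client might use a capabilities response before searching. The sketch below is an assumption-laden illustration: `Capabilities` is a simplified, hypothetical summary (the real response is an XML document whose structure is defined by the specification), and the decision rule is one plausible policy, not something prescribed by the protocol.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Capabilities:
    """Simplified, hypothetical summary of a capabilities response."""
    operations: List[str]
    query_templates: List[str] = field(default_factory=list)
    custom_filters: bool = False
    custom_response_types: bool = False


def choose_search_strategy(caps: Capabilities) -> str:
    """Decide how a client could query this provider.

    Returns "custom" when the provider accepts on-the-fly filters and
    response types, "template" when it only understands specific query
    templates, and "no-search" when neither option is available.
    """
    if "search" not in caps.operations:
        return "no-search"
    if caps.custom_filters and caps.custom_response_types:
        return "custom"
    if caps.query_templates:
        return "template"
    return "no-search"


# Hypothetical provider that only understands one predefined template.
template_only = Capabilities(
    operations=["metadata", "capabilities", "search", "ping"],
    query_templates=["http://example.org/templates/by_name.xml"],
)
strategy = choose_search_strategy(template_only)
```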

Page 11

Inventory operation

3- Need to inspect existing content

• How many records are available?

• For what species is there any data?

• For what countries/regions is there any data?

Page 12

Search operation

4- Need to search content

• What records satisfy these parameters or filter conditions?

• Networks are free to define their own response types.

• Responses can be paged.
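Paged responses can be consumed with a simple loop on the client side. The sketch below assumes a hypothetical `search_page(start, limit)` callable that returns at most `limit` records starting at offset `start`; the names and the offset-based scheme are illustrative assumptions, not taken from the specification.

```python
from typing import Callable, Iterator, List


def iter_records(
    search_page: Callable[[int, int], List[str]],
    limit: int,
) -> Iterator[str]:
    """Yield all records of a search by requesting one page at a time.

    Stops as soon as a page comes back shorter than `limit`, which is
    taken to mean that there are no further records.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    start = 0
    while True:
        page = search_page(start, limit)
        yield from page
        if len(page) < limit:
            break
        start += limit


# Stand-in for a provider holding five records.
DATA = ["rec-1", "rec-2", "rec-3", "rec-4", "rec-5"]
requested = []


def fake_search_page(start: int, limit: int) -> List[str]:
    requested.append((start, limit))
    return DATA[start:start + limit]


all_records = list(iter_records(fake_search_page, limit=2))
```

Here the client issues three requests (offsets 0, 2 and 4) and stops after the last, shorter page.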

Page 13

Ping operation

5- Need to monitor providers

• Is the service ready to receive requests?

Page 14

Other protocols

• WFS (Open Geospatial Consortium Web Feature Service)

• Spatial queries

• Additional operations: update/insert/remove/lock feature

• TDWG Geospatial Interest Group

Page 15

Other protocols

• OAI-PMH (Open Archives Initiative – Protocol for Metadata Harvesting)

• Data harvesting = gathering of data from several sources to store into a single database.

• TAPIR itself can be used directly for this purpose, including incremental harvesting (depends only on the abstraction layer).

• OAI-PMH service can be set up on top of a TAPIR service.
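Incremental harvesting, as mentioned above, can be sketched as follows. The example assumes that the abstraction layer exposes a last-modified date for each record, and `fetch_modified_since` is a hypothetical stand-in for a search request filtered on that date; none of these names come from TAPIR or OAI-PMH.

```python
from datetime import datetime
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Record = Tuple[str, datetime, str]  # (record id, last modified, payload)


def harvest(
    store: Dict[str, str],
    since: Optional[datetime],
    fetch_modified_since: Callable[[Optional[datetime]], Iterable[Record]],
) -> Optional[datetime]:
    """Copy records changed after `since` into `store`.

    Returns the newest modification date seen, to be used as the
    checkpoint for the next harvest. `since=None` means a full harvest.
    """
    newest = since
    for record_id, modified, payload in fetch_modified_since(since):
        store[record_id] = payload  # insert or overwrite
        if newest is None or modified > newest:
            newest = modified
    return newest


# Stand-in for one provider's content.
PROVIDER_RECORDS: List[Record] = [
    ("rec-1", datetime(2008, 1, 10), "v1"),
    ("rec-2", datetime(2008, 3, 5), "v1"),
]


def fake_fetch(since: Optional[datetime]) -> List[Record]:
    return [r for r in PROVIDER_RECORDS if since is None or r[1] > since]


store: Dict[str, str] = {}
checkpoint = harvest(store, None, fake_fetch)        # full harvest
PROVIDER_RECORDS.append(("rec-1", datetime(2008, 6, 1), "v2"))
checkpoint = harvest(store, checkpoint, fake_fetch)  # only the change
```

Running the same loop over several providers and writing into one store gives the single harvested database described above.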

Page 16

TAPIR - current status

• Provider software available.

• ~75 providers registered in GBIF's UDDI (> 100).

• Client libraries available.

• Online service validator.

• Online tools to help build TAPIR documents.

• Documentation:

– protocol specification

– network guide

– executive summary

• TDWG resources: mailing list, Wiki, Subversion, etc.

• Final version of the specification should be submitted this year to the TDWG standards track (minor changes still being considered before submitting).

Page 17

Thank you

renato (at) cria . org . br