
Biodiversity Informatics

Biodiversity informatics and the manipulation of biological information

Jim Croft

jrc@anbg.gov.au

Outline

• ‘Biodiversity Informatics’

• Australia’s Virtual Herbarium as a model of use and management of biodiversity knowledge

• New ways of managing biological knowledge

• Information management issues

• Current trends and future directions in biodiversity knowledge management

Biodiversity Informatics

Management of our knowledge of biodiversity using modern techniques of data and information management

Taxonomy of Database Interoperability

Multi-database systems [autonomous]

– Non-federated

– Federated

  – Loosely coupled: multiple schemas

  – Tightly coupled: unified schema

Sheth & Larson (1990)

Tightly Coupled

• Central administration

• Semantic consistency

– Schemas

– Authority files

• Common technology

• Difficult to implement

• Proprietary solutions tolerated

• Expensive

Loosely Coupled

• Closer to reality

• Independent management

• Suited to scientific systems

• Common publication syntax

– Export schema

• Less functionality, but doable

• Need open standards

Intermediate Coupling

• Scientific Independence

• Common syntax & semantics for the exchange of information

– Import/export

– HISPID, Darwin Core, TDWG/CODATA ABCD

• Leverage existing open standards

– Participation in wider, more loosely coupled federations

– Simplicity

– Distribution of effort
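A minimal sketch of what a "common export schema" means in practice: each herbarium keeps its own internal field names and only agrees on a mapping to shared terms. The local field names and the sample values below are hypothetical; the target names are terms from the current TDWG Darwin Core standard.

```python
# Loosely coupled export: the local schema stays private; only the mapping
# to shared terms is agreed between partners.
# Left-hand names are HYPOTHETICAL fields of one herbarium's database;
# right-hand names are Darwin Core terms.
LOCAL_TO_DWC = {
    "taxon_name": "scientificName",
    "coll_date": "eventDate",
    "collector": "recordedBy",
    "coll_number": "recordNumber",
    "locality_text": "locality",
    "habitat_notes": "habitat",
}


def export_record(local_record: dict) -> dict:
    """Translate one internal record into Darwin Core terms.

    Fields without an agreed mapping are not exported (the "less
    functionality" trade-off), and mapped fields missing from the record
    are skipped.
    """
    return {
        dwc_term: local_record[local_field]
        for local_field, dwc_term in LOCAL_TO_DWC.items()
        if local_field in local_record
    }


# Illustrative values only, not a real specimen record.
record = {
    "taxon_name": "Platyzoma microphyllum",
    "collector": "A. Collector",
    "internal_barcode": "X123",  # no mapping exists, so it stays private
}
print(export_record(record))
```

Each custodian can change its internal database freely as long as the mapping still produces valid shared terms, which is what allows independent management.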

Data Refinement

the real world → observations → data → information → knowledge → action

Increasing refinement & utility of data

• Environmental decision making: conservation, restoration biology, resource management, utilization

• Policy & strategy: government, corporate, individual

Herbarium Specimens

Specimen Data Capture

– Scientific name

– Collection date

– Collector name & number

– Location

– Soils

– Habitat (incl. topography)

– Vegetation community

– Associated species
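The captured fields map naturally onto a record structure. A minimal sketch, assuming "Location" is stored as decimal latitude/longitude plus a free-text locality; the field names and types are illustrative, not a published herbarium schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class SpecimenRecord:
    """One herbarium specimen with the data-capture fields listed for specimens.

    Field names and types are assumptions made for illustration.
    """
    scientific_name: str
    collection_date: Optional[date]
    collector_name: str
    collector_number: str
    latitude: Optional[float] = None    # decimal degrees, south is negative
    longitude: Optional[float] = None   # decimal degrees, east is positive
    locality: str = ""
    soils: str = ""
    habitat: str = ""                   # includes topography
    vegetation_community: str = ""
    associated_species: List[str] = field(default_factory=list)


# Illustrative values only, not a real specimen.
spec = SpecimenRecord(
    scientific_name="Acacia salicina",
    collection_date=date(1998, 10, 3),
    collector_name="A. Collector",
    collector_number="1",
    latitude=-35.3,
    longitude=149.1,
    associated_species=["Species B"],
)
print(spec.scientific_name, spec.collector_number)
```

Using `default_factory=list` gives each record its own list of associated species instead of one shared mutable default.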

Specimen Data

• The core information is from herbarium specimens

• Beyond taxonomy & names

• Collections data:

A Herbarium Database Structure

What do we want to know?

• What species does a plant belong to?

• What is its name?

• What other species is it related to?

• What does it look like?

• Where does it grow?

• Where might it grow?

• What other species grow with it?

• What species grow in a defined area?

• How did they get there?
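Several of these questions become simple queries once specimens carry coordinates. A minimal sketch, assuming point records with decimal latitude/longitude, a rectangular "defined area", and a crude distance tolerance for "grows with it"; species names are placeholders.

```python
from typing import Iterable, List, NamedTuple, Set


class Occurrence(NamedTuple):
    species: str
    lat: float  # decimal degrees
    lon: float


def species_in_area(records: Iterable[Occurrence],
                    south: float, north: float,
                    west: float, east: float) -> Set[str]:
    """'What species grow in a defined area?' for a latitude/longitude box.

    Assumes a simple rectangle that does not cross the 180th meridian.
    """
    return {r.species for r in records
            if south <= r.lat <= north and west <= r.lon <= east}


def co_occurring(records: List[Occurrence], target: str,
                 tolerance: float = 0.1) -> Set[str]:
    """'What other species grow with it?', approximated as species recorded
    within `tolerance` degrees of any record of `target`."""
    targets = [r for r in records if r.species == target]
    return {r.species for r in records
            if r.species != target
            and any(abs(r.lat - t.lat) <= tolerance
                    and abs(r.lon - t.lon) <= tolerance for t in targets)}


records = [
    Occurrence("Species A", -35.2, 149.1),
    Occurrence("Species B", -35.25, 149.15),
    Occurrence("Species C", -27.5, 153.0),
]
print(species_in_area(records, -36, -35, 148.5, 150))
print(co_occurring(records, "Species A"))
```

Real systems would use polygons, grids or a spatial index rather than a linear scan, but the questions reduce to the same filters.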

What is a Virtual Herbarium?

An on-line digital representation of a scientific collection of preserved plant specimens and botanical information

What is the AVH?

• Spread across Australian herbaria

• Data distributed; resides with custodians

• Each herbarium has a portal to receive requests and to deliver data

• A common, single-query AVH interface at each herbarium polls all herbaria
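The AVH query pattern (data stays with custodians, and one interface polls every herbarium) can be sketched as a parallel fan-out and merge. This is an illustration under assumptions, not the AVH's actual protocol: each portal is modelled as a Python function, where a real deployment would make a network request, and the portal names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A portal takes a query string and returns matching records.
Portal = Callable[[str], List[dict]]


def query_all(portals: Dict[str, Portal], name: str) -> List[dict]:
    """Send the same query to every herbarium portal in parallel, then merge
    the replies, tagging each record with the custodian that supplied it.
    A portal that raises an error is skipped, so partial results are still
    returned."""
    merged: List[dict] = []
    with ThreadPoolExecutor(max_workers=max(1, len(portals))) as pool:
        futures = {code: pool.submit(fn, name) for code, fn in portals.items()}
        for code, fut in futures.items():
            try:
                replies = fut.result()
            except Exception:
                continue  # portal unavailable or failed
            for rec in replies:
                merged.append({**rec, "custodian": code})
    return merged


# HYPOTHETICAL in-memory stand-ins for three herbarium portals.
def herbarium_a(query: str) -> List[dict]:
    return [{"name": query, "catalog": "A-0001"}]


def herbarium_b(query: str) -> List[dict]:
    return []


def herbarium_offline(query: str) -> List[dict]:
    raise ConnectionError("portal unreachable")


results = query_all({"herbarium_a": herbarium_a,
                     "herbarium_b": herbarium_b,
                     "herbarium_c": herbarium_offline}, "Acacia salicina")
print(results)
```

Tagging each merged record with its custodian preserves the point that the data still belongs to, and is maintained by, the herbarium that holds the specimen.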

Major Australian Herbaria

AVH Partners

State Herbarium of South Australia

Queensland Herbarium

Australian National Herbarium

Northern Territory Herbarium

Tasmanian Herbarium

Industry partner: KE Software

National Herbarium of Victoria

National Herbarium of New South Wales

Western Australian Herbarium

Australian Biological Resources Study

Why is there an AVH?

• Pressure on Herbaria to work more efficiently

• Demand for access to larger amounts of data

• Demand to access data more quickly

• Demand to view data in different ways

• Pressure on herbaria to appear, and to be, more responsive to community needs

• > 18,000 species of higher plants

• > 64,000 available names

• Extensive synonymy (about 4 names per species)

• 8 major government-funded herbaria

• Similar number of university herbaria

• > 6,500,000 specimens in Australian herbaria

• 50-100 data elements per specimen

• Several KB per specimen (excluding images)
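The figures above imply the scale of the task. A rough worked estimate, assuming "several KB" means about 3 KB and using decimal units (1 GB = 10^6 KB); because the counts are lower bounds, the results are approximate minimums.

```latex
\begin{aligned}
\frac{64\,000\ \text{names}}{18\,000\ \text{species}} &\approx 3.6\ \text{names per species}\\
6.5\times10^{6}\ \text{specimens}\times 3\ \text{KB} &\approx 1.95\times10^{7}\ \text{KB} \approx 19.5\ \text{GB (excluding images)}\\
6.5\times10^{6}\ \text{specimens}\times(50\ \text{to}\ 100)\ \text{elements} &\approx 3.3\times10^{8}\ \text{to}\ 6.5\times10^{8}\ \text{data elements}
\end{aligned}
```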

What is the AVH task?

Herbarium database status

• $10M over 5 years to database all major Australian herbarium collections

• $10 million:

– $4 million Commonwealth

– $4 million State/Territory

– $2 million private

• Initial focus on capture of herbarium specimen data

• Ultimate aim: a complete flora information system

The AVH Agreement

Australia’s Virtual Herbarium

On-line access to herbarium specimen information and botanical knowledge

Australian Plant Name Index (APNI)

www.anbg.gov.au/apni

www.anbg.gov.au/win

http://www.chah.gov.au/avh.html

Research Potential: Plant distribution analysis

[Figure: Acacia salicina, specimens mapped by character state (incurved vs recurved)]

[Figure: Pultenaea distribution classes in eastern Australia]
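The distribution-analysis figures appear to compare where specimens with different character states (for example incurved vs recurved) occur. A minimal sketch of one such summary, assuming each record has been reduced to a (character state, latitude, longitude) tuple; the coordinates are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Extent = Tuple[float, float, float, float]  # (min_lat, max_lat, min_lon, max_lon)


def extent_by_state(records: Iterable[Tuple[str, float, float]]) -> Dict[str, Extent]:
    """Group specimen points by character state and return the bounding
    extent of each group, a first step towards mapping distribution classes."""
    groups = defaultdict(list)
    for state, lat, lon in records:
        groups[state].append((lat, lon))
    extents: Dict[str, Extent] = {}
    for state, points in groups.items():
        lats = [p[0] for p in points]
        lons = [p[1] for p in points]
        extents[state] = (min(lats), max(lats), min(lons), max(lons))
    return extents


# HYPOTHETICAL coordinates, for illustration only.
sample = [("incurved", -20.0, 140.0),
          ("incurved", -25.0, 145.0),
          ("recurved", -33.0, 138.0)]
print(extent_by_state(sample))
```

A bounding box is the crudest summary; the same grouping feeds naturally into convex hulls, grid-cell counts or mapped point plots.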

Flora Information Systems

• On-line systems

• Often regionally based

• Integrating:

– Plant names and synonyms

– Descriptive Flora treatments

– Illustrations

– Distributions

– etc.
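Integrating "plant names and synonyms" means that a search under any name should reach the same taxon. A minimal sketch of synonym resolution, assuming a simple synonym-to-accepted-name table; the single entry is taken from this document's Flora example (Gleichenia platyzoma as a synonym of Platyzoma microphyllum), and the function names are hypothetical.

```python
from typing import Dict, List

# Synonym -> accepted name.
SYNONYMS: Dict[str, str] = {
    "Gleichenia platyzoma": "Platyzoma microphyllum",
}


def accepted_name(name: str, synonyms: Dict[str, str] = SYNONYMS) -> str:
    """Follow synonym links until an accepted name is reached.
    The `seen` set stops the loop if the table accidentally contains a cycle."""
    seen = set()
    while name in synonyms and name not in seen:
        seen.add(name)
        name = synonyms[name]
    return name


def all_names(accepted: str, synonyms: Dict[str, str] = SYNONYMS) -> List[str]:
    """Return the accepted name followed by every synonym that resolves to it,
    so that records filed under any of these names can be retrieved together."""
    return [accepted] + sorted(s for s in synonyms
                               if accepted_name(s, synonyms) == accepted)


print(accepted_name("Gleichenia platyzoma"))
print(all_names("Platyzoma microphyllum"))
```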

Botanical illustrations

Search all records on-line

Digital images available (‘best of class’)

35,000 images of Australian plants and vegetation

National Plant Photograph Index

www.anbg.gov.au/anbg/photo-collection/

High-resolution image of type specimen of Austrobaileya downloaded over the Internet from the Herbarium of the New York Botanical Garden

Type Images on demand

Flora & Revision Databases

New ways of managing and delivering botanical information

A Flora in XML

Example in HTML

<p><b>Platyzoma microphyllum</b> R.Br., <i>Prodr.</i> 160 (1810)</p>
<p><i>Gleichenia platyzoma</i> F.Muell., <i>Veg. Chatham.-Isl.</i> 63 (1864). T: Facing Island, Qld, <i>R.Brown Iter Austral. 102</i> ; lecto: BM.</p>
<p>Illus.: S.B.Andrews…</p>
<p>Rhizome short-creeping… Sporangia in zones in distal half of frond. Fig. 55</p>
<p>Widespread across northern Australia… Grows in sandy or swampy soils.... Map 135.</p>
<p>W.A.: 14.4 km NW of Mt…</p>

Example in XML

<taxon>
  <name>Platyzoma microphyllum</name> <author>R.Br.</author>,
  <publication><title>Prodr.</title> <page>160</page><date>1810</date></publication>
  <synonym>
    <name>Gleichenia platyzoma</name> <author>F.Muell.</author>
    <publication>Veg. Chatham.-Isl.</publication> <page>63</page> <date>1864</date>
    <type>T: Facing Island, Qld, …</type>
  </synonym>
  <illustration>Illus.: S.B.Andrews…</illustration>
  <description>Rhizome short-creeping… Sporangia in zones in distal half of frond.</description> <figure>Fig. 55</figure>
  <locality>Widespread across northern Australia…</locality>
  <habitat>Grows in sandy or swampy soils...</habitat> <map>Map 135.</map>
  <specimens>W.A.: 14.4 km NW of Mt…</specimens>
</taxon>
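The advantage of the XML version over the HTML version is that each part of the treatment can be addressed by meaning rather than by typography. A minimal sketch using Python's standard `xml.etree.ElementTree` on a shortened copy of the XML example above; note that the synonym's `<page>` element must be properly closed (`<page>63</page>`), otherwise the parser rejects the document as not well-formed.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the Flora XML example (element names as in the document).
FLORA_XML = """<taxon>
  <name>Platyzoma microphyllum</name> <author>R.Br.</author>,
  <publication><title>Prodr.</title> <page>160</page><date>1810</date></publication>
  <synonym><name>Gleichenia platyzoma</name> <author>F.Muell.</author>
    <publication>Veg. Chatham.-Isl.</publication> <page>63</page> <date>1864</date></synonym>
  <habitat>Grows in sandy or swampy soils...</habitat>
  <map>Map 135.</map>
</taxon>"""

root = ET.fromstring(FLORA_XML)
# findtext("name") matches direct children only, so the synonym's <name>
# is not picked up here.
accepted = root.findtext("name")
published = root.findtext("publication/date")
synonyms = [s.findtext("name") for s in root.findall("synonym")]
print(accepted, published, synonyms)
```

The same structured record could then be loaded into a Flora database, searched by habitat, or rendered back into the printed layout.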

A Flora XML Schema fragment

A Flora database structure

A Flora database report

An old process of publication

[Diagram: Botanist → word-processor (W-P) file → Editors → W-P file → Publisher → camera-ready (C-R) copy → Book, etc.]

A new process of publication

[Diagram: components of the old process, plus XML file, Database, XML file, Outputs]

A future process of publication

[Diagram: Botanist, Editors, XML file, Database, Outputs; Publisher, C-R copy, Book, etc.]
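The future process centres on one structured source from which several outputs are generated. A minimal sketch of that idea, assuming a taxon record in the XML form used in this document and two hypothetical renderers (web HTML and plain text); a printed Flora or camera-ready copy would simply be another renderer.

```python
import xml.etree.ElementTree as ET
from html import escape

# One structured source record (shortened from the Flora XML example).
TAXON = ET.fromstring(
    "<taxon><name>Platyzoma microphyllum</name><author>R.Br.</author>"
    "<habitat>Grows in sandy or swampy soils...</habitat></taxon>"
)


def to_html(taxon: ET.Element) -> str:
    """Web output: name in bold, as in the HTML Flora example."""
    return (f"<p><b>{escape(taxon.findtext('name'))}</b> "
            f"{escape(taxon.findtext('author'))}</p>"
            f"<p>{escape(taxon.findtext('habitat'))}</p>")


def to_text(taxon: ET.Element) -> str:
    """Plain-text output, e.g. for a checklist or an identification tool."""
    return (f"{taxon.findtext('name')} {taxon.findtext('author')}\n"
            f"{taxon.findtext('habitat')}")


print(to_html(TAXON))
print(to_text(TAXON))
```

Corrections are made once, in the source record, and every output picks them up the next time it is generated.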

Interactive Identification

Using computers to identify and name plant species and display information about them

Interactive Plant Identification
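Interactive identification is usually built on a multi-access key: the user enters whatever characters can be observed, in any order, and the program keeps only the taxa compatible with them. A minimal sketch, assuming a taxon-by-character matrix where each cell holds the set of recorded states; the taxa, characters and states are hypothetical.

```python
from typing import Dict, Set

# Taxon -> {character: set of recorded states}. HYPOTHETICAL data.
MATRIX: Dict[str, Dict[str, Set[str]]] = {
    "Taxon A": {"leaf_margin": {"entire"}, "flower_colour": {"yellow"}},
    "Taxon B": {"leaf_margin": {"toothed"}, "flower_colour": {"yellow", "white"}},
    "Taxon C": {"leaf_margin": {"entire"}, "flower_colour": {"white"}},
}


def remaining(matrix: Dict[str, Dict[str, Set[str]]],
              observed: Dict[str, str]) -> Set[str]:
    """Keep every taxon compatible with all observed character states.

    A taxon with no data for a character is kept rather than excluded,
    because a missing description is not evidence against it.
    """
    return {taxon for taxon, chars in matrix.items()
            if all(state in chars.get(ch, {state})
                   for ch, state in observed.items())}


print(remaining(MATRIX, {"flower_colour": "white"}))
print(remaining(MATRIX, {"flower_colour": "white", "leaf_margin": "entire"}))
```

Once one taxon remains, the program can display the stored description, images and distribution for it.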

Current trends, future directions


Trends in Biodiversity Information Management

From → To

• Nomenclatural → Taxonomic

• Regional → Global

• Text-based → Image-based

• Taxon-based → Spatially-based

• Individual effort → Partnerships

• Single user → Multiuser

• Standalone → Networked

• Centralized → Distributed

• Proprietary system → Open system

• Idiosyncratic design → Standard architecture

• Nonstandard data content → Standard data content

• Conventional → Innovative

• Developmental → Stable

• Access charges → Freely available

Global Organization

• Several parallel and complementary initiatives:

– Global Biodiversity Information Facility (GBIF)

– Taxonomic Databases Working Group (TDWG)

– Global Taxonomic Initiative (GTI)

– International Organization for Plant Information (IOPI)

– Species 2000

– All Species Foundation (ALL)

www.gbif.org

Data Flow within GBIF Network

[Figure components: collection nodes, participant nodes, GBIF portal, user browser. Labelled flows: service metadata, specimen index data, detailed specimen data, aggregated data, HTML data]
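One plausible reading of the GBIF data-flow figure is a two-step lookup: a central portal holds only a lightweight specimen index, and detailed specimen data is fetched from the collection nodes the index points to. The sketch below illustrates that pattern under this assumption; it is not GBIF's actual interface, and the node names and records are hypothetical.

```python
from typing import Dict, List

# Central index: name -> collection nodes holding matching records.
INDEX: Dict[str, List[str]] = {
    "Acacia salicina": ["node_1", "node_2"],
}

# Detailed specimen data stays at each custodian's collection node.
NODES: Dict[str, Dict[str, List[dict]]] = {
    "node_1": {"Acacia salicina": [{"catalog": "N1-001"}]},
    "node_2": {"Acacia salicina": [{"catalog": "N2-417"}]},
}


def search(name: str) -> List[dict]:
    """Consult the central index first, then fetch details only from the
    nodes it lists, instead of broadcasting the query to every node."""
    results: List[dict] = []
    for node in INDEX.get(name, []):
        for rec in NODES[node].get(name, []):
            results.append({**rec, "node": node})
    return results


print(search("Acacia salicina"))
print(search("Unknown name"))
```

Compared with polling every data holder on each query, the index trades some freshness (it must be rebuilt as nodes change) for queries that touch only relevant nodes.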

[Figure: chart from www.all-species.org; x-axis: Year; y-axis: 0 to 20,000,000]

What needs to happen here?

Requirements for Interoperability

Standards…

URL, UML, ABCD, URI, XHTML, HTTP, UDDI, XSLT, XPath, RDF, PNG, SVG, DOM, CSS, SAX, HISPID, ITF, BNF, Z39.50, WAIS, ASN.1, XML schema

Standards for Interoperability of Biodiversity Databases

Dublin Core, RDFS, Z39.19, SOAP, CGI, RMI, Darwin Core, WSDL

[Figure legend: Recommended]