Biodiversity Informatics Biodiversity informatics and the manipulation of biological information Jim Croft jrc@anbg.gov.au


Biodiversity Informatics

Biodiversity informatics and the manipulation of biological information

Jim Croft

jrc@anbg.gov.au

Outline

• ‘Biodiversity Informatics’

• Australia’s Virtual Herbarium as a model of use and management of biodiversity knowledge

• New ways of managing biological knowledge

• Information management issues

• Current trends and future directions in biodiversity knowledge management

Biodiversity Informatics

Management of our knowledge of biodiversity using modern techniques of data and information management

Taxonomy of Database Interoperability

• Multi-database systems
  – Non-federated
  – Federated
    – Loosely coupled: multiple schemas
    – Tightly coupled: unified schema

Sheth & Larson (1990)

[Autonomous]

Tightly Coupled

• Central administration

• Semantic consistency
  – Schemas
  – Authority files

• Common technology

• Difficult to implement

• Proprietary solutions tolerated

• Expensive

Loosely Coupled

• Closer to reality

• Independent management

• Suited to scientific systems

• Common publication syntax
  – Export schema

• Less functionality, but doable

• Need open standards

Intermediate Coupling

• Scientific independence

• Common syntax & semantics for the exchange of information
  – Import/export
  – HISPID, Darwin Core, TDWG/CODATA ABCD

• Leverage existing open standards
  – Participation in wider, more loosely coupled federations
  – Simplicity
  – Distribution of effort
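Intermediate coupling (each herbarium keeps its own database but agrees on a shared exchange vocabulary) can be sketched as a pair of field mappings. The local field names (`taxon`, `coll_date`, `SpeciesName` and so on), both mapping tables and the sample record are hypothetical; only the target names (`scientificName`, `eventDate`, `recordedBy`, `recordNumber`, `locality`) are Darwin Core terms. This illustrates the pattern only, not how any herbarium actually implements HISPID or Darwin Core exchange.

```python
from typing import Dict

# Hypothetical local field names for two herbaria. Only the values on the
# right (scientificName, eventDate, ...) are real Darwin Core term names.
HERBARIUM_A_MAP: Dict[str, str] = {
    "taxon": "scientificName",
    "coll_date": "eventDate",
    "collector": "recordedBy",
    "coll_no": "recordNumber",
    "loc_text": "locality",
}
HERBARIUM_B_MAP: Dict[str, str] = {
    "SpeciesName": "scientificName",
    "DateCollected": "eventDate",
    "Collector": "recordedBy",
    "Number": "recordNumber",
    "Locality": "locality",
}


def export_record(local: Dict[str, str], mapping: Dict[str, str]) -> Dict[str, str]:
    """Rename local fields to shared terms; unmapped fields stay private."""
    return {mapping[k]: v for k, v in local.items() if k in mapping}


def import_record(shared: Dict[str, str], mapping: Dict[str, str]) -> Dict[str, str]:
    """Inverse direction: shared terms back to one herbarium's local names."""
    inverse = {term: name for name, term in mapping.items()}
    return {inverse[k]: v for k, v in shared.items() if k in inverse}


if __name__ == "__main__":
    # Hypothetical record from herbarium A, re-imported by herbarium B.
    record_a = {"taxon": "Acacia salicina", "coll_date": "1998-10-02",
                "collector": "Collector X", "coll_no": "123",
                "curator_note": "not for export"}
    shared = export_record(record_a, HERBARIUM_A_MAP)
    print(import_record(shared, HERBARIUM_B_MAP))
```

Fields with no shared term (such as the curator's note) are simply not exported, which is what lets each custodian keep managing its own database independently.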

Data Refinement

the real world → observations → data → information → knowledge → action

Increasing refinement & utility of data

• Environmental decision making
  – conservation
  – restoration biology
  – resource management
  – utilization

• Policy & strategy
  – government
  – corporate
  – individual

Herbarium Specimens

Specimen Data Capture

Specimen Data

• The core information comes from herbarium specimens

• Beyond taxonomy & names

• Collections data:
  – Scientific name
  – Collection date
  – Collector name & number
  – Location
  – Soils
  – Habitat (incl. topography)
  – Vegetation community
  – Associated species
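The collections data listed for each specimen map naturally onto a record type. A minimal Python sketch, assuming one object per specimen; the field names, the use of decimal-degree coordinates for "Location", and the `missing_core_fields` check are illustrative assumptions, not a published herbarium schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class SpecimenRecord:
    """One herbarium specimen (hypothetical field layout)."""
    scientific_name: str
    collection_date: Optional[date]
    collector_name: str
    collector_number: str
    latitude: Optional[float] = None   # location as decimal degrees
    longitude: Optional[float] = None
    locality: str = ""                 # free-text location description
    soils: str = ""
    habitat: str = ""                  # includes topography
    vegetation_community: str = ""
    associated_species: List[str] = field(default_factory=list)

    def missing_core_fields(self) -> List[str]:
        """List core fields left empty, e.g. for data-capture checks."""
        missing = []
        if not self.scientific_name:
            missing.append("scientific_name")
        if self.collection_date is None:
            missing.append("collection_date")
        if not self.collector_name:
            missing.append("collector_name")
        if self.latitude is None or self.longitude is None:
            missing.append("coordinates")
        return missing
```

A record captured without a date or coordinates, such as `SpecimenRecord("Acacia salicina", None, "Collector X", "1")`, reports `["collection_date", "coordinates"]`, flagging it for follow-up before it is used in distribution work.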

A Herbarium Database Structure

What do we want to know?

• What species does a plant belong to?

• What is its name?

• What other species is it related to?

• What does it look like?

• Where does it grow?

• Where might it grow?

• What other species grow with it?

• What species grow in a defined area?

• How did they get there?
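Two of these questions ("What species grow in a defined area?" and "What other species grow with it?") translate directly into queries over specimen records. A minimal sketch, assuming each record carries a name, a point location in decimal degrees and a hypothetical collecting-site identifier; the bounding-box test does not handle boxes that cross the 180° meridian.

```python
from typing import Dict, List, NamedTuple, Set


class Record(NamedTuple):
    """Minimal specimen record: name plus point location (decimal degrees)."""
    name: str
    lat: float
    lon: float
    site: str  # hypothetical collecting-site identifier


def species_in_area(records: List[Record], south: float, north: float,
                    west: float, east: float) -> Set[str]:
    """What species grow in a defined (lat/lon bounding-box) area?"""
    return {r.name for r in records
            if south <= r.lat <= north and west <= r.lon <= east}


def associated_species(records: List[Record], name: str) -> Set[str]:
    """What other species grow with it? (species sharing a collecting site)"""
    by_site: Dict[str, Set[str]] = {}
    for r in records:
        by_site.setdefault(r.site, set()).add(r.name)
    together: Set[str] = set()
    for names in by_site.values():
        if name in names:
            together |= names
    together.discard(name)
    return together
```

Treating "grows with" as "collected at the same site" is only one possible definition; a real system might instead use the associated-species field or a distance threshold.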

What is a Virtual Herbarium?

An on-line digital representation of a scientific collection of preserved plant specimens and botanical information

What is the AVH?

• Spread across Australian herbaria

• Data distributed; resides with custodians

• Each herbarium has a portal to receive requests and to deliver data

• A common AVH query interface at each herbarium polls all herbaria with a single query

Major Australian Herbaria
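The AVH arrangement (data stay with each custodian, while one query is sent to every herbarium's portal and the answers are merged) is a fan-out query. The sketch below models each portal as a local Python function; `make_portal`, `federated_query` and the herbarium codes are hypothetical stand-ins, not the AVH's actual software or protocol.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A portal takes a taxon name and returns matching specimen records.
# In the AVH these would be remote services; here they are local stand-ins.
Portal = Callable[[str], List[dict]]


def make_portal(code: str, holdings: Dict[str, int]) -> Portal:
    """Hypothetical portal for one herbarium; holdings maps taxon -> count."""
    def query(taxon: str) -> List[dict]:
        count = holdings.get(taxon, 0)
        return [{"herbarium": code, "taxon": taxon, "specimen": f"{code}-{i}"}
                for i in range(1, count + 1)]
    return query


def federated_query(portals: Dict[str, Portal], taxon: str) -> List[dict]:
    """Send one query to every portal concurrently and merge the answers.
    A portal that raises an error is skipped instead of failing the search."""
    merged: List[dict] = []
    with ThreadPoolExecutor(max_workers=max(1, len(portals))) as pool:
        futures = [pool.submit(portal, taxon) for portal in portals.values()]
        for future in futures:
            try:
                merged.extend(future.result())
            except Exception:
                continue  # e.g. herbarium offline; other results still returned
    return merged


if __name__ == "__main__":
    portals = {
        "HERB-A": make_portal("HERB-A", {"Acacia salicina": 2}),
        "HERB-B": make_portal("HERB-B", {"Acacia salicina": 1}),
    }
    for record in federated_query(portals, "Acacia salicina"):
        print(record)
```

Skipping a failed portal reflects the loosely coupled model, in which each herbarium is managed independently and may be unavailable; a networked version would also need timeouts so that a slow portal cannot hold up the merged answer.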

AVH Partners

State Herbarium of South Australia

Queensland Herbarium

Australian National Herbarium

Northern Territory Herbarium

Tasmanian Herbarium

Industry partner: KE Software

National Herbarium of Victoria

National Herbarium of New South Wales

Western Australian Herbarium

Australian Biological Resources Study

Why is there an AVH?

• Pressure on herbaria to work more efficiently

• Demand for access to larger amounts of data

• Demand to access data more quickly

• Demand to view data in different ways

• Pressure on herbaria to appear and to be more responsive to community needs

What is the AVH task?

• > 18,000 species of higher plants
• > 64,000 available names
• Extensive synonymy (about 4 names per plant)

• 8 major government-funded herbaria
• Similar number of university herbaria

• > 6,500,000 specimens in Australian herbaria
• 50–100 data elements per specimen
• Several KB per specimen (excluding images)
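A rough check on the scale of the task, using the figures on this slide. The 3 KB per specimen is an assumed value for "several KB", and the name and species counts are lower bounds, so these are order-of-magnitude estimates only (1 GB taken as 10^6 KB).

```latex
\begin{aligned}
\text{names per species} &\approx \frac{64\,000}{18\,000} \approx 3.6 \\
\text{data elements} &\approx 6.5\times10^{6} \times (50 \text{ to } 100) = 3.25\times10^{8} \text{ to } 6.5\times10^{8} \\
\text{storage} &\approx 6.5\times10^{6} \times 3\ \text{KB} = 1.95\times10^{7}\ \text{KB} \approx 19.5\ \text{GB}
\end{aligned}
```

The synonymy ratio of roughly 3.6 is consistent with the "about 4 names per plant" quoted above.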

Herbarium database status

The AVH Agreement

• $10M over 5 years to database all major Australian herbarium collections

• $10 million:
  – $4 million Commonwealth
  – $4 million State/Territory
  – $2 million private

• Initial focus on capture of herbarium specimen data

• Ultimate aim: a complete flora information system

Australia’s Virtual Herbarium

On-line access to herbarium specimen information and botanical knowledge

Australian Plant Name Index (APNI)

www.anbg.gov.au/apni

www.anbg.gov.au/win

http://www.chah.gov.au/avh.html

Acacia salicina


Inc

urv

ed

Inc

urv

ed

Re

cu

rve

d

Research Potential:Plant distribution analysis

?Incurved Recurved

Pultenaea distribution classes in eastern Australia

?
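One way to derive a map like this from specimen data is to bin georeferenced specimens into grid cells and label each cell by the character states recorded there. A minimal sketch, assuming 1° cells, specimens given as (latitude, longitude, state) tuples, and that "?" marks cells where recorded states disagree; that reading of "?" is an assumption, not something stated on the original map.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Cell = Tuple[int, int]


def grid_cell(lat: float, lon: float, size_deg: float = 1.0) -> Cell:
    """Index of the size_deg x size_deg grid cell containing a point."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))


def classify_cells(specimens: Iterable[Tuple[float, float, str]],
                   size_deg: float = 1.0) -> Dict[Cell, str]:
    """Label each occupied cell with its single recorded state
    (e.g. 'incurved' or 'recurved'), or '?' if specimens in it disagree."""
    states: Dict[Cell, Set[str]] = defaultdict(set)
    for lat, lon, state in specimens:
        states[grid_cell(lat, lon, size_deg)].add(state)
    return {cell: (next(iter(s)) if len(s) == 1 else "?")
            for cell, s in states.items()}
```

Changing `size_deg` trades spatial detail against the number of specimens per cell; sparse cells are where a single misidentified specimen can flip the label.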

Flora Information Systems

• On-line systems

• Often regionally based

• Integrating:
  – Plant names and synonyms
  – Descriptive Flora treatments
  – Illustrations
  – Distributions
  – etc.
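Integrating names and synonyms means records filed under a synonym must be pooled with those filed under the accepted name. A minimal sketch, assuming a simple synonym-to-accepted-name table; its single entry is taken from the Platyzoma example in the Flora XML slide, and `accepted_name` and `group_by_accepted` are hypothetical helpers, not APNI's interface.

```python
from typing import Dict, List

# Synonym -> accepted name. The one entry follows the Flora XML example,
# where Gleichenia platyzoma is listed as a synonym of Platyzoma microphyllum.
SYNONYMS: Dict[str, str] = {
    "Gleichenia platyzoma": "Platyzoma microphyllum",
}


def accepted_name(name: str, synonyms: Dict[str, str]) -> str:
    """Follow synonym links to the accepted name, guarding against cycles."""
    seen = set()
    while name in synonyms:
        if name in seen:
            raise ValueError(f"synonym cycle involving {name!r}")
        seen.add(name)
        name = synonyms[name]
    return name


def group_by_accepted(names: List[str],
                      synonyms: Dict[str, str]) -> Dict[str, List[str]]:
    """Pool names used on records under their accepted taxon."""
    groups: Dict[str, List[str]] = {}
    for n in names:
        groups.setdefault(accepted_name(n, synonyms), []).append(n)
    return groups
```

Following links in a loop allows chains (a synonym of a synonym), which occur when names are revised more than once.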

Flora Information Systems

Botanical illustrations

National Plant Photograph Index

www.anbg.gov.au/anbg/photo-collection/

35,000 images of Australian plants and vegetation

Search all records on-line

Digital images available (‘best of class’)

Type Images on demand

High-resolution image of type specimen of Austrobaileya downloaded over the Internet from the Herbarium of the New York Botanical Garden

Flora & Revision Databases

New ways of managing and delivering botanical information

A Flora in XML

Example in HTML:

<p><b>Platyzoma microphyllum</b> R.Br., <i>Prodr.</i> 160 (1810)</p>
<p><i>Gleichenia platyzoma</i> F.Muell., <i>Veg. Chatham.-Isl.</i> 63 (1864). T: Facing Island, Qld, <i>R.Brown Iter Austral. 102</i> ; lecto: BM.</p>
<p>Illus.: S.B.Andrews…</p>
<p>Rhizome short-creeping… Sporangia in zones in distal half of frond. Fig. 55</p>
<p>Widespread across northern Australia… Grows in sandy or swampy soils.... Map 135.</p>
<p>W.A.: 14.4 km NW of Mt…</p>

Example in XML:

<taxon>
  <name>Platyzoma microphyllum</name> <author>R.Br.</author>,
  <publication><title>Prodr.</title> <page>160</page> <date>1810</date></publication>
  <synonym>
    <name>Gleichenia platyzoma</name> <author>F.Muell.</author>
    <publication>Veg. Chatham.-Isl.</publication> <page>63</page> <date>1864</date>
    <type>T: Facing Island, Qld, …</type>
  </synonym>
  <illustration>Illus.: S.B.Andrews…</illustration>
  <description>Rhizome short-creeping… Sporangia in zones in distal half of frond.</description>
  <figure>Fig. 55</figure>
  <locality>Widespread across northern Australia…</locality>
  <habitat>Grows in sandy or swampy soils...</habitat>
  <map>Map 135.</map>
  <specimens>W.A.: 14.4 km NW of Mt…</specimens>
</taxon>
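The advantage of the XML version is that software can find each part of a treatment by element name. A minimal Python sketch using the standard-library `xml.etree.ElementTree` module on an abridged copy of the Platyzoma example (elisions shortened to "...", every element closed so it parses); `summarise` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the Platyzoma example, with every element closed.
FLORA_XML = """
<taxon>
  <name>Platyzoma microphyllum</name> <author>R.Br.</author>,
  <publication><title>Prodr.</title> <page>160</page> <date>1810</date></publication>
  <synonym>
    <name>Gleichenia platyzoma</name> <author>F.Muell.</author>
    <publication>Veg. Chatham.-Isl.</publication> <page>63</page> <date>1864</date>
  </synonym>
  <habitat>Grows in sandy or swampy soils...</habitat>
  <map>Map 135.</map>
</taxon>
"""


def summarise(xml_text: str) -> dict:
    """Pick out parts of a treatment by element name (hypothetical helper)."""
    taxon = ET.fromstring(xml_text.strip())
    return {
        "name": taxon.findtext("name"),
        "author": taxon.findtext("author"),
        "year": taxon.findtext("publication/date"),
        "synonyms": [s.findtext("name") for s in taxon.findall("synonym")],
        "habitat": taxon.findtext("habitat"),
    }


if __name__ == "__main__":
    print(summarise(FLORA_XML))
```

The same lookup cannot be done reliably on the HTML version, where the habitat statement is just part of a paragraph that also holds the distribution and map reference.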

A Flora XML Schema fragment

A Flora database structure

A Flora database report

An old process of publication

Botanist → W-P file → Editors → W-P file → Publisher → C-R copy → Book, etc.

A new process of publication

Botanist → W-P file → Editors → W-P file → Publisher → C-R copy → Book, etc.

Editors → XML file → Database → XML file → Outputs

A future process of publication

Botanist, Editors → Database → XML file → Outputs

Publisher → C-R copy → Book, etc.
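The new and future processes rest on single-source publishing: one structured record is rendered into several outputs. A minimal sketch, assuming an abridged XML taxon record and two hypothetical renderers (`to_html`, `to_text`); real Flora outputs would cover every element and a camera-ready layout, not just a web page and plain text.

```python
import xml.etree.ElementTree as ET
from html import escape
from typing import Callable, Dict

# One stored record (abridged from the Platyzoma example), rendered many ways.
RECORD = ET.fromstring(
    "<taxon><name>Platyzoma microphyllum</name><author>R.Br.</author>"
    "<habitat>Grows in sandy or swampy soils...</habitat></taxon>"
)


def to_html(t: ET.Element) -> str:
    """Web output, with the name in bold as in the HTML example."""
    return (f"<p><b>{escape(t.findtext('name'))}</b> {escape(t.findtext('author'))}</p>"
            f"<p>{escape(t.findtext('habitat'))}</p>")


def to_text(t: ET.Element) -> str:
    """Plain-text output, e.g. for a checklist or printed report."""
    return f"{t.findtext('name')} {t.findtext('author')}\n  {t.findtext('habitat')}"


# Hypothetical registry: adding an output format means adding one renderer.
OUTPUTS: Dict[str, Callable[[ET.Element], str]] = {"html": to_html, "text": to_text}

if __name__ == "__main__":
    for fmt, render in OUTPUTS.items():
        print(f"--- {fmt} ---\n{render(RECORD)}")
```

Corrections are made once, in the stored record, and every output picks them up the next time it is generated.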

Interactive Identification

Using computers to identify and name plant species and display information about them

Interactive Plant Identification
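Interactive (multi-access) identification works by discarding taxa inconsistent with the character states the user has observed, in whatever order the user chooses. A minimal sketch with a hypothetical three-taxon character matrix; the "most distinct states" rule for suggesting the next character is a deliberately simple heuristic, not the method of any particular identification package.

```python
from typing import Dict, Optional, Set

# Hypothetical character matrix: taxon -> character -> states recorded for it.
Matrix = Dict[str, Dict[str, Set[str]]]
MATRIX: Matrix = {
    "Taxon A": {"leaf margin": {"incurved"}, "flower colour": {"yellow"}},
    "Taxon B": {"leaf margin": {"recurved"}, "flower colour": {"yellow", "orange"}},
    "Taxon C": {"leaf margin": {"recurved"}, "flower colour": {"red"}},
}


def identify(matrix: Matrix, observed: Dict[str, str]) -> Set[str]:
    """Keep taxa consistent with every observed state, in any order.
    A character not scored for a taxon does not exclude it."""
    remaining = set(matrix)
    for character, state in observed.items():
        remaining = {t for t in remaining
                     if character not in matrix[t] or state in matrix[t][character]}
    return remaining


def next_character(matrix: Matrix, candidates: Set[str],
                   used: Set[str]) -> Optional[str]:
    """Suggest an unused character with the most distinct states among the
    candidates; None if every unused character shows a single state."""
    states: Dict[str, Set[str]] = {}
    for t in candidates:
        for character, values in matrix[t].items():
            if character not in used:
                states.setdefault(character, set()).update(values)
    useful = {c: len(v) for c, v in states.items() if len(v) > 1}
    return max(useful, key=useful.get) if useful else None
```

Unlike a printed dichotomous key, the user can start with whichever characters are visible on the specimen at hand; a taxon with several recorded states (Taxon B's flower colour) stays in the running for any of them.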

Current trends, future directions

?

Trends in Biodiversity Information Management

• Nomenclatural → Taxonomic
• Regional → Global
• Text-based → Image-based
• Taxon-based → Spatially-based
• Individual effort → Partnerships
• Single user → Multiuser
• Standalone → Networked
• Centralized → Distributed
• Proprietary system → Open system
• Idiosyncratic design → Standard architecture
• Nonstandard data content → Standard data content
• Conventional → Innovative
• Developmental → Stable
• Access charges → Freely available

Global Organization

• Several parallel and complementary initiatives:

– Global Biodiversity Information Facility (GBIF)

– Taxonomic Databases Working Group (TDWG)

– Global Taxonomic Initiative (GTI)

– International Organization for Plant Information (IOPI)

– Species 2000

– All Species Foundation (ALL)

www.gbif.org

Data Flow within GBIF Network

• Components: Collection Nodes, Participant Nodes, GBIF Portal, User Browser

• Data exchanged: Service Metadata, Specimen Index Data, Detailed Specimen Data, Aggregated Data, HTML Data
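The GBIF diagram separates lightweight specimen index data, held centrally, from detailed specimen data, fetched from the collection nodes on demand. A minimal sketch of that split, with hypothetical node ids and records; it follows the diagram's labels only and does not reproduce GBIF's actual protocols.

```python
from typing import Dict, List, Set

# Hypothetical collection nodes: node id -> taxon name -> full specimen records.
Nodes = Dict[str, Dict[str, List[dict]]]
NODES: Nodes = {
    "node-1": {"Acacia salicina": [{"id": "n1-001", "locality": "..."}]},
    "node-2": {"Acacia salicina": [{"id": "n2-017", "locality": "..."}],
               "Platyzoma microphyllum": [{"id": "n2-020", "locality": "..."}]},
}


def build_index(nodes: Nodes) -> Dict[str, Set[str]]:
    """Portal-side index: which nodes hold records of which taxon.
    Only names and node ids are copied centrally, not the full records."""
    index: Dict[str, Set[str]] = {}
    for node_id, holdings in nodes.items():
        for taxon in holdings:
            index.setdefault(taxon, set()).add(node_id)
    return index


def fetch_details(nodes: Nodes, index: Dict[str, Set[str]],
                  taxon: str) -> List[dict]:
    """Answer a user query: consult the index, then pull detailed records
    only from the nodes that hold them (data stay with the custodian)."""
    records: List[dict] = []
    for node_id in sorted(index.get(taxon, set())):
        records.extend(nodes[node_id][taxon])
    return records


if __name__ == "__main__":
    idx = build_index(NODES)
    print(fetch_details(NODES, idx, "Acacia salicina"))
```

Compared with polling every node on every query, the index lets the portal contact only the nodes that can answer, at the cost of keeping the index up to date.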

www.all-species.org

[Chart from www.all-species.org: vertical axis 0 to 20,000,000; horizontal axis Year]

[Chart from www.all-species.org: vertical axis 0 to 20,000,000; horizontal axis Year]

What needs to happen here?

Requirements for Interoperability

Standards…

Standards for Interoperability of Biodiversity Databases

URL, URI, HTTP, CGI, UML, ABCD, XHTML, UDDI, XSLT, XPath, RDF, RDFS, PNG, SVG, DOM, CSS, SAX, HISPID, ITF, BNF, Z39.50, Z39.19, WAIS, ASN.1, XML schema, Dublin Core, SOAP, RMI, Darwin Core, WSDL