Biodiversity Informatics
Biodiversity informatics and the manipulation of biological information
Jim Croft
Outline
• ‘Biodiversity Informatics’
• Australia’s Virtual Herbarium as a model of use and management of biodiversity knowledge
• New ways of managing biological knowledge
• Information management issues
• Current trends and future directions in biodiversity knowledge management
Biodiversity Informatics
Management of our knowledge of biodiversity using modern techniques of data and information management
Taxonomy of Database Interoperability
Multi-database systems
• Non-federated [Autonomous]
• Federated
– Loosely coupled: multiple schemas
– Tightly coupled: unified schema
Sheth & Larson (1990)
Tightly Coupled
• Central administration
• Semantic consistency
– Schemas
– Authority files
• Common technology
• Difficult to implement
• Proprietary solutions tolerated
• Expensive
Loosely Coupled
• Closer to Reality
• Independent management
• Suited to scientific systems
• Common publication syntax
– Export schema
• Less functionality … Doable
• Need open standards
Intermediate Coupling
• Scientific Independence
• Common syntax & semantics for the exchange of information
– Import/export
– HISPID, Darwin Core, TDWG/CODATA ABCD
• Leverage existing open standards
– Participation in wider, more loosely coupled federations
– Simplicity
– Distribution of effort
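As a minimal sketch of the import/export idea, the following maps one herbarium's local record layout onto a shared exchange vocabulary. The local column names are hypothetical, and the exchange names are taken from the current Darwin Core term list on the assumption that they stand in for whatever version of the standard a federation agrees on.

```python
# Hypothetical local column names (left) mapped to shared exchange terms (right).
# The right-hand names follow current Darwin Core terms; treat them as an assumption.
LOCAL_TO_EXCHANGE = {
    "sci_name": "scientificName",
    "coll_date": "eventDate",
    "collector": "recordedBy",
    "coll_no": "recordNumber",
    "loc_text": "locality",
    "habitat_text": "habitat",
}


def export_record(local_row: dict) -> dict:
    """Translate one local row into the shared vocabulary.

    Fields that are absent or empty locally are left out, and local-only
    fields (anything not in the mapping) never leave the institution.
    """
    return {
        exchange_key: local_row[local_key]
        for local_key, exchange_key in LOCAL_TO_EXCHANGE.items()
        if local_row.get(local_key) not in (None, "")
    }


if __name__ == "__main__":
    row = {"sci_name": "Acacia salicina", "coll_no": "", "internal_id": 7}
    print(export_record(row))
```

Each institution keeps its own schema and only maintains one mapping of this kind, which is where the distribution of effort comes from.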
Data Refinement
the real world → observations → data → information → knowledge → action
Increasing refinement & utility of data
• Environmental decision making
– conservation
– restoration biology
– resource management
– utilization
• Policy & strategy
– government
– corporate
– individual
Herbarium Specimens
Specimen Data Capture
Specimen Data
• The core information is from herbarium specimens
• Beyond taxonomy & names
• Collections data:
– Scientific name
– Collection date
– Collector name & number
– Location
– Soils
– Habitat (incl. topography)
– Vegetation community
– Associated species
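The collections data listed on this slide can be sketched as a simple record type. This is an illustration only: the class name, attribute names and the choice of plain strings are assumptions, not the structure of any actual herbarium database.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class SpecimenRecord:
    """One herbarium specimen, using the data elements named on the slide."""
    scientific_name: str
    collection_date: Optional[str] = None      # kept as text; label dates are often partial
    collector_name: Optional[str] = None
    collector_number: Optional[str] = None
    location: Optional[str] = None
    soils: Optional[str] = None
    habitat: Optional[str] = None              # including topography
    vegetation_community: Optional[str] = None
    associated_species: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Names of data elements still to be captured for this specimen."""
        return [f.name for f in fields(self) if getattr(self, f.name) in (None, "", [])]


if __name__ == "__main__":
    record = SpecimenRecord(
        scientific_name="Platyzoma microphyllum",
        habitat="sandy or swampy soils",
    )
    print(record.missing_fields())
```

A check such as `missing_fields()` is one way to track how complete data capture is for each specimen.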
A Herbarium Database Structure
What do we want to know?
• What species does a plant belong to?
• What is its name?
• What other species is it related to?
• What does it look like?
• Where does it grow?
• Where might it grow?
• What other species grow with it?
• What species grow in a defined area?
• How did they get there?
What is a Virtual Herbarium?
An on-line digital representation of a scientific collection of preserved plant specimens and botanical information
What is the AVH?
• Spread across Australian herbaria
• Data distributed; resides with custodians
• Each herbarium has a portal to receive requests and to deliver data
• A common single query AVH interface in each herbarium polls all herbaria
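The single-query idea can be sketched as one function that sends the same request to every herbarium portal and merges the replies. Everything here is hypothetical: each portal is modelled as a plain Python callable rather than a network service, and the portal labels and record layout are placeholders, not the AVH protocol.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A portal takes a scientific name and returns that custodian's matching records.
Portal = Callable[[str], List[dict]]


def poll_all(portals: Dict[str, Portal], scientific_name: str) -> dict:
    """Send one query to every portal and merge the answers.

    A portal that fails is reported under "errors" so that one unavailable
    herbarium does not block results from the others.
    """
    def ask(item):
        label, portal = item
        try:
            return label, portal(scientific_name), None
        except Exception as exc:  # a real client would catch narrower errors
            return label, [], str(exc)

    records, errors = [], {}
    with ThreadPoolExecutor(max_workers=max(1, len(portals))) as pool:
        for label, rows, err in pool.map(ask, portals.items()):
            if err is not None:
                errors[label] = err
            # Tag each record with its custodian; the data itself stays distributed.
            records.extend({**row, "custodian": label} for row in rows)
    return {"records": records, "errors": errors}


if __name__ == "__main__":
    def herbarium_a(name):
        return [{"scientific_name": name, "sheet": "A-1"}]

    def herbarium_b(name):
        raise TimeoutError("portal not responding")

    print(poll_all({"herbarium_a": herbarium_a, "herbarium_b": herbarium_b},
                   "Acacia salicina"))
```

Because every herbarium runs the same interface, the same function could serve a user starting from any partner's site.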
Major Australian Herbaria
AVH Partners
State Herbarium of South Australia
Queensland Herbarium
Australian National Herbarium
Northern Territory Herbarium
Tasmanian Herbarium
Industry Partner: KE Software
National Herbarium of Victoria
National Herbarium of New South Wales
Western Australian Herbarium
Australian Biological Resources Study
Why is there an AVH?
• Pressure on Herbaria to work more efficiently
• Demand for access to larger amounts of data
• Demand to access data more quickly
• Demand to view data in different ways
• Pressure on herbaria to appear and to be more responsive to community needs
What is the AVH task?
• > 18,000 species of higher plants
• > 64,000 available names
• Extensive synonymy (4 names per plant)
• 8 major government-funded herbaria
• Similar number of university herbaria
• > 6,500,000 specimens in Aust. herbaria
• 50–100 data elements per specimen
• Several Kb per specimen (excl. images)
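A rough sense of scale follows from the figures quoted for the AVH task. The sketch below uses the quoted lower bounds as if they were exact and reads "several Kb" as 2 to 5 KB per specimen, which is purely an assumption made for the arithmetic.

```python
# Figures quoted on the slide (all are lower bounds, marked ">" in the source).
SPECIES = 18_000
AVAILABLE_NAMES = 64_000
SPECIMENS = 6_500_000
ELEMENTS_PER_SPECIMEN = (50, 100)

# ASSUMPTION: "several Kb per specimen" taken as 2 to 5 KB, images excluded.
KB_PER_SPECIMEN = (2, 5)

# Names per species implied by the two counts, for comparison with "4 names per plant".
names_per_species = AVAILABLE_NAMES / SPECIES


def total_elements(per_specimen: int) -> int:
    """Total data elements to capture across all specimens."""
    return SPECIMENS * per_specimen


def total_gigabytes(kb_per_specimen: float) -> float:
    """Total text data volume in decimal gigabytes (1 GB = 1,000,000 KB)."""
    return SPECIMENS * kb_per_specimen / 1_000_000


if __name__ == "__main__":
    print(f"names per species: {names_per_species:.1f}")
    print("data elements:", [total_elements(n) for n in ELEMENTS_PER_SPECIMEN])
    print("gigabytes:", [total_gigabytes(kb) for kb in KB_PER_SPECIMEN])
```

Under these assumptions the task is hundreds of millions of data elements but only tens of gigabytes of text, so the cost lies in capture rather than storage.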
Herbarium database status
• $10M over 5 years to database all major Australian herbarium collections
• $10 million:
– $4 million Commonwealth
– $4 million State/Territory
– $2 million private
• Initial focus on capture of herbarium specimen data
• Ultimate aim a complete flora information system
The AVH Agreement
Australia’s Virtual Herbarium
On-line access to herbarium specimen information and botanical knowledge
Australian Plant Name Index (APNI)
www.anbg.gov.au/apni
www.anbg.gov.au/win
http://www.chah.gov.au/avh.html
Acacia salicina
Research Potential: Plant distribution analysis
[Map labels: Incurved, Recurved]
Pultenaea distribution classes in eastern Australia
Flora Information Systems
• On-line systems
• Often regionally based
• Integrating:
– Plant names and synonyms
– Descriptive Flora treatments
– Illustrations
– Distributions
– etc.
Botanical illustrations
Search all records on-line
Digital images available (‘best of class’)
35,000 images of Australian plants and vegetation
National Plant Photograph Index
www.anbg.gov.au/anbg/photo-collection/
High resolution image of type specimen of Austrobaileya downloaded over the Internet from the Herbarium of the New York Botanical Garden
Type Images on demand
Flora & Revision Databases
New ways of managing and delivering botanical information
A Flora in XML
Example in HTML
<p><b>Platyzoma microphyllum</b> R.Br., <i>Prodr.</i> 160 (1810)</p>
<p><i>Gleichenia platyzoma</i> F.Muell., <i>Veg. Chatham.-Isl.</i> 63 (1864). T: Facing Island, Qld, <i>R.Brown Iter Austral. 102</i>; lecto: BM.</p>
<p>Illus.: S.B.Andrews…</p>
<p>Rhizome short-creeping… Sporangia in zones in distal half of frond. Fig. 55</p>
<p>Widespread across northern Australia… Grows in sandy or swampy soils.... Map 135.</p>
<p>W.A.: 14.4 km NW of Mt…</p>
Example in XML
<taxon>
<name>Platyzoma microphyllum</name> <author>R.Br.</author>,
<publication><title>Prodr.</title> <page>160</page><date>1810</date></publication>
<synonym>
<name>Gleichenia platyzoma</name> <author>F.Muell.</author>
<publication>Veg. Chatham.-Isl.</publication> <page>63</page> <date>1864</date>
<type>T: Facing Island, Qld, …</type>
</synonym>
<illustration>Illus.: S.B.Andrews…</illustration>
<description>Rhizome short-creeping… Sporangia in zones in distal half of frond.</description>
<figure>Fig. 55</figure>
<locality>Widespread across northern Australia…</locality>
<habitat>Grows in sandy or swampy soils...</habitat>
<map>Map 135.</map>
<specimens>W.A.: 14.4 km NW of Mt…</specimens>
</taxon>
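To illustrate why the XML form is useful, the sketch below reads a shortened, well-formed version of the fragment with Python's standard `xml.etree.ElementTree` module and regenerates the first HTML paragraph from it. The function name and the shortened sample are illustrative assumptions, not part of any Flora production system.

```python
import xml.etree.ElementTree as ET
from html import escape

# Shortened version of the Flora fragment; element names follow the slide.
FLORA_XML = """<taxon>
  <name>Platyzoma microphyllum</name>
  <author>R.Br.</author>
  <publication><title>Prodr.</title><page>160</page><date>1810</date></publication>
  <habitat>Grows in sandy or swampy soils...</habitat>
</taxon>"""


def taxon_heading_html(xml_text: str) -> str:
    """Render the accepted name and its citation as one HTML paragraph."""
    taxon = ET.fromstring(xml_text)

    def text(path: str) -> str:
        # findtext returns the default when the element is missing.
        return escape(taxon.findtext(path, default="").strip())

    return (
        f"<p><b>{text('name')}</b> {text('author')}, "
        f"<i>{text('publication/title')}</i> "
        f"{text('publication/page')} ({text('publication/date')})</p>"
    )


if __name__ == "__main__":
    print(taxon_heading_html(FLORA_XML))
```

The HTML version fixes one presentation, while the XML version keeps name, author and citation as separate elements, so the same source can feed a web page, a database load or a printed layout.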
A Flora XML Schema fragment
A Flora database structure
A Flora database report
An old process of publication
[Diagram: Botanist, W-P file, Editors, W-P file, Publisher, C-R copy, Book, etc.]
A new process of publication
[Diagram: Botanist, XML file, Database, Outputs, Editors, Publisher, C-R copy, Book, etc.]
A future process of publication
[Diagram: XML file, Database, Outputs]
Interactive Identification
Using computers to identify and name plant species and display information about them
Interactive Plant Identification
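One common way to do this is a multi-access key: the user records whichever characters they can observe, in any order, and the program keeps only the taxa consistent with them. The sketch below is a minimal illustration with a hypothetical character matrix; the taxon labels, characters and states are placeholders, not real botanical data.

```python
from typing import Dict, List, Set

# HYPOTHETICAL character matrix: taxon -> character -> set of recorded states.
MATRIX: Dict[str, Dict[str, Set[str]]] = {
    "taxon_a": {"leaf_margin": {"incurved"}, "habit": {"shrub"}},
    "taxon_b": {"leaf_margin": {"recurved"}, "habit": {"shrub"}},
    "taxon_c": {"leaf_margin": {"incurved", "recurved"}, "habit": {"tree"}},
}


def remaining_taxa(matrix: Dict[str, Dict[str, Set[str]]],
                   observations: Dict[str, str]) -> List[str]:
    """Return the taxa still consistent with every observed character state.

    A taxon with no data for a character is kept, since missing data
    should not rule it out.
    """
    keep = []
    for taxon, characters in matrix.items():
        if all(state in characters.get(char, {state})
               for char, state in observations.items()):
            keep.append(taxon)
    return sorted(keep)


if __name__ == "__main__":
    print(remaining_taxa(MATRIX, {"leaf_margin": "incurved"}))
    print(remaining_taxa(MATRIX, {"leaf_margin": "incurved", "habit": "shrub"}))
```

Each further observation can only shrink the candidate list, and descriptive information can then be displayed for the taxa that remain.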
Current trends, future directions
?
Trends in Biodiversity Information Management
Nomenclatural → Taxonomic
Regional → Global
Text-based → Image-based
Taxon-based → Spatially-based
Individual effort → Partnerships
Single user → Multiuser
Standalone → Networked
Centralized → Distributed
Proprietary System → Open System
Idiosyncratic Design → Standard Architecture
Nonstandard data content → Standard data content
Conventional → Innovative
Developmental → Stable
Access charges → Freely available
Global Organization
• Several parallel and complementary initiatives:
– Global Biodiversity Information Facility (GBIF)
– Taxonomic Databases Working Group (TDWG)
– Global Taxonomy Initiative (GTI)
– International Organization for Plant Information (IOPI)
– Species 2000
– All Species Foundation (ALL)
www.gbif.org
Data Flow within GBIF Network
[Diagram: Collection Nodes, Participant Nodes, the GBIF Portal and the User Browser, connected by flows labelled Service Metadata, Specimen Index Data, Detailed Specimen Data, Aggregated Data and HTML Data]
www.all-species.org
[Chart: vertical axis from 0 to 20,000,000, horizontal axis Year; source: www.all-species.org]
What needs to happen here?
Requirements for Interoperability
Standards…
Standards for Interoperability of Biodiversity Databases
URL, UML, ABCD, URI, XHTML, HTTP, UDDI, XSLT, XPath, RDF, PNG, SVG, DOM, CSS, SAX, HISPID, ITF, BNF, Z39.50, WAIS, ASN.1, XML Schema, Dublin Core, RDFS, Z39.19, SOAP, CGI, RMI, Darwin Core, WSDL