Plankton*Net: Using Fedora to aggregate and disseminate biodiversity content
Ana Macario
Alfred Wegener Institute for Polar and Marine Research, Computer Center


-1-Ana Macario

Content Model for Biodiversity, 2006-05-04


Photo: L. Tadday



Road map

1. EU-project: Plankton*Net
2. Introduction to taxonomy
3. Existing information systems
4. Content model for biodiversity


Background

-> 2-year EU project (acronym “Plankton-Net”) with 7 partners: AWI, MBL, Roscoff, Caen, Universidade de Lisboa, IPIMAR (Lisbon), Natural History Museum

-> Original scope: to create a network of interoperable repositories on plankton taxonomy

-> Motivation: to give taxonomists support in the hard task of identifying species and to rescue historically relevant collections

-> Scope keeps growing… towards an information system which aggregates taxonomic content and associated images, environmental data, digitized documents, taxon descriptions, molecular data, etc.

In early 2004, AWI started a small project with MBL to archive images and taxonomic keys/descriptions for phytoplankton found in the North Sea…




Taxonomy and its challenges

Information about organisms is often linked to a name. This can create problems in information retrieval…

■ one taxon can have many names

■ the same name can refer to many taxa
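The two problems above amount to a many-to-many relation between names and taxa. A minimal sketch of how a name thesaurus could expand a query, assuming a flat list of (name, taxon) assertions; the names, taxon IDs and function names are placeholders invented for illustration, not uBio data or its API:

```python
from collections import defaultdict

# Hypothetical (name, taxon) assertions. All values are placeholders.
ASSERTIONS = [
    ("Name alpha", "taxon:1"),
    ("Name beta", "taxon:1"),   # synonym: a second name for taxon 1
    ("Name gamma", "taxon:2"),
    ("Name gamma", "taxon:3"),  # homonym: one name used for two taxa
]


def build_indexes(assertions):
    """Index the assertions in both directions."""
    names_by_taxon = defaultdict(set)
    taxa_by_name = defaultdict(set)
    for name, taxon in assertions:
        names_by_taxon[taxon].add(name)
        taxa_by_name[name].add(taxon)
    return names_by_taxon, taxa_by_name


def expand_query(name, names_by_taxon, taxa_by_name):
    """Return every name attached to any taxon the query name may refer to."""
    expanded = set()
    for taxon in taxa_by_name.get(name, ()):
        expanded |= names_by_taxon[taxon]
    return expanded


names_by_taxon, taxa_by_name = build_indexes(ASSERTIONS)
# Searching by the synonym also retrieves content filed under the other name.
print(sorted(expand_query("Name beta", names_by_taxon, taxa_by_name)))
# A homonym is ambiguous: it resolves to more than one taxon.
print(sorted(taxa_by_name["Name gamma"]))
```

Retrieval by name alone misses content in the first case and mixes unrelated content in the second, which is why a name server is placed in front of the repository.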


Taxonomic Name Server

The uBio Taxonomic Name Server (MBL-WHOI Library, Woods Hole, USA), implemented as a web service, acts as a name thesaurus. Two services are offered:

■ NameBank is a repository of millions of recorded biological names and facts that link those names together

■ ClassificationBank stores multiple classifications and taxonomic concepts that are the result of expert opinions. It extends the functionality of NameBank.


NameBank: What's in a name?

■ Alternative names
■ Vernacular names
■ More or less specific names

Scientific names evolve as specimens' names are updated over the years. With vernacular (common) names the problem is even more difficult, because a name may appear in several languages.


What's in a classification?

ClassificationBank is a taxon concept server




Biopedia


Rich information environment

Sources shown in the diagram:

■ WDC/Pangaea: water temperature and salinity, nutrients, lipid biomarkers, stratigraphy
■ Plankton*net@AWI
■ Digitization of biodiversity-related literature
■ Molecular data
■ Description
■ Plankton*net@Roscoff
■ Taxonomic naming & classification

[Diagram: Fedora object components: Persistent ID (PID), Dublin Core (DC), Datastreams, Audit Trail (AUDIT), Relations (RELS-EXT), Disseminator, Default Disseminator]




What is the content?

Object: Taxon [metadata, resources (images, digitized documents), aggregators]

Local or surrogate content datastreams primary to the object:

■ Taxonomic keys, synonyms and classification
■ Darwin Core metadata
■ Images, SEM photos, schematic drawings, etc.
■ Descriptions, morphometric data
■ Digitized literature
■ Geo-referenced environmental datasets
■ Molecular data


Plankton-Net object - PID

Every object is a taxon for which at least one image (datastream) is available.

Life Science Index IDs will be used to construct PIDs:

Example:

Piper nigrum L. will be tagged as: info:fedora/plankton-net.org:447505
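A minimal sketch of how such an identifier could be assembled and taken apart, assuming the `info:fedora/<namespace>:<id>` shape shown in the example. The function names and the default namespace argument are illustrative assumptions, not part of the project's code:

```python
from typing import Tuple

FEDORA_URI_PREFIX = "info:fedora/"


def make_object_uri(identifier: int, namespace: str = "plankton-net.org") -> str:
    """Build an object URI of the form info:fedora/<namespace>:<identifier>."""
    if identifier <= 0:
        raise ValueError("identifier must be a positive integer")
    return f"{FEDORA_URI_PREFIX}{namespace}:{identifier}"


def split_object_uri(uri: str) -> Tuple[str, str]:
    """Split an object URI into (namespace, local id) at the last colon."""
    if not uri.startswith(FEDORA_URI_PREFIX):
        raise ValueError(f"not a Fedora object URI: {uri!r}")
    pid = uri[len(FEDORA_URI_PREFIX):]
    namespace, _, local_id = pid.rpartition(":")
    if not namespace or not local_id:
        raise ValueError(f"malformed PID: {pid!r}")
    return namespace, local_id


# The ID from the slide example.
print(make_object_uri(447505))
print(split_object_uri("info:fedora/plankton-net.org:447505"))
```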


Plankton-Net object – datastream

■ DC (text/xml)
■ uBio naming and classification bank (text/xml)
■ Darwin Core (text/xml)
■ Biopedia descriptions (text/xml)
■ Relationships (RDF/XML)
■ Images (image/jpeg) and respective annotations
■ Documents (application/pdf)
■ Environmental data (text/tab-separated)
■ Genomics (text/xml)


Datastreams type image

In order to ensure great flexibility in the re-use of objects and discovery of content in different contexts…

[Diagram: taxon object info:fedora/plankton-net.org:447505 with 3 associated images, each referenced from a datastream]

■ image:21: <info:fedora/plankton-net.org:image:21>
■ image:45: <info:fedora/plankton-net.org:image:45>
■ externally referenced image: <www.roscoff.fr?uid=234>
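The arrangement on this slide can be sketched as a taxon object whose image datastreams hold references rather than the images themselves. The class and attribute names below are assumptions made for illustration; only the three reference targets and the taxon URI come from the slide:

```python
from dataclasses import dataclass, field
from typing import List

FEDORA_URI_PREFIX = "info:fedora/"


@dataclass
class ImageReference:
    """One datastream of type image: a label plus the target it points to."""
    label: str
    target: str

    @property
    def is_local(self) -> bool:
        # Targets inside the repository use the info:fedora/ URI scheme.
        return self.target.startswith(FEDORA_URI_PREFIX)


@dataclass
class TaxonObject:
    uri: str
    images: List[ImageReference] = field(default_factory=list)

    def local_images(self) -> List[ImageReference]:
        return [img for img in self.images if img.is_local]

    def external_images(self) -> List[ImageReference]:
        return [img for img in self.images if not img.is_local]


taxon = TaxonObject(
    uri="info:fedora/plankton-net.org:447505",
    images=[
        ImageReference("image:21", "info:fedora/plankton-net.org:image:21"),
        ImageReference("image:45", "info:fedora/plankton-net.org:image:45"),
        ImageReference("external referenced image", "www.roscoff.fr?uid=234"),
    ],
)
print(len(taxon.images), len(taxon.local_images()), len(taxon.external_images()))
```

Because the taxon only references the images, the same image object can be re-used by other objects or collections without being copied.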


Datastreams type documents

In order to ensure great flexibility in the re-use of objects and discovery of content in different contexts…

[Diagram: object with 2 associated documents, each referenced from a datastream]

■ doc:198: <info:fedora/awi.de/epic/doc:198>
■ externally referenced document: <www.heritage.org?uid=234>


Plankton-Net object – disseminators

Default (getPreview, getFullView, getCitation, getBiopediaDescription, getDC, getDarwinCore, getUBioNaming, getUBioClassification)

Image transformations

Metadata-crosswalks (Saxon XSLT engine services)

Mapping services
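A hedged sketch of how a client might address one of the default disseminator methods over HTTP. It assumes the Fedora 2.x API-A-LITE URL pattern `/fedora/get/<PID>/<bDefPID>/<method>`. The base URL, the behaviour definition PID `plankton-net.org:bdef-default` and the function name are hypothetical; only the method names are taken from the slide:

```python
from urllib.parse import quote

# Method names listed for the default disseminator on the slide.
DEFAULT_METHODS = (
    "getPreview", "getFullView", "getCitation", "getBiopediaDescription",
    "getDC", "getDarwinCore", "getUBioNaming", "getUBioClassification",
)


def dissemination_url(base_url: str, pid: str, bdef_pid: str, method: str) -> str:
    """Build a dissemination URL, assuming the API-A-LITE 'get' pattern."""
    if method not in DEFAULT_METHODS:
        raise ValueError(f"unknown disseminator method: {method!r}")
    parts = [
        base_url.rstrip("/"),
        "get",
        quote(pid, safe=":"),       # keep the namespace separator readable
        quote(bdef_pid, safe=":"),
        method,
    ]
    return "/".join(parts)


url = dissemination_url(
    "http://localhost:8080/fedora",      # hypothetical repository base URL
    "plankton-net.org:447505",           # PID from the slide example
    "plankton-net.org:bdef-default",     # hypothetical behaviour definition PID
    "getPreview",
)
print(url)
```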


Plankton-Net object – disseminators for images

Images:

■ getMetadata, getThumbnail, getImage, getLabel, getAnnotation
■ getSeeAlsos
■ download

-> Interoperability with other community methods in the behaviour


Plankton-Net object – disseminators

Metadata-crosswalks

getDarwinCore2, getOBIS, getGBIF, getMARBEF, etc.


Plankton-Net object : Relationships, audit control and more


Relationship metadata
• collection: isMemberofCollection, isMemberofAsset, isAssociatedWith (“SEE ALSO”)
• branding-related: AggregatedBy, DescribedBy, CertifiedBy

Versioning and traceability

Access control policies at collection level
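A minimal sketch of what a RELS-EXT datastream carrying a collection relationship could look like, generated with the Python standard library. It assumes Fedora's relations-external namespace and its `isMemberOfCollection` spelling of the predicate. The collection URI is a hypothetical placeholder, and the project-specific predicates on the slide (isMemberofAsset, AggregatedBy, etc.) would need their own namespace, which is not given in the source:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
REL_NS = "info:fedora/fedora-system:def/relations-external#"


def build_rels_ext(object_uri, relations):
    """Serialize (predicate, target URI) pairs as RDF/XML about one object."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("rel", REL_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": object_uri}
    )
    for predicate, target in relations:
        ET.SubElement(
            description,
            f"{{{REL_NS}}}{predicate}",
            {f"{{{RDF_NS}}}resource": target},
        )
    return ET.tostring(root, encoding="unicode")


rels_ext = build_rels_ext(
    "info:fedora/plankton-net.org:447505",
    [
        # Hypothetical collection object, for illustration only.
        ("isMemberOfCollection", "info:fedora/plankton-net.org:collection-example"),
    ],
)
print(rels_ext)
```

Keeping relationships in RDF lets the repository index them, so a collection's members can be found by query rather than by maintaining a list inside the collection object.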