David Schindel - Barcode Data Standard Compliance

DESCRIPTION

Vouchering and archiving of vouchers; imaging and archiving of e-vouchers; provenance data quality; and sequence and trace file quality

The BARCODE Data Standard

David E. Schindel, Executive Secretary

National Museum of Natural History

Smithsonian Institution

SchindelD@si.edu; http://www.barcoding.si.edu

202/633-0812; fax 202/633-2938

BARCODE Data Standard is:

A set of required elements for a reserved Keyword (‘BARCODE’) in GenBank

A set of sequence quality requirements

Required or recommended formats for data interoperability with:
– Voucher specimens in biorepositories
– Georeferenced data
– Taxonomic literature

An Internal ID System for All Animals

[Figure: The Mitochondrial Genome. A typical animal cell, its mitochondrion, and the mtDNA (H-strand and L-strand), with labeled regions: D-Loop, ND1, ND2, ND3, ND4, ND4L, ND5, ND6, COI, COII, COIII, ATPase subunit 6, ATPase subunit 8, Cytochrome b, Small ribosomal RNA.]

Non-COI regions for other taxa

Land plants:
– Chloroplast matK and rbcL approved Nov 09
– 70-75% resolving ability, higher in angiosperms
– Non-coding plastid and nuclear regions being explored

Fungi:
– CBOL Working Group met this week in Amsterdam
– Agreed to recommend ITS; 72% effective

Protists:
– CBOL Working Group July meeting, Berlin

[Figure: BARCODE Record Flow Chart. Labels: USER; /GenBank. Key: Mirroring, Update Channel, Private Records.]

BARCODE Records in GenBank

Submission of BARCODE Records to EBI and DDBJ

Required Elements for BARCODE

Taxonomic identification to species

Voucher specimen ID in standard format

Name of barcode region

Length, quality, 2 trace files

Forward/reverse primer sequences, names

Country/Ocean/Sea of origin

Highly Recommended Elements

Latitude/longitude

Name of Collector

Collection date

Name of identifier
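The required and highly recommended elements listed above could be checked mechanically before a record is submitted. The following is a minimal Python sketch under stated assumptions, not part of the standard itself: the field names (`species_name`, `voucher_id`, and so on) and the example values are hypothetical, and sequence length and quality are left unchecked because the slide gives no thresholds.

```python
# Hypothetical field names for the elements listed on the slides.
REQUIRED = (
    "species_name",      # taxonomic identification to species
    "voucher_id",        # voucher specimen ID in standard format
    "barcode_region",    # name of barcode region
    "sequence",          # the barcode sequence itself
    "forward_primer",    # forward primer sequence and name
    "reverse_primer",    # reverse primer sequence and name
    "country_or_ocean",  # country/ocean/sea of origin
)
RECOMMENDED = ("lat_lon", "collector", "collection_date", "identified_by")
MIN_TRACE_FILES = 2  # the slide asks for 2 trace files


def check_record(record: dict) -> dict:
    """Report which required and recommended elements a record lacks."""
    missing_required = [key for key in REQUIRED if not record.get(key)]
    if len(record.get("trace_files") or []) < MIN_TRACE_FILES:
        missing_required.append("trace_files")
    missing_recommended = [key for key in RECOMMENDED if not record.get(key)]
    return {
        "compliant": not missing_required,
        "missing_required": missing_required,
        "missing_recommended": missing_recommended,
    }


# Placeholder record: every value here is invented for illustration.
example = {
    "species_name": "Genus species",
    "voucher_id": "NHM:LEP:123456",
    "barcode_region": "COI",
    "sequence": "ACGTACGT",
    "forward_primer": {"name": "fwd", "sequence": "ACGT"},
    "reverse_primer": {"name": "rev", "sequence": "TGCA"},
    "country_or_ocean": "Iceland",
    "trace_files": ["forward.ab1"],  # only one trace, so not compliant
    "collector": "A. Collector",
}
report = check_record(example)
print(report)
```

Keeping the required and recommended lists as separate tuples means a record can pass while the report still notes which recommended elements are absent.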

Traditional Taxonomy

GSC Minimum Standards (MI*)

Traditional GenBank

Voucher specimen ID XXX XXX
Species ID XXX X X
Identified by XXX
DNA sequence XXX XXX
Gene region XXX
Geographic origin (country, ocean) XXX X
Latitude/Longitude XXX XXX
Collection date, collector name XXX XXX
Trace files XXX XX
Primer information X XX

Barcode Sequence

Voucher Specimen

Species Name

Specimen Metadata

Literature citation

BARCODE Records in INSDC

Indices - Catalogue of Life - GBIF/ECAT

Nomenclators - Zoo Record - IPNI - NameBank

Publication links - New species

Georeference

Habitat

Character sets

Images

Behavior

Other genes

Trace files

Primers

Databases - Provisional sp.

Record in BOLD

Compliance with Standard (1)

1.37 million records in BOLD

514,390 BARCODE records in INSDC

395,774 have ordinal name plus Barcode Index Number for taxonomic ID
– Rapid data release versus time for annotation
– Exposure to data theft, risk of misidentification
– Added value of Linnean name
– Incidence of misidentifications in GenBank
– Danger of circular reasoning

Taxonomic Identification

The genus and species combination that can be found in:
– a taxonomic index such as Catalogue of Life, Zoological Record or IPNI;
– a taxonomic treatment of a previously published species name; or
– a published description of the species; or

A provisional label for a potential new species.

Rod Page’s ‘Dark Taxa’

R. Page, iPhylo blogspot, 12 April 2011

Taxonomic Content in iBOL Data

iBOL ‘Phase 1’

Org name: Order + BIN

Tentative Name: blank

GenBank ‘Phase 0’

Tentative name is in BOLD, unreleased

iBOL ‘Phase 2’

Org name: Order + BIN

Tentative Name: blank

GenBank ‘Phase 1’

Org name = Order + BIN plus

Tentative name

GenBank ‘Phase 2’

Org name = sp. name

Unique identifier for the voucher specimen

In standardized format based on Darwin Core:

Institutional acronym:Collection code:Specimen number

Institutional acronym:Specimen number

personal:Collection code:Specimen number
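The three colon-separated forms above lend themselves to a small parser. The following Python sketch is one possible reading of the format, not an official implementation: the names `VoucherID` and `parse_voucher` are hypothetical, and the only validation assumed is that a voucher ID has two or three non-empty fields.

```python
from typing import NamedTuple, Optional


class VoucherID(NamedTuple):
    institution: str           # institutional acronym, or the literal "personal"
    collection: Optional[str]  # collection code; None in the two-field form
    specimen: str              # specimen number


def parse_voucher(text: str) -> VoucherID:
    """Split a colon-separated voucher ID into its two or three fields."""
    parts = [part.strip() for part in text.split(":")]
    if any(not part for part in parts):
        raise ValueError(f"empty field in voucher ID: {text!r}")
    if len(parts) == 3:
        return VoucherID(parts[0], parts[1], parts[2])
    if len(parts) == 2:
        return VoucherID(parts[0], None, parts[1])
    raise ValueError(f"expected 2 or 3 colon-separated fields: {text!r}")


for raw in ("NHM:LEP:123456", "NHM:123456", "personal:DHJanzen:SRNP12345"):
    print(raw, "->", parse_voucher(raw))
```

A parser this simple only confirms the shape of the identifier; whether the acronym actually points at a known institution is a separate lookup.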

GTI/CBOL/iBOL Workshop, 7 November 2009

Compliance with Standard (2)

514,390 BARCODE records in INSDC
– Traces, primers, length, country, and presence of voucherID checked by GenBank

99.9% have entry for /specimen_voucher

13,151 have formatted voucher from 38 institutions
– 20 confirmed in biorepositories
– 11 unconfirmed
– 7 unlisted
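To put these counts in proportion, the arithmetic can be worked through directly. The short Python sketch below uses only the figures on this slide; the variable names are illustrative. It suggests that formatted vouchers account for roughly 2.6 percent of BARCODE records, and that a little over half of the 38 institutions are confirmed in the registry.

```python
# Counts taken from the slide.
barcode_records = 514_390
formatted_vouchers = 13_151
institutions = {"confirmed": 20, "unconfirmed": 11, "unlisted": 7}

# Share of BARCODE records whose voucher ID is in the structured format.
share_formatted = formatted_vouchers / barcode_records

# Breakdown of the institutions behind those formatted vouchers.
total_institutions = sum(institutions.values())
institution_shares = {
    status: count / total_institutions for status, count in institutions.items()
}

print(f"formatted vouchers: {share_formatted:.2%} of BARCODE records")
for status, share in institution_shares.items():
    print(f"{status}: {share:.1%} of {total_institutions} institutions")
```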

Darwin Core Triplet: Structured Link to Vouchers

Institutional Acronym : Collection Code : Catalog ID

NHM : LEP : 123456

personal : DHJanzen : SRNP12345

AMNH Icelandic Institute of Natural History, Akureyri Division Akureyri Iceland

AMNH American Museum of Natural History New York USA

UNL Universidad Autónoma de Nuevo León Monterrey, Nuevo León Mexico

UNL University of Nebraska State Museum Lincoln, Nebraska USA

UNL Centro de Estratigrafia e Paleobiologia da Universidade Nova de Lisboa Monte de Caparica Portugal

ZMK Zoological Museum, Kristiania Oslo Norway

ZMK Zoologisches Museum der Universität Kiel Kiel Germany

ZMK Zoological Museum, Copenhagen Copenhagen Denmark
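The rows above show why an acronym alone may not identify an institution. As a hedged illustration, the Python sketch below indexes those same rows by acronym and lists the ones that resolve to more than one institution. The data structure and the function name `index_by_acronym` are hypothetical and do not reflect how any actual registry is implemented.

```python
from collections import defaultdict

# (acronym, institution, city, country), copied from the rows above.
REGISTRY_ROWS = [
    ("AMNH", "Icelandic Institute of Natural History, Akureyri Division", "Akureyri", "Iceland"),
    ("AMNH", "American Museum of Natural History", "New York", "USA"),
    ("UNL", "Universidad Autónoma de Nuevo León", "Monterrey, Nuevo León", "Mexico"),
    ("UNL", "University of Nebraska State Museum", "Lincoln, Nebraska", "USA"),
    ("UNL", "Centro de Estratigrafia e Paleobiologia da Universidade Nova de Lisboa", "Monte de Caparica", "Portugal"),
    ("ZMK", "Zoological Museum, Kristiania", "Oslo", "Norway"),
    ("ZMK", "Zoologisches Museum der Universität Kiel", "Kiel", "Germany"),
    ("ZMK", "Zoological Museum, Copenhagen", "Copenhagen", "Denmark"),
]


def index_by_acronym(rows):
    """Group institutions under the acronym they share."""
    index = defaultdict(list)
    for acronym, institution, city, country in rows:
        index[acronym].append((institution, city, country))
    return dict(index)


by_acronym = index_by_acronym(REGISTRY_ROWS)

# Acronyms that match more than one institution cannot be resolved
# from the voucher ID alone.
ambiguous = sorted(a for a, hits in by_acronym.items() if len(hits) > 1)

for acronym in ambiguous:
    print(acronym, "matches", len(by_acronym[acronym]), "institutions")
```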

CBOL/GBIF/NCBI Registry of Biorepositories

www.biorepositories.org
