The BARCODE Data Standard David E. Schindel, Executive Secretary National Museum of Natural History Smithsonian Institution [email protected] ; http://www.barcoding.si.edu 202/633-0812; fax 202/633-2938

David Schindel - Barcode Data Standard Compliance


DESCRIPTION

Vouchering and archiving of vouchers, imaging and archival of e-vouchers, provenance data quality and sequence and trace file quality


Page 1: David Schindel - Barcode Data Standard Compliance

The BARCODE Data Standard

David E. Schindel, Executive Secretary

National Museum of Natural History

Smithsonian Institution

[email protected]; http://www.barcoding.si.edu

202/633-0812; fax 202/633-2938

Page 2: David Schindel - Barcode Data Standard Compliance
Page 3: David Schindel - Barcode Data Standard Compliance

BARCODE Data Standard is:

A set of required elements for a reserved Keyword (‘BARCODE’) in GenBank

A set of sequence quality requirements

Required or recommended formats for data interoperability with:
– Voucher specimens in biorepositories
– Georeferenced data
– Taxonomic literature

Page 4: David Schindel - Barcode Data Standard Compliance

An Internal ID System for All Animals

[Figure: The Mitochondrial Genome. A typical animal cell, its mitochondrion, and a map of the mtDNA showing the H-strand and L-strand and the labeled regions D-Loop, small ribosomal RNA, ND1, ND2, COI, COII, ATPase subunit 8, ATPase subunit 6, COIII, ND3, ND4L, ND4, ND5, ND6 and cytochrome b.]

Page 5: David Schindel - Barcode Data Standard Compliance

Non-COI regions for other taxa

Land plants:
– Chloroplast matK and rbcL approved Nov 09
– 70-75% resolving ability, higher in angiosperms
– Non-coding plastid and nuclear regions being explored

Fungi:
– CBOL Working Group met this week in Amsterdam
– Agreed to recommend ITS; 72% effective

Protists:
– CBOL Working Group July meeting, Berlin

Page 6: David Schindel - Barcode Data Standard Compliance

BARCODE Record Flow Chart

[Figure: flow chart. Labels: USER, /GenBank, Key, Mirroring, Update Channel, Private Records.]

Page 7: David Schindel - Barcode Data Standard Compliance

BARCODE Records in GenBank

Page 8: David Schindel - Barcode Data Standard Compliance

Submission of BARCODE Records to EBI and DDBJ

Page 9: David Schindel - Barcode Data Standard Compliance

Required Elements for BARCODE

Taxonomic identification to species

Voucher specimen ID in standard format

Name of barcode region

Length, quality, 2 trace files

Forward/reverse primer sequences, names

Country/Ocean/Sea of origin
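
The required elements above can be read as a checklist that a record either passes or fails. The sketch below is one possible way to express that check; the field names (`species_name`, `voucher_id`, and so on) are hypothetical and not part of the standard, and no length or quality thresholds are checked because the slide does not give them.

```python
# Hypothetical field names; the slide lists the elements but not a schema.
REQUIRED_FIELDS = (
    "species_name",      # taxonomic identification to species
    "voucher_id",        # voucher specimen ID in standard format
    "barcode_region",    # name of barcode region
    "sequence",          # the barcode sequence itself
    "trace_files",       # list of trace file names
    "forward_primer",    # forward primer sequence and name
    "reverse_primer",    # reverse primer sequence and name
    "country",           # country, ocean or sea of origin
)


def missing_required(record):
    """Return the names of required elements that are absent or empty."""
    missing = [name for name in REQUIRED_FIELDS if not record.get(name)]
    # The slide asks for 2 trace files; a single trace counts as missing.
    traces = record.get("trace_files") or []
    if traces and len(traces) < 2:
        missing.append("trace_files")
    return missing
```

A record with every field filled and two trace files yields an empty list; anything else yields the names of the elements still to be supplied.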

Page 10: David Schindel - Barcode Data Standard Compliance

Highly Recommended Elements

Latitude/longitude

Name of Collector

Collection date

Name of identifier

Page 11: David Schindel - Barcode Data Standard Compliance

Traditional Taxonomy

GSC Minimum Standards (MI*)

Traditional GenBank

Voucher specimen ID XXX XXX
Species ID XXX X X
Identified by XXX
DNA sequence XXX XXX
Gene region XXX
Geographic origin (country, ocean) XXX X
Latitude/Longitude XXX XXX
Collection date, collector name XXX XXX
Trace files XXX XX
Primer information X XX

Page 12: David Schindel - Barcode Data Standard Compliance

Barcode Sequence

Voucher Specimen

Species Name

Specimen Metadata

Literature citation

BARCODE Records in INSDC

Indices - Catalogue of Life - GBIF/ECAT

Nomenclators - Zoo Record - IPNI - NameBank

Publication links - New species

Georeference

Habitat

Character sets

Images

Behavior

Other genes

Trace files

Primers

Databases - Provisional sp.

Record in BOLD

Page 13: David Schindel - Barcode Data Standard Compliance

Compliance with Standard (1)

1.37 million records in BOLD

514,390 BARCODE records in INSDC

395,774 have ordinal name plus Barcode Index Number for taxonomic ID
– Rapid data release versus time for annotation
– Exposure to data theft, risk of misidentification
– Added value of Linnean name
– Incidence of misidentifications in GenBank
– Danger of circular reasoning

Page 14: David Schindel - Barcode Data Standard Compliance

Taxonomic Identification

The genus and species combination that can be found in:
– a taxonomic index such as Catalogue of Life, Zoological Record or IPNI;
– a taxonomic treatment of a previously published species name; or
– a published description of the species; or

A provisional label for a potential new species.

Page 15: David Schindel - Barcode Data Standard Compliance

Rod Page’s ‘Dark Taxa’

R. Page, iPhylo blogspot, 12 April 2011

Page 16: David Schindel - Barcode Data Standard Compliance

Taxonomic Content in iBOL Data

iBOL ‘Phase 1’

Org name: Order + BIN

Tentative Name: blank

GenBank ‘Phase 0’

Tentative name is in BOLD, unreleased

iBOL ‘Phase 2’

Org name: Order + BIN

Tentative Name: blank

GenBank ‘Phase 1’

Org name = Order + BIN plus

Tentative name

GenBank ‘Phase 2’

Org name = sp. name

Page 17: David Schindel - Barcode Data Standard Compliance

Unique identifier for the voucher specimen

In standardized format based on Darwin Core:

Institutional acronym:Collection code:Specimen number

Institutional acronym:Specimen number

personal:Collection code:Specimen number
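
The three colon-separated forms above lend themselves to a small parser. This is a minimal sketch, not the check GenBank performs: `VoucherID` and `parse_voucher` are hypothetical names, and rejecting a two-part `personal:` identifier is an assumption drawn from the fact that the slide lists only the three-part personal form.

```python
from typing import NamedTuple, Optional


class VoucherID(NamedTuple):
    institution: str            # institutional acronym, or "personal"
    collection: Optional[str]   # collection code; None in the two-part form
    specimen: str               # specimen number


def parse_voucher(text: str) -> Optional[VoucherID]:
    """Parse a colon-separated voucher ID; return None if it does not fit."""
    parts = [p.strip() for p in text.split(":")]
    if any(not p for p in parts):
        return None  # empty component, e.g. "NHM::123"
    if len(parts) == 3:
        return VoucherID(parts[0], parts[1], parts[2])
    # Assumption: the two-part form is only offered for institutions.
    if len(parts) == 2 and parts[0] != "personal":
        return VoucherID(parts[0], None, parts[1])
    return None
```

For example, `parse_voucher("NHM:LEP:123456")` returns a three-part `VoucherID`, while free text with no colon returns `None`.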

GTI/CBOL/iBOL Workshop, 7 November 2009

Page 18: David Schindel - Barcode Data Standard Compliance

Compliance with Standard (2)

514,390 BARCODE records in INSDC
– Traces, primers, length, country, and presence of voucherID checked by GenBank

99.9% have entry for /specimen_voucher

13,151 have formatted voucher from 38 institutions
– 20 confirmed in biorepositories
– 11 unconfirmed
– 7 unlisted
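
To put the compliance counts in proportion, the figures quoted on the two compliance slides can be turned into shares. The sketch below uses only numbers from the slides; the variable names and the `share` helper are illustrative.

```python
# Counts as quoted on the compliance slides.
bold_records = 1_370_000        # "1.37 million records in BOLD"
insdc_barcode = 514_390         # BARCODE records in INSDC
order_plus_bin = 395_774        # ordinal name plus Barcode Index Number
formatted_voucher = 13_151      # vouchers in the structured format
institutions = {"confirmed": 20, "unconfirmed": 11, "unlisted": 7}


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# The three institution categories should add up to the 38 quoted.
total_institutions = sum(institutions.values())

print(f"INSDC BARCODE records as share of BOLD: {share(insdc_barcode, bold_records):.1f}%")
print(f"Order + BIN only: {share(order_plus_bin, insdc_barcode):.1f}%")
print(f"Formatted voucher IDs: {share(formatted_voucher, insdc_barcode):.1f}%")
print(f"Institutions accounted for: {total_institutions}")
```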

Page 19: David Schindel - Barcode Data Standard Compliance

Darwin Core Triplet

Structured Link to Vouchers

Institutional Acronym : Collection Code : Catalog ID

NHM : LEP : 123456

personal : DHJanzen : SRNP12345

Page 20: David Schindel - Barcode Data Standard Compliance

AMNH Icelandic Institute of Natural History, Akureyri Division Akureyri Iceland

AMNH American Museum of Natural History New York USA

UNL Universidad Autónoma de Nuevo León Monterrey, Nuevo León Mexico

UNL University of Nebraska State Museum Lincoln, Nebraska USA

UNL Centro de Estratigrafia e Paleobiologia da Universidade Nova de Lisboa Monte de Caparica Portugal

ZMK Zoological Museum, Kristiania Oslo Norway

ZMK Zoologisches Museum der Universität Kiel Kiel Germany

ZMK Zoological Museum, Copenhagen Copenhagen Denmark
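
The rows above show why an acronym alone cannot identify a repository: each of the three acronyms maps to more than one institution. The sketch below groups the slide's rows by acronym and refuses to resolve an ambiguous one. The split of each row into institution, city and country is an interpretation of the flattened slide text, and `build_index` and `resolve` are hypothetical helpers, not part of any registry API.

```python
from collections import defaultdict

# Rows from the slide: (acronym, institution, city, country).
ROWS = [
    ("AMNH", "Icelandic Institute of Natural History, Akureyri Division", "Akureyri", "Iceland"),
    ("AMNH", "American Museum of Natural History", "New York", "USA"),
    ("UNL", "Universidad Autónoma de Nuevo León", "Monterrey, Nuevo León", "Mexico"),
    ("UNL", "University of Nebraska State Museum", "Lincoln, Nebraska", "USA"),
    ("UNL", "Centro de Estratigrafia e Paleobiologia da Universidade Nova de Lisboa", "Monte de Caparica", "Portugal"),
    ("ZMK", "Zoological Museum, Kristiania", "Oslo", "Norway"),
    ("ZMK", "Zoologisches Museum der Universität Kiel", "Kiel", "Germany"),
    ("ZMK", "Zoological Museum, Copenhagen", "Copenhagen", "Denmark"),
]


def build_index(rows):
    """Group institutions by acronym."""
    index = defaultdict(list)
    for acronym, name, city, country in rows:
        index[acronym].append((name, city, country))
    return index


def resolve(index, acronym):
    """Return the one institution for an acronym; raise if unknown or ambiguous."""
    matches = index.get(acronym, [])
    if len(matches) != 1:
        raise LookupError(f"{acronym!r} matches {len(matches)} institutions")
    return matches[0]
```

With these rows, every call to `resolve` raises `LookupError`, which is the situation a shared registry of unique acronyms is meant to remove.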

Page 21: David Schindel - Barcode Data Standard Compliance

CBOL/GBIF/NCBI Registry of Biorepositories

www.biorepositories.org