Global Genomic Resources for Biodiversity Research
Jonathan Coddington Global Genome Initiative Smithsonian Institution
Before and After
Before: Hard-to-find tissues of ambiguous quality, ambiguously owned by individual PIs
After: Discoverable genomic samples in institutional biorepositories (best practices & international treaties)
Before: “Boutique” sequencing of a few genomes
After: Affordable, coordinated sequencing of a thoughtful synopsis of all of Life
Before: Phenotypic, expert-limited taxonomy
After: Approximate “mesoscale” IDs of most organisms anywhere
Before: Dispersed environmental biology, evolution, conservation, ecology, biotech
After: Precise, scalable, cheap tools
Global Genome Initiative
Four Ways to Be a Genomic Sample
Warm Living (best): Wildlands (!!), parks (!), zoos, botanical gardens, aquaria
Cold Living (good): Cell cultures, seed banks, biobanks, biorepositories
Warm Dead (ok…): Museums, herbaria, other collections
Cold Dead (less good): Biobanks, biorepositories, research freezers, etc.
LIFE ON ICE!
Eukaryotes
Global Genome diversity = Σ (branch lengths)?
Just one Genome!
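A minimal sketch of the "Σ (branch lengths)" idea, assuming a rooted tree with known branch lengths. The toy tree, its node names, and all lengths below are hypothetical and only illustrate the arithmetic: the diversity captured by a set of sampled tips is taken here as the summed length of every branch on the paths from those tips to the root.

```python
# Hypothetical toy tree: node -> (parent, length of the branch leading to the node).
# Names and lengths are invented for illustration; they are not from the slides.
toy_tree = {
    "root": (None, 0.0),
    "A": ("root", 1.0),
    "a1": ("A", 0.5),
    "a2": ("A", 0.5),
    "b1": ("root", 2.0),
}


def captured_branch_length(tree, sampled_tips):
    """Sum the lengths of all branches on paths from the sampled tips to the root."""
    seen = set()
    for tip in sampled_tips:
        node = tip
        # Walk toward the root; stop early once we reach a branch already counted.
        while node is not None and node not in seen:
            seen.add(node)
            node = tree[node][0]
    return sum(tree[n][1] for n in seen)


print(captured_branch_length(toy_tree, ["a1"]))              # one genome
print(captured_branch_length(toy_tree, ["a1", "b1"]))        # two distant genomes
print(captured_branch_length(toy_tree, ["a1", "a2", "b1"]))  # every tip
```

In this toy case a single genome already captures part of the tree, and adding a distant lineage adds more branch length than adding a close relative.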
Discovery of “Families”
[Figure: Discovery of Major Lineages (Families). X axis: Earliest Description, 1750 to 2025. Y axis: Percent 2020 Total, 0 to 1.00. Series: Angiosperms, Chordata, Animalia, Foraminifera, Fungi.]
Big Data for 9,911 families
Source   Total Records    % Families   Missing   Max
GBIF     955,459,561      0.93         710       51,162,821
BHL      3,603,706        0.79         2,064     31,017
NCBI     219,978,814      0.77         2,288     17,150,286
OTOL*    1,903,704        0.76         2,407     39,969
BOLD     6,373,906        0.57         4,282     424,231
EOL      99,886           0.38         6,132     2,707
GGBN     1,632,440        0.36         6,334     165,650
Total    1,187,419,577                 196
*OTOL = Open Tree of Life
Only 196 families absent from all 7 DBs
High Priority for IBOL2: 4,282 families?
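The "% Families" column appears to be the fraction of the 9,911 families that each source covers, that is, one minus the Missing count divided by 9,911. A short sketch that recomputes it from the Missing column; the variable and function names are illustrative only.

```python
TOTAL_FAMILIES = 9911  # from the slide title "Big Data for 9,911 families"

# "Missing" column from the table: families with no records in each source.
missing = {
    "GBIF": 710,
    "BHL": 2064,
    "NCBI": 2288,
    "OTOL": 2407,
    "BOLD": 4282,
    "EOL": 6132,
    "GGBN": 6334,
}


def coverage(missing_count, total=TOTAL_FAMILIES):
    """Fraction of families that have at least one record in a source."""
    return (total - missing_count) / total


for source, n_missing in missing.items():
    print(f"{source:5s} {coverage(n_missing):.2f}")
```

Rounded to two decimals, the recomputed fractions agree with the column as shown on the slide.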
All Phyla
[Venn diagram: overlap among Catalogue of Life (101), Global Genome Biodiversity Network Tissues/DNA (68), NCBI GenBank DNA Barcodes (37), and Global Catalogue of Microorganisms Cultures (57).]
Last updated 03 October 2018
Results exclude names from GCM that did not match to CoL. Mismatch rate for GCM was 2%.
Animalia: Cycliophora, Dicyemida, Entoprocta, Gastrotricha, Gnathostomulida, Micrognathozoa, Myxozoa, Nematomorpha, Onychophora, Orthonectida, Placozoa
Chromista: Acavomonidia, Picozoa, Radiozoa
Plantae: Anthocerotophyta
Protozoa: Calcitarcha, Choanozoa, Metamonada, Microsporidia
All Families
[Venn diagram: overlap among Catalogue of Life (9,858), Global Genome Biodiversity Network Tissues/DNA (3,457), NCBI GenBank DNA Barcodes (3,248), and Global Catalogue of Microorganisms Cultures (1,221).]
Last updated 03 October 2018
Results exclude names that did not match to CoL. Mismatch rates: GGBN 7%, GenBank 8%, GCM 7%.
All Genera
[Venn diagram: overlap among Catalogue of Life (160,938), Global Genome Biodiversity Network Tissues/DNA (17,783), NCBI GenBank DNA Barcodes (24,375), and Global Catalogue of Microorganisms Cultures (6,402).]
Last updated 03 October 2018
Results exclude names that did not match to CoL. Mismatch rates: GGBN 6%, GenBank 10%, GCM 7%.
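The overlap diagrams reduce to set operations on names matched against a reference list. A rough sketch of that kind of gap analysis, assuming exact string matching; the function name, the source labels, and the placeholder names ("Fam1" and so on) are hypothetical and stand in for real taxon names.

```python
def gap_analysis(reference, sources):
    """Compare each source's names with a reference list (for example CoL).

    Returns per-source matched/unmatched counts and a mismatch rate, plus the
    reference names that are absent from every source.
    """
    reference = set(reference)
    report = {}
    covered = set()
    for label, names in sources.items():
        names = set(names)
        matched = names & reference
        unmatched = names - reference  # names excluded from the overlap counts
        covered |= matched
        report[label] = {
            "matched": len(matched),
            "unmatched": len(unmatched),
            "mismatch_rate": len(unmatched) / len(names) if names else 0.0,
        }
    report["absent_from_all"] = sorted(reference - covered)
    return report


# Placeholder data, invented for illustration.
reference_names = {"Fam1", "Fam2", "Fam3", "Fam4"}
source_names = {
    "tissues": {"Fam1", "Fam2", "FamX"},  # "FamX" does not match the reference
    "sequences": {"Fam2", "Fam3"},
}
result = gap_analysis(reference_names, source_names)
print(result["tissues"])
print(result["absent_from_all"])
```

Names left in `absent_from_all` are the "known unknowns": taxa in the reference list with no sample or sequence in any source.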
GGBN Solves a Problem: data model for tissues, DNAs, etc.
GBIF: specimens & vouchers
GGBN (The Missing Link): tissues, DNAs, RNAs, physical genetic resources
NCBI, BOLD: sequences
83 members, 30 countries, 2M samples, 3,899 families, 20K genera, 45K species
GGBN DarwinCore Extension (156 fields, 9 mandatory)
http://terms.tdwg.org/wiki/GGBN_Data_Standard
GGBN Progress as of May 2018
[Figure: Names in the Catalogue of Life by Taxonomic Rank (Kingdoms, Phyla, Classes, Orders, Families, Genera, Species), log scale from 1 to 10,000,000, split into In GGBN and Not in GGBN. Data labels as extracted: 8; 96; 313; 1,365; 9,446; 156,056; 1,885,450.]
Eukaryotic Genome Quality (n=3311)
[Figure: # Genomes (0 to 500) by Genome Quality code, from 1.2.2 to 4.6.6. Series: Animals, Plants, Fungi, Protists, Other.]
X axis: Genome Quality code x.x.x. First digit, genome level: 1 = contig, 2 = scaffold, 3 = chromosome, 4 = complete. Second digit: log(contig N50). Third digit: log(scaffold N50).
After Lewin et al. 2018, Earth BioGenome Project: Sequencing life for the future of life. www.pnas.org/cgi/doi/10.1073/pnas.1720115115
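A hedged sketch of how a quality code of this shape could be computed. N50 is the standard assembly statistic: the length of the sequence at which the running total, taken from longest to shortest, first reaches half the assembly. The `quality_code` helper is hypothetical, and it assumes base-10 logs of N50 in base pairs, rounded down; the slide says only "log" and does not state the base, units, or rounding.

```python
import math


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


def quality_code(level, contig_lengths, scaffold_lengths):
    """Hypothetical helper: level.log(contig N50).log(scaffold N50).

    Assumes log base 10 of N50 in base pairs, floored (an assumption, not
    something the slide specifies).
    """
    contig_digit = math.floor(math.log10(n50(contig_lengths)))
    scaffold_digit = math.floor(math.log10(n50(scaffold_lengths)))
    return f"{level}.{contig_digit}.{scaffold_digit}"


# Made-up lengths in base pairs, for illustration only.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
print(quality_code(2, [20_000, 15_000, 5_000], [300_000, 100_000]))
```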
[Gel images: 0.7X Agarose Gel; PFGE, 1X Agarose Gel (200 ng); FEMTO Pulse (min 10 fg), with Femto Pulse 165 kb ladder.]
Pipette shearing?
*Solemya velum genomic DNA
Conclusions and Thanks!
• Big Data: OK!
• Biodiversity Discovery: OK!
• Preserving the genome = Σ branch lengths
Gap Analysis (known unknowns):
– Especially useful to set priorities
– Quantitative metrics possible
– Enviro management, conservation
– Build community, bridges to biodiversity genomics