JGI Timeline
1990: Human Genome Program Officially Launched
1997: Joint Genome Institute (JGI)
April 2003: Human Genome Program Officially Ended
5, 19, 16
Non Traditional User Facility
US DOE Joint Genome Institute
The JGI Post Human Genome Project
Community Sequencing Program (CSP)
Microbial Community Genomics
Overview
The Community Sequencing Program (CSP)
To provide the scientific community, through a peer-reviewed process, with access to high-throughput sequencing at the JGI.
What types of projects will the JGI/CSP accept?
A wide range of projects. Ultimately, the most important factor in determining if a project will be accepted is its scientific merit.
User Guide > How to Propose a Project
Proposals & Peer Review Process: General Scientific Users Proposals
[Diagram labels: JGI Director; Users; Proposal Study Panel; Scientific Advisory Committee; Sequence Allocation; Designated Lab Director]
FAQ
What can researchers get from the CSP program?
The deliverables can range from raw sequence traces to well-annotated assembled genomes depending on the request in the proposal.
Interactions of the JGI and Scientific Users with Approved Sequencing Proposals
Users: DOE (GTL, Microbe); Gov Agencies (EPA, USDA, NSF); CSP
Scientific Support for Approved Projects: Scientific Support Group (SSG)
Production Sequencing
Informatic Analysis of Sequence
Sequence Based Science at the JGI
• Gene Regulatory Vocabulary of Animals
• Studies of Body Plan Evolution
• Microbial Community Genomics
• < 1% of microbes are culturable
• Many unculturables live in interdependent consortia of considerable diversity
• Aim: to recover genome-scale sequences and reveal metabolic capabilities
• What is the structure of natural microbial populations? What is a microbial species? Can we harness their metabolic capabilities?
Iron Mountain
Jill Banfield et al., UC Berkeley
Jill Banfield, Gene Tyson, Phil Hugenholtz
UC Berkeley Geology
Environmental Sample
Purify High Molecular Weight DNA
Shotgun Library Construction
Fosmid Library Construction
DNA Sequencing
Fosmid Insert End Sequencing
Assembly
Annotation
When possible, culture isolates
Iron Mountain “whole metagenome shotgun”: GC content separates into two components
[Figure: forward read average G+C vs. reverse read average G+C; the two components are labeled archaea and bacteria]
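The plot above pairs the G+C content of the forward and reverse read from each clone. A minimal sketch of how those two values could be computed is given below; the function names and the toy read pairs are hypothetical and are not taken from the JGI pipeline.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a read."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    # Reads made only of ambiguous bases (or empty reads) get 0.0 by convention here.
    return gc / total if total else 0.0


def pair_gc(forward: str, reverse: str) -> tuple[float, float]:
    """G+C of the forward and reverse read of one clone: one point on the plot."""
    return gc_fraction(forward), gc_fraction(reverse)


# Hypothetical toy read pairs, for illustration only.
pairs = {
    "clone_low_gc": ("ATTATAGCATTA", "TTAAGCATATAT"),
    "clone_high_gc": ("GCCGGATCCGGC", "CGGCGCTAGCCG"),
}

for name, (fwd, rev) in pairs.items():
    print(name, pair_gc(fwd, rev))
```

Reads from the same genome tend to share a similar G+C, so points from one organism fall near the diagonal, which is what lets the two components separate.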
Iron Mountain “whole metagenome shotgun”: GC and depth distributions
[Figure: read average G+C (axis ticks at 0.38 and 0.55) vs. read depth (axis ticks at 3 and 10), with Archaeal and Bacterial panels]
Archaeal: Fer 1 (cultured and sequenced), Fer 2, G-plasma
Bacterial: Lepto II, Lepto III
Stoichiometry: Fer 2 (3X), Fer 1 (1X), G-plasma (1X); Lepto II (3X), Lepto III (1X)
Other sampled genomes at low depth (including eukaryotes): 15% of reads
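Separating genomes by G+C and read depth can be sketched as a coarse two-way binning. The example below is an illustration only: the `Contig` fields, the function names, and both cutoffs are assumptions chosen for the sketch and are not values used in the Iron Mountain analysis.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    length: int        # contig length in bp
    read_bases: int    # total bases of reads aligned to the contig
    gc: float          # average G+C fraction of those reads


def read_depth(c: Contig) -> float:
    """Average number of reads covering each base of the contig."""
    return c.read_bases / c.length


def bin_contig(c: Contig, gc_cutoff: float = 0.47, depth_cutoff: float = 6.0) -> str:
    """Assign a contig to one of four coarse bins (cutoffs are hypothetical)."""
    gc_label = "high_gc" if c.gc >= gc_cutoff else "low_gc"
    depth_label = "high_depth" if read_depth(c) >= depth_cutoff else "low_depth"
    return f"{gc_label}/{depth_label}"


# Hypothetical contigs, for illustration only.
contigs = [
    Contig("contig_a", length=1000, read_bases=10000, gc=0.55),
    Contig("contig_b", length=2000, read_bases=6000, gc=0.38),
]

for c in contigs:
    print(c.name, round(read_depth(c), 1), bin_contig(c))
```

In practice the bins would be read off the observed distributions rather than fixed in advance.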
Similarity of Fer1 (isolate) to Sequence in Community
[Figure: histogram of mixed community reads; x-axis: % identity to cultivated Fer1 isolate (0.50 to 1.0); y-axis: number of reads; peaks marked 64.9%, 78.2%, and 98-100%; peak labels: Fer2, G-plasma, Fer1]
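The histogram above is built from the percent identity of each community read to the cultivated Fer1 isolate. A minimal sketch of that calculation follows, assuming each read has already been aligned to the reference so that the two strings have equal length with `-` for gaps; the function names, the 10% bin width, and the toy alignments are hypothetical.

```python
from collections import Counter


def percent_identity(read: str, ref: str) -> float:
    """Percent of gap-free aligned columns where the read matches the reference."""
    if len(read) != len(ref):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(read.upper(), ref.upper()):
        if x == "-" or y == "-":
            continue  # skip gap columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_histogram(values: list[float], bin_width: int = 10) -> Counter:
    """Count reads per identity bin, keyed by the lower edge of the bin."""
    return Counter(int(v // bin_width) * bin_width for v in values)


# Hypothetical aligned (read, reference) pairs, for illustration only.
alignments = [("ACGTACGT", "ACGTACGT"), ("ACGTACGA", "ACGTACGT"), ("AC-TTCGA", "ACGTACGT")]
values = [percent_identity(r, ref) for r, ref in alignments]
print(values)
print(identity_histogram(values))
```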
Conclusions So Far
• The stoichiometry of organisms is encouraging for the assembly of individual genomes
• Assemblies support 16S studies suggesting limited diversity
• The isolated Fer1 genome sequence matches a genome in the environmental sample
How do you know you’ve done it right? Check paired ends against scaffolds
At the gross level: check pairs (expect a few % to be inconsistent due to failing/chimeric clones)
Align all reads back against the assembled scaffolds
Scaffolds end where there is no clone coverage in 3 kb plasmids
This identifies potentially repetitive areas and/or rearrangements
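The paired-end check described above can be sketched as follows. This is an illustration under stated assumptions, not the JGI procedure: the `Placement` record, the function names, and the 1 kb tolerance are hypothetical; only the 3 kb insert size comes from the slide. Pairs whose two reads land on different scaffolds are simply counted as inconsistent here, although in a real assembly they may instead indicate a link between scaffolds.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    scaffold: str
    start: int    # leftmost aligned position on the scaffold
    end: int      # rightmost aligned position on the scaffold
    strand: str   # "+" or "-"


def pair_is_consistent(a: Placement, b: Placement,
                       insert_size: int = 3000, tolerance: int = 1000) -> bool:
    """True if the two end reads of a clone face each other at roughly the insert size."""
    if a.scaffold != b.scaffold:
        return False
    left, right = sorted((a, b), key=lambda p: p.start)
    # The left read must point right and the right read must point left.
    if left.strand != "+" or right.strand != "-":
        return False
    span = right.end - left.start
    return abs(span - insert_size) <= tolerance


def inconsistent_fraction(pairs: list[tuple[Placement, Placement]]) -> float:
    """Fraction of pairs that fail the check (a few % is expected from bad clones)."""
    if not pairs:
        return 0.0
    bad = sum(not pair_is_consistent(a, b) for a, b in pairs)
    return bad / len(pairs)


# Hypothetical placements, for illustration only.
pairs = [
    (Placement("s1", 100, 800, "+"), Placement("s1", 2400, 3100, "-")),    # consistent
    (Placement("s1", 100, 800, "+"), Placement("s2", 2400, 3100, "-")),    # different scaffolds
    (Placement("s1", 100, 800, "+"), Placement("s1", 2400, 3100, "+")),    # same strand
    (Placement("s1", 100, 800, "+"), Placement("s1", 9400, 10100, "-")),   # span too large
]
print(inconsistent_fraction(pairs))
```

A cluster of inconsistent pairs at one scaffold position is more informative than the overall fraction, since isolated failures are expected from chimeric clones.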
How do we know that our assembly is correct?
Fer2 vs. Fer1 shows local synteny
• Fer1 and Fer2 have avg. nt identity of 78%
[Figure: dot plot of Fer1 gene on contig vs. Fer2 gene on contig]
[Figure: histogram of % amino acid identity (x-axis, 0 to 1) vs. frequency per 5% bin (y-axis, 0 to 0.25)]
What does it mean to assemble a community genome?
Sample derived from millions of genomes.
What is a “species” in the environment?
Members of the same species may be:
a) significantly different (many lineages survive and diverge)
b) highly similar (selective sweeps)
Lepto II: 1 nucleotide variation / 3,000 bp
Fer II: 2.2 nucleotide variations / 100 bp
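The two rates above are quoted over different lengths, so putting them on a common per-kilobase scale makes the contrast explicit. A small worked calculation (the function name is hypothetical):

```python
def variants_per_kb(variants: float, bases: float) -> float:
    """Nucleotide variations per 1,000 bp."""
    return variants / bases * 1000


lepto_ii = variants_per_kb(1, 3000)    # Lepto II: 1 variation per 3,000 bp
fer_ii = variants_per_kb(2.2, 100)     # Fer II: 2.2 variations per 100 bp
fold = fer_ii / lepto_ii

print(f"Lepto II: {lepto_ii:.2f} per kb")
print(f"Fer II:   {fer_ii:.2f} per kb")
print(f"Fer II is about {fold:.0f}-fold more variable")
```

That works out to roughly 0.33 versus 22 variations per kb, a difference of about 66-fold.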
CONSENSUS   130953 gtttatattaaatccattgatttctaagcttccggttcttcttccgtataatggagattt 131012
XYG46314.b1    162 A.......C........................A...........A.............. 103
XYG44123.b1    673 A.......C........................A...........A.............. 732
XYG44918.b1     48 A.......C........................A........... 4
XYG13291.g3      2                                                   .......... 11
XYG40116.g1    192 ......G..................................................... 133
XYG3051.b2     396 ......G..................................................... 455

CONSENSUS   131013 atagcttaataattcatcctccatcatacttatgcttgaacctgataatattatgtatag 131072
XYG46314.b1    102 ............................................................ 43
XYG44123.b1    733 ............................................................ 792
XYG13291.g3     12 ............................................................ 71
XYG40116.g1    132 ...A........................................................ 73
XYG3051.b2     456 ...A........................................................ 515

CONSENSUS   131073 ccttgtagtatccattaattcatcaaatattttctgcattatagatataataccatggtt 131132
XYG46314.b1     42 .......................................... 1
XYG44123.b1    793 ........................ 816
XYG13291.g3     72 ............................................................ 131
XYG40116.g1     72 T............G....C....................A.................... 13
XYG3051.b2     516 T............G....C....................A.................... 575

5 Reads of the Same Sequence from 5 Different Members of the Same Species (Fer II)
Two Haplotypes Among the 5 Different Members of the Same Species (Fer II)
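The two haplotypes can be recovered mechanically by grouping reads that share the same set of differences from the consensus. The sketch below is an illustration, not the method used for the slides: it takes the four reads that span the whole first 60-column block of the Fer II alignment, treats `.` as "same as consensus", and groups reads by their variant signature. The function names are hypothetical, and partial reads would need overlap-aware comparison that is not shown here.

```python
from collections import defaultdict

# Full-length rows from the first block of the Fer II alignment ('.' = matches consensus).
reads = {
    "XYG46314.b1": "A.......C........................A...........A..............",
    "XYG44123.b1": "A.......C........................A...........A..............",
    "XYG40116.g1": "......G.....................................................",
    "XYG3051.b2":  "......G.....................................................",
}


def variant_signature(row: str) -> tuple:
    """Columns (0-based) and bases where a read differs from the consensus."""
    return tuple((col, base) for col, base in enumerate(row) if base != ".")


def group_by_signature(rows: dict) -> dict:
    """Group read names that carry exactly the same variants."""
    groups = defaultdict(list)
    for name, row in rows.items():
        groups[variant_signature(row)].append(name)
    return dict(groups)


for signature, names in group_by_signature(reads).items():
    print(names, signature)
```

On these four rows the grouping yields two sets of two reads each, consistent with the two haplotypes named on the slide.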
Polymorphisms occur in blocks
• Long quiet regions separate highly variable segments
• Variation is found in blocks of 5-10 genes
[Figure tracks: local depth; % polymorphic sites; ORFs]
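A "% polymorphic sites" track like the one in the figure can be approximated by counting variable positions in consecutive windows along a scaffold. This is a minimal sketch under assumptions: the input is a hypothetical per-site list of flags (1 = polymorphic, 0 = invariant), and the window size is arbitrary rather than the one used for the figure.

```python
def window_polymorphism(sites: list[int], window: int) -> list[float]:
    """Percent polymorphic sites in consecutive, non-overlapping windows."""
    if window <= 0:
        raise ValueError("window must be positive")
    percents = []
    for start in range(0, len(sites), window):
        chunk = sites[start:start + window]
        percents.append(100.0 * sum(chunk) / len(chunk))
    return percents


# Hypothetical site flags: a quiet stretch, a variable block, then another quiet stretch.
sites = [0] * 10 + [1, 0, 1, 0, 1, 0, 1, 0, 1, 0] + [0] * 10
print(window_polymorphism(sites, window=10))
```

Long runs of windows near zero separated by windows with high values correspond to the quiet regions and variable blocks described in the bullets.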
Summary of Iron Mountain Biofilm
• A limited number of predominant species are present in the biofilm; the majority have never been cultured
• Several lines of evidence suggest that we can assemble genomes of these organisms
• Simplicity of the community suggests removal of most variants by natural selection
• Now studying the metabolic capabilities of the microbes