
JGI Timeline

1990: Human Genome Program Officially Launched

1997: Joint Genome Institute (JGI)

April 2003: Human Genome Program Officially Ended

Chromosomes 5, 16, 19

Non Traditional User Facility

US DOE Joint Genome Institute

The JGI Post Human Genome Project

Community Sequencing Program (CSP)

Microbial Community Genomics

What types of projects will the JGI/CSP accept?

A wide range of projects. Ultimately, the most important factor in determining if a project will be accepted is its scientific merit.


Proposals & Peer Review Process: General Scientific Users Proposals

[Diagram labels: Users; Proposal Study Panel; Scientific Advisory Committee; JGI Director; Designated Lab Director; Sequence Allocation]

FAQ

What can researchers get from the CSP program?

The deliverables can range from raw sequence traces to well-annotated assembled genomes depending on the request in the proposal.

Interactions of the JGI and Scientific Users with Approved Sequencing Proposals

[Diagram labels: Users; Scientific Support for Approved Projects; Scientific Support Group (SSG); Production Sequencing; Informatic Analysis of Sequence]

[Diagram labels: DOE; Gov Agencies (EPA, USDA, NSF); GTL, Microbe; CSP; DOE+CSP+Gov A; Production Sequencing; Informatics; JGI Science Programs; Scientific Support Group]

Sequence Based Science at the JGI

• Gene Regulatory Vocabulary of Animals

• Studies of Body Plan Evolution

• Microbial Community Genomics

• < 1% of microbes are culturable

• Many unculturables live in interdependent consortia of considerable diversity

• Aim: to recover genome-scale sequences and reveal metabolic capabilities

• What is the structure of natural microbial populations? What is a microbial species? Can we harness their metabolic capabilities?

What Environments to Study?

• Ones with minimal microbial complexity

Iron Mountain: Jill Banfield et al., UC Berkeley

Jill Banfield, Gene Tyson, Phil Hugenholtz

UC Berkeley Geology

Iron Mountain Superfund site

Discharging >1 ton of toxic metals/day (pH <1)

FeS2

“Whole metagenome shotgun” dataset

[Workflow diagram steps: Environmental Sample; Purify High Molecular Weight DNA; Shotgun Library Construction; DNA Sequencing; Fosmid Library Construction; Fosmid Insert End Sequencing; Assembly; Annotation]

When possible, culture isolates

Iron Mtn “whole metagenome shotgun” GC content separates into two components

[Figure: forward read average G+C (x-axis) vs. reverse read average G+C (y-axis); clusters labeled archaea and bacteria]
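The quantity plotted on this slide can be sketched in a few lines. This is only an illustration, assuming each clone gives one forward and one reverse read as plain DNA strings; the function names and the toy reads are made up and are not data from the talk.

```python
def gc_fraction(read: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a read."""
    seq = read.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def gc_pairs(clones):
    """Yield (forward G+C, reverse G+C) for each (forward, reverse) read pair."""
    for forward, reverse in clones:
        yield gc_fraction(forward), gc_fraction(reverse)


# Toy read pairs, invented for illustration only.
clones = [
    ("ATTATAGCAT", "TTAAGCATAT"),  # low G+C on both ends
    ("GGCCGATCGC", "CGCGGATCCG"),  # high G+C on both ends
]
points = list(gc_pairs(clones))
for forward_gc, reverse_gc in points:
    print(f"forward G+C = {forward_gc:.2f}, reverse G+C = {reverse_gc:.2f}")
```

Because both ends of a clone come from the same genome, pairs from one organism should land close to the diagonal, which is why two populations with different base composition would separate into two components.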

Iron Mountain “whole metagenome shotgun”: GC and depth distributions

[Figure: read average G+C (axis marks 0.38 and 0.55) vs. read depth (axis marks 3 and 10), with populations grouped as Bacterial and Archaeal]

Bacterial: Lepto II, Lepto III

Archaeal: Fer 1 (cultured and sequenced), Fer 2, G-plasma

Stoichiometry: Lepto II (3X), Lepto III (1X), Fer 2 (3X), Fer 1 (1X), G-plasma (1X)

Other sampled genomes at low depth (including eukaryotes): 15% of reads
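One common way to relate read depth to stoichiometry is sketched below. It assumes depth is computed as total read bases placed on a contig divided by contig length, and that relative abundance is read off as a ratio of depths. Whether the 3X and 1X labels on the slide were derived this way is an assumption, and the population names and numbers here are hypothetical.

```python
def mean_depth(read_lengths, contig_length):
    """Average read depth: total read bases placed on a contig / contig length."""
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    return sum(read_lengths) / contig_length


def relative_abundance(depths):
    """Scale per-population depths so the lowest-depth population equals 1."""
    lowest = min(depths.values())
    return {name: depth / lowest for name, depth in depths.items()}


# Hypothetical example: 700 bp reads placed on two 7 kb contigs.
depths = {
    "population_a": mean_depth([700] * 100, 7000),  # 100 reads
    "population_b": mean_depth([700] * 30, 7000),   # 30 reads
}
ratios = relative_abundance(depths)
for name in depths:
    print(f"{name}: depth {depths[name]:.1f}, relative abundance {ratios[name]:.1f}")
```

This ignores differences in genome size and cloning bias, so it should be read as a rough estimate only.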

Similarity of Fer1 (Isolate) to Sequence in Community

[Figure: histogram of mixed community reads; x-axis: % identity to cultivated Fer1 isolate (0.50 to 1.0); y-axis: number of reads. Peaks marked at 64.9%, 78.2%, and 98-100%; population labels: G-plasma, Fer2, Fer1]
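A histogram like this one can be built from pairwise alignments of each community read against the isolate genome. The sketch below assumes the alignments are already available as equal-length strings with `-` for gaps; the function names and the 5% bin width are illustrative choices, not the method used for the slide.

```python
from collections import Counter


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0


def identity_histogram(identities, bin_width=5):
    """Count reads per identity bin, keyed by the bin's lower edge in percent."""
    last_bin = 100 // bin_width - 1  # 100% falls into the top bin
    counts = Counter()
    for pct in identities:
        index = min(int(pct // bin_width), last_bin)
        counts[index * bin_width] += 1
    return dict(sorted(counts.items()))


# Toy alignment, invented for illustration.
print(percent_identity("ACGT-ACGT", "ACGTTACGA"))
print(identity_histogram([65.0, 78.2, 99.0, 100.0]))
```

Reads from the isolate's own population would pile up in the top bin, while reads from related populations would form separate peaks at lower identity.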

Conclusions So Far

• The stoichiometry of organisms is encouraging for the assembly of individual genomes

• Assemblies support 16S studies suggesting limited diversity

• Isolated Fer1 genome sequence matches genome in environmental sample

How do we know that our assembly is correct?

How do you know you’ve done it right? Check pair ends against scaffold

At the gross level: check pairs (expect a few % to fail due to failing/chimeric clones)

Align all reads back against assembled scaffolds

Scaffolds end where there is no clone coverage in 3 kb plasmids

Identifies potentially repetitive areas and/or rearrangements
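The pair check described above can be sketched as follows. This is a minimal illustration, not the JGI pipeline: the `PlacedRead` record, the function names, and the 2,000 to 4,000 bp insert window (chosen loosely around the 3 kb plasmids mentioned on the slide) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class PlacedRead:
    scaffold: str  # scaffold the read aligned to
    start: int     # leftmost aligned position on the scaffold
    end: int       # rightmost aligned position on the scaffold
    strand: str    # "+" or "-"


def pair_is_consistent(a, b, min_insert, max_insert):
    """True if both end reads of a clone face each other at a plausible distance."""
    if a.scaffold != b.scaffold:
        return False
    left, right = (a, b) if a.start <= b.start else (b, a)
    if left.strand != "+" or right.strand != "-":
        return False  # reads must point toward each other
    span = right.end - left.start
    return min_insert <= span <= max_insert


def inconsistent_fraction(pairs, min_insert, max_insert):
    """Fraction of clone end pairs that violate orientation or insert size."""
    if not pairs:
        return 0.0
    bad = sum(not pair_is_consistent(a, b, min_insert, max_insert) for a, b in pairs)
    return bad / len(pairs)


# Hypothetical placements for four clones.
pairs = [
    (PlacedRead("s1", 100, 800, "+"), PlacedRead("s1", 2500, 3200, "-")),
    (PlacedRead("s1", 5000, 5700, "+"), PlacedRead("s1", 7400, 8100, "-")),
    (PlacedRead("s2", 9000, 9700, "-"), PlacedRead("s2", 6300, 7000, "+")),
    (PlacedRead("s1", 100, 800, "+"), PlacedRead("s3", 2500, 3200, "-")),  # ends on different scaffolds
]
print(inconsistent_fraction(pairs, min_insert=2000, max_insert=4000))
```

A small inconsistent fraction is expected from chimeric or failed clones; a cluster of inconsistent pairs at one location is a more likely sign of a repeat or misassembly.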

How do we know that our assembly is correct?

Fer2 vs. Fer1 shows local synteny

• Fer1 and Fer2 have avg. nt identity of 78%

[Figure: x-axis: Fer1 gene on contig; y-axis: Fer2 gene on contig]

[Figure: histogram; x-axis: % amino acid identity (0 to 1); y-axis: frequency per 5% bin (0 to 0.25)]

What does it mean to assemble a community genome?

Sample derived from millions of genomes.

What is a “species” in the environment?

Members of the same species

a) significantly different (many lineages survive and diverge)

b) highly similar (selective sweeps)

What does it mean to assemble a community genome?

Lepto II: 1 nucleotide variation / 3,000 bp

Fer II: 2.2 nucleotide variations / 100 bp
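To compare the two figures directly, both can be put on a per-base scale. The short sketch below uses only the two numbers from the slide; the helper name `variation_rate` is made up for illustration.

```python
from fractions import Fraction


def variation_rate(variants: str, window_bp: int) -> Fraction:
    """Variants per base pair, kept exact by parsing the count as a decimal string."""
    return Fraction(variants) / window_bp


lepto_ii = variation_rate("1", 3000)  # 1 variation per 3,000 bp
fer_ii = variation_rate("2.2", 100)   # 2.2 variations per 100 bp
fold_difference = fer_ii / lepto_ii

print(f"Lepto II: {float(lepto_ii):.4%} of sites vary")
print(f"Fer II:   {float(fer_ii):.4%} of sites vary")
print(f"Fer II varies about {float(fold_difference):.0f} times more often per base")
```

On these numbers the Fer II population carries 66 times more variation per base than Lepto II.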

CONSENSUS    130953 gtttatattaaatccattgatttctaagcttccggttcttcttccgtataatggagattt 131012
XYG46314.b1     162 A.......C........................A...........A.............. 103
XYG44123.b1     673 A.......C........................A...........A.............. 732
XYG44918.b1      48 A.......C........................A...........                4
XYG13291.g3       2                                                   .......... 11
XYG40116.g1     192 ......G..................................................... 133
XYG3051.b2      396 ......G..................................................... 455

CONSENSUS    131013 atagcttaataattcatcctccatcatacttatgcttgaacctgataatattatgtatag 131072
XYG46314.b1     102 ............................................................ 43
XYG44123.b1     733 ............................................................ 792
XYG13291.g3      12 ............................................................ 71
XYG40116.g1     132 ...A........................................................ 73
XYG3051.b2      456 ...A........................................................ 515

CONSENSUS    131073 ccttgtagtatccattaattcatcaaatattttctgcattatagatataataccatggtt 131132
XYG46314.b1      42 ..........................................                   1
XYG44123.b1     793 ........................                                     816
XYG13291.g3      72 ............................................................ 131
XYG40116.g1      72 T............G....C....................A.................... 13
XYG3051.b2      516 T............G....C....................A.................... 575

5 Reads of the Same Sequence from 5 Different Members of the Same Species (FerII)


Two Haplotypes Among the 5 Different Members of the Same Species (FerII)
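Splitting reads into haplotypes from a dot-notation alignment like this one can be sketched as below. The consensus and the four full-length read rows are copied from the first alignment block on the slide; the function names and the rule of grouping by an exact match of variant patterns are illustrative assumptions. A real analysis would also need to tolerate sequencing errors.

```python
from collections import defaultdict


def variant_sites(consensus, row):
    """Return {column: base} where a dot-notation row differs from the consensus.

    A '.' means "same as consensus"; any letter is a substitution.
    """
    return {
        col: base.lower()
        for col, (ref, base) in enumerate(zip(consensus, row))
        if base not in ". " and base.lower() != ref.lower()
    }


def group_haplotypes(consensus, rows):
    """Group read names by their exact pattern of differences from the consensus."""
    groups = defaultdict(list)
    for name, row in rows.items():
        pattern = tuple(sorted(variant_sites(consensus, row).items()))
        groups[pattern].append(name)
    return list(groups.values())


consensus = "gtttatattaaatccattgatttctaagcttccggttcttcttccgtataatggagattt"
rows = {
    "XYG46314.b1": "A.......C........................A...........A..............",
    "XYG44123.b1": "A.......C........................A...........A..............",
    "XYG40116.g1": "......G.....................................................",
    "XYG3051.b2": "......G.....................................................",
}
for members in group_haplotypes(consensus, rows):
    print(members)
```

On these four rows the reads fall into two groups, matching the two haplotypes named in the slide title.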


Polymorphisms occur in blocks

• Long quiet regions separate highly variable segments

• Variation is found in blocks of 5-10 genes

[Figure tracks: local depth; % polymorphic sites; ORFs]
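A track of % polymorphic sites such as the one on this slide could be computed with a simple windowed count. The sketch below assumes polymorphic positions have already been called along a contig; the window size, the threshold, and the function names are hypothetical.

```python
def polymorphic_fraction_by_window(polymorphic_positions, length, window):
    """Fraction of polymorphic sites in consecutive, non-overlapping windows."""
    if window <= 0 or length <= 0:
        raise ValueError("length and window must be positive")
    n_windows = (length + window - 1) // window
    counts = [0] * n_windows
    for pos in set(polymorphic_positions):  # count each site once
        if 0 <= pos < length:
            counts[pos // window] += 1
    fractions = []
    for i, count in enumerate(counts):
        size = min(window, length - i * window)  # last window may be shorter
        fractions.append(count / size)
    return fractions


def variable_blocks(fractions, threshold):
    """Return (first_window, last_window) runs whose fraction exceeds threshold."""
    blocks, start = [], None
    for i, frac in enumerate(fractions):
        if frac > threshold and start is None:
            start = i
        elif frac <= threshold and start is not None:
            blocks.append((start, i - 1))
            start = None
    if start is not None:
        blocks.append((start, len(fractions) - 1))
    return blocks


# Toy contig of 50 bp with two variable stretches separated by a quiet region.
fractions = polymorphic_fraction_by_window([3, 5, 7, 41, 43, 44, 48], length=50, window=10)
print(fractions)
print(variable_blocks(fractions, threshold=0.1))
```

With gene coordinates added, the same windows could be summed per ORF to ask whether variable blocks span several genes.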

Summary of Iron Mountain Biofilm

• Limited number of predominant species present in biofilm; the majority have never been cultured

• Several lines of evidence suggest that we can assemble genomes of these organisms

• Simplicity of community suggests removal of most variants by natural selection

• Now studying the metabolic capabilities of microbes