8/3/2019 Lecture24 Bacterialenviro Algenomics Ericalm
Genome Dynamics and
Environmental Adaptation in
Bacteria
Eric Alm
Depts. of Biological Eng. and Civil and Environmental Eng., MIT
Broad Institute of MIT and Harvard
The Modern View of Bacterial
Genome Dynamics
Horizontal Gene Transfer is rampant
Closely related strains harbor lots of newly acquired DNA
HGT is a key mechanism for niche adaptation
Native genes are insulated from dynamics at the periphery of networks
Uptake of Foreign DNA
Transformation
Phage
Conjugation
Genomic islands as reservoirs of new DNA
Coleman et al., Science 2006
How Common Is It?
Marine isolates of co-existing microdiversity
Large variation in genome size among closely related strains
Thompson et al., Science 2005
From: Lerat et al. (2005) PLoS Biol 3(5): e130
Genome Dynamics (HGT) at the Periphery
Pal, Papp & Lercher, Nature Genetics, 2005.
*Horizontal gene transfer into the E. coli lineage since its split from the Vibrio lineage.
Evolutionary Distance = rt = (gene family) × (genome) × (gene, genome) × t
(gene family) - Principles #2 & #3: More important proteins evolve slowly (e.g. ribosome).
(genome) - Principle #1: Molecular clock. Rate of change depends on mutation rate, population size, etc.
(gene, genome) - Principle #5: Positive or negative selection?
Overview of the Method
Seed possible orthologs:
Single copy ubiquitous COGs
Align and build trees
Compare to species phylogeny
KH-test
Reject outliers
Normalize against family rate and molecular clock
Read out terminal branch lengths
~1000 gene families
744 gene families
Evolutionary Distance = rt = (gene family) × (genome) × (gene, genome) × t
Gene Family Rate and Clock Explain Most Distance Variation
[Figure: scatter plot. x-axis: Predicted branch length log2(t); y-axis: Observed branch length log2(rt); both axes run from -10 to 5]
Residual variation is an estimate of the (gene, genome) term
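A minimal sketch of reading the residual off a terminal branch, assuming the predicted length already includes the family-rate and clock terms. The function names are hypothetical. The 4.0 and 0.25 cutoffs are the Fast/Slow values shown on the slides, and applying them to the observed/predicted ratio is an assumption.

```python
import math

def residual_log2(observed, predicted):
    """log2 of observed over predicted branch length (the residual)."""
    return math.log2(observed / predicted)

def classify(observed, predicted, fast=4.0, slow=0.25):
    """Label a branch by its observed/predicted ratio.

    Assumption: the slide thresholds (Fast > 4.0, Slow < 0.25) apply to
    this ratio; values in between are called "typical" here.
    """
    ratio = observed / predicted
    if ratio > fast:
        return "fast"
    if ratio < slow:
        return "slow"
    return "typical"

# Hypothetical branch lengths, for illustration only.
label = classify(observed=0.5, predicted=0.1)
```

On a log2 scale the two cutoffs sit symmetrically at +2 and -2, which is presumably why the figure axes are logarithmic.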
What Can We Learn From Residual Variation?
Noise?
Environment-specific selective pressures
Positive selection
Negative selection
Relaxed negative selection
Similar patterns in similar genes?
FAST
Lost ?
Fisher's exact test: Odds Ratio = 3.1, P = 2.4e-7
Odds Ratio = 0.55, P = 0.01
Fast: > 4.0
Slow: < 0.25
Outgroup
Negative Selection
Selective Sweeps
P=0.05
Positive Selection?
Hypergeometric test for enrichment of COG functions in
fast/slow (top 10% of genes)
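The enrichment test named above can be sketched with the standard library alone. This is an illustrative upper-tail hypergeometric calculation, not the lecture's own code. The function name and the category counts are hypothetical; only the 744 gene families and the top-10% cutoff come from the slides.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` belong to the COG function."""
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# 744 gene families, top 10% taken as 74 "fast" genes (from the slides);
# 16 genes in the category with 8 of them fast is a made-up example.
p_value = hypergeom_enrichment_p(population=744, annotated=16, sample=74, hits=8)
```

A small p-value says the function is over-represented among the fast (or slow) genes relative to a random draw of the same size.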
Species columns on the slide: E. coli, Photor., Yersinia

COG   Name  Signif.  Description
1989  PulO  *        Type II secretory pathway, prepilin signal peptidase PulO and related peptidases
4969  PilA  *        Tfp pilus assembly protein, major pilin
2805  PilT  **       Tfp pilus assembly protein, pilus retraction ATPase
1677  FliE  *        Flagellar hook basal body protein
1706  FlgI  *        Flagellar basal body P-ring protein
4786  FlgG  *        Flagellar basal body rod protein
1516  FliS  **       Flagellin-specific chaperone
1345  FliD  *        Flagellar capping protein
1815  FlgB  *        Flagellar basal body protein
4967  PilV  **       Tfp pilus assembly protein
3190  FliO  *        Flagellar biosynthesis protein
1261  FlgA  **       Flagellar basal body P-ring biosynthesis protein
4787  FlgF  **       Flagellar basal body rod protein
3418  FlgN  ***      Flagellar biosynthesis/type III secretory pathway chaperone
1684  FliR  **       Flagellar biosynthesis pathway
1377  FlhB  **       Flagellar biosynthesis pathway
Analysis of Patterns of Selection
[Figure: matrix with genes as rows and genomes/experiments as columns]
Do correlations between rows (genes) indicate similar functional roles?
Selection Acts Coherently Across Pathways/Functions
Analysis of Patterns of Selection
[Figure: matrix with genes as rows and genomes/experiments as columns]
Do correlations between columns (genomes) indicate similar ecology?
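The two questions above amount to correlating rows or columns of the same gene-by-genome matrix. A minimal sketch using Pearson correlation follows. The slides do not say which correlation measure was used, so that choice, the function names, and the toy matrix are all assumptions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.
    Raises ZeroDivisionError if either sequence has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def gene_correlation(matrix, i, j):
    """Correlate two rows (genes) across genomes."""
    return pearson(matrix[i], matrix[j])

def genome_correlation(matrix, i, j):
    """Correlate two columns (genomes) across genes."""
    return pearson([row[i] for row in matrix], [row[j] for row in matrix])

# Hypothetical residuals: rows are genes, columns are genomes.
signatures = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [3.0, 2.0, 1.0],
]
```

Row correlations group genes that speed up and slow down in the same genomes; column correlations group genomes whose genes are under similar pressure.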
Evolution of Evolutionary Rates
Correlation of the (gene, genome) term across all genes (orthologs) for each pair of genomes
Deep-branching clades show significant correlation in genome-wide selective patterns
No Correlation With Phylogeny
Over Shorter Timespans
Responses to Natural Selection
[Diagram labels: Environment; HGT: novel genes retained in genome; Native genes: gene evolutionary rate variation]
A critique of the adaptationist programme
Gould & Lewontin, 1979
Front legs a puzzle: how Tyrannosaurus used its tiny front legs is a scientific puzzle; they were too short even to reach the mouth. They may have been used to help the animal rise from a lying position.
- Explanatory information, Museum of Science, Boston, c. 1979
Direct vs. Indirect Selection
[Diagram labels: Environment; HGT; novel genes retained in genome; gene evolutionary rate variation; direct selection; indirect selection]
Gene Content Influences Selection on Genes?
(v X time | g-c): 0.11, P = 0.02
(v X dist | g-c): 0.09, ns
Background: The Downpass Algorithm in Phylogenetic Inference
What information is passed from leaves to parents?
sequence and score
Reconciliation proceeds by labeling each node in the gene tree as HGT, Dup, or Speciation (loss is implied)
Pass LCA (and score) of each subtree from leaves to root
A Downpass Algorithm for Reconciliation
[Figure: species tree (leaves 1 2 3 4) and gene tree, with numbered node labels]
The Algorithm
Calculate optimal scenario resulting in each possible LCA
[Figure: species tree (leaves 1 2 3 4) and gene tree (leaves 1 1 2 4 4), annotated with downpass scores]
For all LCAs at parent:
    For all LCAs at left child:
        For all LCAs at right child:
Running time: O(n_g n_s^3), with n_g gene-tree nodes and n_s species-tree nodes
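The three nested loops can be sketched as a small dynamic program. This is a simplified illustration under stated assumptions, not the lecture's implementation. The species tree, the event costs, and all function names are hypothetical. Losses are free ("loss is implied"), and transfers are only forbidden from landing on an ancestor of the donor. The ancestor checks walk the species tree, so this version costs more than the table loops alone.

```python
import itertools
import math

# Hypothetical species tree ((A,B),(C,D)), stored as child -> parent.
PARENT = {"A": "AB", "B": "AB", "C": "CD", "D": "CD",
          "AB": "R", "CD": "R", "R": None}
SPECIES = list(PARENT)
# Assumed event costs; losses are not charged.
COST = {"speciation": 0, "duplication": 2, "transfer": 3}

def lineage(s):
    """Return [s, parent(s), ..., root]."""
    path = []
    while s is not None:
        path.append(s)
        s = PARENT[s]
    return path

def within(x, s):
    """True if x is s or a descendant of s."""
    return s in lineage(x)

def child_toward(s, x):
    """Child of s on the path down to x (x must lie strictly below s)."""
    path = lineage(x)
    return path[path.index(s) - 1]

def event(s, sl, sr):
    """Event implied at a gene node mapped to s with children mapped to sl, sr."""
    left_in, right_in = within(sl, s), within(sr, s)
    if left_in and right_in:
        if sl != s and sr != s and child_toward(s, sl) != child_toward(s, sr):
            return "speciation"
        return "duplication"
    if left_in != right_in:
        outside = sr if left_in else sl
        if not within(s, outside):  # recipient may not be an ancestor of s
            return "transfer"
    return None

def downpass(gene):
    """Map each candidate species node to the cheapest scenario below `gene`.
    Gene trees are nested 2-tuples whose leaves are species names."""
    if isinstance(gene, str):
        return {s: 0 if s == gene else math.inf for s in SPECIES}
    left, right = downpass(gene[0]), downpass(gene[1])
    table = {}
    for s in SPECIES:                                        # LCAs at parent
        best = math.inf
        for sl, sr in itertools.product(SPECIES, repeat=2):  # left, right child
            kind = event(s, sl, sr)
            if kind is not None:
                best = min(best, left[sl] + right[sr] + COST[kind])
        table[s] = best
    return table

def reconciliation_cost(gene):
    return min(downpass(gene).values())
```

With these costs a gene tree matching the species tree scores 0. Because losses are free, a duplication followed by losses can beat a transfer, so the relative costs decide how many transfers are inferred.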
Real Data
COG100: 30S ribosomal subunit protein S11
[Figure: species tree and gene tree]
32 transfers!!
Uncertainty in Gene Trees
Even with a good species phylogeny, gene trees may have significant uncertainty
Bootstrap trees are a convenient but very limited sample of different topologies
Consensus trees discard information
Love the Bootstrap
Don't fear the bootstrap - embrace it!
Reconcile ALL bootstraps: for each subtree reconciliation, check the other bootstraps for a more efficient reconciliation
The Idea
Reconciliation as a tool for tree construction
Incorporation of bootstrap subtrees explores a very large region of plausible tree space
Constructed tree is the most parsimonious, plausible gene tree
Reconciliation meets construction
Each internal node of each bootstrap has three potential parents
For each node, three tables of potential LCAs must be maintained
The Algorithm
Bootstrap trees
1. Reconcile children
2. Reconcile same node in bootstrap trees
3. Return best answer and merge tables
4. Return table to parent
Link subtrees across bootstraps
Find path through all bootstrap trees optimizing reconciliation
After all subtrees are reconciled, select the best reconciliation to represent linked subtrees.
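The linking step can be illustrated with a much-simplified sketch: subtrees from different bootstraps that cover the same set of leaves are treated as interchangeable, and the cheapest one is kept. This is an assumption-laden toy, not the lecture's algorithm. It works on rooted trees only and ignores the three-parent tables. `toy_cost` is a hypothetical stand-in for a real reconciliation score, and all names are illustrative.

```python
def leaves(tree):
    """Set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return leaves(tree[0]) | leaves(tree[1])

def toy_cost(tree):
    """Hypothetical score: count internal nodes whose two children share a
    species (the first character of each leaf name)."""
    if isinstance(tree, str):
        return 0
    left_sp = {name[0] for name in leaves(tree[0])}
    right_sp = {name[0] for name in leaves(tree[1])}
    return int(bool(left_sp & right_sp)) + toy_cost(tree[0]) + toy_cost(tree[1])

def subtrees(tree):
    """Yield the tree and every subtree below it."""
    yield tree
    if not isinstance(tree, str):
        yield from subtrees(tree[0])
        yield from subtrees(tree[1])

def best_by_clade(bootstraps, cost=toy_cost):
    """For each leaf set seen in any bootstrap, keep its cheapest topology."""
    best = {}
    for boot in bootstraps:
        for sub in subtrees(boot):
            key = leaves(sub)
            if key not in best or cost(sub) < cost(best[key]):
                best[key] = sub
    return best

def relink(tree, best, cost=toy_cost):
    """Rebuild a tree bottom-up, swapping in cheaper subtrees from other bootstraps."""
    if isinstance(tree, str):
        return tree
    rebuilt = (relink(tree[0], best, cost), relink(tree[1], best, cost))
    alternative = best.get(leaves(tree), rebuilt)
    return alternative if cost(alternative) < cost(rebuilt) else rebuilt

# Two hypothetical bootstraps, each with one good and one poor half.
good_left, bad_left = (("A1", "B1"), ("A2", "B2")), (("A1", "A2"), ("B1", "B2"))
good_right, bad_right = (("C1", "D1"), ("C2", "D2")), (("C1", "C2"), ("D1", "D2"))
boot1, boot2 = (good_left, bad_right), (bad_left, good_right)
linked = relink(boot1, best_by_clade([boot1, boot2]))
```

Neither bootstrap alone is optimal here, but the linked tree combines the better half of each, which is the sense in which bootstraps sample a larger region of tree space than any single tree.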
It Gets Messy
Different entries in the same table can have different subtree topologies!
[Figure: example tree with numbered node labels]
Iterate through all branches
Root at branch with best reconciliation
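The two rooting steps can be sketched as follows, assuming an unrooted binary gene tree stored as an adjacency map. `toy_cost` is again a hypothetical stand-in for a reconciliation score, and the tree and function names are made up for illustration.

```python
def species_below(tree):
    """Species (first character of each leaf name) under a rooted tree."""
    if isinstance(tree, str):
        return {tree[0]}
    return species_below(tree[0]) | species_below(tree[1])

def toy_cost(tree):
    """Hypothetical score: internal nodes whose children share a species."""
    if isinstance(tree, str):
        return 0
    shared = species_below(tree[0]) & species_below(tree[1])
    return int(bool(shared)) + toy_cost(tree[0]) + toy_cost(tree[1])

def rootings(adj):
    """Yield (branch, rooted tree) for every branch of an unrooted binary tree.
    `adj` maps each node to its neighbours; leaves have one neighbour."""
    def hang(node, came_from):
        kids = [n for n in adj[node] if n != came_from]
        if not kids:
            return node
        return tuple(hang(k, node) for k in kids)

    seen = set()
    for u in adj:
        for v in adj[u]:
            if (v, u) in seen:  # each branch is visited once
                continue
            seen.add((u, v))
            yield (u, v), (hang(u, v), hang(v, u))

def best_root(adj, cost=toy_cost):
    """Root at the branch whose rooted tree has the lowest cost."""
    return min(rootings(adj), key=lambda item: cost(item[1]))

# Hypothetical unrooted tree: (A1, B1) - x - y - (A2, B2).
ADJ = {"A1": ["x"], "B1": ["x"], "x": ["A1", "B1", "y"],
       "y": ["x", "A2", "B2"], "A2": ["y"], "B2": ["y"]}
edge, rooted = best_root(ADJ)
```

A tree with n leaves has 2n - 3 branches, so trying every root multiplies the reconciliation cost by only a linear factor.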
Rooting trees is easy!
Real Data Revisited!
COG100: 30S ribosomal subunit protein S11
Species
Reconciliation
7 transfers
Reconciliation events
Summary
Possible to reconcile gene and species trees efficiently
Uncertainty in gene trees can hamper reconciliation
Use bootstraps to sample reasonable subsets of tree space
Are there 7 transfers for COG100?
    Wrong species phylogeny
    Need more bootstraps
    Gold-standard?
Next steps?
    All metabolic genes
    Co-evolution among genes with similar function?
Acknowledgements
Jesse Shapiro (Evolutionary rates)
Lawrence David (Reconciliation)
Sonia Timberlake (Evolution of regulation)
Sean Clarke (HGT in the laboratory)
Arne Materna (Experimental evolution)