Upload
dorian-beckford
View
215
Download
2
Embed Size (px)
Citation preview
Errors in sequence replication
Shown to be an important limitation for early life by Eigen (1971)
Treatment here follows Eigen, McCaskill & Schuster (1988)
Sequence space – the space of all possible sequences in which molecular evolution takes place. e.g. all possible RNA sequences made of A C G U.
Master sequence – a high fitness, fast replicating sequence
e.g. a self-replicating ribozyme in the RNA world
or a virus genome in a modern cell
Replication is not perfect:
error (mutation) rate u per base - fidelity q = 1-u.
For sequence length L, overall fidelity qL (prob of no mistakes)
Can the master sequence survive? Mutation versus selection.
Simplest sequence space is binary – two symbols 1 and 0
‘hypercube’ shows mutational pathways between sequence
Suppose 000000 represents the master sequence
No. 1-mutant neighbours = L
No. 2-mutant nbs. = L(L-1)/2 etc.
For large L, back mutation is unlikely.
Hamming distance measures how far apart sequences are
d(i,k) = number of mutations required to get from seq i to seq k
Replicating sequences with errors – general equations
Ai = replication rate of sequence i
Di = degradation rate
Qii = qL = prob that copy of i is also i (no mistakes)
Qik = prob that replication of k produces i
),(),(
1
1 kidLkid
ik qq
Q
= 4 for RNA, 2 for binary sequences
Balance of accurate reproduction of i and degradation : Wii = AiQii-Di
Rate at which i is produced by inaccurate replication of k : Wik = AkQik
xi = relative conc of i:
mean excess production
i
ix 1
i
iii DAxE )(
iik
kikiiii xExWxWdt
dx
The final term is required to maintain the total concentration fixed. Think of it as dilution in a chemostat or competition for resources.
Easy case: sequence 0 is master sequence (better than the rest) – A0
All the others are the same – A1 (A1< A0)
All sequences have same degradation rate D
DqAW
DAAxADAxDAxEL
000
10011000 )())(1()(
00
00000 xExWxWdt
dx
kkk
forget back mutation when L >> 1
Concentrations are at equilibrium when
i.e. when
00 dt
dx
EW 00
10
100 AA
AqAx
L
Error threshold occurs when x0 0
For master sequence to survive, must have
01 / AAqL
Example with L = 50, A0=10, A1=1
error threshold when
q50 = 0.1 q = 0.955
error rate u = 1-q = 0.045
Master seq only Quasispecies Random sequences
Implications of the error threshold #1
01 / AAqL
10 /)1( AAu L
01 /)1( AAu L
)/ln()1ln( 10 AAuL
if L >> 1 and u << 1, this becomes )/ln( 10 AALu
If L is fixed there is a maximum error rate at which the master sequence survives.
If u is fixed there is a maximum length that can survive
Max is of order 1 mutation per whole genome replication.
Eigen paradox – longer molecules will likely be better replicators, but longer molecules require better accuracy...
How could first replicators overcome this?
Implications of the error threshold #2
Viruses are present as quasispecies – mutations occur within a patient. Difficult to get effective drugs.
RNA viruses have high mutation rates – u = 10-4 - 10-6 per base
DNA viruses 10-7 – 10-8
DNA genomes in cells 10-9 – 10-10
Viruses limited in size by mutation rate
Maybe high rate is advantageous too...
‘Bag of Genes’ model
Simulations by Gocmanac & Higgs. Based on Stochastic Corrector model of Szathmary & Demeter.
Cell requires K (=3) types of gene. Alive if it has at least 1 of each.
Lattice with one cell per site.
Pick a cell. Duplicate one gene in the cell at random. If cell reaches max size M, divide randomly. Put one cell in parent position and one on a neighbouring site (overgrowth/competition).
Simplest Case
• M=14• Inviable cells (white)• Viable cells (black)
Fraction of viable cells increases with M.
Whole population dies if M too small.
Parasites / Selfish genes• A parasite occasionally arises by mutation• Parasites (pink) replicate faster within the cell• Don’t contribute to cell survival• Increase likelihood of creating inviable cells
Parasites created at low rate
Black = no parasites; Brown = At least one parasite
Pink = more than half the genes are parasites
M = 14 M = 40
clusters of infected cells are related
Fraction of viable cells as a function of M
If M is heritable, cells evolve to an optimal division size
Add Horizontal Gene Transfer between neighbours.
Same parameters as M = 40 example before.
Many more infected cells
Many more inviable cells
Range of params in which population survives is reduced (see previous slide)
Infected cells (brown/pink) invade healthy cells (black).Highly infected cells (pink) die leaving empty space (white).Healthy cells grow into empty space.
Significance of horizontal gene transfer
Phylogenetic trees derived from different genes not always the same
Different strains of same bacterial species differ in gene content (e.g E. coli)
HGT may have been more frequent in the past. Maybe there is no tree? (Doolittle)
Alternative view – there is a core of genes that follows the organismal tree and there is some HGT superimposed on this.
If HGT was very frequent in early evolution, there would be no separate species. Communally evolving network of cells (Woese).
HGT aids the spread of new beneficial genes but it also makes parasites much more serious. Which is most important?
General problem in evol. biol. = Evolution of Cooperation
Prisoner’s dilemma
Strategies C (cooperate) or D (defect = cheat)
Stories (Prisoners; Superpowers; Chimpanzees....)
C D
C R S
D T P
You
Me Matrix shows payoff to Me
T (temptation) > R (reward) > P (punishment) > S (sucker)
Game theory: logic says always defect. D is a Nash equilibrium.
Evolutionary game theory: strategies multiply in proportion to their payoffs. Evolves towards 100% defectors. D is an Evolutionarily Stable Strategy (ESS).
Spatial Prisoners Dilemma (Nowak & May, 1992)
Each individual plays against its 8 lattice neighbours and a copy of itself.
Get score for each individual. At next timestep, each site is occupied by a copy of the best-scoring site in its neighbourhood.
Blue – is C, was C; Green – is C, was D;Red – is D, was D; Yellow – is D, was C.
Clusters of Cs survive. Hooray!
Cs have relatives on neighbouring sites.
Cs would die out if randomly mixed.
Q experiments (Spiegelmann)
Q is a bacteriophage – virus RNA length ~ 4000
Codes for a replicase protein.
Works in vitro.
add virus RNA
allow to replicate, then transfer small amount to next tube
analyse sequences in last tube
Sequences evolve towards short lengths (~200).
These are rapidly replicated but cannot function as viruses. Function is lost because it is not necessary.
“Spiegelmann’s monster”.
Cooperation between replicators – Szabo et al. (2002)
ADDBCAAAAAAAACBBCCCABBBBBBBBBBBBBBBBADADDDDAADCCCCCCCDDDDDDADB
nA nB nC
Sequences of four letters: A B C D
A controls template efficiency t(nA)
B controls replicase efficiency r(nB)
C controls replicase fidelity f(nC)
D is dummy nA
t(nA)
Each site on lattice is empty or contains one sequence.
Pick a site at random. The sequence on this site decays with a certain prob, or it is replicated by a neighbour with a certain prob.
replication rate ~ rt/n (t = template efficiency of the sequence being copied; r = replicase eficiency of the neighbour; n = total length of the sequence being copied).
n (total)
nA, nB, nC
nD
Evolution toward longer sequences. nA, nB, nC all selected but not nD
Fidelity for point mutations (like q in Eigen model) is the f value of the neighbour.
Also allow insertions and deletions that change the length.
Sequences cannot copy themselves. Need to cooperate with neighbours (reciprocal altruism).
Length distribution is bimodal.
Long sequences are cooperators. They have good fidelity, replicase and template ability.
Short sequences are defectors – selfish genes –
They mostly have template ability only – like the Q monsters.
The spatial arrangement is important.
Sequences replicate neighbours. There are clusters of cooperators.
If you allow mixing (diffusion on lattice) then defectors win. Total length goes down. Efficient replicators cannot evolve.
Begin with previous case. Then turn on diffusion.
n goes down.
We have seen two ways in which replicators might cooperate and avoid being overrun by selfish genes.
1. Compartments – as in stochastic corrector mechanism. Cells contain groups of moderate numbers of sequences. Random segregation creates variation between cells. Evolution selects those with fewer parasites.
2. Spatial distribution – as in lattice models. Clusters of cooperators can survive even if they would not survive in a freely mixed system.
Autocatalytic sets
f1, f2 ... are food molecules supplied from outside
a, b, c ... are other molecules made by metabolism
White circles are reactions. Dashed arrows indicate catalysis
Reflectively autocatalytic – each molecule can be built from other molecules in the set and the food molecules; each formation reaction is catalyzed by other molecules in the set.
Constructively autocatalytic – the set can be built up sequentially via catalyzed reactions starting from the food set.
definitions from Mossel and Steel (2005) J. Theor. Biol.
Kauffman – Origins of Order (1993)
Food set A and B
Consider all possible polymerization reactions
Suppose all reactions have some chance of being catalyzed by another polymer (red arrows).
If you consider a large enough set, it will always contain an autocatalytic subset.
Argues for metabolism first. The set is autocatalytic, not a single polymer.
No reaction rates, concentrations or thermodynamics