MOLECULAR EVOLUTION
Afsaneh Taghipour
The neutral theory
•The neutral theory and its predictions for levels of polymorphism and rates of divergence.
•The nearly neutral theory
1
There is much genetic variation within almost all species. The amount of genetic variation is too much to be maintained by selection.
Natural Selection view of Evolution
►Mutations arise by chance (meaning the mutations are not directed to match environmental needs).
►Favorable (= higher fitness) mutations increase in frequency via selection (changed fitness is often associated with a changed environment).
►Deleterious (= lower fitness) mutations are reduced in frequency (many resistance mutations are deleterious unless the toxic agent is present).
New Idea
►The neutral theory of evolution was developed by Motoo Kimura.
►The neutral theory departed from all existing models by using N, the population size, as the most important population parameter.
►What is the neutral theory of evolution?
The Neutral Theory
►1) There are no fitness differences among almost all of the molecular variants that are detected in populations.
►Neutral is the word chosen to describe the lack of fitness differences (functionally equivalent alleles).
►2) The amount of genetic variation in a population is determined by a balance between an increase due to mutation, rate = μ, and a decrease due to finite population size (= genetic drift).
The neutral theory now forms the basis of the most widely employed null model in molecular evolution.
The neutral theory adopts the perspective that most mutations have little or no fitness advantage or disadvantage and are therefore selectively neutral.
Genetic drift is therefore the primary evolutionary process that dictates the fate (fixation or loss) of newly occurring mutations.
In the 1950s and 1960s, it was widely thought that most mutations would have substantial fitness differences and therefore the fate of most mutations was dictated by natural selection.
Motoo Kimura argued instead that the interplay of mutation and genetic drift could explain many of the patterns of genetic variation and the evolution of protein and DNA sequences seen in biological populations.
The neutral theory null model makes two major predictions under the assumption that genetic drift alone determines the fate of new mutations. One prediction is the amount of polymorphism for sequences sampled within a population of one species.
The other prediction is the degree and rate of divergence among sequences sampled from separate species.
Divergence: Fixed genetic differences that accumulate between two completely isolated lineages that were originally identical when they diverged from a common ancestor.
Polymorphism: The existence in a population of two or more alleles at one locus. Populations with genetic polymorphisms have heterozygosity, gene diversity, or nucleotide diversity measures that are greater than zero.
POLYMORPHISM
The balance of genetic drift and mutation determines polymorphism in the neutral theory.
More alleles segregating in the population indicate more polymorphism. Segregating alleles, and therefore polymorphism, result from the random walk in frequency that each mutation takes under genetic drift.
The neutral theory then predicts that the rate of fixation is μ and therefore the expected time between fixations is 1/μ generations. For that subset of mutations that eventually fix, the expected time from introduction to fixation is $4N_e$ generations.
chance of eventual fixation is $1/(2N)$
chance of eventual loss is $1 - 1/(2N)$
average time to fixation of a new mutation approaches $4N$ generations
average time to loss approaches just $2\ln(2N)$ generations
combined processes of mutation and genetic drift produce equilibrium heterozygosity: $H_{eq} = \frac{4N_e\mu}{4N_e\mu + 1}$
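The neutral expectations listed above can be collected into a short numerical sketch. The function names are illustrative, not from the slides, and the formulas are the standard diffusion-approximation results for a diploid population (fixation probability 1/(2N), mean time to fixation about 4N generations, mean time to loss about 2 ln(2N) generations, equilibrium heterozygosity 4Neμ/(4Neμ + 1) under the infinite-alleles model). That these match the equations originally shown on the slide is an assumption.

```python
import math


def neutral_fixation_probability(n_diploid):
    """Chance that one new neutral mutation eventually fixes: 1/(2N)."""
    return 1.0 / (2 * n_diploid)


def mean_time_to_fixation(n_effective):
    """Approximate generations to fixation, given that fixation occurs: 4Ne."""
    return 4.0 * n_effective


def mean_time_to_loss(n_diploid):
    """Approximate generations until a new neutral mutation is lost: 2 ln(2N)."""
    return 2.0 * math.log(2 * n_diploid)


def equilibrium_heterozygosity(n_effective, mu):
    """Drift-mutation equilibrium heterozygosity (infinite-alleles assumption)."""
    theta = 4.0 * n_effective * mu
    return theta / (theta + 1.0)
```

For example, with N = 500 a new mutation has a 0.1% chance of fixing, and when it is lost this takes only about 14 generations on average, compared with about 2000 generations for the rare mutation that fixes.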
DIRECTIONAL SELECTION FIXES ADVANTAGEOUS MUTATIONS
With neutral mutations, most mutations go to loss fairly rapidly and a few eventually go to fixation.
Balancing selection greatly increases the segregation time of alleles and increases polymorphism compared to neutrality.
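The neutral case can be illustrated with a small Wright-Fisher simulation: each new mutation starts as one copy among 2N gene copies and drifts until it is lost or fixed. This is a minimal sketch under stated assumptions (discrete generations, constant population size, binomial sampling, no selection); the function names, the seed, and the default parameter values are chosen only for illustration.

```python
import random


def simulate_new_mutation(two_n, rng):
    """Follow one new neutral mutation (1 copy among two_n) to loss or fixation.

    Returns (fixed, generations).
    """
    copies = 1
    generations = 0
    while 0 < copies < two_n:
        freq = copies / two_n
        # Wright-Fisher sampling: every gene copy in the next generation is
        # drawn independently according to the current allele frequency.
        copies = sum(1 for _ in range(two_n) if rng.random() < freq)
        generations += 1
    return copies == two_n, generations


def estimate_fixation_probability(two_n=20, replicates=20000, seed=1):
    """Fraction of replicate new mutations that drift to fixation."""
    rng = random.Random(seed)
    fixed = sum(simulate_new_mutation(two_n, rng)[0] for _ in range(replicates))
    return fixed / replicates
```

With 2N = 20 the estimate should fall close to the neutral expectation of 1/(2N) = 0.05, with most replicates ending in loss after only a few generations.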
DIVERGENCE
The neutral theory also predicts the rate of divergence between sequences. Genetic divergence occurs by substitutions that accumulate in two DNA sequences over time.
As substitutions accumulate, the two sequences diverge from the ancestral sequence as well as from each other. In this example, the two sequences are eventually divergent at five of 16 nucleotide sites due to substitutions.
Substitution
The complete replacement of one allele previously most frequent in the population with another allele that originally arose by mutation.
The neutral theory predicts the rate at which allelic substitutions occur and thereby the rate at which divergence occurs. Predicting the substitution rate for neutral alleles requires knowing the probability that an allele becomes fixed in a population and the number of new mutations that occur each generation.
The initial frequency of a new mutation is $1/(2N)$.
Under genetic drift, the chance of fixation of any neutral allele is simply its initial frequency.
The chance that an allele copy mutates is μ.
The expected number of new mutations in a population each generation is 2Nμ.
The rate at which alleles that originally entered the population as mutations go to fixation per generation is $k = 2N\mu \times \frac{1}{2N}$
Notice that this equation simplifies to k = μ.
Since the rate of neutral substitution is μ, the expected time between neutral substitutions is 1/μ generations.
Using a clock that chimes on the hour as an example, the rate of chiming is 24 per day (or 24/day), so the expected time between chimes is 1/24 of a day, or one hour.
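The cancellation of population size in the substitution rate can be checked numerically. This is a minimal sketch; the function names are illustrative and a diploid population of N individuals (2N gene copies) is assumed.

```python
def neutral_substitution_rate(n_diploid, mu):
    """Substitutions per generation: (new mutations) x (chance each one fixes)."""
    new_mutations_per_generation = 2 * n_diploid * mu
    fixation_probability = 1 / (2 * n_diploid)
    return new_mutations_per_generation * fixation_probability


def expected_time_between_substitutions(mu):
    """Waiting time is the inverse of the rate, as with the chiming clock."""
    return 1 / mu
```

Calling `neutral_substitution_rate` with very different population sizes but the same μ returns the same rate (up to floating-point rounding), which is the sense in which the neutral substitution rate is independent of N.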
NEARLY NEUTRAL THEORY
The nearly neutral theory considers the fate of new mutations if some portion of new mutations are acted on by natural selection of different strengths.
The nearly neutral theory recognizes three categories of new mutations: neutral mutations, mutations acted on strongly by either positive or negative natural selection, and mutations acted on weakly by natural selection relative to the strength of genetic drift. This last category contains mutations that are nearly neutral since neither natural selection nor genetic drift will determine their fate exclusively.
For a new mutation in a finite population that experiences natural selection, the forces of directional selection and genetic drift oppose each other.
Genetic drift causes heterozygosity to decrease at a rate of $1/(2N_e)$ per generation.
The selection coefficient (s) on a genotype describes the “push” on alleles toward fixation or loss due to natural selection.
For a new beneficial mutation in a large population, the chance of fixation is approximately 2s.
Setting these forces equal to each other, $2s = 1/(2N_e)$, gives $4N_e s = 1$ as the condition where the processes of genetic drift and natural selection are equal. When $4N_e s$ is much greater than one, natural selection is the stronger process, whereas when $4N_e s$ is much less than one, genetic drift is the stronger process.
Using more sophisticated mathematical techniques, the probability of fixation for a new mutation in a finite population is $P_{fix} = \frac{1 - e^{-4N_e s p}}{1 - e^{-4N_e s}}$, where $p = 1/(2N)$ is the initial frequency of the new mutation.
Under the nearly neutral theory the probability of fixation
depends on the balance between natural selection and
genetic drift, expressed in the product of the effective
population size and the selection coefficient (Nes).
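How the product of effective population size and selection coefficient controls fixation can be sketched with Kimura's diffusion result for a mutation at initial frequency p = 1/(2N), P = (1 − e^(−4Nes·p)) / (1 − e^(−4Nes)). The function name and argument names are illustrative, and the assumption that census and effective size are equal by default is made only for convenience.

```python
import math


def fixation_probability(s, n_effective, n_census=None):
    """Kimura's diffusion approximation for a single new mutation.

    s is the selection coefficient; the initial frequency is 1/(2N).
    """
    if n_census is None:
        n_census = n_effective        # assumption: Ne equals census N
    p0 = 1.0 / (2 * n_census)
    x = 4.0 * n_effective * s         # scaled selection coefficient 4Nes
    if abs(x) < 1e-12:
        return p0                     # neutral limit: the initial frequency
    if x < -700:
        return 0.0                    # strongly deleterious; avoids overflow
    # expm1(-y) = e^(-y) - 1, so the ratio equals (1 - e^(-x*p0)) / (1 - e^(-x)).
    return math.expm1(-x * p0) / math.expm1(-x)
```

With N = 10,000, a mutation with s = 0.01 (4Nes = 400) fixes with probability close to 2s, while one with s = 0.000001 (4Nes = 0.04) fixes with a probability only about 2% above the neutral value of 1/(2N).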
Measures of divergence and polymorphism
•Measuring divergence of DNA sequences.
•Nucleotide substitution models correct divergence estimates for saturation.
•DNA polymorphism measured by number of segregating sites and nucleotide diversity.
2
The smallest possible unit of the genome is a homologous nucleotide site, or single base-pair position in the exact same genome location, that could be compared among individuals.
Genetic variation at such nucleotide sites is characterized by the existence of DNA sequences that have different nucleotides and is called nucleotide polymorphism.
p distance: The number of nucleotide sites that differ between two DNA sequences divided by the total number of nucleotide sites, a shorthand for proportion-distance. Sometimes symbolized as d for distance.
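The p distance can be computed directly from two aligned sequences. This is a minimal sketch that assumes the sequences are already aligned, have equal length, and contain no gaps or ambiguity codes; the function name is illustrative.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned nucleotide sites that differ between two sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)
```

Two 16-site sequences that differ at five sites give `p_distance(...) = 5/16 = 0.3125`.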
SATURATION
Saturation is the phenomenon where DNA sequence divergence appears to slow and eventually reaches a plateau even as time since divergence continues to increase. Saturation in nucleotide changes over time is caused by substitution occurring multiple times at the same nucleotide site, a phenomenon called multiple hit substitution.
DNA SEQUENCE DIVERGENCE AND SATURATION
There is a wide variety of methods to correct the perceived divergence between two DNA sequences to obtain a better estimate of the true divergence after accounting for multiple hits. These correction methods are called nucleotide substitution models and use parameters for DNA base frequencies and substitution rates to obtain a modified estimate of the divergence between two DNA sequences. The simplest of these is the Jukes and Cantor (1969) nucleotide-substitution model, named for its authors.
The three types of event that a single nucleotide site may experience over two generations.
The probability of a substitution to one specific nucleotide is customarily represented by α.
The probability of any substitution is 3α.
The probability that a nucleotide stays the same (e.g. G) one generation later is 1 − 3α.
The probability of no substitutions over two generations is $(1 - 3\alpha)^2$.
The probability that a nucleotide site retains its original base pair under the Jukes–Cantor model of nucleotidesubstitution.
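The probability of a multiple hit nucleotide substitution which restores the initial nucleotide is $\alpha[1 - P_G(1)]$, where $P_G(t)$ is the probability that the site is a G at generation t.
The probability that a nucleotide site has the same bp after two generations is $P_G(2) = (1 - 3\alpha)P_G(1) + \alpha[1 - P_G(1)]$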
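The change in the probability that a given nucleotide is found at a site over one generation is then $\Delta P_G = P_G(t+1) - P_G(t) = -3\alpha P_G(t) + \alpha[1 - P_G(t)]$
which then simplifies to $\Delta P_G = -4\alpha P_G(t) + \alpha$
If we consider the rate of change at any time t, $\frac{dP_G(t)}{dt} = -4\alpha P_G(t) + \alpha$, which has the solution $P_G(t) = \frac{1}{4} + \left(P_G(0) - \frac{1}{4}\right)e^{-4\alpha t}$
As t becomes large the term $e^{-4\alpha t}$ approaches zero so that $P_G(t)$ approaches ¼.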
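If the nucleotide at a site is initially a G, then $P_G(0) = 1$ and the probability the site remains a G over time is $P_G(t) = \frac{1}{4} + \frac{3}{4}e^{-4\alpha t}$
If the nucleotide is not initially a G, then $P_G(0) = 0$ and the probability the site is a G over time is $P_G(t) = \frac{1}{4} - \frac{1}{4}e^{-4\alpha t}$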
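The Jukes-Cantor probabilities can be checked by iterating the one-generation recursion and comparing it with the continuous-time solution. This is a sketch under the model's assumption that every specific substitution has the same probability α per generation; the function names and the symbol `p0` for the starting probability are illustrative.

```python
import math


def p_same_recursion(alpha, generations, p0=1.0):
    """Iterate P(t+1) = (1 - 3*alpha) * P(t) + alpha * (1 - P(t))."""
    p = p0
    for _ in range(generations):
        # Either the site already holds the nucleotide and does not change, or
        # it holds another nucleotide and substitutes to this specific one.
        p = (1 - 3 * alpha) * p + alpha * (1 - p)
    return p


def p_same_continuous(alpha, t, p0=1.0):
    """Continuous-time solution: 1/4 + (P(0) - 1/4) * exp(-4*alpha*t)."""
    return 0.25 + (p0 - 0.25) * math.exp(-4 * alpha * t)
```

For small α the two versions agree closely, and from either starting state (`p0=1.0` or `p0=0.0`) the probability converges on ¼ after many generations.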
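For two DNA sequences originally identical by descent at every nucleotide site at time 0, at some later time t the probability that any site will possess the same nucleotide in both sequences is $P_I(t) = \frac{1}{4} + \frac{3}{4}e^{-8\alpha t}$
The exponential term is now $e^{-8\alpha t}$ because there are two DNA sequences, each accumulating substitutions independently.
The probability that two sites are different or divergent, call it d, over time is one minus the probability that sites are identical: $d = 1 - P_I(t) = \frac{3}{4}\left(1 - e^{-8\alpha t}\right)$
Solving for the exponential term and taking the natural logarithm of the right side gives $8\alpha t = -\ln\left(1 - \frac{4}{3}d\right)$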
For two DNA sequences that were originally identical by descent, we expect that each site has a 3αt chance of substitution. Since there are two sequences, there is a 6αt chance of a site being divergent between the two sequences.
If we set expected divergence K = 6αt, then we notice K is close to the 8αt above. In fact, K is 3/4 of the expression for 8αt, so $K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}d\right)$
Imagine two DNA sequences that differ at 1 site in 10 so the p distance is 10% or d = 0.10. This level of observed divergence is an underestimate because it does not account for multiple hits. To adjust for multiple hits we compute corrected divergence as $K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}(0.10)\right) = 0.1073$
which shows that at the low apparent divergence of 10% about 0.7% of sites are expected to have experienced multiple hits.
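The worked example can be reproduced with a few lines of code. This is a minimal sketch of the Jukes-Cantor correction K = −(3/4) ln(1 − (4/3)d); the function name is illustrative, and the error raised at d ≥ 0.75 reflects the fact that the logarithm is undefined once observed divergence reaches the saturation plateau.

```python
import math


def jukes_cantor_distance(d):
    """Correct an observed p distance d for multiple hits (Jukes-Cantor model)."""
    if not 0 <= d < 0.75:
        raise ValueError("p distance must satisfy 0 <= d < 0.75")
    return -0.75 * math.log(1 - (4.0 / 3.0) * d)
```

`jukes_cantor_distance(0.10)` gives about 0.1073, so the correction adds roughly 0.7% to the observed divergence. The correction grows quickly as d approaches 0.75.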
DNA polymorphism
Variable DNA sequences at one locus within a species represent different alleles that are present in the population. To compare them, construct a multiple sequence alignment so that the homologous nucleotide sites for each sequence are all lined up in the same columns.
One measure of DNA polymorphism is the number of segregating sites, S. A segregating site is any of the L nucleotide sites that maintains two or more nucleotides within the population.
The number of segregating sites per nucleotide site is obtained by dividing the number of segregating sites by the total number of sites: $p_S = S/L$
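Counting segregating sites amounts to scanning the columns of the alignment. This is a minimal sketch that assumes an alignment given as equal-length strings with no gaps or ambiguity codes; the function name is illustrative.

```python
def segregating_sites(alignment):
    """Return (S, S/L) for a list of aligned, equal-length sequences."""
    if not alignment or not alignment[0]:
        raise ValueError("alignment must contain at least one non-empty sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    # A column is segregating when it holds two or more distinct nucleotides.
    s = sum(1 for column in zip(*alignment) if len(set(column)) > 1)
    return s, s / length
```

For the four sequences `ACGTAC`, `ACGTAC`, `ACTTAC`, and `TCGTAC`, two of the six columns vary, so S = 2 and S/L = 1/3.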
The number of segregating sites (S) under neutrality is a function of the scaled mutation rate θ = 4Neμ. Watterson (1975) first developed a way to estimate θ from the number of segregating sites observed in a sample of DNA sequences. The expected number of segregating sites at drift–mutation equilibrium can more easily be determined using the logic of the coalescent model.
Under the infinite sites model of mutation, each mutation that occurs increases the number of segregating sites by one. The expected number of segregating sites is therefore just the expected number of mutations for a given genealogy
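For k lineages, the expected number of mutations in one generation is kμ.
If the expected time to coalescence for k lineages is $T_k$, then $k\mu T_k$ mutations are expected for each value of k.
The expected number of mutations is obtained by summing over all k between the present and the most recent common ancestor (MRCA): $E[S] = \sum_{k=2}^{n} k\mu T_k$
The probability of k lineages coalescing in one generation is $\frac{k(k-1)}{4N_e}$
and the expected time to coalescence is the inverse, $T_k = \frac{4N_e}{k(k-1)}$
so the expected number of segregating sites in a sample of n DNA sequences is $E[S] = \sum_{k=2}^{n} k\mu \frac{4N_e}{k(k-1)} = 4N_e\mu \sum_{k=1}^{n-1} \frac{1}{k}$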
Notice that θ = 4Neμ can be substituted in this equation to give $E[S] = \theta \sum_{k=1}^{n-1} \frac{1}{k}$
and then rearranging gives $\theta = \frac{S}{\sum_{k=1}^{n-1} 1/k}$
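The coalescent argument can be verified by summing kμT_k over the genealogy and comparing the result with θ times the harmonic sum. This is a sketch under the standard assumptions of a diploid population, the infinite sites model, and a mutation rate μ expressed for the whole sequence per generation; the function names are illustrative.

```python
def expected_segregating_sites(n, n_effective, mu):
    """Sum k * mu * T_k over k = 2..n lineages, with T_k = 4Ne / (k(k-1))."""
    total = 0.0
    for k in range(2, n + 1):
        coalescence_probability = k * (k - 1) / (4.0 * n_effective)
        expected_time = 1.0 / coalescence_probability  # T_k in generations
        total += k * mu * expected_time
    return total


def harmonic_sum(n):
    """Sum of 1/k for k = 1..n-1."""
    return sum(1.0 / k for k in range(1, n))
```

For any sample size the two routes agree: `expected_segregating_sites(n, Ne, mu)` equals `4 * Ne * mu * harmonic_sum(n)` up to rounding, and for n = 2 the expectation is simply θ.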
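An estimate of the scaled mutation rate determined from the number of segregating sites in a sample of DNA sequences is symbolized as $\hat{\theta}_W$ (W for Watterson) or $\hat{\theta}_S$ (S for segregating sites). If we define a new variable, $a_1 = \sum_{k=1}^{n-1} \frac{1}{k}$
then, per nucleotide site, $\hat{\theta}_W = \frac{p_S}{a_1}$
or, using the absolute number of segregating sites, $\hat{\theta}_W = \frac{S}{a_1}$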
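Watterson's estimator divides the number of segregating sites by the harmonic sum over the sample size. This is a minimal sketch; the function name and the optional `n_sites` argument (used to report the estimate per nucleotide site) are illustrative choices.

```python
def watterson_theta(num_segregating_sites, n_sequences, n_sites=None):
    """Estimate theta = 4*Ne*mu from S in a sample of n_sequences.

    If n_sites is given, S is first divided by the sequence length so the
    estimate is per nucleotide site.
    """
    if n_sequences < 2:
        raise ValueError("at least two sequences are required")
    a1 = sum(1.0 / k for k in range(1, n_sequences))
    s = num_segregating_sites
    if n_sites is not None:
        s = s / n_sites
    return s / a1
```

With S = 25 segregating sites in a sample of n = 5 sequences, the harmonic sum is 25/12 and the estimate is 12 for the whole sequence. If those sequences are 1000 sites long, the per-site estimate is 0.012.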
A second measure of DNA polymorphism is the nucleotide diversity in a sample of DNA sequences, symbolized by π (pronounced “pie”), and also known as the average pairwise differences in a sample of DNA sequences.
The nucleotide diversity is the sum of the number of nucleotide differences seen for each pair of DNA sequences, divided by the number of pairs: $\hat{\pi} = \frac{1}{n(n-1)/2} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} d_{ij}$
where i and j are indices that refer to individual DNA sequences, $d_{ij}$ is the number of nucleotide sites that differ between sequences i and j, and n is the total number of DNA sequences in the sample.
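The average pairwise difference can be computed by comparing every pair of sequences once. This is a minimal sketch that assumes aligned, equal-length sequences without gaps; the function name and the `per_site` option are illustrative.

```python
from itertools import combinations


def nucleotide_diversity(sequences, per_site=False):
    """Average number of differing sites over all pairs of sequences."""
    n = len(sequences)
    if n < 2:
        raise ValueError("at least two sequences are required")
    length = len(sequences[0])
    if length == 0 or any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be non-empty and equally long")
    total_differences = sum(
        sum(1 for a, b in zip(x, y) if a != b)   # d_ij for one pair
        for x, y in combinations(sequences, 2)
    )
    pairs = n * (n - 1) / 2
    pi = total_differences / pairs
    return pi / length if per_site else pi
```

For the three sequences `AAAA`, `AAAT`, and `AATT`, the pairwise differences are 1, 2, and 1, so the average over the three pairs is 4/3 differences, or 1/3 per site.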
In larger samples that may include multiple identical DNA sequences, the nucleotide diversity can be estimated by
where pi and pj are the frequencies of alleles i and j, respectively, in a sample of k different sequences that each represent one allele.
Estimates of nucleotide diversity are useful because π is a measure of heterozygosity for DNA sequences.