Presentation1population neutral theory

MOLECULAR EVOLUTION

Afsaneh Taghipour

The neutral theory

•The neutral theory and its predictions for levels of polymorphism and rates of divergence.

•The nearly neutral theory


There is much genetic variation within almost all species. The amount of genetic variation is too much to be maintained by selection.

Natural Selection view of Evolution

►Mutations arise by chance (meaning the mutations are not directed to match environmental needs).

►Favorable (= higher fitness) mutations increase in frequency via selection (changed fitness is often associated with a changed environment).

►Deleterious (= lower fitness) mutations are reduced in frequency (many resistance mutations are deleterious unless the toxic agent is present).

New Idea

►The neutral theory of evolution was developed by Motoo Kimura.

►The neutral theory departed from all existing models by using N, the population size, as the most important population parameter.

►What is the neutral theory of evolution?

The Neutral Theory

►1) There are no fitness differences among almost all of the molecular variants that are detected in populations.

►Neutral is the word chosen to describe the lack of fitness differences (functionally equivalent alleles).

►2) The amount of genetic variation in a population is determined by a balance between an increase due to mutation, rate = μ, and a decrease due to finite population size (= genetic drift).

The neutral theory now forms the basis of the most widely employed null model in molecular evolution.

The neutral theory adopts the perspective that most mutations have little or no fitness advantage or disadvantage and are therefore selectively neutral.

Genetic drift is therefore the primary evolutionary process that dictates the fate (fixation or loss) of newly occurring mutations.

In the 1950s and 1960s, it was widely thought that most mutations would have substantial fitness differences and therefore the fate of most mutations was dictated by natural selection.

Motoo Kimura argued instead that the interplay of mutation and genetic drift could explain many of the patterns of genetic variation and the evolution of protein and DNA sequences seen in biological populations.

The neutral theory null model makes two major predictions under the assumption that genetic drift alone determines the fate of new mutations. One prediction is the amount of polymorphism for sequences sampled within a population of one species.

The other prediction is the degree and rate of divergence among sequences sampled from separate species.

Divergence: Fixed genetic differences that accumulate between two completely isolated lineages that were originally identical when they diverged from a common ancestor.

Polymorphism: The existence in a population of two or more alleles at one locus. Populations with genetic polymorphisms have heterozygosity, gene diversity, or nucleotide diversity measures that are greater than zero.

POLYMORPHISM

The balance of genetic drift and mutation determines polymorphism in the neutral theory.

More alleles segregating in the population indicate more polymorphism. Segregating alleles, and therefore polymorphism, result from the random walk in frequency that each mutation takes under genetic drift.

The neutral theory then predicts that the rate of fixation is μ and therefore the expected time between fixations is 1/μ generations. For that subset of mutations that eventually fix, the expected time from introduction to fixation is 4Ne generations.

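For a new neutral mutation in a diploid population of size N:

The chance of eventual fixation is $1/(2N)$.

The chance of eventual loss is $1 - 1/(2N)$.

The average time to fixation of a new mutation approaches 4N generations.

The average time to loss approaches just $2\ln(2N)$ generations (when $N_e = N$).

The combined processes of mutation and genetic drift produce equilibrium heterozygosity:

$H_{eq} = \frac{4N_e\mu}{4N_e\mu + 1}$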
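The neutral expectations listed on this slide can be tabulated numerically. The sketch below is only illustrative: the function name, the dictionary keys, and the example values of N and μ are assumptions, and it takes N_e = N (an idealized Wright-Fisher population).

```python
import math

def neutral_expectations(N, mu):
    """Classical neutral-theory expectations for a diploid population.

    Assumes N_e = N; N and mu are illustrative inputs, not values
    taken from the slides.
    """
    two_n = 2 * N
    theta = 4 * N * mu  # scaled mutation rate, 4*Ne*mu with Ne = N
    return {
        "p_fix": 1 / two_n,              # chance of eventual fixation
        "p_loss": 1 - 1 / two_n,         # chance of eventual loss
        "t_fix": 4 * N,                  # mean generations to fixation
        "t_loss": 2 * math.log(two_n),   # mean generations to loss
        "H_eq": theta / (theta + 1),     # equilibrium heterozygosity
    }

result = neutral_expectations(N=1000, mu=1e-6)
for name, value in result.items():
    print(f"{name}: {value:.6g}")
```

Loss is expected within about 15 generations here, while fixation takes about 4000, which is why most new mutations contribute little to standing polymorphism.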

DIRECTIONAL SELECTION FIXES ADVANTAGEOUS MUTATIONS

With neutral mutations, most mutations go to loss fairly rapidly and a few mutations eventually go to fixation.

Balancing selection greatly increases the segregation time of alleles and increases polymorphism compared to neutrality.
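The fate of neutral mutations can be explored with a small Wright-Fisher simulation. This is a minimal sketch under stated assumptions: a diploid population of constant size N, one new mutant copy at the start, binomial sampling of 2N allele copies each generation, and no selection. The function names, the population size, the replicate count, and the seed are all hypothetical choices for illustration.

```python
import random

def simulate_new_mutation(N, rng):
    """Follow one new neutral mutation until it is fixed or lost.

    Returns (fixed, generations), where fixed is True if the mutant
    allele reached a count of 2N copies.
    """
    two_n = 2 * N
    count = 1          # a new mutation starts as a single copy
    generations = 0
    while 0 < count < two_n:
        p = count / two_n
        # Genetic drift: resample 2N allele copies from frequency p.
        count = sum(1 for _ in range(two_n) if rng.random() < p)
        generations += 1
    return count == two_n, generations

def fixation_fraction(N, replicates, seed=1):
    """Fraction of replicate new mutations that eventually fix."""
    rng = random.Random(seed)
    fixed = sum(simulate_new_mutation(N, rng)[0] for _ in range(replicates))
    return fixed / replicates

frac = fixation_fraction(N=10, replicates=2000)
print(f"simulated fixation fraction: {frac:.3f}, expected about {1 / 20:.3f}")
```

With N = 10 the simulated fraction should be near 1/(2N) = 0.05, with most replicates ending in rapid loss.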

DIVERGENCE

The neutral theory also predicts the rate of divergence between sequences. Genetic divergence occurs by substitutions that accumulate in two DNA sequences over time.

As substitutions accumulate, the two sequences diverge from the ancestral sequence as well as from each other. In this example, the two sequences are eventually divergent at five of 16 nucleotide sites due to substitutions.

Substitution

The complete replacement of one allele previously most frequent in the population with another allele that originally arose by mutation.

The neutral theory predicts the rate at which allelic substitutions occur and thereby the rate at which divergence occurs. Predicting the substitution rate for neutral alleles requires knowing the probability that an allele becomes fixed in a population and the number of new mutations that occur each generation.

The initial frequency of a new mutation is $1/(2N)$.

Under genetic drift, the chance of fixation of any neutral allele is simply its initial frequency.

The chance that an allele copy mutates is μ, so the expected number of new mutations in a population each generation is 2Nμ.

The rate at which alleles that originally entered the population as mutations go to fixation per generation is

$k = 2N\mu \times \frac{1}{2N}$

Notice that this equation simplifies to k = μ.

Since the rate of neutral substitution is μ, the expected time between neutral substitutions is 1/μ generations.

Using a clock that chimes on the hour as an example, the rate of chiming is 24 per day (or 24/day), so the expected time between chimes is 1/24 of a day, or one hour.
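The cancellation of population size in the neutral substitution rate can be checked numerically. The sketch below multiplies the expected number of new mutations per generation (2Nμ) by the neutral fixation probability (1/(2N)). The function name and the values of N and μ are illustrative assumptions.

```python
def neutral_substitution_rate(N, mu):
    """Substitutions per generation for neutral alleles."""
    new_mutations = 2 * N * mu   # expected new mutations each generation
    p_fix = 1 / (2 * N)          # neutral fixation chance = initial frequency
    return new_mutations * p_fix

mu = 1e-8
k_small = neutral_substitution_rate(N=100, mu=mu)
k_large = neutral_substitution_rate(N=1_000_000, mu=mu)

print(k_small, k_large)   # both approximately equal to mu
print(1 / k_small)        # expected generations between substitutions
```

A large population produces more mutations, but each has a proportionally smaller chance of fixing, so the rate depends only on μ.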

NEARLY NEUTRAL THEORY

The nearly neutral theory considers the fate of new mutations if some portion of new mutations are acted on by natural selection of different strengths. The nearly neutral theory recognizes three categories of new mutations:

Neutral mutations, mutations acted on strongly by either positive or negative natural selection, and mutations acted on weakly by natural selection relative to the strength of genetic drift. This last category contains mutations that are nearly neutral since neither natural selection nor genetic drift will determine their fate exclusively.

For a new mutation in a finite population that experiences natural selection, the forces of directional selection and genetic drift oppose each other.

Genetic drift causes heterozygosity to decrease at a rate of 1/(2Ne) per generation.

The selection coefficient (s) on a genotype describes the “push” on alleles toward fixation or loss due to natural selection.

For a new advantageous mutation, the chance of fixation is approximately 2s.

Setting these forces equal to each other, 1/(2Ne) = 2s, gives 4Nes = 1 as the condition where the processes of genetic drift and natural selection are equal. When 4Nes is much greater than one natural selection is the stronger process, whereas when 4Nes is much less than one genetic drift is the stronger process.

Using more sophisticated mathematical techniques, the probability of fixation for a new mutation in a finite population is

$P_{fix} = \frac{1 - e^{-4N_e s p}}{1 - e^{-4N_e s}}$

where p = 1/(2N) is the initial frequency of the new mutation.

Under the nearly neutral theory the probability of fixation depends on the balance between natural selection and genetic drift, expressed in the product of the effective population size and the selection coefficient (Nes).
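The dependence of fixation probability on Nes can be illustrated numerically. The sketch below assumes the standard diffusion approximation (1 - exp(-4 Ne s p)) / (1 - exp(-4 Ne s)) with initial frequency p = 1/(2N). The function name, the default N = Ne, and the example parameter values are assumptions for illustration.

```python
import math

def fixation_probability(s, Ne, N=None):
    """Diffusion approximation for the fixation chance of a new mutation.

    s  : selection coefficient (0 means neutral)
    Ne : effective population size
    N  : census size; assumed equal to Ne when not given
    """
    if N is None:
        N = Ne
    p0 = 1 / (2 * N)              # initial frequency of a new mutation
    if s == 0:
        return p0                 # neutral limit of the formula
    # expm1(x) = exp(x) - 1 keeps precision when 4*Ne*s is tiny.
    return math.expm1(-4 * Ne * s * p0) / math.expm1(-4 * Ne * s)

neutral = fixation_probability(0, Ne=1000)
nearly_neutral = fixation_probability(1e-6, Ne=1000)   # 4*Ne*s = 0.004
strong = fixation_probability(0.01, Ne=10000)          # 4*Ne*s = 400

print(neutral, nearly_neutral, strong)
```

When 4Nes is far below one the result stays close to the neutral value 1/(2N), and when 4Nes is far above one it approaches 2s.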

Measures of divergence and polymorphism

•Measuring divergence of DNA sequences.

•Nucleotide substitution models correct divergence estimates for saturation.

•DNA polymorphism measured by number of segregating sites and nucleotide diversity.


The smallest possible unit of the genome is a homologous nucleotide site, or single base-pair position in the exact same genome location, that could be compared among individuals.

Genetic variation at such nucleotide sites is characterized by the existence of DNA sequences that have different nucleotides and is called nucleotide polymorphism.

p Distance: The number of nucleotide sites that differ between two DNA sequences divided by the total number of nucleotide sites, a shorthand for proportion-distance. Sometimes symbolized as d for distance.
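The p distance is simple to compute from two aligned sequences. In the minimal sketch below, the function name and the two example sequences are hypothetical, and the sequences are assumed to be already aligned with equal length and no gaps.

```python
def p_distance(seq1, seq2):
    """Proportion of aligned nucleotide sites that differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)

# Hypothetical aligned sequences differing at 1 site in 10.
d = p_distance("ACGTACGTAC", "ACGTACGTAT")
print(d)
```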

SATURATION

Saturation is the phenomenon where DNA sequence divergence appears to slow and eventually reaches a plateau even as time since divergence continues to increase. Saturation in nucleotide changes over time is caused by substitution occurring multiple times at the same nucleotide site, a phenomenon called multiple hit substitution.

DNA SEQUENCE DIVERGENCE AND SATURATION

There are a wide variety of methods to correct the perceived divergence between two DNA sequences to obtain a better estimate of the true divergence after accounting for multiple hits. These correction methods are called nucleotide substitution models and use parameters for DNA base frequencies and substitution rates to obtain a modified estimate of the divergence between two DNA sequences. The simplest of these is the Jukes and Cantor (1969) nucleotide-substitution model, named for its authors.

The three types of event that a single nucleotide site may experience over two generations.

The probability of a substitution to one particular other nucleotide is customarily represented by α.

Since there are three other nucleotides, the probability of any substitution is 3α.

The probability that a nucleotide stays the same (e.g. G) one generation later is

$P_G(1) = 1 - 3\alpha$

The probability of no substitutions over two generations is

$(1 - 3\alpha)^2$

The probability that a nucleotide site retains its original base pair under the Jukes–Cantor model of nucleotide substitution.

The probability of a multiple hit nucleotide substitution which restores the initial nucleotide is

$\alpha\,[1 - P_G(1)]$

Probability that a nucleotide site has the same bp after two generations:

$P_G(2) = (1 - 3\alpha)\,P_G(1) + \alpha\,[1 - P_G(1)]$

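The change in the probability that a given nucleotide is found at a site over one generation is then

$\Delta P_G(t) = P_G(t+1) - P_G(t) = -3\alpha P_G(t) + \alpha\,[1 - P_G(t)]$

which then simplifies to

$\Delta P_G(t) = -4\alpha P_G(t) + \alpha$

If we consider the rate of change at any time t,

$\frac{dP_G(t)}{dt} = -4\alpha P_G(t) + \alpha$

which has the solution

$P_G(t) = \frac{1}{4} + \left(P_G(0) - \frac{1}{4}\right)e^{-4\alpha t}$

As t becomes large, the term $e^{-4\alpha t}$ approaches zero so that PG(t) approaches ¼.

If the nucleotide at a site is initially a G, then PG(0) = 1 and the probability the site remains a G over time is

$P_G(t) = \frac{1}{4} + \frac{3}{4}e^{-4\alpha t}$

If the nucleotide at a site is not initially a G, then PG(0) = 0 and the probability the site is a G over time is

$P_G(t) = \frac{1}{4} - \frac{1}{4}e^{-4\alpha t}$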
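The approach of PG(t) to ¼ can be checked by iterating the one-generation Jukes–Cantor recursion, P(t+1) = (1 - 3α)P(t) + α(1 - P(t)), and comparing it with the continuous-time solution ¼ + (P(0) - ¼)e^(-4αt). This is a sketch only: the function names and the value of α are hypothetical, and α is set unrealistically high so that the change is visible in few generations.

```python
import math

def iterate_pg(p0, alpha, generations):
    """Apply the one-generation Jukes-Cantor recursion repeatedly."""
    p = p0
    for _ in range(generations):
        # Stay G with no substitution, or return to G from another base.
        p = (1 - 3 * alpha) * p + alpha * (1 - p)
    return p

def closed_form_pg(p0, alpha, t):
    """Continuous-time solution of dP/dt = -4*alpha*P + alpha."""
    return 0.25 + (p0 - 0.25) * math.exp(-4 * alpha * t)

alpha = 1e-4  # hypothetical substitution probability per generation
for t in (0, 1000, 10000, 100000):
    print(t, iterate_pg(1.0, alpha, t), closed_form_pg(1.0, alpha, t))
```

A site that starts as G and a site that starts as another base both converge on ¼, which is the saturation plateau.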

For two DNA sequences originally identical by descent at every nucleotide site at time 0, at some later time t the probability that any site will possess the same nucleotide is

$P_I(t) = \frac{1}{4} + \frac{3}{4}e^{-8\alpha t}$

The exponential term is now $-8\alpha t$ rather than $-4\alpha t$ because there are two DNA sequences.

The probability that two sites are different or divergent, call it d, over time is one minus the probability that sites are identical:

$d = 1 - P_I(t) = \frac{3}{4}\left(1 - e^{-8\alpha t}\right)$

Solving for the exponential term gives $e^{-8\alpha t} = 1 - \frac{4}{3}d$, and taking the natural logarithm of both sides gives

$8\alpha t = -\ln\left(1 - \frac{4}{3}d\right)$

For two DNA sequences that were originally identical by descent, we expect that each site has a 3αt chance of substitution. Since there are two sequences, there is a 6αt chance of a site being divergent between the two sequences.

If we set expected divergence K = 6αt, then we notice K is close to the 8αt above. In fact, K is 3/4 of the expression for 8αt:

$K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}d\right)$

Imagine two DNA sequences that differ at 1 site in 10 so the p distance is 10% or d = 0.10. This level of observed divergence is an underestimate because it does not account for multiple hits. To adjust for multiple hits we compute corrected divergence as

$K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}(0.10)\right) = 0.1073$

which shows that at the low apparent divergence of 10% there are expected to be 0.7% of sites that had experienced multiple hits.
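The Jukes–Cantor correction, K = -(3/4) ln(1 - (4/3)d), is easy to apply to any observed p distance. The function name below is an assumption, and the extra input values are illustrative; they show how the correction grows as the observed distance approaches the saturation limit of 0.75.

```python
import math

def jukes_cantor_distance(d):
    """Corrected divergence K from an observed p distance d."""
    if not 0 <= d < 0.75:
        raise ValueError("d must be in [0, 0.75) for the correction to exist")
    return -0.75 * math.log(1 - 4 * d / 3)

K = jukes_cantor_distance(0.10)
print(f"K = {K:.4f}, multiple-hit excess = {K - 0.10:.4f}")

for d in (0.01, 0.10, 0.30, 0.60):
    print(d, round(jukes_cantor_distance(d), 4))
```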

DNA polymorphism

Variable DNA sequences at one locus within a species represent different alleles that are present in the population. To measure DNA polymorphism, first construct a multiple sequence alignment so that the homologous nucleotide sites for each sequence are all lined up in the same columns.

One measure of DNA polymorphism is the number of segregating sites, S. A segregating site is any of the L nucleotide sites that maintains two or more nucleotides within the population.

The number of segregating sites can be expressed per site by dividing the number of segregating sites by the total number of sites:

$p_S = \frac{S}{L}$
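Counting segregating sites amounts to scanning the columns of the alignment. In the sketch below, the function name and the four-sequence alignment are hypothetical, and the alignment is assumed to be gap-free with all sequences the same length.

```python
def segregating_sites(alignment):
    """Return (S, S / L) for a list of aligned, equal-length sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    # A column is segregating if it holds two or more distinct nucleotides.
    s = sum(1 for column in zip(*alignment) if len(set(column)) > 1)
    return s, s / length

# Hypothetical alignment of n = 4 sequences with L = 10 sites.
example_alignment = [
    "ACGTACGTAC",
    "ACGTACGTAT",
    "ACGAACGTAC",
    "ACGTACGTAC",
]
S, per_site = segregating_sites(example_alignment)
print(S, per_site)
```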

The number of segregating sites (S) under neutrality is a function of the scaled mutation rate θ = 4Neμ. Watterson (1975) first developed a way to estimate θ from the number of segregating sites observed in a sample of DNA sequences. The expected number of segregating sites at drift–mutation equilibrium can more easily be determined using the logic of the coalescent model.

Under the infinite sites model of mutation, each mutation that occurs increases the number of segregating sites by one. The expected number of segregating sites is therefore just the expected number of mutations for a given genealogy

For k lineages, the expected number of mutations in one generation is kμ.

If the expected time to coalescence for k lineages is Tk, then kμTk mutations are expected for each value of k.

The expected number of mutations is obtained by summing over all k between the present and the most recent common ancestor (MRCA):

$E[S] = \sum_{k=2}^{n} k\mu T_k$

The probability per generation of k lineages coalescing is

$\frac{k(k-1)}{4N_e}$

and the expected time to coalescence is the inverse:

$T_k = \frac{4N_e}{k(k-1)}$

The expected number of segregating sites in a sample of n DNA sequences is

$E[S] = \sum_{k=2}^{n} k\mu \frac{4N_e}{k(k-1)} = 4N_e\mu \sum_{k=2}^{n} \frac{1}{k-1}$

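Notice that θ = 4Neμ can be substituted into the expression for the expected number of segregating sites to give

$E[S] = \theta \sum_{k=2}^{n} \frac{1}{k-1}$

and then rearranging

$\theta = \frac{S}{\sum_{k=2}^{n} \frac{1}{k-1}}$

An estimate of the scaled mutation rate determined from the number of segregating sites in a sample of DNA sequences is symbolized as $\hat{\theta}_W$ (W for Watterson) or $\hat{\theta}_S$ (S for segregating sites). If we define a new variable,

$a_1 = \sum_{k=1}^{n-1} \frac{1}{k}$

which is the same sum re-indexed, then

$\hat{\theta}_W = \frac{p_S}{a_1}$

per site, or

$\hat{\theta}_W = \frac{S}{a_1}$

using the absolute number of segregating sites.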
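Watterson's estimator divides the number of segregating sites by the harmonic-type sum a1 = 1 + 1/2 + ... + 1/(n-1). The sketch below applies it to illustrative values (S = 2 segregating sites in n = 4 sequences of L = 10 sites); the function name and these numbers are assumptions, not data from the slides.

```python
def watterson_theta(num_segregating, n):
    """Watterson's estimate of theta = 4*Ne*mu from S and sample size n."""
    if n < 2:
        raise ValueError("need at least two sequences")
    a1 = sum(1 / k for k in range(1, n))   # 1 + 1/2 + ... + 1/(n-1)
    return num_segregating / a1

theta_locus = watterson_theta(num_segregating=2, n=4)   # per locus, using S
theta_site = watterson_theta(num_segregating=2 / 10, n=4)  # per site, using p_S
print(theta_locus, theta_site)
```

Because a1 grows only slowly with n, adding more sequences to a sample increases the expected number of segregating sites at a diminishing rate.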

A second measure of DNA polymorphism is the nucleotide diversity in a sample of DNA sequences, symbolized by π (pronounced “pie”), and also known as the average pairwise differences in a sample of DNA sequences.

The nucleotide diversity is the sum of the number of nucleotide differences seen for each pair of DNA sequences, divided by the number of pairs:

$\hat{\pi} = \frac{1}{n(n-1)/2} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} d_{ij}$

where i and j are indices that refer to individual DNA sequences, dij is the number of nucleotide sites that differ between sequences i and j, and n is the total number of DNA sequences in the sample.

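In larger samples that may include multiple identical DNA sequences, the nucleotide diversity can be estimated by

$\hat{\pi} = \frac{n}{n-1} \sum_{i=1}^{k} \sum_{j=1}^{k} p_i p_j d_{ij}$

where pi and pj are the frequencies of alleles i and j, respectively, in a sample of k different sequences that each represent one allele, dij is the number of nucleotide sites that differ between alleles i and j, and n is the total number of DNA sequences in the sample.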

Estimates of nucleotide diversity are useful because π is a measure of heterozygosity for DNA sequences
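Nucleotide diversity can be computed directly as the average number of differences over all pairs of sequences. In the sketch below, the function name and the four-sequence alignment are hypothetical, the sequences are assumed to be aligned and gap-free, and the result is per locus (divide by the number of sites L for a per-site value).

```python
from itertools import combinations

def nucleotide_diversity(alignment):
    """Average pairwise nucleotide differences among aligned sequences."""
    if len(alignment) < 2:
        raise ValueError("need at least two sequences")
    pairs = list(combinations(alignment, 2))   # n(n-1)/2 pairs with i < j
    total_differences = sum(
        sum(a != b for a, b in zip(seq_i, seq_j)) for seq_i, seq_j in pairs
    )
    return total_differences / len(pairs)

# Hypothetical alignment of n = 4 sequences with L = 10 sites.
sample = [
    "ACGTACGTAC",
    "ACGTACGTAT",
    "ACGAACGTAC",
    "ACGTACGTAC",
]
pi_locus = nucleotide_diversity(sample)
pi_site = pi_locus / len(sample[0])
print(pi_locus, pi_site)
```

The six pairwise distances here are 1, 1, 0, 2, 1 and 1, giving an average of one difference per pair.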
