Evolutionary Genetics: Part 7
Recombination – Linkage Disequilibrium
S. peruvianum
S. chilense
Winter Semester 2012-2013
Prof Aurélien Tellier, FG Populationsgenetik
Color code:
Red = Important result or definition
Purple: exercise to do
Green: some bits of maths
Population genetics: 4 evolutionary forces
• random genomic processes (mutation, duplication, recombination, gene conversion)
• natural selection
• random demographic process (drift)
• random spatial process (migration)
molecular diversity
Recombination
Recombination and crossing over
Physical map
Genetic map
Independent segregation (Mendel’s law)
Non-independent segregation
This is genetic linkage
Non-independent segregation
• Recombination rate:
  $\rho = \dfrac{\text{number of recombinant gametes}}{\text{total number of gametes}}$
• In general:
  • The recombination rate of two loci on different chromosomes is 0.5
  • The recombination rate between loci on the same chromosome satisfies 0 < ρ < 0.5
  • The recombination rate of two loci on the same chromosome increases monotonically with distance
  • BUT there are recombination hotspots (or cold spots) in the genome
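The definition of ρ as a fraction of gametes can be sketched in a few lines. This is only an illustration: the function name `recombination_fraction` and the gamete counts are hypothetical and do not come from the lecture data.

```python
def recombination_fraction(recombinant: int, total: int) -> float:
    """Return rho = (number of recombinant gametes) / (total number of gametes)."""
    if total <= 0:
        raise ValueError("total number of gametes must be positive")
    if not 0 <= recombinant <= total:
        raise ValueError("recombinant count must lie between 0 and total")
    return recombinant / total


# Hypothetical test cross: 120 recombinant gametes among 1000 scored gametes.
rho = recombination_fraction(120, 1000)
print(f"rho = {rho:.3f}")
```

A value close to 0.5 would be indistinguishable from independent segregation, while a value well below 0.5 indicates linkage.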
Non-independent segregation
Recombination and crossing-over
Genetic map length - Morgan
Model without recombination
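[Figure: your two chromosomes, one inherited from your mother and one inherited from your father, each carrying loci A and B. Without recombination, both loci on a chromosome come from the same grandparent (your grandfather or your grandmother).]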
Model with recombination
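[Figure: your two chromosomes, one inherited from your mother and one inherited from your father, with loci A and B. With recombination, segments of a chromosome can be of different grandparental origin ("From your grandfather", "From your grandmother"); haplotypes shown: A B, A b, a B.]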
Model with recombination
• So two loci on the same chromosome can come
  • from a single parent if there is no recombination
  • from two parents if there is recombination
• With recombination, the chromosomes of your parents are mosaics of pieces of chromosomes from their parents
• We define ρ as the probability that a recombination event happens between the two loci in one generation:
  P[two loci have the same parent] = 1 - ρ
Coalescence with recombination
• Take one lineage
• Tracing it back in time, recombination events can happen
• Recombination happens with probability ρ in every generation:
  $P[\text{first recombination event } t \text{ generations ago}] = \rho (1-\rho)^{t-1}$
• This is again a geometric distribution (approximately exponential when ρ is small)
• Backward in time, there can be
  • coalescence of two lineages
  • or a recombination event
• Recombination creates two lineages backward in time: one with locus A and the other with locus B
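The geometric waiting time for a recombination event can be checked numerically. The sketch below assumes a hypothetical value ρ = 0.01 and truncates the sum at 5000 generations; it illustrates that the probabilities sum to 1 and that the mean waiting time is 1/ρ generations.

```python
def prob_first_recombination(rho: float, t: int) -> float:
    """P[first recombination event t generations ago] = rho * (1 - rho)**(t - 1)."""
    if t < 1:
        raise ValueError("t counts generations and must be >= 1")
    return rho * (1.0 - rho) ** (t - 1)


rho = 0.01                      # hypothetical per-generation recombination probability
generations = range(1, 5001)    # truncation point; the remaining tail is negligible here
probs = [prob_first_recombination(rho, t) for t in generations]

total_probability = sum(probs)                                   # close to 1
mean_waiting_time = sum(t * p for t, p in zip(generations, probs))  # close to 1 / rho

print(f"sum of probabilities = {total_probability:.6f}")
print(f"mean waiting time    = {mean_waiting_time:.2f} generations")
```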
Coalescence with recombination
• Recombination increases the number of lineages, so it can take a while to find the MRCA
• However, as the number of lineages k increases, the rate of coalescence also increases, so an MRCA will be found
Coalescence with recombination
• Along the genome, a series of adjacent sites share one coalescent tree
• Recombination gradually breaks the link between sites
• The higher the recombination rate, the more independent the loci are
• Effectively, every locus has its own MRCA
• If recombination rates vary along the genome, the trees of different loci contain different numbers of recombination events
Coalescence without recombination
• Along the genome, there is ONLY ONE tree for all loci
• The higher the recombination rate, the more independent the loci are
• Recombination is important: otherwise, each chromosome would be only one data point (= one tree)
• This is the case for the Y chromosome in humans, mitochondrial DNA and chloroplast DNA, where there is no recombination (= one tree for all loci)
• Why is the absence of recombination a problem?
• Understanding evolution in the genome requires independent pieces of information about ONE evolutionary process (= different trees that come from the same evolutionary scenario)
• Information comes from the variance between loci
• If all loci are linked, how can we tell what is neutral evolution and whether some genes are under selection?
Coalescence with recombination
• How far along the genome do you have to go to find a recombination event?
• Define r as the per-site (bp) recombination rate
• If two sites are a distance d apart, the recombination rate is ρ = rd
• For two lineages, the total recombination rate is 2rd and the coalescence rate is 1/(2N); we want at least a 50% chance that a recombination event happens first:
  $$P[\text{recombination before coalescence}] = \frac{2rd}{2rd + 1/(2N)} = 1 - \frac{1}{4Nrd + 1} \ge 0.5$$
• This simplifies to 4Nrd ≥ 1, i.e. d ≥ 1/(4Nr)
• For humans, with $N_e = 10^4$ and $r = 10^{-8}$, we get d ≥ 2500 bp
• In Drosophila, where $N_e = 10^6$, the distance is 100 times shorter
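The threshold d > 1/(4Nr) can be reproduced with a short calculation. This sketch assumes two lineages that each recombine at rate rd and coalesce at rate 1/(2N); the function names are hypothetical, and the Drosophila value assumes the same r = 10^-8 as for humans, which is what "100 times shorter" implies.

```python
def prob_recombination_before_coalescence(N: float, r: float, d: float) -> float:
    """Competing rates: recombination at 2*r*d (two lineages) vs. coalescence at 1/(2N)."""
    recombination_rate = 2.0 * r * d
    coalescence_rate = 1.0 / (2.0 * N)
    return recombination_rate / (recombination_rate + coalescence_rate)


def min_distance_bp(N: float, r: float) -> float:
    """Distance d at which the probability above reaches 0.5: d = 1 / (4 N r)."""
    return 1.0 / (4.0 * N * r)


d_human = min_distance_bp(N=1e4, r=1e-8)   # values quoted in the lecture
d_droso = min_distance_bp(N=1e6, r=1e-8)   # assumption: same r as for humans

print(f"humans:     d = {d_human:.0f} bp")
print(f"Drosophila: d = {d_droso:.0f} bp")
print(f"P at d_human = {prob_recombination_before_coalescence(1e4, 1e-8, d_human):.2f}")
```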
Recombination and data
Linkage disequilibrium
Recombination in data: 4 gamete rule
• There is one rule to recognize whether recombination has happened:
  • the four-gamete rule
• Did recombination happen to the right or to the left of the 2nd site?
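A minimal sketch of the four-gamete rule follows, under two assumptions that are not stated on the slide: every site is biallelic (coded 0/1) and each site mutated only once (infinite-sites model). Under these assumptions, observing all four gametes (00, 01, 10, 11) at a pair of sites implies at least one recombination event between them. The haplotypes and the function name are hypothetical.

```python
from itertools import combinations


def four_gamete_violations(haplotypes: list[str]) -> list[tuple[int, int]]:
    """Return the pairs of site indices (i, j) at which all four gametes are observed."""
    n_sites = len(haplotypes[0])
    if any(len(h) != n_sites for h in haplotypes):
        raise ValueError("all haplotypes must have the same number of sites")
    violations = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}  # distinct two-site gametes
        if len(gametes) == 4:                          # assumes biallelic sites
            violations.append((i, j))
    return violations


# Hypothetical sample of five haplotypes at three biallelic sites.
sample = ["000", "010", "011", "101", "111"]
pairs = four_gamete_violations(sample)
print(pairs)
```

In this toy sample, the site pairs (0, 1) and (1, 2) show all four gametes, while the pair (0, 2) shows only three.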
Recombination in data: LD
• Linkage Disequilibrium (LD) is measured as D
• Two loci A and B with alleles A1 and A2, B1 and B2
• Gamete frequencies are: A1B1 = p11 ; A1B2 = p12 ; A2B1 = p21 ; A2B2 = p22
Recombination in data: LD
• The A1B1 and A2B2 gametes are called the coupling gametes
• The A1B2 and A2B1 gametes are called the repulsion gametes
• LD is a measure of the excess of coupling over repulsion gametes
• If D > 0, there are more coupling gametes than expected at equilibrium
• If D < 0, there are more repulsion gametes than expected
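The formula for D is not preserved in these slides, so the sketch below assumes the standard textbook definition D = p11·p22 − p12·p21, which matches the description of D as an excess of coupling over repulsion gametes. It also computes the commonly used standardized measure r² = D² / (pA1·pA2·pB1·pB2). The function name and the example frequencies are hypothetical.

```python
def linkage_disequilibrium(p11: float, p12: float, p21: float, p22: float) -> tuple[float, float]:
    """Return (D, r2) from the four gamete frequencies.

    Assumed standard definitions (not taken from the slides):
      D  = p11 * p22 - p12 * p21
      r2 = D**2 / (pA1 * pA2 * pB1 * pB2)
    """
    if abs(p11 + p12 + p21 + p22 - 1.0) > 1e-9:
        raise ValueError("gamete frequencies must sum to 1")
    D = p11 * p22 - p12 * p21
    pA1 = p11 + p12          # frequency of allele A1
    pB1 = p11 + p21          # frequency of allele B1
    denominator = pA1 * (1.0 - pA1) * pB1 * (1.0 - pB1)
    r2 = D * D / denominator if denominator > 0 else float("nan")
    return D, r2


# Hypothetical population with an excess of coupling gametes (A1B1 and A2B2).
D, r2 = linkage_disequilibrium(0.4, 0.1, 0.1, 0.4)
print(f"D = {D:.3f}, r2 = {r2:.3f}")

# Linkage equilibrium: every gamete frequency equals the product of allele frequencies.
D_eq, r2_eq = linkage_disequilibrium(0.25, 0.25, 0.25, 0.25)
print(f"D = {D_eq:.3f}, r2 = {r2_eq:.3f}")
```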
Recombination in data: LD
• Linkage Disequilibrium (LD) is measured as D and r²
• The change in D in a single generation is: ∆D = –ρD
• After t generations:
  $D_t = (1-\rho)^t D_0$
• This is, once again, a geometric function of time
• This means that the ultimate state of the population is D = 0
• BUT there is memory of LD over time
• LD also decreases with distance away from a given site in the genome, following a geometric function
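The geometric decay of D can be illustrated numerically. The starting value D0 and the recombination rates below are hypothetical; the half-life function simply solves (1 − ρ)^t = 1/2 for t.

```python
import math


def ld_after(D0: float, rho: float, t: int) -> float:
    """D_t = (1 - rho)**t * D0, obtained by applying Delta D = -rho * D for t generations."""
    return D0 * (1.0 - rho) ** t


def ld_half_life(rho: float) -> float:
    """Number of generations needed for D to halve: t = ln(1/2) / ln(1 - rho)."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    return math.log(0.5) / math.log(1.0 - rho)


D0 = 0.25  # hypothetical initial disequilibrium
for rho in (0.5, 0.1, 0.01):
    decay = [round(ld_after(D0, rho, t), 4) for t in (0, 1, 2, 10)]
    print(f"rho = {rho}: D at t = 0, 1, 2, 10 -> {decay}; half-life = {ld_half_life(rho):.1f} generations")
```

Unlinked loci (ρ = 0.5) lose half of their LD every generation, whereas tightly linked loci keep a "memory" of LD for many generations.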
Recombination in data: haplotypes
• Linkage Disequilibrium (LD) can be seen in the presence of haplotypes
• Example: (PLoS Genetics 2006)
• Do you expect long or short haplotypes under recombination?
• If genes can show different recombination rates, what does this mean for haplotypes?
• The length and frequency of haplotypes are important signatures for detecting deviations from neutral evolution!
Recombination in data
• Using DnaSP
• Using the TNFSF5 and the droso files
• Look at the haplotypes (Generate => Haplotype Data File)
• Why are haplotypes important to study recombination? What about the information on the distance between sites?
• Can you look at recombination? Measure LD, r² and also the number of pairs of sites showing all four gametes (four-gamete rule)
• Use Analysis => Recombination
• How does LD decay with distance from a site?