Talk given at Popgroup 46, Glasgow, December 2012
Population genetics of gene function
Ignacio Gallo
Glasgow,
December 2012
Motivation
“Molecular signatures of natural selection”, Nielsen 2005:
“inferences regarding the patterns and distribution of selection in genes and genomes may provide important functional information”
Wikipedia entry on “statistical thermodynamics”:
“The goal of statistical thermodynamics is to understand and to interpret the measurable macroscopic properties of materials in terms of the properties of their constituent particles and the interactions between them”
Motivation
The functional importance of a genetic sequence can be inferred from its population distribution (for example from its degree of conservation).
Can anything more be said about the gene’s specific function (survival, reproduction, etc)?
distribution of genes ↔ gene function
macroscopic properties of materials ↔ properties of their constituent particles (size, speed, etc)
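Moran model / Model variation
[Figure: transition diagrams for the Moran model and for the model variation, showing birth and death events and the change in frequency for a given phenotype. Transitions are labelled $p^-$, $q^-$, $1 - p^- - q^-$, $p^+$, $q^+$.]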
We have two phenotypes, P1 and P2.
$p^-$, $p^+$: death and birth probabilities for P1
$q^-$, $q^+$: death and birth probabilities for P2
If $p^- + q^- < 1$, in some intervals nothing happens.
If a death happens, a birth happens instantaneously (“musical chairs” process).
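The rules above can be sketched as a single update step. This is only an illustration: the function name `moran_step` and its arguments are hypothetical, and it assumes the replacement is P1 with probability $p^+$ and P2 otherwise (that is, $q^+ = 1 - p^+$).

```python
import random


def moran_step(n1, p_minus, q_minus, p_plus, rng=random):
    """One step of the death-then-birth ("musical chairs") process.

    n1      : current number of P1 individuals
    p_minus : probability that a P1 individual dies in this step
    q_minus : probability that a P2 individual dies in this step
    p_plus  : probability that the replacement is P1
              (assumed: the replacement is P2 otherwise)
    """
    r = rng.random()
    if r < p_minus:
        dead_is_p1 = True
    elif r < p_minus + q_minus:
        dead_is_p1 = False
    else:
        # With probability 1 - p_minus - q_minus nothing happens.
        return n1
    # A death is immediately followed by a birth.
    newborn_is_p1 = rng.random() < p_plus
    return n1 + int(newborn_is_p1) - int(dead_is_p1)
```

The population size stays constant because every death is paired with exactly one birth; only the count of P1 changes, by at most one per step.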
[Figure: the transition diagram is shown again over several builds of the slide, with the annotations “loop” and “no loop”.]
We differentiate the phenotypes’ reproductive fitness and lifetimes independently, and consider reproduction and survival as two different functions.
Reproduction: $W_1$: offspring for type 1, $W_2$: offspring for type 2.
Survival: $T_1$: average lifespan for type 1, $T_2$: average lifespan for type 2.
We are interested in the equilibrium distribution of a process with symmetric reversible mutation for haploid individuals, where $u$ is the mutation probability.
Variable $x$ is the frequency of phenotype P1 (so the frequency of P2 is $1 - x$).
Death probabilities:
$$p^- = \frac{x}{T_1}, \qquad q^- = \frac{1-x}{T_2}, \qquad \Rightarrow \quad p^- + q^- < 1.$$
Birth probabilities:
$$p^+ = \frac{\frac{W_1}{T_1}(1-u)\,x + \frac{W_2}{T_2}\,u\,(1-x)}{\frac{W_1}{T_1}\,x + \frac{W_2}{T_2}\,(1-x)}, \qquad q^+ = \frac{\frac{W_1}{T_1}\,u\,x + \frac{W_2}{T_2}(1-u)(1-x)}{\frac{W_1}{T_1}\,x + \frac{W_2}{T_2}\,(1-x)}.$$
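A minimal sketch of these death and birth probabilities, assuming the birth weights are the ratios $W_1/T_1$ and $W_2/T_2$. The function name `transition_probabilities` is hypothetical.

```python
def transition_probabilities(x, W1, W2, T1, T2, u):
    """Return (p_minus, q_minus, p_plus, q_plus) at P1 frequency x.

    Assumes birth weights W1/T1 and W2/T2 and symmetric mutation
    with probability u.
    """
    a = W1 / T1                      # birth weight of P1
    b = W2 / T2                      # birth weight of P2
    total = a * x + b * (1.0 - x)    # normalising denominator

    p_minus = x / T1                 # a P1 individual dies
    q_minus = (1.0 - x) / T2         # a P2 individual dies
    p_plus = (a * (1.0 - u) * x + b * u * (1.0 - x)) / total
    q_plus = (a * u * x + b * (1.0 - u) * (1.0 - x)) / total
    return p_minus, q_minus, p_plus, q_plus
```

With this reading the two birth probabilities share a denominator and sum to one, so exactly one newborn replaces each death, while the death probabilities sum to less than one whenever both lifespans exceed one time step.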
Asymptotic parameters
To get a non-trivial distribution the following asymptotic constraints are imposed on the parameters:
$$N u \xrightarrow{\,N \to \infty\,} \theta, \qquad N\left(\frac{W_1}{W_2} - 1\right) \xrightarrow{\,u \to 0\,} s.$$
For notational convenience we also define $\lambda = T_1 / T_2$.
The model therefore depends on: $\theta$ (mutation), $s$ (reproduction), $\lambda$ (survival).
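For a finite population the rescaled parameters can be turned back into per-step quantities by inverting the limits. The sketch below assumes the limits hold as equalities at finite $N$; the helper name and the baseline values `W2=1.0`, `T2=2.0` are hypothetical choices, not taken from the talk.

```python
def finite_population_parameters(theta, s, lam, N, W2=1.0, T2=2.0):
    """Map (theta, s, lambda) to (u, W1, W2, T1, T2) for population size N.

    Assumes N*u = theta, N*(W1/W2 - 1) = s and lambda = T1/T2 hold
    exactly; W2 and T2 are arbitrary baselines.
    """
    u = theta / N
    W1 = W2 * (1.0 + s / N)
    T1 = lam * T2
    return u, W1, W2, T1, T2
```

For example, $\theta = 7$, $s = -5$ and $N = 1000$ correspond to a mutation probability of 0.007 and a fitness ratio $W_1/W_2$ of 0.995.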
$$M = E[X_{t+1} - X_t] = \frac{1}{N T_1} \cdot \frac{\theta \lambda^2 (1-x)^2 + \lambda s\, x (1-x) - \theta x^2}{x + \lambda (1-x)},$$
$$V = E\left[(X_{t+1} - X_t)^2\right] = \frac{2\lambda}{N T_1} \cdot \frac{x(1-x)}{x + \lambda(1-x)}.$$
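As a sanity check, the ratio $2M/V$ from these large-$N$ expressions can be compared with the same ratio computed from the one-step transition probabilities at finite $N$. The sketch assumes birth weights $W_i/T_i$, the baselines $W_2 = 1$ and $T_2 = 2$, and increments of size $1/N$ in the frequency; all function names are hypothetical.

```python
def ratio_asymptotic(x, theta, s, lam):
    """2M/V from the large-N expressions (common prefactors cancel)."""
    num = theta * lam**2 * (1 - x)**2 + lam * s * x * (1 - x) - theta * x**2
    return num / (lam * x * (1 - x))


def ratio_exact(x, theta, s, lam, N, T2=2.0):
    """2M/V from the one-step probabilities at finite N.

    Assumes W2 = 1, W1 = 1 + s/N, u = theta/N, T1 = lam * T2, and that
    each birth/death pair moves the frequency by +1/N, -1/N or 0.
    """
    u = theta / N
    T1 = lam * T2
    a = (1.0 + s / N) / T1           # W1 / T1
    b = 1.0 / T2                     # W2 / T2
    total = a * x + b * (1 - x)
    p_minus, q_minus = x / T1, (1 - x) / T2
    p_plus = (a * (1 - u) * x + b * u * (1 - x)) / total
    q_plus = (a * u * x + b * (1 - u) * (1 - x)) / total
    up = q_minus * p_plus            # P2 dies, P1 is born
    down = p_minus * q_plus          # P1 dies, P2 is born
    mean = (up - down) / N
    var = (up + down) / N**2
    return 2 * mean / var
```

Only the ratio is compared because it does not depend on the choice of time unit, and it is this ratio that enters the equilibrium distribution.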
Equilibrium distribution
The Wright equilibrium distribution for large $N$ is
$$\varphi(x) = C \cdot \frac{1}{V} \cdot \exp\left\{ 2 \int \frac{M}{V}\, dx \right\},$$
which explicitly gives
$$\varphi(x) = C\, e^{\alpha x}\, x^{\lambda\theta - 1}\, (1-x)^{\theta/\lambda - 1}\, \{ x + \lambda (1-x) \},$$
where
$$\alpha = s + \theta \left( \frac{1}{\lambda} - \lambda \right).$$
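The explicit density can be evaluated numerically. The sketch below uses a midpoint grid to approximate the normalising constant $C$; the function names and the grid size are hypothetical choices.

```python
import math


def phi_unnormalised(x, theta, s, lam):
    """Equilibrium density up to the constant C, for 0 < x < 1."""
    alpha = s + theta * (1.0 / lam - lam)
    return (math.exp(alpha * x)
            * x ** (lam * theta - 1.0)
            * (1.0 - x) ** (theta / lam - 1.0)
            * (x + lam * (1.0 - x)))


def normalised_density(theta, s, lam, n_grid=2000):
    """Return grid midpoints and density values that integrate to 1.

    The midpoint rule never evaluates x = 0 or x = 1, where the density
    can diverge for small theta; it is only a rough approximation there.
    """
    h = 1.0 / n_grid
    xs = [(i + 0.5) * h for i in range(n_grid)]
    weights = [phi_unnormalised(x, theta, s, lam) for x in xs]
    Z = sum(weights) * h
    return xs, [w / Z for w in weights]
```

For $\lambda = 1$ the last factor equals one and $\alpha = s$, so the expression reduces to $e^{s x} x^{\theta - 1} (1-x)^{\theta - 1}$.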
Typical shapes for equilibrium distributions
[Figure: three panels of probability density against relative frequency of “blue” phenotypes: low mutation; high mutation; mutation probability close to $1/N$.]
Stationary points
[Figure: stationary points against “u”, the probability of mutation per site, for $\lambda = T_1/T_2 = 1$ (lifetimes are equal for the two phenotypes) and for $\lambda = T_1/T_2 = 3/2$. The $u$ axis is marked at $1/N$, with regions labelled “random drift” and “mutation/selection balance”.]
The model includes one more parameter than the standard setting, so it is desirable to expand the number of independent statistics.
This can be done by considering the amount of synonymous variation included in each of our two phenotypes, and treating it as neutral (as done by Nielsen and colleagues for various types of models).
The amount of synonymous variation can be quantified using the concept of the inbreeding coefficient.
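One way to make this concrete, under an assumption not stated on the slide: take the inbreeding coefficient of a phenotype class to be the probability that two sequences drawn with replacement from that class are identical. The helper name `inbreeding_coefficient` is hypothetical.

```python
from collections import Counter


def inbreeding_coefficient(sequences):
    """Probability that two sequences drawn with replacement are identical.

    `sequences` holds the genotypes (e.g. strings) of all individuals
    sharing one phenotype. This homozygosity-style estimate is an
    assumed reading of "inbreeding coefficient", not the talk's definition.
    """
    n = len(sequences)
    counts = Counter(sequences)
    return sum((c / n) ** 2 for c in counts.values())
```

A class in which every individual carries the same sequence gives a value of 1, and the value falls as synonymous variation accumulates within the class.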
Inferring $\lambda = T_1/T_2$ from population statistics
[Figure: phenotypes P1 and P2, each comprising several genotypes. Distribution of a gene throughout a population (Kreitman 1983).]
Statistical quantities
$x$ = relative frequency of P1
$F_1$ = inbreeding coefficient for P1
$F_2$ = inbreeding coefficient for P2
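A result by Kimura and Crow gives that for only one phenotype
$$F \approx \frac{1}{1 + 2\theta},$$
where $\theta$ is the rescaled mutation rate.
This can be extended to the case of two phenotypes to give two equations:
$$\frac{F_1}{x} + \theta \left[ \lambda \left( \frac{F_1}{x} - F_1 \right) + (L-1)\, F_1 \right] - \frac{1}{x} \approx 0,$$
$$\frac{F_2}{1-x} + \theta \left[ \frac{1}{\lambda} \left( \frac{F_2}{1-x} - F_2 \right) + (L-1)\, F_2 \right] - \frac{1}{1-x} \approx 0.$$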
This (hideous) formula can be used to derive the value of $\lambda$ from combined moments of the quantities $x$, $F_1$, $F_2$ estimated from a set of simulated realizations of the process.
[Figure: estimated against real values of $\lambda$, both axes running from 0.5 ($T_1 = \tfrac{1}{2} T_2$) to 2 ($T_1 = 2\, T_2$).]
Simulation parameters: $s = -5$, $\theta = 7$, $N = 1000$, $L$ (genotype length) $= 40$
Number of realizations per point = 10000
Running time = 5000 “generations”
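following auxiliary quantities
$$R = \frac{\langle F_2 \rangle}{\langle F_1 \rangle} \cdot \frac{\langle 1/x \rangle - \langle F_1/x \rangle}{\langle 1/(1-x) \rangle - \langle F_2/(1-x) \rangle},$$
$$Q_1 = \frac{\langle F_1/x \rangle}{\langle F_1 \rangle} - 1, \qquad Q_2 = \frac{\langle F_2/(1-x) \rangle}{\langle F_2 \rangle} - 1.$$
In terms of these quantities, the equation for $\lambda$ takes the following form
$$R = \lambda\, \frac{\lambda Q_1 + L - 1}{Q_2 + \lambda (L-1)},$$
and this relation leads to a quadratic equation that only admits one non-negative solution:
$$\lambda = \frac{1}{2 Q_1} \left[ (R-1)(L-1) + \sqrt{(R-1)^2 (L-1)^2 + 4 R\, Q_1 Q_2} \right]. \tag{4.5}$$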
Figure 5 shows the result of using formula (4.5) to estimate $\lambda$, for a series of simulations where the real value of $\lambda$ ranges from 0.5 (i.e. $T_1 = \tfrac{1}{2} T_2$) to 2 (i.e. $T_1 = 2\, T_2$): we see that the average values of such estimations are well aligned with the actual values.
The magnitude of the standard deviation for our estimations, on the other hand, is considerable, especially in view of the fact that 10000 realisations of the process were used to estimate each value of $\lambda$: it is clear that a substantial increase of efficiency will be needed to make the theory relevant to actual empirical phenomena.
This practical consideration ought not to be allowed, however, to obfuscate the fact that equation (4.5) provides a direct mathematical relation between combined moments of the population quantities $x$, $F_1$ and $F_2$, and parameter $\lambda = T_1/T_2$, which arguably contains information about the function of a genetic sequence.
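A minimal sketch of how formula (4.5) could be applied to a set of realisations. The function names are hypothetical, the angle brackets are read as plain sample means, and $Q_2$ is taken with $1 - x$ in the denominator (mirroring the second of the two-phenotype equations), which is an assumption.

```python
import math


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def auxiliary_quantities(xs, F1s, F2s):
    """Compute (R, Q1, Q2) from per-realisation values of x, F1, F2."""
    m_F1, m_F2 = mean(F1s), mean(F2s)
    m_F1_x = mean(f / x for f, x in zip(F1s, xs))
    m_F2_y = mean(f / (1 - x) for f, x in zip(F2s, xs))
    m_inv_x = mean(1 / x for x in xs)
    m_inv_y = mean(1 / (1 - x) for x in xs)
    R = (m_F2 / m_F1) * (m_inv_x - m_F1_x) / (m_inv_y - m_F2_y)
    Q1 = m_F1_x / m_F1 - 1
    Q2 = m_F2_y / m_F2 - 1   # assumed form, see the note above
    return R, Q1, Q2


def estimate_lambda(R, Q1, Q2, L):
    """Non-negative root of Q1*lam**2 - (R-1)*(L-1)*lam - R*Q2 = 0."""
    b = (R - 1) * (L - 1)
    return (b + math.sqrt(b * b + 4 * R * Q1 * Q2)) / (2 * Q1)
```

Feeding in an $R$ built from a known $\lambda$ through the relation $R = \lambda(\lambda Q_1 + L - 1)/(Q_2 + \lambda(L-1))$ returns that same $\lambda$, which is a quick way to check the algebra before using simulated data.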
5 Outlook
We have shown that the effect of differentiating the lifetimes of two phenotypes independently from their fertility includes a qualitative change in the equilibrium state of a population: since survival and reproduction are quite distinct macro-functions performed by any living organism, this may contribute to extending the population-genetical characterisation of biological function.
Summary
• Playing with details of process is fun
• In principle the “functional” parameter λ = T1 / T2 can be estimated from the “population observables” x, F1, F2
Thank you!
“Disclaimer”
…there is no such thing as the “function of a gene”, like there is no such thing as the “meaning of a word”.
For example, the word “Gene” can mean:
…but dictionaries exist (and they are obsolete (but they’re a start (or maybe not))).
or