population genetics of gene function (talk)

Population genetics of gene function

Ignacio Gallo

Glasgow,

December 2012

Motivation

“Molecular signatures of natural selection”, Nielsen 2005:

“inferences regarding the patterns and distribution of selection in genes and genomes may provide important functional information”

Wikipedia entry on “statistical thermodynamics”:

“The goal of statistical thermodynamics is to understand and to interpret the measurable macroscopic properties of materials in terms of the properties of their constituent particles and the interactions between them”

Motivation

The functional importance of a genetic sequence can be inferred from its population distribution (for example from its degree of conservation).

Can anything more be said about the gene’s specific function (survival, reproduction, etc.)?

distribution of genes ↔ gene function

macroscopic properties of materials ↔ properties of their constituent particles (size, speed, etc.)

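Moran model:

[Figure: birth and death events; change in frequency for a given phenotype.]

Model variation:

[Figure: transition diagram with death probabilities $p^-$, $q^-$, no-event probability $1 - p^- - q^-$, and birth probabilities $p^+$, $q^+$.]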

We have two phenotypes, P1 and P2

$p^-, p^+$: death and birth probabilities for P1

$q^-, q^+$: death and birth probabilities for P2

If $p^- + q^- < 1$, in some intervals nothing happens.

If a death happens, a birth happens instantaneously (“musical chairs” process).

[Figure: the same transition diagram, built up in steps, with branches labelled “loop” and “no loop”.]

We differentiate the phenotypes’ reproductive fitness and lifetimes independently, and consider reproduction and survival as two different functions.

Reproduction:

$W_1$: offspring for type 1, $W_2$: offspring for type 2.

Survival:

$T_1$: average lifespan for type 1, $T_2$: average lifespan for type 2.

$u$: mutation probability

We are interested in the equilibrium distribution of a process with symmetric reversible mutation for haploid individuals.

Death probabilities:

$$p^- = \frac{x}{T_1}, \qquad q^- = \frac{1-x}{T_2}.$$

Birth probabilities:

$$p^+ = \frac{\frac{W_1}{T_1}(1-u)\,x + \frac{W_2}{T_2}\,u\,(1-x)}{\frac{W_1}{T_1}\,x + \frac{W_2}{T_2}\,(1-x)}, \qquad q^+ = \frac{\frac{W_1}{T_1}\,u\,x + \frac{W_2}{T_2}(1-u)(1-x)}{\frac{W_1}{T_1}\,x + \frac{W_2}{T_2}\,(1-x)}.$$

Variable $x$ is the frequency of phenotype P1 (so the frequency of P2 is $1 - x$).

With death probabilities $p^- = x/T_1$ and $q^- = (1-x)/T_2$:

$$\Rightarrow\ p^- + q^- < 1 \quad (\text{for } T_1, T_2 > 1).$$
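The process described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the talk: the function names, the choice $W_2 = 1$, and the lifespans $T_1 = 3$, $T_2 = 2$ are assumptions made for the example, while $N$, $\theta$ and $s$ are taken from the simulation parameters quoted later in the slides.

```python
import random


def step_probabilities(x, u, W1, W2, T1, T2):
    """Return (p_minus, q_minus, p_plus, q_plus) at frequency x of P1."""
    p_minus = x / T1                 # a P1 individual dies
    q_minus = (1.0 - x) / T2         # a P2 individual dies
    r1 = W1 / T1 * x                 # reproductive weight of P1
    r2 = W2 / T2 * (1.0 - x)         # reproductive weight of P2
    total = r1 + r2
    p_plus = (r1 * (1.0 - u) + r2 * u) / total   # newborn is P1
    q_plus = (r1 * u + r2 * (1.0 - u)) / total   # newborn is P2
    return p_minus, q_minus, p_plus, q_plus


def moran_step(n1, N, u, W1, W2, T1, T2, rng):
    """Advance the number n1 of P1 individuals by one time step."""
    p_minus, q_minus, p_plus, _ = step_probabilities(n1 / N, u, W1, W2, T1, T2)
    r = rng.random()
    if r < p_minus:
        dead = 1
    elif r < p_minus + q_minus:
        dead = 2
    else:
        return n1                    # nothing happens in this interval
    # A death is immediately followed by a birth ("musical chairs").
    born = 1 if rng.random() < p_plus else 2
    return n1 - (dead == 1) + (born == 1)


def simulate(n_steps, N, theta, s, T1, T2, seed=0):
    """Run the process and return the final frequency of P1."""
    rng = random.Random(seed)
    u = theta / N                    # so that N u = theta
    W2 = 1.0                         # assumed normalisation
    W1 = W2 * (1.0 + s / N)          # so that N (W1/W2 - 1) = s
    n1 = N // 2
    for _ in range(n_steps):
        n1 = moran_step(n1, N, u, W1, W2, T1, T2, rng)
    return n1 / N


if __name__ == "__main__":
    # T1 = 3 and T2 = 2 are illustrative values (lambda = 3/2).
    print(simulate(100_000, N=1000, theta=7.0, s=-5.0, T1=3.0, T2=2.0))
```

Drawing a single uniform number for the death event makes the "nothing happens" branch explicit: it is taken with probability $1 - p^- - q^-$.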

Asymptotic parameters

To get a non-trivial distribution the following asymptotic constraints are imposed on the parameters:

$$N u \xrightarrow{N \to \infty} \theta, \qquad N\left(\frac{W_1}{W_2} - 1\right) \xrightarrow{u \to 0} s.$$

For notational convenience we also define $\lambda = T_1/T_2$.

The model therefore depends on: $\theta$ (mutation), $s$ (reproduction), $\lambda$ (survival).

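First and second moments of the change in frequency:

$$M = E[X_{t+1} - X_t] = \frac{1}{N T_1}\cdot\frac{\theta\lambda^2(1-x)^2 + \lambda s\,x(1-x) - \theta x^2}{x + \lambda(1-x)},$$

$$V = E\left[(X_{t+1} - X_t)^2\right] = \frac{2\lambda}{N T_1}\cdot\frac{x(1-x)}{x + \lambda(1-x)}.$$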
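As a sanity check, the ratio $2M/V$ (the only combination that enters the equilibrium distribution) can be computed directly from the one-step death and birth probabilities and compared with its large-$N$ form $\theta\lambda(1-x)/x + s - (\theta/\lambda)\,x/(1-x)$. The sketch below assumes that one time step changes $X$ by at most $1/N$ and sets $W_2 = 1$; it compares only the ratio, which does not depend on how time is scaled. The function names are hypothetical.

```python
def ratio_exact(x, N, theta, s, T1, T2):
    """2M/V computed from the one-step transition probabilities."""
    u = theta / N
    W2 = 1.0                          # assumed normalisation
    W1 = W2 * (1.0 + s / N)
    p_minus = x / T1
    q_minus = (1.0 - x) / T2
    total = W1 / T1 * x + W2 / T2 * (1.0 - x)
    p_plus = (W1 / T1 * (1.0 - u) * x + W2 / T2 * u * (1.0 - x)) / total
    q_plus = 1.0 - p_plus
    up = q_minus * p_plus             # P2 dies, P1 is born: X -> X + 1/N
    down = p_minus * q_plus           # P1 dies, P2 is born: X -> X - 1/N
    # M = (up - down)/N and V = (up + down)/N**2, hence:
    return 2.0 * N * (up - down) / (up + down)


def ratio_asymptotic(x, theta, s, lam):
    """Large-N form of 2M/V from the expressions for M and V."""
    return theta * lam * (1.0 - x) / x + s - (theta / lam) * x / (1.0 - x)


if __name__ == "__main__":
    # Illustrative values: T1 = 3, T2 = 2, so lambda = 3/2.
    print(ratio_exact(0.3, 1000, 7.0, -5.0, 3.0, 2.0))
    print(ratio_asymptotic(0.3, 7.0, -5.0, 1.5))
```

For large $N$ the two values should agree up to corrections of order $1/N$.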

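Equilibrium distribution

The Wright equilibrium distribution for large $N$ is

$$\phi(x) = C \cdot \frac{1}{V} \cdot \exp\left\{ 2\int \frac{M}{V}\,dx \right\},$$

which explicitly gives

$$\phi(x) = C\,e^{\alpha x}\,x^{\lambda\theta - 1}\,(1-x)^{\frac{\theta}{\lambda} - 1}\,\{x + \lambda(1-x)\},$$

where

$$\alpha = s + \theta\left(\frac{1}{\lambda} - \lambda\right).$$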
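The step from Wright's formula to the explicit density can be reconstructed as follows. This is a sketch of the intermediate algebra using only the expressions for $M$ and $V$ given in the slides; the constant of integration is absorbed into $C$.

```latex
\begin{align*}
\frac{2M}{V}
  &= \frac{\theta\lambda^{2}(1-x)^{2} + \lambda s\,x(1-x) - \theta x^{2}}{\lambda\,x(1-x)}
   = \frac{\theta\lambda\,(1-x)}{x} + s - \frac{\theta}{\lambda}\,\frac{x}{1-x}, \\[4pt]
\int \frac{2M}{V}\,dx
  &= \theta\lambda \ln x + \frac{\theta}{\lambda}\ln(1-x)
     + \Bigl[\,s + \theta\Bigl(\frac{1}{\lambda} - \lambda\Bigr)\Bigr]\,x + \text{const}, \\[4pt]
\frac{1}{V}
  &\propto \frac{x + \lambda(1-x)}{x(1-x)}, \\[4pt]
\phi(x)
  &= C\,e^{\alpha x}\,x^{\lambda\theta-1}\,(1-x)^{\frac{\theta}{\lambda}-1}\,\{x + \lambda(1-x)\},
  \qquad \alpha = s + \theta\Bigl(\frac{1}{\lambda} - \lambda\Bigr).
\end{align*}
```

The common prefactor $1/(N T_1)$ cancels in $M/V$, so the shape of the density depends only on $\theta$, $s$ and $\lambda$.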

Typical shapes for equilibrium distributions

[Figure: probability density against the relative frequency of “blue” phenotypes, in three panels: low mutation, high mutation, and mutation probability close to $1/N$.]

Stationary points

[Figure: stationary points plotted against “u”, the probability of mutation per site, with $u = 1/N$ marked on the axis. First panel: $\lambda = T_1/T_2 = 1$ (lifetimes are equal for the two phenotypes), with regions labelled “random drift” and “mutation/selection balance”. Second panel: $\lambda = T_1/T_2 = 3/2$. Insets show the probability density at low and high mutation.]
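A curve of this kind can be approximated numerically. The sketch below assumes that “stationary points” means the interior points where the derivative of the explicit density $\phi(x)$ vanishes, and that $\theta = N u$; it locates sign changes of $\frac{d}{dx}\log\phi$ on a grid and refines them by bisection. The function names and grid settings are hypothetical.

```python
def dlog_phi(x, theta, s, lam):
    """Derivative of log phi(x) for the explicit equilibrium density."""
    alpha = s + theta * (1.0 / lam - lam)
    return (alpha
            + (lam * theta - 1.0) / x
            - (theta / lam - 1.0) / (1.0 - x)
            + (1.0 - lam) / (x + lam * (1.0 - x)))


def stationary_points(theta, s, lam, n_grid=2000, eps=1e-6):
    """Return the interior zeros of d/dx log phi found on a uniform grid."""
    xs = [eps + i * (1.0 - 2.0 * eps) / n_grid for i in range(n_grid + 1)]
    roots = []
    for a, b in zip(xs, xs[1:]):
        fa = dlog_phi(a, theta, s, lam)
        fb = dlog_phi(b, theta, s, lam)
        if fa == 0.0:
            roots.append(a)           # grid point is itself a zero
        elif fa * fb < 0.0:
            lo, hi = a, b             # sign change: refine by bisection
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if dlog_phi(lo, theta, s, lam) * dlog_phi(mid, theta, s, lam) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots


if __name__ == "__main__":
    N = 1000                          # illustrative population size
    for u in (0.0002, 0.001, 0.005):  # below, at, and above 1/N
        print(u, stationary_points(theta=N * u, s=-5.0, lam=1.5))
```

Whether a given zero is a maximum or a minimum of the density can be read from the sign of `dlog_phi` on either side of it.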


The model includes one more parameter than the standard setting, so it is desirable to expand the number of independent statistics.

This can be done by considering the amount of synonymous variation included in each of our two phenotypes, and treating it as neutral (as done by Nielsen and colleagues for various types of models).

The amount of synonymous variation can be quantified by using the inbreeding coefficient concept.

Inferring $\lambda = T_1/T_2$ from population statistics

Distribution of a gene throughout a population (Kreitman 1983)

[Figure: genotypes grouped under the two phenotypes P1 and P2.]

Statistical quantities

$x$ = relative frequency of P1

$F_1$ = inbreeding coefficient for P1

$F_2$ = inbreeding coefficient for P2

A result by Kimura and Crow gives that for only one phenotype

$$F \approx \frac{1}{1 + 2\theta},$$

where $\theta$ is the rescaled mutation rate.

This can be extended to the case of two phenotypes to give two equations:

$$F_1/x + \theta\left[\lambda\,(F_1/x - F_1) + (L-1)\,F_1\right] - 1/x \approx 0,$$

$$F_2/(1-x) + \theta\left[\frac{1}{\lambda}\bigl(F_2/(1-x) - F_2\bigr) + (L-1)\,F_2\right] - 1/(1-x) \approx 0.$$

[Figure: estimated against real values of $\lambda$, both axes running from 0.5 to 2, with the end points annotated $T_1 = \tfrac{1}{2} T_2$ and $T_1 = 2\,T_2$.]

This (hideous) formula can be used to derive the value of $\lambda$ from combined moments of quantities $x$, $F_1$, $F_2$ estimated from a set of simulated realizations of the process.

Simulation parameters: $s = -5$, $\theta = 7$, $N = 1000$, $L$ (genotype length) $= 40$

Running time = 5000 "generations"

Number of realizations per point = 10000

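following auxiliary quantities

$$R = \frac{\langle F_2\rangle}{\langle F_1\rangle}\cdot\frac{\langle 1/x\rangle - \langle F_1/x\rangle}{\langle 1/(1-x)\rangle - \langle F_2/(1-x)\rangle}, \qquad Q_1 = \frac{\langle F_1/x\rangle}{\langle F_1\rangle} - 1, \qquad Q_2 = \frac{\langle F_2/(1-x)\rangle}{\langle F_2\rangle} - 1.$$

In terms of these quantities, the equation for $\lambda$ takes the following form

$$R = \lambda\,\frac{\lambda Q_1 + L - 1}{Q_2 + \lambda (L-1)},$$

and this relation leads to a quadratic equation that only admits one non-negative solution:

$$\lambda = \frac{1}{2Q_1}\left[(R-1)(L-1) + \sqrt{(R-1)^2(L-1)^2 + 4RQ_1Q_2}\,\right]. \tag{4.5}$$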

Figure 5 shows the result of using formula (4.5) to estimate λ, for a series of simulations where the real value of λ ranges from 0.5 (i.e. T1 = (1/2) T2) to 2 (i.e. T1 = 2 T2): we see that the average values of such estimations are well aligned with the actual values.

The magnitude of the standard deviation for our estimations, on the other hand, is considerable, especially in view of the fact that 10000 realisations of the process were used to estimate each value of λ: it is clear that a substantial increase of efficiency will be needed to make the theory relevant to actual empirical phenomena.

This practical consideration ought not to be allowed, however, to obfuscate the fact that equation (4.5) provides a direct mathematical relation between combined moments of the population quantities x, F1 and F2, and parameter λ = T1/T2, which arguably contains information about the function of a genetic sequence.
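Formula (4.5) is straightforward to evaluate once the combined moments are available. The sketch below is an illustration only: it assumes that the angle brackets denote averages over realizations, and that $Q_2$ is formed with $F_2/(1-x)$, as the second of the two-phenotype equations implies. The function names are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def auxiliary_quantities(x, F1, F2):
    """R, Q1, Q2 from per-realization samples of x, F1 and F2."""
    m_F1 = mean(F1)
    m_F2 = mean(F2)
    m_inv_x = mean([1.0 / xi for xi in x])
    m_inv_1mx = mean([1.0 / (1.0 - xi) for xi in x])
    m_F1_x = mean([f / xi for f, xi in zip(F1, x)])
    m_F2_1mx = mean([f / (1.0 - xi) for f, xi in zip(F2, x)])
    R = (m_F2 / m_F1) * (m_inv_x - m_F1_x) / (m_inv_1mx - m_F2_1mx)
    Q1 = m_F1_x / m_F1 - 1.0
    Q2 = m_F2_1mx / m_F2 - 1.0       # assumption: 1 - x in the denominator
    return R, Q1, Q2


def estimate_lambda(R, Q1, Q2, L):
    """Non-negative root of Q1*lam**2 - (R-1)(L-1)*lam - R*Q2 = 0."""
    a = (R - 1.0) * (L - 1.0)
    return (a + math.sqrt(a * a + 4.0 * R * Q1 * Q2)) / (2.0 * Q1)


if __name__ == "__main__":
    # Round trip: build R from a known lambda, then recover it.
    lam, Q1, Q2, L = 1.5, 0.2, 0.3, 40
    R = lam * (lam * Q1 + L - 1) / (Q2 + lam * (L - 1))
    print(estimate_lambda(R, Q1, Q2, L))
```

The round trip in the usage example only checks the algebra linking the relation for $R$ to (4.5); it says nothing about the statistical spread of the estimate.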

5 Outlook

We have shown that the effect of differentiating the lifetimes of two phenotypes independently from their fertility includes a qualitative change in the equilibrium state of a population: since survival and reproduction are quite distinct macro-functions performed by any living organism, this may contribute to extend the population-genetical characterisation of biological function.


Summary

•  Playing with details of process is fun

•  In principle the “functional” parameter λ = T1 / T2 can be estimated from “population observables” x , F1 , F2

Thank you!

“Disclaimer”

…there is no such thing as the “function of a gene”, like there is no such thing as the “meaning of a word”.

For example, the word “Gene” can mean:

…but dictionaries exist (and they are obsolete (but they’re a start (or maybe not))).

or
