Population genetics of gene function, Ignacio Gallo, Glasgow, December 2012


Talk given at Popgroup 46, Glasgow, December 2012


Page 1: population genetics of gene function (talk)

Population genetics of gene function

Ignacio Gallo

Glasgow, December 2012

Page 2: population genetics of gene function (talk)

Motivation

“Molecular signatures of natural selection”, Nielsen 2005:

“inferences regarding the patterns and distribution of selection in genes and genomes may provide important functional information”

Wikipedia entry on “statistical thermodynamics”:

“The goal of statistical thermodynamics is to understand and to interpret the measurable macroscopic properties of materials in terms of the properties of their constituent particles and the interactions between them”

Page 3: population genetics of gene function (talk)

Motivation

The functional importance of a genetic sequence can be inferred from its population distribution (for example, from its degree of conservation).

Can anything more be said about the gene’s specific function (survival, reproduction, etc.)?

Analogy: the distribution of genes is to gene function as the macroscopic properties of materials are to the properties of their constituent particles (size, speed, etc.).

Page 4: population genetics of gene function (talk)

Moran model: birth and death events change the frequency of a given phenotype.

Model variation:

Page 5: population genetics of gene function (talk)

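[Diagram: at each time step a P1 individual dies with probability p− and a P2 individual dies with probability q−. With probability 1 − p− − q− nothing happens (a “loop” back to the same state). A death is followed by the birth of a P1 with probability p+ or of a P2 with probability q+.]

We have two phenotypes, P1 and P2:
p−, p+: death and birth probabilities for P1;
q−, q+: death and birth probabilities for P2.

If p− + q− < 1, nothing happens in some intervals.

If a death happens, a birth happens instantaneously (a “musical chairs” process).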
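The rules above can be sketched as a single simulation step. This is a minimal Python illustration. It assumes the state is stored as the count n1 of P1 individuals and that the caller supplies the four probabilities. The function name `step` and its signature are hypothetical and are not taken from the talk.

```python
import random


def step(n1: int, p_minus: float, q_minus: float,
         p_plus: float, q_plus: float, rng: random.Random) -> int:
    """Hypothetical helper: number of P1 individuals after one time step.

    With probability p_minus a P1 dies, with probability q_minus a P2 dies,
    and with probability 1 - p_minus - q_minus nothing happens.
    A death is immediately followed by a birth ("musical chairs"):
    a P1 is born with probability p_plus, a P2 with probability q_plus.
    """
    if p_minus + q_minus > 1.0 or abs(p_plus + q_plus - 1.0) > 1e-9:
        raise ValueError("need p_minus + q_minus <= 1 and p_plus + q_plus = 1")
    r = rng.random()
    if r >= p_minus + q_minus:
        return n1                  # no death, hence no birth: state unchanged
    if r < p_minus:
        n1 -= 1                    # a P1 individual dies
    # otherwise a P2 individual dies, which leaves n1 unchanged
    if rng.random() < p_plus:
        n1 += 1                    # the vacated place is taken by a P1
    return n1                      # (else it is taken by a P2)


# Usage: one step from 5 P1 individuals with illustrative probabilities.
rng = random.Random(1)
print(step(5, 0.2, 0.3, 0.6, 0.4, rng))
```

Iterating `step` and recomputing the probabilities from the current frequency at each step would produce a trajectory of the process.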


Page 9: population genetics of gene function (talk)


We differentiate the phenotypes’ reproductive fitness and lifetimes independently, and consider reproduction and survival as two different functions:

Reproduction: W1 = offspring for type 1, W2 = offspring for type 2;
Survival: T1 = average lifespan for type 1, T2 = average lifespan for type 2;
u: mutation probability.

We are interested in the equilibrium distribution of a process with symmetric reversible mutation for haploid individuals.

Page 10: population genetics of gene function (talk)

Birth probabilities:

$$p^+ = \frac{\frac{W_1}{T_1}(1-u)\,x + \frac{W_2}{T_2}\,u\,(1-x)}{\frac{W_1}{T_1}\,x + \frac{W_2}{T_2}(1-x)}, \qquad q^+ = \frac{\frac{W_1}{T_1}\,u\,x + \frac{W_2}{T_2}(1-u)(1-x)}{\frac{W_1}{T_1}\,x + \frac{W_2}{T_2}(1-x)}.$$

Death probabilities:

$$p^- = \frac{x}{T_1}, \qquad q^- = \frac{1-x}{T_2}.$$

Variable x is the frequency of phenotype P1 (so the frequency of P2 is 1 − x).
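These birth and death probabilities can be evaluated directly. The Python sketch below does so. The function name `transition_probabilities` is ours and does not come from the talk. It also lets one confirm that p+ + q+ = 1: the two numerators add up to the common denominator, so every birth is of one phenotype or the other.

```python
def transition_probabilities(x, W1, W2, T1, T2, u):
    """Hypothetical helper: (p_minus, p_plus, q_minus, q_plus) at P1-frequency x."""
    a1 = (W1 / T1) * x            # P1 parents, weighted by W1/T1
    a2 = (W2 / T2) * (1 - x)      # P2 parents, weighted by W2/T2
    total = a1 + a2
    p_plus = ((1 - u) * a1 + u * a2) / total   # offspring is P1
    q_plus = (u * a1 + (1 - u) * a2) / total   # offspring is P2
    p_minus = x / T1                           # a P1 dies
    q_minus = (1 - x) / T2                     # a P2 dies
    return p_minus, p_plus, q_minus, q_plus


# Illustrative values only.
pm, pp, qm, qp = transition_probabilities(0.3, 1.2, 1.0, 4.0, 2.0, 0.01)
print(pp + qp, pm + qm)
```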


With the death probabilities p− = x/T1 and q− = (1 − x)/T2, it follows that p− + q− < 1.


Page 12: population genetics of gene function (talk)

Asymptotic parameters

To get a non-trivial distribution, the following asymptotic constraints are imposed on the parameters:

$$N u \xrightarrow{\;N\to\infty\;} \theta, \qquad N\left(\frac{W_1}{W_2} - 1\right) \xrightarrow{\;u\to 0\;} s.$$

For notational convenience we also define λ = T1/T2. The model therefore depends on θ (mutation), s (reproduction) and λ (survival).
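For a finite population, these constraints can be read the other way round, turning model inputs into the three rescaled parameters. The sketch below does this. The function name is hypothetical, and the input values are illustrative only, chosen to give θ ≈ 7, s ≈ −5 and λ = 3/2.

```python
def scaled_parameters(N, u, W1, W2, T1, T2):
    """Hypothetical helper: finite-N proxies for (theta, s, lambda).

    theta ~ N*u and s ~ N*(W1/W2 - 1) stand in for the asymptotic limits;
    lambda = T1/T2 is the ratio of average lifespans and is not rescaled.
    """
    theta = N * u
    s = N * (W1 / W2 - 1)
    lam = T1 / T2
    return theta, s, lam


# Illustrative inputs (our choice): N = 1000, u = 0.007, W1/W2 = 0.995, T1/T2 = 3/2.
print(scaled_parameters(1000, 0.007, 0.995, 1.0, 3.0, 2.0))
```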


Page 14: population genetics of gene function (talk)

$$M = E[X_{t+1} - X_t] = \frac{1}{N T_1}\cdot\frac{\theta\lambda^2(1-x)^2 + \lambda s\,x(1-x) - \theta x^2}{x + \lambda(1-x)},$$

$$V = E\left[(X_{t+1} - X_t)^2\right] = \frac{2\lambda}{N T_1}\cdot\frac{x(1-x)}{x + \lambda(1-x)}.$$
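Dividing M by V shows what enters the Wright formula: the factors 1/(NT1) and x + λ(1 − x) cancel. The working below is our own, derived from M and V as written above. It is not a step shown in the talk.

$$\frac{2M}{V} = \frac{\theta\lambda^2(1-x)^2 + \lambda s\,x(1-x) - \theta x^2}{\lambda\,x(1-x)} = \theta\lambda\,\frac{1-x}{x} + s - \frac{\theta}{\lambda}\,\frac{x}{1-x},$$

$$\int \frac{2M}{V}\,dx = \lambda\theta\ln x + \frac{\theta}{\lambda}\ln(1-x) + \left[s + \theta\left(\frac{1}{\lambda} - \lambda\right)\right]x + \text{const}.$$

The coefficient of x in the second line is the α that appears in the equilibrium distribution.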


Page 16: population genetics of gene function (talk)

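Equilibrium distribution

The Wright equilibrium distribution for large N is

$$\phi(x) = C\cdot\frac{1}{V}\cdot\exp\left\{\int \frac{2M}{V}\,dx\right\},$$

where C is a normalising constant. This explicitly gives

$$\phi(x) = C\,e^{\alpha x}\,x^{\lambda\theta - 1}(1-x)^{\theta/\lambda - 1}\left\{x + \lambda(1-x)\right\},$$

where

$$\alpha = s + \theta\left(\frac{1}{\lambda} - \lambda\right).$$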
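As a sanity check of the explicit result, it can be compared with a direct numerical evaluation of (1/V)·exp{∫2M/V dx}. The two logarithms should differ only by a constant, which is absorbed into C. The Python sketch below makes this comparison. The function names, the reference point x0 = 0.5, the default N and T1 (which only shift the constant) and the use of Simpson quadrature are all our own choices.

```python
import math


def log_phi_closed(x, theta, s, lam):
    """log(phi(x)/C) from the explicit equilibrium formula."""
    alpha = s + theta * (1 / lam - lam)
    return (alpha * x + (lam * theta - 1) * math.log(x)
            + (theta / lam - 1) * math.log(1 - x)
            + math.log(x + lam * (1 - x)))


def log_phi_wright(x, theta, s, lam, N=1000, T1=1.0, x0=0.5, n=2000):
    """log of (1/V) * exp(integral from x0 to x of 2M/V), using M and V as defined.

    The integral uses composite Simpson's rule with n (even) subintervals.
    """
    def M(y):
        num = theta * lam**2 * (1 - y)**2 + lam * s * y * (1 - y) - theta * y**2
        return num / (N * T1 * (y + lam * (1 - y)))

    def V(y):
        return 2 * lam * y * (1 - y) / (N * T1 * (y + lam * (1 - y)))

    def f(y):
        return 2 * M(y) / V(y)

    h = (x - x0) / n
    total = f(x0) + f(x)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(x0 + i * h)
    return -math.log(V(x)) + total * h / 3


# The difference should be the same constant (a change of normalisation) at every x.
for x in (0.2, 0.5, 0.8):
    print(x, log_phi_closed(x, 7, -5, 1.5) - log_phi_wright(x, 7, -5, 1.5))
```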


Page 18: population genetics of gene function (talk)

Typical shapes for equilibrium distributions

[Figure: three panels plotting probability density against the relative frequency of “blue” phenotypes, for low mutation, high mutation, and mutation probability close to 1/N.]
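The following is one way to read the three panels. It is our interpretation, not a statement from the talk. Since θ = Nu in the limit, a mutation probability close to 1/N means θ ≈ 1. The exponents of x and 1 − x in φ(x) are λθ − 1 and θ/λ − 1. For λ = 1, the density therefore diverges at both boundaries when θ < 1 (low mutation), vanishes at both when θ > 1 (high mutation), and stays finite when θ = 1. The hypothetical helper below classifies the two boundaries for any θ and λ.

```python
def boundary_behaviour(theta, lam):
    """Hypothetical helper: behaviour of phi(x) as x -> 0 and as x -> 1.

    Near x = 0, phi ~ x**(lam*theta - 1); near x = 1, phi ~ (1 - x)**(theta/lam - 1).
    The other factors, exp(alpha*x) and x + lam*(1 - x), stay finite and positive,
    so s does not affect this classification.
    """
    def classify(exponent):
        if exponent < 0:
            return "diverges"   # density blows up at this boundary
        if exponent == 0:
            return "finite"     # density tends to a positive constant
        return "vanishes"       # density goes to zero at this boundary
    return classify(lam * theta - 1), classify(theta / lam - 1)


for theta in (0.5, 1.0, 3.0):
    print(theta, boundary_behaviour(theta, 1.0), boundary_behaviour(theta, 1.5))
```

With λ ≠ 1 the two thresholds separate (θ = 1/λ at x = 0 and θ = λ at x = 1), so a mixed shape is possible for intermediate mutation.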


Page 20: population genetics of gene function (talk)

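Stationary points

[Figure: stationary points plotted against u, the probability of mutation per site, for λ = T1/T2 = 1 (lifetimes are equal for the two phenotypes). Regions are labelled “random drift” and “mutation/selection balance”, and u = 1/N is marked on the axis.]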


[Figure: the same plot for λ = T1/T2 = 3/2, again with regions labelled “random drift” and “mutation/selection balance” and u = 1/N marked.]
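Stationary points of φ are the interior zeros of d log φ/dx = α + (λθ − 1)/x − (θ/λ − 1)/(1 − x) + (1 − λ)/(x + λ(1 − x)). This expression comes from differentiating the explicit equilibrium distribution. The talk does not say how the plotted curves were computed. The sketch below assumes a simple grid scan with bisection. The function names and the illustrative sweep (N = 1000, s = −5, λ = 3/2) are ours.

```python
def dlogphi(x, theta, s, lam):
    """Derivative of log phi(x), from the explicit equilibrium formula."""
    alpha = s + theta * (1 / lam - lam)
    return (alpha + (lam * theta - 1) / x - (theta / lam - 1) / (1 - x)
            + (1 - lam) / (x + lam * (1 - x)))


def stationary_points(theta, s, lam, grid=1000, tol=1e-12):
    """Hypothetical helper: interior zeros of d(log phi)/dx via grid scan + bisection."""
    xs = [(i + 0.5) / grid for i in range(grid)]
    roots = []
    for a, b in zip(xs, xs[1:]):
        fa, fb = dlogphi(a, theta, s, lam), dlogphi(b, theta, s, lam)
        if fa == 0.0:
            roots.append(a)
        elif fa * fb < 0:            # sign change: a root lies in (a, b)
            while b - a > tol:
                m = 0.5 * (a + b)
                fm = dlogphi(m, theta, s, lam)
                if fa * fm <= 0:
                    b = m
                else:
                    a, fa = m, fm
            roots.append(0.5 * (a + b))
    return roots


# Illustrative sweep of the mutation probability u, with theta = N*u.
N = 1000
for u in (0.0002, 0.001, 0.005):
    print(u, stationary_points(N * u, s=-5.0, lam=1.5))
```

Because φ > 0 on (0, 1), zeros of d log φ/dx are exactly the stationary points of φ itself.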


Page 25: population genetics of gene function (talk)

Inferring λ = T1/T2 from population statistics

The model includes one more parameter than the standard setting, so it is desirable to expand the number of independent statistics.

This can be done by considering the amount of synonymous variation included in each of our two phenotypes, and treating it as neutral (as done by Nielsen and colleagues for various types of models).

The amount of synonymous variation can be quantified using the inbreeding coefficient concept.

Page 26: population genetics of gene function (talk)

Distribution of a gene throughout a population (Kreitman 1983)

[Figure: phenotypes P1 and P2, each made up of several genotypes.]

Page 27: population genetics of gene function (talk)

Statistical quantities

x = relative frequency of P1
F1 = inbreeding coefficient for P1
F2 = inbreeding coefficient for P2
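The sketch below shows how these three quantities might be computed from one sampled population. It assumes each individual is represented as a (phenotype, sequence) pair, with the sequence carrying the synonymous variation. It also assumes that a phenotype's inbreeding coefficient is estimated as the probability that two distinct individuals of that phenotype, drawn without replacement, carry identical sequences. This is a standard identity-by-state proxy, but the talk does not state which estimator was used. `identity_probability` and `population_statistics` are hypothetical names.

```python
from collections import Counter


def identity_probability(seqs):
    """Probability that two distinct members of `seqs` carry identical sequences."""
    n = len(seqs)
    if n < 2:
        return float("nan")          # undefined for fewer than two individuals
    same = sum(c * (c - 1) for c in Counter(seqs).values())
    return same / (n * (n - 1))


def population_statistics(sample):
    """Hypothetical helper: (x, F1, F2) for a list of (phenotype, sequence) pairs.

    Phenotype is 1 or 2; sequences are any hashable values (e.g. strings).
    """
    seqs1 = [seq for ph, seq in sample if ph == 1]
    seqs2 = [seq for ph, seq in sample if ph == 2]
    x = len(seqs1) / len(sample)
    return x, identity_probability(seqs1), identity_probability(seqs2)


sample = [(1, "AAC"), (1, "AAC"), (1, "AGC"), (2, "TTT"), (2, "TTT")]
print(population_statistics(sample))
```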

Page 28: population genetics of gene function (talk)

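A result by Kimura and Crow gives, for only one phenotype,

$$F \approx \frac{1}{1 + 2\theta},$$

where θ is the rescaled mutation rate. This can be extended to the case of two phenotypes to give two equations, where L is the genotype length:

$$\frac{F_1}{x} + \theta\left[\lambda\left(\frac{F_1}{x} - F_1\right) + (L-1)F_1\right] - \frac{1}{x} \approx 0,$$

$$\frac{F_2}{1-x} + \theta\left[\frac{1}{\lambda}\left(\frac{F_2}{1-x} - F_2\right) + (L-1)F_2\right] - \frac{1}{1-x} \approx 0.$$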
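Next, assume that each of the two equations holds on average over realizations of the process, writing ⟨·⟩ for such averages. Solving each averaged equation for θ gives two expressions that must agree. Equating them eliminates θ and leaves a relation between λ and observable moments only. The algebra below is our own expansion of this step:

$$\theta = \frac{\langle 1/x\rangle - \langle F_1/x\rangle}{\lambda\left(\langle F_1/x\rangle - \langle F_1\rangle\right) + (L-1)\langle F_1\rangle} = \frac{\langle 1/(1-x)\rangle - \langle F_2/(1-x)\rangle}{\frac{1}{\lambda}\left(\langle F_2/(1-x)\rangle - \langle F_2\rangle\right) + (L-1)\langle F_2\rangle},$$

$$\frac{\langle F_2\rangle}{\langle F_1\rangle}\cdot\frac{\langle 1/x\rangle - \langle F_1/x\rangle}{\langle 1/(1-x)\rangle - \langle F_2/(1-x)\rangle} = \frac{\lambda\left(\frac{\langle F_1/x\rangle}{\langle F_1\rangle} - 1\right) + L - 1}{\frac{1}{\lambda}\left(\frac{\langle F_2/(1-x)\rangle}{\langle F_2\rangle} - 1\right) + L - 1}.$$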

Page 29: population genetics of gene function (talk)

[Figure: estimated versus real values of λ, for real values ranging from 0.5 (T1 = T2/2) to 2 (T1 = 2T2).]

This (hideous) formula can be used to derive the value of λ from combined moments of the quantities x, F1 and F2, estimated from a set of simulated realizations of the process.

Simulation parameters: s = −5, θ = 7, N = 1000, L (genotype length) = 40; running time = 5000 “generations”; number of realizations per point = 10000.

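We define the following auxiliary quantities:

$$R = \frac{\langle F_2\rangle}{\langle F_1\rangle}\cdot\frac{\langle 1/x\rangle - \langle F_1/x\rangle}{\langle 1/(1-x)\rangle - \langle F_2/(1-x)\rangle},$$

$$Q_1 = \frac{\langle F_1/x\rangle}{\langle F_1\rangle} - 1, \qquad Q_2 = \frac{\langle F_2/(1-x)\rangle}{\langle F_2\rangle} - 1.$$

In terms of these quantities, the equation for λ takes the following form:

$$R = \lambda\,\frac{\lambda Q_1 + L - 1}{Q_2 + \lambda(L-1)}.$$

Clearing denominators gives the quadratic equation Q1 λ² − (R − 1)(L − 1) λ − R Q2 = 0, which admits only one non-negative solution:

$$\lambda = \frac{1}{2Q_1}\left[(R-1)(L-1) + \sqrt{(R-1)^2(L-1)^2 + 4RQ_1Q_2}\right]. \qquad (4.5)$$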
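Formula (4.5) and the auxiliary quantities translate into a short estimator. The Python sketch below assumes each ⟨·⟩ is a plain sample mean over simulated realizations and that every realization has 0 < x < 1. Q2 uses ⟨F2/(1 − x)⟩, as the second equation for F2 requires. The function names are hypothetical.

```python
import math
from statistics import fmean


def auxiliary_quantities(xs, F1s, F2s):
    """Hypothetical helper: R, Q1, Q2 from per-realization values of x, F1, F2."""
    mF1, mF2 = fmean(F1s), fmean(F2s)
    m_inv_x = fmean(1 / x for x in xs)
    m_inv_1mx = fmean(1 / (1 - x) for x in xs)
    m_F1_x = fmean(f / x for f, x in zip(F1s, xs))
    m_F2_1mx = fmean(f / (1 - x) for f, x in zip(F2s, xs))
    R = (mF2 / mF1) * (m_inv_x - m_F1_x) / (m_inv_1mx - m_F2_1mx)
    Q1 = m_F1_x / mF1 - 1
    Q2 = m_F2_1mx / mF2 - 1
    return R, Q1, Q2


def estimate_lambda(R, Q1, Q2, L):
    """Non-negative root of Q1*lam**2 - (R-1)(L-1)*lam - R*Q2 = 0, i.e. eq. (4.5)."""
    b = (R - 1) * (L - 1)
    return (b + math.sqrt(b * b + 4 * R * Q1 * Q2)) / (2 * Q1)


# Consistency check with illustrative numbers: build R from a chosen lambda,
# then invert the relation with eq. (4.5).
lam_true, Q1, Q2, L = 1.5, 0.4, 0.7, 40
R = lam_true * (lam_true * Q1 + L - 1) / (Q2 + lam_true * (L - 1))
print(estimate_lambda(R, Q1, Q2, L))
```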

Figure 5 shows the result of using formula (4.5) to estimate λ for a series of simulations in which the real value of λ ranges from 0.5 (i.e. T1 = T2/2) to 2 (i.e. T1 = 2T2): the average values of these estimates are well aligned with the actual values.

The standard deviation of the estimates, on the other hand, is considerable, especially given that 10000 realisations of the process were used to estimate each value of λ. A substantial increase in efficiency will clearly be needed to make the theory relevant to actual empirical phenomena.

This practical consideration should not, however, obscure the fact that equation (4.5) provides a direct mathematical relation between combined moments of the population quantities x, F1 and F2 and the parameter λ = T1/T2, which arguably contains information about the function of a genetic sequence.

5 Outlook

We have shown that one effect of differentiating the lifetimes of two phenotypes independently of their fertility is a qualitative change in the equilibrium state of a population. Since survival and reproduction are quite distinct macro-functions performed by any living organism, this may help extend the population-genetic characterisation of biological function.


Page 30: population genetics of gene function (talk)

Summary  

•  Playing with the details of the process is fun

•  In principle, the “functional” parameter λ = T1/T2 can be estimated from the “population observables” x, F1 and F2

Page 31: population genetics of gene function (talk)

Thank you!

Page 32: population genetics of gene function (talk)

“Disclaimer”

…there is no such thing as the “function of a gene”, just as there is no such thing as the “meaning of a word”.

For example, the word “Gene” can mean:

…but dictionaries exist (and they are obsolete (but they’re a start (or maybe not))).