59
Population Genetics, Lecture 2 Nancy Lim Saccone [email protected] Bio 5488, Spring 2019 Wednesday 3/20/19 (with thanks to Don Conrad and slides from past years)

Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone [email protected] Bio 5488, Spring

  • Upload
    others

  • View
    9

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Population Genetics, Lecture 2

Nancy Lim Saccone

[email protected]

Bio 5488, Spring 2019

Wednesday 3/20/19

(with thanks to Don Conrad and slides from past years)

Page 2: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Outline for today

• Genetic drift & decay of heterozygosity, revisited

• Mutation

• Coalescent • Mutation

• Linkage disequilibrium

Page 3: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Recall:

Define Gt = homozygosity at generation t

= probability that a random draw of 2

chromosomes from the pop results in

2 of the same allele

Same recursion formula holds whether "same

allele" means identical by descent or identical

by state

Page 4: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Decay of HeterozygosityTwo ways to get 2 of the same allele:

Identical by

descent

Generation t Generation t+1

Probability

= 1

2𝑁

Generation t Generation t+1

Probability

= 1 −1

2𝑁* Gt

Therefore 𝐺𝑡+1 =1

2𝑁+ 1 −

1

2𝑁∗ 𝐺𝑡

Page 5: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Also true if define Ft = Prob picking 2 chromosomes and they

have the same allele identical by descent

Identical by

descent

Generation t Generation t+1

Probability

= 1

2𝑁

Generation t Generation t+1

Probability

= 1 −1

2𝑁* Ft

Therefore 𝐹𝑡+1 =1

2𝑁+ 1 −

1

2𝑁∗ 𝐹𝑡

Page 6: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

MutationGenetic drift & decay of heterozygosity variation is removed

from the population

Mutation restores genetic variation

Neutral theory: most of the DNA sequence differences within a

population are due to neutral mutations.

Page 7: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

MutationLet m = mutation rate to neutral alleles (per bp per generation)

(Sometimes u stands in for m)

Recall 𝐺𝑡+1 =1

2𝑁+ 1 −

1

2𝑁∗ 𝐺𝑡

If now allow mutation:

After 1 round with mutation possible:

𝐺𝑡+1 = (1 − 𝜇)21

2𝑁+ 1 −

1

2𝑁∗ 𝐺𝑡

At "equilibrium," Gt = Gt+1

Page 8: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

MutationClaim: at equilibrium, probability that 2 alleles drawn at random

are identical is (essentially)1

1 + 4𝑁𝜇

Proof: 𝐺𝑒𝑞 = (1 − 𝜇)21

2𝑁+ 1 −

1

2𝑁∗ 𝐺𝑒𝑞

𝐺𝑒𝑞 1 − 1 −1

2𝑁(1 − 𝜇)2 =

1

2𝑁(1 − 𝜇)2

𝐺𝑒𝑞 =

12𝑁

(1 − 𝜇)2

1 − 1 −12𝑁

(1 − 𝜇)2

≈1 − 2𝜇

2𝑁 1 − 1 −12𝑁

1 − 2𝜇

=1 − 2𝜇

1 + 4𝑁𝜇 − 2𝜇

≈1

1 + 4𝑁𝜇

Use (1-m)2 ~ 1-2m

Use 2m very small

Page 9: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Mutation4Nm comes up repeatedly in population genetics; often

referred to as theta:

𝜃 = 4𝑁𝜇(Different from recombination rate 𝜃! Pop geneticists often use

"r" for recombination rate.)

At equilibrium (with drift and mutation rate m), the probability

that 2 alleles drawn at random are the same is1

1 + 4𝑁𝜇=

1

1 + 𝜃

Expected heterozygosity at equilibrium:

𝐻𝑒𝑞 = 1 − 𝐺𝑒𝑞 = 1 −1

1 + 𝜃=

𝜃

1 + 𝜃

Page 10: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

The coalescent process

• "backward in time" process

• Lineage of alleles in a

sample traced backward in

time to their common

ancestor allele

• Genealogies are

unobserved, but can be

estimated

• Conceptual framework for

population genetic

inference: mutation,

recombination,

demographic history

• Kingman, Tajima, Hudson

Page 11: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

2 sample (item) coalescent

N = population size of diploid

individuals

n = sample size of haploid

chromosomes

MRCA = most recent common

ancestor

T2 = coalescence time for 2

chromosomes

Page 12: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

2 sample (item) coalescent

Prob that the time of MRCA is

t generations ago:

• 𝑃 𝑇2 = 𝑡 = 1 −1

2𝑁

𝑡−1 1

2𝑁

Did not coalesce

for first t-1

generationsCoalesced at t

Approximate (as N ∞):

• 𝑃 𝑇2 = 𝑡 =1

2𝑁𝑒−

1

2𝑁𝑡

Geometric distribution p(x) = (1-p)x-1 p

Has expected value or mean 1/p, so here E(T2) = 2N

Page 13: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

2 sample (item) coalescent

E(T2) = 2N

In "coalescent units" let t' = t/2N,

then E(T2) = 1

Page 14: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

n-coalescent

Have 𝑛2

=𝑛(𝑛−1)

2possible pairs

that could coalesce.

Analogous to 2-item

approximation:

• 𝑃 𝑇𝑛 = 𝑡 =𝑛22𝑁

𝑒−𝑛22𝑁

𝑡

𝐸 𝑇𝑛 =2𝑁𝑛2

=2𝑁

𝑛(𝑛 − 1)/2

Seq 1 Seq 2 ……..In coalescent units,

𝐸 𝑇𝑛 =1𝑛2

=1

𝑛!𝑛 − 2 ! 2!

=2

𝑛(𝑛 − 1)

Page 15: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

n-coalescent

Mean elapsed time in coalescent

units (*2N)

1

1/3

1/6

1/10

Seq 1 Seq 2 ……..

T2

T3

T4

T5

E(TMRCA for n chromosomes) = T2 + T3 +T4 + … + Tn

= 2(1-1/n) coalescent units

In coalescent units,

𝐸 𝑇𝑛 =1𝑛2

=2

𝑛(𝑛 − 1)

Page 16: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Adding mutations

For neutral models, can separately model the genealogical process

(the tree) and the mutation process

- Infinite sites mutation model:

Each mutation, when it occurs, affects a different nucleotide site

(one that was previously unaffected by mutation)

Page 17: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

What is the expected number of mutations

between 2 chromosomes?

𝜇 = mutation rate per bp per generation

𝜋 = # of sequence changes btwn 2 chrs

Recall 𝐸 𝑡 = 2𝑁

Then 𝐸 𝜋 = 2𝜇𝐸 𝑡 = 4𝑁𝜇 = 𝜃

Theta makes an appearance again

Page 18: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Expected number of segregating sites in a sample of n chrs

TOTAL time in the tree for a sample of 4

chromosomes is

L = 4T4 + 3T3 + 2T2

In general, for sample size n,

𝐿 =

𝑖=1

𝑛

𝑖 ∗ 𝑇𝑖

Hence

𝐸 𝐿 =

𝑖=1

𝑛

𝑖 ∗ 𝐸 𝑇𝑖 =

𝑖=1

𝑛

𝑖 ∗2𝑁

𝑖2

= σ𝑖=1𝑛 𝑖 ∗

2𝑁𝑖(𝑖−1)

2

= 4𝑁σ𝑖=1𝑛 1

𝑖

𝐸 𝑆 = μ𝐸 𝐿 = 4𝑁𝜇 σ𝑖=1𝑛 1

𝑖= 𝜃σ𝑖=1

𝑛 1

𝑖

T2

T3

T4

Page 19: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Expected number of segregating sites in a sample of n chrs

𝐸 𝑆 = 𝜃

𝑖=1

𝑛1

𝑖

Hence can estimate 𝜃 from the observed

number of segregating sites:

Watterson Estimator:

𝜃 𝑊 =𝑆

σ𝑖=1𝑛 1

𝑖

∃ other estimators of theta

T2

T3

T4

Page 20: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Draw 2 chromosomes at random. What is the probability that they are

different?

Looking backwards in time, hit one of 3 events:

Mutation on one chr

Mutation on other chr

Coalescence ("before" any mutations)

Pr(coalescence at time t, before mutation) = 1 −1

2𝑁

𝑡−1 1

2𝑁1 − 𝜇 2𝑡

Pr(mutation before coalescence) = 1 −1

2𝑁

𝑡2𝜇 1 − 𝜇 2𝑡−1

1 −12𝑁

𝑡

2𝜇 1 − 𝜇 2𝑡−1

1 −12𝑁

𝑡−1 12𝑁

1 − 𝜇 2𝑡 + 1 −12𝑁

𝑡

2𝜇 1 − 𝜇 2𝑡−1

Mutation could occur in either lineage

t generations, no coalescence No mutations other generations

Coalescence at tt-1 generations, no coalescence No mutations in 2t

opportunities

Page 21: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

1 −12𝑁

𝑡

2𝜇 1 − 𝜇 2𝑡−1

1 −12𝑁

𝑡−1 12𝑁

1 − 𝜇 2𝑡 + 1 −12𝑁

𝑡

2𝜇 1 − 𝜇 2𝑡−1

=

=1 −

12𝑁

𝑡−1

1 − 𝜇 2𝑡−1 1 −12𝑁

2𝜇

1 −12𝑁

𝑡−1

1 − 𝜇 2𝑡−1 1 −12𝑁

2𝜇 +12𝑁

1 − 𝜇

=

2𝑁 − 12𝑁

2𝜇

2𝑁 − 12𝑁

2𝜇 +12𝑁

1 − 𝜇=

4𝑁𝜇 − 2𝜇

4𝑁𝜇 + 1 − 3𝜇

≈𝜃

𝜃 + 1

Page 22: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Linkage Disequilibrium

• "non-random associations between alleles at different loci"

• Contrast with HWE: HWE relates to alleles A and a at the same locus

• LD statistics quantify Pr(AB haplotype) compares to Pr(A)*Pr(B) at different loci

• Important in the design and interpretation of disease mapping studies

Page 23: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Mapping disease genes• Some quick background

• Linkage

• quantify co-segregation of trait and genotype in families

• Association

• Common design: case-control sample, analyzed for allele frequency differences

cases controls

ACAC

AC

AC

AA AC

CC

AC

AC

CC

CC

CC

CC

AC

AA

AC

AA

AA

AA

LOD score traditionally

used to measure statistical

evidence for linkage

Page 24: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Association in a case-control sample

Let N = Ncase + Ncontrol . (2N observations of alleles)

Most basic test for biallelic markers: compare allele frequencies in cases vs controls in a 2x2 table.

N11 N12

N21 N22

2N

case ctrl

A1

A2

Chi-square with n-1 df (n

= # of alleles)

c2 = S(obs - exp)2

exp

Page 25: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Association in a case-control sample

Alternatives: logistic regression

Let P = probability of being a case.

Log(P/(1-P)) = a0 + (a1x1 + … + amxm) + b1G

xi are covariates (e.g. gender, age)

G represents genotype (0, 1 or 2 copies of a specified allele)

(corresponds to a log-additive, that is, multiplicative model).

Statistical test: determine the improvement in fit when the genotype term is added. (Likelihood ratio chi-square).

Page 26: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Linkage Disequilibrium

Note: the above tests should work great if the marker you

genotyped is actually the disease locus.

What if the marker is "nearby" or "correlated" with the disease

locus?

Here the concept of “linkage disequilibrium” (LD) comes in.

The International Hap Map Project / 1000 Genomes

• goal: determine the common patterns of DNA sequence

variation (LD among SNPs) in human populations

• Identifies redundancy among SNPs for more efficient

disease mapping and pharmacogenetics studies

Page 27: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Human DNA sequence variation

How to measure/describe "patterns" of DNA sequence

variation?

How to use these patterns to find disease genes that affect

phenotypes?

Page 28: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Human Sequence Variation

ancestral

chromosome

present day

chromosomes:

alleles on the preserved "ancestral background" tend to

be in linkage disequilibrium (LD)

Page 29: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Linkage Disequilibrium (LD) involves haplotype

frequencies.

Focus on pair-wise LD, SNP markers

Genotypes do not necessarily determine haplotypes:

Consider 2-locus genotype A1 A2 B1 B2 .

Two possible phases :

Page 30: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Linkage Disequilibrium (LD) involves haplotype

frequencies

Focus on pair-wise LD, SNP markers

Genotypes do not necessarily determine haplotypes:

Consider 2-locus genotype A1 A2 B1 B2 .

Two possible phases :

A1

B1

A2

B2

A1

B2

A2

B1

Page 31: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Linkage Disequilibrium

Linkage Disequilibrium (LD), aka allelic association:

For two loci A and B:

LD is said to exist when alleles at A and B tend to co-

occur on haplotypes in proportions different than

would be expected under statistical independence.

Page 32: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Linkage Disequilibrium

• How to formally measure LD between alleles at 2 loci?

Page 33: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Linkage Disequilibrium

Example: Consider 2 SNPs:

SNP 1: A 50% C 50%

SNP 2: A 50% G 50%

snp1 snp2 expected freq

4 possible haplotypes: A A 0.5 * 0.5

A G 0.5 * 0.5

C A 0.5 * 0.5

C G 0.5 * 0.5

But perhaps in your sample you observe only the following:

A A C C A T A T C ... C G A T T ...

and

A A C C C T A T C ... C A A T T ...

A G Total

A 0 50 50

C 50 0 50

Total 50 50 100

e.g.

Page 34: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

To measure LD between alleles at 2 biallelic loci

Locus A Locus B

A1, A2 B1, B2

Given 2N haplotypes:

Haplotype freq for AiBj is

Compare hij to the frequency expected under no association:

Define the disequilibrium coefficient:

n11 n12

n21 n22

2N

B1 B2

A1

A2

Page 35: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

1. D = h11 - pA1pB1 = h22 - pA2pB2

2. Choice of allele labeling may affect

sign but not absolute value of D.n11 n12

n21 n22

2N

B1 B2

A1

A2

Notes:

Page 36: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Common LD measures

Disequilibrium coefficient:

D = h11 - pA1 pB1

Normalized disequilibrium coefficient:

D' = D / |D|max , where

Range of D' is [-1,1]

Correlation coefficient:

r2 = D2 / ( pA1pA2pB1pB2 )

Page 37: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

LD measures

|D'| is 1 when the alleles of the two markers are as

correlated as they can be, given the allele frequencies of

the co-occuring alleles.

The range of r2 depends on the marker allele frequencies.

r2 equals 1 if and only if 1) the MAFs at the two loci match

and 2) the minor alleles always co-occur

D' : useful for identifying regions of reduced recombination.

r2 : useful for identifying markers that are good predictors of

allelic status at other markers.

Page 38: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Using LD in study design

• Reference populations - and their LD/haplotype

patterns - are used to design “tag SNPs”, impute

un-typed variants

• 1000 Genomes Project (1000G)

• Haplotype Reference Consortium (HRC)

• Previously: HapMap: Phase I 2003, Phase II 2007,

Phase III

Page 39: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Using LD in study design

The International HapMap Project, Nature 2003

Page 40: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Using LD in study design

A popular LD tag method:

• "r2 bin tags" (Carlson et al., 2004): greedy algorithm

that identifies bins of SNPs such that at least one

member of each bin has r2 > T (threshold) with all bin

members.

• Note: bin members are not necessarily contiguous

LD patterns inform the design of SNP genotyping

arrays, selection of "tag" SNPs

Page 41: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Thus knowledge of LD patterns is important for disease

gene mapping.

Note: tight linkage between two loci will tend to maintain

linkage disequilibrium.

Page 42: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Decay of linkage disequilibrium

After k generations, disequilibrium decays

according to

where = the recombination fraction

(assuming random mating). (DIFFERENT !)

h11(1)= (1-) h11(0) + pA1pB1 ,

so at generation 1,

D = h11 - pA1pB1 = (1-) (h11(0) - pA1pB1)

Page 43: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

?

?

Ai

Bj

claim: hij(1)= (1-) hij(0) + pAipBj ,

nonrecombinant recombinant

?

Bj

Ai

?

(1-) hij(0) pAipBj

so at generation 1,

D = h11 - pA1pB1 = (1-) (h11(0) - p1q1) = (1-) D0

after k generations, get:

Page 44: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Therefore, after k generations

hijk - piqj = (1 – θ)k (hijo – piqj)

Disequilibrium decays by a factor of (1 – θ)

Note: After 1 generation, genotype frequencies at a single

locus are in equilibrium, haplotype frequencies are not!

Decay of linkage disequilibrium

Page 45: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

c.f. Pak Sham, Statistics in Human Genetics, Chapter 4

Decay of linkage disequilibrium

Page 46: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

How many generations (t) to reduce by ½?

(1 – θ)t (hijo – piqj) = ½ (hijo – piqj)

(1 – θ)t = ½

t log (1 – θ) = log(0.5)

t = log(0.5) / log(1- θ )

e.g. if θ = 0.5 (loci are unlinked) then

t = log(0.5) / log(0.5) = 1

Half-life of linkage disequilibrium

Page 47: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

LD is not a simple monotonic function of physical

distance:

From Taillon-Miller et al., Nat Genet 2000 (O=Xq25, =Xq28)

Page 48: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

LD is not necessarily a monotonic function of distanceDawson et al., Nature 2002 (chromosome 22)

Page 49: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Where does LD come from?

• Potential sources of LD :

1. Linkage between loci

2. Random drift

3. Founder effect

4. Mutation

5. Selection

6. Population admixture / stratification

Page 50: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Linkage Disequilibrium

• Potential sources of LD :

1. Linkage between loci

2. Random drift

3. Founder effect

4. Mutation

5. Selection

6. Population admixture / stratification

Page 51: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Genetic drift generates LD (|D| > 0)

• Via random changes in gamete frequencies

• Smaller isolates: slower decay of LD

A. Templeton, Human Population Genetics and Genomics, Chapter 4

Page 52: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Linkage Disequilibrium

• Potential sources of LD :

1. Linkage between loci

2. Random drift

3. Founder effect

4. Mutation

5. Selection

6. Population admixture / stratification

Page 53: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Linkage Disequilibrium

Suppose have loci A, B, C in that order.

Due to founder effect, suppose sample only 4 haplotypes out of the 8 possible:

A B C

1 1 1

1 2 1

2 1 2

2 2 2

Note:

A and B are in equilibrium

B and C are in equilibrium

A and C are in complete disequilibrium

Disequilibrium not necessarily related to distance!

Page 54: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Linkage Disequilibrium

• Potential sources of LD :

1. Linkage between loci

2. Random drift

3. Founder effect

4. Mutation

5. Selection

6. Population admixture / stratification

At the appearance of the mutation, that allele occurs

only on one haplotype background

Page 55: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Linkage Disequilibrium

• Potential sources of LD :

1. Linkage between loci

2. Random drift

3. Founder effect

4. Mutation

5. Selection

6. Population admixture / stratification

Page 56: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

An example of spurious association due to admixture/stratification:

population 1 population 2

9 1 10

81 9 90

90 10 100

25 25 50

25 25 50

50 50 100

chi-square = 0 chi-square = 0

34 26 60

106 34 140

140 60 200

combined

chi-square = 7.26

p-value = 0.007

Page 57: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

Describing empirical LD patterns

Haploview output

Dick et al., 2007

Page 58: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring

A first generation linkage disequilibrium map of chromosome 22Dawson et al., Nature 2002

1504 SNPs analyzed in 2 distinct samples

Nature 2007

Nature 2003: HapMap I (genome-wide)

Page 59: Population Genetics, Lecture 2genetics.wustl.edu/bio5488/files/2019/03/genomics...2019/03/20  · Population Genetics, Lecture 2 Nancy Lim Saccone nlims@wustl.edu Bio 5488, Spring