Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium November 12, 2012

Last Time

• Sequence data and quantification of variation
  • Infinite sites model
  • Nucleotide diversity (π)

• Sequence-based tests of neutrality
  • Tajima’s D
  • Hudson-Kreitman-Aguadé
  • Synonymous versus nonsynonymous substitutions
  • McDonald-Kreitman

Today

• Signatures of selection based on synonymous and nonsynonymous substitutions

• Multiple loci and independent segregation

• Estimating linkage disequilibrium

Using Synonymous Substitutions to Control for Factors Other Than Selection

dN/dS or Ka/Ks Ratios

Types of Mutations (Polymorphisms)

Synonymous versus Nonsynonymous SNP

• First- and second-position SNPs often change the amino acid

• Third-position SNPs are often synonymous
  • UCA, UCU, UCG, and UCC all code for serine

• Majority of positions are nonsynonymous

• Not all amino acid changes affect fitness: allozymes
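The position effect described above can be checked directly against the standard genetic code. The following is a minimal sketch, not part of the original lecture: it writes codons with DNA bases (T instead of U), starts only from sense codons, and counts a change to a stop codon as nonsynonymous. The names `CODON_TABLE` and `synonymous_fraction_by_position` are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_fraction_by_position():
    """Fraction of single-base changes that keep the amino acid, per codon position."""
    fractions = []
    for pos in range(3):
        synonymous = total = 0
        for codon, aa in CODON_TABLE.items():
            if aa == "*":  # assumption: start only from sense codons
                continue
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                synonymous += CODON_TABLE[mutant] == aa
        fractions.append(synonymous / total)
    return fractions


first, second, third = synonymous_fraction_by_position()
print(f"position 1: {first:.3f}, position 2: {second:.3f}, position 3: {third:.3f}")
```

Under these assumptions the third position has by far the largest synonymous fraction, and most single-base changes overall are nonsynonymous, in line with the bullets above.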

Synonymous & Nonsynonymous Substitutions

• Synonymous substitution rate can be used to set neutral expectation for nonsynonymous rate

• dS is the relative rate of synonymous mutations per synonymous site

• dN is the relative rate of nonsynonymous mutations per nonsynonymous site

• ω = dN/dS
  • If ω = 1, neutral evolution
  • If ω < 1, purifying selection
  • If ω > 1, positive Darwinian selection

• For human genes, ω ≈ 0.1
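As a rough illustration of how ω is assembled from counts, the sketch below uses a simple counting approach with a Jukes-Cantor correction for multiple hits. This is an assumption chosen for brevity, not the likelihood-based estimation used in practice. The counts of differences and sites are hypothetical, and the function names are illustrative.

```python
import math


def jukes_cantor(p):
    """Correct a proportion of differing sites for multiple hits (Jukes-Cantor)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def omega(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (dN, dS, dN/dS) from counts of differences and counts of sites."""
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    d_s = jukes_cantor(syn_diffs / syn_sites)
    return d_n, d_s, d_n / d_s


def interpret(w):
    """Map a point estimate of omega to the categories on the slide."""
    if w < 1:
        return "purifying"
    if w > 1:
        return "positive"
    return "neutral"


# Hypothetical counts for a single gene compared between two species
d_n, d_s, w = omega(nonsyn_diffs=12, nonsyn_sites=700, syn_diffs=45, syn_sites=300)
print(f"dN = {d_n:.4f}, dS = {d_s:.4f}, omega = {w:.3f} ({interpret(w)})")
```

A point estimate alone does not establish selection; real analyses also test whether ω differs significantly from 1.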

Complications in Estimating dN/dS

• Multiple mutations in a codon give multiple possible paths, for example:

  CGT(Arg) -> AGA(Arg)

  CGT(Arg) -> AGT(Ser) -> AGA(Arg)

  CGT(Arg) -> CGA(Arg) -> AGA(Arg)

• The two types of nucleotide substitutions that produce SNPs, transitions and transversions, are not equally likely

  http://www.mun.ca/biology/scarr/Transitions_vs_Transversions.html

• Back-mutations are invisible

• Complex evolutionary models using likelihood and Bayesian approaches must be used to estimate dN/dS (also called KA/KS or KN/KS depending on method) (PAML package)
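The multiple-path problem can be made concrete by enumerating every order in which the differing positions could have changed. This is a small sketch under the assumption that each position changes exactly once and directly to its final base. `mutational_paths` is an illustrative name, and amino acids are printed as one-letter codes (R = Arg, S = Ser).

```python
from itertools import permutations, product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def mutational_paths(start, end):
    """Yield (path, n_synonymous, n_nonsynonymous) for each order of single-base changes."""
    differing = [i for i in range(3) if start[i] != end[i]]
    for order in permutations(differing):
        codon, path, syn, nonsyn = start, [start], 0, 0
        for pos in order:
            mutant = codon[:pos] + end[pos] + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                syn += 1
            else:
                nonsyn += 1
            codon = mutant
            path.append(codon)
        yield path, syn, nonsyn


for path, syn, nonsyn in mutational_paths("CGT", "AGA"):
    labelled = " -> ".join(f"{c}({CODON_TABLE[c]})" for c in path)
    print(f"{labelled}: {syn} synonymous, {nonsyn} nonsynonymous")
```

For CGT to AGA, one path consists of two nonsynonymous steps and the other of two synonymous steps, which is why the observed difference alone cannot be classified without a model of how paths are weighted.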

dN/dS ratios for 363 mouse-rat comparisons (Hartl and Clark 2007)

[Figure annotation: interleukin-3 (mast cells and bone marrow cells in immune system)]

• Most genes show purifying selection (dN/dS < 1)

• Some evidence of positive selection, especially in genes related to the immune system

McDonald-Kreitman Test

• Conceptually similar to HKA test

• Uses only one gene

• Contrasts the ratio of synonymous divergence to polymorphism with the ratio of nonsynonymous divergence to polymorphism

• Gene provides internal control for evolution rates and demography
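The contrast above amounts to a 2x2 table of nonsynonymous versus synonymous counts, split into fixed differences (divergence) and polymorphisms. The sketch below computes a neutrality index and a two-sided Fisher's exact test using only the standard library. The counts are illustrative and not taken from the lecture, and the choice of Fisher's exact test (rather than, say, a G-test) is an assumption.

```python
from math import comb


def neutrality_index(d_n, d_s, p_n, p_s):
    """(Pn/Ps) / (Dn/Ds); values below 1 suggest excess nonsynonymous divergence."""
    return (p_n / p_s) / (d_n / d_s)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell given fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-9)
    )


# Illustrative counts: rows are nonsynonymous/synonymous, columns are fixed/polymorphic
D_N, P_N = 7, 2
D_S, P_S = 17, 42

ni = neutrality_index(D_N, D_S, P_N, P_S)
p_value = fisher_exact_two_sided(D_N, P_N, D_S, P_S)
print(f"neutrality index = {ni:.3f}, Fisher exact p = {p_value:.4f}")
```

With these counts the neutrality index is well below 1, the pattern expected when nonsynonymous changes have been fixed by positive selection.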

Application of McDonald-Kreitman Test:

• Aligned 11,624 gene sequences between human and chimp

• Calculated synonymous and nonsynonymous substitutions between species (divergence) and within humans (SNPs)

• Identified 304 genes showing evidence of positive selection (blue) and 814 genes showing purifying selection (red) in humans

• Positive selection: defense/immunity, apoptosis, sensory perception, and transcription factors

• Purifying selection: structural and housekeeping genes

Bustamante et al. 2005. Nature 437, 1153-1157

Genes showing purifying (red) or positive (blue) selection in the human genome based on the McDonald-Kreitman Test

Bustamante et al. 2005. Nature 437, 1153-1157

How can you differentiate between effects of selection and demographic effects on sequence variation?

Will this work for organellar DNA?

Extending to Multiple Loci

• So far, only considering dynamics of alleles at single loci

• Loci occur on chromosomes, linked to other loci!

  “The fitness of a single locus ripped from its interactive context is about as relevant to real problems of evolutionary genetics as the study of the psychology of individuals isolated from their social context is to an understanding of man’s sociopolitical evolution” Richard Lewontin (quoted in Hedrick 2005)

• Size of region that must be considered depends on linkage disequilibrium

Gametic (Linkage) Disequilibrium (LD)

• Nonrandom association of alleles at different loci into gametes

• Haplotype: genotype of a group of closely linked loci

• LD is a major factor in evolution

• LD itself provides insights into population history

• Estimation of LD is critical for ALL population genetic data

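Nomenclature and concepts

• Two loci, two alleles

• Frequency of allele i at locus 1 is p_i

• Frequency of allele i at locus 2 is q_i

[Figure: locus 1 with alleles A1 and A2 (frequencies p1, p2); locus 2 with alleles B1 and B2 (frequencies q1, q2)]

$$\sum_{i=1}^{n} p_i = \sum_{i=1}^{n} q_i = 1$$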

Nomenclature and concepts

• Genotype is written as A1B1/A2B2

[Figure: one chromosome carrying A1 and B1, the homologous chromosome carrying A2 and B2]

• A1 and B1 are in coupling phase

• A1 and B2 are in repulsion phase

Gametic Disequilibrium

• Easiest to think about for physically linked loci, but physical linkage is not required

What are the expected frequencies of gametes in a population under independent assortment?

[Figure: genotype A1B1/A2B2 undergoes meiosis to produce four gamete types]

Gamete    Expected frequency
A1B1      p1q1
A1B2      p1q2
A2B1      p2q1
A2B2      p2q2
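Under independent assortment each gamete frequency is simply the product of its two allele frequencies. A minimal sketch follows, assuming two alleles per locus so that p2 = 1 - p1 and q2 = 1 - q1. The function name and the example frequencies are illustrative.

```python
def expected_gamete_frequencies(p1, q1):
    """Expected gamete frequencies when alleles at the two loci associate at random."""
    p2, q2 = 1.0 - p1, 1.0 - q1
    return {
        "A1B1": p1 * q1,
        "A1B2": p1 * q2,
        "A2B1": p2 * q1,
        "A2B2": p2 * q2,
    }


# Hypothetical allele frequencies
expected = expected_gamete_frequencies(p1=0.6, q1=0.3)
for gamete, freq in expected.items():
    print(f"{gamete}: {freq:.2f}")
```

The four products always sum to 1, so any departure of observed gamete frequencies from these values must be balanced across the four gamete types.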

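What are the expected frequencies of gametes with complete linkage?

[Figure: locus 1 with alleles A1 and A2 (frequencies p1, p2); locus 2 with alleles B1 and B2 (frequencies q1, q2); genotype A1B1/A2B2 undergoes meiosis]

Gamete    Frequency
A1B1      x11
A1B2      x12
A2B1      x21
A2B2      x22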

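Linkage disequilibrium measure, D

Independent assortment:

$$x_{11} = p_1 q_1, \quad x_{12} = p_1 q_2, \quad x_{21} = p_2 q_1, \quad x_{22} = p_2 q_2$$

With LD:

$$x_{11} = p_1 q_1 + D, \quad x_{12} = p_1 q_2 - D, \quad x_{21} = p_2 q_1 - D, \quad x_{22} = p_2 q_2 + D$$

Substituting from above table:

$$D = x_{11} x_{22} - x_{12} x_{21}$$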
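A short numerical check of the definition of D, using hypothetical gamete frequencies. It also confirms that D equals the excess of A1B1 gametes over the independent-assortment expectation p1q1. The function names are illustrative.

```python
def linkage_disequilibrium(x11, x12, x21, x22):
    """D = x11*x22 - x12*x21 from the four gamete frequencies."""
    return x11 * x22 - x12 * x21


def allele_frequencies(x11, x12, x21, x22):
    """Return (p1, q1): frequency of A1 and of B1 implied by the gamete frequencies."""
    return x11 + x12, x11 + x21


# Hypothetical gamete frequencies for A1B1, A1B2, A2B1, A2B2 (must sum to 1)
gametes = (0.5, 0.1, 0.1, 0.3)

D = linkage_disequilibrium(*gametes)
p1, q1 = allele_frequencies(*gametes)
print(f"D = {D:.2f}, p1 = {p1:.1f}, q1 = {q1:.1f}, x11 - p1*q1 = {gametes[0] - p1 * q1:.2f}")
```

Both expressions give the same value here, which is the sense in which D measures the departure of gamete frequencies from random association.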

Problem: D is sensitive to allele frequencies

• Can’t have negative gamete frequencies

• Maximum D set by allele frequencies

Example, if D is positive:

• p1 = 0.5, q2 = 0.5: Dmax = 0.25
• p1 = 0.1, q2 = 0.9: Dmax = 0.09

Solution: D' = D/Dmax ranges from -1 to 1

Dmax calculation:

• If D is positive, Dmax is the lesser of p1q2 or p2q1
• If D is negative, Dmax is the lesser of p1q1 or p2q2
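The Dmax rules translate directly into code. The sketch below reproduces the two Dmax values from the example above and computes D' for a hypothetical D. The function names are illustrative, and returning 0 when Dmax is 0 (a monomorphic locus) is an assumption of this sketch.

```python
def d_max(d, p1, q1):
    """Largest magnitude D can take, given its sign and the allele frequencies."""
    p2, q2 = 1.0 - p1, 1.0 - q1
    if d >= 0:
        return min(p1 * q2, p2 * q1)
    return min(p1 * q1, p2 * q2)


def d_prime(d, p1, q1):
    """Standardized disequilibrium D' = D / Dmax, on a -1 to 1 scale."""
    limit = d_max(d, p1, q1)
    return d / limit if limit > 0 else 0.0  # assumption: report 0 for a monomorphic locus


# Slide example with positive D: q2 = 0.5 means q1 = 0.5; q2 = 0.9 means q1 = 0.1
print(d_max(0.01, p1=0.5, q1=0.5))
print(d_max(0.01, p1=0.1, q1=0.1))

# Hypothetical D with p1 = q1 = 0.6
print(d_prime(0.14, p1=0.6, q1=0.6))
```

Because D' divides out the frequency-dependent ceiling, values from loci with very different allele frequencies can be compared on the same scale.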

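LD can also be estimated as correlation between alleles

$$r = \frac{D}{\sqrt{p_1 p_2 q_1 q_2}} \qquad \left(\text{equivalently } r^2 = \frac{D^2}{p_1 p_2 q_1 q_2}\right)$$

• r can also be standardized to a -1 to 1 scale

• It is equivalent to D' in this case

$$r' = \frac{D / \sqrt{p_1 p_2 q_1 q_2}}{D_{max} / \sqrt{p_1 p_2 q_1 q_2}} = D'$$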
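The equivalence between standardized r and D' follows because the square-root term cancels. The sketch below checks this numerically with hypothetical values. The function names are illustrative, and the Dmax rule is the one given earlier in the lecture.

```python
import math


def r_correlation(d, p1, q1):
    """Correlation between alleles at two loci: r = D / sqrt(p1*p2*q1*q2)."""
    p2, q2 = 1.0 - p1, 1.0 - q1
    return d / math.sqrt(p1 * p2 * q1 * q2)


def d_max(d, p1, q1):
    """Largest magnitude D can take, given its sign and the allele frequencies."""
    p2, q2 = 1.0 - p1, 1.0 - q1
    return min(p1 * q2, p2 * q1) if d >= 0 else min(p1 * q1, p2 * q2)


# Hypothetical values
D, p1, q1 = 0.05, 0.6, 0.3

r = r_correlation(D, p1, q1)
r_limit = r_correlation(d_max(D, p1, q1), p1, q1)  # r evaluated at D = Dmax
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
print(f"standardized r = {r / r_limit:.4f}, D' = {D / d_max(D, p1, q1):.4f}")
```

Unstandardized r only reaches 1 or -1 when allele frequencies at the two loci match, which is why r and D' can differ for the same data.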

Recombination

• Shuffling of parental alleles during meiosis

• Occurs for unlinked loci and linked loci

• Rate of recombination for linked markers is partially a function of physical distance

[Figure: chromosomes carrying alleles A1, A2, B1, and B2 before and after recombination]

What is the expected recombination rate for unlinked loci?

[Figure: genotype A1B1/A2B2 undergoes meiosis to produce gametes A1B1 and A2B2 (coupling) and A1B2 and A2B1 (repulsion)]

$$c = \frac{n_r}{n_r + n_c}$$

Where nr is number of repulsion phase gametes, and nc is number of coupling phase gametes
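The recombination rate is the fraction of gametes that are in repulsion phase. The sketch below applies the formula to hypothetical gamete counts from an A1B1/A2B2 parent. The function name and all counts are illustrative.

```python
def recombination_rate(n_repulsion, n_coupling):
    """c = nr / (nr + nc), the fraction of repulsion-phase gametes."""
    return n_repulsion / (n_repulsion + n_coupling)


# Hypothetical gamete counts from an A1B1/A2B2 parent
linked = {"A1B1": 40, "A2B2": 38, "A1B2": 12, "A2B1": 10}
c_linked = recombination_rate(
    n_repulsion=linked["A1B2"] + linked["A2B1"],
    n_coupling=linked["A1B1"] + linked["A2B2"],
)

# All four gamete types equally common, as with independent assortment
c_unlinked = recombination_rate(n_repulsion=50, n_coupling=50)

print(f"linked example: c = {c_linked:.2f}")
print(f"equal gamete counts: c = {c_unlinked:.2f}")
```

When all four gamete types are equally common the formula gives c = 0.5, the value the lecture later associates with independent segregation.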

LD is partially a function of recombination rate

• Expected proportions of gametes produced by various genotypes over two generations

[Table: gamete proportions for the first generation, with second-generation values in parentheses]

Where c is the recombination rate and D0 is the initial amount of LD

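Recombination degrades LD over time

$$D_1 = x'_{11} x'_{22} - x'_{12} x'_{21}$$

$$D_1 = (x_{11} - cD_0)(x_{22} - cD_0) - (x_{12} + cD_0)(x_{21} + cD_0)$$

$$D_1 = (1 - c) D_0$$

$$D_t = (1 - c)^t D_0 \approx e^{-ct} D_0$$

Where t is time (in generations) and e is base of natural log (2.718); the exponential form is an approximation that holds for small c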
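The one-generation step of this derivation can be verified numerically: shifting each gamete frequency by cD0 and recomputing D gives (1 - c)D0. The sketch below uses hypothetical gamete frequencies and recombination rates, and the function names are illustrative.

```python
import math


def next_generation(x11, x12, x21, x22, c):
    """Gamete frequencies after one generation of recombination at rate c."""
    d0 = x11 * x22 - x12 * x21
    return x11 - c * d0, x12 + c * d0, x21 + c * d0, x22 - c * d0


def ld_after(d0, c, t):
    """Exact decay: D_t = (1 - c)^t * D_0."""
    return (1.0 - c) ** t * d0


def ld_after_approx(d0, c, t):
    """Exponential approximation for small c: D_t = exp(-c*t) * D_0."""
    return math.exp(-c * t) * d0


# Hypothetical starting gamete frequencies (D0 = 0.14) and recombination rate
x = (0.5, 0.1, 0.1, 0.3)
d0 = x[0] * x[3] - x[1] * x[2]
y = next_generation(*x, c=0.2)
d1 = y[0] * y[3] - y[1] * y[2]
print(f"D0 = {d0:.3f}, D1 = {d1:.3f}, (1 - c) * D0 = {0.8 * d0:.3f}")

# Exact versus approximate decay after 100 generations with c = 0.01
print(ld_after(d0, 0.01, 100), ld_after_approx(d0, 0.01, 100))
```

The exact and exponential forms agree closely when c is small and diverge as c approaches 0.5, so the exact form is the safer choice for loosely linked or unlinked loci.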

Effects of recombination rate on LD

• Decline in LD over time with different theoretical recombination rates (c)

• Even with independent segregation (c=0.5), multiple generations required to break up allelic associations

• Genome-wide linkage disequilibrium can be caused by demographic factors (more later)
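To see how many generations are needed, the decay relation D_t = (1 - c)^t D_0 can be solved for t. The sketch below does this for a few recombination rates. The target of 1% of the initial LD is an arbitrary threshold chosen for illustration, and the function name is hypothetical.

```python
import math


def generations_to_fraction(fraction, c):
    """Smallest whole number of generations t with (1 - c)^t <= fraction."""
    return math.ceil(math.log(fraction) / math.log(1.0 - c))


# Generations needed for LD to fall to 1% of its starting value
for c in (0.5, 0.1, 0.01):
    t = generations_to_fraction(0.01, c)
    print(f"c = {c}: {t} generations")
```

Even at c = 0.5, LD is only halved each generation, so several generations pass before an initial association becomes negligible, while tightly linked loci retain it for hundreds of generations.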