Introduction to QTL analysis Peter Visscher University of Edinburgh peter.visscher@ed.ac.uk

Overview

• Principles of QTL mapping

• QTL mapping using sibpairs

• IBD estimation from marker data

• Improving power
– ML variance components
– Selective genotyping
– Large(r) pedigrees

[Figure: trait distributions for the QTL genotypes QQ, Qq and qq, with genotypic values m-a, m+d and m+a marked on the trait axis] [Fisher, Wright]

Quantitative Trait Locus = a segment of DNA that affects a quantitative trait

Mapping QTL

• Determining the position of a locus causing variation in the genome.

• Estimating the effect of the alleles and mode of action.

Why map QTL?

• To provide knowledge towards a fundamental understanding of individual gene actions and interactions

• To enable positional cloning of the gene

• To improve breeding value estimation and selection response through marker assisted selection (plants, animals)

Science; Medicine; Agriculture

Principles of QTL mapping

• Co-segregation of QTL alleles and linked marker alleles in pedigrees

[Figure: a pair of chromosomes carrying the unobserved QTL alleles (Q, q) and the observed, linked marker alleles (M, m)]

Linkage = Co-segregation

[Figure: pedigree with marker genotypes A2A4, A3A4, A1A3, A1A2, A2A3, A1A2, A1A4, A3A4, A3A2. Marker allele A1 cosegregates with dominant disease]

Recombination

[Figure: parental genotypes at the marker (A1, A2) and the QTL (Q1, Q2), with the likely gametes (non-recombinants) and the unlikely gametes (recombinants)]

"Linkage analysis = counting recombinants"

Map distance

Map distance between two loci (Morgans)

= Expected number of crossovers per meiosis

Note: Map distances are additive. Recombination frequencies are not.

1 Morgan = 100 cM; 1 cM ~ 1 Mb

Recombination & map distance

Haldane (1919) map function, for map distance m in Morgans:

$r = \tfrac{1}{2}\left(1 - e^{-2m}\right)$

[Figure: recombination fraction (0 to 0.5) against map distance (0 to 1 M)]
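The Haldane map function and the earlier note that map distances add while recombination fractions do not can be checked numerically. The following is a minimal sketch; the function names `haldane_r` and `haldane_m` are illustrative, not from the slides.

```python
import math

def haldane_r(m):
    """Recombination fraction for a map distance m in Morgans (Haldane)."""
    return 0.5 * (1.0 - math.exp(-2.0 * m))

def haldane_m(r):
    """Inverse map function: distance in Morgans for a fraction r < 0.5."""
    return -0.5 * math.log(1.0 - 2.0 * r)

# Two adjacent 10 cM intervals span 20 cM, but the recombination
# fraction over 20 cM is smaller than twice the 10 cM fraction.
r_10cM = haldane_r(0.10)
r_20cM = haldane_r(0.20)
print(round(r_10cM, 4), round(r_20cM, 4), round(2 * r_10cM, 4))
```

The gap between the last two printed values is the contribution of double crossovers, which go undetected as recombinants.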

Principles of QTL mapping

• Co-segregation of phenotypes and genotypes in pedigrees
– Genetic markers give information on IBD sharing between relatives [genotypes]
– Association between phenotypes and genotypes gives information on QTL location and effect [linkage]

• Need informative mapping population

Mapping populations

| Population | Features | Example species |
|---|---|---|
| Inbred lines | | |
| Backcross (BC) | Simplest design; powerful if dominance in 'right' direction | mice, plants |
| F2 | Estimation of additive and dominance effects; more powerful than BC for additive effects | mice, rats |
| Advanced intercross line (AIL) | As for F2 but with increased resolution of map location | mice |
| Recombinant inbred lines (RIL) | F1 followed by inbreeding; homozygous comparisons only; powerful for additive effects; less environmental noise | mice, plants |
| Congenic lines (= nearly isogenic lines) | Backcrossing followed by inbreeding; homozygous comparisons only after inbreeding. Lines contain ~1% of donor genome | mice, rats, plants |
| Double haploid lines (DHL) | Instant homozygosity through doubling of F1 gametes; homozygous comparisons only; powerful for additive effects and QTLxE interactions | plants |
| F2:3 | Inbred progeny of F2; increased precision through progeny means | plants |
| Structured outbred populations | | |
| BC / F2 / AIL | As for inbred lines; mapping variation between lines | livestock, outbreeding trees/plants |
| Large fullsib families | Estimating contrasts between parental alleles. Allows for dominance estimation. | trees, fish, poultry |
| Halfsib families | Estimating contrasts between common parent alleles | cattle, pigs, poultry, trees |
| Nuclear families, including sibpairs | Detection of variance explained by markers | humans, livestock |
| Unstructured outbred populations | | |
| Complex pedigrees | Detection of variance explained by markers | humans, livestock |

Informative pig pedigree

[Figure: three-generation Large White x Meishan cross (founders, F1, F2), with marker alleles (1, 2) and QTL alleles (Q, q) traced through the pedigree. ©Roslin Institute]

Line cross

• Only two QTL alleles segregating

• QTL effect can be estimated as the mean difference between genotype groups

• Power depends on sample size & effect of QTL

• Ascertain divergent lines

• Resolution of QTL map is low: ~10-40 Mb

Figure 1.2: Sample size for QTL detection in genome scans

[Figure: sample size (0 to 4000) against proportion of variance due to QTL (0 to 0.25), with curves for an additive QTL and a dominant QTL. α = 0.0001, power = 90%, F2 population]

Outbred populations: Complications

Markers not fully informative (segregating in the parental generation)

QTL not segregating in all families (All F1 segregate in inbred line cross)

Association between marker and QTL at the family rather than population level (i.e. linkage phase differs between families)

Additional variance between families due to other loci

Line cross vs. outbred population

| | Cross | Outbred |
|---|---|---|
| # QTL alleles | 2 | 2 |
| # Generations | 3 | 2 |
| Required sample size | 100s | 1000s |
| QTL estimation | Mean | Variance |

QTL as a random effect

$y_i = \mu + Q_i + A_i + E_i$

$Q_i$ = QTL genotype contribution for chrom. segment

$A_i$ = Contribution from rest of genome

$\mathrm{var}(y) = \sigma_q^2 + \sigma_a^2 + \sigma_e^2$

Logical extension of linear models used during the course

• This week: partitioning (co)variances into (causal) components

• QTL mapping: partitioning genetic variance into underlying components
– Linkage analysis: dissecting within-family genetic variation

Genetic covariance between relatives

$\mathrm{cov}(y_i, y_j) = \pi_{ij}\sigma_q^2 + a_{ij}\sigma_a^2$

$a_{ij}$ = average prop. of alleles shared in the genome (kinship matrix)

$\pi_{ij}$ = proportion of alleles IBD at QTL (0, ½ or 1)

$E(\pi_{ij}) = a_{ij}$

$\pi_{ij}$ = Pr(2 alleles IBD) + ½ Pr(1 allele IBD) = proportion of alleles IBD in non-inbred pedigree

Estimate $\pi_{ij}$ with genetic markers

Fully informative marker

• Determine IBD sharing between sibpairs unambiguously

• Example: Dad = 1/2, Mum = 3/4
– Transmitted allele from Dad is either 1 or 2
– Transmitted allele from Mum is either 3 or 4

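Sibpairs & fully informative marker

| # Alleles IBD | π | Pr. |
|---|---|---|
| 0 | 0 | ¼ |
| 1 | ½ | ½ |
| 2 | 1 | ¼ |

$E(\pi) = \sum \pi \Pr(\pi) = \tfrac{1}{2}$

$E(\pi^2) = \sum \pi^2 \Pr(\pi) = \tfrac{3}{8}$

$\mathrm{var}(\pi) = E(\pi^2) - E(\pi)^2 = \tfrac{1}{8}$

$\mathrm{CV} = \sqrt{\mathrm{var}(\pi)} / E(\pi) = \sqrt{0.5} \approx 70\%$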
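The moments of IBD sharing for a full-sib pair follow directly from the probabilities ¼, ½, ¼. A short sketch using exact fractions is given here; the names `ibd_dist` and `moments` are illustrative.

```python
from fractions import Fraction as F

# IBD sharing for a non-inbred full-sib pair: pi value -> probability
ibd_dist = {F(0): F(1, 4), F(1, 2): F(1, 2), F(1): F(1, 4)}

def moments(dist):
    """Return E(pi), E(pi^2) and var(pi) for a discrete distribution."""
    mean = sum(pi * p for pi, p in dist.items())
    second = sum(pi**2 * p for pi, p in dist.items())
    return mean, second, second - mean**2

mean, second, var = moments(ibd_dist)
cv = float(var) ** 0.5 / float(mean)  # coefficient of variation
print(mean, second, var, round(cv, 3))
```

The large coefficient of variation is what makes sib pairs informative: pairs differ substantially in how much they share at any one locus.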

Haseman-Elston (1972)

“The more alleles pairs of relatives share at a QTL, the greater their phenotypic similarity”

or

“The more alleles they share IBD, the smaller the difference in their phenotype”

Population sib-pair trait distribution

[Figure: two panels, "No linkage" and "Under linkage"]

Sib pair (or DZ twins) design to map QTL

• Multiple ‘families’ of two (or more) sibs

• Phenotypes on sibs

• Marker genotypes on sibs (& parents)

• Correlate phenotypes and genotypes of sibs

Data structure is simple

| Pair | Phenotypes | Prop. alleles IBD |
|---|---|---|
| 1 | y11, y12 | π1 |
| 2 | y21, y22 | π2 |
| ... | ... | ... |
| n | yn1, yn2 | πn |

π = 0, ½ or 1 for fully informative markers

Notation

Y

$D = (y_1 - y_2)$

$D^2 = (y_1 - y_2)^2$

$S = [(y_1 - \mu) + (y_2 - \mu)]$

$S^2 = [(y_1 - \mu) + (y_2 - \mu)]^2$

$CP = (y_1 - \mu)(y_2 - \mu)$

Proposed analysis…

| Data | Method | Reference |
|---|---|---|
| y1 & y2 | ML 'LOD' | Parametric linkage analysis |
| D² | Regression | Haseman & Elston (1972) |
| D² & S² | Regression | Drigalenko (1998); Xu et al. (2000); Sham & Purcell (2001); Forrest (2001) |
| CP | Regression | Elston et al. (2000) |
| y1 & y2 | ML VC | Goldgar (1990); Schork (1993) |
| D | ML | Kruglyak & Lander (1995) |
| D & S | ML VC | Fulker & Cherny (1996); Wright (1997) |

Properties of squared differences

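$E[(Y_1 - Y_2)^2] = \mathrm{var}(Y_1 - Y_2) + [E(Y_1 - Y_2)]^2$

$\mathrm{var}(Y_1 - Y_2) = \mathrm{var}(Y_1) + \mathrm{var}(Y_2) - 2\,\mathrm{cov}(Y_1, Y_2)$

If $E(Y_i) = 0$ and $\mathrm{var}(Y_1) = \mathrm{var}(Y_2)$, then

$E[(Y_1 - Y_2)^2] = 2(1 - r)\,\mathrm{var}(Y)$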

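Haseman-Elston method

• Phenotype on relative pair j:

$Y_j = (y_{1j} - y_{2j})^2$

$E(Y_j) = E[(Q_{1j} - Q_{2j} + A_{1j} - A_{2j} + e_{1j} - e_{2j})^2]$

$= E[(Q_{1j} - Q_{2j})^2] + \{2(1 - a_{ij})\sigma_a^2 + 2\sigma_e^2\}$

$= 2[\sigma_q^2 - \mathrm{cov}(Q_{1j}, Q_{2j})] + \{\sigma_\varepsilon^2\}$

$= (2\sigma_q^2 + \sigma_\varepsilon^2) - 2\pi_{jt}\sigma_q^2$

$\pi_{jt}$ = proportion of alleles IBD at QTL (trait, t) for relative pair j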

Conditional expectation

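$E(Y_j \mid \pi_{jt}) = (2\sigma_q^2 + \sigma_\varepsilon^2) - 2\sigma_q^2\,\pi_{jt}$

• negative slope of Y on π if $\sigma_q^2 > 0$

• estimate $\pi_{jt}$ from marker data ($\pi_{jm}$)

• use simple linear regression to detect QTL:

$E(Y_j \mid \pi_{jm}) = \alpha + \beta\,\pi_{jm}$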
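The regression of squared sib-pair differences on IBD sharing can be illustrated with simulated data. This is a sketch under simplifying assumptions that are not in the slides: the marker is the QTL itself and fully informative, and only the QTL (variance `s2q`) and independent residuals (variance `s2e`) contribute to the trait, so the expected intercept is 2·s2q + 2·s2e and the expected slope is -2·s2q. All names are illustrative.

```python
import random

def simulate_pairs(n, s2q, s2e, seed=1):
    """Return (pi, squared difference) for n simulated sib pairs.

    Assumption: cov(y1, y2 | pi) = pi * s2q and var(y) = s2q + s2e,
    i.e. no polygenic or shared-environment component.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        pi = rng.choices([0.0, 0.5, 1.0], weights=[1, 2, 1])[0]
        shared = rng.gauss(0, 1)
        # Standardised QTL effects with correlation pi between sibs
        q1 = (pi ** 0.5) * shared + ((1 - pi) ** 0.5) * rng.gauss(0, 1)
        q2 = (pi ** 0.5) * shared + ((1 - pi) ** 0.5) * rng.gauss(0, 1)
        y1 = s2q ** 0.5 * q1 + s2e ** 0.5 * rng.gauss(0, 1)
        y2 = s2q ** 0.5 * q2 + s2e ** 0.5 * rng.gauss(0, 1)
        data.append((pi, (y1 - y2) ** 2))
    return data

def ols(data):
    """Intercept and slope of the simple regression of Y on pi."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxy = sum((x - mx) * (y - my) for x, y in data)
    sxx = sum((x - mx) ** 2 for x, _ in data)
    slope = sxy / sxx
    return my - slope * mx, slope

alpha, beta = ols(simulate_pairs(20000, s2q=1.0, s2e=1.0))
print(round(alpha, 2), round(beta, 2))
```

Even with a QTL explaining half the variance, the squared differences are very noisy, which is one way to see why the method has low power at realistic sample sizes.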

Haseman-Elston regression

[Figure: squared difference (0 to 40) against IBD (0, 0.5, 1), with fitted regression line y = -1.3577x + 3.1252, R² = 0.0173]

A significant negative slope indicates linkage to a QTL

Single fully informative marker

$\beta = -2(1 - 2r)^2\,\sigma_q^2$

The $(1 - 2r)^2\sigma_q^2$ term is analogous to variance explained by a single marker in a backcross/F2 design

$\alpha = 2[1 - 2(1 - r)r]\,\sigma_q^2 + \sigma_\varepsilon^2$

r = recombination fraction between marker & QTL

Statistical test: β = 0 versus β < 0

• Disadvantage of method
– not powerful
– confounding between QTL location and effect
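The confounding between location and effect can be seen by tabulating the expected regression coefficients for a single fully informative marker at recombination fraction r from the QTL. A minimal sketch, where `s2q` stands for the QTL variance and `s2eps` for the remaining variance term in the intercept (both names are illustrative):

```python
def he_expected(r, s2q, s2eps):
    """Expected intercept and slope of Y_j on marker IBD sharing."""
    beta = -2.0 * (1.0 - 2.0 * r) ** 2 * s2q
    alpha = 2.0 * (1.0 - 2.0 * (1.0 - r) * r) * s2q + s2eps
    return alpha, beta

# A large QTL far from the marker and a small QTL close to it can
# give the same slope, so the slope alone cannot separate the two.
for r in (0.0, 0.1, 0.2, 0.5):
    alpha, beta = he_expected(r, s2q=1.0, s2eps=2.0)
    print(r, round(alpha, 3), round(beta, 3))
```

At r = 0.5 the slope is zero regardless of the QTL variance, which is the unlinked case.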

Interval mapping for sibpair analysis (Fulker & Cardon, 1994)

• Estimate $\pi_{jt}$ from IBD status at flanking markers

• Allows genome screen, separating effect & location
– regression with largest R² indicates map position of QTL

Example from Cardon et al. (1994)

[Lynch & Walsh, page 520]

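Calculating $\pi_{jt} \mid \pi_{jm}$

For $\pi_{jt}$ midway between two flanking markers:

$\pi_{jt} \approx r^2/c + \tfrac{1}{2}[(1 - 2r)/c]\,\pi_{jm1} + \tfrac{1}{2}[(1 - 2r)/c]\,\pi_{jm2}$

$c = 1 - 2r + 2r^2$

r = recombination fraction between markers

$\pi_{jmk}$ = $\pi_{jm}$ at flanking marker k

Assumption: flanking markers are fully informative

Examples

| r | c | $\pi_{jt}$ |
|---|---|---|
| 0.5 | 0.5 | 0.5 |
| 0.2 | 17/25 | (2/34) + (15/34)$\pi_{jm1}$ + (15/34)$\pi_{jm2}$ |

[if $\pi_{jm1}$ and $\pi_{jm2}$ are 1, $\pi_{jt}$ = 32/34 < 1]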
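The midpoint formula and the two tabulated examples can be reproduced with exact fractions. This is a sketch; `pi_midpoint` is an illustrative name and the function simply evaluates the approximation stated above.

```python
from fractions import Fraction as F

def pi_midpoint(r, pi_m1, pi_m2):
    """Approximate QTL IBD sharing midway between two flanking markers.

    r is the recombination fraction between the two markers; pi_m1 and
    pi_m2 are the IBD proportions at the (fully informative) markers.
    """
    c = 1 - 2 * r + 2 * r**2
    w = F(1, 2) * (1 - 2 * r) / c   # weight on each flanking marker
    return r**2 / c + w * pi_m1 + w * pi_m2

print(pi_midpoint(F(1, 2), 1, 0))   # unlinked markers carry no information
print(pi_midpoint(F(1, 5), 1, 1))   # both markers fully shared, r = 0.2
```

With r = 0.2 and both markers shared the result is 16/17, i.e. the 32/34 quoted above: a double recombinant between the markers remains possible, so the estimate stays below 1.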

Exercise

Calculate $\pi_{jt}$ for a location midway between two markers that are 30 cM apart, when the proportions of alleles shared at the flanking markers are 1.0 and 0.5. Use the Haldane mapping function to calculate the recombination rate between the markers.

$\pi_{jm1} = 1$, $\pi_{jm2} = 0.5$

Extensions to Haseman-Elston method

• Interval mapping

• Alternative models
– QTL with dominance

• Other methods to estimate $\pi_{jt}$
– Using all markers on a chromosome (Merlin)
– Monte Carlo sampling methods
– Using both markers info & phenotypic info

• Add linkage information from:
– $Z_j = [(y_{1j} - \mu) + (y_{2j} - \mu)]^2$

Sample size for QTL detection in genome scans using sibpairs

[Figure: number of pairs (0 to 40,000) against proportion of variance due to QTL (0 to 0.5), with curves for sib-correlation = 0 and sib-correlation = 0.5. Power = 90%, type-I error = 10^-5]

Estimating π when marker is not fully informative

• Using:
– Mendelian segregation rules
– Marker allele frequencies in the population

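IBD can be trivial…

[Figure: pedigree in which the sibs have marker genotypes 1/1 and 2/2: IBD = 0]

Two Other Simple Cases…

[Figure: two pedigrees with marker alleles 1 and 2 in which IBD status can be read off directly, e.g. IBD = 2]

A little more complicated…

[Figure: parents 1/2 and 2/2, sibs 1/2 and 1/2: IBD = 1 (50% chance) or IBD = 2 (50% chance)]

And even more complicated…

[Figure: sibs 1/1 and 1/1: IBD = ?]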

Bayes Theorem for IBD Probabilities

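$P(\mathrm{IBD}=i \mid G) = \dfrac{P(\mathrm{IBD}=i,\, G)}{P(G)} = \dfrac{P(\mathrm{IBD}=i)\,P(G \mid \mathrm{IBD}=i)}{P(G)} = \dfrac{P(\mathrm{IBD}=i)\,P(G \mid \mathrm{IBD}=i)}{\sum_j P(\mathrm{IBD}=j)\,P(G \mid \mathrm{IBD}=j)}$

$P(\mathrm{IBD}=i \mid G)$ = posterior

$P(\mathrm{IBD}=i)$ = prior

$P(G \mid \mathrm{IBD}=i)$ = P(Marker Genotype | IBD State)

$P(G)$ = Prob(data)

[Assumes Hardy-Weinberg proportions of genotypes in the population]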

Worked Example

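Sibs 1/1 and 1/1, $p_1 = 0.5$

$P(G \mid \mathrm{IBD}=0) = p_1^4 = \tfrac{1}{16}$

$P(G \mid \mathrm{IBD}=1) = p_1^3 = \tfrac{1}{8}$

$P(G \mid \mathrm{IBD}=2) = p_1^2 = \tfrac{1}{4}$

$P(G) = \tfrac{1}{4}p_1^4 + \tfrac{1}{2}p_1^3 + \tfrac{1}{4}p_1^2 = \tfrac{9}{64}$

$P(\mathrm{IBD}=0 \mid G) = \dfrac{\tfrac{1}{4}p_1^4}{P(G)} = \tfrac{1}{9}$

$P(\mathrm{IBD}=1 \mid G) = \dfrac{\tfrac{1}{2}p_1^3}{P(G)} = \tfrac{4}{9}$

$P(\mathrm{IBD}=2 \mid G) = \dfrac{\tfrac{1}{4}p_1^2}{P(G)} = \tfrac{4}{9}$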
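The worked example can be reproduced with a few lines of exact arithmetic. This is a sketch for the specific case where both sibs are 1/1, parents are untyped and Hardy-Weinberg proportions hold; `PRIOR` and `ibd_posterior_both_11` are illustrative names.

```python
from fractions import Fraction as F

# Prior IBD probabilities for a full-sib pair
PRIOR = {0: F(1, 4), 1: F(1, 2), 2: F(1, 4)}

def ibd_posterior_both_11(p1):
    """Posterior IBD probabilities when both sibs have genotype 1/1.

    Assumes untyped parents and Hardy-Weinberg proportions, so the pair
    carries 4, 3 or 2 independent copies of allele 1 for IBD = 0, 1, 2.
    """
    lik = {0: p1**4, 1: p1**3, 2: p1**2}
    p_g = sum(PRIOR[i] * lik[i] for i in PRIOR)   # P(G)
    return {i: PRIOR[i] * lik[i] / p_g for i in PRIOR}

post = ibd_posterior_both_11(F(1, 2))
pi_hat = post[2] + F(1, 2) * post[1]   # expected proportion of alleles IBD
print(post, pi_hat)
```

The resulting estimate of π is 2/3, above the prior expectation of ½, because sharing the same genotype is more likely when alleles are IBD. The rarer allele 1 is, the stronger this shift becomes.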

Exercise

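Sibs 1/2 and 1/2, $p_1 = 0.5$, $p_2 = 0.10$

$P(G \mid \mathrm{IBD}=0) = $

$P(G \mid \mathrm{IBD}=1) = $

$P(G \mid \mathrm{IBD}=2) = $

$P(G) = \tfrac{1}{4}\ldots + \tfrac{1}{2}\ldots + \tfrac{1}{4}\ldots$

$P(\mathrm{IBD}=0 \mid G) = $

$P(\mathrm{IBD}=1 \mid G) = $

$P(\mathrm{IBD}=2 \mid G) = $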

Using multiple markers

• Mendelian segregation rules

• Marker allele frequencies in the population

• Linkage between markers

• Efficient multi-marker (multi-point) algorithms available (e.g., Merlin, Genehunter)

Software for QTL analysis of sibpairs

• Mx
• Merlin
• Genehunter
• S.A.G.E. ($)
• QTL Express (regression)
• Solar (complex pedigrees)
• Lots of others…

http://www.nslij-genetics.org/soft/

QTL Express: User-friendly web-based software to map QTL in outbred populations

George Seaton, Sara Knott, Chris Haley, Peter Visscher

Roslin Institute, University of Edinburgh

http://QTL.cap.ed.ac.uk/

Conclusions (sibpairs)

• Power of sib pair design is low
  – more relative pairs needed
    • more contrasts e.g. extended pedigrees
    • selective genotyping
      – extreme phenotypes are most informative for linkage
  – more powerful analysis methods
    • ML variance component analysis

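Maximum likelihood for sibpairs (assuming bivariate normality | π & fully informative marker)

Full model:

$-2\ln(L) = n\ln|V| + (y - \mu)'V^{-1}(y - \mu)$

$V = \begin{pmatrix} \sigma_f^2 + \sigma_q^2 + \sigma_r^2 & \sigma_f^2 + \pi\sigma_q^2 \\ \sigma_f^2 + \pi\sigma_q^2 & \sigma_f^2 + \sigma_q^2 + \sigma_r^2 \end{pmatrix}$

Maximum likelihood

Reduced model:

$-2\ln(L) = n\ln|V| + (y - \mu)'V^{-1}(y - \mu)$

$V = \begin{pmatrix} \sigma_f^2 + \sigma_r^2 & \sigma_f^2 \\ \sigma_f^2 & \sigma_f^2 + \sigma_r^2 \end{pmatrix}$

Test statistic

$\mathrm{LRT} = 2\ln(ML_{full}) - 2\ln(ML_{reduced})$

$H_0(\sigma_q^2 = 0)$: $\mathrm{LRT} \sim \tfrac{1}{2}\chi^2(1) + \tfrac{1}{2}(0)$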
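The likelihood and the mixture test can be sketched as follows. Because the covariance matrix depends on each pair's IBD sharing, this sketch sums ln|V| and the quadratic form over pairs rather than using a single V. It evaluates -2 ln L at given parameter values only; the maximisation over the parameters is not shown, and all function and argument names are illustrative.

```python
import math

def neg2loglik(pairs, mu, s2f, s2q, s2r):
    """-2 ln L (additive constant dropped) for pairs given as (y1, y2, pi).

    s2f: familial variance, s2q: QTL variance, s2r: residual variance.
    Setting s2q = 0 gives the reduced model.
    """
    total = 0.0
    for y1, y2, pi in pairs:
        var = s2f + s2q + s2r          # diagonal of V
        cov = s2f + pi * s2q           # off-diagonal of V
        det = var * var - cov * cov    # |V| for a 2 x 2 matrix
        d1, d2 = y1 - mu, y2 - mu
        quad = (var * (d1 * d1 + d2 * d2) - 2.0 * cov * d1 * d2) / det
        total += math.log(det) + quad
    return total

def lrt_pvalue(lrt):
    """P-value under the 1/2 chi2(1) + 1/2 point-mass-at-zero mixture."""
    if lrt <= 0.0:
        return 1.0
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)); the mixture halves it.
    return 0.5 * math.erfc(math.sqrt(lrt / 2.0))
```

In use, the LRT would be the minimised `neg2loglik` of the reduced model minus that of the full model, and dividing the LRT by 2 ln(10) gives the equivalent LOD score.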

[Fisher et al. 1999]

Example: QTL analysis for dyslexia on chromosome 6p using sib-pairs

Phenotype: Irregular word test

181 sib-pairs

~15 Mb

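$\hat{\pi}$ or distribution approach in analysis?

$\hat{\pi} = \widehat{IBD}_2 + \tfrac{1}{2}\widehat{IBD}_1 + 0 \times \widehat{IBD}_0$

$1 = \widehat{IBD}_2 + \widehat{IBD}_1 + \widehat{IBD}_0$

Expectation approach: use $\hat{\pi}$

Distribution approach: use IBD probabilities and mixture distribution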

Selective genotyping & sibpairs

• Concordant pairs
– both sibs in upper or lower tail of the phenotypic distribution

• Discordant pairs
– one sib in upper tail, other in lower tail

• Powerful design
– requires many (cheap) phenotypes

Anxiety QTLs

[Fullerton et al. 2003]

Selection from ~30,000 sibpairs

Results

[Fullerton et al. 2003]

~5 QTLs detected

Variance component analysis in complex pedigrees

• Partition observed variation in quantitative traits into causal components, e.g.,
– Polygenic
– Common environment ('household')
– QTL
– Residual, including measurement error

• IBD proportions (π) estimated from multiple markers

“ACEQ” model

[Blackwood et al. 1996] Bipolar pedigree

[Figure: test statistic (LOD score, 0 to 4) against map position (0 to 25 cM), comparing two-point linkage analysis with variance component analysis. Blackwood et al. (1996) data]

Example: QTL analysis for BMI using a complex pedigree

[Deng et al. 2002]
