Introduction to QTL analysis Peter Visscher University of Edinburgh peter.visscher@ed.ac.uk

Introduction to QTL analysis

Peter Visscher

University of Edinburgh

peter.visscher@ed.ac.uk

Overview

• Principles of QTL mapping

• QTL mapping using sibpairs

• IBD estimation from marker data

• Improving power– ML variance components– Selective genotyping– Large(r) pedigrees

m-a m+d m+a

[Fisher, Wright]

Quantitative Trait Locus = a segment of DNA that affects a quantitative trait

Mapping QTL

• Determining the position of a locus causing variation in the genome.

• Estimating the effect of the alleles and mode of action.

Why map QTL ?

• To provide knowledge towards a fundamental understanding of individual gene actions and interactions

• To enable positional cloning of the gene • To improve breeding value estimation and

selection response through marker assisted selection (plants, animals)

Science; Medicine; Agriculture

Principles of QTL mapping

• Co-segregation of QTL alleles and linked marker alleles in pedigrees

Unobserved QTL alleles

Observed marker alleles

pair ofchromosomes

Linkage = Co-Linkage = Co-segregationsegregation

A1A2 A1A4 A3A4 A3A2

Marker allele A1

cosegregates withdominant disease

RecombinationRecombinationA1

Likely gametes(Non-recombinants)

Unlikely gametes(Recombinants)

Parental genotypes

“Linkage analysis = counting recombinants"

Map distanceMap distance

Map distance between two loci (Morgans)

= Expected number of crossovers per meiosis

Note: Map distances are additive. Recombination frequencies are not.

1 Morgan = 100 cM; 1 cM ~ 1 Mb

Recombination & map Recombination & map distancedistance

Haldane (1919)Map Function

0 0.2 0.4 0.6 0.8 1

Map distance (M)

Principles of QTL mapping

• Co-segregation of phenotypes and genotypes in pedigrees– Genetic markers give information on IBD sharing

between relatives [genotypes]– Association between phenotypes and genotypes

gives information on QTL location and effect [linkage]

• Need informative mapping population

Population Features Example Species Inbred lines Backcross (BC) Simplest design; powerful if

dominance in ‘right’ direction mice, plants

F2 Estimation of additive and dominance effects; more powerful than BC for additive effects

mice, rats

Advanced intercross line (AIL)

As for F2 but with increased resolution of map location

Recombinant inbred lines (RIL)

F1 followed by inbreeding; homozygous comparisons only; powerful for additive effects; less environmental noise

mice, plants

Congenic lines (= Nearly isogenic lines)

Backcrossing followed by inbreeding; homozygous comparisons only after inbreeding. Lines contain ~1% of donor genome

mice, rats, plants

Double haploid lines (DHL)

Instant homozygosity through doubling of F1 gametes; homozygous comparisons only; powerful for additive effects and QTLxE interactions

plants

F2:3 Inbred progeny of F2; increased precision through progeny means

plants

Structured outbred populations

BC / F2 / AIL As for inbred lines; mapping variation between lines

livestock, outbreeding trees/plants

Large fullsib families Estimating contrasts between parental alleles. Allows for dominance estimation.

trees, fish, poultry

Halfsib families Estimating contrasts between common parent alleles

cattle, pigs, poultry, trees

Nuclear families, including sibpairs

Detection of variance explained by markers

humans, livestock

Unstructured outbred populations

Complex pedigrees Detection of variance explained by markers

humans, livestock

Mapping populations

Informative pig pedigree

©Roslin Institute

QQ Qq qq

Large White Large WhiteMeishan

Meishan

Line cross

• Only two QTL alleles segregating

• QTL effect can be estimated as the mean difference between genotype groups

• Power depends on sample size & effect of QTL

• Ascertain divergent lines

• Resolution of QTL map is low: ~10-40 Mb

Figure 1.2: Sample size for QTL detection in genome scans

0 0.05 0.1 0.15 0.2 0.25

Proportion of variance due to QTL

Additive QTL

Dominant QTL

=0.0001, power = 90%, F2 population

Outbred populations: Complications

Markers not fully informative (segregating in the parental generation)

QTL not segregating in all families (All F1 segregate in inbred line cross)

Association between marker and QTL at the family rather than population level (i.e. linkage phase differs between families)

Additional variance between families due to other loci

Line cross vs. outbred population

Cross Outbred

# QTL alleles 2 2

# Generations 3 2

Required sample size 100s 1000s

QTL Estimation Mean Variance

QTL as a random effect

yi = + Qi + Ai + Ei

Qi = QTL genotype contribution for chrom. segment

Ai = Contribution from rest of genome

var(y) = q2 + a

2 + e2

Logical extension of linear models used during the course

• This week: partitioning (co)variances into (causal) components

• QTL mapping: partitioning genetic variance into underlying components– Linkage analysis: dissecting within-family

genetic variation

Genetic covariance between relatives

cov(yi,yj) = ij q2

+ aij a2

aij = average prop. of alleles shared in the genome (kinship matrix)

ij = proportion of alleles IBD at QTL

(0, ½ or 1)

E(ij) = aij

ij = Pr(2 alleles IBD) + ½Pr(1 allele IBD)

= proportion of alleles IBD in non-inbred pedigree

Estimate ij with genetic markers

Fully informative marker

• Determine IBD sharing between sibpairs unambiguously

• Example: Dad = 1/2 Mum= 3/4– Transmitted allele from Dad is either 1 or 2– Transmitted allele from Mum is either 3 or 4

Sibpairs & fully informative marker

# Alleles IBD Pr.

0 0 ¼

1 ½ ½

2 1 ¼

E() = Pr() = ½

E(2) = 2Pr() = 3/8

var() = E(2) – E()2 = 1/8

CV = 0.52 = 70%

Haseman-Elston (1972)

“The more alleles pairs of relatives share at a QTL, the greater their phenotypic similarity”

“The more alleles they share IBD, the smaller the difference in their phenotype”

Population sib-pair trait distribution

No linkage

Under linkage

Sib pair (or DZ twins) design to map QTL

• Multiple ‘families’ of two (or more) sibs

• Phenotypes on sibs

• Marker genotypes on sibs (& parents)

• Correlate phenotypes and genotypes of sibs

Data structure is simple

Pair Phenotypes Prop. alleles IBD

1 y11 y12 1

2 y21 y22 2

n yn1 yn2 n

= 0, ½ or 1 for fully informative markers

Notation

D = (y1 – y2)

D2 = (y1 – y2)2

S = [(y1 – ) + (y2 – )]

S2 = [(y1 – ) + (y2 – )]2

CP = (y1 – )(y2 – )

Proposed analysis…...Data Method Reference

y1 & y2 ML ‘LOD’ Parametric linkage analysis

D2 Regression Haseman & Elston (1972)

D2 & S2 Regression Drigalenko (1998)

Xu et al. (2000); Sham & Purcell (2001); Forrest (2001)

CP Regression Elston et al. (2000)

y1 & y2 ML VC Goldgar (1990); Schork (1993)

D ML Kruglyak & Lander (1995)

D & S ML VC Fulker & Cherny (1996); Wright (1997)

Properties of squared differences

E(Y1 – Y2)2 = var(Y1 – Y2) + (E(Y1 – Y2))2

var(Y1 – Y2) = var(Y1) + var(Y2) -2cov(Y1,Y2)

If E(Yi) = 0 and var(Y1)=var(Y2), then

E(Y1 – Y2)2 = 2(1-r)var(Y)

Haseman-Elston method• Phenotype on relative pair j:

Yj= (y1j - y2j)2

E(Yi) = E[(Q1j - Q2j + A1j - A2j + (e1j - e2j)2]

= E[(Q1j - Q2j)2] + {2(1-aij)a2 + 2e

= 2[q2 - cov(Q1j,Q2j)] + {

= (2q2 +

2) - 2jt q2

jt= proportion of alleles IBD at QTL (trait, t) for relative pair j

Conditional expectation

E(Yj| jt) = (2q2 +

2) - jt

• negative slope of Y on if q2 > 0

• estimate jt from marker data (jm)

• use simple linear regression to detect QTL:

E(Yj| jm) = + jm

Haseman-Elston regression

y = -1.3577x + 3.1252

R2 = 0.0173

0 0.5 1

A significant negative slope indicates linkage to a QTL

Single fully informative marker

= -2(1 - 2r)2 q2

(1 - 2r)2 q2 term is analogous to variance explained by a single marker in a backcross/F2

design

= 2[1 - 2(1-r)r] q2 +

2 r = recombination fraction between marker &

QTLStatistical test: = 0 versus < 0• Disadvantage of method

– not powerful– confounding between QTL location and effect

Interval mapping for sibpair analysis(Fulker & Cardon, 1994)

• Estimate jt from IBD status at flanking markers

• Allows genome screen, separating effect & location– regression with largest R2 indicates map

position of QTL

Example from Cardon et al. (1994)

[Lynch & Walsh, page 520]

Calculating jt|jm

For jt midway between two flanking markers:

jt ~ r2/c + ½[(1 - 2r)/c]jm1 + ½[(1 - 2r)/c]jm2

c = 1 - 2r + 2r2

r = recombination fraction between markers

jmk = jm at flanking marker k

Assumption: flanking markers are fully informative

Examplesr c jt

0.5 0.5 0.5

0.2 17/25 (2/34) + (15/34)jm1 + (15/34)jm2

[if jm1 and jm2 are 1, jt = 32/34 < 1]

Exercise

Calculate jt for a location midway between two markers that are 30 cM apart, when the proportion of alleles shared at the flanking markers are 1.0 and 0.5. Use the Haldane mapping function to calculate the recombination rate between the markers.

jm1 = 1, jm2 = 0.5

Extensions to Haseman-Elston method

• Interval mapping• Alternative models

– QTL with dominance

• Other methods to estimate jt

– Using all markers on a chromosome (Merlin)

– Monte Carlo sampling methods

– Using both markers info & phenotypic info

• Add linkage information from:– Zj = [(y1j - ) + (y2j - )]2

Sample size for QTL detection in genome scans using sibpairs

0 0.1 0.2 0.3 0.4 0.5

Proportion of variance due to QTL

Sib-correlation = 0

Sib-correlation = 0.5

Power = 90%. Type-I error = 10-5

Estimating when marker is not fully informative

• Using:– Mendelian segregation rules– Marker allele frequencies in the population

IBD can be trivial…

/ 2 2/

Two Other Simple Cases…

1 12/ 2/

2 2/ 2 2/

A little more complicated…

IBD=1(50% chance)

1 2/ 1 2/

IBD=2(50% chance)

And even more complicated…

1 1/IBD=? 1 1/

Bayes Theorem for IBD Probabilities

jIBDGPjIBDP

iIBDGPiIBDP

GiGiIBDP

), P(IBD)|(

Prob(data)

posterior

P(Marker Genotype|IBD State)

[Assumes Hardy-Weinberg proportions of genotypes in the population]

Worked Example

1 1/ 1 1/ 94

41)2|(

81)1|(

161)0|(

pGIBDP

pIBDGP

Exercise

1 2/ 1 2/

...41...2

1...41)(

10.0,5.0 21

Using multiple markers

• Mendelian segregation rules

• Marker allele frequencies in the population

• Linkage between markers

• Efficient multi-marker (multi-point) algorithms available (e.g., Merlin, Genehunter)

Software for QTL analysis of sibpairs

• Mx• Merlin• Genehunter• S.A.G.E. ($)• QTL Express (regression)• Solar (complex pedigrees)• Lots of others…

http://www.nslij-genetics.org/soft/

George Seaton, Sara Knott, Chris Haley, Peter Visscher

Roslin Institute University of Edinburgh

http://QTL.cap.ed.ac.uk/

QTL Express: User-friendly web-based software to map QTL

in outbred populations

Conclusions (sibpairs)

• Power of sib pair design is low– more relative pairs needed

• more contrasts e.g. extended pedigrees

• selective genotyping– extreme phenotypes are most informative for linkage

– more powerful analysis methods• ML variance component analysis

Maximum likelihood for sibpairs(assuming bivariate normality |

& fully informative marker)Full model:

-2ln(L) = nln|V| + (y-)V-1(y-)

V = f2 + q2 + r2 f2 + q2

f2 + q2 f2 + q2 + r2

Maximum likelihood

Reduced model:

-2ln(L) = nln|V| + (y-)V-1(y-)

V = f2 + r2 f2

f2 f2 + r2

Test statistic

LRT = 2ln(MLfull) - 2ln(MLreduced)

H0(q2=0): LRT ~ ½2(1) + ½(0)

[Fisher et al. 1999]

Example: QTL analysis for dyslexia on chromosome 6p using sib-pairs

Phenotype: Irregular word test181 sib-pairs

~15 Mb

or distribution approachin analysis?

ˆ*0ˆ½ˆˆ

ˆˆˆ1

IBDIBDIBD

Expectation approach: use

Distribution approach: use IBD probabilities and mixture distribution

Selective genotyping & sibpairs

• Concordant pairs– both sibs in upper or lower tail of the

phenotypic distribution

• Discordant pairs– one sib in upper tail, other in lower tail

• Powerful design– requires many (cheap) phenotypes

Anxiety QTLs

[Fullerton et al. 2003]

Selection from ~30,000 sibpairs

Results

[Fullerton et al. 2003]

~5 QTLs detected

Variance component analysis in complex pedigrees

• Partition observed variation in quantitative traits into causal components, e.g.,– Polygenic– Common environment (‘household’)– QTL– Residual, including measurement error

• IBD proportions () estimated from multiple markers

“ACEQ” model

[Blackwood et al. 1996] Bipolar pedigree

0 5 10 15 20 25

Two-point linkage analysis

Variance component analysis

Blackwood et al. (1996) data

Example: QTL analysis for BMI using a complex pedigree

[Deng et al. 2002]

Introduction to QTL analysis Peter Visscher University of Edinburgh peter.visscher@ed.ac.uk

Documents

Gmail - Total Recall? · Total Recall? Kevin Hay 13 October 2010 15:33 To: Michael Edwards Cc: Martin Parker ,

Perceptions school feedback - UGentmvalcke/CV/Perceptions_school_feedback.pdf · 2007; Van Petegem & Vanhoof, 2007; Visscher, 2002; Visscher & Coe, 2003). This framework is presented

A guide to QTL mapping with R/qtl · simulate QTL mapping data in R/qtl. At the end of the chapter, we describe the internal structure of QTL mapping data within R/qtl; this section

Samenvatting: boek Praktisch Staatsrecht, Y.M. Visscher · 2017. 5. 16. · Samenvatting: boek "Praktisch Staatsrecht", Y.M. Visscher Staatsrecht (Hogeschool Leiden) Verspreiden niet

AIXAM Minauto 4 - Strijker Visscher brommobielen

Marketing CaseStudy GetAbstract Dubinsky Visscher

Uitnodiging Familie Visscher open dag - …...PLUIMVEEBEDRIJF Hoofdsponsors Familie Visscher Hessenweg 91 7722 SX Dalfsen Uitnodiging open dag Familie Visscher Donderdag 4 mei 2017

Mapping Quantitative Trait Loci (QTL) in sheep. III. QTL for carcass

Presentatie prof. dr. Adrie visscher (Onderzoeksconferentie 2015)

QTL analysis / Mutagenesis

Adrianus Canter Visscher manuscript deel 2

Visscher Brochure

Visscher 1

DML13 - Harmen Visscher -Traffic4U - Brand Effect

Akram Group - ed.ac.uk

IAIN MURRAY (i.murray@ed.ac.uk) MATTHEW M. GRAHAM (m.m ...matt-graham.github.io/files/pm_slice_poster.pdf · IAIN MURRAY (i.murray@ed.ac.uk) MATTHEW M. GRAHAM (m.m.graham@ed.ac.uk)

Qtl gramene tutorial

QTL Basic Concept

KNVI2013 Pepijn de Visscher

3.Pierre de Visscher-Dinamica Grupurilor