View
227
Download
1
Category
Preview:
Citation preview
Introduction to QTL analysis
Peter Visscher
University of Edinburgh
peter.visscher@ed.ac.uk
Overview
• Principles of QTL mapping
• QTL mapping using sibpairs
• IBD estimation from marker data
• Improving power– ML variance components– Selective genotyping– Large(r) pedigrees
t
m-a m+d m+a
Trait
m-a m+d m+a
Trait
[Fisher, Wright]
Quantitative Trait Locus = a segment of DNA that affects a quantitative trait
Mapping QTL
• Determining the position of a locus causing variation in the genome.
• Estimating the effect of the alleles and mode of action.
Why map QTL ?
• To provide knowledge towards a fundamental understanding of individual gene actions and interactions
• To enable positional cloning of the gene • To improve breeding value estimation and
selection response through marker assisted selection (plants, animals)
Science; Medicine; Agriculture
Principles of QTL mapping
• Co-segregation of QTL alleles and linked marker alleles in pedigrees
Unobserved QTL alleles
q m
Q M
Observed marker alleles
pair ofchromosomes
Linkage = Co-Linkage = Co-segregationsegregation
A2A4
A3A4
A1A3
A1A2
A2A3
A1A2 A1A4 A3A4 A3A2
Marker allele A1
cosegregates withdominant disease
RecombinationRecombinationA1
A2
Q1
Q2
A1
A2
Q1
Q2
A1
A2 Q1
Q2
Likely gametes(Non-recombinants)
Unlikely gametes(Recombinants)
Parental genotypes
“Linkage analysis = counting recombinants"
Map distanceMap distance
Map distance between two loci (Morgans)
= Expected number of crossovers per meiosis
Note: Map distances are additive. Recombination frequencies are not.
1 Morgan = 100 cM; 1 cM ~ 1 Mb
Recombination & map Recombination & map distancedistance
2
1 2me
Haldane (1919)Map Function
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
0 0.2 0.4 0.6 0.8 1
Map distance (M)
Re
co
mb
ina
tio
n f
rac
tio
n
Principles of QTL mapping
• Co-segregation of phenotypes and genotypes in pedigrees– Genetic markers give information on IBD sharing
between relatives [genotypes]– Association between phenotypes and genotypes
gives information on QTL location and effect [linkage]
• Need informative mapping population
Population Features Example Species Inbred lines Backcross (BC) Simplest design; powerful if
dominance in ‘right’ direction mice, plants
F2 Estimation of additive and dominance effects; more powerful than BC for additive effects
mice, rats
Advanced intercross line (AIL)
As for F2 but with increased resolution of map location
mice
Recombinant inbred lines (RIL)
F1 followed by inbreeding; homozygous comparisons only; powerful for additive effects; less environmental noise
mice, plants
Congenic lines (= Nearly isogenic lines)
Backcrossing followed by inbreeding; homozygous comparisons only after inbreeding. Lines contain ~1% of donor genome
mice, rats, plants
Double haploid lines (DHL)
Instant homozygosity through doubling of F1 gametes; homozygous comparisons only; powerful for additive effects and QTLxE interactions
plants
F2:3 Inbred progeny of F2; increased precision through progeny means
plants
Structured outbred populations
BC / F2 / AIL As for inbred lines; mapping variation between lines
livestock, outbreeding trees/plants
Large fullsib families Estimating contrasts between parental alleles. Allows for dominance estimation.
trees, fish, poultry
Halfsib families Estimating contrasts between common parent alleles
cattle, pigs, poultry, trees
Nuclear families, including sibpairs
Detection of variance explained by markers
humans, livestock
Unstructured outbred populations
Complex pedigrees Detection of variance explained by markers
humans, livestock
Mapping populations
Informative pig pedigree
X
©Roslin Institute
QQ qq
QQ Qq qq
q q
q q
qqq
q q
X
X X
Large White Large WhiteMeishan
Meishan
F1
F2
2
2
1
2
1
2
2
2
1
1
1
1
1
1
1
1
1
2
1
2
2
2
2
2
2
2 q1
2
Line cross
• Only two QTL alleles segregating
• QTL effect can be estimated as the mean difference between genotype groups
• Power depends on sample size & effect of QTL
• Ascertain divergent lines
• Resolution of QTL map is low: ~10-40 Mb
Figure 1.2: Sample size for QTL detection in genome scans
0
500
1000
1500
2000
2500
3000
3500
4000
0 0.05 0.1 0.15 0.2 0.25
Proportion of variance due to QTL
Sam
ple
siz
e
Additive QTL
Dominant QTL
=0.0001, power = 90%, F2 population
Outbred populations: Complications
Markers not fully informative (segregating in the parental generation)
QTL not segregating in all families (All F1 segregate in inbred line cross)
Association between marker and QTL at the family rather than population level (i.e. linkage phase differs between families)
Additional variance between families due to other loci
Line cross vs. outbred population
Cross Outbred
# QTL alleles 2 2
# Generations 3 2
Required sample size 100s 1000s
QTL Estimation Mean Variance
QTL as a random effect
yi = + Qi + Ai + Ei
Qi = QTL genotype contribution for chrom. segment
Ai = Contribution from rest of genome
var(y) = q2 + a
2 + e2
Logical extension of linear models used during the course
• This week: partitioning (co)variances into (causal) components
• QTL mapping: partitioning genetic variance into underlying components– Linkage analysis: dissecting within-family
genetic variation
Genetic covariance between relatives
cov(yi,yj) = ij q2
+ aij a2
aij = average prop. of alleles shared in the genome (kinship matrix)
ij = proportion of alleles IBD at QTL
(0, ½ or 1)
E(ij) = aij
ij = Pr(2 alleles IBD) + ½Pr(1 allele IBD)
= proportion of alleles IBD in non-inbred pedigree
Estimate ij with genetic markers
Fully informative marker
• Determine IBD sharing between sibpairs unambiguously
• Example: Dad = 1/2 Mum= 3/4– Transmitted allele from Dad is either 1 or 2– Transmitted allele from Mum is either 3 or 4
Sibpairs & fully informative marker
# Alleles IBD Pr.
0 0 ¼
1 ½ ½
2 1 ¼
E() = Pr() = ½
E(2) = 2Pr() = 3/8
var() = E(2) – E()2 = 1/8
CV = 0.52 = 70%
Haseman-Elston (1972)
“The more alleles pairs of relatives share at a QTL, the greater their phenotypic similarity”
or
“The more alleles they share IBD, the smaller the difference in their phenotype”
Population sib-pair trait distribution
No linkage
Under linkage
Sib pair (or DZ twins) design to map QTL
• Multiple ‘families’ of two (or more) sibs
• Phenotypes on sibs
• Marker genotypes on sibs (& parents)
• Correlate phenotypes and genotypes of sibs
Data structure is simple
Pair Phenotypes Prop. alleles IBD
1 y11 y12 1
2 y21 y22 2
.....
n yn1 yn2 n
= 0, ½ or 1 for fully informative markers
Notation
Y
D = (y1 – y2)
D2 = (y1 – y2)2
S = [(y1 – ) + (y2 – )]
S2 = [(y1 – ) + (y2 – )]2
CP = (y1 – )(y2 – )
Proposed analysis…...Data Method Reference
y1 & y2 ML ‘LOD’ Parametric linkage analysis
D2 Regression Haseman & Elston (1972)
D2 & S2 Regression Drigalenko (1998)
Xu et al. (2000); Sham & Purcell (2001); Forrest (2001)
CP Regression Elston et al. (2000)
y1 & y2 ML VC Goldgar (1990); Schork (1993)
D ML Kruglyak & Lander (1995)
D & S ML VC Fulker & Cherny (1996); Wright (1997)
Properties of squared differences
E(Y1 – Y2)2 = var(Y1 – Y2) + (E(Y1 – Y2))2
var(Y1 – Y2) = var(Y1) + var(Y2) -2cov(Y1,Y2)
If E(Yi) = 0 and var(Y1)=var(Y2), then
E(Y1 – Y2)2 = 2(1-r)var(Y)
Haseman-Elston method• Phenotype on relative pair j:
Yj= (y1j - y2j)2
E(Yi) = E[(Q1j - Q2j + A1j - A2j + (e1j - e2j)2]
= E[(Q1j - Q2j)2] + {2(1-aij)a2 + 2e
2}
= 2[q2 - cov(Q1j,Q2j)] + {
2}
= (2q2 +
2) - 2jt q2
jt= proportion of alleles IBD at QTL (trait, t) for relative pair j
Conditional expectation
E(Yj| jt) = (2q2 +
2) - jt
2q2
• negative slope of Y on if q2 > 0
• estimate jt from marker data (jm)
• use simple linear regression to detect QTL:
E(Yj| jm) = + jm
Haseman-Elston regression
y = -1.3577x + 3.1252
R2 = 0.0173
0
5
10
15
20
25
30
35
40
0 0.5 1
IBD
Sq
ua
red
dif
fere
nc
e
A significant negative slope indicates linkage to a QTL
Single fully informative marker
= -2(1 - 2r)2 q2
(1 - 2r)2 q2 term is analogous to variance explained by a single marker in a backcross/F2
design
= 2[1 - 2(1-r)r] q2 +
2 r = recombination fraction between marker &
QTLStatistical test: = 0 versus < 0• Disadvantage of method
– not powerful– confounding between QTL location and effect
Interval mapping for sibpair analysis(Fulker & Cardon, 1994)
• Estimate jt from IBD status at flanking markers
• Allows genome screen, separating effect & location– regression with largest R2 indicates map
position of QTL
Example from Cardon et al. (1994)
[Lynch & Walsh, page 520]
Calculating jt|jm
For jt midway between two flanking markers:
jt ~ r2/c + ½[(1 - 2r)/c]jm1 + ½[(1 - 2r)/c]jm2
c = 1 - 2r + 2r2
r = recombination fraction between markers
jmk = jm at flanking marker k
Assumption: flanking markers are fully informative
Examplesr c jt
0.5 0.5 0.5
0.2 17/25 (2/34) + (15/34)jm1 + (15/34)jm2
[if jm1 and jm2 are 1, jt = 32/34 < 1]
Exercise
Calculate jt for a location midway between two markers that are 30 cM apart, when the proportion of alleles shared at the flanking markers are 1.0 and 0.5. Use the Haldane mapping function to calculate the recombination rate between the markers.
jm1 = 1, jm2 = 0.5
Extensions to Haseman-Elston method
• Interval mapping• Alternative models
– QTL with dominance
• Other methods to estimate jt
– Using all markers on a chromosome (Merlin)
– Monte Carlo sampling methods
– Using both markers info & phenotypic info
• Add linkage information from:– Zj = [(y1j - ) + (y2j - )]2
Sample size for QTL detection in genome scans using sibpairs
0
5000
10000
15000
20000
25000
30000
35000
40000
0 0.1 0.2 0.3 0.4 0.5
Proportion of variance due to QTL
Nu
mb
er o
f p
airs
Sib-correlation = 0
Sib-correlation = 0.5
Power = 90%. Type-I error = 10-5
Estimating when marker is not fully informative
• Using:– Mendelian segregation rules– Marker allele frequencies in the population
IBD can be trivial…
1
1 1
1
/ 2 2/
2/ 2/
IBD=0
Two Other Simple Cases…
1
1 1
1
/
2/ 2/
1 1/
1 12/ 2/
IBD=2
2 2/ 2 2/
A little more complicated…
1 2/
IBD=1(50% chance)
2 2/
1 2/ 1 2/
IBD=2(50% chance)
And even more complicated…
1 1/IBD=? 1 1/
Bayes Theorem for IBD Probabilities
j
jIBDGPjIBDP
iIBDGPiIBDP
GP
iIBDGPiIBDP
GP
GiGiIBDP
)|()(
)|()(
)(
)|()(
)(
), P(IBD)|(
prior
Prob(data)
posterior
P(Marker Genotype|IBD State)
[Assumes Hardy-Weinberg proportions of genotypes in the population]
Worked Example
1 1/ 1 1/ 94
)(4
1)|2(
94
)(2
1)|1(
91
)(4
1)|0(
649
41
21
41)(
41)2|(
81)1|(
161)0|(
5.0
21
3
41
21
31
41
21
31
41
1
1
GP
pGIBDP
GP
pGIBDP
GP
pGIBDP
pppGP
pIBDGP
pIBDGP
pIBDGP
p
Exercise
1 2/ 1 2/
)|2(
)|1(
)|0(
...41...2
1...41)(
)2|(
)1|(
)0|(
10.0,5.0 21
GIBDP
GIBDP
GIBDP
GP
IBDGP
IBDGP
IBDGP
pp
Using multiple markers
• Mendelian segregation rules
• Marker allele frequencies in the population
• Linkage between markers
• Efficient multi-marker (multi-point) algorithms available (e.g., Merlin, Genehunter)
Software for QTL analysis of sibpairs
• Mx• Merlin• Genehunter• S.A.G.E. ($)• QTL Express (regression)• Solar (complex pedigrees)• Lots of others…
http://www.nslij-genetics.org/soft/
George Seaton, Sara Knott, Chris Haley, Peter Visscher
Roslin Institute University of Edinburgh
http://QTL.cap.ed.ac.uk/
QTL Express: User-friendly web-based software to map QTL
in outbred populations
Conclusions (sibpairs)
• Power of sib pair design is low– more relative pairs needed
• more contrasts e.g. extended pedigrees
• selective genotyping– extreme phenotypes are most informative for linkage
– more powerful analysis methods• ML variance component analysis
Maximum likelihood for sibpairs(assuming bivariate normality |
& fully informative marker)Full model:
-2ln(L) = nln|V| + (y-)V-1(y-)
V = f2 + q2 + r2 f2 + q2
f2 + q2 f2 + q2 + r2
Maximum likelihood
Reduced model:
-2ln(L) = nln|V| + (y-)V-1(y-)
V = f2 + r2 f2
f2 f2 + r2
Test statistic
LRT = 2ln(MLfull) - 2ln(MLreduced)
H0(q2=0): LRT ~ ½2(1) + ½(0)
[Fisher et al. 1999]
Example: QTL analysis for dyslexia on chromosome 6p using sib-pairs
Phenotype: Irregular word test181 sib-pairs
~15 Mb
or distribution approachin analysis?
012
012
ˆ*0ˆ½ˆˆ
ˆˆˆ1
IBDIBDIBD
IBDIBDIBD
Expectation approach: use
Distribution approach: use IBD probabilities and mixture distribution
Selective genotyping & sibpairs
• Concordant pairs– both sibs in upper or lower tail of the
phenotypic distribution
• Discordant pairs– one sib in upper tail, other in lower tail
• Powerful design– requires many (cheap) phenotypes
Anxiety QTLs
[Fullerton et al. 2003]
Selection from ~30,000 sibpairs
Results
[Fullerton et al. 2003]
~5 QTLs detected
Variance component analysis in complex pedigrees
• Partition observed variation in quantitative traits into causal components, e.g.,– Polygenic– Common environment (‘household’)– QTL– Residual, including measurement error
• IBD proportions () estimated from multiple markers
“ACEQ” model
[Blackwood et al. 1996] Bipolar pedigree
0
1
2
3
4
0 5 10 15 20 25
cM
test
sta
tist
ic (
LO
D s
core
)
Two-point linkage analysis
Variance component analysis
Blackwood et al. (1996) data
Example: QTL analysis for BMI using a complex pedigree
[Deng et al. 2002]
Recommended