Lecture 13: Linkage Analysis VI
Date: 10/08/02
Outline: Complex models. Pedigrees. Elston-Stewart Algorithm. Lander-Green Algorithm.
Complex Linkage Models
The simplest linkage models involve only pairwise recombination fractions $\theta_{ij}$, or adjacent map distances $m_{i,i+1}$ together with map function parameters.
Such models are insufficient to describe many real-life data scenarios.
For example:
Incomplete penetrance. Differential penetrance.
Genetic imprinting. No controlled, repeated crosses available.
Inference on Pedigrees
Pedigrees are extended families sampled from a natural population. They are used when one cannot set up repeated and controlled crosses.
Unknown phenotypes. Unknown genotypes. Founders.
Ordered vs. Unordered Genotype
An unordered genotype does not include phase information nor parental source of alleles.
An ordered genotype includes phase information and parental source of alleles.
Unordered Genotype    Ordered Genotype(s)
A1A2B1B1              A1B1/A2B1
                      A2B1/A1B1
Penetrance Parameters
A penetrance parameter is introduced in the model to explain the relationship between genotype and phenotype.
We code the phenotype as a random vector of discrete or continuous variables, e.g. X=(X1, X2, ..., Xm).
The phenotype $X_i$ of an individual $i$ is conditionally independent of all other family members given his/her genotype $G_i$ and other characteristics $C_i$ (sex, age, etc.):
$$P(X_i \mid G_1, \dots, G_n,\ X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n,\ C_1, \dots, C_n) = P(X_i \mid G_i, C_i)$$
Penetrance Parameters - Assumptions
We assume individual i's phenotype is a single number (discrete or continuous) that is conditionally independent of all other genotypes and loci once we condition on the genotype at a particular locus. That is, we assume one phenotypic variable per locus.
This assumption forces us to ignore multilocus phenotypes and pleiotropic loci.
Conditional Likelihood of Observed Phenotypes
The conditional independence implies that the likelihood of particular phenotypes observed on a pedigree, conditional on the observed genotypes, is simply a product.
$$P(X \mid G, C) = \prod_{i=1}^{n} P(X_i \mid G_i, C_i) = \prod_{i=1}^{n} \prod_{j=1}^{l} P(X_{ij} \mid G_{ij}, C_i)$$
Penetrance Parameters: Simple Dominant Disease
Dominant Disease (A1 > A2)
Ordered Genotype    P(Xi | Gi, Ci) = P(Xi | Gi)
A1A1                1
A1A2                1
A2A1                1
A2A2                0
Penetrance Parameters: Dominant Disease with C
Dominant Disease (A1 > A2) but Sex-Dependent
Ordered Genotype    P(Xi | Gi, male)    P(Xi | Gi, female)
A1A1                1                   0
A1A2                1                   0
A2A1                1                   0
A2A2                0                   0
Liability Classes
Classes of individuals who differ in penetrance parameters are called liability classes.
In one of the examples above males and females form two different liability classes.
Incomplete Penetrance with Liability Classes
Suppose that a dominant disease affects carriers under 30 with probability a and carriers aged 30 or over with probability b. Taking A as the disease allele and assuming no phenocopies, the penetrances are:
Class         AA    Aa    aa
<30 years     a     a     0
>=30 years    b     b     0
Penetrance Parameters: Phenocopies
Dominant Disease (A1 > A2) with Phenocopy Rate pr
Ordered Genotype    P(Xi | Gi)
A1A1                1
A1A2                1
A2A1                1
A2A2                pr
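The penetrance tables above can be written as a small lookup function. This is only a sketch: the function name `penetrance`, its parameters, and the value 0.05 for the phenocopy rate are illustrative assumptions, not part of the lecture.

```python
def penetrance(ordered_genotype, phenocopy_rate=0.0, carrier_penetrance=1.0):
    """P(affected | ordered genotype) for a dominant disease allele A1.

    ordered_genotype is a pair such as ("A1", "A2"). Carriers of A1 are
    affected with probability carrier_penetrance; non-carriers are affected
    with probability phenocopy_rate (the `pr` of the table).
    """
    carries_a1 = "A1" in ordered_genotype
    return carrier_penetrance if carries_a1 else phenocopy_rate


ORDERED_GENOTYPES = [("A1", "A1"), ("A1", "A2"), ("A2", "A1"), ("A2", "A2")]

# Hypothetical phenocopy rate of 5%, chosen only for illustration.
table = {g: penetrance(g, phenocopy_rate=0.05) for g in ORDERED_GENOTYPES}
```

Setting `phenocopy_rate=0.0` recovers the simple dominant table, and passing a class-specific `carrier_penetrance` gives one row of a liability-class table.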
Dealing with Penetrance and Phenocopies
Biological solution. Identify features that differentiate genetic and non-genetic forms of the phenotype. Then, the phenotype can be recoded as fully-penetrant with no phenocopies.
Approximation. Estimate genotype-specific risk from segregation ratios observed in a family, then set penetrance to the estimates.
Example
Genotype    Expected Frequency    Observed Frequency
AA          0.5                   0.75
Aa          0.5                   0.25
Either 50% of Aa individuals are phenocopies of AA, or the a allele has only 50% penetrance.
Penetrance Parameters – More Assumptions
Unless a phenotype is affected by genomic imprinting, we usually assume that different ordered genotypes with the same alleles have the same phenotype.
Genomic imprinting means that the parental origin of an allele affects its expression. For example, a gene may be expressed only if it was inherited from the mother.
Genetic Imprinting in Humans?
Prader-Willi syndrome causes morbid obesity in humans. The disease loci are found on chromosome 15, and working copies must be transmitted from the father.
Angelman syndrome causes developmental problems, including speech impairment and balance disorders. It is caused by loss of function in a region of chromosome 15 that is normally active only on the maternal chromosome.
Problem: Ordered Genotypes are not Observed
Pedigrees almost invariably include missing data: members who have no known genotype.
In addition, there will always be many members for whom phase and parental origin cannot be determined.
In essence, G is not actually observed.
$$P(X) = \sum_{g} P(X \mid G = g)\, P(G = g)$$
Transmission Parameters
The genotypes in a pedigree are related through genetic inheritance.
Conditional on the parental genotypes, the offspring genotypes are independent of all other members in the pedigree.
Transmission parameters are those parameters which determine the transmission of genes: the recombination fractions.
Independence of Transmission Probabilities
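Let $G_k$ be the genotype of offspring $k$, and let $G_m$ and $G_p$ be the genotypes of the mother and father. Let $G_{kM}$ be the allele transmitted by the offspring's mother and $G_{kP}$ be the allele transmitted by the father. Then,
$$P(G_k \mid G_m, G_p) = P(G_{kM} \mid G_m)\, P(G_{kP} \mid G_p)$$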
Maternal Transmission: Generate Haplotype
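[Figure: the mother $m$ carries a maternal haplotype $G_{mM} = (G^1_{mM}, G^2_{mM}, \dots, G^l_{mM})$ and a paternal haplotype $G_{mP} = (G^1_{mP}, G^2_{mP}, \dots, G^l_{mP})$ over the $l$ loci. Recombination assembles them into a gamete haplotype, for example $Z = (G^1_{mM}, G^2_{mP}, G^3_{mP}, \dots, G^l_{mP})$.]
Each interval between adjacent loci carries an indicator
$$\delta_i = \begin{cases} -1 & \text{if there is a recombination between loci } i \text{ and } i+1 \\ 1 & \text{otherwise} \end{cases}$$
Maternal Transmission: Transmit Haplotype
$$P(Z \mid G_m) = P(Z^1 \mid G_m) \prod_{i=1}^{l-1} \begin{cases} r_i & \text{if } \delta_i = -1 \\ 1 - r_i & \text{if } \delta_i = 1 \end{cases}$$
where $r_i$ is the recombination fraction between loci $i$ and $i+1$, and $P(Z^1 \mid G_m) = 1/2$ is the probability of the strand chosen at the first locus.
$$P(G_{kM} \mid G_m) = \sum_{Z} P(G_{kM}, Z \mid G_m) = \sum_{Z} P(G_{kM} \mid Z, G_m)\, P(Z \mid G_m)$$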
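As a rough illustration of the two transmission steps, the sketch below first computes the probability of one pattern of strand origins and then sums over all patterns that produce a given allele haplotype. It assumes no crossover interference and a probability of 1/2 for the starting strand. The function names and the encoding of origins as strings of "M" (mother's maternal strand) and "P" (mother's paternal strand) are hypothetical choices made for this example.

```python
from itertools import product


def origin_pattern_prob(origins, r):
    """P(Z | G_m) for one pattern of strand origins, e.g. "MPP".

    r[i] is the recombination fraction between loci i and i+1 (0-based).
    Assumes no interference, so the intervals multiply.
    """
    if len(r) != len(origins) - 1:
        raise ValueError("need one recombination fraction per interval")
    prob = 0.5  # either strand is equally likely at the first locus
    for i, r_i in enumerate(r):
        recombined = origins[i] != origins[i + 1]
        prob *= r_i if recombined else 1.0 - r_i
    return prob


def transmitted_haplotype_prob(haplotype, maternal, paternal, r):
    """P(G_kM = haplotype | G_m), summing over all origin patterns Z."""
    total = 0.0
    for origins in product("MP", repeat=len(haplotype)):
        z = tuple(maternal[i] if o == "M" else paternal[i]
                  for i, o in enumerate(origins))
        if z == tuple(haplotype):  # P(G_kM | Z, G_m) is 0 or 1
            total += origin_pattern_prob(origins, r)
    return total


# Mother with ordered genotype A1B1/A2B1 and r = 0.1 between the two loci.
p = transmitted_haplotype_prob(("A1", "B1"), ("A1", "B1"), ("A2", "B1"), [0.1])
```

Because the mother is homozygous at the second locus, both a recombinant and a non-recombinant gamete yield the haplotype A1B1, which is why the sum over Z is needed.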
Population Parameters
What about the pedigree members that have no parents? There are no parental genotypes on which to condition.
The distribution of genotypes in these individuals is determined by the so-called population parameters.
In the worst case, this would require $(m_1 m_2 \cdots m_l)^2 - 1$ independent parameters, where $m_i$ is the number of alleles at locus $i$.
Population Parameters - Assumptions
Assume Hardy-Weinberg equilibrium (random union of haplotypes) so that the genotype frequencies are determined by the haplotype frequencies. Then there are $m_1 m_2 \cdots m_l - 1$ independent parameters.
Assume linkage equilibrium (random union of alleles at multiple loci into haplotypes). Then there are $m_1 + m_2 + \cdots + m_l - l$ independent allele frequencies.
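To make the three parameter counts concrete, the following sketch evaluates them for a given list of allele numbers. The function name and the dictionary keys are assumptions introduced for the example.

```python
from math import prod


def population_parameter_counts(alleles_per_locus):
    """Number of independent population parameters under each assumption.

    alleles_per_locus is the list (m_1, ..., m_l).
    """
    haplotypes = prod(alleles_per_locus)  # m_1 * m_2 * ... * m_l
    loci = len(alleles_per_locus)
    return {
        "no_assumptions": haplotypes ** 2 - 1,
        "hardy_weinberg": haplotypes - 1,
        "linkage_equilibrium": sum(alleles_per_locus) - loci,
    }


# Three loci with 2, 3 and 4 alleles.
counts = population_parameter_counts([2, 3, 4])
```

For this input there are 24 haplotypes, so the counts fall from 575 to 23 to 6 as the assumptions are added.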
Overall Genotype Probabilities
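Number the founders $1, \dots, f$ and the non-founders $f+1, \dots, n$, and let $G_{i,m}$ and $G_{i,p}$ be the genotypes of the mother and father of individual $i$.
$$P(G) = P(G_1) \cdots P(G_f)\, P(G_{f+1} \mid G_{f+1,m}, G_{f+1,p}) \cdots P(G_n \mid G_{n,m}, G_{n,p})$$
$$P(X) = \sum_{g} P(X \mid G = g)\, P(G = g) = \sum_{G_1} \cdots \sum_{G_n} \underbrace{\prod_{i=1}^{n} P(X_i \mid G_i)}_{\text{pen}}\ \underbrace{\prod_{i=1}^{f} P(G_i)}_{\text{pop}}\ \underbrace{\prod_{i=f+1}^{n} P(G_i \mid G_{i,m}, G_{i,p})}_{\text{trans}}$$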
Computation
There are $(m_1 m_2 \cdots m_l)^{2n}$ terms in the summation.
There are $2n$ probabilities in each product. Thus, there are $(m_1 m_2 \cdots m_l)^{2n}(2n-1)$ multiplications
and $(m_1 m_2 \cdots m_l)^{2n} - 1$ additions.
The calculation grows exponentially in number of loci l and number of individuals n.
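A quick way to see the exponential growth is to evaluate these counts directly. The sketch below does so; the function name `brute_force_cost` is a hypothetical label.

```python
from math import prod


def brute_force_cost(alleles_per_locus, n):
    """Return (terms, multiplications, additions) for the direct summation.

    Each of the n individuals has (m_1 * ... * m_l)^2 ordered multilocus
    genotypes, and each term is a product of 2n probabilities.
    """
    ordered_genotypes = prod(alleles_per_locus) ** 2
    terms = ordered_genotypes ** n
    return terms, terms * (2 * n - 1), terms - 1


# One biallelic locus in a nine-member pedigree.
terms, multiplications, additions = brute_force_cost([2], 9)
```

Even this small case needs 262144 terms. A second biallelic locus multiplies the number of terms by a further factor of 4^9.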
Elston-Stewart Algorithm
The algorithm is similar to the forward-backward computation for hidden Markov models. The hidden states are the genotypes.
One must classify people as falling ahead of or behind other people, i.e. we need a linear arrangement of people in the pedigree.
Ordering People in a Pedigree
[Figure: a pedigree with individual k marked.]
Forward/Backwards Probabilities
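Forward probability:
$$\alpha_k(G_k) = P(X_i,\ i \le k;\ G_k)$$
Backward probability:
$$\beta_k(G_k) = \begin{cases} P(X_i,\ i > k \mid G_k) & \text{if } P(G_k) > 0 \\ 0 & \text{if } P(G_k) = 0 \end{cases}$$
[Figure: chain of hidden genotypes $G_1, G_2, \dots, G_k, G_{k+1}, \dots$, each emitting its phenotype $X_1, X_2, \dots, X_k, X_{k+1}, \dots$]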
Total Probability
$$P(X_1, \dots, X_n) = \sum_{G_k} \alpha_k(G_k)\, \beta_k(G_k)$$
Calculating Forward Probability
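For a founder, $k \le f$:
$$\alpha_k(G_k) = P(X_k \mid G_k)\, P(G_k)$$
For a non-founder $k$ with mother $m$ and father $p$:
$$\alpha_k(G_k) = P(X_k \mid G_k) \sum_{G_m, G_p} P(G_k \mid G_m, G_p)\, \alpha_m(G_m)\, \alpha_p(G_p) \prod_{s \in \text{siblings}} \sum_{G_s} P(X_s \mid G_s)\, P(G_s \mid G_m, G_p)\, \beta_s(G_s)$$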
Calculating Backward Probabilities
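$\beta_k(G_k) = 1$ if $k$ is a leaf.
Otherwise, with $s$ the spouse of $k$:
$$\beta_k(G_k) = \sum_{G_s} P(X_s \mid G_s)\, P(G_s) \prod_{c \in \text{children}} \sum_{G_c} P(X_c \mid G_c)\, P(G_c \mid G_k, G_s)\, \beta_c(G_c)$$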
Example
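[Figure: pedigree of nine individuals. The structure below is the one used in the calculations that follow.]
Individual    Parents    Genotype
1             founder    AA
2             founder    aa
3             founder    aa
4             1, 2       Aa
5             1, 2       Aa
6             founder    aa
7             3, 4       aa
8             3, 4       Aa
9             5, 6       Aa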
Using 5 as Proband
$$P(X_1, \dots, X_9) = \sum_{G_5} \alpha_5(G_5)\, \beta_5(G_5)$$
Example – Calculations Needed
[Figure: the example pedigree, highlighting the probabilities that still need to be calculated.]
Forward Probabilities: Founders
$$\alpha_1(AA) = p_A^2, \quad \alpha_2(aa) = p_a^2, \quad \alpha_3(aa) = p_a^2, \quad \alpha_6(aa) = p_a^2$$
Backward Probabilities: Leaves
$$\beta_7(aa) = 1, \quad \beta_8(Aa) = 1, \quad \beta_9(Aa) = 1$$
Examples – Calculations Completed
[Figure: the example pedigree, marking the calculations completed so far.]
Backward Probability 4
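$$\beta_k(G_k) = \sum_{G_s} P(X_s \mid G_s)\, P(G_s) \prod_{c \in \text{children}} \sum_{G_c} P(X_c \mid G_c)\, P(G_c \mid G_k, G_s)\, \beta_c(G_c)$$
Individual 4 (Aa) has spouse 3 (aa) and children 7 (aa) and 8 (Aa). Writing 1 for affected and 0 for not affected:
$$\beta_4(Aa) = P(0 \mid aa)\, p_a^2 \cdot \left[ P(0 \mid aa)\, P(aa \mid Aa, aa)\, \beta_7(aa) \right] \cdot \left[ P(1 \mid Aa)\, P(Aa \mid Aa, aa)\, \beta_8(Aa) \right]$$
$$= 1 \cdot p_a^2 \cdot \left( 1 \cdot \tfrac{1}{2} \cdot 1 \right) \left( 1 \cdot \tfrac{1}{2} \cdot 1 \right) = \frac{p_a^2}{4}$$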
Forward Probability 5
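Individual 5 (Aa) has parents 1 (AA) and 2 (aa) and sibling 4 (Aa):
$$\alpha_5(Aa) = P(1 \mid Aa)\, P(Aa \mid AA, aa)\, \alpha_1(AA)\, \alpha_2(aa) \cdot \left[ P(1 \mid Aa)\, P(Aa \mid AA, aa)\, \beta_4(Aa) \right]$$
$$= 1 \cdot 1 \cdot p_A^2\, p_a^2 \cdot 1 \cdot 1 \cdot \frac{p_a^2}{4} = \frac{p_A^2\, p_a^4}{4}$$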
Backward Probability 5
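Individual 5 (Aa) has spouse 6 (aa) and child 9 (Aa):
$$\beta_5(Aa) = P(0 \mid aa)\, p_a^2 \cdot \left[ P(1 \mid Aa)\, P(Aa \mid Aa, aa)\, \beta_9(Aa) \right] = 1 \cdot p_a^2 \cdot 1 \cdot \tfrac{1}{2} \cdot 1 = \frac{p_a^2}{2}$$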
Example – Final Calculation
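$$P(X_1, \dots, X_9) = \sum_{G_5} \alpha_5(G_5)\, \beta_5(G_5) = \alpha_5(Aa)\, \beta_5(Aa) = \frac{p_A^2\, p_a^4}{4} \cdot \frac{p_a^2}{2} = \frac{p_A^2\, p_a^6}{8}$$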
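The worked example can be checked numerically. The sketch below peels the nine-member pedigree with individual 5 as proband and compares the result with a direct sum over all genotype configurations and with the closed form $p_A^2 p_a^6 / 8$. Several things here are assumptions made for the illustration: the pedigree structure in `OBS` and `PARENTS` is inferred from the calculations above, the recursion is hard-coded for this pedigree rather than general, genotypes are treated as observed (as in the example), and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

GENOS = ("AA", "Aa", "aa")
# Observed (genotype, phenotype) per individual; phenotype 1 = affected.
OBS = {1: ("AA", 1), 2: ("aa", 0), 3: ("aa", 0), 4: ("Aa", 1), 5: ("Aa", 1),
       6: ("aa", 0), 7: ("aa", 0), 8: ("Aa", 1), 9: ("Aa", 1)}
PARENTS = {4: (1, 2), 5: (1, 2), 7: (4, 3), 8: (4, 3), 9: (5, 6)}


def founder_prob(g, p_A):
    """Hardy-Weinberg genotype frequency."""
    p_a = 1 - p_A
    return {"AA": p_A * p_A, "Aa": 2 * p_A * p_a, "aa": p_a * p_a}[g]


def child_prob(gc, g1, g2):
    """Mendelian P(child genotype | two parental genotypes)."""
    pass_A = {"AA": Fraction(1), "Aa": Fraction(1, 2), "aa": Fraction(0)}
    t1, t2 = pass_A[g1], pass_A[g2]
    return {"AA": t1 * t2, "Aa": t1 * (1 - t2) + (1 - t1) * t2,
            "aa": (1 - t1) * (1 - t2)}[gc]


def evidence(k, g):
    """Penetrance of k's phenotype, zeroed if g contradicts the observed genotype."""
    g_obs, x = OBS[k]
    affected = 1 if "A" in g else 0  # fully penetrant dominant model
    return 1 if (g == g_obs and x == affected) else 0


def alpha_founder(k, g, p_A):
    return evidence(k, g) * founder_prob(g, p_A)


def beta(g_k, spouse, children, p_A):
    """Backward probability of a parent whose children are all leaves (beta_c = 1)."""
    total = 0
    for gs in GENOS:
        term = alpha_founder(spouse, gs, p_A)
        for c in children:
            term *= sum(evidence(c, gc) * child_prob(gc, g_k, gs) for gc in GENOS)
        total += term
    return total


def alpha_5(g5, p_A):
    """Forward probability of individual 5: parents 1 and 2, sibling 4."""
    total = 0
    for g1, g2 in product(GENOS, repeat=2):
        sibling = sum(evidence(4, g4) * child_prob(g4, g1, g2)
                      * beta(g4, 3, (7, 8), p_A) for g4 in GENOS)
        total += (alpha_founder(1, g1, p_A) * alpha_founder(2, g2, p_A)
                  * child_prob(g5, g1, g2) * sibling)
    return evidence(5, g5) * total


def peeling_likelihood(p_A):
    return sum(alpha_5(g, p_A) * beta(g, 6, (9,), p_A) for g in GENOS)


def brute_force_likelihood(p_A):
    """Direct sum over all 3^9 genotype configurations."""
    total = 0
    for genos in product(GENOS, repeat=9):
        g = dict(zip(range(1, 10), genos))
        term = 1
        for k in range(1, 10):
            term *= evidence(k, g[k])
            if k in PARENTS:
                term *= child_prob(g[k], g[PARENTS[k][0]], g[PARENTS[k][1]])
            else:
                term *= founder_prob(g[k], p_A)
        total += term
    return total


p_A = Fraction(1, 4)  # illustrative allele frequency
closed_form = p_A ** 2 * (1 - p_A) ** 6 / 8
```

Exact fractions are used so that the peeling result, the brute-force sum, and the closed form can be compared with `==` rather than a floating-point tolerance.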
Efficiency of the Elston-Stewart Algorithm
In our example, each genotype was defined without ambiguity. There were no sums over genotypes.
In general, this is not true and the forward and backward probabilities must sum over the possible parental genotypes or spousal genotypes respectively.
The cost of the ES algorithm grows exponentially in the number of loci, because the number of multilocus genotypes to sum over does.
Fortunately, the ES algorithm calculations only increase linearly in the number of pedigree members.
Lander-Green Algorithm
View the pedigree as a hidden Markov model on haplotypes.
The pattern of inheritance at a single locus is described by an inheritance vector $v$, a vector of $2(n-f)$ 0's and 1's indicating whether each transmitted allele is paternal (0) or maternal (1) in origin.
There are $2^{2(n-f)}$ possible inheritance vectors.
Inheritance Vector v
[Figure: the example pedigree with ordered genotypes; individuals 4 and 5 are written aA.]
Gamete    v
4M        0|1
4P        0|1
5M        0|1
5P        0|1
7M        1
7P        0|1
8M        0
8P        0|1
9M        0
9P        0|1
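The table can be read as a set of constraints on the inheritance vector: entries shown as 0|1 are free, the others are fixed by the data. The sketch below enumerates all vectors for the ten gametes and keeps those that agree with the fixed entries. The names `GAMETES`, `FIXED` and `compatible_vectors` are introduced only for this illustration.

```python
from itertools import product

# Gametes in the order of the table; n - f = 5 non-founders give 10 meioses.
GAMETES = ["4M", "4P", "5M", "5P", "7M", "7P", "8M", "8P", "9M", "9P"]
# Entries of v that the table pins to a single value.
FIXED = {"7M": 1, "8M": 0, "9M": 0}


def compatible_vectors(gametes, fixed):
    """Yield every inheritance vector that agrees with the fixed entries."""
    for bits in product((0, 1), repeat=len(gametes)):
        v = dict(zip(gametes, bits))
        if all(v[g] == b for g, b in fixed.items()):
            yield bits


total = 2 ** len(GAMETES)  # 2^(2(n - f)) possible vectors
compatible = list(compatible_vectors(GAMETES, FIXED))
```

Three of the ten entries are fixed, so one vector in eight remains compatible with the observed genotypes.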
Conditional Probability
$$P(X \mid v) = \sum_{G} P(X \mid G)\, P(G \mid v)$$
Prior to viewing the data, all inheritance vectors are equally likely.
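$$P(X_i) = \sum_{v} P(X_i \mid v)\, P(v) = 2^{-2(n-f)}\, \mathbf{1}^t Q_i \mathbf{1}$$
where $Q_i$ is the diagonal matrix with entries $P(X_i \mid v)$ and $\mathbf{1}$ is a vector of ones.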
Multiple Loci
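Suppose there are $l$ loci, and let $X_i$ be the data at locus $i$. Then, the joint probability can be factored as
$$P(X_1, \dots, X_l) = P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_1, X_2) \cdots P(X_l \mid X_1, \dots, X_{l-1})$$
But, conditional on $v_i$, $X_i$ is independent of all $X_j$ with $j < i$:
$$P(X_i \mid v_i, X_1, \dots, X_{i-1}) = P(X_i \mid v_i)$$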
Multiple Loci (cont)
And, conditional on the inheritance vectors of preceding loci, the inheritance vector at locus i is independent of all but the immediately preceding inheritance vector.
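$$P(v_i \mid v_1, \dots, v_{i-1}) = P(v_i \mid v_{i-1}) = \theta_{i-1}^{\,j}\, (1 - \theta_{i-1})^{2(n-f)-j}$$
where $\theta_{i-1}$ is the recombination fraction between loci $i-1$ and $i$, and $j$ is the number of positions at which $v_{i-1}$ and $v_i$ differ.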
Multiple Loci (cont)
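$$P(X_1, \dots, X_l) = \sum_{v_1} \sum_{v_2} \cdots \sum_{v_l} P(v_1)\, P(X_1 \mid v_1)\, P(v_2 \mid v_1)\, P(X_2 \mid v_2) \cdots P(v_l \mid v_{l-1})\, P(X_l \mid v_l)$$
$$= 2^{-2(n-f)}\, \mathbf{1}^t Q_1 T_1 Q_2 T_2 \cdots T_{l-1} Q_l \mathbf{1}$$
where $Q_i$ is the diagonal matrix with entries $P(X_i \mid v_i)$, $T_i$ is the matrix of transition probabilities $P(v_{i+1} \mid v_i)$, and $\mathbf{1}$ is a vector of ones.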
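The matrix product can be evaluated left to right as a forward recursion over loci, which is linear in the number of loci but exponential in the number of meioses. The sketch below is a minimal pure-Python version under stated assumptions: inheritance vectors are encoded as integers whose bits are the entries of $v$, the per-locus values $P(X_i \mid v)$ are supplied as plain lists, and the numbers in `q` and `thetas` are made up for the demonstration. A brute-force sum over all paths is included only as a cross-check.

```python
from fractions import Fraction
from itertools import product


def transition_prob(v, w, theta, m):
    """P(w | v) = theta^j (1 - theta)^(m - j), j = number of differing bits."""
    j = bin(v ^ w).count("1")
    return theta ** j * (1 - theta) ** (m - j)


def lander_green_likelihood(q, thetas, m):
    """Forward recursion for 2^-m * 1^t Q_1 T_1 Q_2 ... T_{l-1} Q_l 1.

    q[i][v] is P(X_i | v) at locus i; thetas[i] is the recombination
    fraction between loci i and i+1; m = 2(n - f) is the number of meioses.
    """
    size = 2 ** m
    forward = [Fraction(1, size) * q[0][v] for v in range(size)]  # uniform prior
    for i, theta in enumerate(thetas, start=1):
        forward = [q[i][w] * sum(forward[v] * transition_prob(v, w, theta, m)
                                 for v in range(size))
                   for w in range(size)]
    return sum(forward)


def path_sum_likelihood(q, thetas, m):
    """Same quantity by summing over every sequence (v_1, ..., v_l)."""
    size = 2 ** m
    total = 0
    for path in product(range(size), repeat=len(q)):
        term = Fraction(1, size) * q[0][path[0]]
        for i in range(1, len(q)):
            term *= transition_prob(path[i - 1], path[i], thetas[i - 1], m)
            term *= q[i][path[i]]
        total += term
    return total


# Toy input: m = 2 meioses, 3 loci, made-up values of P(X_i | v).
m = 2
q = [[Fraction(1, 2), 1, 0, Fraction(1, 4)],
     [1, 0, 1, Fraction(1, 3)],
     [Fraction(1, 5), 1, 1, 0]]
thetas = [Fraction(1, 10), Fraction(1, 5)]
likelihood = lander_green_likelihood(q, thetas, m)
```

The recursion costs on the order of $l \cdot 4^m$ operations, whereas the path sum costs on the order of $2^{ml}$, which is the trade-off that makes this approach suited to many loci and small pedigrees.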