Allelic Pattern Sampler: Genetic
Combinations Underlying Complex
Diseases
Polygenic diseases (traits)
Susceptibility to a polygenic disease arises from the contribution of a set of genes.
Heterogeneity: different genetic backgrounds give rise to the same disease.
The disease outcome is correlated with the genetic background rather than determined by it.
Environmental effect or heterogeneity: gang-specific eyebrows. A common signature is improbable.
Polygenic contribution
The genes interact (epistasis).
The genes can behave as interacting only relative to the disease.
• Complementary alleles. The manifestation of an allele's trait requires another allele of another gene.
• Alternative pathways.
The genes can contribute independently in an additive way.
The pattern concept. An example: image recognition
(1,0) (1,1/2) (1/2,1/2)
(1,1) (1/2,1) (0,1)
Allelic (genetic) pattern
We know the levels of a trait (i.e. disease) in a group of persons, and we know the alleles of candidate genes that these persons carry.
A pattern is a set of alleles of the genes whose presence in a genome as a whole is associated with the trait.
• Any subset of the pattern is associated less reliably than the whole pattern is. So is any superset. Thus, a pattern is a locally minimal subset satisfying the statements above.
• A pattern may contain only one allele.
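[Figure: three bar charts (0% to 100%) of carriers and non-carriers among controls and patients. The underlying counts are:]
|          | DR4 | non-DR4 | CCR5 Del | non-(CCR5 Del) | DR4 + CCR5 Del | non-(DR4 + CCR5 Del) |
+----------+-----+---------+----------+----------------+----------------+----------------------+
| controls | 48  | 183     | 40       | 191            | 1              | 230                  |
| patients | 49  | 163     | 52       | 160            | 17             | 195                  |
DR4 + CCR5 Del: OR 20.1, p<0.0001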
The solid line points to an independent combination ratio.
Cross-sectional comparison of MS patients and controls among carriers and non-carriers of alleles of DRB1 HLA gene, CCR5 chemokine receptor gene deletion and their combination.
Favorova OO, Andreewski TV, Boiko AN, Sudomoina MA, Alekseenkov AD, Kulakova OG, Slanova AV, Gusev EI. 2002. The chemokine receptor CCR5 deletion mutation is associated with MS in HLA-DR4-positive Russians. Neurology 59(10):1652-5.
Example of a genetic pattern for a complex polygenic disease.
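The odds ratio quoted for the combination can be reproduced from the carrier counts. The sketch below also compares the observed frequency of the combination with the product of the two single-allele frequencies; reading the "independent combination ratio" of the solid line as that product is an assumption, and the function names are illustrative.

```python
# Carrier / non-carrier counts read from the figure data above.
controls = {"DR4": (48, 183), "CCR5 Del": (40, 191), "both": (1, 230)}
patients = {"DR4": (49, 163), "CCR5 Del": (52, 160), "both": (17, 195)}


def carrier_freq(counts):
    """Fraction of carriers in a (carriers, non_carriers) pair."""
    carriers, non_carriers = counts
    return carriers / (carriers + non_carriers)


def odds_ratio(case_counts, control_counts):
    """Odds ratio of carriership, patients versus controls."""
    a, b = case_counts
    c, d = control_counts
    return (a * d) / (b * c)


def expected_if_independent(group):
    """Assumed meaning of the solid line: product of single-allele frequencies."""
    return carrier_freq(group["DR4"]) * carrier_freq(group["CCR5 Del"])


or_both = odds_ratio(patients["both"], controls["both"])
print(f"OR for DR4 + CCR5 Del: {or_both:.1f}")
for name, group in (("controls", controls), ("patients", patients)):
    print(name,
          f"observed {carrier_freq(group['both']):.3f}",
          f"expected {expected_if_independent(group):.3f}")
```

Under this reading, the combination is rarer than expected among controls and more frequent than expected among patients.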
Patterns hide each other
More-than-2-alleles-in-a-locus union of the combinations:
....| 0 0 | a b | 0 0 |....
....| 0 0 | c 0 | 0 0 |....
The strongest association (not necessarily the most reliable one) statistically shadows all the others.
[Figure axis: disease level]
Independence question
We cannot define a proper space of patterns, because the operation of addition (taken as the union of allelic sets) is not defined for every pair of patterns; thus we cannot apply a component analysis technique.
Mutual isolation of patterns
We say that a pattern is isolated from a set of other patterns if we remove the influence of all the other patterns before we consider our pattern’s association with the trait.
• It is analogous to an adjustment procedure.
Set of patterns
Since we cannot consider one pattern apart from the others, we consider a set of patterns simultaneously.
Data
• We have genotypic data and phenotypic trait level data for some individuals.
• The trait levels are comparative characteristics. They cannot be measured; they can only be compared.
• We want to obtain the allelic patterns that best characterize the relation between the genotypic and phenotypic data.
We will look for a whole set of patterns that maximises the probability that all the patterns are associated with the disease in the mutually isolated manner.
• A good pattern set forms a kind of “gradient basis” in the genome-trait association.
The set of patterns is a variable to be optimized.
Data structures
Trait level   Incidence matrix   Gene data
0.1           1 0 0              a c | d d | f s |....
0.4           0 1 1              c f | a b | b a |....
0.7           0 0 0              a a | c b | a c |....
0.9           0 0 1              c f | f b | b s |....
0.2           1 1 1              a f | a d | b c |....
…             .......            ........................
The correspondence of the two matrices shows the quality of the set of patterns.
Set of patterns
0 0 | d 0 | 0 0 |....
0 0 | a 0 | 0 0 |....
0 f | 0 0 | b 0 |....
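The incidence matrix on this slide follows from the gene data and the set of patterns. A minimal sketch, assuming a genome carries a pattern when every allele the pattern requires is present at the corresponding locus; the dictionary encoding of a pattern and the helper name `carries` are illustrative choices, not taken from the software.

```python
from collections import Counter

# Gene data rows from the slide: one (allele, allele) pair per locus.
genotypes = [
    [("a", "c"), ("d", "d"), ("f", "s")],
    [("c", "f"), ("a", "b"), ("b", "a")],
    [("a", "a"), ("c", "b"), ("a", "c")],
    [("c", "f"), ("f", "b"), ("b", "s")],
    [("a", "f"), ("a", "d"), ("b", "c")],
]

# Patterns from the slide, written as {locus index: required alleles};
# the "0" placeholders of the slide notation are simply omitted.
patterns = [
    {1: ["d"]},            # 0 0 | d 0 | 0 0
    {1: ["a"]},            # 0 0 | a 0 | 0 0
    {0: ["f"], 2: ["b"]},  # 0 f | 0 0 | b 0
]


def carries(genotype, pattern):
    """True if every required allele is present at its locus."""
    for locus, required in pattern.items():
        missing = Counter(required) - Counter(genotype[locus])
        if missing:  # some required allele is absent
            return False
    return True


incidence = [[int(carries(g, p)) for p in patterns] for g in genotypes]
for row in incidence:
    print(*row)
```

The printed rows match the incidence matrix column shown next to the trait levels.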
The incidence classification
Incidence matrix
1 0 0
0 1 1
0 0 0
0 0 1
1 1 1
1 0 1
0 1 0
1 0 1
0 0 1
.......
All the cases are classified into 2^n possible classes (n is the number of patterns in the set) based on their rows in the incidence matrix.
The classes can be represented by the vertices of a hypercube: 000, 100, 010, 110, 001, 101, 011, 111.
A set of parallel edges of the cube corresponds to a pattern. For example, the edges along which the second digit changes give the direction of the second pattern.
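The hypercube picture can be made concrete by enumerating the classes and the parallel edges that belong to one pattern. This is an illustrative sketch; the function names are not from the original software.

```python
from itertools import product


def classes(n):
    """All 2**n incidence classes for a set of n patterns, as bit strings."""
    return ["".join(bits) for bits in product("01", repeat=n)]


def edges_for_pattern(n, k):
    """Pairs of classes that differ only in the k-th pattern (0-based).

    Each pair is one of the parallel edges of the hypercube that
    correspond to pattern k: (class without it, class with it).
    """
    pairs = []
    for bits in product("01", repeat=n):
        if bits[k] == "0":
            without = "".join(bits)
            with_k = without[:k] + "1" + without[k + 1:]
            pairs.append((without, with_k))
    return pairs


print(classes(3))
print(edges_for_pattern(3, 1))  # direction of the second pattern
```

For three patterns there are eight classes and four parallel edges per pattern; each edge pairs two classes that agree on all the other patterns.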
A pair of classes comparison
We can only compare the disease (trait) levels, so the appropriate statistic for the comparison is the number of inversions.
Two classes of trait levels, X and Y, that lie on the same edge differ due to the “isolated” influence of the edge’s pattern. So, we base the evaluation of a pattern set on such pairwise comparisons.
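Counting inversions needs only comparisons of trait levels, never their differences. A minimal sketch, assuming an inversion is a pair (x, y) with x from one class and y from the other such that x > y, and counting ties as one half (the tie rule is an assumption, not stated on the slide).

```python
def count_inversions(x_levels, y_levels):
    """Number of (x, y) pairs with x > y; a tie counts as 0.5 (assumed)."""
    inversions = 0.0
    for x in x_levels:
        for y in y_levels:
            if x > y:
                inversions += 1.0
            elif x == y:
                inversions += 0.5
    return inversions


# Trait levels of two hypothetical adjacent classes.
x_class = [0.1, 0.4]
y_class = [0.2, 0.9]
print(count_inversions(x_class, y_class), "of", len(x_class) * len(y_class))
```

The count ranges from 0 (every level in Y exceeds every level in X) to the product of the two class sizes.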
A pair of classes. Alternative hypotheses.
To test a pair of adjacent classes, X and Y, we formulate three hypotheses about the corresponding pattern:
• null hypothesis: X and Y have the same median, e.g. X≡Y;
• “positive” hypothesis: median(Y) > median(X) (predisposing pattern);
• “negative” hypothesis: median(Y) < median(X) (protecting pattern).
We compare the hypotheses in a Bayesian paradigm.
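The likelihoods for a pair: example
[Figure: likelihood p (axis tick 0.25) against the number of inversions inv# (axis tick 80), with four lines: null, −, + and const.]
The larger the minor class is, the sharper all the likelihoods are. If its size is 1 or 0, all 4 lines are equal.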
The null-hypothesis posterior for a pattern
• A pattern’s likelihood for a hypothesis is the product of the likelihoods of all the corresponding class pairs.
• If a pattern is carried by all the genomes in the data or is not carried by any (i.e. it is uninformative), the null-hypothesis prior for the pattern is 1. For informative patterns, we use a uniform prior.
For a pattern:
$$P(H_0 \mid \text{data}) = \frac{P(\text{data} \mid H_0)\,P(H_0)}{P(\text{data} \mid H_0)\,P(H_0) + P(\text{data} \mid H_+)\,P(H_+) + P(\text{data} \mid H_-)\,P(H_-)}$$
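The posterior above is a direct application of Bayes' rule over the three hypotheses. A minimal sketch, assuming the uniform prior means 1/3 for each of the null, positive and negative hypotheses; the pair likelihoods themselves are taken as given inputs, and the function names are illustrative.

```python
import math


def pattern_likelihood(pair_likelihoods):
    """Likelihood of a pattern under one hypothesis: product over class pairs."""
    return math.prod(pair_likelihoods)


def null_posterior(lik_null, lik_pos, lik_neg, informative=True):
    """Posterior of the null hypothesis for one pattern.

    An uninformative pattern (carried by all genomes or by none) has a
    null prior of 1, so its null posterior is 1 as well. Otherwise a
    uniform prior of 1/3 per hypothesis is assumed.
    """
    if not informative:
        return 1.0
    prior = 1.0 / 3.0
    evidence = prior * lik_null + prior * lik_pos + prior * lik_neg
    return prior * lik_null / evidence


# Hypothetical likelihoods of one pattern under the three hypotheses.
print(null_posterior(lik_null=0.2, lik_pos=0.6, lik_neg=0.2))
```

With a uniform prior the prior terms cancel, so the posterior is simply the null likelihood divided by the sum of the three likelihoods.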
The quality of a set of patterns
• The pairwise comparisons for all the pairs of classes that correspond to parallel edges together qualify a pattern.
• All patterns together qualify a set of patterns.
• A good pattern set is one without bad patterns.
$$P(\overline{H_0}) = \prod_{i=1}^{p} \left[ 1 - P(H_0 \mid i, \text{data}) \right]$$
is the quality of a set of p patterns, where $P(H_0 \mid i, \text{data})$ is the null-hypothesis posterior of the i-th pattern.
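The product form means that a single pattern with a null posterior close to 1 pulls the quality of the whole set towards 0. The sketch below applies it to the two-pattern set in the sample program output later in these slides, assuming that 1 − P(H0) for a pattern is the sum of its reported posteriors to be positive and negative; under that assumption the result is close to the reported point reliability.

```python
import math

# Posteriors reported for a two-pattern set in the sample output.
positive = [3.709e-10, 7.143e-11]
negative = [0.001556, 0.03835]


def set_quality(null_posteriors):
    """Probability that no pattern in the set is a null one."""
    return math.prod(1.0 - p0 for p0 in null_posteriors)


# Assumption: P(H0) = 1 - (P(H+) + P(H-)) for each pattern.
null_posteriors = [1.0 - (pos + neg) for pos, neg in zip(positive, negative)]
quality = set_quality(null_posteriors)
print(f"{quality:.4e}")  # reported point reliability: 5.9658e-05
```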
Optimization of the pattern set quality
• Direct enumeration is inefficient.
• A kind of gradient maximisation is prone to getting trapped in local maxima.
Thus, we use the Markov chain Monte Carlo (MCMC) method.
Specifically, it is a hybrid Metropolis-Hastings-Gibbs sampler with a random choice of updates.
Possible updating steps
A mutation:
before:
0 0 | d 0 | 0 0
0 0 | a 0 | 0 0
0 f | 0 0 | b 0
after:
0 0 | d 0 | 0 0
0 0 | a 0 | 0 0
0 f | c 0 | b 0
A recombination:
before:
0 0 | d 0 | 0 0
0 0 | a 0 | 0 0
0 f | 0 0 | b 0
after:
0 0 | d 0 | 0 0
0 0 | a 0 | b 0
0 f | 0 0 | 0 0
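The two updating steps can be sketched as operations on a pattern set. The encoding (one dictionary per pattern, locus index to allele) and all function names are illustrative assumptions. The acceptance rule shown is the plain Metropolis rule for a symmetric proposal, used here as a stand-in; it is not the hybrid Metropolis-Hastings-Gibbs scheme of the actual sampler.

```python
import copy
import random


def mutate(pattern_set, pattern_idx, locus, allele):
    """Set (or, with allele=None, clear) one allele in one pattern."""
    new_set = copy.deepcopy(pattern_set)
    if allele is None:
        new_set[pattern_idx].pop(locus, None)
    else:
        new_set[pattern_idx][locus] = allele
    return new_set


def recombine(pattern_set, i, j, locus):
    """Exchange the content of one locus between patterns i and j."""
    new_set = copy.deepcopy(pattern_set)
    from_i = new_set[i].pop(locus, None)
    from_j = new_set[j].pop(locus, None)
    if from_j is not None:
        new_set[i][locus] = from_j
    if from_i is not None:
        new_set[j][locus] = from_i
    return new_set


def metropolis_accept(old_quality, new_quality, rng=random):
    """Plain Metropolis rule (stand-in for the real acceptance step)."""
    if new_quality >= old_quality:
        return True
    return rng.random() < new_quality / old_quality


# The pattern set from the slide: d and a at locus 1, f at locus 0 with b at locus 2.
start = [{1: "d"}, {1: "a"}, {0: "f", 2: "b"}]
print(mutate(start, 2, 1, "c"))   # the mutation shown above
print(recombine(start, 1, 2, 2))  # the recombination shown above
```

Both operations return a new pattern set, so a rejected proposal leaves the current state untouched.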
Output statistics
*** Patternsets statistics: ***
| alpha-fibr | fibr-249 | fibr-148 | ApoE-491 | Apoe-427 | ApoE-epsilon | Hind_III | LPL-Ser447Ter | ACE | CMA/B | gender |
+-------------+-----------+-----------+-----------+-----------+---------------+-----------+---------------+-----+-------+---------+
| 0 0 | 0 0 | 0 0 | T 0 | 0 0 | 0 0 | 0 0 | 0 0 | 0 0 | 0 0 | 0 0 |
| 0 0 | 0 0 | 0 0 | 0 0 | C T | 0 0 | 0 0 | 0 0 | 0 0 | 0 0 | 0 0 |
Registered 64 times.
Pattern posteriors to be positive:
3.709e-10 7.143e-11
Pattern posteriors to be negative:
0.001556 0.03835
Point reliability = 5.9658e-05
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Patterns statistics:
| alpha-fibr | fibr-249 | fibr-148 | ApoE-491 | Apoe-427 | ApoE-epsilon | Hind_III | LPL-Ser447Ter | ACE | CMA/B | gender |
+-------------+-----------+-----------+-----------+-----------+---------------+-----------+---------------+-----+-------+---------+
| 0 0 | 0 0 | 0 0 | 0 0 | C 0 | 0 0 | 0 0 | 0 0 | 0 0 | 0 0 | 0 0 |
Occured 5927 times. +/- : 0/5927
(Mentioned 41 times. +/- : 0/41 )
maximal reliabilities as + and - are 4.81058e-10 and 0.0172151 .
| alpha-fibr | fibr-249 | fibr-148 | ApoE-491 | Apoe-427 | ApoE-epsilon | Hind_III | LPL-Ser447Ter | ACE | CMA/B | gender |
+-------------+-----------+-----------+-----------+-----------+---------------+-----------+---------------+-----+-------+---------+
| 0 0 | 0 0 | 0 0 | T 0 | 0 0 | 0 0 | 0 0 | 0 0 | 0 0 | 0 0 | 0 0 |
Occured 3022 times. +/- : 0/3022
(Mentioned 19 times. +/- : 0/19 )
maximal reliabilities as + and - are 4.74783e-06 and 0.00205254 .
A(llelic) P(attern) Sampler
APSampler software was developed …
Favorov AV, Andreewski TV, Sudomoina MA, Favorova OO, Parmigiani G, Ochs MF. 2005. A Markov chain Monte Carlo technique for identification of combinations of allelic variants underlying complex diseases in humans. Genetics 171(4):2113-2121.
… and applied to real data
Favorova OO, Favorov AV, Boiko AN, Andreewski TV, Sudomoina MA, Alekseenkov AD, Kulakova OG, Gusev EI, Parmigiani G, Ochs MF. 2006. Three allele combinations associated with multiple sclerosis. BMC Med Genet 7:63.
Sudomoina MA, Nikolaeva TY, Parfenov MG, Alekseenkov AD, Favorov AV, Gekht AB, Gusev EI, Favorova OO. 2006. Genetic risk factors of arterial hypertension: analysis of ischemic stroke patients from the Yakut ethnic group. Dokl Biochem Biophys 410:324-6 (Rus).
Chikhladze NM, Samedova KhF, Sudomoina MA, Thant M, Htut ZM, Litonova GN, Favorov AV, Chazova IE, Favorova OO. 2008. Contribution of CYP11B2, REN and AGT genes in genetic predisposition to arterial hypertension associated with hyperaldosteronism. Kardiologiia 48(1):37-42 (Rus).
Validation I: Exact Fisher
|              | Patients | Controls |
+--------------+----------+----------+
| Carriers     | PC       | CC       |
| Non-carriers | PNC      | CNC      |
pattern → p (pattern)
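For a four-cell table like the one above, the exact Fisher p-value can be computed from hypergeometric probabilities with the standard library alone. This is a generic two-sided test, sketched under the usual convention of summing all tables that are no more probable than the observed one; it is not taken from the APSampler code. The example uses the DR4 + CCR5 Del counts from the MS figure earlier in these slides.

```python
from math import comb


def fisher_exact_two_sided(pc, cc, pnc, cnc):
    """Two-sided exact Fisher p-value for the table [[pc, cc], [pnc, cnc]]."""
    carriers = pc + cc        # first row total
    non_carriers = pnc + cnc  # second row total
    patients = pc + pnc       # first column total
    total = carriers + non_carriers

    def prob(x):
        # Hypergeometric probability of x patients among the carriers.
        return (comb(carriers, x) * comb(non_carriers, patients - x)
                / comb(total, patients))

    p_observed = prob(pc)
    lowest = max(0, patients - non_carriers)
    highest = min(carriers, patients)
    # Sum over all tables at most as probable as the observed one
    # (small relative tolerance guards against rounding).
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# DR4 + CCR5 Del: 17 patient and 1 control carriers, 195 and 230 non-carriers.
p_value = fisher_exact_two_sided(17, 1, 195, 230)
print(f"p = {p_value:.2e}")
```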
Validation II: permutation
[Figure: the disease data are permuted N times against the fixed genetic data. Each set of permuted disease data gives a null distribution (1st, 2nd, 3rd, ..., N-th), and together they form the null distribution of p.]
Pfail [pattern] = Pfail [p (pattern)]
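The permutation idea can be sketched as follows: shuffle the disease data against the fixed genetic data, recompute a statistic each time, and report how often the permuted value is at least as extreme as the observed one. In the slides the statistic appears to be the p-value of a pattern obtained from the whole procedure; here a hypothetical callable `statistic` stands in for it (larger means more extreme), and the add-one correction is a common convention rather than something stated in the source.

```python
import random


def permutation_p_fail(genetic, disease, statistic, n_perm=1000, seed=0):
    """Fraction of permutations whose statistic is >= the observed one."""
    rng = random.Random(seed)
    observed = statistic(genetic, disease)
    shuffled = list(disease)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-phenotype link
        if statistic(genetic, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)


def carrier_patients(genetic, disease):
    """Toy statistic: number of individuals who are both carriers and patients."""
    return sum(g * d for g, d in zip(genetic, disease))


# Toy data: carriership coincides perfectly with the disease.
carriers = [1] * 10 + [0] * 10
levels = [1] * 10 + [0] * 10
p_fail = permutation_p_fail(carriers, levels, carrier_patients, n_perm=200)
print(p_fail)
```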
Validation III: FDR
|       | Test passed | Test failed |
+-------+-------------+-------------+
| True  | TP          | FN          |
| False | FP          | TN          |
p ≈ FP/(FP+TN)
FDR ≈ FP/(FP+TP)
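The two ratios answer different questions, which a small numeric example makes visible. The counts below are hypothetical and chosen only to show that a small p can coexist with a sizeable FDR when true associations are rare.

```python
def p_estimate(fp, tn):
    """Share of false associations that pass the test: FP / (FP + TN)."""
    return fp / (fp + tn)


def fdr_estimate_from_counts(fp, tp):
    """Share of passed tests that are false associations: FP / (FP + TP)."""
    return fp / (fp + tp)


# Hypothetical counts: 50 true and 950 false candidate associations.
tp, fn, fp, tn = 40, 10, 10, 940
print(f"p   ~ {p_estimate(fp, tn):.4f}")
print(f"FDR ~ {fdr_estimate_from_counts(fp, tp):.4f}")
```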
Validation III: FDR: evaluation
[Figure: the 1st, 2nd, 3rd, ..., N-th null distributions.]
Validation III: FDR: calculation
[Figure: the genetic data with the original disease data give the original distribution; the genetic data with N sets of permuted disease data give the null distribution of p.]
Validation III: FDR: evaluation II
[Figure labels: evaluated directly; approximated; T.]
FDR(T1) > FDR(T2)
Validation: FDR: example
• 61 markers and gender
• 120 controls and 255 MS patients
• Among the 255 patients, 155 respond to a medication

Pattern contains 3 informative alleles: Gender:1; 27:T; 42:C.
The pattern is mentioned in statistics as occurred 1 times at line: 3011.
Occured in 1 patternsets 1 times.
Mentioned in patternsets at lines: 731.
Fisher 4-pole table:
    1     2   levels
   51    51   carriers
   60   171   noncarriers
p-value = 1.98632243779503e-05
FDR=0.00179340028694405 (2.5e-06/0.001394)

Pattern contains 3 informative alleles: 21:G; 37:T; 53:C.
The pattern is mentioned in statistics as occurred 1 times at line: 3227.
Occurred in 1 patternsets 1 times.
Mentioned in patternsets at lines: 427.
Fisher 4-pole table:
    0     1   levels
    1    19   carriers
   89   118   noncarriers
p-value = 0.000368247913041713
FDR <=1 (0.0067765/1e-06)
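The FDR values in this output equal the ratio of the two numbers printed in parentheses, capped at 1. The sketch below reproduces that arithmetic; interpreting the numerator as a fraction taken from the null (permuted) distributions and the denominator as the corresponding fraction from the original distribution is an assumption based on the preceding slides.

```python
def fdr_from_fractions(null_fraction, original_fraction):
    """Ratio of the two fractions, capped at 1 (assumed interpretation)."""
    return min(1.0, null_fraction / original_fraction)


first = fdr_from_fractions(2.5e-06, 0.001394)
second = fdr_from_fractions(0.0067765, 1e-06)
print(first)   # compare with FDR=0.00179340028694405
print(second)  # compare with FDR <=1
```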
Authors
Alexander Favorov 1,3
Olga Favorova 2
Marina Sudomoina 2
Giovanni Parmigiani 3
Michael Ochs 3
1. State Scientific Centre “GosNIIGenetica”, Moscow, Russia.
2. Russian State Medical University, Moscow, Russia.
3. The Sidney Kimmel Cancer Center at Johns Hopkins, Baltimore, MD, USA.
4. Faculty of Bioinformatics and Biotechnology, MSU, Moscow.
5. Yakut Research Center, Russian Academy of Medical Sciences and Government of the Sakha Republic (Yakutia), Yakutsk.
6. Institute of Information Transmission Problems RAS, Moscow, Russia.
7. School of Pharmacy - CCRCB – QUB, Belfast, UK.
Acknowledgements
Alexey Alexeenkov 2
Alexey Boiko 2
Evgeniy Gusev 2
Mikhail Parfenov 2
Tatiana Nikolaeva 5
Mikhail Gelfand 6
Vsevolod Makeev 1
Andrew Mironov 4
Koen Vanderbroek 7
Thank you for your attention.
MS case-control study
• The method was applied to a database containing the genotyping results for DNA from 237 unrelated patients with clinically defined MS and from 358 healthy unrelated controls (all of them Russians).
• 15 polymorphic sites of candidate loci for MS development were analyzed.
• The phenotypic trait (i.e. MS susceptibility) levels were 1 for patients and 0 for controls.
• There were two runs: one for two patterns, one for three.
APSampler identified the following patterns as MS-associated:
• DRB1*15(2)
• TNFa9
• CCR5Δ32 + DRB1*04
• –509TGFβ1*C + DRB1*18 + +49CTLA4*G (trio 1)
• –238TNF*B1 + –308TNF*A2 + +49CTLA4*G (trio 2)
The Fisher’s 4-pole association test results for the trios and their 2-element subsets
| Combinations                             | Patients, N (%) | Controls, N (%) | p Value |
+------------------------------------------+-----------------+-----------------+---------+
| –509TGFβ1*C, DRB1*18(3), CTLA4*G (trio 1) | 5 (5)           | 0 (0)           | 0.009   |
| –509TGFβ1*C, DRB1*18(3)                  | 5 (5)           | 2 (1)           | 0.114   |
| –509TGFβ1*C, CTLA4*G                     | 60 (61)         | 88 (57)         | 0.603   |
| DRB1*18(3), CTLA4*G                      | 5 (5)           | 1 (1)           | 0.035   |
| –238TNF*B1, –308TNF*A2, CTLA4*G (trio 2) | 11 (9)          | 0 (0)           | 0.003   |
| –238TNF*B1, –308TNF*A2                   | 13 (10)         | 4 (5)           | 0.198   |
| –238TNF*B1, CTLA4*G                      | 38 (30)         | 15 (17)         | 0.037   |
| –308TNF*A2, CTLA4*G                      | 23 (18)         | 13 (15)         | 0.580   |
The permutation test values for the trios were less than 0.3%.
Analysis of genetic background of ischemic stroke (IS) patients of Yakut descent
| Total, n (mean age ± SD) | Men, n (mean age ± SD) | Women, n (mean age ± SD) |
+--------------------------+------------------------+--------------------------+
| 115 (58.1 ± 11.5)        | 75 (55.9 ± 12.3)       | 40 (62.2 ± 8.4)          |
| 108 (57.7 ± 11.3)        | 64 (55.9 ± 12.1)       | 44 (60.3 ± 9.6)          |
Examined polymorphic loci
(Gene, chromosome: coding region; regulatory regions)
• FGA, 4q28: coding region A4266G (Thr312Ala)
• FGB: regulatory regions C-249T; C-148T
• APOE, 19q13.2: coding region T3937C + C4075T (Cys112Arg + Arg158Cys); regulatory regions A-491T; T-427C
• LPL, 8p22: coding region C1595G (Ser447Ter); T495G
• ACE, 17q23: I/D
• CMA, 14q11.2: G-1903A
IS genetic background analysis
Associations identified
| Allele or allelic combination | p (pcorr) | OR  | CI (95%) |
+-------------------------------+-----------+-----+----------+
| APOE -427C                    | 0.001     | 0.3 | 0.1-0.6  |
| APOE -427T/C                  | 0.0003    | 0.2 | 0.08-0.5 |
| APOE -427T/T                  | 0.001     | 3.8 | 1.6-8.9  |
| APOE ε2                       | 0.01*     | 0.3 | 0.1-0.8  |
| APOE ε2/ε3                    | 0.03*     | 0.3 | 0.09-0.7 |
| APOE -491T + FGB -249T        | 0.02      | 0.3 | 0.1-0.9  |
| APOE -491T + LPL 495T/T       | 0.01      | 0.3 | 0.08-0.8 |
* p-value is calculated by the Fisher criterion in an 8-pole table.
[Figure: carriership of the LPL 495T allele, % (axis 0 to 100), in categories 0, 1, 2, 3; p<0.0001*.]
[Figure: carriership of the 3-allelic pattern -249C FGB, ε4 APOE and -1903A CMA, % (axis 0 to 50), in categories 0, 1, 2, 3; p=0.0003*. Two-allele subsets: -249C FGB + -1903A CMA, p=0.017; ε4 APOE + -1903A CMA, p=0.023.]