Introduction to Systems Biology
4.8. SNP information in the context of Systems Biology
Toni Reverter
Toni Reverter. Introduction to Systems Biology. 7 – 11 June, Valencia, Spain
SNP Information in the Context of Systems Biology
• SNPs are major contributors to genetic variation
• SNPs comprise some 80% of all known polymorphisms
• SNP density ~ 1 per 1,000 base pairs
• SNPs are mostly biallelic → less informative than microsatellites
• SNPs are more frequent and mutationally more stable
• Two alleles → automated genotyping tools are feasible
SNPs are useful markers!
Fundamental Theorem of the HapMap
Let $r_{AB}$ = correlation between alleles of two SNPs:
SNP 1 → A (alternate allele ‘a’)
SNP 2 → B (alternate allele ‘b’), where B is the functional (associated) SNP
Phen → C (alternate allele ‘c’)
If $N_{BC}$ = sample size required to detect $r_{BC}$, then the sample size required to detect $r_{AC}$ is:
$N_{AC} = N_{BC} \, (r_{BC} / r_{AC})^2$
…implies $r_{AC} = r_{AB} \, r_{BC}$
[Diagram: A, B, C]
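As a worked illustration of the relation above, the following Python sketch computes the sample size for marker A from the one for the functional SNP B. It takes the multiplicative assumption $r_{AC} = r_{AB} \, r_{BC}$ at face value; the function name and the example numbers are hypothetical and do not come from the slides.

```python
import math

def required_sample_size(n_bc: float, r_ab: float, r_bc: float) -> int:
    """Sample size needed to detect the marker-phenotype correlation r_AC.

    Assumes r_AC = r_AB * r_BC, so that
    N_AC = N_BC * (r_BC / r_AC)^2 = N_BC / r_AB^2.
    """
    if not (0 < abs(r_ab) <= 1) or not (0 < abs(r_bc) <= 1):
        raise ValueError("correlations must be non-zero and within [-1, 1]")
    r_ac = r_ab * r_bc
    return math.ceil(n_bc * (r_bc / r_ac) ** 2)

# Hypothetical numbers: 1,000 samples suffice for the functional SNP B,
# and marker A is in LD with B at r_AB = 0.5.
n_ac = required_sample_size(n_bc=1000, r_ab=0.5, r_bc=0.2)
print(n_ac)
```

Under this assumption the required sample size grows with $1 / r_{AB}^2$, so halving the LD between marker and functional SNP quadruples the number of samples needed.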
The theory which drives the HapMap Project does not hold in general, and in fact can be grossly misleading, most strikingly so when there is substantial LD, the very situation HapMap is designed to model.
Fundamental Theorem of the HapMap
Only if the genes underlying disease traits have one wild-type and one (or one major) susceptibility allele (i.e., when allelic heterogeneity is low) is statistical analysis likely to detect association of the causative allele (or linked markers) with the disease phenotype.
Terwilliger and Hiekkalinna (2006), cont'd
Case-control (a.k.a. selective) genotyping can, theoretically, lead to even less power than simple random sampling.
Replicating results is a major difficulty
Variations in allele frequencies among interacting loci can markedly affect the power to detect their effect, ……, which may account for the difficulties in replicating association results.
Notorieties of SNP data in association studies
• High dimensionality
  • Large p, small n → the p is so cheap!
• Imposed categorization
  • Values are 0, 1 or 2 for as many copies of the variant allele
  • Raw data is actually a fluorescent signal
  • Back to front: threshold vs underlying continuity
• Complicated correlation structure among predictors
  • Linkage
  • Linkage disequilibrium
  • Large p, small n
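The 0/1/2 coding mentioned in the list above can be sketched as follows. This is a minimal illustration that assumes genotype calls arrive as two-letter strings and that the variant allele is known in advance; the function name and the sample calls are hypothetical.

```python
def count_variant_alleles(genotype: str, variant: str) -> int:
    """Return 0, 1 or 2: the number of copies of the variant allele."""
    if len(genotype) != 2:
        raise ValueError("expected a biallelic call such as 'AG'")
    return sum(1 for allele in genotype if allele == variant)

# Hypothetical genotype calls for one SNP with alleles A and G.
calls = ["AA", "AG", "GG", "GA"]
coded = [count_variant_alleles(g, variant="G") for g in calls]
print(coded)
```

The coding discards the continuous fluorescent signal behind each call, which is the "imposed categorization" the slide refers to.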
Genetical Genomics (eQTL)
Definition 1: Find DNA variants (e.g. SNPs) associated with the expression of genes.
Definition 2: Use arrays to identify genes that are differentially expressed (DE) in relevant tissues of individuals sorted by QTL genotype. If those DE genes map to the chromosome region of interest, they become very strong candidates for the QTL.
Jansen and Nap, 2001
Note 1: Most significant connections are often to the gene itself (large dark diagonal) → Cis
Note 2: Vertical lines are hot-spots (major QTL of co-regulated genes) → Trans
DeCook et al. (2006) Genetics 172:1155-1164
Kendziorski & Wang (2006) A review of statistical methods for expression Quantitative Trait Loci mapping. Mammalian Genome 17:509.
Perez-Enciso, Quevedo & Bahamonde (2007) Genetical genomics: use all data. BMC Genomics 8:69.
Genetical Genomics (eQTL)
[Figure slides, Genetical Genomics (eQTL): 2006 and 2010; images not recovered]
Resources to access the results from whole-genome association studies are becoming increasingly available
Published Genome-Wide Associations through 3/2009: 398 published GWA at $p < 5 \times 10^{-8}$
NHGRI GWA Catalog
www.genome.gov/GWAStudies
…compiled from 331 publications and 1533 SNPs (June 1, 2009)
…Livestock Focussed
Reverter, Chan, Lehnert, et al. (2008) Dissection of beef quality phenotypes using a Myogenin network-anchored systems biology approach. Aust J of Exp Agric, 48:1053-1061.
Fortes, Reverter, Zhang, et al. (2010) An association weight matrix for the genetic dissection of puberty in beef cattle. PNAS (in press)
Smith, Li, Ingham, Collis, McWilliam, Dixon, Norris, Mortimer, Moore and Reverter (2010) A genomics-informed, SNP association study reveals FBLN1 and FABP4 as contributing to resistance to fleece rot in Australian Merino sheep. BMC Veterinary Research 6:27.
Motivation:
Publicly available data grossly underutilised
Relatively inexpensive bioinformatics approaches may partially replace expensive wet-lab experiments and reduce the amount of genotyping, sequencing, and phenotyping required
Reduce – Refine – Replace
MYOG network-anchored systems biology approach
PHENOMICS: 1,352 cattle, 10 phenotypes
GENOMICS: 189 cattle, 10K SNPs
TRANSCRIPTOMICS: 9 microarray experiments, 147 hybridizations (~9K clones – 822 genes), 47 conditions (treatments)
LITERATURE: Blais et al. 2005. An initial blueprint for myogenic differentiation. Genes & Dev. 19:553 (MYOG targets)
QUESTION: Is there a gene expression contrast that can be related with a quantitative phenotype of routine use in genetic improvement?
Ten Phenotypes

                          All (N=1,352)      Genotyped (N=189)
Phenotype                 Mean       SD      Mean       SD
Growth and Development
  HIP                   115.61     6.72    115.88     3.24
  DFI                    12.32     2.06     12.48     1.74
  ADG                     1.27     0.32      1.26     0.20
  CWT                   291.43    50.81    294.49    22.45
Fat Depots
  IMF                     4.89     2.37      4.87     1.50
  RIB                     8.41     4.31      8.32     2.96
  P8F                    11.34     4.27     11.54     2.94
Meat Quality
  TEN                     4.20     0.79      4.23     0.49
  LDL                    40.47     5.72     39.84     5.29
  LOS                    21.29     2.18     21.19     2.03
WHOLE-GENOME ASSOCIATION → 10K → 651 SNP-QTL (SNPAssoc)
9 EXPERIMENTS (147 Hybs), 15 EXPRESSION CONTRASTS
1. High vs Low quality diets;
2. Weight gain vs weight loss treatments;
3. Holstein vs Japanese black cattle over a year’s time (3 time points);
4. Holstein cattle at the end vs at the beginning of a year’s experiment;
5. Japanese black cattle at the end vs at the beginning of a year’s experiment;
6. Treated vs untreated adipogenesis stimulant in-vitro cells;
7. Piedmontese by Hereford (P×H) vs Wagyu by Hereford (W×H) crosses;
8. Late vs early developmental stage in P×H crosses;
9. Late vs early developmental stage in W×H crosses;
10. Cattle treated vs cattle untreated with Vitamin A;
11. Jersey vs Limousin cattle (steers only);
12. Jersey vs Limousin cattle (heifers only);
13. W×H at 12 months of age vs W×H at 3 months of age;
14. P×H at 12 months of age vs P×H at 3 months of age;
15. W×H vs P×H from 3 to 12 months of age (e.g. 13. vs 14., above)
Reverter et al. 2006. A gene co-expression network for bovine skeletal muscle inferred from microarray data. Physiological Genomics 28:76-83.
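15 Expression Contrasts
Measure of (possible) differential expression $d_i$ = mixture of two components:
$f(d_i) = \pi_0 \, \phi(d_i; \mu_0, \sigma_0^2) + \pi_1 \, \phi(d_i; \mu_1, \sigma_1^2)$
The $\pi_0$ term is the component for non-DE genes; the $\pi_1$ term is the component for DE genes.
Probability of Not-DE:
$\tau_0(d_i) = \pi_0 \, \phi(d_i; \mu_0, \sigma_0^2) / f(d_i)$
[Figure: histogram of the DE measure (x-axis: DE, -1.5 to 1.5; y-axis: 0 to 900) with the two fitted components]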
McLachlan et al. 2006. Bioinformatics 22:1608
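To make the mixture formulas concrete, the sketch below evaluates the probability of not being differentially expressed, $\tau_0(d_i)$, for a few values of $d_i$. The parameter values are hypothetical and are not fitted to the Beef CRC data; in practice the mixture parameters would be estimated from the data (for example by EM), which is not shown here.

```python
import math

def normal_pdf(x: float, mu: float, sigma2: float) -> float:
    """Density of a normal distribution N(mu, sigma2) evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def prob_not_de(d, pi0, mu0, sigma2_0, mu1, sigma2_1):
    """tau_0(d) = pi0 * phi(d; mu0, sigma2_0) / f(d) for a two-component mixture."""
    pi1 = 1.0 - pi0
    null_part = pi0 * normal_pdf(d, mu0, sigma2_0)  # non-DE component
    de_part = pi1 * normal_pdf(d, mu1, sigma2_1)    # DE component
    return null_part / (null_part + de_part)

# Hypothetical parameter values chosen only for illustration.
params = dict(pi0=0.9, mu0=0.0, sigma2_0=0.04, mu1=0.0, sigma2_1=0.49)
for d in (0.0, 0.5, 1.0):
    print(d, round(prob_not_de(d, **params), 3))
```

With a narrow non-DE component and a wide DE component, $\tau_0$ is close to 1 near zero and drops towards 0 as $|d_i|$ grows, so a gene can be called DE when $\tau_0(d_i)$ falls below a chosen cut-off.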
LITERATURE: Blais et al. 2005. An initial blueprint for myogenic differentiation. Genes & Dev. 19:553
MYOG Targets
22 validated MYOG targets were present in our expression profiling experiments
Beef CRC Gene Expression Experiments
Literature on MYOG validated targets
22 Genes – 35 Connections
Significance of MYOG-targets on each Expression Contrast (via clones on the array)
Cf. Meta-analysis: Pyne et al. 2006. Bioinformatics 22:2516
Significance of MYOG-targets on each Phenotype (via SNPs in the 0.5 Mb neighbourhood)
Two Incidence Matrices
M → MYOG Target Genes (via clones) × Expression Contrasts
S → MYOG Target Genes (via SNPs) × Phenotypes
$P = M^T S$ → Expression contrasts × Phenotypes
$P = \{p_{jk}\} \Rightarrow p_{jk} = \prod_{i=1}^{22} (m_{ij} \, s_{ik})$
$-2 \ln(p_{jk}) \sim \chi^2_{df=44}$
(Fisher’s inverse chi-square method) PNAS, Nov 2005, 102:17296
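The combination rule above can be sketched for a single (contrast j, phenotype k) cell as follows. The sketch assumes that the entries of M and S are per-gene p-values, which the slides suggest but do not state explicitly, and the helper names are hypothetical. Classical Fisher combination uses two degrees of freedom per combined p-value, so the degrees of freedom are left as a parameter whose default is the value 44 given on the slide.

```python
import math

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square with even df (closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    # sf(x; 2k) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

def combine_cell(m_col, s_col, df=44):
    """Combine per-gene p-values for one (contrast j, phenotype k) cell.

    m_col[i]: assumed p-value of gene i in expression contrast j (matrix M)
    s_col[i]: assumed p-value of gene i for phenotype k (matrix S)
    """
    p_jk = math.prod(m * s for m, s in zip(m_col, s_col))
    statistic = -2.0 * math.log(p_jk)
    return statistic, chi2_sf_even_df(statistic, df)

# Hypothetical input: 22 genes, each with p = 0.5 in both matrices.
stat, pval = combine_cell([0.5] * 22, [0.5] * 22)
print(stat, pval)
```

Repeating this for every pair (j, k) fills the 15 × 10 matrix P of expression contrasts by phenotypes.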
                          Expression Contrast
Phen    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15   Tot.
HIP     0   1   2   1   1   0   0   0   0   0   0   0   0   0   0     5
DFI     1   1   0   0   0   0   1   1   1   1   0   0   0   1   1     8
ADG     2   3   1   2   2   3   0   0   0   1   1   1   2   1   1    20
CWT     1   1   0   1   0   1   0   0   0   0   0   1   1   1   0     7
IMF     2   2   0   1   0   1   1   3   3   1   0   1   2   2   3    22
RIB     1   1   0   0   0   0   1   1   1   1   0   0   0   1   1     8
P8F     1   1   0   0   0   2   0   2   2   0   0   1   2   1   2    14
TEN     0   1   1   1   1   1   0   0   0   0   0   0   0   0   0     5
LDL     1   1   1   1   0   1   1   1   1   0   1   0   1   0   0    10
Tot     9  12   4   7   4  10   4   8   8   4   2   4   8   7   8    99

1. High vs Low quality diets;
2. Weight gain vs weight loss treatments;
6. Treated vs untreated adipogenesis stimulant in-vitro cells;
8. Late vs early developmental stage in P×H crosses;
9. Late vs early developmental stage in W×H crosses;
ACTA2, ANKRD1, ATF4, CALM1, CAV3, CRYAB and PDLIM3
Genes contributing the most in the Expression by Phenotype
[Figure: cells labelled with the genes ACTA1, ACTA2, ATF4, CALM1, CAST, CAV3, CRYAB, DES, LGALS1, MEF2C, MYH7, MYOM2 and PDLIM3; layout not recovered]
PRIMARY INPUTS: Beef CRC Gene Expression Experiments; Literature on MYOG validated targets; Beef CRC Quantitative Genetics (~1,300 cattle)
SECONDARY INPUTS: MYOG targets in Beef CRC experiments; Beef CRC Whole-Genome Scan (~200 cattle)
PRIMARY OUTPUTS: Gene Network for MYOG targets; Differential Expression for MYOG targets; SNP-QTLs for MYOG targets
SECONDARY OUTPUTS: Expression Contrasts Related with Phenotype
10 Genes to PCR; 16 SNPs in 5 Genes
N Genes: ~4,000 Chip → 3,200 Analysis → 155 Diff. Expr. → 10 PCR → 5 for SNPs (16) → 2
MATERIAL
• 866 Cows
• 51 Sire families
• 22 Phenotypes
• 50K SNPs
CHIP: 50,070 SNPs
SNP p<0.05 in 10 or more phenotypes correlated to AGECL (abs correl >0.28)
Yes
No
Selected Top 89 SNPs
Close Far Very Far Unmapped Close Far Very Far Unmapped
13 56 none 2
1 gene = 1 SNP
28633 384 1,44419556
1 gene = 1 SNP P<0.05 in AGECL or 3 other phenotypes
No Yes
Selected SNPs
AWM SNPs
3037 Close; 64 Very Far; 56 Far; 2 Unmapped
1. Select SNPs from a genome-wide association
2. The Association Weight Matrix
• 3,159 genes as rows (via neighbouring SNP)
• 22 traits as columns
• Each cell {i,j} value is the normalized additive effect of the ith SNP on the jth trait.
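The structure of the matrix described above can be sketched in a few lines. The slides do not say how the additive effects are normalized; this sketch assumes a z-score within each trait (column), and the function name and toy numbers are hypothetical.

```python
from statistics import mean, pstdev

def build_awm(effects):
    """Column-wise z-scores of a genes x traits table of additive effects.

    effects[i][j] = estimated additive effect of SNP (gene) i on trait j.
    Assumes every trait column has non-zero variance.
    """
    n_traits = len(effects[0])
    columns = [[row[j] for row in effects] for j in range(n_traits)]
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [
        [(row[j] - stats[j][0]) / stats[j][1] for j in range(n_traits)]
        for row in effects
    ]

# Hypothetical additive effects for 3 SNPs (rows) on 2 traits (columns).
raw = [[0.10, -2.0], [0.30, 0.0], [0.20, 2.0]]
awm = build_awm(raw)
for row in awm:
    print([round(v, 3) for v in row])
```

Normalizing within traits puts effects measured in different units on a common scale, so rows (genes) and columns (traits) can be compared by correlation.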
2. Use AWM columns to build a phenotype network (PCIT)
3. Use AWM rows to build a gene network (PCIT)
• MCODE: network density analysis
• DAVID: discover gene pathways
• GOrilla: discover gene ontology enrichment
• RIF: discover key transcription factors
• Genomatix: find corresponding binding sites for predicted targets of key TF
Figure. Puberty network extracted from the AWM. A Entire network: Nodes represent 3,159 genes and SNPs while edges represent significant correlations. Colours correspond to MCODE score where red nodes represent higher network density. B Subset of the network showing PROP1 (red node), ESRRG (yellow node) and PPARG (green node) in-silico validated targets (gray nodes). Node shapes (from top): squares in green are genes related to lipids and fatty acid metabolism, triangles in blue are genes related to cell proliferation and apoptosis, rectangles in purple are genes related to the GABA and glutamate pathways and hexagons in red are genes related to nervous system development.
4. Explore biological relevance of the network
5. Build a “Control” (random) gene network
• Building a network from a shuffled copy of the original data serves as a control for the methodology.
• The control network was random, in contrast with the original network.
• It predicted fewer targets with corresponding binding sites for the key TF.
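The idea of a shuffled control can be sketched as follows. This is not the PCIT algorithm used in the slides: a fixed correlation threshold stands in for it, and the way the data are shuffled (permuting trait values within each gene row) is an assumption, as are all names and the toy matrix.

```python
import math
import random
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equally long, non-constant lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def count_edges(rows, threshold=0.9):
    """Number of gene pairs whose |correlation| across traits exceeds threshold."""
    return sum(1 for a, b in combinations(rows, 2) if abs(pearson(a, b)) > threshold)

def shuffled_copy(rows, seed=0):
    """Permute trait values within each gene row, breaking gene-gene structure."""
    rng = random.Random(seed)
    return [rng.sample(row, len(row)) for row in rows]

# Toy AWM: 6 genes x 8 traits sharing one profile, so every pair is connected.
base = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
original = [[k * v for v in base] for k in range(1, 7)]
control = shuffled_copy(original, seed=1)

print("edges in original network:", count_edges(original))
print("edges in shuffled control:", count_edges(control))
```

Because shuffling keeps each gene's values but destroys their alignment across traits, any edges that survive in the control give a rough idea of how many connections the method produces by chance.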
Contact Us
Phone: 1300 363 400 or +61 3 9545 2176
Email: [email protected]
Web: www.csiro.au
THE END