
Introduction to Systems Biology - ACTEON


Introduction to Systems Biology

4.8. SNP information in the context of Systems Biology

Toni Reverter

Toni Reverter. Introduction to Systems Biology. 7 – 11 June, Valencia, Spain

SNP Information in the Context of Systems Biology


• SNPs are major contributors to genetic variation

• SNPs comprise some 80% of all known polymorphisms

• SNP density ~ 1 per 1,000 base pairs

• SNPs are mostly biallelic → less informative than microsatellites

• SNPs are more frequent and mutationally more stable

• Two alleles → automated genotyping tools are feasible

SNPs are useful markers!


Fundamental Theorem of the HapMap

Let $r_{AB}$ = correlation between alleles of two SNPs:

SNP 1 → A, alternate allele ‘a’
SNP 2 → B, alternate allele ‘b’
... where B is the functional (associated) SNP

Phen → C, alternate allele ‘c’

If $N_{BC}$ = sample size required to detect $r_{BC}$, then the sample size required to detect $r_{AC}$ is:

$$N_{AC} = N_{BC}\,(r_{BC}/r_{AC})^2$$

... implies $r_{AC} = r_{AB}\,r_{BC}$

[Diagram: A → B → C]
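As a rough illustration of what the relation above implies, substituting $r_{AC} = r_{AB}\,r_{BC}$ into the sample-size formula gives $N_{AC} = N_{BC}/r_{AB}^2$. The sketch below only evaluates that expression; the function name and the numbers are hypothetical, and the result is valid only under the slide's multiplicative assumption.

```python
def required_sample_size(n_bc: float, r_ab: float) -> float:
    """Sample size needed at marker A, given N_BC at the functional SNP B.

    Uses N_AC = N_BC * (r_BC / r_AC)**2 together with the assumption
    r_AC = r_AB * r_BC, which simplifies to N_BC / r_AB**2.
    """
    if not 0 < abs(r_ab) <= 1:
        raise ValueError("r_ab must be a non-zero correlation in [-1, 1]")
    return n_bc / r_ab ** 2


# Hypothetical numbers: 1,000 samples suffice at B, and r_AB = 0.5.
n_ac = required_sample_size(1000, 0.5)
```

Under these assumed values the marker SNP needs four times the sample size of the functional SNP, which is why weak LD between marker and causal variant is so costly.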


The theory which drives the HapMap Project does not hold in general, and in fact can be grossly misleading, most strikingly so when there is substantial LD, the very situation HapMap is designed to model.

Fundamental Theorem of the HapMap


Only if the genes underlying disease traits have one wild-type and one (or one major) susceptibility allele (i.e., when allelic heterogeneity is low) is statistical analysis likely to detect association of the causative allele (or linked markers) with the disease phenotype.

Terwilliger and Hiekkalinna (2006), cont'd

Case-control (a.k.a. selective) genotyping can, in theory, lead to even less power than simple random sampling.

Replicating results is a major difficulty

Variations in allele frequencies among interacting loci can markedly affect the power to detect their effect, ..., which may account for the difficulties in replicating association results.


Notorieties of SNP data in association studies

• High dimensionality
    • Large p, small n → the p is so cheap!

• Imposed categorization
    • Values are 0, 1 or 2 for as many copies of the variant allele
    • Raw data is actually a fluorescent signal
    • Back to front: threshold vs underlying continuity

• Complicated correlation structure among predictors
    • Linkage
    • Linkage disequilibrium
    • Large p, small n
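The 0/1/2 coding mentioned in the list above can be sketched as follows. This is a minimal illustration that assumes genotypes arrive as two-character strings for a biallelic SNP; the function name and the example calls are hypothetical.

```python
def allele_count(genotype: str, variant: str) -> int:
    """Count copies (0, 1 or 2) of the variant allele in a biallelic genotype."""
    if len(genotype) != 2:
        raise ValueError("expected a two-character genotype such as 'AG'")
    return sum(1 for allele in genotype if allele == variant)


# Hypothetical genotype calls at one SNP, with 'G' taken as the variant allele.
genotypes = ["AA", "AG", "GG", "GA"]
codes = [allele_count(g, "G") for g in genotypes]
```

The coding discards the continuous fluorescent signal behind each call, which is the "imposed categorization" the slide refers to.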


Genetical Genomics (eQTL)

Definition 1: Find DNA variants (e.g. SNPs) associated with the expression of genes.

Definition 2: Use arrays to identify genes that are differentially expressed (DE) in relevant tissues of individuals sorted by QTL genotype. If those DE genes map to the chromosome region of interest, they become very strong candidates for the QTL.

Jansen and Nap, 2001


Note 1: Most significant connections are often to itself (large dark diagonal) → Cis

Note 2: Vertical lines are hot-spots (major QTL of co-regulated genes) → Trans

DeCook et al. (2006) Genetics 172:1155-1164

Kendziorski & Wang (2006) A review of statistical methods for expression Quantitative Trait Loci mapping. Mammalian Genome 17:509.

Perez-Enciso, Quevedo & Bahamonde (2007) Genetical genomics: use all data. BMC Genomics 8:69.

Genetical Genomics (eQTL)


Resources to access the results from whole-genome association studies are becoming increasingly available


Published Genome-Wide Associations through 3/2009: 398 published GWA at $p < 5 \times 10^{-8}$

NHGRI GWA Catalog

www.genome.gov/GWAStudies
... compiled from 331 publications and 1,533 SNPs (June 1, 2009)


... Livestock Focussed

Reverter, Chan, Lehnert, et al. (2008) Dissection of beef quality phenotypes using a Myogenin network-anchored systems biology approach. Aust J of Exp Agric, 48:1053-1061.

Fortes, Reverter, Zhang, et al. (2010) An association weight matrix for the genetic dissection of puberty in beef cattle. PNAS (in press)

Smith, Li, Ingham, Collis, McWilliam, Dixon, Norris, Mortimer, Moore and Reverter (2010) A genomics-informed, SNP association study reveals FBLN1 and FABP4 as contributing to resistance to fleece rot in Australian Merino sheep. BMC Veterinary Research 6:27.


Motivation:

• Publicly available data grossly underutilised

• Relatively inexpensive bioinformatics approaches may partially replace expensive wet-lab experiments and reduce the amount of genotyping, sequencing, and phenotyping required

• Reduce – Refine – Replace

Reverter, Chan, Lehnert, et al. (2008) Dissection of beef quality phenotypes using a Myogenin network-anchored systems biology approach. Aust J of Exp Agric, 48:1053-1061.


MYOG network-anchored systems biology approach

• PHENOMICS: 1,352 cattle, 10 phenotypes

• GENOMICS: 189 cattle, 10K SNPs

• TRANSCRIPTOMICS: 9 microarray experiments, 147 hybridizations (~9K clones – 822 genes), 47 conditions (treatments)

• LITERATURE: MYOG targets. Blais et al. 2005. An initial blueprint for myogenic differentiation. Genes & Dev. 19:553

QUESTION: Is there a gene expression contrast that can be related with a quantitative phenotype of routine use in genetic improvement?


Ten Phenotypes

Phenotype                   All (N=1,352)        Genotyped (N=189)
                            Mean       SD        Mean       SD
Growth and Development
  HIP                       115.61     6.72      115.88     3.24
  DFI                        12.32     2.06       12.48     1.74
  ADG                         1.27     0.32        1.26     0.20
  CWT                       291.43    50.81      294.49    22.45
Fat Depots
  IMF                         4.89     2.37        4.87     1.50
  RIB                         8.41     4.31        8.32     2.96
  P8F                        11.34     4.27       11.54     2.94
Meat Quality
  TEN                         4.20     0.79        4.23     0.49
  LDL                        40.47     5.72       39.84     5.29
  LOS                        21.29     2.18       21.19     2.03


WHOLE-GENOME ASSOCIATION → 10K → 651 SNP-QTL (SNPAssoc)


9 EXPERIMENTS (147 Hybs), 15 EXPRESSION CONTRASTS

1. High vs low quality diets;
2. Weight gain vs weight loss treatments;
3. Holstein vs Japanese Black cattle over a year's time (3 time points);
4. Holstein cattle at the end vs at the beginning of a year's experiment;
5. Japanese Black cattle at the end vs at the beginning of a year's experiment;
6. Treated vs untreated adipogenesis stimulant in-vitro cells;
7. Piedmontese by Hereford (P×H) vs Wagyu by Hereford (W×H) crosses;
8. Late vs early developmental stage in P×H crosses;
9. Late vs early developmental stage in W×H crosses;
10. Cattle treated vs cattle untreated with Vitamin A;
11. Jersey vs Limousin cattle (steers only);
12. Jersey vs Limousin cattle (heifers only);
13. W×H at 12 months of age vs W×H at 3 months of age;
14. P×H at 12 months of age vs P×H at 3 months of age;
15. W×H vs P×H from 3 to 12 months of age (e.g. 13 vs 14, above).

Reverter et al. 2006. A gene co-expression network for bovine skeletal muscle inferred from microarray data. Physiological Genomics 28:76-83.


15 Expression Contrasts

Measure of (possible) differential expression $d_i$ = mixture of two components, one for non-DE genes and one for DE genes:

$$f(d_i) = \pi_0\,\phi(d_i;\mu_0,\sigma_0^2) + \pi_1\,\phi(d_i;\mu_1,\sigma_1^2)$$

Probability of not-DE:

$$\tau_0(d_i) = \pi_0\,\phi(d_i;\mu_0,\sigma_0^2)\,/\,f(d_i)$$

[Figure: plot of the DE measure (x-axis: DE, from -1.5 to 1.5; y-axis: 0 to 900) showing the component for non-DE genes and the component for DE genes.]

McLachlan et al. 2006. Bioinformatics 22:1608
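To make the two-component mixture concrete, the sketch below evaluates the posterior probability that a gene is not DE, $\tau_0(d_i)$, for given parameter values. It assumes normal components and takes the mixing proportion, means and variances as already known; in practice these would be estimated from the data (for example by EM), and every value used here is hypothetical.

```python
import math


def normal_pdf(x, mu, sigma2):
    """Density of a normal distribution with mean mu and variance sigma2."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def prob_not_de(d, pi0, mu0, sigma2_0, mu1, sigma2_1):
    """tau_0(d) = pi0 * phi(d; mu0, s0) / f(d) for a two-component mixture."""
    pi1 = 1.0 - pi0
    null_part = pi0 * normal_pdf(d, mu0, sigma2_0)
    f = null_part + pi1 * normal_pdf(d, mu1, sigma2_1)
    return null_part / f


# Hypothetical parameters: 90% non-DE genes centred at 0, DE genes centred at 1.
tau_near_zero = prob_not_de(0.0, 0.9, 0.0, 0.04, 1.0, 0.25)
tau_far_out = prob_not_de(1.2, 0.9, 0.0, 0.04, 1.0, 0.25)
```

A gene with a measure near zero receives a posterior close to 1 (probably not DE), while a measure far in the tail receives a posterior close to 0.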


LITERATURE: Blais et al. 2005. An initial blueprint for myogenic differentiation. Genes & Dev. 19:553

MYOG Targets

22 validated MYOG targets were present in our expression profiling experiments


Beef CRC Gene Expression Experiments

Literature on MYOG validated targets

22 Genes – 35 Connections


Significance of MYOG-targets on each Expression Contrast (via clones on the array)

Cf. Meta-analysis: Pyne et al. 2006. Bioinformatics 22:2516


Significance of MYOG-targets on each Phenotype (via SNPs on the 0.5Mb neighbourhood)


Two Incidence Matrices

M → MYOG Target Genes (via clones) × Expression Contrasts
S → MYOG Target Genes (via SNPs) × Phenotypes

$P = M^{T}S$ → Expression contrasts × Phenotypes

$$P = \{p_{jk}\} \;\Rightarrow\; p_{jk} = \prod_{i=1}^{22} (m_{ij}\,s_{ik})$$

$$-2\ln(p_{jk}) \sim \chi^2_{df=44}$$

(Fisher’s inverse chi-square method) PNAS, Nov 2005, 102:17296
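A minimal sketch of Fisher's inverse chi-square method in its textbook form may help here: for $k$ independent p-values, $-2\sum \ln p_i$ is referred to a chi-square distribution with $2k$ degrees of freedom, so 22 combined terms give the 44 degrees of freedom quoted above. The function names are hypothetical, and the closed-form tail probability used below holds only for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square with even df (closed form)."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    # sum_{j=0}^{df/2 - 1} (x/2)^j / j!
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def fisher_combined(p_values):
    """Return (statistic, combined p-value) for independent p-values."""
    stat = -2.0 * sum(math.log(p) for p in p_values)
    return stat, chi2_sf_even_df(stat, 2 * len(p_values))


# Hypothetical per-gene p-values for one contrast-by-phenotype cell.
stat, p_combined = fisher_combined([0.5, 0.5])
```

The method assumes the combined p-values are independent, which is worth checking before reading too much into a small combined value.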


Phen   Expression Contrast                                          Tot.
         1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
HIP      0   1   2   1   1   0   0   0   0   0   0   0   0   0   0     5
DFI      1   1   0   0   0   0   1   1   1   1   0   0   0   1   1     8
ADG      2   3   1   2   2   3   0   0   0   1   1   1   2   1   1    20
CWT      1   1   0   1   0   1   0   0   0   0   0   1   1   1   0     7
IMF      2   2   0   1   0   1   1   3   3   1   0   1   2   2   3    22
RIB      1   1   0   0   0   0   1   1   1   1   0   0   0   1   1     8
P8F      1   1   0   0   0   2   0   2   2   0   0   1   2   1   2    14
TEN      0   1   1   1   1   1   0   0   0   0   0   0   0   0   0     5
LDL      1   1   1   1   0   1   1   1   1   0   1   0   1   0   0    10
Tot      9  12   4   7   4  10   4   8   8   4   2   4   8   7   8    99

Contrasts called out on the slide:

1. High vs low quality diets;
2. Weight gain vs weight loss treatments;
6. Treated vs untreated adipogenesis stimulant in-vitro cells;
8. Late vs early developmental stage in P×H crosses;
9. Late vs early developmental stage in W×H crosses.

ACTA2, ANKRD1, ATF4, CALM1, CAV3, CRYAB and PDLIM3


Genes contributing the most in the Expression by Phenotype

[Figure: gene labels shown in the figure are ACTA1, ACTA2, ATF4, CALM1, CAST, CAV3, CRYAB, DES, LGALS1, MEF2C, MYH7, MYOM2 and PDLIM3.]


PRIMARY INPUTS
• Beef CRC Gene Expression Experiments
• Literature on MYOG validated targets
• Beef CRC Quantitative Genetics (~1,300 cattle)

SECONDARY INPUTS
• MYOG targets in Beef CRC experiments
• Beef CRC Whole-Genome Scan (~200 cattle)

PRIMARY OUTPUTS
• Gene Network for MYOG targets
• Differential Expression for MYOG targets
• SNP-QTLs for MYOG targets

SECONDARY OUTPUTS
• Expression Contrasts Related with Phenotype

Reverter, Chan, Lehnert, et al. (2008) Dissection of beef quality phenotypes using a Myogenin network-anchored systems biology approach. Aust J of Exp Agric, 48:1053-1061.


10 Genes to PCR; 16 SNPs in 5 Genes

N Genes: ~4,000 Chip → 3,200 Analysis → 155 Diff. Expr → 10 PCR → 5 for SNPs (16) → 2


MATERIAL

• 866 Cows
• 51 Sire families
• 22 Phenotypes
• 50K SNPs


CHIP: 50,070 SNPs

SNP p<0.05 in 10 or more phenotypes correlated to AGECL (abs correl >0.28)

Yes

No

Selected Top 89 SNPs

Close Far Very Far Unmapped Close Far Very Far Unmapped

13 56 none 2

1 gene = 1 SNP

28633 384 1,44419556

1 gene = 1 SNP P<0.05 in AGECL or 3 other phenotypes

No Yes

Selected SNPs

AWM SNPs

3037 Close; 64 Very Far; 56 Far; 2 Unmapped


1. Select SNPs from a genome wide association


2. The Association Weight Matrix

• 3,159 genes as rows (via neighbouring SNP)

• 22 traits as columns

• Each cell {i,j} value is the normalized additive effect of the ith SNP on the jth trait.
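The matrix described above can be sketched in a few lines. The slide does not say how the additive effects are normalized, so this example assumes a z-score within each trait (column); that choice, the function name and the toy effects are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def build_awm(effects):
    """effects[i][j]: additive effect of SNP (gene) i on trait j.

    Returns a matrix of the same shape, z-scored within each trait column.
    Assumes every column has non-zero variance.
    """
    n_traits = len(effects[0])
    columns = [[row[j] for row in effects] for j in range(n_traits)]
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [
        [(row[j] - stats[j][0]) / stats[j][1] for j in range(n_traits)]
        for row in effects
    ]


# Toy input: 3 genes (rows) by 2 traits (columns), hypothetical effects.
effects = [[0.2, -1.0], [0.4, 0.0], [0.6, 1.0]]
awm = build_awm(effects)
```

Standardizing within traits puts effects measured in different units on a common scale, so that rows and columns can be compared by correlation.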


2. Use AWM columns to build a phenotype network (PCIT)
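PCIT, as published by its authors, combines first-order partial correlations with an information-theory criterion to decide which correlations to keep as network edges. The sketch below shows only the two building blocks, a Pearson correlation between two AWM columns and the partial correlation of two traits given a third; it does not implement PCIT's edge-filtering step, and the toy columns are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y given z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical AWM columns for two traits across four genes.
trait_a = [1.0, 2.0, 3.0, 4.0]
trait_b = [2.0, 4.0, 6.0, 8.0]
r_ab = pearson(trait_a, trait_b)
```

When the correlation between two traits is fully explained by a third (for example 0.25 with both other correlations at 0.5), the partial correlation drops to zero, which is the kind of indirect link a network method would want to prune.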


3. Use AWM rows to build a gene network (PCIT)

• MCODE: network density analysis

• DAVID: discover gene pathways

• GOrilla: discover gene ontology enrichment

• RIF: discover key transcription factors

• Genomatix: find corresponding binding sites for predicted targets of key TF


Figure. Puberty network extracted from the AWM. A Entire network: Nodes represent 3,159 genes and SNPs while edges represent significant correlations. Colours correspond to MCODE score where red nodes represent higher network density. B Subset of the network showing PROP1 (red node), ESRRG (yellow node) and PPARG (green node) in-silico validated targets (gray nodes). Node shapes (from top): squares in green are genes related to lipids and fatty acid metabolism, triangles in blue are genes related to cell proliferation and apoptosis, rectangles in purple are genes related to the GABA and glutamate pathways and hexagons in red are genes related to nervous system development.

4. Explore biological relevance of the network


• Building a network from the shuffled original data serves as a control for the methodology.

• The control network was random, in contrast with the original network.

• It predicted fewer targets with corresponding binding sites for the key TF.

5. Build a “Control”(random) gene network
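One way such a control could be produced is sketched below. The slides do not specify how the data were shuffled, so this example assumes each trait column of the AWM is permuted independently across genes, which keeps every trait's distribution of effects but breaks the gene-level pattern across traits; the function name, the seed and the toy matrix are hypothetical.

```python
import random


def shuffled_control(awm, seed=0):
    """Return a copy of the matrix with each column permuted independently."""
    rng = random.Random(seed)
    n_rows, n_cols = len(awm), len(awm[0])
    columns = []
    for j in range(n_cols):
        col = [awm[i][j] for i in range(n_rows)]
        rng.shuffle(col)  # permute this trait's effects across genes
        columns.append(col)
    return [[columns[j][i] for j in range(n_cols)] for i in range(n_rows)]


# Toy matrix: 3 genes by 2 traits.
toy_awm = [[1, 10], [2, 20], [3, 30]]
control = shuffled_control(toy_awm, seed=1)
```

The same network-building and enrichment steps would then be run on the shuffled matrix, and the results compared with those from the original data.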


THE END