Regulatory variants explain much more heritability than coding variants across 11 common diseases
Alexander Gusev
Harvard School of Public Health
May 8, 2014
Functional annotation of the human genome
“The question as to what proportion of this complexity is truly functional remains open, however, and this ambiguity presents a serious challenge to genome scientists.” (Mudge et al. 2013, Genome Research)
“Here, we assign biochemical functions for 80% of the genome” (ENCODE Consortium 2012, Nature)
Enrichment in functional categories
• Effect sizes enriched in genic categories (promoter, UTR, exon) [1].
• GWAS hits enriched 1.4x in DNaseI Hypersensitivity Sites (DHS) [2,3].
• h_g^2 enriched 1.5x around genes [4].
• Can we precisely quantify this enrichment?
[1] Schork et al. 2013 PLoS Gen; [2] Maurano et al. 2012 Science; [3] Trynka et al. 2013 Nat. Gen.; [4] Lee et al. 2012 Nat. Gen.
Ecker et al. 2012 Nature
“hidden” vs. “missing” heritability
h^2_GWAS ≈ 0.01
h_g^2 = 0.27 (se = 0.02)
h^2 = 0.81 (CI = 0.73-0.90)
Sullivan et al. 2003 Arch Gen Psych; Ripke et al. 2013 Nat Gen; Lee et al. 2012 Nat Gen
GWAS (0.01) ← hidden → h_g^2 (0.27) ← missing → h^2 (0.81)
Other traits: Yang et al. 2010 Nat Gen; Visscher et al. 2012 AJHG; Lee et al. 2013 Nat Gen
Estimating h_g^2
• Given a set of annotations, the phenotype is generated from a linear combination of Normal genetic effect sizes plus Normal noise/environment.
• The variance of the phenotype is modeled by multiple kinship matrices of pairwise sample relationships (covariance). The kinships compete for variance.
• Each kinship is estimated directly from the SNPs in the relevant annotation.
• Heritability (h_g^2) is the fraction of V(y) explained by the genetic component.
Yang et al. 2010 Nat Genet.
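A minimal sketch of the model described above, on toy data: one kinship per annotation is built from standardized SNPs, a phenotype is simulated from Normal effects plus Normal noise, and the variance components are recovered. Everything here is an assumption made for illustration: the sample sizes, the two annotation labels, the simulated variances, and in particular the estimator, which is Haseman-Elston regression (a method-of-moments stand-in) rather than the REML fit used in the analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 400, 600                      # toy numbers of samples and SNPs

# Toy genotypes: independent SNPs with allele frequencies in [0.1, 0.5].
maf = rng.uniform(0.1, 0.5, size=m)
G = rng.binomial(2, maf, size=(n, m)).astype(float)
Z = (G - G.mean(axis=0)) / G.std(axis=0)   # standardize each SNP

# Hypothetical annotation: first 100 SNPs are "DHS", the rest "other".
labels = np.array(["DHS"] * 100 + ["other"] * (m - 100))
true_var = {"DHS": 0.4, "other": 0.1}      # assumed variance components

kinships = {}
g = np.zeros(n)
for name, var in true_var.items():
    Zc = Z[:, labels == name]
    mc = Zc.shape[1]
    kinships[name] = Zc @ Zc.T / mc        # one kinship per annotation
    beta = rng.normal(0.0, np.sqrt(var / mc), size=mc)  # Normal effect sizes
    g += Zc @ beta

noise = rng.normal(0.0, np.sqrt(1.0 - sum(true_var.values())), size=n)
y = g + noise
y = (y - y.mean()) / y.std()

# Haseman-Elston regression: for i != j, E[y_i * y_j] = sum_c sigma_c^2 * K_c[i, j],
# so regress phenotype products on the off-diagonal kinship entries.
iu = np.triu_indices(n, k=1)
X = np.column_stack([kinships[name][iu] for name in true_var])
products = np.outer(y, y)[iu]
est, *_ = np.linalg.lstsq(X, products, rcond=None)

h2_by_category = dict(zip(true_var, est))  # per-annotation components
h2g = float(est.sum())                     # total h_g^2 estimate
```

Because both kinships enter the same regression, they "compete" for the phenotypic covariance; the per-annotation estimates are noisy at this toy sample size.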
Measuring h_g^2 enrichment
• Enrichment_i = (%h_{g,i}^2) / (%SNPs_i)
Note: % SNPs ≈ % bp
• P-value: from Z-score = (%h_{g,i}^2 - %SNPs_i) / se(%h_{g,i}^2)
• Extensive simulations:
– Accounting for LD between categories
– Unbiased for uniform causal variants
– Nearly unbiased for complex disease architecture (all rare / rare DHS / etc.)
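The two formulas above can be sketched in a few lines. The function name `enrichment_test`, the two-sided Normal tail, and the standard error of 0.08 in the usage line are assumptions for illustration; only the ratio and the Z-score definition come from the slide.

```python
import math

def enrichment_test(pct_h2, pct_h2_se, pct_snps):
    """Fold enrichment, Z-score and P-value for one functional category.

    pct_h2    : share of h_g^2 attributed to the category (0..1)
    pct_h2_se : standard error of that share
    pct_snps  : share of SNPs in the category (0..1), the null expectation
    """
    fold = pct_h2 / pct_snps
    z = (pct_h2 - pct_snps) / pct_h2_se
    # Two-sided tail probability of a standard Normal (an assumption here).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return fold, z, p

# Illustrative call: 79% of h_g^2 in a category holding 16% of SNPs,
# with a hypothetical standard error of 0.08.
fold, z, p = enrichment_test(pct_h2=0.79, pct_h2_se=0.08, pct_snps=0.16)
```

A category whose share of h_g^2 equals its share of SNPs has fold enrichment 1 and Z-score 0.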
Broad classes of functional variants
Category     Definition                                % bp
Coding       All exons                                 1%
UTR          5’ and 3’ untranslated regions            1%
Promoter     +/- 2 kbp of TSS                          2%
DHS*         Regulatory regions from 217 cell types    16%
Intron                                                 29%
Intergenic                                             52%
*ENCODE and ROADMAP data analyzed by: H. Xu, C. Zang, Liu Lab; G. Trynka, Raychaudhuri Lab
Hierarchical, non-overlapping
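One way to obtain hierarchical, non-overlapping categories is to give each SNP the first category it overlaps in a fixed priority order. The sketch below assumes the priority follows the order of the table (Coding first, Intergenic last) and that SNPs overlapping nothing fall into Intergenic; the SNP identifiers are made up.

```python
# Assumed priority order, taken from the row order of the table.
PRIORITY = ["Coding", "UTR", "Promoter", "DHS", "Intron", "Intergenic"]

def assign_category(overlaps):
    """Return the single highest-priority category among a SNP's overlaps."""
    for category in PRIORITY:
        if category in overlaps:
            return category
    return "Intergenic"  # assumed catch-all for SNPs overlapping nothing

# Hypothetical SNPs with the raw (possibly overlapping) annotations they hit.
snp_overlaps = {
    "rs_a": {"DHS", "Intron"},
    "rs_b": {"Coding", "DHS"},
    "rs_c": set(),
}
assigned = {snp: assign_category(hits) for snp, hits in snp_overlaps.items()}
```

Under this scheme an intronic SNP inside a DHS peak counts only toward DHS, so the category percentages partition the genome.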
Real data: Analysis of 11 complex traits
WTCCC1
2,700 shared controls
1,700 cases each:
• Bipolar disorder
• Coronary artery disease
• Crohn’s disease*
• Hypertension
• Rheumatoid arthritis*
• Type 1 diabetes*
• Type 2 diabetes
WTCCC2
5,200 shared controls
1,800-9,300 cases:
• Schizophrenia
• Ankylosing spondylitis*
• Multiple sclerosis*
• Ulcerative colitis*
*HLA excluded from autoimmune traits
WTCCC 2007 Nature; WTCCC 2011 Nature
Meta-analysis: Significant DHS enrichment
[Figure: two bar-chart panels (Genotyped SNPs; 1000G Imputed SNPs). x-axis: functional category; y-axis: % SNP-heritability; bars: mean observed versus expected (% SNPs).]
Fold enrichment shown per category:
Category     Genotyped SNPs   1000G imputed SNPs
Coding       4.1x             13.8x
UTR          3.5x             8.4x
Promoter     2.2x             2.8x
DHS          1.6x             5.1x
Intron       0.8x             0.1x
Intergenic   0.6x             -0.1x
P-values shown on the plot: genotyped 2.6e-04, 2.2e-03, 1.9e-02, 8.0e-06, 6.4e-02, 4.1e-13; imputed 4.7e-04, 4.3e-03, 1.2e-01, <1e-20, 5.5e-12, <1e-20.
• Very large DHS enrichment: 79% of h_g^2 vs. 16% of SNPs.
• Significant coding enrichment: 8% of h_g^2 vs. 1% of SNPs.
• Intron/intergenic not significantly different from zero!
• Enrichment greater in autoimmune traits.
Enrichment greatest in imputed variants
Consistent with simulations from imputed data
Enrichment at functional sub-categories
All DHS (15.7% of SNPs): 5.1x
Sub-category   Definition                                               % SNPs   Enrichment   P-value
Enhancer       Computationally inferred enhancers                       3.2%     9.8x         5.1e-04
Specific       DHS peaks appearing in ≤2 cell types                     4.4%     6.1x         3.2e-03
DGF            Digital genomic footprinting (specific cleavage sites)   8.5%     5.1x         9.0e-01
Hoffman 2013 Nucl. Ac. Res.; Trynka 2013 Nat. Gen.; Neph 2013 Nature
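The "DHS peaks appearing in ≤2 cell types" definition amounts to counting, for each peak, how many cell types it was called in. The sketch below assumes peaks have already been merged across cell types into shared identifiers; the function name, cell-type keys and peak IDs are hypothetical.

```python
from collections import Counter

def specific_peaks(peak_calls, max_cell_types=2):
    """Return peaks observed in at most `max_cell_types` cell types.

    peak_calls maps a cell-type name to the set of (pre-merged) peak IDs
    called in that cell type.
    """
    counts = Counter(peak for peaks in peak_calls.values() for peak in peaks)
    return {peak for peak, k in counts.items() if k <= max_cell_types}

# Toy input: p1 is shared by three cell types, p2 and p3 by one each.
peak_calls = {
    "Th1": {"p1", "p2"},
    "CD14": {"p1", "p3"},
    "LCL": {"p1"},
}
specific = specific_peaks(peak_calls)
```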
Phenotype/cell-type specific enrichment
• Enrichment in DHS peaks from specific cell-type versus remaining DHS.
• Test 83 cell-types in autoimmune traits, six significant enrichments relative to DHS:
Cell type        Enrichment relative to DHS   P-value    Ref.
Th1 T-cell       5.8x                         1.4e-05    [2]
Fetal kidney     5.4x                         4.3e-04
Monocyte CD14+   4.3x                         4.3e-04    [2]
CD8+ primary     4.0x                         1.7e-04    [1]
Leukemia         3.5x                         5.9e-05
Lymphoblastoid   3.4x                         3.1e-05    [2]
[1] Trynka 2013, Nat Gen; [2] Maurano 2012, Science
Estimates are robust to artifacts
• Disease architecture
– Extensive LD/MAF simulations (in real data)
– Replication in multiple PGC cohorts
• Estimated variance
– Empirical jackknife estimate is consistent
• Shared controls
– Permuted study-wide enrichment minor: λGC = 1.3
• Case/control ascertainment
– Alternative h_g^2 estimation [1], same answer
• Rare variants?
[1] Golan & Rosset 2013 arXiv
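The jackknife check listed above can be illustrated with a generic leave-one-out standard error. This is a sketch under assumptions: the slide does not say which units were left out (individuals, SNP blocks, or cohorts), so the function simply takes a list of per-unit values and any estimator.

```python
import numpy as np

def jackknife_se(values, estimator):
    """Leave-one-out jackknife standard error of `estimator` over `values`."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    # Re-estimate with each unit removed in turn.
    loo = np.array([estimator(np.delete(values, i)) for i in range(n)])
    return float(np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2)))

# Toy per-unit values (hypothetical); for the mean, the jackknife SE
# coincides with the usual standard error of the mean.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
se_mean = jackknife_se(x, np.mean)
```

Comparing such an empirical standard error with the analytical one is a way to check that the reported variance of the estimate is not understated.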
Rare coding variants do not confound
• Half of h2 still “missing” … can rare variants bias the observed results?
• Data: Exome-chip on 6,500-sample schizophrenia cohort; 64k rare (MAF<0.01) coding variants.
• Contribution from rare coding variants non-significant (h_g^2 = 0.037 ± 0.029).
• No impact on DHS enrichment.
• Purcell et al. 2014 Nature: contribution from rare exome-sequence variants modest (0.4%-0.6%).
Enrichment observed in association statistics
A: Stratified QQ plot (height)
B: P-value enrichment (Crohn’s disease)
[1] Schork et al. 2013 PLoS Gen; [2] Maurano et al. 2012 Science
[Figure. Panel A: observed P-value (by function) versus expected P-value. Panel B: enrichment in DHS versus GWAS P-value threshold.]
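The points of a stratified QQ plot (panel A) can be computed as below: within each functional category, sorted observed P-values are paired with the quantiles expected under a uniform null. This is a sketch with hypothetical inputs; the plotting positions (i - 0.5)/n are a common convention and an assumption here, not something stated on the slide.

```python
import numpy as np

def qq_points(pvalues):
    """Expected and observed -log10(P) under a uniform null, largest first."""
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = len(p)
    expected = -np.log10((np.arange(1, n + 1) - 0.5) / n)
    observed = -np.log10(p)
    return expected, observed

def stratified_qq(pvalues, categories):
    """One (expected, observed) curve per functional category."""
    pvalues = np.asarray(pvalues, dtype=float)
    categories = np.asarray(categories)
    return {str(c): qq_points(pvalues[categories == c])
            for c in np.unique(categories)}

# Toy association P-values with made-up category labels.
pvals = [1e-6, 0.02, 0.4, 0.3, 0.7, 0.9]
cats = ["DHS", "DHS", "DHS", "Intergenic", "Intergenic", "Intergenic"]
curves = stratified_qq(pvals, cats)
```

A category whose curve rises above the diagonal (observed above expected) shows more small P-values than the null predicts.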
Simulations: Noisy relationship between biology and association statistics
[Figure: P-value enrichment. x-axis: minimum -log10(P-value); y-axis: fold enrichment; one series per category (Coding, UTR, Promoter, DHS, Intron, Intergenic).]
Simulation:
• 80% of h_g^2 from DHS; 10% from coding; others uniform.
• Generate phenotypes, run hundreds of 32k-sample imputed GWAS, and look for enrichment.
• LD and relative enrichment confound the estimate.
[Figure: enrichment (y-axis) versus GWAS P-value threshold (x-axis) in 100 simulations, by category: Coding, Promoter, UTR, Intron, DHS, Intergenic.]
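The simulated architecture and the threshold-based enrichment statistic can be sketched as follows. Several details are assumptions: "others uniform" is read as spreading the remaining 10% of h_g^2 evenly over all SNPs outside DHS and coding, the total h_g^2 of 0.27 is borrowed from the earlier slide purely as an example, and the helper names and toy SNP counts are hypothetical.

```python
import numpy as np

def per_snp_variance(categories, h2g, shares):
    """Split h2g across SNPs: `shares` fixes the fraction for named
    categories; the remainder is spread uniformly over all other SNPs."""
    categories = np.asarray(categories)
    var = np.zeros(len(categories))
    named = np.isin(categories, list(shares))
    for cat, share in shares.items():
        idx = categories == cat
        var[idx] = h2g * share / idx.sum()
    rest = 1.0 - sum(shares.values())
    var[~named] = h2g * rest / (~named).sum()
    return var

def threshold_enrichment(pvalues, categories, category, threshold):
    """Share of SNPs passing the threshold that lie in `category`,
    divided by the category's share of all SNPs."""
    pvalues = np.asarray(pvalues, dtype=float)
    categories = np.asarray(categories)
    hits = pvalues < threshold
    if not hits.any():
        return float("nan")
    return float((categories[hits] == category).mean()
                 / (categories == category).mean())

# Toy layout: 16 DHS, 1 coding, 83 other SNPs.
cats = np.array(["DHS"] * 16 + ["Coding"] + ["Intron"] * 30 + ["Intergenic"] * 53)
var = per_snp_variance(cats, h2g=0.27, shares={"DHS": 0.8, "Coding": 0.1})
```

Per-SNP effect sizes would then be drawn as Normal with these variances. Applying `threshold_enrichment` to the resulting GWAS P-values across thresholds gives curves like those in the figure, which LD between categories can distort.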
Conclusions
• Partitioning “hidden” heritability informs biology and disease architecture. Implications for GWAS, fine-mapping, risk-prediction.
• Very large enrichment at DHS elements (enhancers, cell-types, etc.) across 11 traits.
• Low upper-bound on non-regulatory intronic/intergenic contribution.
• Non-significant contribution from low-frequency exome-chip variants (in SCZ).
• Larger cohorts will yield trait-specific results.
Acknowledgements
S Hong Lee
Benjamin M Neale
Gosia Trynka
Bjarni Vilhjalmsson
Hilary Finucane
Han Xu
Chongzhi Zang
Stephan Ripke
Eli Stahl
Schizophrenia Working Group of the Psychiatric Genomics Consortium
SWE-SCZ Consortium
Anna K Kahler
Christina M Hultman
Shaun M Purcell
Steven A McCarroll
Mark Daly
Patrick F Sullivan
Naomi R Wray
Soumya Raychaudhuri
Alkes L Price
… and thanks also to: M. Kellis, A. Sarkar, J. Pickrell, XS. Liu, N. Patterson
Gusev et al. 2014 bioRxiv