Regulatory variants explain much more heritability than coding variants across 11 common diseases Alexander Gusev Harvard School of Public Health May 8, 2014


Page 1

Regulatory variants explain much more heritability than coding variants across 11 common diseases

Alexander Gusev

Harvard School of Public Health

May 8, 2014

Page 2

Functional annotation of the human genome

“The question as to what proportion of this complexity is truly functional remains open, however, and this ambiguity presents a serious challenge to genome scientists.”

— Mudge et al. 2013 Genome Research

“Here, we assign biochemical functions for 80% of the genome” — ENCODE Consortium 2012 Nature

Page 3

Enrichment in functional categories

• Effect sizes enriched in genic categories (promoter, UTR, exon) [1].

• GWAS hits enriched 1.4x in DNaseI Hypersensitivity Sites (DHS) [2,3].

• h_g^2 enriched 1.5x around genes [4].

• Can we precisely quantify this enrichment?

[1] Schork et al. 2013 PLoS Gen; [2] Maurano et al. 2012 Science; [3] Trynka et al. 2013 Nat. Gen.; [4] Lee et al. 2012 Nat. Gen.

Ecker et al. 2012 Nature

Page 4

“hidden” vs. “missing” heritability

h_GWAS^2 ≈ 0.01

h_g^2 = 0.27 (se=0.02)

h^2 = 0.81 (CI=0.73-0.90)

Sullivan et al., Arch Gen Psych 2003; Ripke et al. Nat Gen, 2013; Lee et al. Nat Gen, 2012

Page 5

“hidden” vs. “missing” heritability

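h_GWAS^2 ≈ 0.01

h_g^2 = 0.27 (se=0.02)

h^2 = 0.81 (CI=0.73-0.90)

[Figure: heritability bar marked at GWAS 0.01, h_g^2 0.27, and h^2 0.81. The gap between GWAS and h_g^2 is labeled "hidden"; the gap between h_g^2 and h^2 is labeled "missing".]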

Other traits: Yang et al. 2010 Nat Gen; Visscher et al. 2012 AJHG; Lee et al. 2013 Nat Gen
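As a worked restatement of the numbers on this slide, and taking "hidden" to mean the gap between the GWAS-explained and SNP-based estimates and "missing" to mean the gap between the SNP-based and total estimates (a reading of the bar labels, not a definition given in the slides), the two gaps are roughly:

```latex
\begin{aligned}
\text{hidden}  &= h_g^2 - h_{\mathrm{GWAS}}^2 \approx 0.27 - 0.01 = 0.26 \\
\text{missing} &= h^2 - h_g^2 \approx 0.81 - 0.27 = 0.54
\end{aligned}
```

These are point estimates only; the standard error and confidence interval quoted above are not propagated.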

Page 6

Estimating h_g^2

Given a set of annotations, the phenotype is generated from a linear combination of Normal genetic effect sizes and Normal noise/environment.

The variance of the phenotype is modeled by multiple kinships (pairwise sample relationships/covariances), one per annotation. The kinships compete for variance.

Each kinship is estimated directly from the SNPs in the relevant annotation.

Heritability (h_g^2) is the genetic component of V(y).

Yang et al. 2010 Nat Genet.
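The model described on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline: the kinship is the usual standardized genotype relationship matrix K = Z Z^T / M, the genotypes are simulated, the annotation names and the variance components (`sigma2_g`, `sigma2_e`) are made-up values, and the actual estimation step (fitting the variance components, typically by REML) is not shown.

```python
import numpy as np

def kinship(genotypes):
    """Kinship K = Z Z^T / M from an (N samples x M SNPs) matrix of 0/1/2 allele counts."""
    freq = genotypes.mean(axis=0) / 2.0
    keep = (freq > 0) & (freq < 1)  # drop monomorphic SNPs, which cannot be standardized
    z = (genotypes[:, keep] - 2 * freq[keep]) / np.sqrt(2 * freq[keep] * (1 - freq[keep]))
    return z @ z.T / keep.sum()

def phenotype_covariance(kinships, sigma2_g, sigma2_e):
    """V(y) = sum_i sigma2_g[i] * K_i + sigma2_e * I."""
    n = kinships[0].shape[0]
    v = sigma2_e * np.eye(n)
    for k, s in zip(kinships, sigma2_g):
        v = v + s * k
    return v

def heritability(sigma2_g, sigma2_e):
    """h_g^2: genetic share of the phenotypic variance."""
    return sum(sigma2_g) / (sum(sigma2_g) + sigma2_e)

# Hypothetical data: 50 samples, two annotations with simulated genotypes.
rng = np.random.default_rng(0)
geno_dhs = rng.binomial(2, 0.3, size=(50, 200))
geno_other = rng.binomial(2, 0.2, size=(50, 300))
kinships = [kinship(geno_dhs), kinship(geno_other)]

sigma2_g = [0.20, 0.05]  # assumed per-annotation genetic variances
sigma2_e = 0.75          # assumed environmental variance
V = phenotype_covariance(kinships, sigma2_g, sigma2_e)
h2g = heritability(sigma2_g, sigma2_e)
```

With one kinship per annotation, the fitted variance components are what the later enrichment calculation divides up: each annotation's share of h_g^2 is its component over the sum of all genetic components.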

Page 7

Measuring h_g^2 enrichment

• Enrichment_i = (% h_{g,i}^2) / (% SNPs_i)

Note: % SNPs ≈ % bp

• P-value = ZScore[ (% h_{g,i}^2 - % SNPs_i) / se(% h_{g,i}^2) ]

• Extensive simulations:

– Accounting for LD between categories

– Unbiased for uniform causal variants.

– Nearly-unbiased for complex disease architecture (all rare / rare DHS / etc).
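The two formulas on this slide can be written out as a short Python sketch. The inputs below reuse the rounded DHS percentages reported later in the talk (79% of h_g^2 vs. 16% of SNPs), so the result only approximates the reported fold enrichment. The standard error is a hypothetical value chosen for illustration, and converting the Z-score to a two-sided normal P-value is an assumption, since the slide does not say which tail is used.

```python
import math

def enrichment(pct_h2g, pct_snps):
    """Fold enrichment: share of h_g^2 in a category over its share of SNPs."""
    return pct_h2g / pct_snps

def enrichment_p_value(pct_h2g, pct_snps, se_pct_h2g):
    """Two-sided normal P-value for the Z-score (pct_h2g - pct_snps) / se."""
    z = (pct_h2g - pct_snps) / se_pct_h2g
    return math.erfc(abs(z) / math.sqrt(2))

fold = enrichment(0.79, 0.16)                    # rounded percentages from the talk
p_value = enrichment_p_value(0.79, 0.16, 0.05)   # se = 0.05 is hypothetical
```

A category whose share of h_g^2 equals its share of SNPs has enrichment 1 and a Z-score of 0, which is the null the P-value is measured against.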

Page 8

Broad classes of functional variants

Category     Definition                               % bp
Coding       All exons                                1%
UTR          5’ and 3’ untranslated regions           1%
Promoter     +/- 2kbp of TSS                          2%
DHS*         Regulatory regions from 217 cell-types   16%
Intron                                                29%
Intergenic                                            52%

*ENCODE and ROADMAP data analyzed by: H. Xu, C. Zang, Liu Lab; G. Trynka, Raychaudhuri Lab

Page 9

Broad classes of functional variants

[Same table as on the previous slide.]

The categories are hierarchical and non-overlapping.
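One way to read "hierarchical, non-overlapping" is that each SNP is assigned to the first category it overlaps in a fixed priority order. The sketch below assumes the priority follows the order of the table (Coding first, Intergenic last); the slides do not state the order explicitly, and the function name is hypothetical.

```python
# Assumed priority order, taken from the row order of the table.
PRIORITY = ["Coding", "UTR", "Promoter", "DHS", "Intron", "Intergenic"]

def assign_category(overlaps):
    """Return the single highest-priority category among those a SNP overlaps."""
    for category in PRIORITY:
        if category in overlaps:
            return category
    return "Intergenic"  # a SNP overlapping no annotation falls into the last class

example = assign_category({"DHS", "Intron"})  # an intronic SNP inside a DHS peak
```

Under this scheme an intronic SNP that sits in a DHS peak counts as DHS, so the Intron and Intergenic classes contain only SNPs outside every higher-priority annotation.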

Page 10

Real data: Analysis of 11 complex traits

WTCCC1

2,700 shared controls

1,700 cases each:

• Bipolar disorder

• Coronary artery disease

• Crohn’s disease*

• Hypertension

• Rheumatoid arthritis*

• Type 1 diabetes*

• Type 2 diabetes

WTCCC2

5,200 shared controls

1,800-9,300 cases:

• Schizophrenia

• Ankylosing spondylitis*

• Multiple sclerosis*

• Ulcerative colitis*

*HLA excluded from autoimmune traits

WTCCC, Nature 2007; WTCCC Nature 2011

Page 11

Meta-analysis: Significant DHS enrichment

[Figure: two bar-chart panels, "Genotyped SNPs" and "1000G Imputed SNPs". x-axis: Functional Category; y-axis: % SNP-Heritability (0.0 to 1.0). Bars show mean observed vs. expected (% SNPs), with a P-value printed above each category.]

Enrichment by functional category:

Category     Genotyped SNPs   1000G Imputed SNPs
Coding       4.1x             13.8x
UTR          3.5x             8.4x
Promoter     2.2x             2.8x
DHS          1.6x             5.1x
Intron       0.8x             0.1x
Intergenic   0.6x             −0.1x

P-values as printed on the panels: genotyped 2.6e−04, 2.2e−03, 1.9e−02, 8.0e−06, 6.4e−02, 4.1e−13; imputed 4.7e−04, 4.3e−03, 1.2e−01, <1e−20, 5.5e−12, <1e−20.

• Very large DHS enrichment: 79% of h_g^2 vs. 16% of SNPs.

• Significant coding enrichment: 8% of h_g^2 vs. 1% of SNPs.

• Intron/intergenic not significantly different from zero!

• Enrichment greater in autoimmune traits

Page 12

[Figure repeated from the previous slide: % SNP-heritability by functional category for genotyped and 1000G imputed SNPs.]

Enrichment greatest in imputed variants

Consistent with simulations from imputed data

Page 13

Enrichment at functional sub-categories

DHS overall (15.7%): 5.1x. Each sub-category is shown against the remaining ("other") DHS:

• Specific (4.4%): 6.1x (3.2e-03). DHS peaks appearing in ≤2 cell-types. Trynka, 2013 Nat. Gen.

• DGF (8.5%): 5.1x (9.0e-01). Digital genomic foot-printing (specific cleavage sites). Neph, 2013 Nature

• Enhancer (3.2%): 9.8x (5.1e-04). Computationally inferred enhancers. Hoffman, 2013 Nucl. Ac. Res.

Page 14

Phenotype/cell-type specific enrichment

• Enrichment in DHS peaks from specific cell-type versus remaining DHS.

• Test 83 cell-types in autoimmune traits, six significant enrichments relative to DHS:

Cell type        Enrichment relative to DHS   P-value       Ref.
Th1 T-Cell       5.8x                         1.4 x 10^-5   [2]
Fetal kidney     5.4x                         4.3 x 10^-4
Monocyte CD14+   4.3x                         4.3 x 10^-4   [2]
CD8+ primary     4.0x                         1.7 x 10^-4   [1]
Leukemia         3.5x                         5.9 x 10^-5
Lymphoblastoid   3.4x                         3.1 x 10^-5   [2]

[1] Trynka 2013, Nat Gen.; [2] Maurano 2012, Science
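The slide does not say how significance was defined across the 83 cell types tested. As a rough check, and assuming a Bonferroni correction at a family-wise level of 0.05 (an assumption, not something stated in the talk), the six P-values listed on this slide can be compared against the corrected threshold:

```python
N_TESTS = 83   # cell types tested, from the slide
ALPHA = 0.05   # assumed family-wise error rate

threshold = ALPHA / N_TESTS  # Bonferroni-corrected per-test threshold

# P-values copied from the slide.
p_values = {
    "Th1 T-Cell": 1.4e-05,
    "Fetal kidney": 4.3e-04,
    "Monocyte CD14+": 4.3e-04,
    "CD8+ primary": 1.7e-04,
    "Leukemia": 5.9e-05,
    "Lymphoblastoid": 3.1e-05,
}

significant = [name for name, p in p_values.items() if p < threshold]
```

Under this assumption the threshold is about 6.0e-04, and all six listed cell types fall below it.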

Page 15

Estimates are robust to artifacts

• Disease architecture

– Extensive LD/MAF simulations (in real data)

– Replication in multiple PGC cohorts

• Estimated variance

– Empirical jack-knife estimate is consistent

• Shared controls

– Permuted study-wide enrichment minor: λGC = 1.3

• Case/control ascertainment

– Alternative h_g^2 estimation [1], same answer

• Rare variants?

[1] Golan & Rosset, 2013 arXiv
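For readers unfamiliar with the jack-knife check mentioned under "Estimated variance", here is a generic delete-one-block jack-knife standard error in Python. It is a sketch only: the slides do not say what was resampled or how many blocks were used, and in the real analysis the statistic would be the h_g^2 share re-estimated with each block left out, not the simple mean used in this toy example.

```python
import numpy as np

def jackknife_se(values, statistic, n_blocks):
    """Delete-one-block jack-knife standard error of statistic(values)."""
    values = np.asarray(values)
    blocks = np.array_split(np.arange(len(values)), n_blocks)
    estimates = []
    for block in blocks:
        mask = np.ones(len(values), dtype=bool)
        mask[block] = False  # leave this block out
        estimates.append(statistic(values[mask]))
    estimates = np.asarray(estimates)
    spread = np.sum((estimates - estimates.mean()) ** 2)
    return np.sqrt((n_blocks - 1) / n_blocks * spread)

# Toy data: with one observation per block and the mean as the statistic,
# the jack-knife reproduces the usual standard error of the mean.
data = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
se = jackknife_se(data, np.mean, n_blocks=5)
```

Comparing such an empirical standard error with the analytical one is one way to check that the reported variance of an estimate is reasonable.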

Page 16

Rare coding variants do not confound

• Half of h2 still “missing” … can rare variants bias the observed results?

• Data: Exome-chip on 6,500-sample schizophrenia cohort; 64k rare (MAF<0.01) coding variants.

• Contribution from rare coding variants non-significant (h_g^2 = 0.037 ± 0.029).

• No impact on DHS enrichment.

• Purcell et al. 2014 Nature: contribution from rare exome-sequence variants modest (0.4%-0.6%).
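To see why the rare-coding estimate is called non-significant, the reported figures can be turned into a Z-score. This sketch assumes that the "± 0.029" is a standard error and that a two-sided normal test is appropriate; neither is stated on the slide.

```python
import math

estimate = 0.037   # h_g^2 from rare coding variants, from the slide
std_error = 0.029  # assumed to be a standard error

z = estimate / std_error
p_two_sided = math.erfc(z / math.sqrt(2))  # two-sided normal P-value
```

Under these assumptions the estimate is a little over one standard error from zero, well short of conventional significance.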

Page 17

Enrichment observed in association statistics

[Figure, panel A: Stratified QQ plot (height). x-axis: Expected P-value; y-axis: Observed P-value (by function).]

[Figure, panel B: P-value enrichment (Crohn's disease). x-axis: GWAS P-value threshold; y-axis: Enrichment in DHS.]

[1] Schork et al. 2013 PLoS Gen; [2] Maurano et al. 2012 Science

Page 18

Simulations: Noisy relationship between biology and association statistics

[Figure: "P-value Enrichment", 100 simulations. Fold enrichment (0.0 to 3.0) plotted against the GWAS P-value threshold, given as minimum −log10(PV) (0 to 10), for each category: Coding, UTR, Promoter, DHS, Intron, Intergenic.]

Simulation:

• 80% of h_g^2 from DHS; 10% from coding; others uniform.

• Generate phenotypes, run 100s of imputed GWAS with 32k samples each, and look for enrichment.

• LD and relative enrichment confound the estimate.
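The slides do not define how P-value enrichment is computed. A common definition, assumed here purely for illustration, is the share of SNPs passing a P-value threshold that fall in a category, divided by that category's share of all SNPs. The function name and toy inputs below are hypothetical.

```python
import numpy as np

def pvalue_enrichment(p_values, in_category, threshold):
    """Fold enrichment of a category among SNPs with P-value below threshold."""
    p_values = np.asarray(p_values, dtype=float)
    in_category = np.asarray(in_category, dtype=bool)
    passing = p_values < threshold
    if not passing.any():
        return float("nan")  # no SNP passes, so enrichment is undefined
    return in_category[passing].mean() / in_category.mean()

# Toy example: half of all SNPs are in the category, but both passing SNPs are.
toy_p = [1e-8, 1e-6, 0.5, 0.9]
toy_in_category = [True, True, False, False]
fold = pvalue_enrichment(toy_p, toy_in_category, threshold=1e-5)
```

Because a significant SNP can tag a causal variant in a different category through LD, this statistic can differ substantially from the underlying heritability shares, which is the point the simulation on this slide makes.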

Page 19

Conclusions

• Partitioning “hidden” heritability informs biology and disease architecture. Implications for GWAS, fine-mapping, risk-prediction.

• Very large enrichment at DHS elements (enhancers, cell-types, etc.) across 11 traits.

• Low upper-bound on non-regulatory intronic/intergenic contribution.

• Non-significant contribution from low-frequency exome-chip variants (in SCZ).

• Larger cohorts will yield trait-specific results.

Page 20

Acknowledgements

S Hong Lee

Benjamin M Neale

Gosia Trynka

Bjarni Vilhjalmsson

Hilary Finucane

Han Xu

Chongzhi Zang

Stephan Ripke

Eli Stahl

Schizophrenia Working Group of the Psychiatric Genomics Consortium

SWE-SCZ Consortium

Anna K Kahler Christina M Hultman

Shaun M Purcell Steven A McCarroll

Mark Daly Patrick F Sullivan

Naomi R Wray Soumya Raychaudhuri

Alkes L Price

… and thanks also to: M. Kellis, A. Sarkar, J. Pickrell, XS. Liu, N. Patterson

Gusev et al. 2014 bioRxiv