Comparative eQTL analyses within and between seven tissue types suggest mechanisms underlying cell type specificity of eQTLs

Barbara Engelhardt, Duke University
Christopher D Brown, University of Pennsylvania

November 9th, 2012



Motivation: Predicting functional SNPs

Most functional nucleotides in vertebrate genomes are non-coding

More than 85% of common disease associations are with non-coding SNPs

We would like to know whether any non-coding SNP in a cell type of interest is biochemically functional, in order to study:

  • genome-wide association study hits
  • de novo mutations involved in highly penetrant disease
  • somatic mutations involved in cancer

Current functional SNP analyses are limited by our narrow understanding of the functional constraints of most of the genome

Functional SNPs: Expression Quantitative Trait Loci

eQTLs are genetic variants that are associated with differences in mRNA transcription levels

Current eQTL studies do not go far enough:

  • cell specificity across relevant cell types is unclear
  • they report LD-linked SNPs instead of the causal SNP
  • they often report only one local, most significantly associated eQTL SNP

Study goal: quantify, identify possible mechanisms for, and predict cell type specific eQTLs

Results will enable functional interpretation of SNPs in a cell type specific way

Comparison of eQTLs: eleven studies, seven cell types

Used gene expression and genotype data from 11 publicly available studies on 7 different cell types

Analysis pipeline was uniform across studies:

  • Remapped expression probes to unique genes in Ensembl
  • Removed unexpressed probes and probes containing SNPs
  • Removed principal components to account for study-specific confounders
  • Imputed genotypes to the CEPH HapMap phase 2 panel

Evaluated eQTLs using Bayes factors (BFs)

Single permutation to evaluate FDR

Only considered cis-eQTLs (SNPs within 1Mb of TSS or TES)
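
The cis window and the single-permutation FDR estimate can be sketched in a few lines. This is an illustration only, not the authors' pipeline: the function names `is_cis` and `permutation_fdr` are hypothetical, the window is interpreted as distance to the nearer of the TSS or TES, and the FDR estimator (permuted discoveries divided by observed discoveries at a threshold) is an assumption.

```python
def is_cis(snp_pos, tss, tes, window=1_000_000):
    """True if the SNP lies within `window` bp of the TSS or the TES.

    Assumption: distance is measured to the TSS and TES separately.
    """
    return abs(snp_pos - tss) <= window or abs(snp_pos - tes) <= window


def permutation_fdr(observed_log10_bf, permuted_log10_bf, threshold):
    """Estimate FDR at a log10 BF threshold from a single permutation.

    Assumed estimator: (# permuted tests passing) / (# observed tests passing).
    """
    n_obs = sum(bf >= threshold for bf in observed_log10_bf)
    n_null = sum(bf >= threshold for bf in permuted_log10_bf)
    if n_obs == 0:
        return 0.0
    return min(1.0, n_null / n_obs)


# Toy values (hypothetical, for illustration only).
observed = [0.2, 1.5, 3.1, 4.0, 0.9, 2.7]
permuted = [0.1, 0.4, 1.2, 0.3, 0.8, 0.5]
print(is_cis(1_500_000, tss=1_000_000, tes=1_020_000))
print(permutation_fdr(observed, permuted, threshold=1.0))
```

In practice the threshold would be scanned until the estimated FDR drops below the target level, such as 5%.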


eQTLs across studies: by the numbers

Study      Code  Tissue             N    N genes
CAP        CPL   LCLs               480  18718
HapMap 2   STL   LCLs               210  15752
Harvard    HCE   Cerebellum         540  18263
Harvard    HPC   Prefrontal cortex  678  18257
Harvard    HVC   Visual cortex      463  18263
GenCord    GCF   Blood fibroblasts  83   16691
GenCord    GCL   LCLs               85   16691
GenCord    GCT   Blood T cells      85   16691
UChicago   CLI   Liver              206  16236
Merck      MLI   Liver              266  18234
Myers      MBR   Brain              193  11707

Sample size versus fraction of genes with eQTLs

[Figure: multi-panel plot with points labeled by study code (GCF, GCT, GCL, MBR, CLI, STL, MLI, HVC, CPL, HCE, HPC). Recoverable axis labels: Samples, Genes with eQTL [%], eQTLs with AH [%], eQTL Count, log BF, FDR (5% FDR marked), and distance from TSS/TES in Kb.]

Studies with duplicate arrays have substantially more power

Study size and replicate arrays account for 98% of the variability in the fraction of genes with eQTLs

Allelic heterogeneity

Allelic heterogeneity: variants at a genomic locus independently regulate the same biological process.

ENCODE: > 400,000 regulatory elements for ∼ 23,000 genes

Most significant eQTL is often not the only eQTL

Used LD-block method to identify allelic heterogeneity

Followed identification with a test for independent effects
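
The slides do not say which test for independent effects was used. One common way to check whether a second SNP carries signal beyond the first is to condition on the primary SNP. The sketch below does this by residualizing expression on the primary genotype and correlating the residuals with the secondary genotype. All names and data here are hypothetical.

```python
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def residualize(y, x):
    """Residuals of y after simple least-squares regression on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def conditional_correlation(expr, primary, secondary):
    """Correlation of the secondary genotype with the expression
    residuals left after regressing out the primary genotype."""
    return pearson(residualize(expr, primary), secondary)


# Toy genotypes (0/1/2 minor-allele counts) and simulated expression
# in which both SNPs contribute (hypothetical values).
primary = [0, 0, 1, 1, 2, 2, 0, 1]
secondary = [0, 1, 0, 1, 0, 1, 1, 0]
expr = [1.0 * p + 0.5 * s for p, s in zip(primary, secondary)]
print(round(conditional_correlation(expr, primary, secondary), 3))
```

A conditional correlation near zero would suggest the secondary SNP is only tagging the primary signal through LD; a large value suggests an independent effect.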


Allelic heterogeneity across eleven studies

[Figure: multi-panel plot with points labeled by study code. Recoverable axis labels: Samples, Genes with eQTL [%], eQTLs with AH [%], eQTL Count, log BF, FDR (5% FDR marked), and distance from TSS/TES in Kb.]

Sample size is well correlated with levels of allelic heterogeneity

Gene Ontology analysis shows no distinction between genes with primary eQTLs and those with secondary or further eQTLs

We hypothesize that allelic heterogeneity is ubiquitous

eQTLs across cell types: locations

eQTLs are enriched relative to background at the TSS and TES

TSS and TES enrichment extends to eQTLs in all tiers

[Figure: multi-panel plot with points labeled by study code. Recoverable axis labels: Samples, Genes with eQTL [%], eQTLs with AH [%], eQTL Count, log BF, FDR (5% FDR marked), and distance from TSS/TES in Kb.]

Replication within and between cell types

An eQTL is counted as replicating if log10 BF > 1.0 in the target data set; replication is evaluated for all eQTLs found in the discovery data at FDR < 5%

Blue lines show within cell type replication; red lines show between cell type replication
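
The replication criterion can be written down directly. A minimal sketch, assuming eQTLs are keyed by (gene, SNP) pairs and that the discovery set has already been filtered at FDR < 5%. The dictionary layout and the function name `replication_rate` are illustrative only.

```python
def replication_rate(discovery_eqtls, target_log10_bf, threshold=1.0):
    """Fraction of discovery eQTLs with log10 BF > threshold in the target.

    discovery_eqtls: iterable of (gene, snp) pairs significant at FDR < 5%.
    target_log10_bf: dict mapping (gene, snp) -> log10 BF in the target study.
    Pairs not tested in the target study are skipped (an assumption).
    """
    tested = [pair for pair in discovery_eqtls if pair in target_log10_bf]
    if not tested:
        return 0.0
    replicated = sum(target_log10_bf[pair] > threshold for pair in tested)
    return replicated / len(tested)


# Hypothetical toy data.
discovery = [("GENE_A", "rs1"), ("GENE_B", "rs2"),
             ("GENE_C", "rs3"), ("GENE_D", "rs4")]
target = {("GENE_A", "rs1"): 2.4, ("GENE_B", "rs2"): 0.3,
          ("GENE_C", "rs3"): 1.1}
print(replication_rate(discovery, target))
```

Swapping which study plays the discovery role and which plays the target role gives the asymmetric pairs listed in the figure legend, such as LCL + Liver versus Liver + LCL.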

[Figure: panels plot Replication [%] against log BF and against |SNP - TSS| [Kb], for the discovery + target pairs LCL + LCL, LCL + Liver, Liver + Liver, Liver + LCL, Brain + Brain, and Brain + LCL.]

False positives: small percentage of replicating eQTLs

False negatives: due to study design, lack of power, etc.


Incorporating ENCODE data: functional interpretability

ENCODE project has extensive genomic data for cell type specific genomic features

Understand how an eQTL regulates transcription

[Figure from ENCODE project]

Allelic heterogeneity and insulators

CTCF is the best characterized insulator protein, conserved in function across metazoans

If two SNPs independently regulate transcription, we might expect an enrichment of CTCF between them

In Drosophila melanogaster, recent work showed insulators are enriched between alternative promoters [Negre, 2010]

We see this same enrichment in humans
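
Checking for an intervening CTCF site between two SNPs is an interval query. The sketch below assumes CTCF binding sites on one chromosome are given as sorted, non-overlapping (start, end) intervals. The names `has_intervening_site` and `intervening_fraction` and all coordinates are hypothetical.

```python
from bisect import bisect_right


def has_intervening_site(snp_a, snp_b, sites):
    """True if any site lies strictly between the two SNP positions.

    sites: list of (start, end) tuples on the same chromosome, sorted by
    start and non-overlapping, so ends are sorted as well.
    """
    left, right = sorted((snp_a, snp_b))
    # Number of sites whose start is <= left; the next site is the
    # first one that begins after the left SNP.
    i = bisect_right(sites, (left, float("inf")))
    return i < len(sites) and sites[i][1] < right


def intervening_fraction(snp_pairs, sites):
    """Fraction of SNP pairs separated by at least one site."""
    hits = sum(has_intervening_site(a, b, sites) for a, b in snp_pairs)
    return hits / len(snp_pairs)


# Hypothetical coordinates.
ctcf_sites = [(100, 120), (500, 520), (900, 920)]
pairs = [(50, 300), (130, 480), (510, 1000)]
print(intervening_fraction(pairs, ctcf_sites))
```

Computing the same fraction for distance-matched background SNP pairs would give the comparison curve described in the figure.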

[Figure: Intervening CTCF [%] versus SNP-SNP Distance [kb], for Background SNPs and Independent eQTL SNPs.]

Figure 3. Insulators are enriched between SNPs independently associated with the same gene expression trait.

eQTLs and overlap with DHS sites

DNase I hypersensitive (DHS) sites: indicate histone-depleted open chromatin; classic feature of active regulatory elements

Clear enrichment in eQTL overlap

Significant enrichment for replicating eQTLs versus non-replicating eQTLs (not shown)

Significant enrichment for LCL eQTLs in DHS sites in LCLs versus DHS sites in HepG2 cells (not shown)

[Figure: SNP-CRE Overlap [%] versus position relative to TSS/TES (Kb) for DHS Sites, p300 Sites, and Active Promoters. Legend entries: Background SNPs, eQTL SNPs; Non-replicating, Replicating; HepG2 CREs, LCL CREs.]

eQTLs and overlap with heterochromatin

Heterochromatin (facultative): tightly packed, cell specific form of DNA; regulatory elements in heterochromatin regions are inaccessible to transcriptional regulators

Clear depletion in eQTL overlap

Significant depletion for replicating eQTLs versus non-replicating eQTLs (not shown)

Significant depletion for LCL eQTLs in heterochromatin in LCLs versus heterochromatin in HepG2 cells (not shown)

[Figure: SNP-CRE Overlap [%] and Insulator Intersection [%] versus position relative to TSS/TES (Kb) for Heterochromatin, Repressed Chromatin, and Insulators. Legend entries: Background SNPs, eQTL SNPs; Non-replicating, Replicating; HepG2 CREs, LCL CREs.]

Predicting replication of eQTLs

Built random forest classifier to predict whether a specific eQTL would replicate in a second study

Class was whether an eQTL replicated or not

Features included:

  • genomic information (e.g., distance of SNP to TSS)
  • non-cell type specific regulatory elements (e.g., GERP scores)
  • cell type specific regulatory elements (e.g., DHS sites, TFBS)

Considered predicting replication:

  • within cell type, using cell type specific CRE information
  • between cell types, using target cell type specific CRE data

Validated accuracy using 10-fold cross validation


Predicting replication of eQTLs: ROC curves

Receiver Operating Characteristic (ROC) curves compare the rate of false positives versus the rate of true positives as the cutoff moves from most to least restrictive

Red lines: within cell type replicability; blue lines: between cell type replicability.

[Figure: ROC curves, TPR versus FPR (both in %), for LCL + LCL, LCL + Liver, Liver + Liver, Liver + LCL, Brain + Brain, and Brain + LCL.]

Area under the ROC Curve (AUC): quantifies improvement over random guessing

For LCL eQTLs, AUCs are 0.79 and 0.73, respectively, for within LCL and between LCL and liver eQTL replication
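
AUC can be computed without drawing the curve: it equals the probability that a randomly chosen replicating eQTL receives a higher classifier score than a randomly chosen non-replicating one, with ties counted as one half. A value of 0.5 corresponds to random guessing. The sketch below uses made-up scores; `roc_auc` is a hypothetical helper, not taken from the slides.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: 1 for a replicating eQTL, 0 for a non-replicating one.
    scores: classifier scores, higher meaning more likely to replicate.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(roc_auc(labels, scores))
```

The pairwise loop is quadratic in the number of eQTLs; a rank-based formulation would be preferable for large data sets.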


Predicting replication of eQTLs: Gini scores

How predictive is each feature for whether the eQTL replicates?

Across all training sets, biggest contributors:

  • eQTL discovery significance
  • SNP to TSS distance
  • gene expression level

Cis-regulatory elements vary considerably in the degree to which they are useful in predicting replication

Intervening insulators contribute substantially to within cell type predictions

Heterochromatin states contribute substantially to between cell type predictions
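
In a random forest, Gini-based feature scores are typically built from the decrease in Gini impurity each time a feature is used to split a node. The sketch below computes that decrease for one threshold split on one feature. The data and the names `gini` and `gini_decrease` are hypothetical, and the slides do not state exactly how the scores were computed.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels: 2p(1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def gini_decrease(values, labels, threshold):
    """Impurity decrease from splitting samples on value <= threshold."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
    return gini(labels) - weighted


# Toy example: discovery log10 BF as the feature, replication as the class.
log10_bf = [1.2, 1.5, 2.0, 4.5, 6.0, 8.0]
replicated = [0, 0, 1, 1, 1, 1]
print(gini_decrease(log10_bf, replicated, threshold=1.75))
```

Averaging such decreases over every split that uses a feature, across all trees, gives a per-feature importance that can be ranked as on this slide.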


Summary and Conclusions

We leveraged eQTLs found both within and between cell types, together with extensive ENCODE data, in this large comparative study to quantify, describe mechanistically, and predict cell type specific eQTL SNPs

Given a SNP hit and a cell type of interest:

  • identify an eQTL well correlated (in high LD) with the hit
  • compute the probability that it will replicate in the cell type of interest
  • consider the location of the hit relative to cell type specific and prediction-informative CREs
  • make a more informed hypothesis about the mechanism of the phenotype (validate via experiments)

Acknowledgements

Casey Brown (UChicago, Penn), Lara Mangravite (Sage Bionetworks), Matthew Stephens (University of Chicago)

Greg Crawford (Duke University), all the ENCODE data

eQTL studies: GenCord, CAP, Harvard Brain, HapMap phase 2, Merck liver, Myers brain, UChicago liver

Funding: NIH NHGRI K99/R00

Paper on arXiv, Haldane’s Sieve

Graphics: R package ggplot2
