Regulomics I: Methods to read out regulatory functions


Noonan and McCallion, Ann Rev Genomics Hum Genet 11:1 (2010)

Identifying regulatory functions in genomes

[Figure: forebrain, neural tube, and limb regulatory elements of gene A, bound by brain, neural, and limb TFs respectively, and the resulting expression of gene A]

Genes are not just protein coding sequences


Lettice et al. Hum Mol Genet 12:1725 (2003) Sagai et al. Development 132:797 (2005)

Regulatory mutations can cause profound phenotypes

Three essential questions

Q1: Where are regulatory elements located in the genome?

Q2: What regulatory functions do they encode?

Q3: What genes do they control?

We will use promoters and enhancers as our examples, but there are other regulatory functions

Q1: Mapping regulatory elements in genomes

[Figure: genome browser view of chr5:133,876,119–134,876,119 (1 Mb) with Genes and Transcription tracks]

• Regulatory elements are not easily detected by sequence analysis

• Examine biochemical correlates of regulatory element (RE) activity in cells/tissues:
  • Chromatin immunoprecipitation (ChIP-seq)
  • DNase-seq and FAIRE
  • Methylated DNA immunoprecipitation (MeDIP)

Biochemical indicators of regulatory function

1. TF binding
2. Histone modification: H3K27ac, H3K4me3
3. Chromatin modifiers and coactivators: p300, MLL
4. DNA looping factors: cohesin

[Figure: methods for mapping regulatory elements. ChIP-seq: TFs, histone mods. Chromatin accessibility: DNase, FAIRE]

From Furey (2012) Nat Rev Genet 13:840

Method I: ChIP-seq

[Figure: ChIP-seq workflow (immunoprecipitation, PCR) with ChIP and input read tracks and a peak call over the signal]

• Align reads to reference
• Use peaks of mapped reads to identify binding events

ChIP-seq is an enrichment method: it requires a statistical framework for determining the significance of enrichment.

ChIP-seq ‘peaks’ are regions of enriched read density relative to an input control. Input = sonicated chromatin collected prior to immunoprecipitation.

[Figure: ChIP vs. input read tracks; peak call based on enrichment relative to control]
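As a rough illustration of "enrichment relative to an input control", the sketch below counts read start positions in fixed windows and reports a library-size-normalized ChIP/input ratio. It is a minimal sketch, not any particular peak caller: the function names, the 200 bp window, and the pseudocount are illustrative assumptions, and it treats a single chromosome as a list of integer read positions.

```python
from collections import Counter


def window_counts(read_starts, window=200):
    """Count reads per fixed-size window, keyed by window index."""
    return Counter(pos // window for pos in read_starts)


def fold_enrichment(chip_starts, input_starts, window=200, pseudocount=1.0):
    """Library-size-normalized ChIP/input ratio for every window with ChIP reads.

    Assumes both read lists are non-empty and come from the same chromosome.
    """
    chip = window_counts(chip_starts, window)
    ctrl = window_counts(input_starts, window)
    # Scale input counts so that both libraries have the same total read number.
    scale = len(chip_starts) / len(input_starts)
    return {
        w: (chip[w] + pseudocount) / (ctrl.get(w, 0) * scale + pseudocount)
        for w in chip
    }
```

The pseudocount keeps windows with no input reads from producing a division by zero; real peak callers also model how likely a given enrichment is by chance rather than reporting the ratio alone.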

Calling peaks in ChIP-seq data

Wilbanks and Facciotti PLoS ONE 5:e11471 (2010)

There are many ChIP-seq peak callers available

From Park (2009) Nat Rev Genet 10:669

Generating ChIP-seq peak profiles

Artifacts:
• Repeats
• PCR duplicates
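Both artifacts are commonly handled by filtering aligned reads before peak calling. The sketch below assumes each read is a `(chrom, pos, strand, mapq)` tuple: reads with low mapping quality (typical of ambiguous placements in repeats) are dropped, and single-end reads sharing the same chromosome, 5' position, and strand are collapsed as likely PCR duplicates. The tuple layout and the `min_mapq=10` cutoff are assumptions for illustration.

```python
def filter_reads(reads, min_mapq=10):
    """Drop low-MAPQ reads and collapse likely PCR duplicates.

    reads: iterable of (chrom, pos, strand, mapq) tuples (assumed layout).
    """
    seen = set()
    kept = []
    for chrom, pos, strand, mapq in reads:
        if mapq < min_mapq:
            continue  # ambiguous placement, often within repeats
        key = (chrom, pos, strand)
        if key in seen:
            continue  # same start and strand as an earlier read: likely PCR duplicate
        seen.add(key)
        kept.append((chrom, pos, strand, mapq))
    return kept
```

Collapsing by start position can also discard genuine reads at very high-coverage sites, which is one reason deduplication settings differ between pipelines.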

Assessing statistical significance

[Figure: distribution of the number of reads at a site (S)]

Empirical FDR: call peaks in the input (using the ChIP sample as the control). FDR = ratio of the number of peaks at a given enrichment value called in the input vs. in the ChIP.
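The empirical FDR described above can be computed directly from two lists of peak scores: peaks called in ChIP against input, and peaks called with the samples swapped. This is a minimal sketch; the function name and the convention of returning 0.0 when no ChIP peaks pass the threshold are assumptions.

```python
def empirical_fdr(chip_peak_scores, swapped_peak_scores, threshold):
    """Empirical FDR at an enrichment threshold.

    chip_peak_scores: enrichment values of peaks called in ChIP vs. input.
    swapped_peak_scores: enrichment values of peaks called in input vs. ChIP.
    """
    n_chip = sum(1 for s in chip_peak_scores if s >= threshold)
    n_swap = sum(1 for s in swapped_peak_scores if s >= threshold)
    if n_chip == 0:
        return 0.0  # assumed convention: no calls, no false discoveries
    return min(1.0, n_swap / n_chip)
```

Scanning the threshold upward until the FDR drops below a target (for example 5%) gives an enrichment cutoff without assuming any particular read-count distribution.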

Assume read counts at a site follow a Poisson distribution

Many sites in input data will have some reads by chance

Some sites will have many reads

From Pepke et al (2009) Nat Meth 6:S22
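Under the Poisson assumption, the significance of S reads at a site is the upper-tail probability P(X ≥ S) for X ~ Poisson(λ), where λ is the expected number of background reads in the window. The sketch below uses a single genome-wide λ as an assumption; many peak callers refine λ using local background estimated from the input. The read count, window size, and genome size in the usage lines are illustrative values, not data from the lecture.

```python
import math


def poisson_sf(s, lam):
    """P(X >= s) for X ~ Poisson(lam): chance of s or more reads by chance."""
    if s <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    # P(X = s), computed in log space to avoid overflow for large s.
    term = math.exp(-lam + s * math.log(lam) - math.lgamma(s + 1))
    total = 0.0
    k = s
    while term > 0.0:
        total += term
        k += 1
        term *= lam / k  # P(X = k) from P(X = k - 1)
        if k > lam and term < total * 1e-16:
            break  # past the mode, remaining tail is negligible
    return min(1.0, total)


if __name__ == "__main__":
    # Illustrative assumptions: 20 M reads, 200 bp windows, 3.0 Gb genome.
    n_reads, window, genome_size = 20_000_000, 200, 3.0e9
    lam = n_reads * window / genome_size  # expected background reads per window
    print(lam, poisson_sf(10, lam))
```

Summing the tail directly, instead of computing 1 minus the CDF, keeps very small p-values from rounding to zero. Because λ scales with the number of reads, the same S becomes more or less significant as sequencing depth changes.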

Sequencing depth matters:

[Figure: distribution of the number of reads at a site (S)]

From Park (2009) Nat Rev Genet 10:669

ChIP-seq signal profiles vary depending on the factor

[Figure: example ChIP-seq signal profiles for transcription factors, Pol II, and histone mods]

From Park (2009) Nat Rev Genet 10:669

Mapping chromatin accessibility

[Figure: DNase I and FAIRE assays]

From Furey (2012) Nat Rev Genet 13:840

Song et al., Genome Res 21:1757 (2011)

DNase I hypersensitivity identifies regulatory elements…

[Figure: DNase I hypersensitive sites]

…but needs to be combined with other data to determine what is actually bound, such as TF ChIP…

[Figure: DHS signal and RNA Pol II ChIP in GM12878; DHS sites in human ES cells]

From Neph (2012) Nature 489:83

… or motif analysis
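Motif analysis of accessible sites often means scanning sequences with a position weight matrix (PWM). The sketch below converts per-position base counts into log2-odds scores against a uniform background and reports forward-strand windows above a threshold. The toy GATA-like counts in the usage lines, the 0.25 background, the pseudocount, and the threshold are all assumptions; a real scan would also check the reverse complement.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, background=0.25, pseudocount=0.5):
    """Convert per-position base counts into log2-odds scores."""
    pwm = []
    for col in counts:
        total = sum(col[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2((col[b] + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def scan(seq, pwm, threshold):
    """Return (offset, score) for forward-strand windows scoring >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other symbols
        score = sum(pwm[j][b] for j, b in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits


if __name__ == "__main__":
    # Hypothetical counts: 10 observations of the consensus base "GATA" per position.
    counts = [{b: (10 if b == c else 0) for b in BASES} for c in "GATA"]
    print(scan("TTGATATTGATC", log_odds_pwm(counts), threshold=6.0))
```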

Q2: Making sense of regulatory functions

Integrate multiple data sources:
• TF function
• Histone modification
• Potential target genes
• Existing genome annotations

Compare multiple biological states

Regulatory function is dependent on biological context


Identifying tissue-specific regulatory function

[Figure: ChIP-seq signal at 20,000 bound sites in limb and brain, grouped by clustering into sites strongly marked in limb, sites strongly marked in brain, and sites strongly marked in both]

Function?
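One way to produce groups like "limb only", "brain only", and "both" is k-means clustering of each site's signal in the two tissues. The sketch below is plain Lloyd's k-means on (limb, brain) signal pairs; seeding the centroids with limb-only, brain-only, and shared prototypes is an assumption made for illustration, and the clustering behind the figure may have used a different method or more dimensions.

```python
def kmeans(points, centroids, n_iter=20):
    """Lloyd's k-means; returns a cluster label for each point.

    points: list of equal-length tuples, e.g. (limb_signal, brain_signal).
    centroids: initial cluster centers (assumed to be chosen by the user).
    """
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each site to the nearest centroid (squared Euclidean distance).
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((p[d] - centroids[c][d]) ** 2 for d in range(len(p))))
            for p in points
        ]
        # Move each centroid to the mean of the sites assigned to it.
        new = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                dims = len(members[0])
                new.append(tuple(sum(m[d] for m in members) / len(members) for d in range(dims)))
            else:
                new.append(centroids[c])  # keep an empty cluster where it was
        if new == centroids:
            break  # converged
        centroids = new
    return labels
```

Seeding with interpretable prototypes keeps the cluster labels stable across runs; random restarts are the usual alternative when no prototypes are known.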

Assign enhancers to genes based on proximity (not ideal)

GREAT (bejerano.stanford.edu/great/): Gene Ontology annotations assigned to regulatory sequences
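The simple proximity rule assigns each enhancer to the gene whose transcription start site (TSS) is closest. The sketch below does this with binary search over a sorted list of TSS positions on one chromosome; the gene names and coordinates in the usage lines are hypothetical. It also shows why the rule is "not ideal": for example, the limb enhancer studied by Lettice et al. lies within an intron of LMBR1, roughly 1 Mb from its target SHH, so nearest-gene assignment would pick the wrong gene.

```python
import bisect


def nearest_gene(enhancer_mid, tss_list):
    """Return (gene, distance) for the TSS closest to an enhancer midpoint.

    tss_list: non-empty list of (position, gene_name), sorted by position,
    all on the same chromosome as the enhancer.
    """
    positions = [p for p, _ in tss_list]
    i = bisect.bisect_left(positions, enhancer_mid)
    # The nearest TSS is either just before or just after the insertion point.
    candidates = [tss_list[j] for j in (i - 1, i) if 0 <= j < len(tss_list)]
    pos, gene = min(candidates, key=lambda t: abs(t[0] - enhancer_mid))
    return gene, abs(pos - enhancer_mid)


if __name__ == "__main__":
    # Hypothetical TSS coordinates and gene names.
    tss = [(1_000, "geneA"), (50_000, "geneB"), (200_000, "geneC")]
    print(nearest_gene(40_000, tss))
```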


Example from PS1: CTCF and RAD21 (cohesin)

CTCF and cohesin co-occupy many sites

[Figure: co-occupied sites at promoters, insulators, and enhancers]

From Kagey et al (2010) Nature 467:430

• CTCF: marks insulators and promoters
• RAD21 (cohesin): marks insulators, promoters, and enhancers?
• Include histone modification data (Wednesday’s lecture)

[Figure: promoter and enhancers?]
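A first step in asking how often CTCF and RAD21 co-occupy sites is to measure how many peaks in one set overlap the other. The sketch below assumes peaks are half-open `(chrom, start, end)` intervals; for each chromosome it sorts the second set by start and keeps a running maximum of end coordinates, so each query needs one binary search. The toy peak coordinates in the test are hypothetical.

```python
import bisect
from collections import defaultdict


def overlap_fraction(peaks_a, peaks_b):
    """Fraction of peaks in peaks_a overlapping at least one peak in peaks_b.

    Peaks are half-open (chrom, start, end) intervals.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks_b:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        prefix_max_end = []
        running = float("-inf")
        for _, e in ivs:
            running = max(running, e)  # largest end among intervals so far
            prefix_max_end.append(running)
        index[chrom] = (starts, prefix_max_end)
    if not peaks_a:
        return 0.0
    hits = 0
    for chrom, start, end in peaks_a:
        if chrom not in index:
            continue
        starts, prefix_max_end = index[chrom]
        j = bisect.bisect_left(starts, end)  # intervals with start < end
        if j > 0 and prefix_max_end[j - 1] > start:
            hits += 1
    return hits / len(peaks_a)
```

Overlap alone does not say what the shared sites do; splitting co-occupied peaks by promoter proximity or histone marks is the natural next step.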

Identifying bound motifs from ChIP-seq data

[Figure: CTCF motif from ~20,000 binding sites identified by ChIP]

From Furey (2012) Nat Rev Genet 13:840

MEME suite: http://meme.nbcr.net/meme/
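MEME discovers motifs with a probabilistic model; a much simpler, swapped-in way to see motif enrichment in ChIP peaks is to compare k-mer frequencies between peak sequences and background sequences. The sketch below is that k-mer comparison, not MEME's algorithm. The choice of k, the pseudocount, and the toy sequences in the test are assumptions, and it ignores reverse complements.

```python
import math
from collections import Counter


def kmer_counts(seqs, k):
    """Count all k-mers made only of A/C/G/T across a list of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def enriched_kmers(peak_seqs, background_seqs, k=6, pseudocount=1.0, top=5):
    """Rank k-mers by log2 ratio of frequency in peaks vs. background."""
    fg = kmer_counts(peak_seqs, k)
    bg = kmer_counts(background_seqs, k)
    n_fg = sum(fg.values()) or 1
    n_bg = sum(bg.values()) or 1
    scores = {
        kmer: math.log2(((fg[kmer] + pseudocount) / n_fg) /
                        ((bg[kmer] + pseudocount) / n_bg))
        for kmer in fg
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

Top-ranked k-mers can then seed a position weight matrix, which is closer to what motif-discovery tools report.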

Enhancer-associated histone modification

Caveat: single TF binding events often do not indicate regulatory function

• Many TFs are present at high concentrations in the nucleus

• TF motifs are abundant in the genome

• Single TF binding events may be incidental
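A quick calculation shows why TF motifs are abundant: in random sequence with equal base frequencies, one exact k-mer is expected about 2G/4^k times across both strands of a genome of size G. The sketch below assumes uniform base composition and a rounded 3.0 Gb genome; real genomes are not uniform, and real TFs tolerate mismatches, so actual counts of candidate sites are larger.

```python
def expected_occurrences(motif_length, genome_size=3.0e9, both_strands=True):
    """Expected matches to one exact k-mer in uniform random sequence."""
    per_strand = genome_size / 4 ** motif_length
    return 2 * per_strand if both_strands else per_strand


if __name__ == "__main__":
    for k in (6, 8, 10):
        print(k, round(expected_occurrences(k)))
```

For an 8 bp motif this gives roughly 9 × 10^4 expected matches, far more than the number of sites a factor typically binds in a given cell type.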

Q3: Identifying the target genes for regulatory elements


Chromosome Conformation Capture

[Figure: 3C-based methods. Sequencing readouts: 4C, 5C, Hi-C; ChIP for specific factors: ChIA-PET]

3C evaluates specific interaction possibilities by qPCR

Dekker et al Nat Rev Genet 14:390 (2013)

4C identifies genome-wide interactions for a single “bait” sequence
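The data logic behind a 4C profile (not the wet-lab protocol, which amplifies bait-containing ligation products by inverse PCR) can be sketched as: keep ligation junctions with exactly one end in the bait region, then count where the other end falls in fixed-size genomic bins. The read-pair layout, the half-open bait interval, and the 10 kb bin size are assumptions for illustration.

```python
from collections import Counter


def four_c_profile(read_pairs, bait, bin_size=10_000):
    """Count bait interactions per genomic bin.

    read_pairs: iterable of ((chrom1, pos1), (chrom2, pos2)) ligation junctions.
    bait: (chrom, start, end) half-open viewpoint region.
    """
    b_chrom, b_start, b_end = bait

    def in_bait(end):
        chrom, pos = end
        return chrom == b_chrom and b_start <= pos < b_end

    profile = Counter()
    for left, right in read_pairs:
        if in_bait(left) and not in_bait(right):
            other = right
        elif in_bait(right) and not in_bait(left):
            other = left
        else:
            continue  # neither end, or both ends, within the bait
        profile[(other[0], other[1] // bin_size)] += 1
    return profile
```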

From Kieffer-Kwon et al. (2013) Cell 155:1507

ChIA-PET identifies interactions involving a particular factor

In principle, Hi-C captures all interactions, but is limited by sequencing depth
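The depth limitation follows from how Hi-C data are summarized: read pairs are binned into a symmetric contact matrix, and the number of bin pairs grows with the square of the number of bins. The sketch below builds a sparse contact matrix and counts bin pairs; the function names and the read-pair layout are assumptions. For a 3 Gb genome at 1 kb resolution there are about 4.5 × 10^12 bin pairs, so even 10^9 read pairs would sample each bin pair far less than once on average.

```python
from collections import Counter


def contact_matrix(read_pairs, bin_size):
    """Sparse, symmetric contact counts keyed by an ordered pair of bins.

    read_pairs: iterable of ((chrom1, pos1), (chrom2, pos2)).
    Each bin is (chrom, pos // bin_size); keys are sorted so (i, j) == (j, i).
    """
    counts = Counter()
    for (c1, p1), (c2, p2) in read_pairs:
        a = (c1, p1 // bin_size)
        b = (c2, p2 // bin_size)
        counts[tuple(sorted((a, b)))] += 1
    return counts


def n_bin_pairs(genome_size, bin_size):
    """Distinct bin pairs (including self-pairs) a full Hi-C map must sample."""
    n = -(-genome_size // bin_size)  # ceiling division
    return n * (n + 1) // 2
```

Coarser bins reduce the number of bin pairs quadratically, which is why Hi-C maps are often analyzed at resolutions matched to the available depth.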

Dekker et al Nat Rev Genet 14:390 (2013)

Hierarchical organization of the genome

Dekker et al Nat Rev Genet 14:390 (2013)
Gorkin et al Cell Stem Cell 14:762 (2014)

Cohesin-mediated interactions

Summary

• Relevant overview papers on methodologies posted on class wiki

• Wednesday: Epigenetics and the histone code
