Regulomics I: Methods to read out regulatory functions
Noonan and McCallion, Ann Rev Genomics Hum Genet 11:1 (2010)
Identifying regulatory functions in genomes
Figure: Expression of gene A in forebrain, neural tube, and limb, each controlled by tissue-specific TFs (brain TFs, neural TFs, limb TFs)
Genes are not just protein-coding sequences
Lettice et al. Hum Mol Genet 12:1725 (2003); Sagai et al. Development 132:797 (2005)
Regulatory mutations can cause profound phenotypes
Three essential questions
Q1: Where are regulatory elements located in the genome?
Q2: What regulatory functions do they encode?
Q3: What genes do they control?
We will use promoters and enhancers as our examples, but there are other regulatory functions
Q1: Mapping regulatory elements in genomes
Figure: genome browser view of chr5:133,876,119-134,876,119 (a 1 Mb window) with gene and transcription tracks
• Regulatory elements (REs) are not easily detected by sequence analysis alone
• Examine biochemical correlates of RE activity in cells/tissues:
  • Chromatin immunoprecipitation (ChIP-seq)
  • DNase-seq and FAIRE
  • Methylated DNA immunoprecipitation (MeDIP)
Biochemical indicators of regulatory function
1. TF binding
2. Histone modification: H3K27ac, H3K4me3
3. Chromatin modifiers and coactivators: p300, MLL
4. DNA looping factors: cohesin
Methods: ChIP-seq (TFs, histone modifications) and chromatin accessibility (DNase, FAIRE)
From Furey (2012) Nat Rev Genet 13:840
Method I: ChIP-seq
Figure: ChIP-seq workflow including a PCR step, with ChIP and input read tracks, signal, and peak calls
• Align reads to the reference genome
• Use peaks of mapped reads to identify binding events
ChIP-seq is an enrichment method: it requires a statistical framework for determining the significance of enrichment
ChIP-seq ‘peaks’ are regions of enriched read density relative to an input control
Input = sonicated chromatin collected prior to immunoprecipitation
Figure: ChIP and input tracks; peaks are called by enrichment relative to the control
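A minimal sketch of "enrichment relative to control" at a single site, assuming reads have already been counted at that site in both libraries. Because ChIP and input libraries are often sequenced to different depths, the input count is first rescaled to the ChIP library size. The function name and the pseudocount of 1 are illustrative assumptions, not the method of any particular peak caller.

```python
def fold_enrichment(chip_reads, input_reads, chip_total, input_total, pseudocount=1.0):
    """ChIP/input read ratio at one site, corrected for library size.

    chip_reads, input_reads: reads overlapping the site in each library
    chip_total, input_total: total mapped reads in each library
    pseudocount: hypothetical default that avoids dividing by zero
    """
    if chip_total <= 0 or input_total <= 0:
        raise ValueError("library sizes must be positive")
    # Rescale the input count to what it would be at the ChIP sequencing depth.
    expected = input_reads * (chip_total / input_total)
    return (chip_reads + pseudocount) / (expected + pseudocount)


# 120 ChIP reads vs. 10 input reads, with the input library twice as deep:
# the input count is scaled down to 5 before the ratio is taken.
print(fold_enrichment(120, 10, 20_000_000, 40_000_000))
```

Without the depth correction, a deeper input library would make every site look less enriched than it is.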
Calling peaks in ChIP-seq data
Wilbanks and Facciotti PLoS ONE 5:e11471 (2010)
There are many ChIP-seq peak callers available
From Park (2009) Nat Rev Genet 10:669
Generating ChIP-seq peak profiles
Artifacts:
• Repeats
• PCR duplicates
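One common way to handle the PCR-duplicate artifact is to keep only one read per mapped 5' position and strand, on the assumption that independent fragments rarely start at exactly the same base. The sketch assumes single-end reads stored as hypothetical (chromosome, position, strand) tuples; dedicated tools such as Picard MarkDuplicates or samtools markdup also handle paired-end reads and base qualities.

```python
from typing import Iterable, List, Tuple

# Hypothetical read representation: (chromosome, 5' mapped position, strand)
Read = Tuple[str, int, str]


def remove_pcr_duplicates(reads: Iterable[Read]) -> List[Read]:
    """Keep the first read seen at each (chromosome, position, strand)."""
    seen = set()
    unique = []
    for read in reads:
        if read not in seen:
            seen.add(read)
            unique.append(read)
    return unique


reads = [("chr1", 100, "+"), ("chr1", 100, "+"), ("chr1", 100, "-"), ("chr2", 100, "+")]
print(remove_pcr_duplicates(reads))
```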
Assessing statistical significance
Figure: distribution of the number of reads at a site (S)
Empirical FDR: call peaks in the input (using the ChIP as control); FDR = ratio of the number of peaks at a given enrichment value called in input vs. ChIP
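The empirical FDR above can be sketched as follows, assuming enrichment scores have already been computed both for peaks called in the ChIP (input as control) and for peaks called in the input with the ChIP as control (the sample swap). The function names and the search over observed ChIP scores are illustrative choices.

```python
def empirical_fdr(chip_scores, swap_scores, threshold):
    """FDR at an enrichment threshold from a ChIP/input sample swap.

    chip_scores: enrichment values of peaks called in ChIP (input as control)
    swap_scores: enrichment values of peaks called in input (ChIP as control)
    """
    n_chip = sum(1 for s in chip_scores if s >= threshold)
    n_swap = sum(1 for s in swap_scores if s >= threshold)
    if n_chip == 0:
        return 0.0  # no peaks pass the threshold, so nothing can be a false discovery
    return min(1.0, n_swap / n_chip)


def threshold_for_fdr(chip_scores, swap_scores, target=0.05):
    """Smallest observed ChIP enrichment value whose empirical FDR <= target."""
    for t in sorted(set(chip_scores)):
        if empirical_fdr(chip_scores, swap_scores, t) <= target:
            return t
    return None


chip = [2, 5, 10, 20]
swap = [2, 3, 6]
print(empirical_fdr(chip, swap, 5), threshold_for_fdr(chip, swap))
```

Peaks that appear in the swapped comparison are treated as estimates of how often noise alone reaches a given enrichment.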
Assume the read count at a site follows a Poisson distribution
Many sites in input data will have some reads by chance
Some sites will have many reads
From Pepke et al (2009) Nat Meth 6:S22
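Under the Poisson assumption, the significance of observing S or more reads at a site is the upper tail P(X ≥ S) for X ~ Poisson(λ), where λ is the number of reads expected there by chance. How λ is estimated (for example, from the genome-wide average or from local input coverage) is left as an assumption; the sketch only computes the tail probability, and sums the upper-tail terms directly when S > λ so that very small p-values do not cancel to zero.

```python
import math


def poisson_sf(s, lam):
    """P(X >= s) for X ~ Poisson(lam): chance of s or more reads from background alone."""
    if s <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    if s <= lam:
        # The upper tail is large here, so 1 - P(X < s) is numerically safe.
        cdf = sum(math.exp(k * math.log(lam) - lam - math.lgamma(k + 1)) for k in range(s))
        return max(0.0, 1.0 - cdf)
    # For s > lam the terms shrink as k grows; sum them directly to keep tiny p-values.
    term = math.exp(s * math.log(lam) - lam - math.lgamma(s + 1))
    total = term
    k = s
    while term > total * 1e-17:
        k += 1
        term *= lam / k
        total += term
    return min(1.0, total)


# 20 reads at a site where the background model expects 5
print(poisson_sf(20, 5.0))
```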
Assessing statistical significance
Figure: number of reads at a site (S)
From Park (2009) Nat Rev Genet 10:669
Sequencing depth matters: deeper sequencing yields more reads per site and more power to detect weaker binding sites
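One way to check whether depth is sufficient is a saturation analysis: subsample the reads at increasing fractions, call peaks on each subsample, and see whether the number of peaks levels off. The sketch uses a deliberately crude, hypothetical peak caller (fixed bins with a minimum read count) on read start positions from one chromosome; each read gets a single random draw so that smaller subsamples are nested inside larger ones and the curve cannot decrease.

```python
import random
from collections import Counter


def toy_peak_caller(positions, bin_size=200, min_reads=5):
    """Hypothetical caller: fixed-width bins holding at least min_reads read starts."""
    counts = Counter(p // bin_size for p in positions)
    return {b for b, c in counts.items() if c >= min_reads}


def saturation_curve(positions, fractions, seed=0, **caller_kwargs):
    """Number of peaks called at each subsampling fraction of the reads."""
    rng = random.Random(seed)
    draws = [rng.random() for _ in positions]  # one draw per read, so subsamples are nested
    curve = []
    for f in sorted(fractions):
        kept = [p for p, u in zip(positions, draws) if u < f]
        curve.append((f, len(toy_peak_caller(kept, **caller_kwargs))))
    return curve


# A strong site (100 reads), a weak site (10 reads) and sparse background reads
positions = [1000] * 100 + [5050] * 10 + list(range(0, 100_000, 997))
print(saturation_curve(positions, [0.1, 0.3, 0.6, 1.0]))
```

In toy data like this the strong site tends to be detected at lower fractions than the weak one, which is the pattern a saturation curve is meant to expose.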
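ChIP-seq signal profiles vary depending on the factor, from sharp, punctate peaks (typical of TFs) to broad domains (typical of many histone modifications)
Figure panels: transcription factors, Pol II, histone modifications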
From Park (2009) Nat Rev Genet 10:669
Mapping chromatin accessibility
Figure panels: DNase I, FAIRE
From Furey (2012) Nat Rev Genet 13:840
Song et al., Genome Res 21:1757 (2011)
DNase I hypersensitivity identifies regulatory elements…
Figure: DNase I hypersensitive sites (DHSs)
…but needs to be combined with other data to determine what is actually bound, such as TF ChIP…
Figure panels: DHS signal in GM12878; RNA Pol II ChIP in GM12878
DHS sites in human ES cells:
From Neph (2012) Nature 489:83
… or motif analysis
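Combining DHS data with TF ChIP, as suggested above, often starts with a simple interval intersection: which DHSs overlap a ChIP-seq peak for the factor of interest? The sketch assumes 0-based, half-open (chromosome, start, end) intervals, as in BED files, and uses a brute-force comparison within each chromosome that suits only small inputs; for genome-scale data, a tool such as bedtools intersect does the same job efficiently.

```python
from collections import defaultdict
from typing import List, Tuple

# Hypothetical interval representation: (chromosome, start, end), 0-based half-open as in BED
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def dhs_bound_by_tf(dhs: List[Interval], peaks: List[Interval]) -> List[Interval]:
    """Return the DHSs that overlap at least one TF ChIP-seq peak."""
    peaks_by_chrom = defaultdict(list)
    for peak in peaks:
        peaks_by_chrom[peak[0]].append(peak)
    return [d for d in dhs if any(overlaps(d, p) for p in peaks_by_chrom[d[0]])]


dhs = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 100, 200)]
peaks = [("chr1", 150, 160), ("chr2", 200, 300)]
print(dhs_bound_by_tf(dhs, peaks))  # chr2:100-200 and chr2:200-300 only touch, so they do not overlap
```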
Q2: Making sense of regulatory functions
Integrate multiple data sources:
• TF function
• Histone modification
• Potential target genes
• Existing genome annotations
Compare multiple biological states
Regulatory function is dependent on biological context
Figure: the same gene A is expressed in forebrain, neural tube, and limb under the control of brain, neural, and limb TFs, respectively
Identifying tissue-specific regulatory function
Figure: clustered ChIP-seq signal at 20,000 bound sites in limb and brain, grouping sites strongly marked in limb, sites strongly marked in brain, and sites strongly marked in both
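The clustering step can be sketched with k-means on two features per site, log2(limb signal + 1) and log2(brain signal + 1), with k = 3 to match the three groups in the figure. This is a stand-in under stated assumptions: the original analysis may have used a different clustering method, and the deterministic farthest-point seeding here is a simplification of the k-means++ idea chosen so the example gives repeatable results.

```python
import math


def to_features(limb_signal, brain_signal):
    """Log-transform paired signals so a few very strong sites do not dominate distances."""
    return [(math.log2(l + 1), math.log2(b + 1)) for l, b in zip(limb_signal, brain_signal)]


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        # Assign each site to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Move each centroid to the mean of its sites (keep it if the cluster is empty).
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(tuple(sum(xs) / len(members) for xs in zip(*members)) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Hypothetical signals: three limb-specific, three brain-specific, three shared sites
limb = [100, 120, 110, 1, 2, 1, 100, 110, 120]
brain = [1, 2, 1, 100, 120, 110, 110, 100, 120]
labels, centers = kmeans(to_features(limb, brain), k=3)
print(labels)
```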
Function?
Assign enhancers to genes based on proximity (not ideal)
GREAT (bejerano.stanford.edu/great/): Gene Ontology annotations assigned to regulatory sequences
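Proximity-based assignment, marked above as not ideal because enhancers can skip the nearest gene, can be sketched as assigning each element to the gene whose transcription start site (TSS) is closest to the element's midpoint. Gene and element coordinates are hypothetical tuples; GREAT's own association rules are more elaborate and are not reproduced here.

```python
from typing import Dict, List, Optional, Tuple

Gene = Tuple[str, str, int]      # hypothetical: (gene name, chromosome, TSS position)
Element = Tuple[str, int, int]   # hypothetical: (chromosome, start, end)


def assign_nearest_gene(elements: List[Element], genes: List[Gene]) -> Dict[Element, Optional[str]]:
    """Map each regulatory element to the gene whose TSS is closest to its midpoint."""
    assignment = {}
    for chrom, start, end in elements:
        midpoint = (start + end) // 2
        # Distance to every TSS on the same chromosome; ties fall back to gene name order.
        candidates = [(abs(tss - midpoint), name) for name, g_chrom, tss in genes if g_chrom == chrom]
        assignment[(chrom, start, end)] = min(candidates)[1] if candidates else None
    return assignment


genes = [("GeneA", "chr1", 1_000), ("GeneB", "chr1", 5_000), ("GeneC", "chr2", 100)]
elements = [("chr1", 4_000, 4_200), ("chr1", 900, 1_100), ("chr3", 1, 10)]
print(assign_nearest_gene(elements, genes))
```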
Identifying tissue-specific regulatory function
Q2: Making sense of regulatory functions
Example from PS1: CTCF and RAD21 (cohesin)
CTCF and cohesin co-occupy many sites
Figure panels: promoters, insulators, enhancers
From Kagey et al (2010) Nature 467:430
CTCF: marks insulators and promoters
RAD21 (cohesin): marks insulators, promoters, and enhancers?
Include histone modification data (Wednesday’s lecture)
Figure labels: promoter, enhancers?
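The question marks above suggest a simple decision rule for RAD21 sites, sketched here as a hypothetical illustration rather than an established classifier: RAD21 sites shared with CTCF are grouped as CTCF/cohesin sites (insulators or promoters), and RAD21-only sites are split with histone marks, H3K4me3 for candidate promoters and H3K27ac without H3K4me3 for candidate enhancers. Sites are assumed to be pre-matched across datasets and represented by shared IDs.

```python
def classify_rad21_sites(rad21, ctcf, h3k4me3, h3k27ac):
    """Hypothetical rule for labeling RAD21 (cohesin) sites; inputs are sets of site IDs."""
    labels = {}
    for site in rad21:
        if site in ctcf:
            labels[site] = "CTCF/cohesin (insulator or promoter)"
        elif site in h3k4me3:
            labels[site] = "candidate promoter"   # H3K4me3 marks active promoters
        elif site in h3k27ac:
            labels[site] = "candidate enhancer"   # H3K27ac without H3K4me3
        else:
            labels[site] = "unclassified"
    return labels


labels = classify_rad21_sites(
    rad21={"s1", "s2", "s3", "s4"}, ctcf={"s1"}, h3k4me3={"s2"}, h3k27ac={"s2", "s3"}
)
print(labels)
```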
Identifying bound motifs from ChIP-seq data
CTCF: ~20,000 binding sites identified by ChIP
From Furey (2012) Nat Rev Genet 13:840
MEME suite: http://meme.nbcr.net/meme/
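MEME discovers motifs de novo from the bound sequences; once a motif is known, its occurrences can be located by scanning with a position weight matrix (PWM), the task FIMO performs within the MEME suite. The sketch converts per-position base counts to log2-odds scores against an assumed uniform background and reports windows on either strand that score above a threshold. The 4-bp GATA motif in the example is a toy built from made-up counts, not the CTCF motif, and the pseudocount and threshold are assumptions.

```python
import math

BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed uniform background
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_matrix(counts, pseudocount=0.01):
    """Turn per-position base counts into log2(frequency / background) scores."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({b: math.log2((column.get(b, 0) + pseudocount) / total / BACKGROUND[b])
                       for b in "ACGT"})
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, strand, score) for windows scoring >= threshold on either strand."""
    seq = sequence.upper()
    width = len(matrix)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in "ACGT" for b in window):
            continue  # skip windows containing N or other ambiguity codes
        for strand, site in (("+", window), ("-", window.translate(COMPLEMENT)[::-1])):
            score = sum(col[b] for col, b in zip(matrix, site))
            if score >= threshold:
                hits.append((i, strand, score))
    return hits


# Toy 4-bp motif (GATA) built from made-up counts; not a real TF matrix
toy_counts = [{"G": 10}, {"A": 10}, {"T": 10}, {"A": 10}]
print(scan("CCGATACCTATCC", log_odds_matrix(toy_counts), threshold=7.0))
```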
Enhancer-associated histone modification
Caveat: single TF binding events often do not indicate regulatory function
• Many TFs are present at high concentrations in the nucleus
• TF motifs are abundant in the genome
• Single TF binding events may be incidental
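A back-of-the-envelope calculation shows why motifs are abundant: in a random sequence with equal base frequencies, one specific k-mer is expected about G / 4^k times per strand in a genome of G base pairs. Taking G ≈ 3.1 × 10^9 (roughly the haploid human genome) and counting both strands, an exact 6-mer is expected about 1.5 million times and an exact 10-mer about 5,900 times. Real genomes are not uniform and real TF motifs tolerate mismatches, which raises the count further, so this is only a sketch under those simplifying assumptions.

```python
def expected_occurrences(k, genome_size=3.1e9, both_strands=True):
    """Expected matches to one exact k-mer in a random genome with equal base frequencies."""
    per_strand = genome_size / 4 ** k   # each position matches with probability (1/4)^k
    return 2 * per_strand if both_strands else per_strand


for k in (6, 8, 10, 12):
    print(k, round(expected_occurrences(k)))
```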
Q3: Identifying the target genes for regulatory elements
Figure: gene A and its tissue-specific regulatory inputs in forebrain, neural tube, and limb
Chromosome Conformation Capture (3C) and derivatives:
• 3C evaluates specific interaction possibilities by qPCR
• Sequencing-based readouts: 4C, 5C, Hi-C
• ChIP for specific factors: ChIA-PET
Dekker et al Nat Rev Genet 14:390 (2013)
4C identifies genome-wide interactions for a single “bait” sequence
From Kieffer-Kwon et al. (2013) Cell 155:1507
ChIA-PET identifies interactions involving a particular factor
In principle, Hi-C captures all interactions, but is limited by sequencing depth
Dekker et al Nat Rev Genet 14:390 (2013)
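Depth limits Hi-C because reads are spread over all pairs of genomic bins: halving the bin size roughly quadruples the number of matrix cells, so each cell receives fewer reads. The sketch builds a symmetric contact matrix for one chromosome from hypothetical (position1, position2) read pairs; the normalization and filtering used in real Hi-C pipelines are omitted.

```python
from collections import Counter


def contact_matrix(pairs, chrom_length, bin_size):
    """Symmetric contact-count matrix for intra-chromosomal read pairs (positions in bp)."""
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    counts = Counter()
    for pos1, pos2 in pairs:
        if not (0 <= pos1 < chrom_length and 0 <= pos2 < chrom_length):
            raise ValueError(f"read pair ({pos1}, {pos2}) lies outside the chromosome")
        i, j = sorted((pos1 // bin_size, pos2 // bin_size))
        counts[(i, j)] += 1
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for (i, j), c in counts.items():
        matrix[i][j] += c
        if i != j:
            matrix[j][i] += c  # mirror off-diagonal contacts
    return matrix


pairs = [(50, 150), (160, 40), (10, 20), (250, 80)]
for row in contact_matrix(pairs, chrom_length=300, bin_size=100):
    print(row)
```

Halving bin_size to 50 in this example would turn the 3 × 3 matrix into a 6 × 6 one spread over the same four read pairs.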
Hierarchical organization of the genome
Dekker et al Nat Rev Genet 14:390 (2013); Gorkin et al Cell Stem Cell 14:762 (2014)
Cohesin-mediated interactions
Summary
• Relevant overview papers on methodologies posted on class wiki
• Wednesday: Epigenetics and the histone code