Review by Hamid Bolouri, http://labs.fhcrc.org/bolouri
Nature Biotechnology 23, 1249 - 1256 (2005)
Luscombe et al, Genome Biology 2000, 1(1):reviews001.1–001.37
[Figure labels: RNA polymerase; initiation complex; RNA; regulatory complex]
Eve Mardis, Nature Methods, 2007, (4):613
1. Cross-link proteins to DNA
2. Fragment chromatin to 100-150bp
3. Immunoprecipitate antibody-bound DNA fragments
4. Reverse cross-links and sequence fragment ends
5. Map sequence reads to genome
6. Identify genomic regions with enriched number of mapped reads
ChIP-seq Overview
Barbara Wold’s lab, Science, 2007, 316(5830):1497-502
[Figure: expression profile across bone marrow, macrophages, microglia, osteoblasts and follicular B cells, with the predicted Hes1 binding site marked. Source: http://biogps.gnf.org/]
Ly9 expression is repressed in the bone marrow
Data: Suzanne Furuyama & Irwin Bernstein, FHCRC
Kharchenko, Tolstorukov and Park
Torres, Metta, Ottenwälder & Schlötterer, Genome Research, 2007, 18:000
Ordahl, Johnson & Caplan, NAR, 1976, 3(11):2985-2999
[Figure: expected vs. sequenced fragment frequency. x-axis: DNA fragment length (base pairs), 0 to 2000]
[Figure: normalized fragment count vs. fragment length (x100bp), with 100bp, 200bp, 300bp and 400bp marked. Samples: N1, Hes1]
DNA fragments (excluding 120bp adapters) are
asymmetrically distributed ~ 50bp–280bp
(data: Suzanne Furuyama, Bernstein lab, FHCRC)
Zhang et al, PLoS Comp. Biol. Aug. 2008
Part (1): Calling ChIP-seq peaks with PICS (Probabilistic Inference for ChIP-seq)
1. Segment the genome into Bound Regions
a. Sliding window of length ~ ½ to 1 x average fragment length
b. Move window in steps of ~ expected motif length (say 10bp)
c. Mark as Bound if number of reads > Tmin (~ 1 forward read, 1 reverse read)
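The segmentation step above can be sketched as follows. This is a minimal illustration, not the PICS implementation: the function name `bound_windows`, the input format (lists of read start positions per strand) and the "at least one read on each strand" threshold are assumptions made for the example.

```python
from bisect import bisect_left


def bound_windows(fwd, rev, window=100, step=10, min_fwd=1, min_rev=1):
    """Return merged (start, end) intervals whose windows pass the read threshold.

    fwd, rev: read start positions on the forward / reverse strand
    (hypothetical input format). A window [start, start + window) is marked
    as bound when it holds at least min_fwd forward and min_rev reverse reads.
    """
    if not fwd or not rev:
        return []
    fwd, rev = sorted(fwd), sorted(rev)
    lo = min(fwd[0], rev[0])
    hi = max(fwd[-1], rev[-1])
    regions = []
    start = lo
    while start <= hi:
        end = start + window
        # Count reads with start <= position < end on each strand.
        n_f = bisect_left(fwd, end) - bisect_left(fwd, start)
        n_r = bisect_left(rev, end) - bisect_left(rev, start)
        if n_f >= min_fwd and n_r >= min_rev:
            if regions and start <= regions[-1][1]:
                # Overlaps the previous bound window: extend that region.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
        start += step
    return regions


regions = bound_windows([100, 105, 110, 5000], [180, 190])
```

Overlapping bound windows are merged as they are found, so each returned interval corresponds to one candidate Bound Region.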
2. For each putative peak in a Region
3. Overall reads distribution in region
Parameters:
w: relative proportion of reads / peak
δ: average fragment length
μ: location of peak
σ_f: SD of forward reads
σ_r: SD of reverse reads
ρ: parameter controlling spread of δ_k
▲ Bayesian conjugate prior for
♦ It can be seen that if the degrees of freedom ν_i is fixed in advance for
each component, then the M-step exists in closed form.
▲ deterministic data fitting with ECM
♦ fast analytic maximization
For each Region:
- Fit multiple models with 1 to 15 peaks
- Use max(BIC) to select ‘best’ model
[BIC equation; labelled terms: log likelihood, number of peaks, number of observed reads]
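Model selection by maximum BIC can be sketched as below. The slide's exact BIC expression is not recoverable here, so this example assumes the common "larger is better" form, BIC = 2 log L - p log n, and assumes p = 5K - 1 free parameters for a K-peak model (five per peak, minus one because the mixture weights sum to 1). The names `bic` and `select_n_peaks` are illustrative.

```python
import math


def bic(log_lik, n_params, n_reads):
    """BIC in the 'larger is better' convention (assumed form)."""
    return 2.0 * log_lik - n_params * math.log(n_reads)


def select_n_peaks(log_liks, n_reads, params_per_peak=5):
    """Pick the number of peaks with the highest BIC.

    log_liks: dict mapping K (number of peaks) to the maximized
    log likelihood of the K-peak model for one region.
    """
    scores = {
        k: bic(ll, params_per_peak * k - 1, n_reads)
        for k, ll in log_liks.items()
    }
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Hypothetical log likelihoods for models with 1, 2 and 3 peaks.
best_k, scores = select_n_peaks({1: -520.0, 2: -480.0, 3: -478.0}, n_reads=200)
```

In this toy input the third peak improves the log likelihood only slightly, so the parameter penalty makes the two-peak model the one selected.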
Post-processing step
- Merge peaks that overlap
- Remove peaks that fail:
(assuming ~100bp window size)
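The merge part of post-processing can be sketched as a standard interval merge. This example covers only "merge peaks that overlap"; the filtering criteria are not reproduced. The representation of a peak as a `(start, end)` tuple is an assumption for illustration.

```python
def merge_overlapping(peaks):
    """Merge overlapping (start, end) peak intervals into a sorted list."""
    merged = []
    for start, end in sorted(peaks):
        if merged and start <= merged[-1][1]:
            # Overlaps the last merged interval: extend it if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


merged_peaks = merge_overlapping([(400, 450), (100, 250), (200, 300)])
```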
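[Figure: unfiltered vs. filtered read profiles, before and after mappability filtering. Mappability = 1/n, where n = no. of k-mer occurrences in genome; regions with mappability = 0 are marked]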
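A toy version of the 1/n mappability score is sketched below. It counts exact k-mer occurrences on a single strand of a short string; genome-scale tools use indexed data structures and both strands, so this is only meant to show the definition. The function name `kmer_mappability` is hypothetical.

```python
from collections import Counter


def kmer_mappability(genome, k):
    """Return a list m where m[i] = 1/n for the k-mer starting at position i.

    n is the number of times that k-mer occurs in `genome`
    (single strand, exact matches only; a simplification).
    """
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    return [1.0 / counts[kmer] for kmer in kmers]


# "ACG" occurs twice, so positions 0 and 4 get mappability 0.5.
mappability = kmer_mappability("ACGTACGA", k=3)
```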
PICS score = [N_IP / (N_control + 1)] x [(total number of reads in control) / (total number of reads in IP)]
where N = min(noReadsForward, noReadsReverse)
For FDR, re-run PICS with IP and control swapped, then
FDR = (number of peaks with score ≥ T in control vs IP) / (number of peaks with score ≥ T in IP vs control)
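The enrichment score and the swap-based FDR can be sketched as below. The score is assumed to have the form (N_IP / (N_control + 1)) x (total control reads / total IP reads), with N = min(forward reads, reverse reads) in the region, and the FDR is assumed to count peaks scoring at or above the threshold T. All function names are illustrative, not part of the PICS package.

```python
def region_count(n_forward, n_reverse):
    """N for a region: the smaller of the forward and reverse read counts."""
    return min(n_forward, n_reverse)


def enrichment_score(n_ip, n_control, total_ip, total_control):
    """IP/control ratio, normalized by sequencing depth (assumed form).

    The +1 in the denominator avoids division by zero when the control
    has no reads in the region.
    """
    return (n_ip / (n_control + 1.0)) * (total_control / total_ip)


def swap_fdr(ip_scores, swapped_scores, threshold):
    """Estimate FDR at `threshold` from a run with IP and control swapped.

    ip_scores: peak scores from IP vs control.
    swapped_scores: peak scores from control vs IP.
    Returns NaN when no IP peak reaches the threshold.
    """
    n_ip = sum(s >= threshold for s in ip_scores)
    n_swapped = sum(s >= threshold for s in swapped_scores)
    if n_ip == 0:
        return float("nan")
    return n_swapped / n_ip


score = enrichment_score(n_ip=20, n_control=3,
                         total_ip=1_000_000, total_control=2_000_000)
fdr = swap_fdr([10, 8, 5, 3, 1], [4, 2, 1], threshold=3)
```

Scanning `threshold` over a range of values gives the score cut-off needed to reach a target FDR.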
[Figure panels: GABP, FOXA1]
Zhang et al, Biometrics 2010, Jun 1st [Epub ahead of print]
1. Select a set of dyads as starting points
a. Enumerate all (3, 4, 5, 6) k-mers
b. Rank k-mers by over-representation in data
c. Form ~100 dyad seeds as (k-mer1, spacer(length=l ), k-mer2)
(k-mers are selected with probability ∝ rank order)
2. Compute dyad PWMs from occurrences of matches in data
3. Use EM to maximize (~25% of) matches to the candidate PWM
4. Align each dyad’s predicted binding sites and calculate the llr
LogLikelihoodRatio = M x Σ (all bases b, all positions p) f_{b,p} x log( f_{b,p} / f_{b,background} ), where M = total number of sites
5. Estimate p-value from above llr given dyad’s length and number of sites
(GADEM uses MEME method)
6. Matches with p-value < threshold are declared binding sites
7. Calculate E-value = p-value corrected for the size of the search space
8. Calculate fitness as:
9. Accept top 10% fittest motifs, mutate or cross-over the rest
a. mutate: swap in 1 element of (k-mer1, spacer(length=l ), k-mer2) from a random dyad
b. cross-over: swap elements between selected dyads
10. Extend motifs 10bp on each side, then prune from ends till
Information Content = < threshold (+di-,tri-mers)
11.Repeat from Step 2 (typically 5-10 times)
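Steps 2 and 4 (building a PWM from aligned sites and scoring it) can be sketched as below. The score is assumed to be llr = M x Σ f_{b,p} log(f_{b,p} / f_{b,background}) over all bases and positions, with M the number of sites. The pseudocount and the uniform default background are assumptions added to keep the example well defined; they are not taken from GADEM.

```python
import math

BASES = "ACGT"


def position_frequencies(sites, pseudocount=0.5):
    """Per-position base frequencies from equal-length aligned sites."""
    length = len(sites[0])
    freqs = []
    for p in range(length):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[p]] += 1
        total = sum(counts.values())
        freqs.append({b: counts[b] / total for b in BASES})
    return freqs


def log_likelihood_ratio(sites, background=None, pseudocount=0.5):
    """llr = M * sum over positions and bases of f * log(f / background)."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    freqs = position_frequencies(sites, pseudocount)
    m = len(sites)
    return m * sum(
        f[b] * math.log(f[b] / background[b]) for f in freqs for b in BASES
    )


conserved = log_likelihood_ratio(["ACGT", "ACGT", "ACGT", "ACGT"])
shuffled = log_likelihood_ratio(["ACGT", "CGTA", "GTAC", "TACG"])
```

Perfectly conserved sites give a large positive score, while sites whose columns match the background composition score close to zero.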
[Equation for step 8: Fitness defined in terms of log(E-value)]
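Step 9 (keep the fittest dyads, mutate or cross over the rest) can be sketched as below. The example takes fitness values as given and assumes that higher means fitter; the 50/50 choice between mutation and cross-over, and the function name `next_generation`, are assumptions for illustration rather than GADEM's actual settings.

```python
import random


def next_generation(dyads, fitness, rng, keep_fraction=0.10):
    """One genetic-algorithm update over dyads (kmer1, spacer_len, kmer2).

    dyads: list of 3-tuples; fitness: parallel list (higher = fitter, assumed).
    The top keep_fraction are kept unchanged; the rest are mutated or crossed.
    """
    order = sorted(range(len(dyads)), key=lambda i: fitness[i], reverse=True)
    n_keep = max(1, int(len(dyads) * keep_fraction))
    kept = [dyads[i] for i in order[:n_keep]]
    rest = [list(dyads[i]) for i in order[n_keep:]]

    children = []
    i = 0
    while i < len(rest):
        slot = rng.randrange(3)  # which element: kmer1, spacer or kmer2
        if i + 1 < len(rest) and rng.random() < 0.5:
            # Cross-over: swap the same element between two dyads.
            a, b = rest[i], rest[i + 1]
            a[slot], b[slot] = b[slot], a[slot]
            children += [tuple(a), tuple(b)]
            i += 2
        else:
            # Mutate: swap in one element from a randomly chosen dyad.
            a = rest[i]
            a[slot] = rng.choice(dyads)[slot]
            children.append(tuple(a))
            i += 1
    return kept + children


population = [("ACG" + "ACGT"[i % 4], 2 + i, "TTG" + "ACGT"[i % 4])
              for i in range(10)]
scores = list(range(10))  # hypothetical fitness values
new_population = next_generation(population, scores, random.Random(0))
```

Because swaps always happen within the same slot, every child remains a valid (k-mer, spacer length, k-mer) dyad and the population size stays constant.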
See also PLoS Comp Biol, March 2007 | Volume 3 | Issue 3 | e61
benoslab.pitt.edu/stamp/
B/W Venn diagrams: Matches to FOXA1 motifs from different algorithms.