Review by Hamid Bolouri, //research.fhcrc.org/content/dam/stripe/... · Eve Mardis, Nature Methods, 2007, (4):613 1. Cross-link proteins to DNA 2. Fragment chromatin to 100-150bp

Review by Hamid Bolouri, http://labs.fhcrc.org/bolouri

Nature Biotechnology 23, 1249 - 1256 (2005)

Luscombe et al, Genome Biology 2000, 1(1):reviews001.1–001.37

RNA polymerase

initiation complex

RNA regulatory complex

Eve Mardis, Nature Methods, 2007, (4):613

1. Cross-link proteins to DNA

2. Fragment chromatin to 100-150bp

3. Immunoprecipitate antibody-bound DNA fragments

4. Reverse cross-links and sequence fragment ends

5. Map sequence reads to genome

6. Identify genomic regions with enriched number of mapped reads

ChIP-seq Overview

Barbara Wold’s lab, Science, 2007, 316(5830):1497-502

bone marrow

macrophages

microglia

osteoblasts

follicular B cells

Predicted Hes1 binding site

http://biogps.gnf.org/

Ly9 expression is repressed in the bone marrow

Data: Suzanne Furuyama & Irwin Bernstein, FHCRC

Kharchenko, Tolstorukov and Park

Torres, Metta, Ottenwälder & Schlötterer, Genome Research, 2007, 18:000

Ordahl, Johnson & Caplan, NAR, 1976, 3(11):2985-2999

Figure: expected vs. sequenced fragment frequency against DNA fragment length (base pairs, 0–2000)

Figure: normalized fragment count against fragment length (×100bp); labels: 100bp, 200bp, 300bp, 400bp

N1 Hes1

DNA fragments (excluding 120bp adapters) are asymmetrically distributed over ~50bp–280bp

(data: Suzanne Furuyama, Bernstein lab, FHCRC)

Zhang et al, PLoS Comp. Biol. Aug. 2008

Part (1): Calling ChIP-seq peaks with PICS (Probabilistic Inference of ChIP-seq)

1. Segment the genome into Bound Regions

a. Sliding window of length ~ ½ to 1 x average fragment length

b. Move window in steps of ~ expected motif length (say 10bp)

c. Mark as Bound if number of reads > Tmin (~ 1f, 1r)

2. For each putative peak in a Region

3. Overall reads distribution in region
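Step 1 above (segmenting the genome into Bound Regions) can be sketched as follows. This is a minimal illustration, not the PICS implementation: the function name `bound_windows`, the half-open window convention, and the reading of Tmin as "at least one forward and one reverse read per window" are assumptions made for the example.

```python
from bisect import bisect_left

def bound_windows(fwd, rev, genome_len, window=100, step=10, min_fwd=1, min_rev=1):
    """Slide a window along the genome and merge windows marked as Bound.

    fwd, rev: start positions of forward- and reverse-strand reads.
    A window [start, start + window) is Bound when it holds at least
    min_fwd forward and min_rev reverse reads (assumed reading of Tmin).
    """
    fwd, rev = sorted(fwd), sorted(rev)
    regions = []
    for start in range(0, max(genome_len - window, 0) + 1, step):
        end = start + window
        # Count reads with start <= position < end on each strand.
        n_f = bisect_left(fwd, end) - bisect_left(fwd, start)
        n_r = bisect_left(rev, end) - bisect_left(rev, start)
        if n_f >= min_fwd and n_r >= min_rev:
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous Bound window: extend it.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions

regions = bound_windows([105, 120], [180, 190], genome_len=1000)
```

The step of 10bp mirrors the "expected motif length" suggestion above; the 100bp window is only a placeholder for ½ to 1 × the average fragment length.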

Parameters:

w: relative proportion of reads / peak

δ: average fragment length

μ: location of peak

σ_f: SD of forward reads

σ_r: SD of reverse reads

ρ: parameter controlling spread of δ_k
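To make the role of these parameters concrete, here is a hedged sketch of a mixture density for read start positions in a region. It assumes, beyond what is stated above, that each peak contributes a Student t component with 4 degrees of freedom, that forward reads are centred half a fragment length upstream of the peak location and reverse reads half a fragment length downstream. The dictionary keys (`w`, `mu`, `delta`, `sigma_f`, `sigma_r`) are hypothetical names chosen for the example.

```python
import math

def t_pdf(x, loc, scale, df=4):
    """Density of a location-scale Student t distribution."""
    z = (x - loc) / scale
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + z * z / df) ** (-(df + 1) / 2) / scale

def forward_density(x, peaks, df=4):
    """Mixture density of forward-read starts: components centred at mu - delta/2."""
    return sum(p["w"] * t_pdf(x, p["mu"] - p["delta"] / 2, p["sigma_f"], df)
               for p in peaks)

def reverse_density(x, peaks, df=4):
    """Mixture density of reverse-read starts: components centred at mu + delta/2."""
    return sum(p["w"] * t_pdf(x, p["mu"] + p["delta"] / 2, p["sigma_r"], df)
               for p in peaks)

# One peak at position 1000 with a 200bp average fragment length.
peaks = [{"w": 1.0, "mu": 1000, "delta": 200, "sigma_f": 50, "sigma_r": 50}]
```

With a single peak, the forward density is highest at `mu - delta/2` and the reverse density at `mu + delta/2`, which is the strand-shift pattern the peak location is inferred from.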

▲ Bayesian conjugate prior for

♦ If the degrees of freedom ν_i are fixed in advance for each component, then the M-step exists in closed form.

▲ deterministic data fitting with ECM

♦ fast analytic maximization

For each Region:

- Fit multiple models with 1 to 15 peaks

- Use max(BIC) to select ‘best’ model

BIC terms: log likelihood, number of peaks, number of observed reads
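The model-selection step can be sketched as below. The generic form BIC = 2·logL − p·log(n) is used so that larger values are better, matching "max(BIC)". The number of free parameters per model (five per peak, minus one because mixture weights sum to 1) is an assumption for illustration, not a value taken from the slides.

```python
import math

def bic(log_lik, n_peaks, n_reads, params_per_peak=5):
    """BIC in the 'larger is better' convention: 2*logL - p*log(n)."""
    n_free = params_per_peak * n_peaks - 1  # assumed parameter count
    return 2 * log_lik - n_free * math.log(n_reads)

def best_model(log_liks, n_reads):
    """log_liks maps number of peaks -> maximized log likelihood.

    Returns the number of peaks whose model has the highest BIC.
    """
    return max(log_liks, key=lambda k: bic(log_liks[k], k, n_reads))

# Hypothetical fits for a region with 100 reads.
chosen = best_model({1: -500.0, 2: -480.0, 3: -478.0}, n_reads=100)
```

In this toy example the third peak improves the log likelihood only slightly, so the penalty term makes the two-peak model win.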

Post-processing step

- Merge peaks that overlap

- Remove peaks that fail:

(assuming ~100bp window size)

Figure panels: unfiltered, filtered

Mappability = 1/n, where n = no. of k-mer occurrences in genome
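A minimal sketch of a 1/n score per genomic position, assuming n counts exact occurrences of the k-mer starting at that position on the forward strand only. The function name `kmer_mappability` is hypothetical; a real pipeline would also consider the reverse strand and use an index rather than an in-memory counter.

```python
from collections import Counter

def kmer_mappability(genome, k):
    """Return 1/n for each k-mer start position, n = occurrences of that k-mer."""
    n_positions = len(genome) - k + 1
    counts = Counter(genome[i:i + k] for i in range(n_positions))
    return [1 / counts[genome[i:i + k]] for i in range(n_positions)]

scores = kmer_mappability("ACGTACGA", k=3)
```

Here the 3-mer ACG occurs twice, so both of its start positions score 0.5, while unique 3-mers score 1.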

Figure panels: before, after (mappability=0)

PICS score = $\frac{N_{IP} + 1}{N_{control} + 1} \cdot \frac{\text{total number of reads in control}}{\text{total number of reads in IP}}$, where $N = \min(\text{noReadsForward}, \text{noReadsReverse})$
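A minimal sketch of the score, assuming it is the ratio of (N + 1) in IP to (N + 1) in control, scaled by total control reads over total IP reads, with N the smaller of the forward and reverse read counts for the peak. The function and argument names are hypothetical.

```python
def pics_score(ip_fwd, ip_rev, ctrl_fwd, ctrl_rev, total_ip, total_ctrl):
    """Enrichment of a peak in IP over control, normalized for sequencing depth."""
    n_ip = min(ip_fwd, ip_rev)        # N for the IP sample
    n_ctrl = min(ctrl_fwd, ctrl_rev)  # N for the control sample
    # The +1 terms keep the ratio finite when the control has no reads.
    return (total_ctrl / total_ip) * (n_ip + 1) / (n_ctrl + 1)

score = pics_score(ip_fwd=30, ip_rev=20, ctrl_fwd=3, ctrl_rev=1,
                   total_ip=1_000_000, total_ctrl=2_000_000)
```

Taking the minimum over strands penalizes peaks supported by reads on one strand only.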

For FDR, re-run PICS with IP and control swapped, then

FDR = $\frac{\text{number of peaks with score} \ge T \text{ in control vs IP}}{\text{number of peaks with score} \ge T \text{ in IP vs control}}$
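The swap-based FDR estimate can be sketched as follows, assuming "score T" means a score at or above threshold T. The function name and the choice to return NaN when no IP peak passes the threshold are assumptions for the example.

```python
def empirical_fdr(ip_scores, swapped_scores, threshold):
    """Ratio of peaks called in the swapped run to peaks called in the real run.

    ip_scores:      peak scores from the IP vs control run.
    swapped_scores: peak scores from the control vs IP (swapped) run.
    """
    n_ip = sum(s >= threshold for s in ip_scores)
    n_swapped = sum(s >= threshold for s in swapped_scores)
    return n_swapped / n_ip if n_ip else float("nan")

fdr = empirical_fdr([10, 8, 6, 4, 2], [5, 1, 1], threshold=4)
```

Raising the threshold usually lowers the estimated FDR at the cost of fewer called peaks.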

GABP FOXA1

Zhang et al, Biometrics 2010, Jun 1st [Epub ahead of print]

1. Select a set of dyads as starting points

a. Enumerate all (3, 4, 5, 6) k-mers

b. Rank k-mers by over-representation in data

c. Form ~100 dyad seeds as (k-mer1, spacer(length=l ), k-mer2)

(k-mers are selected with prob ∝ rank order)

2. Compute dyad PWMs from occurrences of matches in data

3. Use EM to maximize (~25% of) matches to the candidate PWM

4. Align each dyad’s predicted binding sites and calculate the llr

$\text{LogLikelihoodRatio} = M \sum_{\text{all positions } p} \sum_{\text{all bases } b} f_{b,p} \log \frac{f_{b,p}}{f_{b,\text{background}}}$, M = total number of sites

5. Estimate p-value from above llr given dyad’s length and number of sites

(GADEM uses MEME method)

6. Matches with p-value < threshold are declared binding sites

7. Calculate E-value = p-value corrected for the size of the search space

8. Calculate fitness as:

9. Accept top 10% fittest motifs, mutate or cross-over the rest

a. mutate: swap in 1 element of (k-mer1, spacer(length=l ), k-mer2) from a random dyad

b. cross-over: swap elements between selected dyads

10. Extend motifs 10bp on each side, then prune from ends till

Information Content = < threshold (+di-,tri-mers)

11. Repeat from Step 2 (typically 5-10 times)
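Step 1 of the list above (forming dyad seeds) can be sketched as below. This is an illustration under stated assumptions, not GADEM's code: over-representation is measured as observed count divided by the count expected under a uniform background, only k-mers that occur in the data are ranked, selection weights decrease linearly with rank, and the maximum spacer length of 10 is arbitrary.

```python
import random
from collections import Counter

def rank_kmers(seqs, ks=(3, 4, 5, 6)):
    """Rank observed k-mers by observed/expected count, most over-represented first."""
    scored = []
    for k in ks:
        counts = Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))
        n_pos = sum(max(len(s) - k + 1, 0) for s in seqs)
        expected = n_pos * 0.25 ** k  # uniform background (assumption)
        scored += [(c / expected, kmer) for kmer, c in counts.items()]
    return [kmer for _, kmer in sorted(scored, reverse=True)]

def make_dyad_seeds(ranked, n_seeds=100, max_spacer=10, rng=None):
    """Draw (k-mer1, spacer length, k-mer2) seeds, favouring highly ranked k-mers."""
    rng = rng or random.Random(0)
    weights = [len(ranked) - i for i in range(len(ranked))]  # best rank, largest weight
    seeds = []
    for _ in range(n_seeds):
        kmer1, kmer2 = rng.choices(ranked, weights=weights, k=2)
        seeds.append((kmer1, rng.randint(0, max_spacer), kmer2))
    return seeds

ranked = rank_kmers(["ACGTACGTAC", "TTACGTTT"])
seeds = make_dyad_seeds(ranked, n_seeds=5, rng=random.Random(1))
```

Each seed would then be turned into a PWM and refined with EM as in Steps 2 and 3.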

(Fitness formula for Step 8: terms are log, 1 and E-value)
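The log likelihood ratio of Step 4 can be sketched as follows, assuming it is M times the sum over positions and bases of f·log(f / background), with f the base frequency at a position among the M aligned sites. The uniform default background and the use of natural logarithms are assumptions; bases with zero frequency contribute nothing to the sum.

```python
import math
from collections import Counter

def log_likelihood_ratio(sites, background=None):
    """LLR of aligned, equal-length binding sites against a background model."""
    background = background or {b: 0.25 for b in "ACGT"}
    m = len(sites)  # M = total number of sites
    total = 0.0
    for column in zip(*sites):  # one column per motif position
        for base, count in Counter(column).items():
            f = count / m
            total += f * math.log(f / background[base])
    return m * total

llr = log_likelihood_ratio(["ACGT", "ACGT", "ACGT", "ACGT"])
```

Four identical sites of length 4 give the maximum value for this M and motif length, 16·log(4); a column with all four bases equally frequent contributes zero.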

See also PLoS Comp Biol, March 2007 | Volume 3 | Issue 3 | e61

benoslab.pitt.edu/stamp/


B/W Venn diagrams: Matches to FOXA1 motifs from different algorithms.
