ChIP seq - Departments

ChIP‐seq

ChIP‐Seq

Hongkai Ji et al. Nature Biotechnology 26: 1293-1300. 2008

ChIP‐Seq Analysis

Alignment

Peak Detection

Annotation Visualization

Sequence Analysis

Motif Analysis

Alignment

• ELAND

• Bowtie

• SOAP

• SeqMap

• …

Peak detection

• FindPeaks

• CHiPSeq

• BS‐Seq

• SISSRs

• QuEST

• MACS

• CisGenome

• …

Two common designs

• One sample experiment

contains only a ChIP’d sample

• Two sample experiment

contains a ChIP’d sample and a negative control sample

One sample analysis

A simple way is the sliding window method

A Poisson background model is commonly used to estimate the error rate:

k_i ~ Poisson(λ_0)

Alternatively, people use Monte Carlo simulations

Both are based on the assumption that the read sampling rate is constant across the genome.
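
The sliding window method with a constant-rate Poisson background can be sketched as follows. This is a minimal illustration, not the CisGenome implementation: the function names, the window and step sizes, the significance cutoff, and the choice of λ_0 as the genome-wide average count per window are all assumptions made for the example.

```python
from math import exp

def poisson_sf(k, lam):
    """Upper tail P(X >= k) for X ~ Poisson(lam)."""
    term = exp(-lam)          # P(X = 0)
    lower = 0.0               # accumulates P(X <= k - 1)
    for j in range(k):
        lower += term
        term *= lam / (j + 1)
    return max(0.0, 1.0 - lower)

def window_counts(read_starts, genome_len, window, step):
    """Count reads whose start falls in each sliding window [start, start + window)."""
    counts = []
    for start in range(0, genome_len - window + 1, step):
        k = sum(start <= r < start + window for r in read_starts)
        counts.append((start, k))
    return counts

def enriched_windows(read_starts, genome_len, window=100, step=25, alpha=1e-3):
    """Report windows whose count is unlikely under k_i ~ Poisson(lambda_0).

    Assumption: lambda_0 is the average number of reads per window,
    i.e. the read sampling rate is constant across the genome.
    """
    lam0 = len(read_starts) * window / genome_len
    hits = []
    for start, k in window_counts(read_starts, genome_len, window, step):
        p = poisson_sf(k, lam0)
        if p < alpha:
            hits.append((start, k, p))
    return hits
```

For very small tail probabilities the subtraction in `poisson_sf` loses precision, so a real analysis would use a dedicated statistics library.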

Ji et al. Nat Biotechnol, 26: 1293-1300. 2008

The constant rate assumption does not hold!

Negative binomial model fits the data better!

k_i | λ_i ~ Poisson(λ_i)

λ_i ~ Gamma(α, β)

Marginally,

k_i ~ NegBinom(α, β)
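
The marginal distribution follows from integrating the window-specific rate out of the Poisson model. The derivation below is a sketch that assumes Gamma(α, β) is parameterized with β as a rate parameter; the slides do not state the parameterization, and with a scale parameter β would be replaced by 1/β.

```latex
\begin{aligned}
P(k_i = k)
  &= \int_0^\infty \frac{\lambda^{k} e^{-\lambda}}{k!}\,
     \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,\lambda^{\alpha-1} e^{-\beta\lambda}\, d\lambda \\
  &= \frac{\beta^{\alpha}}{k!\,\Gamma(\alpha)}
     \int_0^\infty \lambda^{k+\alpha-1} e^{-(1+\beta)\lambda}\, d\lambda \\
  &= \frac{\Gamma(k+\alpha)}{k!\,\Gamma(\alpha)}
     \left(\frac{\beta}{1+\beta}\right)^{\alpha}
     \left(\frac{1}{1+\beta}\right)^{k}
\end{aligned}
```

Under this parameterization the mean is α/β and the variance is α(1+β)/β², which always exceeds the mean. That extra variance is what lets the model absorb rates that vary from window to window.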

FDR estimation based on Poisson and negative binomial model

Read direction provides extra information

CisGenome procedure

Alignment

Exploration

FDR computation

Negative binomial model

Peak Detection

Post Processing

Use read direction to refine peak boundary and filter low quality peaks

Two sample analysis

Reason: read sampling rates at the same genomic locus are correlated across different samples.

CisGenome two sample analysis

Alignment

Exploration

n_i = k_1i + k_2i

k_1i | n_i ~ Binom(n_i, p_0)

FDR computation

Peak Detection

Post Processing
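
The conditional binomial model used in the two sample analysis can be sketched as a per-window test. This is an illustration only: the function names are hypothetical, and estimating p_0 from the total read counts of the two samples is an assumption made for the example (the slides do not say how p_0 is obtained).

```python
from math import comb

def binom_sf(k, n, p):
    """Upper tail P(X >= k) for X ~ Binom(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def two_sample_pvalues(chip_counts, control_counts, p0=None):
    """P-value per window for ChIP enrichment over the control.

    chip_counts[i] is k_1i and control_counts[i] is k_2i, so n_i = k_1i + k_2i.
    Under the null, k_1i | n_i ~ Binom(n_i, p0).
    Assumption: if p0 is not given, it is the ChIP share of all reads.
    """
    if p0 is None:
        total_chip = sum(chip_counts)
        p0 = total_chip / (total_chip + sum(control_counts))
    pvalues = []
    for k1, k2 in zip(chip_counts, control_counts):
        pvalues.append(binom_sf(k1, k1 + k2, p0))
    return pvalues
```

Because the test conditions on n_i, a locus that attracts many reads in both samples is not called a peak merely for having a high count.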

A comparative study of ChIP‐chip and ChIP‐seq

• NRSF ChIP‐chip

2 ChIP + 2 Mock IP in Jurkat cells, profiled using Affymetrix Human Tiling 2.0R arrays.

• NRSF ChIP‐seq

ChIP + Negative Control in Jurkat cells, sequenced with the next generation sequencer made by Illumina/Solexa.

Intersection

Before post‐processing After post‐processing

Signal correlation

Visual comparison

Comparison of peak detection results

Are array specific peaks noise or signal?

Effects of read number in ChIP‐seq

Motif Analysis

Sequence motif – a pattern of nucleotide or amino acid sequences

DNA motif:

GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGA

TAACATGTGACTCCTATAACCTCTTTGGGTGGTACATGAA

CTGGGAGGTCCTCGGTTCAGAGTCACAGAGCAGATAATCA

TTAGAGGCACAATTGCTTGGGTGGTGCACAAAAAAACAAG

AACAGCCTTGGATTAGCTGCTGGGGGGGTGAGTGGTCCAC

ATCAGAATGGGTGGTCCATATATCCCAAAGAAGAGGGTAG

123456789

TGGGTGGTC

TGGGTGGTA

TGGGAGGTC

TGGGTGGTG

TGAGTGGTC

TGGGTGGTC

TF TGGGTGGTC

Transcription Factor Binding Sites (TFBS)

Protein motif:

Motif representation

Consensus sequence

Example: CACSTG

Sequence Logo

Schneider & Stephens, Nucleic Acids Res. 18:6097‐6100 (1990)

Entropy (Shannon) – a measurement of uncertainty

The amount of uncertainty reduced by observing sequences is the amount of information (or information content) we obtained:

This is the height of each position in the logo plot.

Height of each nucleotide is proportional to its frequency
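
The logo heights can be computed directly from aligned sites. The sketch below uses the six aligned 9-mers from the DNA motif example and assumes the standard definition for DNA: information content = 2 − H bits per position, where H is the Shannon entropy of the base frequencies. The small-sample correction from the cited paper is left out, and the function names are hypothetical.

```python
from math import log2

# Six aligned binding sites from the DNA motif example above.
SITES = ["TGGGTGGTC", "TGGGTGGTA", "TGGGAGGTC",
         "TGGGTGGTG", "TGAGTGGTC", "TGGGTGGTC"]

def frequency_matrix(sites):
    """One dict per motif position, mapping each base to its observed frequency."""
    n = len(sites)
    return [{b: column.count(b) / n for b in "ACGT"} for column in zip(*sites)]

def information_content(freqs):
    """2 - H for one position, in bits; terms with zero frequency contribute nothing."""
    entropy = -sum(p * log2(p) for p in freqs.values() if p > 0)
    return 2.0 - entropy

def logo_heights(sites):
    """Per position: height of each letter = frequency * information content."""
    heights = []
    for freqs in frequency_matrix(sites):
        ic = information_content(freqs)
        heights.append({b: p * ic for b, p in freqs.items()})
    return heights
```

A position where every site has the same base reaches the maximum of 2 bits, while a position with equal frequencies for all four bases has height 0.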

Two questions in motif analysis

• Known motif mapping

Finding occurrences of a motif in nucleotide or amino acid sequences

• De novo motif discovery

Finding motifs that are previously unknown

Known motif mapping

• Consensus mapping

STEP 1: provide a motif (e.g. CACSTG = CAC[C,G]TG)

STEP 2: specify number of mismatches allowed (e.g. <=1)

STEP 3: scan the sequence

CGCCGGGACCAGATCAACGCCGAGATCCGGCACATGAAGGAGCTCGCCGGG

m=3, no m=1, yes
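
The three steps of consensus mapping can be sketched in a few lines. This is an illustration, not the CisGenome code: the function name is hypothetical, only the forward strand is scanned, and degenerate letters follow the standard IUPAC nucleotide codes (so S matches C or G, as in the CACSTG example).

```python
# Standard IUPAC nucleotide codes: each consensus letter maps to the bases it allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def scan_consensus(seq, consensus, max_mismatch=1):
    """Return (position, matched word, mismatches) for every window with
    at most max_mismatch mismatches to the consensus (0-based positions)."""
    width = len(consensus)
    hits = []
    for i in range(len(seq) - width + 1):
        word = seq[i:i + width]
        m = sum(base not in IUPAC[code] for base, code in zip(word, consensus))
        if m <= max_mismatch:
            hits.append((i, word, m))
    return hits
```

For example, `scan_consensus("TTCACATGTT", "CACSTG", 1)` reports the window CACATG, which differs from CACSTG only at the degenerate position. A full scan would also check the reverse complement strand.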

A useful tool: CisGenome (http://www.biostat.jhsph.edu/~hji/cisgenome)

Known motif mapping

• Motif matrix mapping (CisGenome)

STEP 1: provide a motif and background model

STEP 2: specify a likelihood ratio cutoff (e.g. LR>=500)

STEP 3: scan the sequence

Motif: Θ

     1    2    3    4    5    6    7    8    9
A  0.00 0.00 0.17 0.00 0.17 0.00 0.00 0.00 0.17
C  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.66
G  0.00 1.00 0.83 1.00 0.00 1.00 1.00 0.00 0.17
T  1.00 0.00 0.00 0.00 0.83 0.00 0.00 1.00 0.00

Background: θ0

    A   C   G   T
A  .3  .2  .2  .3
C  .2  .3  .3  .2
G  .2  .3  .3  .2
T  .3  .2  .2  .3

GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGACTGGGAGGTCCTCGGTTCAGAGTCACAGAGCA

LR>500, yes LR<500, no
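
The matrix scan can be sketched with the motif and background values shown on this slide. Several details are assumptions made for the example rather than CisGenome's documented behavior: the 4×4 background is read as a first-order Markov transition matrix (row = previous base, column = current base), the first base of the sequence gets a uniform probability of 0.25, no pseudocounts are added, and only the forward strand is scanned.

```python
# Motif matrix Θ from the slide: base -> probability at positions 1..9.
MOTIF = {
    "A": [0.00, 0.00, 0.17, 0.00, 0.17, 0.00, 0.00, 0.00, 0.17],
    "C": [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.66],
    "G": [0.00, 1.00, 0.83, 1.00, 0.00, 1.00, 1.00, 0.00, 0.17],
    "T": [1.00, 0.00, 0.00, 0.00, 0.83, 0.00, 0.00, 1.00, 0.00],
}

# Background θ0, interpreted (assumption) as P(current base | previous base).
BACKGROUND = {
    "A": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "C": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "G": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "T": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}

def likelihood_ratio(seq, start, motif=MOTIF, background=BACKGROUND):
    """P(window | motif) / P(window | background) for the window at `start`."""
    width = len(motif["A"])
    p_motif = 1.0
    p_background = 1.0
    for j in range(width):
        pos = start + j
        base = seq[pos]
        p_motif *= motif[base][j]
        if pos == 0:
            p_background *= 0.25  # no previous base: assume uniform
        else:
            p_background *= background[seq[pos - 1]][base]
    return p_motif / p_background

def scan_matrix(seq, cutoff=500.0, motif=MOTIF, background=BACKGROUND):
    """Return (0-based position, LR) for every window with LR >= cutoff."""
    width = len(motif["A"])
    hits = []
    for i in range(len(seq) - width + 1):
        lr = likelihood_ratio(seq, i, motif, background)
        if lr >= cutoff:
            hits.append((i, lr))
    return hits
```

On the example sequence from this slide, the two windows that pass the cutoff are TGGGTGGTC and TGGGAGGTC. Because the matrix contains exact zeros, any window that violates a fixed position gets LR = 0, which is why matrices are often smoothed with pseudocounts in practice.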

• Another tool for matrix mapping: MAST (http://meme.sdsc.edu/meme/mast‐intro.html)

De novo motif discovery

• Two major classes of methods:

1. Word enumeration

2. Matrix updating

Word enumeration

STEP 1: enumerate possible words;

STEP 2: count word occurrences;

STEP 3: compare observed word count with random expectation.

Example: Sinha & Tompa, Nucleic Acids Res. 30: 5549‐5560 (2002)
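
The three steps can be sketched as below. This is a deliberately simple illustration and not the method of the cited paper: the random expectation assumes independent bases with frequencies estimated from the input sequences, the z-score uses a binomial variance that ignores overlapping word occurrences, and the function name is hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt

def word_zscores(seqs, k):
    """Z-score of the observed count of every possible k-mer against an
    independent-base background estimated from the sequences themselves."""
    observed = Counter()
    base_counts = Counter()
    n_windows = 0
    for s in seqs:
        base_counts.update(s)
        # STEP 2: count word occurrences
        for i in range(len(s) - k + 1):
            observed[s[i:i + k]] += 1
            n_windows += 1
    total = sum(base_counts.values())
    freq = {b: base_counts[b] / total for b in "ACGT"}

    scores = {}
    # STEP 1: enumerate all 4^k possible words
    for letters in product("ACGT", repeat=k):
        word = "".join(letters)
        p = 1.0
        for b in word:
            p *= freq[b]
        expected = n_windows * p
        sd = sqrt(n_windows * p * (1 - p))
        if sd == 0:
            continue  # word uses a base that never occurs; no score defined
        # STEP 3: compare the observed count with the random expectation
        scores[word] = (observed[word] - expected) / sd
    return scores
```

Words with the largest z-scores are the candidates for over-represented motifs. Published methods use more careful background models and variance estimates than this sketch.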

Matrix updating

• CONSENSUS (Stormo & Hartzell, PNAS, 86: 1183‐1187, 1990)

STEP 1: use all k‐mers in the first sequence as seeds;

STEP 2: find matches (often use best matches) of each seed in the second sequence;

STEP 3: update seed matrices, exclude matrices with low information content;

STEP 4: repeat steps 2 and 3 for all sequences.
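
A greedy procedure in the spirit of the four steps above can be sketched as follows. It is a simplification and not the published CONSENSUS program: each seed keeps only its single best match per sequence, "low information content" is implemented as keeping the top `beam` alignments, information content is computed without small-sample correction, and all names are hypothetical.

```python
from math import log2

def info_content(sites):
    """Total information content (bits) of an alignment, summed over columns."""
    n = len(sites)
    total = 0.0
    for column in zip(*sites):
        total += 2.0
        for base in set(column):
            p = column.count(base) / n
            total += p * log2(p)
    return total

def greedy_consensus(seqs, k, beam=20):
    """Return the highest-scoring alignment with one k-mer per sequence."""
    # STEP 1: every k-mer in the first sequence is a seed alignment
    alignments = [[seqs[0][i:i + k]] for i in range(len(seqs[0]) - k + 1)]
    # STEP 4: process the remaining sequences one at a time
    for seq in seqs[1:]:
        words = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        # STEP 2: extend each seed with its best match in this sequence
        extended = [max((aln + [w] for w in words), key=info_content)
                    for aln in alignments]
        # STEP 3: keep only the alignments with the highest information content
        extended.sort(key=info_content, reverse=True)
        alignments = extended[:beam]
    return alignments[0]
```

The result depends on the order of the sequences, since an early wrong choice cannot be undone later. Keeping several alignments alive at each step reduces that risk.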

Motif discovery – a mixture model method

Background: θ0

    A   C   G   T
A  .3  .2  .2  .3
C  .2  .3  .3  .2
G  .2  .3  .3  .2
T  .3  .2  .2  .3

Motif: Θ, W

     1    2    3    4    5    6    7    8    9
A  0.00 0.00 0.17 0.00 0.17 0.00 0.00 0.00 0.17
C  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.66
G  0.00 1.00 0.83 1.00 0.00 1.00 1.00 0.00 0.17
T  1.00 0.00 0.00 0.00 0.83 0.00 0.00 1.00 0.00

q = [q0, q1]

S: GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGACTGGGAGGTCCTCGGTTCAGAGTCACAGAGCA

A: 000000000000001000000000000000000000000001000000000000000000000000000000

f(A, Θ, W, q | S, θ0) ∝ f(S, A | Θ, W, q, θ0) π(Θ, W, q)

Inference by iterative estimation/sampling

Lawrence and Reilly (1990)

Bailey and Elkan (1994), etc.

Gibbs Sampler: Θ, W, q ↔ A

Lawrence et al. (1993)

Liu (1994), Liu et al. (1995), etc.
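
The sampling idea can be sketched with a much-reduced site sampler. It departs from the mixture model above in several ways, all of which are assumptions made to keep the example short: exactly one site per sequence is assumed instead of the indicator vector A with prior q, the motif width W is fixed, the background is a zero-order base composition rather than θ0 as shown, the motif matrix is plugged in from the current counts with pseudocounts rather than sampled, and the function name is hypothetical.

```python
import random

def gibbs_site_sampler(seqs, width, n_iter=200, pseudo=0.5, seed=0):
    """Return one sampled motif start position per sequence (needs >= 2 sequences)."""
    rng = random.Random(seed)
    bases = "ACGT"
    # Zero-order background estimated from all sequences (assumption).
    total = sum(len(s) for s in seqs)
    bg = {b: (sum(s.count(b) for s in seqs) + pseudo) / (total + 4 * pseudo)
          for b in bases}
    # Start from random site positions.
    pos = [rng.randrange(len(s) - width + 1) for s in seqs]

    for _ in range(n_iter):
        for i, s in enumerate(seqs):
            # Estimate the motif matrix from the current sites of all other sequences.
            others = [seqs[j][pos[j]:pos[j] + width]
                      for j in range(len(seqs)) if j != i]
            theta = [{b: (col.count(b) + pseudo) / (len(others) + 4 * pseudo)
                      for b in bases}
                     for col in zip(*others)]
            # Weight each candidate start by its motif/background likelihood ratio.
            weights = []
            for start in range(len(s) - width + 1):
                ratio = 1.0
                for j, b in enumerate(s[start:start + width]):
                    ratio *= theta[j][b] / bg[b]
                weights.append(ratio)
            # Sample the new site position for sequence i.
            pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos
```

Because the procedure is stochastic, the returned positions are one draw from the sampler and not a guaranteed optimum. In practice several runs with different seeds are compared.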

Cis-regulatory module discovery (Zhou and Wong, PNAS 2004)

• Module structure: consider co-localization of motif sites.

[Figure: module structure with background θ0 and motif matrices Θ1, …, ΘK; example sites labeled Motif 1, Motif 2, Motif 3]

Hierarchical Mixture modeling

K: # of motifs


Phylogenetic Footprinting

For example, exons are conserved due to selection pressure. Introns and intergenic regions are less likely to be conserved.

Phylogenetic footprinting & motif discovery

• Evolutionary model based approach

EMnEM (Moses et al. 2004)

PhyME (Sinha et al. 2004)

PhyloGibbs (Siddharthan et al. 2005)

Tree Sampler (Li and Wong, 2005)
