Chromatin Immunoprecipitation DNA Sequencing (ChIP-seq)

  • 2nd and 3rd Generation DNA Sequencers and Applications

    Sequencing platforms: Roche 454 (2nd), Illumina Solexa (2nd), ABI SOLiD (2nd), Helicos (3rd)

    Applications: de novo sequencing, targeted resequencing, Digital Gene Expression (DGE), RNA-seq, ChIP-seq

  • Why ChIP-seq?

    Protein-DNA interactions, chromatin states, transcriptional regulation

  • ChIP experiment

    In a nutshell

    Protein cross-linked to DNA in vivo by treating cells with formaldehyde

    Shear chromatin (sonication)

    IP with specific antibody

    Reverse cross-links, purify DNA

    PCR amplification*

    Identify sequences

    Genome-wide association map

    * Unless using a single-molecule sequencer

  • History: From ChIP-chip to ChIP-seq

    ChIP-chip (c. 2000)

    Resolution: 30-100 bp

    Coverage limited by the sequences on the array

    Cross-hybridization between probes and non-specific targets creates background noise

  • ChIP-seq experiment (2007-present)

  • Sample Prep: Solexa vs. Helicos

  • ChIP-seq material: sample preps with in-house protocols

    Helicos sample prep

    Normal QC and ChIP steps

    Input material: 3-9 ng

    RNase A/Proteinase K treatment (2-3 h)

    Purification (phenol/precipitation) (1.5 h)

    Tailing (1.5 h)

    Termination (1.5 h)

    Amount of library sequenced approx. 1/3

    Unique tags after analysis: approx. >12M (based on our limited ERα ChIP-seq libraries)

    **Slide borrowed from Thomas Westerling

    Solexa sample prep

    Normal QC and ChIP steps

    Input material: typically >30 ng

    End-repair (1 h)

    Purification (phenol/precipitation) (1.5 h)

    A-overhang (1 h)

    Purification (phenol/precipitation) (1.5 h)

    Adapter oligo ligation (30 min)

    Purification (phenol/precipitation) (1.5 h)

    Size selection (30 min by E-gel)

    Precipitation (1 h)

    Amplification PCR (2 h; 12-18 cycles)

    Size selection (30 min by E-gel)

    Precipitation (1 h)

    Diagnostic gel (30 min)

    QC by direct qPCR (4 h)

    Amount of library sequenced approx. 1/10

    Unique tags after analysis: >3M (based on our limited ERα ChIP-seq libraries)
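    The timed steps in the two prep lists can be totalled to compare hands-on time. The sketch below simply transcribes the durations listed above (the untimed "normal QC and ChIP steps" are left out); the step tuples and the `total_hours` helper are illustrative names, not part of any protocol software.

```python
# Durations (hours) transcribed from the Helicos and Solexa sample-prep lists.
# Each step is (name, low, high); fixed durations use low == high.
# The untimed "normal QC and ChIP steps" are excluded.
HELICOS_STEPS = [
    ("RNase A/Proteinase K treatment", 2.0, 3.0),
    ("Purification (phenol/precipitation)", 1.5, 1.5),
    ("Tailing", 1.5, 1.5),
    ("Termination", 1.5, 1.5),
]

SOLEXA_STEPS = [
    ("End-repair", 1.0, 1.0),
    ("Purification (phenol/precipitation)", 1.5, 1.5),
    ("A-overhang", 1.0, 1.0),
    ("Purification (phenol/precipitation)", 1.5, 1.5),
    ("Adapter oligo ligation", 0.5, 0.5),
    ("Purification (phenol/precipitation)", 1.5, 1.5),
    ("Size selection (E-gel)", 0.5, 0.5),
    ("Precipitation", 1.0, 1.0),
    ("Amplification PCR", 2.0, 2.0),
    ("Size selection (E-gel)", 0.5, 0.5),
    ("Precipitation", 1.0, 1.0),
    ("Diagnostic gel", 0.5, 0.5),
    ("QC by direct qPCR", 4.0, 4.0),
]


def total_hours(steps):
    """Return (minimum, maximum) total hands-on hours for a list of steps."""
    low = sum(step[1] for step in steps)
    high = sum(step[2] for step in steps)
    return low, high


if __name__ == "__main__":
    for name, steps in (("Helicos", HELICOS_STEPS), ("Solexa", SOLEXA_STEPS)):
        low, high = total_hours(steps)
        print(f"{name}: {low}-{high} h over {len(steps)} timed steps")
```

    Summed this way, the timed Helicos steps come to 6.5-7.5 h and the timed Solexa steps to 16.5 h, before any QC or ChIP work.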

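  • Comparison: ChIP-chip vs. ChIP-seq (Solexa) vs. ChIP-seq (Helicos)

    Max resolution
      ChIP-chip: array-specific, 30-100 bp
      ChIP-seq (Solexa): 1 nt
      ChIP-seq (Helicos): 1 nt

    Coverage
      ChIP-chip: limited by the sequences on the array, which are non-repetitive
      ChIP-seq (Solexa): limited by alignability of reads to the genome; increases with read length; many repetitive regions can be covered
      ChIP-seq (Helicos): limited by alignability of reads to the genome; increases with read length; many repetitive regions can be covered

    Cost
      ChIP-chip: $400-800 per array (multiple arrays needed for large genomes)
      ChIP-seq (Solexa): $1,000-2,000 per lane
      ChIP-seq (Helicos): $500 at MBCF

    Source of platform noise
      ChIP-chip: cross-hybridization between probes and non-specific targets
      ChIP-seq (Solexa): some GC bias due to PCR
      ChIP-seq (Helicos): single molecule (no PCR)

    Required amount of ChIP DNA
      ChIP-chip: micrograms
      ChIP-seq (Solexa): 10-50 ng
      ChIP-seq (Helicos): 9-12 ng standard; 1.5 ng has been done at MBCF

    Dynamic range
      ChIP-chip: lower detection limit; saturation at high signal
      ChIP-seq (Solexa): no limit
      ChIP-seq (Helicos): no limit

    Amplification (PCR)
      ChIP-chip: more required
      ChIP-seq (Solexa): less (12-17 cycles)
      ChIP-seq (Helicos): none

    Multiplexing
      ChIP-chip: not an option
      ChIP-seq (Solexa): yes
      ChIP-seq (Helicos): yes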

  • Helicos vs. Solexa vs. ChIP²

    [Figure: Venn diagram of peak overlap between 1. Solexa, 2. Helicos, and 3. ChIP² (ChIP-chip)]

    Solexa data (red): unique tags 4M; peaks called 10,500 (A); negative peaks 20,000 (B)

    ChIP² data (green) (C): array technology, no tags; peaks called 12,500; FDR 20 (D)

    Helicos data (blue): unique tags 13M; peaks called 12,500; negative peaks 1,000 (E)

    A) A more inclusive (10%) ELAND mapping was used (compare to Bowtie in the library table).

    B) MACS performs a sample swap between the ChIP and input (chromatin) samples and calculates a local λ to determine the level of background peaks called in the control data. This gives an FDR for each positive peak. Because deep sequencing is combined with PCR, this parameter is extremely high in some samples and not entirely trustworthy.

    C) ChIP² (ChIP-chip) data published in Carroll et al. Nat Genet. 2006 Nov;38(11):1289-97.

    D) ChIP² FDR values are calculated differently from MACS FDRs and are not directly comparable.

    E) Negative peaks, and thus local FDR values, appear at first glance more reliable in Helicos sequencing, at least in part because the lack of amplification removes artifacts introduced by the experimenter and avoids reducing the complexity of the sequenced library.
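    The sample-swap idea in note B can be illustrated with a simplified empirical FDR: peaks called on the control (with ChIP as background) count as false positives, and for each ChIP peak the FDR is the ratio of control peaks to ChIP peaks at least as significant. This is a minimal sketch assuming higher scores mean more significant (e.g. -log10 p-values); it is not MACS's actual code, and `empirical_fdr` and `overall_fdr` are hypothetical names.

```python
from bisect import bisect_left


def empirical_fdr(chip_scores, control_scores):
    """Per-peak empirical FDR from a ChIP/control sample swap (simplified).

    Scores are significance scores where higher means more significant.
    For each ChIP peak with score s:
        FDR(s) = (# control peaks with score >= s) / (# ChIP peaks with score >= s),
    capped at 1.0.
    """
    chip_sorted = sorted(chip_scores)
    ctrl_sorted = sorted(control_scores)
    fdrs = []
    for s in chip_scores:
        n_pos = len(chip_sorted) - bisect_left(chip_sorted, s)  # always >= 1
        n_neg = len(ctrl_sorted) - bisect_left(ctrl_sorted, s)
        fdrs.append(min(1.0, n_neg / n_pos))
    return fdrs


def overall_fdr(n_negative_peaks, n_positive_peaks):
    """Crude library-wide ratio of negative (control) to positive (ChIP) peaks, capped at 1.0."""
    return min(1.0, n_negative_peaks / n_positive_peaks)


if __name__ == "__main__":
    # Peak counts from the Helicos vs. Solexa comparison.
    print("Helicos:", overall_fdr(1_000, 12_500))
    print("Solexa:", overall_fdr(20_000, 10_500))
```

    With the counts above, the Helicos ratio is 0.08, while the Solexa control produced more negative than positive peaks, so the capped ratio is 1.0, which matches the warning in note B that this parameter can be extremely high.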

  • ChIP-seq Analysis

  • ChIP-seq peaks

    Only the 5' end of each fragment is sequenced.

    Tags from both the + and - strands are aligned to the reference genome.

  • +/- tag mapping
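    Because only the 5' ends are read, + strand tags pile up upstream of a binding site and - strand tags downstream. A common fix, used by MACS among others, is to shift each tag by half the estimated fragment length d toward the fragment center. The sketch below assumes "+" positions are the leftmost read base and "-" positions the rightmost (the 5' end on the minus strand); `shift_tags` is a hypothetical helper.

```python
def shift_tags(tags, d):
    """Shift 5' tag positions by d/2 toward the estimated fragment center.

    tags: iterable of (position, strand) pairs, strand "+" or "-".
    Assumed convention: "+" position = leftmost read base,
    "-" position = rightmost read base (5' end on the minus strand).
    d: estimated fragment length in bp.
    """
    half = d // 2
    shifted = []
    for pos, strand in tags:
        if strand == "+":
            shifted.append(pos + half)  # move downstream
        elif strand == "-":
            shifted.append(pos - half)  # move upstream
        else:
            raise ValueError(f"unknown strand: {strand!r}")
    return shifted


if __name__ == "__main__":
    # A 200 bp fragment spanning 100..300: + tag at 100, - tag at 300.
    print(shift_tags([(100, "+"), (300, "-")], d=200))  # both tags land on 200
```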

  • Types of Analysis

    Binding site identification and discovery of binding sequence motifs (non-histone ChIP)

    Epigenomic gene regulation and chromatin structure (Histone ChIP)
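    For non-histone ChIP, a first check on called peaks is whether they contain the expected binding motif. The sketch below scans a sequence on both strands for an IUPAC consensus; the canonical estrogen response element GGTCAnnnTGACC is used only as an assumed example for an ERα ChIP, and `motif_hits` is a hypothetical helper, not a motif-discovery method.

```python
import re

# IUPAC codes needed for the example consensus; extend as required.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "R": "[AG]", "Y": "[CT]"}
COMPLEMENT = str.maketrans("ACGTNRY", "TGCANYR")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def consensus_to_regex(consensus):
    # The lookahead lets finditer report overlapping matches.
    return re.compile("(?=" + "".join(IUPAC[base] for base in consensus.upper()) + ")")


def motif_hits(seq, consensus):
    """Return sorted (start, strand) hits of a consensus on both strands of seq."""
    seq = seq.upper()
    hits = {(m.start(), "+") for m in consensus_to_regex(consensus).finditer(seq)}
    rc = reverse_complement(consensus)
    if rc != consensus.upper():  # palindromic motifs are reported once
        hits |= {(m.start(), "-") for m in consensus_to_regex(rc).finditer(seq)}
    return sorted(hits)


if __name__ == "__main__":
    peak_seq = "AAGGTCAGCATGACCTT"  # made-up sequence containing one ERE
    print(motif_hits(peak_seq, "GGTCANNNTGACC"))
```

    Comparing hit rates in peak sequences against background sequences gives only a crude enrichment signal; de novo motif discovery requires dedicated statistical tools.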

  • Binding Site Detection: But where does the meat go?

  • Control: Input DNA

    Measuring enrichment (Rozowsky, J. et al. PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nature Biotech. 27, 66-75 (2009))

    Input DNA: portion of the DNA sample removed before IP
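    Measuring enrichment against input requires putting both libraries on the same scale first. The simplest approach, sketched below, scales input counts by the ratio of total mapped tags and adds a pseudocount. This is a generic illustration, not PeakSeq's normalization (PeakSeq fits ChIP against control rather than using total-count scaling); `fold_enrichment` and its parameters are hypothetical.

```python
def fold_enrichment(chip_count, input_count, chip_total, input_total, pseudocount=1.0):
    """ChIP/input enrichment for one region after scaling input to ChIP depth.

    chip_count, input_count: tags falling in the region.
    chip_total, input_total: total mapped tags in each library.
    The pseudocount avoids division by zero where input has no tags.
    """
    expected = input_count * (chip_total / input_total)
    return (chip_count + pseudocount) / (expected + pseudocount)


if __name__ == "__main__":
    # Input sequenced to half the ChIP depth: 4 input tags count as 8 expected tags.
    print(fold_enrichment(39, 4, chip_total=4_000_000, input_total=2_000_000))
```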

  • Why we need to sequence input DNA

    Input DNA does not show a flat or random (Poisson) distribution:

    Open chromatin regions tend to be fragmented more easily during shearing

    Amplification bias

    Mapping artifacts: increased coverage of more mappable regions (which also tend to be promoter regions) and of repetitive regions, due to inaccuracies in the number of copies in the assembled genome
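    Because input coverage is locally biased, a single genome-wide Poisson rate overstates significance in regions like open chromatin. One way to handle this, in the spirit of MACS's dynamic local λ, is to take the largest expected count among a genome-wide background and input counts from several windows around the peak, then test the ChIP count against that rate. The function names, window sizes, and scaling argument below are assumptions of this sketch.

```python
import math


def local_lambda(genome_bg_lambda, input_window_counts, peak_width, input_scale=1.0):
    """Largest expected tag count for a peak-sized region (MACS-style idea).

    genome_bg_lambda: expected tags per peak_width under a uniform background.
    input_window_counts: {window_bp: input tags in that window around the peak}.
    input_scale: factor scaling input depth to ChIP depth.
    Taking the maximum stops local input biases from inflating significance.
    """
    candidates = [genome_bg_lambda]
    for window_bp, count in input_window_counts.items():
        candidates.append(count * input_scale * peak_width / window_bp)
    return max(candidates)


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), lam > 0."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k <= 0:
        return 1.0
    if k <= lam:
        # The upper tail is large here, so 1 - P(X <= k - 1) is accurate enough.
        cdf = sum(math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1)) for i in range(k))
        return max(0.0, 1.0 - cdf)
    # k > lam: terms shrink from k onward, so sum them directly to keep small p-values.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total, i = 0.0, k
    while term > 0.0 and term >= total * 1e-17:
        total += term
        i += 1
        term *= lam / i
    return min(1.0, total)


if __name__ == "__main__":
    lam = local_lambda(0.5, {1_000: 10, 10_000: 50}, peak_width=200)
    print(lam, poisson_upper_tail(15, lam))
```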

  • Depth of Sequencing: Are we there yet?

  • ERα E2 Helicos MACS peaks: 12,500 (tag30, mfold30); sequencing depth determination by subsampling

    [Figure: % of peaks detected (of total peaks per bin) vs. % of tags sampled]

    Number of total peaks in each fold-change bin: 0-20: 7,687; 20-40: 2,841; 40-60: 935; 60-80: 429; 80-100: 217; 100-120: 140; 120-140: 85; 140-160: 49; 160-180: 23; 180-200: 7; 200-220: 4
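    The subsampling idea can be sketched by randomly keeping a fraction of tags and asking how many peaks survive. The sketch below replaces re-running MACS on each subsample with a hypothetical fixed tag threshold (`min_tags`) and uses binomial thinning per peak; the tag counts in the usage block are made up. Running it separately per fold-change bin mimics the per-bin curves described above.

```python
import random


def subsample_recovery(peak_tag_counts, fractions, min_tags, seed=0):
    """Fraction of peaks still 'detected' after keeping a random fraction of tags.

    peak_tag_counts: tags per peak in the full library.
    fractions: sampling fractions between 0 and 1.
    min_tags: hypothetical detection rule; a peak counts as detected if it
        keeps at least this many tags.
    Each tag is kept independently with probability f (binomial thinning).
    """
    rng = random.Random(seed)  # fixed seed for reproducible curves
    recovery = {}
    for f in fractions:
        detected = 0
        for n in peak_tag_counts:
            kept = sum(1 for _ in range(n) if rng.random() < f)
            if kept >= min_tags:
                detected += 1
        recovery[f] = detected / len(peak_tag_counts)
    return recovery


if __name__ == "__main__":
    strong = [300, 250, 400]  # made-up high fold-change peaks
    weak = [25, 30, 20]       # made-up low fold-change peaks
    for label, counts in (("strong", strong), ("weak", weak)):
        print(label, subsample_recovery(counts, [0.25, 0.5, 1.0], min_tags=15))
```

    Strong peaks typically stay detected at low sampling fractions, while weak peaks drop out first; a curve that is still rising at 100% of tags suggests deeper sequencing would recover more peaks.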

  • Statistical Significance

  • MACS shifted tag-count graphs, i.e. peak shapes (panels: Helicos ChIP, Helicos Input, Solexa ChIP, Solexa Input)
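    The shift used in these graphs depends on an estimate of the fragment length d. MACS derives d from its own model of paired + and - strand peaks; the sketch below instead uses strand cross-correlation, a different but widely used technique: it finds the shift that best overlaps + strand 5' positions (fragment starts) with - strand 5' positions (fragment ends). Raw per-base counts are used here for clarity; real data would usually be smoothed or binned first. `estimate_fragment_length` is a hypothetical helper.

```python
from collections import Counter


def estimate_fragment_length(plus_starts, minus_starts, max_shift):
    """Estimate fragment length d by strand cross-correlation of 5' tag positions.

    plus_starts: 5' positions of + strand tags (fragment starts).
    minus_starts: 5' positions of - strand tags (fragment ends).
    Returns the shift in [0, max_shift] at which shifting + tags right
    overlaps the most - tags; ties go to the smallest shift.
    """
    plus = Counter(plus_starts)
    minus = Counter(minus_starts)
    best_shift, best_score = 0, -1
    for shift in range(max_shift + 1):
        # Counter returns 0 for positions with no - strand tags.
        score = sum(n * minus[pos + shift] for pos, n in plus.items())
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift


if __name__ == "__main__":
    # Two made-up 200 bp fragments: 100..300 and 500..700.
    print(estimate_fragment_length([100, 500], [300, 700], max_shift=400))
```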

    Histone modifications (methylation at lysine (K) and arginine (R) residues) play a role in gene regulation, in both expression and repression. Enzymes that catalyze methylation reactions have been implicated in development and in pathological processes. Promoter regions of active genes have reduced nucleosome occupancy and elevated histone acetylation. Tiled arrays cover most of the non-repetitive genome, and cost increases with genome size: yeast, for example, has been very well characterized, but human much less so because of its genome size.