16
NATURE PROTOCOLS | VOL.5 NO.3 | 2010 | 457 p u o r G g n i h s i l b u P e r u t a N 0 1 0 2 © natureprotocols / m o c . e r u t a n . w w w / / : p t t h PROTOCOL INTRODUCTION Chromatin immunoprecipitation (ChIP) is a method used to locate DNA-binding sites of a particular DNA-binding protein of interest in the genome 1 . DNA-binding proteins can be transcription factors (TFs) or other chromatin-associated proteins, like histones. The method makes use of the ability to crosslink proteins to the DNA, to which they bind in vivo. Specific protein–DNA complexes can then be purified by immunoprecipitation using a specific anti- body against the DNA-binding protein of interest or against a tag that is fused to the protein 2 . Antibodies that are specific for certain protein modifications, e.g., methylation of histones, can be used to explore the epigenetic landscape of the genome 3–5 . In order to generate genome-wide binding profiles of specific proteins, whole genome-tiling arrays (ChIP-CHIP) or next-generation sequencing technologies (ChIP-SEQ) can be employed 6 . The ChIP methodology has produced numerous applications in studying the composition and dynamics of chromatin landscapes and TF binding, as well as assessing the interplay between different factors in gene regulation. ChIP approaches can be used to study DNA methylation, chromatin structure/histone modifications, and the cooperative binding of TFs 7 . In contrast to other eukaryo- tic systems, genome-wide applications of the ChIP methodology in the plant field are still relatively scarce. This delay is partly due to technical difficulties in ChIP sample preparation related to the complexity of plant tissues (‘diluting’ the amount of cells in which the TF of interest is active) and other properties of plant tissues (e.g., presence of cell walls and secondary compounds). In addi- tion, tiling arrays only recently became available for Arabidopsis. ChIP-CHIP has been used to characterize histone modifications in Arabidopsis seedlings and tissue culture 8–10 . There are only few reports of the successful application of ChIP-CHIP to generate binding profiles of TFs in Arabidopsis 11–13 . In the earliest attempt by Lee et al. 11 , a custom-made high-density 60-nucleotide oligomer microarray with one probe for every 500 nucleotides was used. With this array, they identified several thousands potential target genes of the light-regulated TF HY5, which was expressed as an HA-tagged fusion protein from the constitutive Cauliflower Mosaic Virus (CaMV) 35S promoter. In a similar overexpression approach, however, using whole-genome tiling arrays consisting of 50mers located every 60 bp of the genome, Oh et al. 12 identified direct targets of the bHLH TF PIL5. Morohashi and Grotewold 13 characterized the regulatory circuitry of trichome initiation by determining the downstream targets of the developmental key regulators GL1 and GL3. They determined DNA-binding sites in a native context making use of a promoter–gene construct containing a translational fusion to fluorescent protein tags. Recently, we have used ChIP-SEQ and ChIP-CHIP to identify binding sites of SEPALLATA3 (SEP3), a floral key regulator in Arabidopsis 14 . We determined the binding sites in the native context of flower development using a peptide antibody specific to the endogenous SEP3 protein. For the first time in plants, we used ChIP-SEQ in addition to ChIP-CHIP to characterize binding sites of a TF at a genome-wide scale. The ChIP-SEQ experiment was carried out using the SOLEXA (Illumina, San Diego, CA, USA)-sequencing technology. For genome-wide applications, the signal-to-noise ratio of the ChIP experiment, i.e., the level of specific enrichment of TF-bound genomic regions over non-specifically precipitated DNA, is a major issue. The ChIP protocol presented here gives high levels of enrichment in our hands. In general, the enrichment depends on the expression level of the protein of interest in the analyzed tissue, as well as the efficiency of the antibody. In our experi- ments, we generally achieved at least 8–16-fold enrichment for the positive control genes as estimated by real-time qPCR. The pro- tocol is based on existing procedures with some modifications to Chromatin immunoprecipitation (ChIP) of plant transcription factors followed by sequencing (ChIP-SEQ) or hybridization to whole genome arrays (ChIP-CHIP) Kerstin Kaufmann 1 , Jose M Muiño 2 , Magne Østerås 3 , Laurent Farinelli 3 , Pawel Krajewski 4 & Gerco C Angenent 5, 6 1 Laboratory of Molecular Biology, Wageningen University, Wageningen, The Netherlands. 2 Applied Bioinformatics, Plant Research International, Wageningen University & Research Centre, Wageningen, The Netherlands. 3 Fasteris SA, Plan-les-Ouates, Geneva, Switzerland. 4 Institute of Plant Genetics, Polish Academy of Sciences, Poznań, Poland. 5 Bioscience, Plant Research International, Wageningen, The Netherlands. 6 Centre for BioSystems Genomics (CBSG), Wageningen, The Netherlands. Correspondence should be addressed to K.K. ([email protected]). Published online 18 February 2010; doi:10.1038/nprot.2009.244 Chromatin immunoprecipitation (ChIP) is a powerful technique to study interactions between transcription factors (TFs) and DNA in vivo. For genome-wide de novo discovery of TF-binding sites, the DNA that is obtained in ChIP experiments needs to be processed for sequence identification. The sequences can be identified by direct sequencing (ChIP-SEQ) or hybridization to microarrays (ChIP-CHIP). Given the small amounts of DNA that are usually obtained in ChIP experiments, successful and reproducible sample processing is challenging. Here we provide a detailed procedure for ChIP of plant TFs, as well as protocols for sample preparation for ChIP-SEQ and for ChIP-CHIP. Our ChIP procedure is optimized for high signal-to-noise ratio starting with tissue fixation, followed by nuclei isolation, immunoprecipitation, DNA amplification and purification. We also provide a guide for primary data analysis of ChIP-SEQ data. The complete protocol for ChIP-SEQ/ChIP-CHIP sample preparation starting from plant harvest takes ~7 d.

Chromatin immunoprecipitation (ChIP) of plant transcription factors followed by sequencing (ChIP-SEQ) or hybridization to whole genome arrays (ChIP-CHIP)

  • Upload
    gerco-c

  • View
    213

  • Download
    0

Embed Size (px)

Citation preview

nature protocols | VOL.5 NO.3 | 2010 | 457

p

uor

G g

n ih si l

bu

P eru ta

N 010 2©

nat

ure

pro

toco

ls/

moc. e r

ut an .

ww

w / /:pt t

h

protocol

IntroDuctIonChromatin immunoprecipitation (ChIP) is a method used to locate DNA-binding sites of a particular DNA-binding protein of interest in the genome1. DNA-binding proteins can be transcription factors (TFs) or other chromatin-associated proteins, like histones. The method makes use of the ability to crosslink proteins to the DNA, to which they bind in vivo. Specific protein–DNA complexes can then be purified by immunoprecipitation using a specific anti-body against the DNA-binding protein of interest or against a tag that is fused to the protein2. Antibodies that are specific for certain protein modifications, e.g., methylation of histones, can be used to explore the epigenetic landscape of the genome3–5. In order to generate genome-wide binding profiles of specific proteins, whole genome-tiling arrays (ChIP-CHIP) or next-generation sequencing technologies (ChIP-SEQ) can be employed6.

The ChIP methodology has produced numerous applications in studying the composition and dynamics of chromatin landscapes and TF binding, as well as assessing the interplay between different factors in gene regulation. ChIP approaches can be used to study DNA methylation, chromatin structure/histone modifications, and the cooperative binding of TFs7. In contrast to other eukaryo-tic systems, genome-wide applications of the ChIP methodology in the plant field are still relatively scarce. This delay is partly due to technical difficulties in ChIP sample preparation related to the complexity of plant tissues (‘diluting’ the amount of cells in which the TF of interest is active) and other properties of plant tissues (e.g., presence of cell walls and secondary compounds). In addi-tion, tiling arrays only recently became available for Arabidopsis. ChIP-CHIP has been used to characterize histone modifications in Arabidopsis seedlings and tissue culture8–10. There are only few reports of the successful application of ChIP-CHIP to generate binding profiles of TFs in Arabidopsis11–13. In the earliest attempt by Lee et al.11, a custom-made high-density 60-nucleotide oligomer

microarray with one probe for every 500 nucleotides was used. With this array, they identified several thousands potential target genes of the light-regulated TF HY5, which was expressed as an HA-tagged fusion protein from the constitutive Cauliflower Mosaic Virus (CaMV) 35S promoter. In a similar overexpression approach, however, using whole-genome tiling arrays consisting of 50mers located every 60 bp of the genome, Oh et al.12 identified direct targets of the bHLH TF PIL5. Morohashi and Grotewold13 characterized the regulatory circuitry of trichome initiation by determining the downstream targets of the developmental key regulators GL1 and GL3. They determined DNA-binding sites in a native context making use of a promoter–gene construct containing a translational fusion to fluorescent protein tags. Recently, we have used ChIP-SEQ and ChIP-CHIP to identify binding sites of SEPALLATA3 (SEP3), a floral key regulator in Arabidopsis14. We determined the binding sites in the native context of flower development using a peptide antibody specific to the endogenous SEP3 protein. For the first time in plants, we used ChIP-SEQ in addition to ChIP-CHIP to characterize binding sites of a TF at a genome-wide scale. The ChIP-SEQ experiment was carried out using the SOLEXA (Illumina, San Diego, CA, USA)-sequencing technology.

For genome-wide applications, the signal-to-noise ratio of the ChIP experiment, i.e., the level of specific enrichment of TF-bound genomic regions over non-specifically precipitated DNA, is a major issue. The ChIP protocol presented here gives high levels of enrichment in our hands. In general, the enrichment depends on the expression level of the protein of interest in the analyzed tissue, as well as the efficiency of the antibody. In our experi-ments, we generally achieved at least 8–16-fold enrichment for the positive control genes as estimated by real-time qPCR. The pro-tocol is based on existing procedures with some modifications to

Chromatin immunoprecipitation (ChIP) of plant transcription factors followed by sequencing (ChIP-SEQ) or hybridization to whole genome arrays (ChIP-CHIP)Kerstin Kaufmann1, Jose M Muiño2, Magne Østerås3, Laurent Farinelli3, Pawel Krajewski4 & Gerco C Angenent5, 6

1Laboratory of Molecular Biology, Wageningen University, Wageningen, The Netherlands. 2Applied Bioinformatics, Plant Research International, Wageningen University & Research Centre, Wageningen, The Netherlands. 3Fasteris SA, Plan-les-Ouates, Geneva, Switzerland. 4Institute of Plant Genetics, Polish Academy of Sciences, Poznań, Poland. 5Bioscience, Plant Research International, Wageningen, The Netherlands. 6Centre for BioSystems Genomics (CBSG), Wageningen, The Netherlands. Correspondence should be addressed to K.K. ([email protected]).

Published online 18 February 2010; doi:10.1038/nprot.2009.244

chromatin immunoprecipitation (chIp) is a powerful technique to study interactions between transcription factors (tFs) and Dna in vivo. For genome-wide de novo discovery of tF-binding sites, the Dna that is obtained in chIp experiments needs to be processed for sequence identification. the sequences can be identified by direct sequencing (chIp-seQ) or hybridization to microarrays (chIp-cHIp). Given the small amounts of Dna that are usually obtained in chIp experiments, successful and reproducible sample processing is challenging. Here we provide a detailed procedure for chIp of plant tFs, as well as protocols for sample preparation for chIp-seQ and for chIp-cHIp. our chIp procedure is optimized for high signal-to-noise ratio starting with tissue fixation, followed by nuclei isolation, immunoprecipitation, Dna amplification and purification. We also provide a guide for primary data analysis of chIp-seQ data. the complete protocol for chIp-seQ/chIp-cHIp sample preparation starting from plant harvest takes ~7 d.

458 | VOL.5 NO.3 | 2010 | nature protocols

p

uor

G g

n ih si l

bu

P eru ta

N 010 2©

nat

ure

pro

toco

ls/

moc. e r

ut an .

ww

w / /:pt t

h

protocol

minimize the background in the final ChIP DNA sample2,15–18. We also present a modified protocol for adaptor ligation and ampli-fication of DNA to generate libraries for ChIP-SEQ and ChIP-CHIP. Our sample preparation procedure can be easily adapted to meet the requirements for multiplex or paired-end SOLEXA sequencing.

Our amplification protocol, which is based on and modified from previous protocols19 works efficiently for the very small amounts of randomly sheared DNA (can be < 5–15 ng) in a typical ChIP sample of a specifically DNA-binding TF obtained with a highly specific antibody. DNA-binding sites can be determined at high precision, as we do not include a shearing step of the DNA before adaptor ligation, different from what is suggested by other pro-tocols20. By describing details of our protocols for the complete sample preparation procedures, we provide a general set up for the generation of binding site maps of TFs in Arabidopsis and other plant species (Fig. 1).

Basic overview over the ChIP protocolThe procedure for chromatin immunoprecipitation consists of several basic steps, each of which should be optimized to achieve optimal enrichment and yield of immunoprecipitated DNA. After harvesting, the plant tissue is immediately fixed to crosslink protein–DNA interactions. Formaldehyde is by far the most frequently used fixative, although other crosslinking agents can also be used, especially to crosslink protein–protein interactions. Application of additional crosslinking agents, like Di(N-succinimidyl) glutarate (DSG), is advantageous if indirect, cofactor-mediated interactions of proteins with DNA are being investigated21,22.

The second basic step in the protocol is the isolation of nuclei. Different methods for nuclei extraction are available. Before start-ing a ChIP experiment using a new antibody, we recommend to extract chromatin by several methods12,18,23,24 and to analyze it for the presence and integrity of the TF of interest using western blot analysis with the antibody to be used in the ChIP experiment.

Another step that should be optimized is the shearing of the chromatin by sonication. We sonicate until the majority of the DNA fragments have a size between 200–1,200 bp. This step strongly depends on the instrument that is used for sonication. Although a conventional sonicator (e.g., MSE Soniprep, see Equipment) works well in our hands, new types of sonicators (e.g., BIORUPTOR, Diagenode, Liege, Belgium) have also been successfully used in plant ChIP experiments25, which may reduce the average fragment size and thus enhance positional resolution.

After the chromatin is sheared, it is important to precipitate any insoluble material. This is usually done in a preclearing step, which involves centrifugation and a ‘blind’ IP not adding any specific antibody but pre-immuneserum. If pre-immuneserum is not available, we recommend two-rounds of centrifuga-tion directly after Step15 (move to Step 17 in the Procedure

section), followed by a prolonged incubation with protein-A beads (Step 19: 1 h 30 min instead of 1 h). In our protocol, we included several centrifugation steps at different points in the procedure to quantitatively remove any insoluble material, as it is a major source for unwanted ‘background’ in the ChIP experiment. After incubation with the specific antibody, antibody–protein/DNA complexes are isolated using protein-A agarose beads. Alternatively to the conventional agarose beads used in our protocol, magnetic beads (e.g., DYNAL beads, Invitrogen, Carlsbad, CA, USA) can be used which are easier to handle. After the immunoprecipitation step, the DNA can be isolated by reverse crosslinking (proteinase K digest of proteins) and purified. After purification, the quality of the ChIP sample should be assessed by testing for the enrichment of positive control sequences, i.e., genomic regions that are known to be bound by the TF of interest, using real-time qPCR. If the enrichment is satisfactory, the ChIP samples can then be processed to generate libraries for ChIP-SEQ or ChIP-CHIP. To test for enrichment, the DNA obtained by ChIP using the specific antibody is compared with ChIP DNA obtained using the same antibody from similar tissue in which the TF of interest is not present, e.g., from knockout plant lines of that specific gene or a sample treated with a non-specific serum. Alternatively, ‘input DNA’ (sheared chromatin that was not subjected to any immunoprecipitation) can be used as reference.

Figure 1 | Chromatin immunoprecipitation (ChIP) and ChIP-SEQ procedures and results. (a) Different steps in a typical ChIP experiment and sample processing. (b) Examples of SEP3 ChIP-CHIP and ChIP-SEQ profiles in a genomic region on chromosome 2 (genomic DNA coordinates 18,816,000–18,832,000). The two upper panels indicate ChIP-CHIP and ChIP-SEQ enrichment patterns, respectively. The lower panel indicates the positions of genomic loci in this region and their exon/intron structure.

Illumina/Solexa sequencing(ChIP-SEQ)

Chr

omat

in im

mun

opre

cipi

tatio

nS

ampl

e pr

oces

sing

b

a

Tissue collection and fixation

End repair/A-tailing/Adaptor ligation

Amplification

Gel purification Partial digest and labelling for ChIP-CHIP

Nuclei isolation and shearing of chromatin

Preclearing

Immunoprecipitation

Reverse crosslinking/Purification of DNA

Quality control

Day 1

Day 2

Day 4

Day 5

Day 3–4

Day 6–7

ChIP-CHIP

ChIP-SEQ

2,000 bp

nature protocols | VOL.5 NO.3 | 2010 | 459

p

uor

G g

n ih si l

bu

P eru ta

N 010 2©

nat

ure

pro

toco

ls/

moc. e r

ut an .

ww

w / /:pt t

h

protocol

ChIP-SEQ versus ChIP-CHIPWe provide a protocol for adaptor-annealing and amplification of ChIP-DNA for ChIP-SEQ. However, the amplified DNA can also be used for ChIP-CHIP experiments26. Hence, we also provide a protocol to label and shear amplified DNA for hybridization to Affymetrix tiling arrays (see Procedure section), which uses steps that were adapted from a previous publication27.

Chromatin immunoprecipitation sequencing and ChIP-CHIP are two fundamentally different technologies to generate genome-wide DNA-binding profiles. ChIP-SEQ makes use of the rapidly evolving platforms for massively parallel DNA sequencing. Sequencing can be used as a tool to quantify the occurrence of specific DNA sequences in complex samples, and by this, iden-tify genomic regions that are enriched in those samples at high resolution. The Illumina Genome Analyzer (GA) is at present the platform that is most frequently used for ChIP-SEQ applications. Other platforms for high-throughput sequencing are Helicos HeliScope and ABI SOLiD. For our previous studies, we used the first generation of the Illumina Genome Analyzer (GA classic), which yielded 1–2 Gbp per 36 bp run. One GA run produces tens of millions of 36 bp short sequences (reads) in an eight-lane flow-cell. With the new sequencing kit (v4.0) and pipeline 1.5, a reduced error rate and an increase in read length and number of reads can be obtained: the Illumina Genome Analyzer IIx, at present, produces ~10 Gbp of high-quality sequence reads of 76 bp length per single-read flow-cell and 20 Gbp for paired-ends runs. An increase in read length to 125 bp with yield of ~95 Gbp paired-ends runs is announced for early 2010 (http://www.illumina.com/). We advise using the latest version available.

The ChIP-CHIP technique combines ChIP with microarray technology. Most commonly used are the tiling arrays that cover the complete non-repetitive genome. For Arabidopsis thaliana, two types of tiling array are available from either Affymetrix (Santa Clara, CA, USA) or Roche-Nimblegen (Madison, WI, USA). Systematic evaluation of different tiling array platforms revealed that the sample preparation is more critical for the outcome of a ChIP-CHIP experiment than the choice of the platform28.

The major differences between ChIP-SEQ and ChIP-CHIP are resolution, coverage and sensitivity (Table 1, Supplementary Fig. 1 and Supplementary Figure Legend)29. When combining both methods, ChIP-CHIP can be used as a first step for assess-ing the genome-wide binding of a TF, to get a rough estimate of

the number of binding sites and the overall level of enrichment. Then, ChIP-SEQ experiments can be carried out to characterize the binding sites in depth at better resolution.

Theoretically, if considering the basic features of the two methods, sequencing technologies can enable single-nucleotide resolution, whereas tiling arrays are limited by their degree of tiling (e.g., Affymetrix, 35 bp)29. However, the resolution also depends on the details of the experimental setup (e.g., average fragment sizes in the DNA libraries) and data analysis procedures. For this reason, it is not possible to provide general quantification of differences between ChIP-CHIP and ChIP-SEQ.

Primary ChIP-SEQ data analysisHere we focus only on the primary data analysis that starts with quality checks and ends with the determination of a set of potential binding sites characterized by well-defined binding signal statistics. We do not describe the steps that may follow and aim at struc-tural and functional characterization of the sites found, as these depend very much on the biological questions of the experiment. A summary of different data analyses steps and software packages are exemplified in Supplementary Figure 2.

Mapping of reads and quality checks. The sequence reads that are produced by the Illumina sequencing procedure have confidence scores for each base. These reads must then be aligned against the Arabidopsis genome. The standard Illumina data analysis pipeline uses the program Eland for alignment. Other software programmes have been developed for the same purpose, e.g., Maq30, SOAP31 or Bowtie32. Sequence reads usually have more sequencing errors at their 3′end, which can interfere with the mapping to the reference genome. In our analysis, we used SOAP, which can iteratively trim basepairs at the 3′end of a sequence and repeat the alignment until the sequence can be mapped or it is too short to be mapped unam-biguously to the genome. Reads that map to multiple locations in the genome are usually excluded, as they are ambiguous. Such exclusion leads to the problem that parts of the genome, including highly repetitive or recently duplicated sequences, cannot be aligned and thus cannot be assayed. Reads that cannot be mapped to the reference genome can, therefore, be a problem in ChIP-SEQ data-sets. The proportion of mapped reads uniquely located on the genome at study gives an important measure of the power of the statistical analysis, as potential binding sites will be determined

table 1 | Comparison of ChIP-SEQ and ChIP-CHIP.

chIp-seQ chIp-cHIp

Resolution High resolution; depending on the fragment sizes obtained during library generation

Lower resolution; depending on the array platform

Coverage Limited by unique alignability of reads; higher coverage possible for repetitive regions (supplementary Fig. 1); depends on the number of reads and number of regions bound29

High coverage in non-repetitive regions of the genome; repetitive regions are usually masked

Specific technical problems

Contamination with non-Arabidopsis DNA, reads with identical sequence, contamination with adapter sequences

Hybridization-related problems; efficiency of annealing depends on probe sequence, tiling is not complete (gaps)

Sensitivity (dynamic range)

Potentially greater than ChIP-CHIP Limited by hybridization-related effective range of signal-to-noise ratios

460 | VOL.5 NO.3 | 2010 | nature protocols

p

uor

G g

n ih si l

bu

P eru ta

N 010 2©

nat

ure

pro

toco

ls/

moc. e r

ut an .

ww

w / /:pt t

h

protocol

using a function regarding number of reads in the sample com-pared with the control. Reads that cannot be mapped to the genome can result mostly from (a) contamination of the sample with non-Arabidopsis or adaptor sequences or (b) low-quality reads. It is important to understand why reads cannot be mapped, in order to improve the quality of future sequencing experiments and to minimize the effect of artifacts in the analysis. Contamination of the sample can arise from DNA molecules that accidentally enter the sample during the processing steps before adaptor ligation, e.g., human DNA. ChIP-SEQ sample preparation generally requires a special working environment, e.g., a designated pre-PCR working place/sterile hood.

Another important source for sample contamination is adaptor concatamers, which remained in the sample after size selection by gel purification. The magnitude of this effect can be measured by counting the proportion of reads representing (totally or partially) the sequence of any of the adaptors used. This problem can be minimized by optimizing the ratio of sample and adaptor molecules in the ligation step of the sample preparation procedure (see Troubleshooting). The quality of the reads is also important to understand the proportion of mapped reads.

Another problem in ChIP-SEQ data are reads containing high number of consecutive adenines (poly[A] stretches). This problem is possibly due to a technical bias introduced during SOLEXA sequencing; those reads are often located on the bound-aries of the tile, in which the sequencing process takes place. The problem of poly[A] stretches seems to be less pronounced with the newer versions of kits and Genome Analyzer II.

Recent publications33,34 showed that there is a bias towards an underrepresentation of A + T-rich sequences in Solexa sequencing data (like with other Next Generation Sequencing methods). The bias is theoretically adjusted when comparing ChIP sample with the control (see below). However, it may still represent a problem in datasets or genomic regions with low enrichment in the sample compared with the control, as the Arabidopsis genome is 64% A + T-rich35. To improve the representation of A + T-rich sequences, a reduction of the temperature during the gel extraction (Step 52) has been suggested20.

The distribution of the number of reads representing the same sequence is important to detect any artifact because of PCR ampli-fication. Loss of complexity because of PCR amplification leads to a higher number of reads having the same sequence (ref. 36 and our data). This artifact can be minimized by optimizing the number of cycles in the amplification during the sample preparation pro-cedure (see Troubleshooting). We generally do not recommend any re-amplification step. The primary statistical analysis described below was designed to reduce the effects of multiple reads having the same sequence.

Measuring signal intensities and enrichment in sample versus control. Several methods for analysis of ChIP-SEQ data have been published recently (see comparison in ref. 37). The methods differ in consideration of controls and in basic statiscal approaches used to compare sample and control. Our methodology, presented here and previously14, is one out of several ways to analyze ChIP-SEQ data. We determine the enrichment for each nucleotide position in the genome separately, which potentially increases the resolution of the experiment.

After aligning the reads to the genome, we extend them directionally to recover the original average size of the DNA frag-ments that were isolated in the ChIP experiment. In an ideal ChIP-SEQ experiment, sequences originating from a truly enriched genomic region should map in a 1:1 ratio to both strands of the chromosomal DNA. However, as some sequences are represented by an artificially high number of reads because of the PCR ampli-fication (see previous section), this ratio can be distorted. In our methodology (Fig. 2), we estimate the number of hits for each nucleotide position for both strands of the DNA double helix independently (i.e., the number of extended reads covering a genomic position). Of those two numbers we take the minimum value as the score. As a result, we are demanding that the score value be supported by both strands independently.

To determine the enrichment at a given genomic site, the data of ChIP-SEQ sample and control need to be compared using appro-priate statistics. After normalization, we use the Poisson distribu-tion to estimate the enrichment (as in ref. 38) and estimate the distribution of the test statistic under the null hypothesis using permutation. This distribution can be generated by randomly per-muting the reads from the control group and sample, and there-fore destroying any relationship between the read sequence and the group, where they originally belong. The score values associated to these permutated datasets can be used to estimate the distribution of the score under the null hypothesis (of no differences between sample and control) and therefore to control false discovery rate (FDR) at the desired level.

Experimental designAntibody. The antibody can be generated against the protein of interest or against a so-called tag, to which the DNA-binding pro-tein is fused, e.g., GFP or FLAG2. We successfully used relatively easy-to-produce peptide antibodies against our proteins of interest, but ChIP of GFP-tagged TFs can also work well2. An advantage of ChIP using antibodies that are directed against the protein of inter-est is that it reduces the risk of artifacts, which may arise when dealing with tagged proteins that may differ in their protein func-tionality when compared with the endogenous protein. Another

DNA regionReadsExtended reads

Num

ber

of h

its

3′5′ 5′3′

5′ 3′3′5′ TFTF

a b

Num

ber

of h

its

Figure 2 | Estimating the minimum number of hits per nucleotide position based on SOLEXA sequence information from both DNA strands. (a) Multiple reads with the same sequence aligning to a genomic region. As the (extended) reads map only to one of the DNA strands, and there are no reads mapping to the other strands, the number of hits at any nucleotide position is zero. (b) A truly enriched genomic region is supported by multiple independent reads in both strands. The number of hits at each nucleotide position is given by the strand with the minimum number of overlapped extended reads at that position.

nature protocols | VOL.5 NO.3 | 2010 | 461

p

uor

G g

n ih si l

bu

P eru ta

N 010 2©

nat

ure

pro

toco

ls/

moc. e r

ut an .

ww

w / /:pt t

h

protocol

disadvantage of the use of tagged proteins for ChIP is that the expression lines have to be generated, preferably in the mutant background of the corresponding factor to avoid competition for binding by the endogenous protein. However, tagging increases the experimental flexibility, for instance, for the identification of tissue-specific DNA-binding sites; tagged proteins may only be expressed in a subset of tissues in which the endogenous protein is expressed when driven from a tissue-specific promoter.

Tissue. Many TFs are active in several tissues and stages of deve-lopment, and target genes may differ in a tissue-specific manner depending on the availability of cofactors. For this reason, it is important to carefully choose a tissue for ChIP experiments that is valid for the experimental question. It is also preferable to use homogenous tissue rather than a mix of different tissues or stages, as target sites that are only enriched at a particular stage/tissue may otherwise be diluted out.

If a transgenic approach is being used, we prefer to express the tagged TF of interest in its ‘natural’ expression domain and native biological context. Functionality of the tagged protein can be tested by complementation of the mutant phenotype of the gene coding for the TF.

When the level of the endogenous protein is extremely low or it is highly cell-type specific, e.g., restricted to a few cells in the meristem, it may not be possible to obtain clear enrichment in ChIP experiments. If it is not possible to increase the signal by using a more specific tissue in which the factor is expressed, e.g., using cell sorting, it can be an alternative to carry out ChIP on plants that overexpress the protein of interest outside its natural expression domain. Especially in this case (but also in general), it is crucial to confirm the binding of the protein to those target sites and their biological relevance later using independent methods, e.g., DNA-binding assays, in vivo reporter gene fusions and classical genetic approaches.

Design of controls. In order to evaluate the success of a ChIP experiment, we routinely perform qPCR on the ChIP sample using positive and negative control primer sets. Positive control primer sets can be derived from known binding sites of the protein of interest. If these are not available, candidate-binding sites can then be chosen based on other evidence, e.g., binding motifs in promoters of genes that are known to be regulated by the factor of interest, based on induction experiments followed by the analysis

of gene expression. We routinely do carry out qPCR experiments using at least two positive and two negative control primer sets to evaluate the quality of a ChIP sample. To test for enrichment, the qPCR should also be carried out on a reference sample. This can be input DNA (sonicated chromatin) or DNA that comes from a control ChIP experiment. A control ChIP experiment can either be a ChIP without specific antibody (‘mock IP’) or an IP on a mutant plant that does not express the protein of interest. As antibodies may cross react with other DNA-binding proteins than the protein of interest, we recommend considering this possibility in control design for genome-wide studies. When working with an antibody that is directed against the protein of interest, control ChIP experiments can be carried out on mutant plants that lack this protein. If an antibody against tag is being used, a control ChIP can make use of material from plants in which the tagged protein is not present. If using an inducible version of the protein (e.g., fusion to the glucocorticoid receptor), a control ChIP can be carried out with plant material using the same antibody, wherein the protein was not induced (i.e., is not present in the nucleus). In general, we recommend as a control ChIP using the same antibody on plant material in which the specific protein is not present in the nucleus. Yet the material should come from a similar type of tissue, to keep the relative levels of ‘background’ proteins comparable.

Aside from testing for enrichment, the cycle threshold (CT) values obtained with the negative control primer pairs (‘back-ground’) are also useful to easily compare relative DNA amounts across different ChIP samples. Careful primer design is absolutely crucial for the qPCR experiments. The primer efficiency should be close to 100%, as determined by template dilution series. During the sample processing steps, we include several qPCR experiments to check whether the enrichment was maintained. In principle, enrichment of positive control compared with negative control genes can be roughly estimated by comparing CT thresholds for the different primer pairs within a ChIP sample (∆CT), if the primer efficiency is nearly 100%. We still recommend parallel processing of enriched samples and control samples to reduce technical variation.

The choice of controls is also crucial in the design of genome-wide ChIP-SEQ experiments. We usually use ChIP on plant material that represents the same organ type but that does not express the protein of interest (e.g., a mutant or, in case of an antibody against a tag, wild-type plants) as a control in ChIP-SEQ experiments.

MaterIalsREAGENTS

Plant material: fresh tissue (e.g., inflorescence material or seedlings), 0.7–0.9 gCrude antiserum or purified antibodies (1–2 µg per ChIP experiment)Preferably corresponding pre-immuneserumQiaquick PCR purification kit (Qiagen, cat. no. 28104)MinElute PCR purification kit (Qiagen, cat. no. 28004)Qiaquick gel extraction kit (Qiagen, cat. no. 28706)Agarose MP (Invitrogen, cat. no. 15510–027)GelRed (Biotium, cat. no. 41003-0.5 ml)Sucrose (Duchefa, cat. no. S0809.5000)NaCl (Sigma, cat. no. 31434)MgCl

2 (VWR Int. BV, cat. no. 80503–016)

NaH2PO

4 (VWR Int. BV, cat. no. 1.063460500)

Na2HPO

4 (VWR Int. BV, cat. no. 1.06586.0500)

2-Methyl 2,4-pentanediol (VWR Int. BV, cat. no. 8.20819.1000)Glycine (Fisher Scientific, cat. no. 12007–0050)

••••••••••••••

Triton-X 100 (VWR Int. BV, cat. no. 1.08603.1000)β-Mercaptoethanol (Sigma, cat. no. M3148)Tween-20 (Sigma-Aldrich, cat. no. 27.434-8)HEPES (Sigma, cat. no. H3375)Sarkosyl (Sigma, cat. no. L5125)EDTA (VWR Int.BV, cat. no. 1.08603.1000)ZnSO

4 (VWR Int. BV, cat. no. EM-ZX0105-1)

SDS (Invitrogen, cat. no. 15525-017)Tris Base (Biomol, cat. no. 08003)Ethanol (Merck, cat. no. 1.00983.2500)Proteinase K (Roche, cat. no. 03 115 836 001)Sodium acetate (Merck, cat. no. 6268.1000)Glycogen (Invitrogen, cat. no. 10814-010)Complete protease inhibitor cocktail (Roche, cat. no. 04 693 116 001)Protein-A agarose beads (Santa Cruz, cat. no. sc-2001)Illumina adaptors and PCR primers (Illumina, cat. no. IP-102-1001 or PE-400-1001)

••••••••••••••••

462 | VOL.5 NO.3 | 2010 | nature protocols

p

uor

G g

n ih si l

bu

P eru ta

N 010 2©

nat

ure

pro

toco

ls/

moc. e r

ut an .

ww

w / /:pt t

h

protocolEnd-It DNA end repair kit (Epicentre, cat. no. ER0720)100-mM dNTP stock (NEB, cat. no. N0446S)Klenow enzyme (NEB, cat. no. M0212s)T4 DNA ligase with 10× buffer (Promega, cat. no. M1801) crItIcal If using the enzymes included in the Illumina ChIP-SEQ kit instead of the ones from separate suppliers as suggested here, we strongly recommend using the 10× T4-ligase buffer indicated in our protocol instead of the 2× buffer provided by Illumina, which is associated with a different timing of the ligation reaction. crItIcal If the adaptors39 are ordered separately from a company other than Illumina, make sure to include the appropri-ate modifications during oligo synthesis. The adaptors should be annealed using standard procedures before the ligation reaction.Zero blunt TOPO PCR cloning kit for sequencing (Invitrogen, cat. no. K287520) and LB medium, ampicillin and PCR, and Sanger sequencing reagentsBiotin-11-ddATP (NEN, cat. no. NEL548001EA)DNAse I (Invitrogen, cat. no. 18068-015)One-Phor-All Buffer Plus (GE Healthcare)Terminal transferase (Roche, cat. no. 03 333 566 001)

EQUIPMENTSafelock tubes 1.5 ml (Eppendorf, cat. no. 0030 120.086)Safelock tubes 2 ml (Eppendorf, cat. no. 0030 120.094)Microcentrifuge tubes low adhesion (BIOPLASTICS, 1.5ml: cat. no. 174030, 2 ml: cat. no. 174035)DNA Lo-bind tubes 0.5 ml (Eppendorf, cat. no. 022431005)Standard PCR tubesOptically clear PCR tubesStandard vacuum pump (e.g., KNF VDE0530) and Exsiccator (Kartell)Probe sonicator (e.g., MSE Soniprep 150)Mortar and pestleRotating device for tubesStandard lab centrifuges for 2ml tubes and 50 ml tubesStandard PCR machine (e.g., Eppendorf Mastercycler, Eppendorf)Realtime PCR machine (e.g., Bio-Rad iQ5, Bio-Rad)Standard Nanodrop machine (Thermoscientific)Qubit Fluorometer (Invitrogen)BioAnalyzer DNA 1000 chip (Invitrogen)ChIP-SEQ data analysis: For the computational analyses it is advised to have access to computing clusters. The exact hardware requirements depend mainly on the number of reads in the datasets and on the required precision of FDR estimation. The results published by Kaufmann et al.14 were obtained under Scientific Linux CERN SLC release 4.7 on the Dual-Core Intel Xeon 5160/Quad-Core Intel Xeon E5345/HP ProLiant BL2x220c G5 5-node cluster (reef.man.poznan.pl).An R package for the ChIP-SEQ data analysis is available upon request (J M Muino, K Kaufmann, R van Ham, G Angenent and P Krajewski; manuscript in preparation).

REAGENT SETUPSodium phosphate buffer pH 7 (1 M stock) Mix 57.7 ml 1 M disodium hydro-gen phospahte (Na

2HPO

4) and 42.3 ml 1 M sodium dihydrogen phosphate

(NaH2PO

4). Sterile filtrate and store at room temperature.

••••

••••

•••

••••••••••••••

HEPES pH 7.5 (0.5 M stock) Dissolve 5.958 g HEPES in 35 ml distilled water, adjust to pH 7.5 using 1 M NaOH, fill up with distilled water to 50 ml. Sterile filtrate and store at 8 °C.MC buffer (fresh) Add 3.423 g of sucrose, 1 ml sodium phosphate buffer pH 7 (1 M stock), 1 ml NaCl (5 M stock) together and fill with distilled water to a final volume of 100 ml. Store on ice until use.Final concentrations: 10 mM sodium phosphate, pH 7, 50 mM NaCl and 0.1 M sucrose.M1 buffer (fresh) Add 250 µl sodium phosphate buffer pH 7 (1 M stock), 500 µl NaCl (5 M stock), 3.2 ml 2-methyl 2,4-pentanediol and 17.7 µl β-mercaptoethanol together and fill with distilled water to a final volume of 25 ml. Dissolve ½ tablet of protease inhibitor cocktail in the buffer. Store on ice until use.Final concentrations: 10 mM sodium phosphate, pH 7, 0.1 M NaCl, 1 M 2-methyl 2,4-pentanediol, 10 mM β-mercaptoethanol and complete protease inhibitor cocktail.M2 buffer (fresh) Add 250 µl sodium phosphate buffer pH 7 (1 M stock), 500 µl NaCl (5 M stock), 3.2 ml 2-methyl 2,4-pentanediol, 250 µl MgCl

2

(1 M stock), 625 µl Triton-X (20% stock) and 17.7 µl β-mercaptoethanol together and fill with distilled water to a final volume of 25 ml. Dissolve ½ tablet of protease inhibitor cocktail in the buffer. Store on ice until use.Final concentrations: like M1 buffer with 10 mM MgCl

2 and 0.5% Triton X-100.

M3 buffer (fresh) Add 250 µl sodium phosphate buffer pH 7 (1 M stock), 500 µl NaCl (5 M stock) and 17.7 µl β-mercaptoethanol together and fill with distilled water to a final volume of 25 ml. Dissolve ½ tablet of protease inhibitor cocktail in the buffer. Store on ice until use.Final concentrations: like M1 buffer without 2-methyl 2,4-pentanediol.Sonic buffer (store frozen in aliquots at − 20 °C) Add 500 µl sodium phosphate buffer pH 7 (1 M stock), 1 ml NaCl (5 M stock), 250 mg Sarkosyl and 1 ml EDTA (0.5 M stock) in a total volume of 50 ml (fill up with distilled water). Sterile filtrate and make aliquots for storage. Add fresh: complete protease inhibitor cocktail (from 25× stock). Final concentrations: 10 mM sodium phosphate, pH 7, 0.1 M NaCl, 0.5% Sarkosyl, 10 mM EDTA.IP buffer (store frozen in aliquots at − 20 °C) Add 5 ml HEPES buffer pH 7.5 (0.5 M stock), 1.5 ml NaCl (5 M stock), 250 µl MgCl

2 (1 M stock), 5 µl

ZnSO4 (100 mM stock), 2.5 ml Triton-X (20% stock) and 250 µl SDS (10%

stock) in a total volume of 50 ml (fill up with distilled water). Sterile filtrate and make aliquots for storage. Final concentrations: 50 mM HEPES, pH 7.5, 150 mM NaCl, 5mM MgCl

2, 10 µM ZnSO

4, 1% Triton X-100 and 0.05% SDS.

Elution buffer (store frozen in aliquots at − 20 °C) Add 4 ml glycine (1.25 M stock), 5 ml NaCl (5 M stock), 25 µl Tween-20 and adjust to pH 2.8 (HCl) in a total volume of 50 ml (fill up with distilled water). Final concentrations: 0.1 M glycine, 0.5 M NaCl, 0.05% Tween-20, pH 2.8.Working solution Illumina adaptors Dilute the Illumina adaptors 1:10 or 1:50 with ultrapure milliQ water.Working solution dATP (1 mM dATP stock) Dilute 100 mM dATP in 10mM Tris HCl pH 8.5 or Qiagen elution buffer.Working solution Illumina primers Dilute Illumina primers 1:1 with ultrapure milliQ water.

proceDuretissue collection and fixation ● tIMInG 1.5 h1| Collect plant material, 0.8 g of plant tissue into a 50-ml tube. Keep harvested tissue cooled on ice. Collecting should take no longer than 40 min to maximum 1 h to prevent withering. Stability of the plant material is dependent on the tissue, e.g., on its size and water content.

2| Fixate (crosslink) tissue in 50 ml tube in MC buffer (25 ml) with 1% formaldehyde (676 µl of 37% stock) on ice under vacuum. Release vacuum after 15 min, mix tissue and re-apply vacuum for another 14 min (total fixation time 30 min). crItIcal step It should be made sure that the tissue is submerged in the buffer and not floating on top of the surface.

3| Fixation is stopped by adding 2.5 ml glycine (1.25 M stock), mix well by inverting the tube several times and apply vacuum for another 2 min.

nature protocols | VOL.5 NO.3 | 2010 | 463

p

uor

G g

n ih si l

bu

P eru ta

N 010 2©

nat

ure

pro

toco

ls/

moc. e r

ut an .

ww

w / /:pt t

h

protocol

4| Wash thrice with MC buffer (25 ml each wash). Later, ‘dry’ the plant tissue on tissue paper. Transfer the tissue to a new 50 ml tube and quick-freeze in liquid nitrogen. pause poInt The tissue can be stored at − 80 °C at this point.

nuclei isolation and shearing of chromatin ● tIMInG 2 h5| Grind the tissue thoroughly to complete homogeneity in a mortar with liquid nitrogen.

6| Transfer the powder to a 50 ml tube with 20 ml M1 buffer.

7| Filter slurry through a mesh (55 µm) and collect in a 50 ml tube on ice. Wash the mesh with an additional 5 ml of M1 buffer to quantitatively collect all nuclei in the filtrate.! cautIon Buffers M1 to M3 contain β-mercaptoethanol—work in the fume hood.

8| Centrifuge the filtrate at 1,000g for 20 min at 4 °C.

9| Wash the nuclear pellet five times with 5 ml of M2 buffer, centrifuge for 10 min at 4 °C at 1,000g.

10| Wash once with 5 ml of M3 buffer and centrifuge as in Step 9.

11| Resuspend the crude nuclear pellet in 1 ml Sonic buffer and transfer to a 2 ml safelock Eppendorf tube.

12| Sonicate the chromatin with a probe sonicator for three cycles of 15 s pulses of half maximal power and 45 s cooling between pulses (the tube is placed on ice the whole time; time of sonication depends on sonicator and time of fixation). Leave on ice for 3–4 min, and invert the tube several times in-between.

13| Centrifuge the suspension in a microcentrifuge at top speed for 10 min at 4 °C.

14| Transfer the supernatant to a new 2 ml safelock tube and spin again for 8 min at top speed at 4 °C. Transfer the supernatant into a new tube.

preclearing ● tIMInG 2 h15| Transfer the supernatant into a new 2 ml tube containing an equal volume of IP buffer. Set aside 120 µl to serve as ‘input DNA’ control and to check the degree of sonication by loading the sample on a 1.5 % (wt/vol) agarose gel poured with 1/10,000 (vol/vol) GelRed after reverse crosslinking and purification (Step 29 onwards) (see Anticipated Results).

16| To preclear chromatin, add 5 µl pre-immune serum and incubate 50 min at 4 °C on a rotating wheel. If pre-immuneserum is not available, we recommend two-rounds of centrifugation (Step 17), followed by a prolonged incubation with protein-A beads (Step 19: 1 h 30 min instead of 1 h).

17| Centrifuge at 16,000g for 10 min at 4 °C to get rid of debris and transfer the supernatant into a new 2 ml safelock Eppendorf tube.

18| Prepare the protein-A agarose beads (25% slurry) for Steps 19 and 24 by making an aliquot of the required amount ( + 100 µl extra) into a new 1.5 ml low-adhesion tube. Wash the beads by adding 1 ml IP buffer and inverting the tube several times. Spin at 3,800g for 3 min and discard 1 ml of the supernatant. Repeat the wash step by adding again 1 ml of IP buffer, mixing and centrifugation. The final volume of beads should be the same as the original volume.

19| Add 80 µl protein-A agarose beads from Step 18 (25% slurry) to the chromatin from Step 17 using a cut pipette tip and incubate for 1 h at 4 °C on a rotating wheel.

20| Centrifuge in a microcentrifuge for 5 min at 3,800g at 4 °C. Transfer the supernatant into a new 2 ml safelock tube and spin for 10 min at maximum speed, 4 °C.

Immunoprecipitation ● tIMInG 3.5 h21| Divide the supernatant equally between two 2 ml Eppendorf safe-lock tubes, one that serves as the IP-sample and the other as negative control. If ChIP on mutant plant tissue is used as negative control instead of a non-specific serum control, both samples can be used for IP (‘technical’ replicates of the IP).

464 | VOL.5 NO.3 | 2010 | nature protocols

p

uor

G g

n ih si l

bu

P eru ta

N 010 2©

nat

ure

pro

toco

ls/

moc. e r

ut an .

ww

w / /:pt t

h

protocol

22| Add to the IP-sample 3 µl antiserum or X µl purified antibody (depending on concentration antibody, ~1–2 µg) and to the optional negative control 3 µl pre-immune serum. Incubate for 1 h at 4 °C on a rotating wheel.

23| Centrifuge at 16,000g for 5 min at 4 °C to get rid of debris and transfer the supernatant to a new 2 ml safelock Eppendorf tube.

24| Add 40 µl protein-A beads (25% slurry) to each tube using a wide-bore pipette tip and incubate for 50 min at 4 °C on a rotating wheel.

25| Centrifuge for 5 min at 3,800g at 4 °C. Transfer supernatant to a new tube without disturbing the pellet (beads). Discard the supernatant and continue with the beads.

26| Wash the beads five times with 1 ml IP buffer for 8 min on rotating wheel at room temperature. After adding the IP buffer for the first time, transfer the beads + buffer to a 1.5 ml low-binding tube using wide-bore tips. Recover the beads between the washes by centrifugating for 2 min at 3,800g at room temperature. crItIcal step After each centrifugation step, the IP buffer should be removed nearly completely from the beads without disintegrating the bead pellet. Work in a clean environment (pre-PCR or sterile hood) to reduce contamination.

27| Elute the protein–DNA complexes from the beads by adding 100 µl cold elution buffer, incubate for 1 min at 37 °C while shaking rigorously, and centrifuge for 1 min at maximum speed. Transfer supernatant (eluate) to 1.5 ml safelock tube. Add 50 µl of 1 M Tris pH 9 to neutralize. Repeat this step twice more, with the last elution step incubate for 4 min at 37 °C (final combined volume of eluate will become 450 µl).

28| Spin the eluate at top speed for 2 min. Transfer eluate (protein–DNA complexes) to new 2 ml safelock tube without disintegrating any pellet that may have been formed (residual beads).

reverse crosslinking/Dna purification ● tIMInG 40–50 h including two overnight incubation steps29| Add proteinase K (to final concentration of 0.5 mg ml − 1; 11.25 µl of 20 mg ml − 1 stock) to the eluate and incubate overnight at 37 °C. Continue from Step 29 also with ‘input DNA’ sample (from Step 15; add 330 µl TE to reach the same volume as the other samples).

30| Add the second aliquot of proteinase K (same amount as in Step 29) and incubate at 65 °C for at least 6 h to reverse crosslinking.

31| Precipitate DNA with 2.5 vol 100 % ethanol, 1/10 vol of 3 M NaAc pH 5.4 and 1 µl glycogen overnight at − 20 °C.

32| Centrifuge for 30 min at maximum speed at 4 °C.

33| Resuspend DNA in 100 µl milliQ water.

34| Purify DNA using Qiagen PCR purification columns, elute in 30–35 µl of elution buffer. Elute the DNA into low-adhesion tubes. pause poInt Purified ChIP sample can be stored at − 20 °C for up to 4 months.

carry out qpcr to test for enrichment for positive and negative control primers35| Dilute part of the DNA sample 1:5 in milliQ water (per reaction: 1 µl of original sample to 5 µl final volume).

36| Prepare a master mix of forward and reverse primers at a final concentration of 1 µM per primer.

37| Combine the following in an optically clear PCR tube:

Primer master mix 5 µl

milliQ water 2.5 µl

Diluted DNA sample 5 µl

iQ SYBR Green Supermix 12.5 µl

38| Setup and run the following PCR program (Bio-Rad iQ5) (see Anticipated Results): incubate for 3 min at 95 °C, followed by a total of 40 cycles of (15 s at 95 °C, 1 min at 60 °C).? troublesHootInG

nature protocols | VOL.5 NO.3 | 2010 | 465

p

uor

G g

n ih si l

bu

P eru ta

N 010 2©

nat

ure

pro

toco

ls/

moc. e r

ut an .

ww

w / /:pt t

h

protocol

chIp-seQ sample preparation ● tIMInG 3 d including two overnight steps39| end repair: Combine and mix the following components in a low-adhesion tube, and incubate at room temperature (18–20 °C) for 45 min.

ChIP DNA (usually one complete chromatin immunoprecipitation (ChIP) sample)

28 µl

10× End-repair buffer 5 µl

2.5 mM dNTP mix 5 µl

10 mM ATP 5 µl

Sterile water to bring reaction volume to 49 µl X µl

End-repair enzyme mix 1 µl

Total reaction volume 50 µl

40| Purify DNA from Step 39 on a QIAquick column from the QIAquick PCR purification kit and protocol. Elute in 34 µl of elution buffer. crItIcal step Use low-adhesion tubes to collect the eluate at all purification steps (Steps 40, 42, 44, 48).

41| a-tailing: Combine and mix the following components in a PCR tube and incubate for 30 min at 37 °C (e.g., in PCR machine).

DNA from Step 40 34 µl

Klenow buffer = NEB buffer 2 5 µl

1 mM dATP 10 µl

Klenow fragment (3′–5′ exonuclease minus) 1 µl

Total reaction volume 50 µl

42| Purify DNA from Step 41 on a QIAquick MinElute column using the MinElute PCR purification kit. Elute in 17 µl of elution buffer.

43| adaptor ligation: Combine and mix the following components in a Lo-bind tube and incubate for 20–22 h at 16 °C (e.g., in PCR machine).

DNA from Step 42 16.5 µl

10× DNA ligase buffer 2 µl

Illumina adaptor oligo mix (1:10–1:50) 1 µl

T4 DNA ligase 0.5 µl

Total reaction volume 20 µl

crItIcal step Use diluted adaptors. Optimal concentration of adaptors may need to be determined empirically. Long incubation time is essential for efficient adaptor ligation.

44| Purify using the Qiaquick PCR Purification Kit and protocol. Elute in 30 µl elution buffer.

45| pcr (steps 45–47): Set up the master mix and combine in a PCR tube. This is the master mix per reaction (always prepare for one reaction more than you need):

5× Phusion buffer 10 µl

2.5 mM dNTP 4 µl

Phusion enzyme 0.8 µl

dH2O 3.2 µl

Total reaction volume 18 µl

466 | VOL.5 NO.3 | 2010 | nature protocols

p

uor

G g

n ih si l

bu

P eru ta

N 010 2©

nat

ure

pro

toco

ls/

moc. e r

ut an .

ww

w / /:pt t

h

protocol

46| Then, combine in a PCR tube:

DNA from Step 44 30 µl

Master mix (see above) 18 µl

PCR primer 1.1 1 µl

PCR primer 2.1 1 µl

Total reaction volume 50 µl

47| Amplify using the following PCR protocol: (a) 30 s at 98 °C; (b) (10 s at 98 °C, 30 s at 65 °C and 30 s at 72 °C) 18 or 20 cycles total; (c) 5 min at 72 °C; (d) hold at 4 °C. crItIcal step Optimal number of cycles depends on the amount of DNA in the original ChIP sample and can be varied between 16 and 20 cycles. The optimal number of cycles should give the best enrichment and yield a sufficient amount of DNA.

48| Purify on one QIAquick MinElute column using the MinElute PCR purification kit and protocol. Elute in 20 µl EB.

49| Test for enrichment using qPCR with the same conditions as in Steps 35–38. Use 0.25 µl of DNA sample per reaction. (see Anticipated Results)? troublesHootInG

50| Gel purification (steps 50–52): Run PCR product (from Step 48) on a 2% agarose gel poured with 1/10,000 (vol/vol) GelRed.? troublesHootInG

51| Excise a large band in the range 250–500 bp with a clean scalpel. Be sure to take photos of the gel both before and after the slice is excised. Estimate and record the median product size. crItIcal step If multiple libraries are being prepared, then care must be taken to avoid cross-contamination by using one gel per sample. crItIcal step The resolution of the experiment can be enhanced by reducing the average fragment size of the libraries. If the amount of DNA is high enough, a smaller range of fragments (e.g., 250–400 bp) can be excised. Use the same range of fragments for sample and control IP.

52| Purify the DNA from the agarose slice using a QIAGEN Gel extraction kit. Elute in 30 µl of elution buffer. crItIcal step In case of problems with strong under-representation of A + T rich sequences, carry out the solubilization of gel slices in chaotrophic buffer at room temperature instead of 50 °C (ref. 20). (see Anticipated Results)

53| Quality control (steps 53–55): Test for enrichment using qPCR with the same conditions as in Steps 35–38.

54| Measure the DNA concentration and size by picogreen assay (Invitrogen Q-bit) and Agilent BioAnalyzer DNA 1000 chip.

55| Use 2 µl of DNA from Step 52 for standard ligation into a Topo vector. Transform into Escherichia coli (e.g., TOP10) and sequence 10–20 clones by Sanger sequencing. Of the cloned inserts, more than 30% should correspond to Arabidopsis genomic DNA.? troublesHootInG

56| Dilute library to 10 nM. Prepare flow-cell using 4 pM final concentration following the Illumina protocol. pause poInt The 10 nM library can be stored at − 20 °C for several months (see Anticipated Results).

partial digest and labelling of amplified Dna for hybridization to affymetrix tiling arrays (chIp-cHIp) ● tIMInG 3 h57| partial digest (fragmentation)(steps 57–59): Combine the following components in a PCR tube:

One-Phor-All 10× Buffer 3 µl

CoCl2 1.5 µl

DNA from Step 48 (or 52) 0.5–2 µg

Nuclease-free water X µl

Total volume 30 µl

! cautIon CoCl2 is toxic. Wear protective gloves.

nature protocols | VOL.5 NO.3 | 2010 | 467

p

uor

G g

n ih si l

bu

P eru ta

N 010 2©

nat

ure

pro

toco

ls/

moc. e r

ut an .

ww

w / /:pt t

h

protocol

58| Incubate for 2 min at 22 °C in a PCR machine, then stop the reaction immediately by heating at 80 °C for 15 min. crItIcal step To avoid overdigestion, test different incubation times on input DNA before working with the ChIP sample to determine the optimal incubation time (10 s to 4 min).

59| Load 2 µl on a 2% agarose gel poured with 1/10,000 (vol/vol) GelRed. A faint smear should be visible with the majority of DNA < 100 bp.? troublesHootInG

60| end-labelling with biotin (steps 60–61): Combine the following components in a PCR tube:

Nuclease-free water X µl

CoCl2 3.5 µl

5× TdT buffer 10 µl

Biotin–ddATP 2.0 µl

DNA from Step 58 28 µl

Terminale transferase 0.1 µl

Total volume 50 µl

61| Incubate at 37 °C for 1 h. Then, incubate for 10 min at 95 °C. The sample is now ready for hybridization to Affymetrix tiling arrays. pause poInt The labelled DNA can be stored at − 20 °C.

chIp-seQ data analysis I: mapping of reads, read extension and normalization62| Locate unique positions of the sequenced reads depending of the DNA strand in the Arabidopsis genome (use the latest TAIR version; ftp://ftp.arabidopsis.org/home/tair/Genes/), allowing two to three mismatches. Reads that map to several positions should be discarded. Several programs are available for this30,31 (see supplementary Fig. 1). SOAP31 has the advantage that non-mapped sequenced reads are iteratively trimmed at 3′ end until they are uniquely mapped or a minimum read size is reached, in which case they are discarded. In this way, a high number of reads are successfully mapped (see Anticipated Results).

63| Extend directionally to n bp the location of mapped reads depending of the strand; n should be the average size of the DNA fragments obtained by gel purification (Step 51).

64| Count the number of extended mapped reads that cover each nucleotide position of the Arabidopsis genome depending of the strand. For each nucleotide position i, take the score xi as the minimum of counts for both strands.

65| Normalize xi depending on the total genome-wide score using the transformation

yx a

xi

i

ii

∑ ,

where a is an arbitrary number (e.g., 300 ( = fragment length) ×106) to adjust to a convenient scale.

chIp-seQ data analysis II: comparison of sample and control66| Obtain scores yi for the sample (yi,1) and for the control (yi,2). Normalize the distribution of yi,1 values to have the same mean and variance as yi,2 values.

67| For each nucleotide position, use the test based on the Poisson distribution

tz

ki

zik

k

yii

= −−

=

∑121

2

0

1 e!

,

where zi,2 is the maximum between the score yi,2 and the coverage of the control. This coverage is obtained multiplying the number of extended reads by their length and divided by the mapable genome length.

68| Define candidate DNA-binding sites as continuous genomic regions with ti values lower than 0.05. Candidate DNA-binding sites that are not separated by more than 100 bp are considered as single sites.

468 | VOL.5 NO.3 | 2010 | nature protocols

p

uor

G g

n ih si l

bu

P eru ta

N 010 2©

nat

ure

pro

toco

ls/

moc. e r

ut an .

ww

w / /:pt t

h

protocol

69| For each candidate binding site, calculate its score mi (named ‘peak score’) as the maximum of ti values (see Anticipated Results).

chIp-seQ data analysis III: calculating significance thresholds70| Permute the unique read locations from the sample and control.

71| Repeat Steps 63–69 to obtain the distribution of a new set of mi values generated after permutation.

72| Repeat Steps 70–71 until a high number (eg. 50,000) of generated mi values are obtained.

73| Obtain the critical score for the test statistics from the distribution of the generated mi values to control FDR at the desired level (see Anticipated Results).

● tIMInGchIp (3 d + )Steps 1–4, Tissue collection and fixation: 1.5 hSteps 5–14, Nuclei isolation and shearing of chromatin: 2 hSteps 15–20, Preclearing: 2 hSteps 21–28, Immunoprecipitation: 3.5 hSteps 29–34, Reverse crosslinking/DNA purification: ~40–50 h

chIp-seQ sample preparation (3 d + )Steps 39–43, End repair/A-tailing/ligation setup: 2–2.5 h + overnight stepSteps 44–52, Amplification/Gel purification: 4 hSteps 53-55, Quality control: 2 d including overnight stepSteps 57–61, Partial digest and labeling of amplified DNA for hybridization to Affymetrix tiling arrays (ChIP-CHIP): 3 h

? troublesHootInGTroubleshooting advice can be found in table 2.

table 2 | Troubleshooting table.

step problem possible reason solution

38 Low enrichment in the ChIP-DNA compared with control (low signal-to-noise ratio)

Wrong choice of positive control target site

Test for enrichment on multiple known or candidate DNA target sites, choose two or more positive control regions from those, test multiple primer pairs for each positive control gene to identify the one that covers the true binding site

qPCR primer design Use primer software to make primers, e.g. Primer3. Calculate primer efficiency from qPCR template dilution series

Insufficient removal of insoluble material or washing

Do not transfer any insoluble material to new tubes after centrifugation in Steps 13, 14, 20, 23 and 28. If necessary, repeat the centrifugation. Remove as much as possible of the IP buffer in the washes (Step 26)

Antibody does not work well Test antibody in western blot analysis and/or immuno-precipitation of the protein of interest. The protein of interest should be detectable in western blots. The anti-body should be able to immunoprecipitate the protein of interest. If this fails, produce another antibody against your protein or use another tag

Low amount of protein expression in the harvested tissue

Check the presence of mRNA and protein using stand-ard molecular biological methods. Use different tissue or more specific parts of the tissue to increase protein content

(continued)

nature protocols | VOL.5 NO.3 | 2010 | 469

p

uor

G g

n ih si l

bu

P eru ta

N 010 2©

nat

ure

pro

toco

ls/

moc. e r

ut an .

ww

w / /:pt t

h

protocol

antIcIpateD resultssonication (step 15)An efficient sonication is crucial for a good enrichment and resolution in ChIP experiments. Using the appropriate settings for sonication, the chromatin fragments should have sizes between 0.2 and 1.2 kb on a 1.5 % agarose gel.

evaluation of enrichment by qpcr (steps 35–38, 49 and 53)

The quality of the ChIP sample should be evaluated at different steps during the sample processing procedure using real-time qPCR. The CT value, which is the cycle of the PCR program at which sufficient specific PCR product has been generated to de-tect a fluorescent signal, is inversely proportional to the original relative amount of the DNA molecule of interest. Thus, qPCR can be used to identify relative amounts and enrichment of specific DNA molecules or genomic regions even in low-abundant samples. After the chromatin IP (Step 35), the amount of DNA is usually very low ( < 10 ng). For this reason, the range of CT values obtained by qPCR usually lies between 27 and 33 in our qPCR setup. Higher CT values than those or a complete failure of the amplificiation would indicate that the amount of DNA is probably too low for ChIP-SEQ sample preparation. We strongly recommend using at least two positive and two negative control (reference) primer pairs, which should amplify genomic regions that are known to be bound or not bound by the TF of interest, respectively. qPCR using these primers is standardly carried out on a ChIP sample and a negative control DNA sample (e.g., ChIP on a mutant of the TF of interest, ‘mock’ IP or input-DNA). The qPCR results of the negative control DNA sample should be used to normalize differences in CT values obtained for positive and negative control primer pairs (∆∆CT).

An 8–16-fold enrichment of positive over negative controls is generally an indication of a high-quality ChIP sample. Using primers with nearly 100% efficiency, this enrichment should be clear even within the ChIP sample (∆CT; i.e., without need to compare with the negative control sample, e.g., mutant of blind IP).

table 2 | Troubleshooting table (continued).

step problem possible reason solution

Fixation time too long or too short

Test different fixation times for the tissue used in the ChIP (10–45 min). Use the fixation time that gives the most (specific) enrichment compared with a mock-IP

Loss of protein during nuclei preparation

Check the presence of the protein of interest in nuclei extracts versus total extracts. If the protein was lost during the nuclei isolation procedure, try alternative protocols for this step, e.g., in refs. 23,40

Sonication conditions not appropriate

Test different numbers of cycles in sonication step (Step 12), e.g., vary from two to five cycles. Use the minimum fixation time that results in the majority of fragments between 0.2 and 1.2 kb

49 and 50 No or very low amount of PCR product in ChIP-SEQ sample preparation

Sub-optimal reaction conditions for pre-amplification sample preparation steps (ligation and so on)

Contamination of non-Arabidopsis sequences in amplified ChIP-SEQ DNA sample

Contaminated working environment/equipment

Work in a clean pre-PCR environment at Steps 26–47 (flow cabinet). Use Nitril gloves and avoid touching the inside of the lid of the tubes at all steps

55 High contamination of adapter sequences in the ChIP-SEQ DNA sample, as judged from the shotgun sequences or the SOLEXA sequencing run

Too high concentration of adaptors used for the ligation

Use higher dilutions of adapter sequences. Test different concentrations of adaptor (Step 43)

Insufficient separation of adap-tor concatamers from ChIP-DNA in agarose gel

Repeat gel purification step (Steps 50–52)

59 Loss of DNA after partial digest in ChIP-CHIP sample preparation

Incubation time for partial digest not appropriate for the given concentration of ChIP-DNA

Use shorter incubation time at Step 58. You can test several times, e.g., using input DNA in a similar concentration to the ChIP-DNA before working with the ChIP sample itself (e.g., 10 sec, 30 sec and 1 min)

470 | VOL.5 NO.3 | 2010 | nature protocols

p

uor

G g

n ih si l

bu

P eru ta

N 010 2©

nat

ure

pro

toco

ls/

moc. e r

ut an .

ww

w / /:pt t

h

protocol

Suboptimal preparation of the sample before amplification, during the creation of blunt ends and adaptor ligation, can lead to poor amplification associated with the loss of enrichment. For this reason, it is crucial to test for enrichment after amplification (Step 49). In general, the enrichment of positive control genomic regions compared with regions that are not bound by the TF should be similar compared with before amplification. In our hands, the CT values of the amplified sample are usually between 17 and 23 after amplification; thus, there is an increase in the amount of DNA compared with before the amplification as expected.

A last qPCR check should be carried out after gel purification (Step 53). Here the result should be similar to the one after amplification (Step 49). As a rule of thumb, the cycle thresholds may increase by one to three cycles, as we select a subset of a material that was generated by amplification. There should be no major loss in enrichment, unless the primer pairs were not designed around the centre of the binding site.

Ladder 4

0 50 100 150 200

0

0.01

0.02

0.03

0.04

0.05

0.06

Score distribution:

Den

sity

Under permutationOn the sample

a

c d

eScore

Num

ber

of p

eaks

0 2 4 6 8

0

15,000

25,000

Score0 2 4 6 8

0

6,000

14,000Score

0 2 4 6 8

0

4,000

8,000

Library 1

Library 3

Library 4

Score

b

13 11 17 23 29 350

0.2

0.4

0.6

7Position

200 400 600 8000

200

600

1,000

x

y

f

Err

or p

roba

bilit

y

10 15

0

0.2

0.8

Log2 (N reads same seq.)

Pro

port

ion

0 5

AfterBefore

Ladder3

500400300200

70

Figure 3 | Anticipated results of different steps in the ChIP-SEQ procedure. (a) Examples of pictures of amplified chromatin immunoprecipitation (ChIP) libraries, which were loaded onto a gel before purification. The purpose of gel purification is to select a range of sizes from 250–500 bp. The strong band of ~100 bp length corresponds to adaptor-only DNA fragments. The region that should be purified (250–500 bp) is indicated next to the gel pictures. Library 3 was prepared using a 1:10 dilution of adaptor in the ligation reaction, as advised in our protocol. Library 4 was prepared using undiluted adaptors. (b) Distributions of loge peak scores (Step 69) of three ChIP-SEQ experiments. The score distribution for library 3 is biased towards lower scores, due to lower enrichment in the original sample, although the number of mapped reads that was obtained is higher than for library 1 and 4 (table 3). (c) Error probabilities for all nucleotide positions in the 35 bp reads in a typical SOLEXA sequencing run (GAI). The error rate increases towards the 3′ end of the reads. (d) Position of poly[A] reads in the tile of a typical SOLEXA sequence run (GAI). The reads cluster at the left side of the tile, possibly owing to alignment problems of the hardware. (e) Distributions of score values derived from the original ChIP data and under permutation. (f) Example for a library that was processed using the standard Illumina protocol. As the DNA concentration after one round of amplification was too low (~300,000 reads from one run could be mapped to the genome), it had to be re-amplified. In the figure, reads representing the same sequence in SOLEXA runs before and after reamplification (see labels) are presented. The number of reads with the same sequence is elevated for the sample after reamplification, leading to a strong artefactual bias in the identification of enriched genomic regions (not shown).

nature protocols | VOL.5 NO.3 | 2010 | 471

p

uor

G g

n ih si l

bu

P eru ta

N 010 2©

nat

ure

pro

toco

ls/

moc. e r

ut an .

ww

w / /:pt t

h

protocol

Gel purification (steps 50–52)SOLEXA sequencing requires that the library DNA-inserts are not larger than ~500 bp. Also, the DNA fragments need to be separated from ligation by-products consisting only of adaptor dimers, which have a size of ~95 bp (single end sequencing adaptor) or 125 bp (paired-end adaptor). For these reasons, the amplified library needs to be gel-purified. A typical picture of gel runs is given in Figure 3a. Normally, the adaptor-only DNA fragments should give a single-sized band (Fig. 3a, library 3). If the adaptor concentration in the ligation reaction is too high (e.g., when undiluted adaptor was used), adaptor concatamers of multiple sizes can be formed which will strongly reduce the number of mapable DNA fragments in the SOLEXA sequence data (Fig. 3a,b library 4 in table 1).

Mapping of reads to the Arabidopsis genome (step 62)Each SOLEXA sequencing experiments generates several millions of reads, which are subsequently mapped to the Arabidopsis genome. The fraction of mapped reads depends on data quality, the frequency of adaptor-only reads and contamination with non-Arabidopsis DNA. The error rate normally increases towards the 3′ end of the sequences (Fig. 3c). A good library should produce more than 30% uniquely mapable reads. table 3 gives the results of data quality checks of several individual rounds of SOLEXA sequencing for four ChIP-SEQ libraries of different quality.

As can be seen from the table 3, there is a clear variation in the number of mapable reads depending on the quality of the library. In particular, library 4 has a high number of reads that contain adaptor sequences, due to a higher concentra-tion of adaptors that was used in the ligation step (thus changing the ratio of ChIP-DNA and adaptor). It is the same library as shown for gel purification (lane ‘4’, Fig. 3a). Other quality parameters are the fraction of reads that contain two or more bases that could not be labeled as A, T, C or G (also called QC reads) and the number of poly[A] sequences. Both of these parameters depend mostly not on the sample preparation, but on the SOLEXA sequencing experiment itself. Poly[A] sequences are artifacts that are introduced particularly in the borders of the tile (Fig. 3d) and are more frequently found in sequence data produced by the Genome Analyzer I (table 3).

The distribution of the number of reads representing the same sequences is important to detect any artifact owing to PCR amplification. High cycle numbers in the PCR will create a strong bias for only a small fraction of the original DNA fragments presented in the original library. A bias can be identified by estimating the number of reads representing the same sequence (Fig. 3f). In the example shown in Figure 3f, re-amplification leads to a shift in the distribution indicating a higher number of reads having the same sequence, and thus suggesting loss of complexity because of over-amplification. This can lead to strong distortions in the results. This artifact should be minimized by optimizing the number of cycles in the amplification during the sample preparation procedure (see Troubleshooting) and avoiding re-amplification.

comparison of sample and control and calculating significance thresholds (steps 66–73)For a successful ChIP-SEQ experiment, good initial enrichment of regions bound by the TF of interest is crucial. Figure 3b shows the distributions of loge peak scores (Step 69) of three independent ChIP-SEQ experiments. Library 1 and library 4 showed a similar enrichment in the control qPCR experiments (8–16-fold enrichment for two positive control genes), whereas library 3 had a clearly lower level of enrichment (~2–4-fold enrichment). However for library 4, only a small number of reads could be obtained by SOLEXA sequencing, whereas library 3 had the highest number of mapped reads (table 3). From Figure 3b, it can be seen that the higher number of sequences in library 3 cannot compensate for the initially low level of enrichment in the sample. In library 4, at least strongly enriched regions can be found with strong statistical support despite the low number of uniquely mapped sequences. Library 1 has a relatively high number of mapped sequences and good enrichment, and thus represents a good example for a successful ChIP-SEQ experiment.

table 3 | Examples of sequence reads obtained in typical ChIP-SEQ SOLEXA-sequencing experiments (each dataset was obtained from a single lane in an eight-lane flow-cell).

library 1 (GaI) library 2 (GaI) library 3 (GaII) library 4 (GaII)

Number of reads 4,334,840 4,105,326 8,367,681 8,638,004

Number of [A]×20 reads 14,074 (0.3 %) 47,906 (1.2 %) 915 (0.0001%) 840 (0.0001%)

Number of adapter readsa 273,576 (6.3%) 13,582 (0.3 %) 1,389,757 (8.3 %) 4,796,178 (27.8%)

Number of QC reads 377,836 (8.7 %) 265,027 (6.4 %) 2,277,630 (27.2 %) 2,821,913 (32.7%)

Number of uniquely mapped reads

1,649,875 (38%) 2,084,386 (51%) 3,898,721 (46.6 %) 736,082 (8.5 %)

aEstimated as reads mapping to the partial adapter sequence TCGTATGCCGTCTTCTGCTT.

472 | VOL.5 NO.3 | 2010 | nature protocols

p

uor

G g

n ih si l

bu

P eru ta

N 010 2©

nat

ure

pro

toco

ls/

moc. e r

ut an .

ww

w / /:pt t

h

protocol

After calculating scores for enrichment in sample versus control, the null distribution of the test statistics is determined using permutation of sample and control (Fig. 3e). From this, the error of the statistics can be controlled, for instance using FDR (Steps 70–73).

Note: Supplementary information is available via the HTML version of this article.

acknoWleDGMents We thank S. de Folter and F. Turck for their initial advice on the ChIP experiments and W. Busch for advice on sample processing. Some parts of the computations were done at the Poznań Supercomputing and Networking Center.

autHor contrIbutIons K.K. designed and carried out experiments and wrote the paper; J.M.M. analyzed data, developed computational tools and edited the manuscript; M.O. designed and carried out experiments; L.F. supervised experiments; P.K. supervised the data analysis and development of computational tools; and G.C.A. supervised the experiments and analysis and edited the manuscript.

Published online at http://www.natureprotocols.com/. Reprints and permissions information is available online at http://npg.nature.com/ reprintsandpermissions/.

1. Kuo, M.H. & Allis, C.D. In vivo cross-linking and immunoprecipitation for studying dynamic protein:DNA associations in a chromatin environment. Methods 19, 425–433 (1999).

2. de Folter, S., Urbanus, S.L., van Zuijlen, L.G., Kaufmann, K. & Angenent, G.C. Tagging of MADS domain proteins for chromatin immunoprecipitation. BMC Plant Biol. 7, 47 (2007).

3. Margueron, R., Trojer, P. & Reinberg, D. The key to development: interpreting the histone code? Curr. Opin. Genet. Dev. 15, 163–176 (2005).

4. Pfluger, J. & Wagner, D. Histone modifications and dynamic regulation of genome accessibility in plants. Curr. Opin. Plant Biol. 10, 645–652 (2007).

5. Tariq, M. & Paszkowski, J. DNA and histone methylation in plants. Trends Genet. 20, 244–251 (2004).

6. Kim, T.H. & Ren, B. Genome-wide analysis of protein-DNA interactions. Annu. Rev. Genomics Hum. Genet. 7, 81–102 (2006).

7. Massie, C.E. & Mills, I.G. Chromatin immunoprecipitation (ChIP) methodology and readouts. Methods Mol. Biol. 505, 123–137 (2009).

8. Benhamed, M. et al. Genome-scale Arabidopsis promoter array identifies targets of the histone acetyltransferase GCN5. Plant J. 56, 493–504 (2008).

9. Bernatavichute, Y.V., Zhang, X., Cokus, S., Pellegrini, M. & Jacobsen, S.E. Genome-wide association of histone H3 lysine nine methylation with CHG DNA methylation in Arabidopsis thaliana. PLoS ONE 3, e3156 (2008).

10. Turck, F. et al. Arabidopsis TFL2/LHP1 specifically associates with genes marked by trimethylation of histone H3 lysine 27. PLoS Genet. 3, e86 (2007).

11. Lee, J. et al. Analysis of transcription factor HY5 genomic binding sites revealed its hierarchical role in light regulation of development. Plant Cell 19, 731–749 (2007).

12. Oh, E. et al. Genome-wide analysis of genes targeted by PHYTOCHROME INTERACTING FACTOR 3-LIKE5 during seed germination in Arabidopsis. Plant Cell 21, 403–419 (2009).

13. Morohashi, K. & Grotewold, E. A systems approach reveals regulatory circuitry for Arabidopsis trichome initiation by the GL3 and GL1 selectors. PLoS Genet. 5, e1000396 (2009).

14. Kaufmann, K. et al. Target genes of the MADS transcription factor SEPALLATA3: integration of developmental and hormonal pathways in the Arabidopsis flower. PLoS Biol. 7, e1000090 (2009).

15. Gomez-Mena, C., de Folter, S., Costa, M.M., Angenent, G.C. & Sablowski, R. Transcriptional program controlled by the floral homeotic gene AGAMOUS during early organogenesis. Development 132, 429–438 (2005).

16. Ito, T., Takahashi, N., Shimura, Y. & Okada, K. A serine/threonine protein kinase gene isolated by an in vivo binding procedure using the Arabidopsis floral homeotic gene product, AGAMOUS. Plant Cell Physiol. 38, 248–258 (1997).

17. Leibfried, A. et al. WUSCHEL controls meristem function by direct regulation of cytokinin-inducible response regulators. Nature 438, 1172–1175 (2005).

18. Wang, H., Tang, W., Zhu, C. & Perry, S.E. A chromatin immunoprecipitation (ChIP) approach to isolate genes regulated by AGL15, a MADS domain protein that preferentially accumulates in embryos. Plant J. 32, 831–843 (2002).

19. Rozowsky, J. et al. PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nat. Biotechnol. 27, 66–75 (2009).

20. Quail, M.A. et al. A large genome center′s improvements to the Illumina sequencing system. Nat. Methods 5, 1005–1010 (2008).

21. Nowak, D.E., Tian, B. & Brasier, A.R. Two-step cross-linking method for identification of NF-kappaB gene network by chromatin immunoprecipitation. Biotechniques 39, 715–725 (2005).

22. Zeng, P.Y., Vakoc, C.R., Chen, Z.C., Blobel, G.A. & Berger, S.L. In vivo dual cross-linking for identification of indirect DNA-associated proteins by chromatin immunoprecipitation. Biotechniques 41, 694,696, 698 (2006).

23. Turck, F., Zhou, A. & Somssich, I.E. Stimulus-dependent, promoter-specific binding of transcription factor WRKY1 to Its native promoter and the defense-related gene PcPR1-1 in Parsley. Plant Cell 16, 2573–2585 (2004).

24. Bowler, C. et al. Chromatin techniques for plant cells. Plant J. 39, 776–789 (2004).

25. Xie, Z. & Grotewold, E. Serial ChIP as a tool to investigate the co-localization or exclusion of proteins on plant genes. Plant Methods 4, 25 (2008).

26. Schmidt, D., Stark, R., Wilson, M.D., Brown, G.D. & Odom, D.T. Genome-scale validation of deep-sequencing libraries. PLoS ONE 3, e3713 (2008).

27. Katou, Y., Kaneshiro, K., Aburatani, H. & Shirahige, K. Genomic approach for the understanding of dynamic aspect of chromosome behavior. Methods Enzymol 409, 389–410 (2006).

28. Johnson, D.S. et al. Systematic evaluation of variability in ChIP-chip experiments using predefined DNA targets. Genome Res. 18, 393–403 (2008).

29. Park, P.J. ChIP-seq: advantages and challenges of a maturing technology. Nat. Rev. Genet. 10, 669–680 (2009).

30. Li, H., Ruan, J. & Durbin, R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 18, 1851–1858 (2008).

31. Li, R., Li, Y., Kristiansen, K. & Wang, J. SOAP: short oligonucleotide alignment program. Bioinformatics 24, 713–714 (2008).

32. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).

33. Hillier, L.W. et al. Whole-genome sequencing and variant discovery in C. elegans. Nat. Methods 5, 183–188 (2008).

34. Dohm, J.C., Lottaz, C., Borodina, T. & Himmelbauer, H. Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res. 36, e105 (2008).

35. Arabidopis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815 (2000).

36. Kozarewa, I. et al. Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G+C)-biased genomes. Nat. Methods 6, 291–295 (2009).

37. Spyrou, C., Stark, R., Lynch, A.G. & Tavare, S. BayesPeak: Bayesian analysis of ChIP-seq data. BMC Bioinformatics 10, 299 (2009).

38. Robertson, G. et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat. Methods 4, 651–657 (2007).

39. Bentley, D.R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008).

40. Ni, Z. et al. Altered circadian rhythms regulate growth vigour in hybrids and allopolyploids. Nature 457, 327–331 (2009).