

Breakthrough Technologies

The G-Box Transcriptional Regulatory Code in Arabidopsis 1[OPEN]

Daphne Ezer,a Samuel J.K. Shepherd,a Anna Brestovitsky,a Patrick Dickinson,a Sandra Cortijo,a

Varodom Charoensawan,a,b Mathew S. Box,a Surojit Biswas,a,2 Katja E. Jaeger,a and Philip A. Wiggea,c,3

a Sainsbury Laboratory, University of Cambridge, Cambridge CB2 1LR, United Kingdom
b Department of Biochemistry, Faculty of Science, and Integrative Computational BioScience Center, Mahidol University, Bangkok 10400, Thailand
c Department of Plant Sciences, University of Cambridge, Cambridge CB2 3EA, United Kingdom

ORCID IDs: 0000-0002-1685-6909 (D.E.); 0000-0003-3291-6729 (S.C.); 0000-0002-2199-4126 (V.C.); 0000-0001-7995-5384 (M.S.B.); 0000-0002-4153-7328 (K.E.J.); 0000-0003-4822-361X (P.A.W.).

Plants have significantly more transcription factor (TF) families than animals and fungi, and plant TF families tend to contain more genes; these expansions are linked to adaptation to environmental stressors. Many TF family members bind to similar or identical sequence motifs, such as G-boxes (CACGTG), so it is difficult to predict regulatory relationships. We determined that the flanking sequences near G-boxes help determine in vitro specificity but that this is insufficient to predict the transcription pattern of genes near G-boxes. Therefore, we constructed a gene regulatory network that identifies the set of bZIPs and bHLHs that are most predictive of the expression of genes downstream of perfect G-boxes. This network accurately predicts transcriptional patterns and reconstructs known regulatory subnetworks. Finally, we present Ara-BOX-cis (araboxcis.org), a Web site that provides interactive visualizations of the G-box regulatory network, a useful resource for generating predictions for gene regulatory relations.

Many transcription factors (TFs) are part of large families, with many members binding to highly overlapping sets of binding sites. Therefore, any change in a TF's concentration or its spatial or temporal distribution may result in unexpected cross talk within the gene regulatory network; a TF may inadvertently affect the expression of gene targets of its other family members. This cross talk phenomenon within TF families appears universal within the eukaryotes and has been described in yeast (Gordân et al., 2013), plants (Nuruzzaman et al., 2013), and mammalian cancer cell lines (Altman et al., 2015). Therefore, understanding the mechanisms that govern how TFs within large TF families regulate their target genes is an important challenge.

Understanding gene regulatory networks in plants is further complicated by the fact that plants have more and larger TF families than animals or fungi (Shiu et al., 2005) and even larger families than would be expected through whole-genome duplication alone (Shiu et al., 2005; Rensing, 2014). For instance, the highly conserved G-box motif (CACGTG) is bound by TFs in the basic helix-loop-helix (bHLH) and basic Leu zipper (bZIP) families in organisms ranging from yeasts to humans. However, in plants, these two TF families have expanded massively; for example, the bHLH family is now the second largest TF family in plants, with over 100 members in Arabidopsis (Arabidopsis thaliana), despite having arisen from an estimated 14 founder genes in ancient land plants (Carretero-Paulet et al., 2010). At least 80 of these bHLHs have the precise amino acid composition in their DNA-binding domain required to bind to G-box elements (Heim et al., 2003; Carretero-Paulet et al., 2010). Many of the other bHLHs may bind to E-box elements, which retain four nucleotides of the G-box core (ACGT or CANNTG). The bZIP family has similarly expanded from four founder genes to over 70 (Corrêa et al., 2008).

Moreover, both bHLHs and bZIPs bind to DNA as either homodimers or heterodimers, further increasing the possible regulatory combinations. Even non-G-box-binding bHLHs and other HLHs can indirectly regulate G-box-regulated genes by competing with G-box-binding bHLHs for dimerization partners (Hao et al., 2012; Oh et al., 2014). Furthermore, bZIPs and bHLHs can act antagonistically, competing for binding to the same sites, such as the competition for binding to targets shared between the bHLH PHYTOCHROME INTERACTING FACTOR3 (PIF3) and the bZIP ELONGATED HYPOCOTYL5 (Toledo-Ortiz et al., 2014).

Understanding how bHLHs and bZIPs regulate their downstream targets is critical, as these TFs have been identified as key regulators of growth (Choi and Oh, 2016), temperature response (Jung et al., 2016), immune response (Alves et al., 2013), metal homeostasis (Long et al., 2010), drought response (Nakashima et al., 2014), and light signaling (Gangappa et al., 2013; Jung et al., 2016), pathways that would serve as promising targets for improving crop yield and environmental resilience. Much of our knowledge of G-box specificity comes from studies in yeast and mammalian systems, and these results suggest that bZIPs and bHLHs are affected by the local shape of the DNA flanking the binding site (Gordân et al., 2013; Zhou et al., 2015) and epigenetics (Fernandez et al., 2003; Swarnalatha et al., 2012). Since these families are much larger in plants than in the other organisms studied, it is unclear whether these mechanisms are sufficient to explain TF-to-binding site interactions in Arabidopsis. This has been a hindrance to the plant research community, as there have been many cases where a G-box has been identified as critical for the regulation of a gene of interest but researchers were unable to distinguish between the many possible candidate TFs that might regulate that gene (Hwang et al., 1998; Kim and Guiltinan, 1999; Kobayashi et al., 2012; Liu et al., 2016a, 2016b).

In this study, we identify a set of approximately 2,000 genes that are highly likely to be regulated by an upstream G-box motif in Arabidopsis seedlings, and we ask how bHLHs and bZIPs might regulate these genes. We find that, while the sequences immediately flanking the G-boxes enable us to predict the in vitro binding of bZIP homodimers, the flanking sequences are not sufficient to explain the gene expression profiles of the downstream genes. Therefore, we constructed a network to identify the bHLHs and bZIPs whose expression was most predictive of the expression profiles of genes downstream of perfect G-box motifs. This network can predict a number of well-established subnetworks, and the entire network can be browsed through an interactive network visualization Web site we have developed, called Ara-BOX-cis. This study provides useful resources for those seeking to understand how their gene of interest with a nearby G-box may be regulated and to identify the likely responsible bHLH or bZIP TFs. These tools may help researchers make testable predictions of gene regulation interactions.

1 This work was supported by the Biotechnology and Biology Research Council (RG80054 to P.A.W.). P.A.W.'s laboratory is supported by a Fellowship from the Gatsby Foundation (GAT3273/GLB). Funding for open access charge: Gatsby Foundation/GAT3273/GLB.

2 Current address: Harvard Medical School, Division of Medical Sciences, Boston, MA 02115.

3 Address correspondence to [email protected].

The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Philip A. Wigge ([email protected]).

D.E. conceived of the idea of the project, did most of the analysis, wrote the software, and wrote most of the article; S.J.K.S. assisted with the analysis for Figure 2; A.B. designed and conducted the MNase-seq experiment; A.B., P.D., S.C., V.C., and M.S.B. designed and conducted RNA-seq experiments; S.C., V.C., and S.B. helped map the RNA-seq data; K.E.J. conducted the ChIP-seq experiments; P.A.W. supervised and contributed to writing the article.

[OPEN] Articles can be viewed without a subscription.
www.plantphysiol.org/cgi/doi/10.1104/pp.17.01086

Plant Physiology, October 2017, Vol. 175, pp. 628–640, www.plantphysiol.org © 2017 American Society of Plant Biologists. All Rights Reserved.

RESULTS

bZIP Homodimers Only Bind to a Subset of Perfect G-Boxes in Vitro

We sought to identify a high-confidence set of genes likely to be regulated by G-box-binding TFs in Arabidopsis seedlings as our reference set. For our high-confidence list of genes, we selected those genes containing a perfect G-box sequence (CACGTG) within 500 bp upstream of their transcription start site (TSS) and that were expressed in a substantial number of samples of 7-d-old seedlings at 22°C (see "Selection of Genes" in "Materials and Methods"; Supplemental Fig. S1A). A total of 2,146 genes fit these criteria (Supplemental Table S1).

This high-confidence list of binding sites is very conservative, since it is known that bHLHs and bZIPs may bind to imperfect G-boxes with one or more deviations from the core CACGTG sequence (Song et al., 2016) and they may have binding sites that are farther away from the TSS. First, we decided to focus on perfect G-boxes because relaxing the criteria to include imperfect G-boxes (up to one base deviating from the CACGTG pattern) within 500 bp of the TSS would have expanded the list of possible bHLH and bZIP targets to 11,238 genes, about one-third of the genes in the Arabidopsis genome (Supplemental Fig. S2), and a large proportion of these sites may not be bound in vivo. Next, the 500-bp threshold was adopted because there is enrichment for G-box sequences within 500 bp of the TSS (Supplemental Fig. S1B).
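The sequence part of this selection (a perfect CACGTG within 500 bp upstream of the TSS, or a relaxed match with at most one deviating base) can be illustrated with a minimal sketch. The function names and the assumption that the upstream sequence is given 5' to 3' and ends at the TSS are hypothetical and are not taken from the authors' software; the expression filter is not modeled here.

```python
GBOX = "CACGTG"  # perfect G-box; the motif is its own reverse complement


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_gboxes(upstream, max_mismatches=0):
    """Return (offset, site, n_mismatches) for every 6-mer within
    `max_mismatches` substitutions of CACGTG.

    Because CACGTG is palindromic, scanning one strand is enough:
    a variant on the opposite strand is also a variant on this one.
    """
    upstream = upstream.upper()
    hits = []
    for i in range(len(upstream) - len(GBOX) + 1):
        site = upstream[i:i + len(GBOX)]
        d = mismatches(site, GBOX)
        if d <= max_mismatches:
            hits.append((i, site, d))
    return hits


def has_perfect_gbox(upstream, window=500):
    """True if a perfect G-box lies in the `window` bases nearest the TSS.

    Assumption: `upstream` is written 5'->3' and its last base is
    immediately upstream of the TSS.
    """
    return bool(find_gboxes(upstream[-window:], max_mismatches=0))
```

Calling `find_gboxes(seq, max_mismatches=1)` gives the relaxed, imperfect G-box set discussed above, which is why that criterion admits so many more promoters.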

Interestingly, the ACGTG and CNCGTG motifs also were enriched in the 500 bp upstream of the TSS, while the CANGTG motif had a more even distribution (Supplemental Fig. S3), even though this motif occurred at a higher frequency compared with the other imperfect G-box-binding sites (Supplemental Fig. S2). The ACGTG and CNCGTG motifs contain the core E-box motif (ACGT) or BZR motif (CGTG) and, therefore, are more likely to represent real TF-binding sites.

Next, we sought to determine whether bHLHs and bZIPs are capable of binding to all perfect G-boxes or only a subset of these sites. To resolve this, we compared the in vitro binding of bZIPs, bHLHs, and BZR (a TF that is not a bHLH or bZIP but whose binding sites are CGTG, a subset of the G-box; He et al., 2005). We identified binding sites within our set of promoters containing perfect G-boxes using previously published DAP-seq data (O'Malley et al., 2016; Supplemental Table S2 for bZIP and bHLH binding near perfect G-boxes and Supplemental Table S3 for all genes with bZIP or bHLH binding). Both methylated and unmethylated DNA sequences were used in this analysis, and methylation had minimal effects on the binding behavior (Supplemental Fig. S4).

These results indicate that there is a set of bZIPs that only bind to a subset of perfect G-boxes (Fig. 1A, binding pattern 1), including known strong G-box binders such as G-BOX BINDING FACTOR3 and bZIP16. This indicates that TFs can distinguish between perfect G-boxes based on the genomic context of the binding site. Fewer bHLH TFs are represented in the DAP-seq data, and they often have fewer identified binding events; those that have a fraction of reads in peaks of less than 5 are indicated in orange in Figure 1A.

Both BIM2 (a bHLH) and BZR exhibit more ubiquitous binding to G-boxes than the bZIPs. However, there is still a cluster of G-boxes in our set that had very little binding in this data set (labeled binding pattern 2).


It may be that DAP-seq is only capable of identifying the very strongest binding sites; for instance, it may be that all perfect G-boxes can be bound by bHLHs or bZIPs but that many of these perfect G-boxes are under our detection threshold. If this were the case, we might expect that DAP-seq would not be able to detect bHLH or bZIP binding to promoters that contain only a single imperfect G-box. However, DAP-seq experiments do show bZIP binding to many E-boxes (ACGTG) but very little binding to the CANGTG motif (Supplemental Fig. S5), consistent with our previous observation that ACGTG and CNCGTG (but not CANGTG) motifs are enriched in promoters (Supplemental Fig. S3).

Next, we sought to determine if these sites are likely to be functional in vivo. In particular, we wished to determine whether both the promoters with binding pattern 1 and binding pattern 2 were bound in vivo, even though there were fewer TFs bound to the latter set of promoters in vitro. Protein-DNA-binding information can be inferred from micrococcal nuclease (MNase) data, since proteins that bind DNA protect it and perturb the MNase digestion pattern. In this way, plotting the fragment sizes over genomic position relative to a binding site reveals a characteristic V shape if the binding site is occupied (for further description of V-plots, see Supplemental Fig. S6; Henikoff et al., 2011; Zentner and Henikoff, 2012). For both G-boxes in binding pattern 1 and binding pattern 2, we see a strong V pattern suggestive of in vivo occupancy (Fig. 1B). This indicates that G-boxes from both binding patterns may have regulatory roles in vivo, but from these data it is not possible to identify the specific TF.
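To make the V-plot idea concrete, the following is a minimal sketch of how the underlying counts could be tabulated: each MNase fragment contributes one point at (midpoint offset from the binding site, fragment length). The function name, the half-open coordinate convention, and the `max_offset` parameter are assumptions chosen for illustration and do not describe the pipeline used in the study.

```python
from collections import Counter


def vplot_counts(fragments, site_center, max_offset=100):
    """Tabulate fragments as (midpoint offset, fragment length) pairs.

    fragments   -- iterable of (start, end) coordinates, assumed half-open
    site_center -- coordinate of the centre of the binding site
    max_offset  -- keep only fragments whose midpoint lies this close
    """
    counts = Counter()
    for start, end in fragments:
        length = end - start
        midpoint = start + length // 2
        offset = midpoint - site_center
        if abs(offset) <= max_offset:
            counts[(offset, length)] += 1
    return counts
```

Plotting these counts with offset on the x axis and length on the y axis would show the V shape when short protected fragments are centred on the site and longer fragments are displaced to either side.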

Figure 1. While TFs can differentiate between perfect G-box motifs given the genomic context, multiple TFs are capable of binding to the same G-box sequence. A, DAP-seq experiments from O'Malley et al. (2016) illustrate which bZIPs, bHLHs, and BZRs are found within 500 bp of the TSS of genes with perfect G-boxes in their promoters. Binding events are drawn in red. This heat map includes both methylated and unmethylated samples, so most TFs appear in two rows. Note that many bHLH TFs had very low numbers of peaks in the DAP-seq experiment (fraction of reads in peak < 5), and these are highlighted in orange. The two largest clusters include binding pattern 1, which has a large amount of bZIP binding, and binding pattern 2, which has almost no binding in this DAP-seq experiment. B, MNase-seq data taken at ZT0 (immediately before dawn) illustrates that there is TF binding in vivo in promoters with binding pattern 1 and binding pattern 2, as indicated by the prominent V shape.


In summary, the DAP-seq data suggest that bZIPs are capable of distinguishing between perfect G-boxes based on genomic context, independent of DNA accessibility or the presence of binding partners. However, many of the G-boxes without strong bZIP binding in vitro are still bound by TFs in vivo, suggesting that other factors might interact with bZIP TFs and influence their binding.

Sequences Flanking the G-Box Can Predict Whether bZIP Homodimers Are Capable of Binding to the Sequence

In vitro bZIP TFs are able to distinguish between different perfect G-boxes, suggesting that additional information is present in the genomic context. We tested whether it is possible to predict bZIP binding based on the sequences flanking the G-box (Fig. 2A). This has been shown to be a key feature of G-box binding in yeast and mammals (Gordân et al., 2013; Zhou et al., 2015), and even in plants there are specific examples of this (e.g. ABRE prefers a GC flank; Guiltinan et al., 1990). The G-boxes in binding pattern 1 have a slight preference for GA flanks, as shown by the position weight matrix (PWM; Fig. 2B), which is consistent with earlier observations of certain bZIP preferences (Williams et al., 1992); however, there are many examples of G-boxes without a GA or TC flank that are still bound by these bZIPs. In fact, 48% of G-boxes in binding pattern 1 have neither a GA prior to the G-box nor a TC after the G-box, while 16% of G-boxes in binding pattern 2 do have either a GA or a TC flanking sequence, so this is not sufficient to account for how bZIPs distinguish between G-boxes. The G-boxes that are in neither binding pattern 1 nor 2 have an enrichment for a TC flank, similar to the motif seen in binding pattern 1 (Supplemental Fig. S7).
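The two summaries used in this paragraph, the share of sites with a GA before or a TC after the G-box and the per-position base frequencies behind a PWM, can be sketched as follows. The input format (10-mers made of two flanking bases, the CACGTG core, and two more flanking bases) and both function names are assumptions made for illustration; the example sequences in the usage note are invented and are not data from the study.

```python
def flank_fraction(sites):
    """Fraction of sites with GA immediately before or TC immediately
    after the G-box. Each site is assumed to be NN + CACGTG + NN."""
    n = sum(1 for s in sites if s[:2] == "GA" or s[-2:] == "TC")
    return n / len(sites)


def position_frequencies(sites):
    """Per-position base frequencies (the table from which a PWM is
    usually derived). All sites are assumed to have equal length."""
    matrix = []
    for i in range(len(sites[0])):
        column = {base: 0 for base in "ACGT"}
        for s in sites:
            column[s[i]] += 1
        matrix.append({b: c / len(sites) for b, c in column.items()})
    return matrix
```

For example, `flank_fraction(["GACACGTGTC", "TTCACGTGAA"])` counts the first site (GA before, TC after) but not the second. A frequency table like this treats positions independently, which is one reason a model that captures interactions between bases can be more informative.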

We then made a predictive model that implicitly handled interaction between bases, using a random forest machine learning approach. Random forests could predict whether bZIPs could bind to a particular G-box with an average accuracy of 88% (Fig. 2A), much higher than would be expected by chance (for experimental design, see Supplemental Fig. S8). The most informative features in the model were actually the presence of a G or a C two bases away from the G-box (Fig. 2C), even though these bases appear less informative than the bases immediately flanking the G-box in the PWM model (Fig. 2B).

Figure 2. Flanking sequence distinguishes DAP-seq binding patterns with high accuracy. A, Random forest models using G-box flanking sequence and local DNA shape can distinguish the DAP-seq binding patterns with approximately 88% accuracy. B, The sequences with many bZIPs bound seem to be enriched in CA and TC flanking sequences in the PWM model. C, The sequence-based random forest model suggests that the two bases flanking the G-box are most critical in predicting bZIP binding. D, The DNA shape features that are most important for distinguishing bZIP bound G-boxes are the minor groove width (MGW) and the helical turn (HelT) pattern.
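A sequence-based classifier of this kind needs the flanking bases turned into numeric features. One common choice, shown here purely as an assumption about how such input might be prepared, is one-hot encoding of each flanking position, indexed by its distance from the G-box. The function name and feature labels are hypothetical, and the random forest training itself is not shown.

```python
BASES = "ACGT"


def one_hot_flanks(left_flank, right_flank):
    """One-hot encode the bases flanking a G-box.

    left_flank  -- bases 5' of the G-box, written 5'->3'
    right_flank -- bases 3' of the G-box, written 5'->3'
    Feature names give the distance from the G-box: 'pos-1_A' is an A
    directly before the core, 'pos+2_C' is a C two bases after it.
    """
    features = {}
    for k, base in enumerate(reversed(left_flank), start=1):
        for b in BASES:
            features[f"pos-{k}_{b}"] = int(base == b)
    for k, base in enumerate(right_flank, start=1):
        for b in BASES:
            features[f"pos+{k}_{b}"] = int(base == b)
    return features
```

With this labeling, a feature such as `pos-2_G` corresponds to the "G two bases away from the G-box" signal described above, so feature importances from a trained model could be read directly against flank position.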

In addition, we considered a model that incorporated shape features such as the width of the DNA sequence or the ability to twist or roll (Chiu et al., 2016); this performed equally well as the DNA sequence model (Fig. 2A). We found that helix twist and the minor groove width were the most important features (Fig. 2D), as opposed to roll and propeller twist. In yeast, minor groove width was found to play a role in determining the overall binding affinity of a G-box, but propeller twist could be used to distinguish between the binding of different G-box-binding TFs (Gordân et al., 2013). In mammals, the roll feature could be used to predict the binding affinity of Max, a TF that binds to the ACGT core sequence but prefers CACGTG, the complete G-box (Zhou et al., 2015). This suggests that TFs from different species may prefer distinctive DNA shape features. Previously, it had been shown that ABI5 binding (a bZIP TF) was best predicted by homotypic clusters of binding sites (O'Malley et al., 2016), a feature that is common in active promoters of a range of species (Ezer et al., 2014a, 2014b). However, we observe only a slight increase in the frequency of homotypic clusters within 1,000 bp of the TSS for genes in binding pattern 1 compared with binding pattern 2, even when we expand the analysis to include imperfect G-boxes (Supplemental Fig. S9).

The Gene Expression Network Predicts bHLH and bZIP Regulatory Targets

Even though we can predict whether bZIP homodimers are likely to bind to a perfect G-box in the DAP-seq experiment using the flanking DNA sequences, the DAP-seq data do not fully explain how bZIPs and bHLHs regulate genes downstream of G-boxes in vivo. First, it is unclear which TFs are binding to the G-boxes in binding pattern 2 in vivo. Second, even if a TF is capable of binding to a motif, this does not indicate that this TF is necessarily influencing the gene expression of the target gene. For instance, a TF may bind to a DNA sequence under many conditions, but the TF may only be active/functional under specific conditions. The converse also might be true: a transiently bound TF may still influence the expression of a nearby gene via the hit-and-run mechanism (Para et al., 2014). In our case, we have seen that many bZIPs are capable of binding to the same subset of G-boxes, yet it is unlikely that all these binding events are biologically relevant. There may be temporal or spatial differences in the concentration of cofactors or in the chromatin near these G-boxes, and these bZIPs have very different gene expression patterns from one another (Fig. 3A), so they may still regulate different subsets of genes.

Therefore, we constructed a gene regulatory network to determine the best set of bHLHs and bZIPs able to predict the expression of our set of genes downstream of perfect G-boxes. Compared with other coexpression networks, we would expect to see enrichment for direct TF-gene regulatory interactions, since we have prior knowledge that these TFs have the potential to bind and regulate this set of genes. The network was constructed based on a large set of RNA sequencing (RNA-seq) time-course data summarized in Supplemental Table S4. These samples include Arabidopsis strains with loss-of-function mutations and overexpressors of genes involved in the circadian clock (lux-4 and elf3-1; SRA: PRJNA384110), temperature and red light signaling (phyABCDE; SRA: PRJNA341458), histone remodeling (dek3, DEK3-OX, hos1-3, and arp6; SRA: PRJNA406930, PRJNA403781, and PRJNA406870), and sugar processing (ss4-1; SRA: PRJNA403781) as well as two wild-type strains (Columbia-0 [Col-0] and Landsberg erecta [Ler]; SRA: PRJNA341458 and PRJNA384110). In most cases, gene expression data were collected in at least two different temperatures, usually 22°C and 27°C, and sampled at eight or more time points throughout a 24-h period. These RNA-seq data sets were selected primarily because they perturbed systems that were regulated by G-boxes, including temperature and light responses and the circadian clock. Furthermore, the temperatures fall within the ambient range, so the plants should not be under stress. In total, there were 229 RNA-seq samples utilized in the analysis. One RNA-seq time course (for eight time points of pif4-101 [SRA: PRJNA403781] at 22°C and 27°C) is not used in this portion of the analysis but is introduced later to help validate the network structure. All RNA-seq time-course experiments were conducted in house under the same conditions. Some of the RNA-seq time courses have been published previously (Jung et al., 2016; Ezer et al., 2017), but the ss4-1, hos1-3, and pif4-101 time courses are presented here for the first time.

Clustering of the data revealed that almost all of the genes near G-boxes were expressed in a diurnal gene expression pattern (for Col-0 and Ler only, see Fig. 3B; for complete clustering, see Supplemental Fig. S10; for a clustering of bHLH and bZIP TFs in Col-0 and Ler, see Supplemental Fig. S11A). In particular, there are two large clusters of genes that were most highly expressed in the late night and dawn (light green and dark green) and another large cluster of genes that was expressed sharply 1 h after dawn (tan); this rapid induction of gene expression at dawn is referred to as the morning peak, which has been conjectured to play a role in resetting the circadian clock in response to light (Michael et al., 2008). There is a similar pattern of expression among bHLH and bZIPs as there is for genes near perfect G-boxes, although there is a slightly higher proportion of TFs expressed at late night (Supplemental Fig. S11B).

From these extensive gene expression data, a net-work was inferred using an ensemble machine learningapproach by averaging the rank of edges predictedfrom a random forest approach (Genie3; Huynh-Thuet al., 2010), a regression approach (TIGRESS; Haury

632 Plant Physiol. Vol. 175, 2017

Ezer et al.

www.plantphysiol.orgon June 17, 2020 - Published by Downloaded from Copyright © 2017 American Society of Plant Biologists. All rights reserved.

Page 6: The G-Box Transcriptional Regulatory - Plant physiology · Breakthrough Technologies The G-Box Transcriptional Regulatory Code in Arabidopsis1[OPEN] Daphne Ezer,a Samuel J.K. Shepherd,a

et al., 2012), and a mutual information approach (CLR; Madar et al., 2010; Supplemental Fig. S12). Such a network inference strategy was shown to be particularly strong in a crowd-sourced network inference challenge (Marbach et al., 2012), and we found that all three methods consistently predicted a similar set of edges (Supplemental Fig. S13). To determine if the network structure would be consistent across different temperatures, a network was constructed using only 22°C data and another was constructed using only 27°C data, using Genie3. Although the top 10,000 edges were largely consistent between the two networks, the strength of these associations varied quite substantially across the two different temperatures (Supplemental Fig. S14; Supplemental Table S5).

The resulting gene expression network (using the entire set of RNA-seq data) naturally reconstructs the diurnal cycle when drawn with a force-directed layout and with the positions of TF nodes (indicated by dark blue circles) restricted to a small inner circle (Fig. 3C). In order to further probe the core structure of the network, we visualized the TF-TF interactions in the network, drawing each TF as a pie chart illustrating the proportion of its predicted targets that fall into each cluster (Fig. 3D). It is important to note that if two TFs were both predicted to target the same gene, they would have been pulled together by the force-directed layout in Figure 3C but not in Figure 3D, unless it was also predicted that one TF directly regulates the other TF. In such a depiction, the day/night cycle structure of the network is present but less clearly visible. Instead, the most striking feature is that three strong clusters of nodes emerge: one tight group of dawn-expressed genes and two separate groups of genes that are expressed primarily at night. In contrast, the daytime genes appear more dispersed. Interestingly, the TFs that link the two clusters of night genes are PIFs: PIF4, PIF3, PIF5, and PIF6. PIFs have emerged as key G-box-binding TFs, since they integrate environmental signals to control morphogenesis in the late night (Nozue et al., 2011; Zhang et al., 2013; Leivar and Monte, 2014; Pfeiffer et al., 2014). These results suggest that PIFs may have a key role in the structure of the plant regulatory network: linking the early night and late night/dawn transcriptional programs.

Figure 3. Gene expression pattern of genes near G-boxes used to infer the regulatory network. A, There are four bZIPs that bind to G-boxes in the binding pattern 1 cluster of Figure 1A that also are expressed with transcripts per million > 1 in seedlings. They all have very different gene expression profiles. B, Overall, most genes near G-boxes are expressed diurnally, and seven clusters were identified; the labeled colors of these clusters will be used consistently throughout the remainder of the article. C, This network depicts the predicted regulatory links between bHLHs/bZIPs and genes near G-boxes. Note that the network naturally reconstructs the diurnal cycle: dawn genes, night genes, and day genes all cluster with their own kind. The colors are the same as in B, except that TFs are shown in dark blue. D, To condense the network, each TF was depicted as a pie chart indicating the time-of-day clusters of the genes it is predicted to regulate. The dawn cluster is particularly tight, and two distinct night clusters also are present.
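The comparison between the 22°C and 27°C networks described above (agreement among the top-ranked edges) can be illustrated with a small sketch. This is not the authors' script: the function names, the Jaccard overlap measure, and the toy edge scores below are assumptions chosen purely for illustration.

```python
# Minimal sketch, assuming each network is a dict mapping (tf, target) -> score.
# All names and values here are hypothetical; the real analysis used Genie3
# scores and the top 10,000 edges.

def top_edges(scores, k):
    """Return the set of the k highest-scoring (tf, target) edges."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def edge_overlap(scores_a, scores_b, k):
    """Jaccard overlap between the top-k edge sets of two networks."""
    top_a = top_edges(scores_a, k)
    top_b = top_edges(scores_b, k)
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 0.0


# Toy networks standing in for the 22°C-only and 27°C-only networks.
net_22 = {("tfA", "gene1"): 0.9, ("tfA", "gene2"): 0.7,
          ("tfB", "gene1"): 0.2, ("tfC", "gene3"): 0.1}
net_27 = {("tfA", "gene1"): 0.5, ("tfA", "gene2"): 0.1,
          ("tfB", "gene1"): 0.6, ("tfC", "gene3"): 0.4}

overlap = edge_overlap(net_22, net_27, k=2)
print(f"Jaccard overlap of top-2 edges: {overlap:.3f}")
```

A set-based overlap only asks whether the same edges appear near the top of both rankings; it deliberately ignores the edge strengths, which the text reports as varying substantially between temperatures.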

If there is a group of TFs that all have very similar expression patterns, then there is a high degree of redundant information in that portion of the network. For example, since there is a substantial set of TFs that are part of the dawn cluster, these TFs convey redundant information and there would be many possible choices of TFs that could be used to accurately predict the expression of genes expressed in the dawn peak. In order to quantify how much redundancy there is in the network, we calculated how easy it is to predict the gene expression pattern of each gene near a G-box, using the gene expression of a random subset of TFs as input (Supplemental Fig. S15A). As expected, the most predictable gene expression profiles come from genes that are highly connected in the network (Supplemental Fig. S15B) or that are part of dense subnetworks representing nighttime or dawn transcriptional programs (Supplemental Fig. S15C).

The Network Can Be Used to Predict Gene Expression Patterns in a PIF4 Knockout

We next sought to determine whether our model could correctly predict gene expression patterns after a gene perturbation (i.e. whether the network is capable of predicting the gene expression arising from data that were not utilized when constructing the network). To test this, we evaluated the ability of the network to predict the gene expression pattern in a pif4-101 time course at 22°C and 27°C, which was not used in the construction of the original network. For each gene near a G-box, we trained a random forest model that uses as input the gene expression pattern of TFs that it is linked to in the network, again excluding the pif4-101 data in the training. Finally, we used this random forest model to predict the gene expression in the pif4-101 set and calculated the Pearson's correlation coefficient between the predicted and observed expression. As a control, we trained another model that uses the same number of TFs but ones that are randomly sampled instead of those predicted by the network (Supplemental Fig. S16). If the network performed no better than a random network, we would expect these two models to predict the gene expression pattern of the pif4-101 genes equally well. However, as shown in Figure 4A, the predicted network performs substantially better than random. Note that there is a large set of genes that do not have a substantial change in gene expression in the pif4-101 set; for these genes, it is easy to accurately predict the gene expression pattern in pif4-101 using a random network. Critically, the larger the perturbation caused by the pif4-101 mutant, the greater the improvement we see from the predicted network compared with the random network (Supplemental Fig. S17). If a gene has the same expression pattern in Col-0 and pif4-101, then even an over-fit random model would be likely to predict the expression of that gene accurately. However, if there was a large perturbation in gene expression, a network that captures something more fundamental about the coexpression of genes should perform better than random. Overall, this analysis indicates that the Ara-BOX-cis network can be used to predict gene expression, suggesting that its links/edges are biologically relevant.

Figure 4. The Ara-BOX-cis regulatory network is predictive. A, A time course of pif4-101 RNA-seq experiments at 22°C and 27°C was not included in training the gene regulatory network. For each gene, a random forest regression model was trained to predict its gene expression based on the gene expression of the predicted regulatory links (or a random set of the same number of bZIPs and bHLHs as a control). Both of these models were trained on the original RNA-seq data used to construct the regulatory network (i.e. not including pif4-101), but they were tested on the pif4-101 RNA-seq data set, and the Pearson's correlation coefficient was calculated. Points above the line indicate that the predicted network performed better than a random network at predicting new gene expression data. B, The Ara-BOX-cis network includes a subnetwork of phytohormone response genes with known biological interaction (see main text). Most notably, IBH1 is a bHLH that does not bind G-boxes but that can heterodimerize with PRE1 and some BEE proteins to prevent them from binding to their targets, such as the G-box in the promoters of YUCCA8 (YUC8) and IAA19. C, In addition, Ara-BOX-cis correctly identifies a known metal homeostasis subnetwork, centered on the PYE gene that is involved in the response to iron deficiency. Note that it interacts directly with BTS, and many of its predicted targets change their gene expression in pye. It is important to note that no phytohormones or metal ions were influenced directly in the RNA-seq experiments; however, both of these processes are known to be temperature and circadian clock dependent.

It has been shown previously that PIF1, PIF3, PIF4, and PIF5 often bind to the same set of promoters (Zhang et al., 2013; Pfeiffer et al., 2014). To verify that Ara-BOX-cis-predicted downstream targets of these PIFs were bound by PIFs in vivo, we searched for binding within the intergenic sequences upstream of the TSS of these genes, using previously published ChIP-seq data of PIF1, PIF3, PIF4, and PIF5 binding in 2-d-old, dark-grown seedlings (Zhang et al., 2013; Pfeiffer et al., 2014) and newly collected PIF4 ChIP-seq experiments from 11-d-old, short-day-grown seedlings at ZT4. We found that 76.5% of the 34 predicted downstream targets of PIFs were bound by at least one PIF (Supplemental Fig. S18; Supplemental Table S6). Interestingly, some of the predicted PIF targets that were confirmed by ChIP-seq come from binding pattern 1 and some come from binding pattern 2 (Supplemental Fig. S18; Supplemental Table S6). Additionally, 52.9% of these 34 genes were previously reported high-confidence gene targets of PIF4 (Oh et al., 2012). It is important to note that none of these ChIP-seq conditions match those used in our network analysis and that TF binding can be highly condition dependent (Jung et al., 2016).
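The evaluation step of the pif4-101 test (scoring each gene's predicted expression against the observed expression with Pearson's correlation, then comparing the network-based model with the random-TF control) can be sketched as follows. This covers only the scoring and comparison; the random forest training itself is omitted, and all function names and toy values are hypothetical rather than taken from the authors' code.

```python
import math

# Minimal sketch of the scoring step only. Predictions are assumed to have
# been produced elsewhere (in the article, by random forest models).


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # correlation is undefined for a constant profile
    return cov / (sd_x * sd_y)


def fraction_network_wins(observed, pred_network, pred_random):
    """Fraction of genes for which the network-based prediction correlates
    better with the held-out data than the random-TF control does."""
    wins = 0
    total = 0
    for gene, obs in observed.items():
        r_net = pearson(pred_network[gene], obs)
        r_rand = pearson(pred_random[gene], obs)
        if math.isnan(r_net) or math.isnan(r_rand):
            continue  # skip genes where either correlation is undefined
        total += 1
        if r_net > r_rand:
            wins += 1
    return wins / total if total else float("nan")


# Toy held-out expression profiles and predictions (hypothetical values).
observed = {"geneA": [1, 2, 3, 4], "geneB": [4, 3, 2, 1]}
pred_network = {"geneA": [1.1, 2.0, 2.9, 4.2], "geneB": [3.5, 3.0, 2.5, 1.0]}
pred_random = {"geneA": [2, 1, 4, 3], "geneB": [1, 2, 3, 4]}

win_fraction = fraction_network_wins(observed, pred_network, pred_random)
print(f"Network beats random control for {win_fraction:.0%} of genes")
```

In terms of Figure 4A, a gene counted as a "win" here corresponds to a point above the diagonal.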

The Network Identifies Metal Homeostasis and Phytohormone Subnetworks

Another indication of the strength of this network inference approach is our ability to identify known subnetworks involved in systems that were not perturbed directly in any of the RNA-seq experiments.

One such subnetwork is involved in phytohormone response, which is particularly interesting since it lies at the threshold between the night and dawn clusters of the gene regulation network (Fig. 4B). This is consistent with the hypothesis that hormone signaling helps regulate the dawn gene expression burst (Michael et al., 2008). Recently, it has been confirmed that LONG HYPOCOTYL IN FAR-RED1 (HFR1) directly regulates YUC8 (Hayes et al., 2017). Moreover, BANQUO1 (BNQ1) is a non-DNA-binding bHLH that can antagonistically bind ILI1-BINDING BHLH1 (IBH1), a regulatory mechanism conserved in rice (Oryza sativa; Zhang et al., 2009). Furthermore, recent work has identified that IAA19, BEE1/2/3, HBI1, IBH1, and HFR1 are all part of the same gene expression network that links hormone, light response, and cell elongation (Oh et al., 2014).

Another interesting subnetwork that was inferred by Ara-BOX-cis was the metal homeostasis network (Fig. 4C), centered on the root iron homeostasis gene POPEYE (PYE), which appears to be regulated by the master iron homeostasis regulator bHLH34 (Li et al., 2016a). The predicted downstream targets of PYE include BTS, a tightly coregulated protein that also is involved in iron homeostasis, and two direct targets of PYE, OBP3-RESPONSIVE GENE1 and ZINC TRANSPORTER1 PRECURSOR, as determined by a combination of microarray expression analysis and ChIP-on-chip (Long et al., 2010).

Nevertheless, as this is a coexpression network, the links describe the correlations between G-box-binding TF expression and the expression of genes downstream of G-boxes and do not prove causation.

Although both phytohormone response and metal homeostasis are time of day-dependent responses, neither system was specifically perturbed in any of the RNA-seq experiments used to infer the network. This indicates that the network has the potential to provide biologically relevant information to researchers who have an interest in particular G-box binders or genes near G-boxes.

Ara-BOX-cis as a Resource for Plant Biologists

To maximize the usefulness of Ara-BOX-cis to the community, we have made it available through a user-friendly Web site (www.araboxcis.org). The Ara-BOX-cis network has three interactive layouts available for helping researchers generate testable hypotheses for the genes regulating a specific G-box or a set of genes regulated by G-boxes (Supplemental Fig. S12). First, users can observe the entire network at once. In this view, they can drag around edges in order to more intuitively understand the makeup of the network as well as identify motifs that are enriched in the predicted downstream targets of certain well-connected TFs and highlight genes based on the DNA sequences flanking the G-box. In the second layout, users can search for a particular gene via its TAIR identifier and see the gene expression network centered on that gene; the gene's description and the gene expression cluster of the parents and/or children of the node in the network are shown as in Figure 4C. Finally, a set of genes can be input by their TAIR identifiers, and the Web site will tally the number of times TFs or genes with G-boxes in their promoters appear one step removed from any of these genes within the network.
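The third layout's tally (how often each TF or G-box gene appears one step away from a set of query genes) can be illustrated with a short sketch. This is a guess at the underlying logic, not the Web site's actual code (which is built with d3.js); the edge-list representation, the function name, and the toy identifiers are all assumptions.

```python
from collections import Counter

# Minimal sketch, assuming the network is a list of directed (tf, target)
# edges. Identifiers below are placeholders, not real TAIR identifiers.


def tally_neighbours(edges, query_genes):
    """Count how many times each node appears one step removed (as a parent
    or a child) from any gene in the query set."""
    query = set(query_genes)
    counts = Counter()
    for tf, target in edges:
        if target in query:
            counts[tf] += 1      # tf is a predicted parent of a query gene
        if tf in query:
            counts[target] += 1  # target is a predicted child of a query gene
    return counts


edges = [("tfA", "gene1"), ("tfA", "gene2"), ("tfB", "gene1"), ("tfB", "gene3")]
tally = tally_neighbours(edges, ["gene1", "gene2"])
for node, count in tally.most_common():
    print(node, count)
```

A node with a high count is linked to many genes in the query set, which makes it a natural first candidate when looking for a shared regulator.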

We have also made available an expanded network that includes all seedling-expressed genes that have perfect or imperfect G-boxes (up to one base varied from the core G-box motif) within 1,000 bp of the TSS. However, due to the size of this network, it was constructed using only Genie3 and may not be as reliable as the perfect G-box network.

DISCUSSION

Understanding plant gene networks is complicated by the presence of large TF families (Shiu et al., 2005), which is especially true in the case of bHLHs and bZIPs. There have been many instances where researchers have identified that a G-box was critical for the regulation of their gene of interest but have been unable to pinpoint the exact TF or group of TFs that bind and regulate the target gene (Hwang et al., 1998; Kim and Guiltinan, 1999; Kobayashi et al., 2012; Liu et al., 2016a, 2016b). In this article, we present a set of resources to help researchers generate testable hypotheses in such a circumstance. Through the analysis of TF-promoter interaction from DAP-seq data (O'Malley et al., 2016) and by observation of sequences flanking the G-box, it is possible to determine the in vitro binding specificity of bZIP homodimers. Then, by observing the position of a gene within the Ara-BOX-cis network, it may be possible to predict which TFs are likely to regulate the gene of interest.

The network provided by Ara-BOX-cis was constructed using time-course RNA-seq samples that are relevant to biological processes related to G-box functions; it is considered the best practice for plant gene regulatory network reconstruction to use gene expression samples that are most likely to make large perturbations in the subnetwork under study (Gaudinier and Brady, 2016). It also uses network inference approaches that have been found to be most effective in side-by-side comparisons of network inference algorithms (Marbach et al., 2012) and that have been used previously to successfully reconstruct the root gene expression network in Arabidopsis and legumes (Wang et al., 2013; Taylor-Teeples et al., 2015). However, it is important to note that a core assumption of these algorithms is that each sample is independent, so our network cannot predict any posttranscriptional regulation or translation rate-based time-delay effects. There are alternative algorithms that attempt to take into account the delay in TF expression and their action (Li et al., 2006; Huang et al., 2010; Zoppoli et al., 2010; Chen et al., 2014; Yalamanchili et al., 2014; Li et al., 2016b), but these are not suitable for our data, since our data are not sampled at high enough frequency and have uneven sample times (i.e. ZT1). In addition, some of these time-delay network inference algorithms cannot smoothly integrate perturbation and time-course experiments.

Although the Ara-BOX-cis network has fewer genes than other network approaches such as AraNET (Lee et al., 2015) and GeneMANIA (Mostafavi et al., 2008), Ara-BOX-cis has the benefit of identifying possible links between TFs for which there is prior information suggesting a regulatory link, which is similar to the weak-prior approach suggested by Krouk et al. (2013). All the TFs in this network either bind to G-boxes or heterodimerize with TFs that can do so, and all the other genes in the network have a perfect G-box within 500 bp of their TSS. By limiting the predictions to this set, we enrich for direct regulatory associations.

We note that Ara-BOX-cis's predicted edges may not necessarily represent direct regulatory interactions. For instance, there may be a bHLH and a gene near a G-box that are both regulated by the same non-G-box-binding TF. In such circumstances, it is likely that Ara-BOX-cis will predict an edge between this bHLH and the gene near the G-box even if there is no direct interaction. This may be the case with the network centered on PIF5: the evening complex was shown recently to regulate PIF5 and many of PIF5's predicted targets (Ezer et al., 2017; Supplemental Fig. S19). Nevertheless, the ability of Ara-BOX-cis to successfully predict the metal homeostasis and phytohormone subnetworks demonstrates that Ara-BOX-cis makes biologically relevant predictions.

There are many other conserved cis-regulatory boxes in Arabidopsis, such as C-boxes, W-boxes, and heat shock elements, and a similar approach of analyzing DAP-seq data and generating regulatory networks could be applied to these. Additionally, while Ara-BOX-cis has been trained on seedling transcriptomes, the same approach could be used to analyze transcriptional regulatory networks in other tissues, such as during floral development.

Understanding how plants regulate their genes is of profound importance. For instance, in the Green Revolution of agriculture that saved millions of lives from famine, one of the key achievements was the development of semidwarf wheat varieties. Later, it was determined that this phenotype arose from a gene variant for the TF GA REQUIRING1 (Peng et al., 1999). This has sparked interest in the development of new biotechnologies that target TFs to further improve yields and also offer resilience to environmental fluctuations (Century et al., 2008; Grotewold, 2008). Understanding how TF duplications affect plant gene regulation is critical if we hope to reverse engineer plant regulatory networks for agricultural applications.

MATERIALS AND METHODS

RNA-Seq Experiments and Data Processing

The RNA-seq experiments on the ss4-1, hos1-3, and pif4-101 lines are first presented here. These lines were all described previously: ss4-1 by Roldán et al. (2007), hos1-3 by Jung et al. (2013), and pif4-101 by Koini et al. (2009). They are T-DNA insertion mutants in the Col-0 background. Seeds were sown on one-half-strength Murashige and Skoog (MS) agar plates at pH 5.7 without Suc, stratified at 4°C in the dark for 3 d, and germinated for 24 h at 22°C. Plants were then put into Conviron PGC20 reach-in growth cabinets under 170 μmol m⁻² s⁻¹ white light in short days at either 22°C or 27°C. Seven-day-old seedlings were then sampled at various points across a 24-h time course over the diurnal cycle: ZT = 0, 1, 4, 8, 12, 16, 20, and 22 h. RNA was extracted using the MagMax-96 Total RNA Extraction Kit (Ambion; AM1830). RNA quality and integrity were assessed on the Agilent 2200 TapeStation. Library preparation was performed using 1 μg of high-integrity total RNA (RIN > 8) using the TruSeq RNA Library Preparation Kit version 2 (Illumina; RS-122-2101 and RS-122-2001), following the manufacturer's instructions. The libraries were sequenced on a HiSeq2000 using paired-end sequencing of 100 bp in length at the Beijing Genomics Institute sequencing center.

The same pipeline was used to map these sequences as described previously (Jung et al., 2016). To analyze the sequence reads, first, adapters were trimmed with Trimmomatic-0.32 (Bolger et al., 2014). Then, TopHat (Trapnell et al., 2009) was used to map to the TAIR10 annotated genome, duplicates were removed, and the read counts were normalized by genome-wide coverage. Raw counts were determined by htseq-count (Anders et al., 2015), and Cufflinks was used to calculate fragments per kilobase of transcript per million mapped reads (FPKM), which were then converted into transcripts per million. Note that this protocol for RNA-seq and data processing is the same as the one described by Jung et al. (2016) and Ezer et al. (2017).
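The final conversion step can be illustrated with the standard rescaling from fragments per kilobase per million to transcripts per million, in which each gene's value is divided by the sample total and multiplied by one million. The article does not state the exact formula used, so treating it as this standard conversion is an assumption, and the function name and toy values are hypothetical.

```python
# Minimal sketch of the standard FPKM -> TPM rescaling for one sample:
#   TPM_i = FPKM_i / sum_j(FPKM_j) * 1e6
# Whether the authors' pipeline used exactly this form is an assumption.


def fpkm_to_tpm(fpkm):
    """Convert a {gene: FPKM} dict for one sample into {gene: TPM}."""
    total = sum(fpkm.values())
    if total == 0:
        return {gene: 0.0 for gene in fpkm}  # nothing expressed in this sample
    return {gene: value / total * 1e6 for gene, value in fpkm.items()}


# Toy sample with placeholder gene names.
sample_fpkm = {"gene1": 1.0, "gene2": 3.0}
sample_tpm = fpkm_to_tpm(sample_fpkm)
print(sample_tpm)
```

Because the values in every sample sum to one million after rescaling, transcripts per million are easier to compare across samples than the unscaled values, which is convenient for a threshold such as transcripts per million > 1.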

MNase-Seq Experiments and Data Processing

Col-0 wild-type Arabidopsis (Arabidopsis thaliana) seedlings were grown on one-half-strength MS plates in short-day conditions (8 h of light/16 h of dark) at 17°C. Ten-day-old seedlings were harvested 8 h after dawn. Plant material was immediately cross-linked using 1% formaldehyde under vacuum. Nuclei were purified as described previously (Folta and Kaufman, 2006). Chromatin, extracted from 1 g of material, was resuspended in MNase digestion buffer (20 mM Tris-HCl [pH 8], 50 mM NaCl, 1 mM DTT, 0.5% Nonidet P-40, 1 mM CaCl2, 0.5 mM phenylmethylsulfonyl fluoride, and 1× protease inhibitor cocktail [Roche]) and digested with 0.4 units mL⁻¹ MNase (Sigma; N3755) for 15 min. MNase digestion was performed to obtain a mononucleosome resolution, as MNase preferentially digests nucleosome-free DNA and the linker regions, whereas sequences bound by nucleosomes are relatively protected from the digestion (Petesch and Lis, 2008). The efficiency of MNase digestion was assessed by separating samples on an agarose gel prior to in-house library preparation using the TruSeq ChIP Sample Preparation Kit (Illumina; IP-202-1012). The libraries were sequenced using paired-end 75 bp on NextSeq500 on site.

Selection of Genes

The list of bZIPs and bHLHs came from TAIR10 annotations (https://www.arabidopsis.org/browse/genefamily/bZIP.jsp and https://www.arabidopsis.org/browse/genefamily/bHLH.jsp). For the identification of genes that are likely regulated by G-boxes in seedlings, genes with transcripts per million > 1 in at least half of our primary set of RNA-seq time courses (Col-0, Ler, elf3-1, lux-4, and phyABCDE) were initially selected. The 500-bp DNA sequences upstream of their TSS were extracted from TAIR10, and the sequence CACGTG was searched for using an R script available on GitHub.
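The motif search described above can be sketched as a simple scan of each upstream sequence. The authors used an R script; the Python version below is an illustrative stand-in, and the function names, the mismatch parameter, and the toy promoter are assumptions. Allowing one mismatch mirrors the "imperfect G-box" definition used for the expanded network.

```python
# Minimal sketch of scanning a promoter for the G-box core motif.
# CACGTG is palindromic (it equals its own reverse complement), so scanning
# one strand is sufficient, including for one-mismatch variants.

GBOX = "CACGTG"


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_gboxes(promoter, max_mismatches=0):
    """Return (offset, sequence) pairs for every window within
    max_mismatches of the G-box core; offsets are 0-based from the
    start of the supplied sequence."""
    seq = promoter.upper()
    hits = []
    for i in range(len(seq) - len(GBOX) + 1):
        window = seq[i:i + len(GBOX)]
        if hamming(window, GBOX) <= max_mismatches:
            hits.append((i, window))
    return hits


# Toy upstream sequence (hypothetical, far shorter than 500 bp).
promoter = "ttCACGTGaaCACGTTgg"
perfect = find_gboxes(promoter)
imperfect = find_gboxes(promoter, max_mismatches=1)
print("perfect:", perfect)
print("perfect or imperfect:", imperfect)
```

To restrict the search to genes with a single G-box, or to count homotypic clusters, one would simply inspect the length of the returned list.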

DAP-Seq Data

The DAP-seq data were extracted from neomorph.salk.edu/PlantCistromeDB (O'Malley et al., 2016), and additional bHLH data sets (bHLH18, bHLH34, bHLH69, bHLH74, bHLH77, bHLH104, BIM3, and PIF7) were taken directly from the Gene Expression Omnibus (GSE60141). In particular, the bed files listing significant TF-binding peaks (Supplemental Table S2) were searched using bedtools intersect, and the heat map was drawn in R.

Random Forest and DNA Shape

In all cases, R's randomForest package was used with the number of trees set to 1,000. These scripts are available on GitHub, and the experimental design for each instance in which random forest was used is shown in the supplemental figures. Note that the DNA shape was calculated using the DNAshapeR package (Chiu et al., 2016).

Ara-BOX-cis

Ara-BOX-cis utilized Genie3 (Huynh-Thu et al., 2010), TIGRESS (Haury et al., 2012), and CLR (Madar et al., 2010). The ranks of each edge were averaged, a procedure suggested by Marbach et al. (2012). This was done in R, but note that all ranks that were over 20,000 were set to have a value of 20,000. The threshold for average rank used for drawing the image in Figure 3C is 5,000, and the threshold for average rank in Figure 3D is 10,000. The network was visualized and the Web site was constructed using d3.js (https://d3js.org/); in particular, the two-way tree example was used as a template for one panel of the Web site (http://bl.ocks.org/kanesee/5d6c48bffd4ea31201fb).
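The rank-averaging procedure above can be illustrated with a short sketch. The original was implemented in R; this Python version is only an approximation of the idea. Two details are assumptions not stated in the text: an edge missing from one method's output is treated as having the capped rank, and the drawing thresholds are applied as "less than or equal to". Function names and toy ranks are hypothetical.

```python
# Minimal sketch of averaging per-method edge ranks with a cap.
# Each rank table maps (tf, target) -> rank, where rank 1 is the most
# confident edge for that method.

RANK_CAP = 20_000  # ranks above this value are set to this value


def average_capped_ranks(rank_tables, cap=RANK_CAP):
    """Average the capped rank of every edge across all methods.
    Assumption: an edge absent from a table receives the cap."""
    edges = set().union(*rank_tables)
    averaged = {}
    for edge in edges:
        capped = [min(table.get(edge, cap), cap) for table in rank_tables]
        averaged[edge] = sum(capped) / len(capped)
    return averaged


def edges_below(averaged, threshold):
    """Edges whose average rank is at or below the drawing threshold."""
    return {edge for edge, rank in averaged.items() if rank <= threshold}


# Toy rank tables standing in for the Genie3, TIGRESS, and CLR outputs.
genie3 = {("tfA", "gene1"): 1, ("tfB", "gene2"): 30_000}
tigress = {("tfA", "gene1"): 3, ("tfB", "gene2"): 10}
clr = {("tfA", "gene1"): 2}

consensus = average_capped_ranks([genie3, tigress, clr])
drawn = edges_below(consensus, threshold=5_000)
print(consensus)
print(drawn)
```

Capping the ranks limits how much a single method's very poor rank can penalize an edge that the other methods place near the top.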

ChIP-Seq Experiment

gPIF4::PIF4-FLAG lines were grown for 11 d in short-day conditions (8 h of light/16 h of dark) on one-half-strength MS agar plates without Suc at 22°C and then shifted to 27°C when the lights were turned on. Samples were collected at ZT4 (4 h after lights on). As we have described previously (Jung et al., 2016; Ezer et al., 2017), 3 g of seedlings for each treatment was fixed under vacuum for 20 min in 1× PBS (10 mM PO₄³⁻, 137 mM NaCl, and 2.7 mM KCl) containing 1% formaldehyde (Sigma; F8775), and the reaction was quenched by adding Gly to a final concentration of 62 mM. Chromatin immunoprecipitation was performed as described (Jaeger et al., 2013). Sequencing libraries were prepared using the TruSeq ChIP Sample Preparation Kit (Illumina; IP-202-1024) or using the NEBNext Ultra II DNA Library Prep Kit, and samples were sequenced on Illumina HiSeq. Reads were mapped to the TAIR10 genome using a previously reported pipeline (Ezer et al., 2017) using bowtie2 (Langmead and Salzberg, 2012), with duplicate reads removed and indexing using samtools (Li et al., 2009). ChIP-seq results were visualized with the Integrative Genomics Viewer (IGV, 2013).

Code Availability

The Ara-BOX-cis network is available on araboxcis.org. All other code used to analyze these data is available on GitHub (https://github.com/ezer/GboxRscripts). Data structures/tables to assist in running the code efficiently are available at https://github.com/ezer/Gboxdata. The contents of the Data folder from the Gboxdata repository can be added to the home folder of the GboxRscripts repository before running the code.

Accession Numbers

All raw RNA-seq data and MNase-seq data will be made publicly available in the Sequence Read Archive (PRJNA403781, PRJNA406870, PRJNA406930, PRJNA341458, and PRJNA384110).

Supplemental Data

The following supplemental materials are available.

Supplemental Figure S1. Identification of genes likely regulated by G-boxes in seedlings.

Supplemental Figure S2. Counts of genes with G-box-like motifs within 500 bp of their TSS.

Supplemental Figure S3. Distances of G-box-like motifs from the TSS in bp.

Supplemental Figure S4. Effect of methylation on binding near G-boxes.

Supplemental Figure S5. DAP-seq TF binding near promoters containing exactly one imperfect G-box.

Supplemental Figure S6. V-plots can indicate occupancy over genomic figures.

Supplemental Figure S7. PWM of G-boxes for genes that have a single G-box in the 500 bp upstream of their TSS but that are not in binding pattern 1 or 2.

Supplemental Figure S8. Experimental design for predicting whether DNA sequences that flank G-boxes are capable of distinguishing between the two main binding patterns observed in DAP-seq data.

Supplemental Figure S9. Distribution of homotypic clusters of G-boxes in the 1,000 bp upstream of the TSS, by DAP-seq binding pattern.

Supplemental Figure S10. Gene expression pattern of genes near G-boxes across all data sets used in the Ara-BOX-cis network.

Supplemental Figure S11. Gene expression pattern of bHLH and bZIP TFs.

Supplemental Figure S12. An overview of Ara-BOX-cis.

Supplemental Figure S13. Comparison of edge ranks between methods.

Supplemental Figure S14. Comparison of the Genie3 network using only 22°C RNA-seq data and only 27°C RNA-seq data.

Supplemental Figure S15. Denser clusters have more information redundancy.

Supplemental Figure S16. Strategy for testing whether Ara-BOX-cis can be used to predict gene expression.

Supplemental Figure S17. A random network is best at predicting the expression of genes that are not perturbed in pif4-101.

Supplemental Figure S18. IGV screenshots of some examples of predicted PIF targets in Ara-BOX-cis that are supported by ChIP-seq data.

Supplemental Figure S19. Predicted gene network near PIF5 (PIL6).

Supplemental Table S1. List of genes that are expressed in Arabidopsis seedlings and that have perfect G-boxes within 500 bp of their TSS.

Supplemental Table S2. Subset of DAP-seq data from O'Malley et al. (2016) that is used in Figure 1.

Supplemental Table S3. All genes that have any bHLH or bZIP binding in the O'Malley et al. (2016) DAP-seq data set.

Supplemental Table S4. Transcripts per million of all the RNA-seq data sets used to construct and test Ara-BOX-cis as well as a summary of the experiments conducted.

Supplemental Table S5. All the edges for the networks generated using only 22°C data and only 27°C data.

Supplemental Table S6. Summary of genes that are predicted to be downstream of PIFs in Ara-BOX-cis and any evidence of PIF binding upstream of these genes in vivo.

ACKNOWLEDGMENTS

We thank members of the Wigge laboratory for feedback and discussions.

Received August 3, 2017; accepted August 30, 2017; published September 1, 2017.

LITERATURE CITED

Altman BJ, Hsieh AL, Sengupta A, Krishnanaiah SY, Stine ZE, WaltonZE, Gouw AM, Venkataraman A, Li B, Goraksha-Hicks P, et al (2015)MYC disrupts the circadian clock and metabolism in cancer cells. CellMetab 22: 1009–1019

Alves MS, Dadalto SP, Gonçalves AB, De Souza GB, Barros VA, FiettoLG (2013) Plant bZIP transcription factors responsive to pathogens: areview. Int J Mol Sci 14: 7815–7828

Anders S, Pyl PT, Huber W (2015) HTSeq: a Python framework to workwith high-throughput sequencing data. Bioinformatics 31: 166–169

Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer forIllumina sequence data. Bioinformatics 30: 2114–2120

Carretero-Paulet L, Galstyan A, Roig-Villanova I, Martínez-García JF,Bilbao-Castro JR, Robertson DL (2010) Genome-wide classificationand evolutionary analysis of the bHLH family of transcription factorsin Arabidopsis, poplar, rice, moss, and algae. Plant Physiol 153: 1398–1412

Century K, Reuber TL, Ratcliffe OJ (2008) Regulating the regulators: thefuture prospects for transcription-factor-based agricultural biotechnol-ogy products. Plant Physiol 147: 20–29

Chen H, Mundra PA, Zhao LN, Lin F, Zheng J (2014) Highly sensitiveinference of time-delayed gene regulation by network deconvolution.BMC Syst Biol (Suppl 4) 8: S6

Chiu TP, Comoglio F, Zhou T, Yang L, Paro R, Rohs R (2016) DNAshapeR:an R/Bioconductor package for DNA shape prediction and feature en-coding. Bioinformatics 32: 1211–1213

Choi H, Oh E (2016) PIF4 integrates multiple environmental and hormonalsignals for plant growth regulation in Arabidopsis. Mol Cells 39: 587–593

Corrêa LGG, Riaño-Pachón DM, Schrago CG, dos Santos RV, Mueller-Roeber B, Vincentz M (2008) The role of bZIP transcription factors ingreen plant evolution: adaptive features emerging from four foundergenes. PLoS ONE 3: e2944

Ezer D, Jung JH, Lan H, Biswas S, Gregoire L, Box MS, Charoensawan V,Cortijo S, Lai X, Stockle D, et al (2017) The evening complex coordi-nates environmental and endogenous signals in Arabidopsis. Nat Plants3: 17087

Ezer D, Zabet NR, Adryan B (2014a) Homotypic clusters of transcriptionfactor binding sites: a model system for understanding the physicalmechanics of gene expression. Comput Struct Biotechnol J 10: 63–69

Ezer D, Zabet NR, Adryan B (2014b) Physical constraints determine thelogic of bacterial promoter architectures. Nucleic Acids Res 42: 4196–4207

Fernandez PC, Frank SR, Wang L, Schroeder M, Liu S, Greene J, CocitoA, Amati B (2003) Genomic targets of the human c-Myc protein. GenesDev 17: 1115–1129

Folta KM, Kaufman LS (2006) Isolation of Arabidopsis nuclei and mea-surement of gene transcription rates using nuclear run-on assays. NatProtoc 1: 3094–3100

Gangappa SN, Maurya JP, Yadav V, Chattopadhyay S (2013) The regu-lation of the Z- and G-box containing promoters by light signalingcomponents, SPA1 and MYC2, in Arabidopsis. PLoS ONE 8: e62194

Gaudinier A, Brady SM (2016) Mapping transcriptional networks inplants: data-driven discovery of novel biological mechanisms. Annu RevPlant Biol 67: 575–594

Gordân R, Shen N, Dror I, Zhou T, Horton J, Rohs R, Bulyk ML (2013)Genomic regions flanking E-box binding sites influence DNA bindingspecificity of bHLH transcription factors through DNA shape. Cell Rep3: 1093–1104

Grotewold E (2008) Transcription factors for predictive plant metabolicengineering: are we there yet? Curr Opin Biotechnol 19: 138–144

Guiltinan MJ, Marcotte WR Jr, Quatrano RS (1990) A plant leucine zipperprotein that recognizes an abscisic acid response element. Science 250:267–271

Hao Y, Oh E, Choi G, Liang Z, Wang ZY (2012) Interactions between HLHand bHLH factors modulate light-regulated plant development. MolPlant 5: 688–697

Haury AC, Mordelet F, Vera-Licona P, Vert JP (2012) TIGRESS: TrustfulInference of Gene REgulation using Stability Selection. BMC Syst Biol 6:145

Hayes S, Sharma A, Fraser DP, Trevisan M, Cragg-Barber CK, TavridouE, Fankhauser C, Jenkins GI, Franklin KA (2017) UV-B perceived bythe UVR8 photoreceptor inhibits plant thermomorphogenesis. Curr Biol27: 120–127

He JX, Gendron JM, Sun Y, Gampala SSL, Gendron N, Sun CQ, Wang ZY (2005) BZR1 is a transcriptional repressor with dual roles in brassinosteroid homeostasis and growth responses. Science 307: 1634–1638

Heim MA, Jakoby M, Werber M, Martin C, Weisshaar B, Bailey PC (2003) The basic helix-loop-helix transcription factor family in plants: a genome-wide study of protein structure and functional diversity. Mol Biol Evol 20: 735–747

Henikoff JG, Belsky JA, Krassovsky K, MacAlpine DM, Henikoff S (2011) Epigenome characterization at single base-pair resolution. Proc Natl Acad Sci USA 108: 18318–18323

Huang T, Liu L, Qian Z, Tu K, Li Y, Xie L (2010) Using GeneReg to construct time delay gene regulatory networks. BMC Res Notes 3: 142

Huynh-Thu VA, Irrthum A, Wehenkel L, Geurts P (2010) Inferring regulatory networks from expression data using tree-based methods. PLoS ONE 5: e12776

Hwang YS, Karrer EE, Thomas BR, Chen L, Rodriguez RL (1998) Three cis-elements required for rice alpha-amylase Amy3D expression during sugar starvation. Plant Mol Biol 36: 331–341

IGV (2013) Integrative Genomics Viewer. Broad Institute, doi/10.1038/nbt0111-24

Jaeger KE, Pullen N, Lamzin S, Morris RJ, Wigge PA (2013) Interlocking feedback loops govern the dynamic behavior of the floral transition in Arabidopsis. Plant Cell 25: 820–833

Jung JH, Domijan M, Klose C, Biswas S, Ezer D, Gao M, Khattak AK, Box MS, Charoensawan V, Cortijo S, et al (2016) Phytochromes function as thermosensors in Arabidopsis. Science 354: 886–889

Jung JH, Park JH, Lee S, To TK, Kim JM, Seki M, Park CM (2013) The cold signaling attenuator HIGH EXPRESSION OF OSMOTICALLY RESPONSIVE GENE1 activates FLOWERING LOCUS C transcription via chromatin remodeling under short-term cold stress in Arabidopsis. Plant Cell 25: 4378–4390

Kim KN, Guiltinan MJ (1999) Identification of cis-acting elements important for expression of the starch-branching enzyme I gene in maize endosperm. Plant Physiol 121: 225–236

Kobayashi K, Obayashi T, Masuda T (2012) Role of the G-box element in regulation of chlorophyll biosynthesis in Arabidopsis roots. Plant Signal Behav 7: 922–926

Koini MA, Alvey L, Allen T, Tilley CA, Harberd NP, Whitelam GC, Franklin KA (2009) High temperature-mediated adaptations in plant architecture require the bHLH transcription factor PIF4. Curr Biol 19: 408–413

Krouk G, Lingeman J, Colon AM, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9: 357–359

Lee T, Yang S, Kim E, Ko Y, Hwang S, Shin J, Shim JE, Shim H, Kim H, Kim C, et al (2015) AraNet v2: an improved database of co-functional gene networks for the study of Arabidopsis thaliana and 27 other non-model plant species. Nucleic Acids Res 43: D996–D1002

Leivar P, Monte E (2014) PIFs: systems integrators in plant development. Plant Cell 26: 56–78

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079

Li X, Rao S, Jiang W, Li C, Xiao Y, Guo Z, Zhang Q, Wang L, Du L, Li J, et al (2006) Discovery of time-delayed gene regulatory networks based on temporal gene expression profiling. BMC Bioinformatics 7: 26

Li X, Zhang H, Ai Q, Liang G, Yu D (2016a) Two bHLH transcription factors, bHLH34 and bHLH104, regulate iron homeostasis in Arabidopsis thaliana. Plant Physiol 170: 2478–2493

Li Y, Chen H, Zheng J, Ngom A (2016b) The max-min high-order dynamic Bayesian network for learning gene regulatory networks with time-delayed regulations. IEEE/ACM Trans Comput Biol Bioinformatics 13: 792–803

Liu L, Xu W, Hu X, Liu H, Lin Y (2016a) W-box and G-box elements play important roles in early senescence of rice flag leaf. Sci Rep 6: 20881

Liu TL, Newton L, Liu MJ, Shiu SH, Farré EM (2016b) A G-box-like motif is necessary for transcriptional regulation by circadian pseudo-response regulators in Arabidopsis. Plant Physiol 170: 528–539

Long TA, Tsukagoshi H, Busch W, Lahner B, Salt DE, Benfey PN (2010) The bHLH transcription factor POPEYE regulates response to iron deficiency in Arabidopsis roots. Plant Cell 22: 2219–2236

Madar A, Greenfield A, Vanden-Eijnden E, Bonneau R (2010) DREAM3: network inference using dynamic context likelihood of relatedness and the inferelator. PLoS ONE 5: e9803

Marbach D, Costello JC, Küffner R, Vega NM, Prill RJ, Camacho DM, Allison KR, Kellis M, Collins JJ, Stolovitzky G (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796–804

Michael TP, Breton G, Hazen SP, Priest H, Mockler TC, Kay SA, Chory J (2008) A morning-specific phytohormone gene expression program underlying rhythmic plant growth. PLoS Biol 6: e225

Mostafavi S, Ray D, Warde-Farley D, Grouios C, Morris Q (2008) GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function. Genome Biol (Suppl 1) 9: S4

Nakashima K, Yamaguchi-Shinozaki K, Shinozaki K (2014) The transcriptional regulatory network in the drought response and its crosstalk in abiotic stress responses including drought, cold, and heat. Front Plant Sci 5: 170

Nozue K, Harmer SL, Maloof JN (2011) Genomic analysis of circadian clock-, light-, and growth-correlated genes reveals PIF5 as a modulator of auxin signaling in Arabidopsis. Plant Physiol 156: 357–372

Nuruzzaman M, Sharoni AM, Kikuchi S (2013) Roles of NAC transcription factors in the regulation of biotic and abiotic stress responses in plants. Front Microbiol 4: 248

Oh E, Zhu JY, Bai MY, Arenhart RA, Sun Y, Wang ZY (2014) Cell elongation is regulated through a central circuit of interacting transcription factors in the Arabidopsis hypocotyl. eLife 3: e03031

Oh E, Zhu JY, Wang ZY (2012) Interaction between BZR1 and PIF4 integrates brassinosteroid and environmental responses. Nat Cell Biol 14: 802–809

O’Malley RC, Huang SSC, Song L, Lewsey MG, Bartlett A, Nery JR, Galli M, Gallavotti A, Ecker JR (2016) Cistrome and epicistrome features shape the regulatory DNA landscape. Cell 165: 1280–1292

Para A, Li Y, Marshall-Colón A, Varala K, Francoeur NJ, Moran TM, Edwards MB, Hackley C, Bargmann BOR, Birnbaum KD, et al (2014) Hit-and-run transcriptional control by bZIP1 mediates rapid nutrient signaling in Arabidopsis. Proc Natl Acad Sci USA 111: 10371–10376

Peng J, Richards DE, Hartley NM, Murphy GP, Devos KM, Flintham JE, Beales J, Fish LJ, Worland AJ, Pelica F, et al (1999) ‘Green revolution’ genes encode mutant gibberellin response modulators. Nature 400: 256–261

Petesch SJ, Lis JT (2008) Rapid, transcription-independent loss of nucleosomes over a large chromatin domain at Hsp70 loci. Cell 134: 74–84

Pfeiffer A, Shi H, Tepperman JM, Zhang Y, Quail PH (2014) Combinatorial complexity in a transcriptionally centered signaling hub in Arabidopsis. Mol Plant 7: 1598–1618

Rensing SA (2014) Gene duplication as a driver of plant morphogenetic evolution. Curr Opin Plant Biol 17: 43–48

Roldán I, Wattebled F, Mercedes Lucas M, Delvallé D, Planchot V, Jiménez S, Pérez R, Ball S, D’Hulst C, Mérida A (2007) The phenotype of soluble starch synthase IV defective mutants of Arabidopsis thaliana suggests a novel function of elongation enzymes in the control of starch granule formation. Plant J 49: 492–504

Shiu SH, Shih MC, Li WH (2005) Transcription factor families have much higher expansion rates in plants than in animals. Plant Physiol 139: 18–26

Song L, Huang SC, Wise A, Castanon R, Nery JR, Chen H, Watanabe M, Thomas J, Bar-Joseph Z, Ecker JR (2016) A transcription factor hierarchy defines an environmental stress response network. Science 354: aag1550

Swarnalatha M, Singh AK, Kumar V (2012) The epigenetic control of E-box and Myc-dependent chromatin modifications regulate the licensing of lamin B2 origin during cell cycle. Nucleic Acids Res 40: 9021–9035

Taylor-Teeples M, Lin L, de Lucas M, Turco G, Toal TW, Gaudinier A, Young NF, Trabucco GM, Veling MT, Lamothe R, et al (2015) An Arabidopsis gene regulatory network for secondary cell wall synthesis. Nature 517: 571–575

Toledo-Ortiz G, Johansson H, Lee KP, Bou-Torrent J, Stewart K, Steel G, Rodríguez-Concepción M, Halliday KJ (2014) The HY5-PIF regulatory module coordinates light and temperature control of photosynthetic gene transcription. PLoS Genet 10: e1004416

Trapnell C, Pachter L, Salzberg SL (2009) TopHat: discovering splice junctions with RNA-seq. Bioinformatics 25: 1105–1111

Wang M, Verdier J, Benedito VA, Tang Y, Murray JD, Ge Y, Becker JD, Carvalho H, Rogers C, Udvardi M, et al (2013) LegumeGRN: a gene regulatory network prediction server for functional and comparative studies. PLoS ONE 8: e67434

Williams ME, Foster R, Chua NH (1992) Sequences flanking the hexameric G-box core CACGTG affect the specificity of protein binding. Plant Cell 4: 485–496

Yalamanchili HK, Yan B, Li MJ, Qin J, Zhao Z, Chin FYL, Wang J (2014) DDGni: dynamic delay gene-network inference from high-temporal data using gapped local alignment. Bioinformatics 30: 377–383

Zentner GE, Henikoff S (2012) Surveying the epigenomic landscape, one base at a time. Genome Biol 13: 250

Zhang LY, Bai MY, Wu J, Zhu JY, Wang H, Zhang Z, Wang W, Sun Y, Zhao J, Sun X, et al (2009) Antagonistic HLH/bHLH transcription factors mediate brassinosteroid regulation of cell elongation and plant development in rice and Arabidopsis. Plant Cell 21: 3767–3780

Zhang Y, Mayba O, Pfeiffer A, Shi H, Tepperman JM, Speed TP, Quail PH (2013) A quartet of PIF bHLH factors provides a transcriptionally centered signaling hub that regulates seedling morphogenesis through differential expression-patterning of shared target genes in Arabidopsis. PLoS Genet 9: e1003244

Zhou T, Shen N, Yang L, Abe N, Horton J, Mann RS, Bussemaker HJ, Gordân R, Rohs R (2015) Quantitative modeling of transcription factor binding specificities using DNA shape. Proc Natl Acad Sci USA 112: 4654–4659

Zoppoli P, Morganella S, Ceccarelli M (2010) TimeDelay-ARACNE: reverse engineering of gene networks from time-course data by an information theoretic approach. BMC Bioinformatics 11: 154
