Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Edinburgh Research Explorer
Conserved temporalorderingof promoter activation implicatescommon mechanisms governing the immediate early responseacross cell types andstimuli
Citation for published version:Vacca, A, Itoh, M, Kawaji, H, Arner, E, Lassmann, T, Daub, C, Carninci, P, Forrest, ARR, Hayashizaki, Y,FANTOM consortium, T, Aitken, J & Semple, C 2018, 'Conserved temporalorderingof promoter activationimplicates common mechanisms governing the immediate early response across cell types andstimuli',Open Biology. https://doi.org/10.1098/rsob.180011
Digital Object Identifier (DOI):10.1098/rsob.180011
Link:Link to publication record in Edinburgh Research Explorer
Document Version:Peer reviewed version
Published In:Open Biology
General rightsCopyright for the publications made accessible via the Edinburgh Research Explorer is retained by the author(s)and / or other copyright owners and it is a condition of accessing these publications that users recognise andabide by the legal requirements associated with these rights.
Take down policyThe University of Edinburgh has made every reasonable effort to ensure that Edinburgh Research Explorercontent complies with UK legislation. If you believe that the public display of this file breaches copyright pleasecontact [email protected] providing details, and we will remove access to the work immediately andinvestigate your claim.
Download date: 01. Jan. 2021
1
Title: Conserved temporal ordering of promoter activation
implicatescommonmechanismsgoverningtheimmediateearly
responseacrosscelltypesandstimuli
Shorttitle:Promoteractivationintheimmediateearlyresponse
Annalaura Vacca1, Masayoshi Itoh2, Hideya Kawaji3, Erik Arner4, Timo
Lassmann5, CarstenO. Daub6, Piero Carninci4, Alistair R. R. Forrest7, Yoshihide
Hayashizaki2,theFANTOMConsortium^,StuartAitken1,Colin A. Semple1*
1MRCHumanGeneticsUnit,MRC InstituteofGeneticsandMolecularMedicine,
UniversityofEdinburgh,CreweRoad,Edinburgh,EH42XU,UK
2RIKEN Preventive Medicine and Diagnosis Innovation Program, 2F Main
ResearchBuilding,2-1Hirosawa,Wako,Japan
3RIKENAdvancedCenterforComputingandCommunication,RIKENYokohama
Campus,Yokohama,230-0045,Japan
4RIKEN Center for Life Sciences Technologies, RIKEN Yokohama Campus,
Yokohama,230-0045,Japan
5Telethon Kids Institute, The University of Western Australia, Roberts Road,
Subiaco,WesternAustralia,Australia
6Department of Biosciences and Nutrition, Karolinska Institutet, 141 86,
Stockholm,Sweden
7Harry Perkins Institute of Medical Research, 6 Verdun St, Nedlands,Western
Australia6009,Australia
2
*Corresponding Author
Email:[email protected]
^See Acknowledgements for further information on the FANTOM Consortium.
3
Abstract
Thepromotersofimmediateearlygenes(IEGs)arerapidlyactivatedinresponse
to an external stimulus. These genes, also known as primary response genes,
havebeenidentifiedinarangeofcelltypes,underdiverseextracellularsignals,
andusingvaryingexperimentalprotocols.Whereasgenomicdissectiononacase
bycasebasishasnotresultedinacomprehensivecatalogueofimmediateearly
genes, a rigorous meta-analysis of eight genome-wide FANTOM5 CAGE (Cap
AnalysisofGeneExpression)timecoursedatasetsrevealssuccessivewavesof
promoter activation in IEGs, recapitulating known relationships between cell
types and stimuli: We obtain a set of 57 (42 protein-coding) candidate IEGs
possessing promoters that consistently drive a rapid but transient increase in
expressionovertime.ThesegenesshowsignificantenrichmentforknownIEGs
reported previously, pathways associated with the immediate early response
and include a number of non-coding RNAs with roles in proliferation and
differentiation.Surprisingly,wealsofindstrongconservationoftheorderingof
activationforthesegenes,suchthat77pair-wisepromoteractivationorderings
are conserved. Using the leverage of comprehensive CAGE time series data
acrosscelltypeswealsodocumenttheextensivealternativepromoterusageby
suchgenes,which is likely tohavebeenabarrier to theirdiscoveryuntilnow.
The common activation ordering of the core set of early-responding geneswe
identifymayindicateconservedunderlyingregulatorymechanisms.Incontrast,
theconsiderablylargernumberoftransientlyactivatedgenesthatarespecificto
eachcelltypeandstimulusillustratesthebreadthoftheprimaryresponse.
4
Introduction
Human cells respond to a broad range of extracellular stimuli with a
characteristic burst of transcription within minutes at many sites across the
genome, known as the immediate-early response (IER). The IER has been
observed as an initiating event in many cellular processes, notably during
differentiation,inresponsestocellularstress,andininflammation.Theearliest
events in the IER involve the activationof thepromoters of a particular set of
genes, known as immediate-early genes (IEGs). The promoters of IEGs are
activatedrapidly,andtheiractivation is transient innormalcells [1].However,
IEGs are often dysregulated in cancers where they can become continuously
activated, accordingly someof thebest-studied IEGsareknownoncogenes [2].
For example, theexpressionof theFOSproto-oncogenenormallypeakswithin
60minutesofastimulusandsubsidesafter90minutes[3], incontrastwithits
continuousoverexpressioninmanycancers.
Immediate-earlygenespossessunusuallyaccessiblepromotersthatallowrapid
transcriptionalactivation inresponsetoastimuluswithout therequirementof
denovoproteinsynthesis.VariousfeaturesarethoughttodiscriminateIEGssuch
astheshortertranscriptstheygenerateandenrichmentsofcertaintranscription
factor (TF) binding sites at their promoters [4]. However, current knowledge
aboutIEGsisderivedmainlyfromstudiesofindividualgenesorpathways,and
often considers a specific cell type and stimulus. Thismeans that comparison
acrossstudiescanbeconfoundedbyexperimentalandtechnicalvariation,anda
comprehensive catalogue of IEGs remains elusive. There is also controversy
abouttheregulatorymechanismsgoverningtheresponseofevenrelativelywell-
5
studied IEGs [5]. Beyond the induction of protein-coding IEG promoters, the
featuresandunderlyingmechanismsof the IERareeven lesswellunderstood.
Some studies have implicated altered patterns of IEG splicing as playing
importantrolesintheIER[6],whileothershavesuggestedaprominentrolefor
lncRNA activation [7] and transcribed enhancers [8]. Approximately 20% of
knownIEGsaretranscriptionfactors, includingsomeofthebestcharacterized:
EGR1-EGR4,FOS,FOSB,FOSL1,JUN,JUNBandMYC.
TheFANTOM5CapAnalysisofGeneExpression(CAGE)dataofferanumberof
advantages for expression profiling since they are based upon single-molecule
sequencingtoavoidPCR,digestionandcloningbiases.Theyprovideuptosingle
basepairresolutionoftranscriptionstartsites(TSSs)andpromoterregions,and
provideasensitive,quantitativereadoutoftranscriptionaloutputaccountingfor
thealternativepromotersofeachgene.Theoutputofindividualpromotersisnot
confounded by splicing variation, andmany novel lowly expressed transcripts
including non-coding RNAs (ncRNAs) can be readily detected (FANTOM
Consortium, Nature, 2014; http://fantom.gsc.riken.jp/5/). CAGE data are thus
ideallysuitedtostudyingthestrongburstoftranscriptionatpromotersseenin
immediate-early responses. FANTOM5 data include eight CAGE time course
datasetsemployingunusuallydensesamplingat timepointswithin300minof
stimulation, for a variety of stimuli treating a variety of cell types. These
heterogeneous datasets, produced using a common experimental platform,
should be fertile ground for novel insights into the IER, but a comprehensive
meta-analysishasnotbeenperformeduntilnow.
6
Manypreviousapproachestotimeseriesanalysisofexpressiondatahavebeen
based upon differential expression between successive time points, or have
clusteredgenesaccordingtothesimilarityoftheirexpressionprofilesovertime
[9]. Both of these approaches present problems for the analysis of CAGEdata.
Differentialexpressionbetweentimepointsprovidespoorsensitivity for lowly
expressed transcripts (possessing too few reads to generate significant
differences in expression), and presents serious difficulties when comparing
expressionprofilesfromdatasetswithsomewhatdifferentsamplingpointsover
time. Clustering approaches often rely upon arbitrary thresholds (e.g. based
uponclustersizeorsignificantenrichmentoffunctionalannotationterms)and,
bydefinition,willmisstranscriptsthatcannotbeassignedtoaclusterbutmay
neverthelessshowdynamicsofinterest.Hencewerefineapreviouslysuccessful
Bayesian model selection algorithm to classify promoter responses to pre-
definedmathematicalmodels[7].
Hereweperformextensivemeta-analysesofpromoteractivityinthehumanIER,
encompassingunusuallydiversecelltypesandstimuli,torigorouslyclassifyIEGs
andestimatethecoreIEGrepertoireactiveacrosscellularresponses.Weshow
thatcomputationalclassificationofthetemporalactivitypatternsofpromoters
provides a potent basis for meta-analyses across time courses, exposing the
combinedactivityofknownIEGsandcompellingnewIEGcandidatesintheIEG
corerepertoire.Wealsoshowthatthetimingofthepeakexpressionofacoreset
of transiently activated genes has a conserved order. This surprising outcome
indicatesapreviouslyunidentifiedregulatorymechanismthat issharedamong
celltypesandcommontodiversestimuli.
7
Results
Weconsideredeightdenselysampled,andwellreplicated,FANTOM5CAGEtime
course datasets obtained following diverse stimuli: calcification in an
osteosarcomacelllineinresponsetoosteocalcin(SAOS2_OST),differentiationof
adipose-derivedprimarymesenchymalstemcellsinresponsetoadrugmixture
(3-isobutyl-1-methylxanthine, dexamethasone, and rosiglitazone) (PMSC_MIX),
differentiation of primary lymphatic endothelial cells in response to VEGF
(PEC_VEGF),MCF7breastcancercell lineresponsestoEGF1(MCF7_EGF1)and
to HRG (MCF7_HRG), primary aortic smooth muscle cells response to IL1b
(PAC_IL1B)andFGF2(PAC_FGF2),andprimarymonocyte-derivedmacrophage
cellsactivation inresponsetoLPS(PMDM_LPS).Thus,we includedavarietyof
primaryandcell linesamples, trackingresponses toarangeofstimuli:growth
factors, hormones, drugs, pro-inflammatory cytokines and bacterial endotoxin
(Figure 1a). These diverse data provided a potent resource to discover core
featuresoftheIERconservedacrosscelltypesandstimuli.AllTSSsforprotein-
coding transcripts were represented by conservatively selected CAGE read
clusters (at least 10 TPM) followingArner et al. (2015) [10]. As expected, the
responses of known immediate early genes often showed characteristic
expressionpeaksearly in the time seriesdatasets - as exemplifiedbyFOSand
JUN – though even for these well-established IEGs we observed substantial
variation in themagnitude, timing anddurationofpeaks across cell types and
stimuli(Figure1b).TheseobservationsillustratethechallengespresentedinIEG
detection, even when studying known IEGs using a uniform experimental
platform.
8
OptimisingandrefiningtheapproachdevelopedbyAitkenetal(2015)[7](see
Methods), we defined four mathematical models representing archetypical
expressionprofilesofinterestovertime:peak,linear,dipanddecay(electronic
supplementarymaterial, Figure S1), and assessed the fit of eachmodel to the
expressionprofileofeachgeneusingnestedsamplingtocomputethemarginal
likelihood, log Z [7]. Where sufficient evidence exists (given the variation
betweenreplicates)thealgorithmreturnsaclassificationofaninputtranscript
toamodel,andalsocomputesrelevantparametersofthefittedmodels(e.g.time
and magnitude of peak expression). These parameter estimates provide a
reliable basis for comparisons across time series datasets, evenwith different
samplingdensities[7]astheyarenotrestrictedtosamplingtimesorexpression
valuesatthosetimes.
CAGEtimeseriesmeta-analysisrevealsacorecomplementoftransiently
activatedpromoters
Across the eight time series datasets we considered all CAGE clusters
corresponding to the TSSs of known Ensembl [11] transcripts, encompassing
between 10,513 (corresponding to 7,706 Ensembl genes) and 14,376 (8,951
genes)protein-codingCAGETSSs,dependingonthedataset,andbetween1,202
(692 genes) and 1,640 (858 genes) non-coding RNA CAGE TSSs (electronic
supplementary material, Table S1). Between 15% and 42% of protein-coding
CAGE TSSs, and between 15% and 33% of non-coding TSSs were confidently
classified to one of the four models, depending on the dataset (electronic
supplementary material, Figure S2, Table S2). The remainder could not be
rigorously classified toa singlemodel andwereomitted from furtheranalysis.
9
Thepeakmodelhad thehighestnumberof assignments in all thedatasets for
both protein-coding and ncRNA genes; for example, of 12,132 total Ensembl
protein-codinggenestested,wefound8,785Ensemblgenes(72%)topeakinat
leastoneofthedatasets.Incontrast,fewgeneswereclassifiedtothepeakmodel
in multiple datasets, with only 42 of such genes shared across at least seven
datasets(Figure1c),underliningthehighvariabilityoftranscriptionalresponses
seenforthesamepromotersacrosstimeseries.These42genesconstitutedour
‘robust’ set of candidate IEGs (genes, TSSs and peak times listed in electronic
supplementarymaterial, File S1).We also defined a less stringent ‘permissive’
setof1,304candidatessharedacrossatleastfouroutofeightdatasets.
We then explored the overlap in peaking genes outside of the robust set
(electronicsupplementarymaterial,TableS3)andfoundthat,foreachdataset,at
least8%ofpeakinggenesaresharedwithanotherdataset(range8%-16%)and
up to 52% of peaking genes are shared (range 19%-52%). The intersections
betweensetsof3datasetsbecamesmallerconsequently.Notably,approximately
50% of peaking genes are shared between datasetswhere the cell type is the
same(MCF7andPAC).
Ourmodelfittingapproachprovidedparameterestimatesforallpromoters
assignedtothesamemodel,providingastraightforwardandintuitivebasisfor
meta-analysis.Forexample,comparisonofthepeaktimes(tp)(Figure2a)forall
promotersclassifiedaspeaksinatleastfourdatasets(thepermissiveset)
readilydemonstratedcommonpatternsacrossdatasets(Figure2b).Wavesof
promoteractivationwereevident,withcertainpromoters,particularlyknown
IEGs,activatedinthesameearlytimewindowinmultipledatasets.Hierarchical
clusteringofthedatasetsbasedonthesepeakclasspromoters(9%ofall
10
promotersassayed)alsorecapitulatedknownrelationshipsbetweencelltypes
andstimuli(Figure2b).Thetwodatasetsderivedfromthesamebreastcancer
cellline(MCF7_EGF1andMCF7_HRG)andstimulatedwithdifferentligandsof
thesameErbBreceptorfamilyclusteredtogetherasmightbeexpected.We
observedsimilarbehaviourforthetwoprimaryaorticcellsamplesexposedtoa
growthfactororactivatedbyapro-inflammatorycytokine(PAC_FGF2and
PAC_IL1Brespectively).Thus,similaritiesinpromoteractivationdynamics
(reflectedintpparameterestimates)betweendatasetsmayreflectunderlying
commonalitiesintheirunderlyingbiology.
The extent of alternative promoter usage across the robust set of IEGs and
candidateIEGsisshowninFigure3(seealsoelectronicsupplementarymaterial,
Figure S3). Candidate IEGs show slightly greater variability in the TSSs they
activate across datasets compared with known IEGs, with a greater median
numberTSS found topeak(3.5comparedwith2 forknown IEGs). Inaddition,
known IEGs tend to possess TSSs that are successfully classified to the peak
modelacrossalargernumberofdatasets(meanproportionofdatasetsclassified
as peak per TSS for known IEGs in the robust set = 4; candidate IEG mean
proportion = 2.5). Thus, known IEGs tend to possess smaller numbers of
alternativeTSSsthatalsotendtoshowdiscerniblepeaks inthemajorityofthe
time series datasets. It is possible that these relatively stereotypical
transcriptional characteristics of known IEGs may, in some cases, have led to
theirstatusaswellestablishedIEGs.Similarly,theincreasedvariabilityseenfor
the TSSs of candidate IEGs could have led to a failure to detect their IEG-like
behaviourinformerstudies.
11
We investigated the nature of our promoter classifications by testing the
enrichmentofknownIEGs(seeMethods)withineachclass,foreachdataset.The
peak class was enriched for known IEGs in all datasets (electronic
supplementary material, Figure S4, Table S4), but failed to reach statistical
significance in PMSC_MIX (OR = 1.3, p = 0.2). Peaking genes shared across
datasetsweregenerallyassociatedwithsignificantenrichmentsofknownIEGs
(Table1),withthepermissiveset(sharedacross4ormoredatasets)expectedto
contain higher numbers of false positives than the robust set (7 or more
datasets).GenespossessingTSSsassignedtothepeakclassshowedenrichments
forGeneOntology(GO)processesassociatedwith transcription,cellactivation,
cellproliferation,celldifferentiationandcancerrelatedtermssuchascelldeath
andapoptosis(FDR<0.05;Methods)[12,13].Thesetermswerealsoconsistent
withpreviousstudiesofIEGs[4]asgenesintherobustsetshowedenrichment
for285GOterms,over30%(88)ofwhichweresharedwiththelistof773GO
termsofallknownIEGs(electronicsupplementarymaterial,TableS5).
Table1.EnrichmentofknownIEGsforgenesclassifiedtothepeakmodelinmultipledatasets
Shareddatasets
IEGsenrichment #CAGETSSs(median)#
genes#IEGs
#CAGETSSs(across8datasets)
#IEGCAGETSSs(across8datasets)
OR p
1to8(allpeakinggenes)
8,785 204 102,496 913 - - 1
2to8 5,270 171 71,384 853 6.3 2.2e-16 13to8 2,882 128 45,360 751 5.9 2.2e-16 24to8 1,304 86 24,616 590 5.9 2.2e-16 25to8 507 56 11,528 433 7.4 2.2e-16 36to8 182 35 4,896 299 10.3 2.2e-16 37to8 42 13 1,376 124 12.6 2.2e-16 48 5 2 264 18 8.3 4.6e-11 5Enrichment (expressedasodds ratios)andpvalues forgenesclassifiedacrossdifferentnumbersoftimeseriesdatasets.
12
Novelnon-codingRNAcandidatesintheimmediateearlyresponse
Wenextappliedour classification topromotersdriving theexpressionofnon-
coding transcripts and found peak promoters driving the expression of 20
ncRNA genes (across at least seven datasets), constituting the robust set of
ncRNA candidate IEGs. These included promoters associated with the cellular
splicingmachinery,suchassmallnuclearRNAmulti-genefamilies(U1,U2,and
U4), which are part of the spliceosome, and SCARNA17, a small nuclear RNA
which contributes to the post transcriptional modification of many snRNPs.
Kalam et al. have shown that macrophage infection with Mycobacterium
tuberculosisresultsinthesystematicperturbationinsplicingpatterns[14],and
our results suggest more general roles for alternative splicing in the IER.
However, multigene families, such as these small nuclear RNAs, present
particularchallengesforreliablesequencereadmapping.Althoughprobabilistic
approaches to mapping ambiguously mapped reads were developed in
FANTOM5[10]wehavechosentoconservativelyremovethesegenesfromthe
robustset, leavingagroupof15non-codinggeneswithamedianof5peaking
TSS(Table2;electronicsupplementarymaterial,FigureS5andS6).
ThreemiRNAs are present in the robust set (Table 2) including the oncogene
miR-21 which was previously reported to show IEG-like behaviour in the
PAC_FGF2, PAC_IL1B and MCF7_HRG time series [7]. Here we find similar
behaviour in the MCF7_EGF1, PEC_VEGF, PMSC_MIX and SAOS2_OST datasets.
This extends previous studies reporting that the miR-21 mature transcript is
upregulatedonEGFtreatmentinMCF10A[15]andHeLa[16]cells.miR-29Ahas
13
been associated with the viability and proliferation of mesenchymal stem cell
andgastriccancercells[17,18]andDLEU2isaputativetumoursuppressorgene
thathoststwomiRNAs,miR-15AandmiR-16-1whichareknowntoinhibitcell
proliferationand the colony-formingabilityof tumour cell lines, and to induce
apoptosis [19-21]. Seven lncRNAs also appear in the robust set (Table 2) and
among them LINC00478 is particularly interesting, as it has already been
reportedtoshowIEG-likebehaviour[7],isimplicatedinbreastcancerandhosts
an intronic cluster ofmiRNAs comprising let-7c,miR-99a, andmiR-125b [22].
Although poorly characterised, LINC00263, LINC-PINT and LINC00963 are
thought tobe involved inbiological processesoften triggeredby IEGs, such as
cellmaturation,cellproliferationandtheexpressionofgrowthfactorreceptors
[23-26].
14
Table2.Non-codinggenespeakinginatleast7outof8datasets.
GeneID NºSharedDatasets
Description(PuMedref.)
LINC00478(MIR99AHG)
7 Ithasaroleincellproliferationanddifferentiationanditisconsideredaregulatorofoncogenesinleukemia(PMID:25027842)
LINC00263 7 Regulationofoligodendrocytematuration(PMID:25575711)
LINC-PINT 8 Putativetumorsuppressor(PMID:24070194)LINC00963 7 Involvedintheprostatecancertransition
fromandrogen-dependenttoandrogen-independentandmetastasisviatheEGFRsignalingpathway(PMID:24691949)
LINC00476 8 UncharacterizedlincRNALINC00674 7 UncharacterizedlincRNASTX18-AS1 7 UncharacterizedlincRNADLEU2 7 Criticalhostgeneofthecellcycleinhibitory
microRNAsmiR-15aandmiR-16-1(PMID:19591824)
MiR-29A 7 TheexpressionofthemiR-29familyhasantifibroticeffectsinheart,kidney,andotherorgans.miR-29shavealsobeenshowntoinduceapoptosisandregulatecelldifferentiation(PMID:22214600)
MiR-3654 7 InvolvedinProstateCancerprogression(PMID:27297584)
MiR-21 7 Oncogenicpotential(PMID:18548003)AL928646 7 UncharacterizedncRNASCARNA17 7 scaRNAInvolvedinthematurationofother
RNAmolecules(PMID:12032087)SNORD65 7 BelongstotheSmallnucleolarRNAs,C/D
family.InvolvedinrRNAmodificationandalternativesplicing(PMID:26957605)
SNORD82 7 BelongstotheSmallnucleolarRNAs,C/Dfamily.InvolvedinrRNAmodificationandalternativesplicing(PMID:26957605)
Theshortdescriptionsofthemolecularfunctionarefromthegenecarddatabase[27].
KnownIEGpromotersshowconservedtemporalorderofactivationacross
datasets
Having established common patterns of peak gene induction at similar times
acrossdatasets(Figure2b),wehypothesisedthatIEGsmayalsobeinducedina
conservedorderovertime.Toourknowledgetheextentofconservedordering
15
ingeneinductionisunstudiedingeneral,andintheIERitisofparticularinterest
for two main reasons. Firstly, the presence of conserved gene orderings, in
addition to common gene classifications, provides an additional test for
functional similarity between datasets. Secondly, strongly conserved ordering
may suggest the existence of conserved regulatorymechanisms governing the
inductionof thesegenes.Toanalyse the relativeorderof activationacross the
eightdatasetswecomparedthepeaktimeofeachgenetothatofallothersinthe
peakclass.Iftherelativetemporalorderoftwogeneswasconservedinatleast
sevenoftheeightdatasetstheorderingforthispairwasconsideredconserved
andrepresentedbyanedgeintheconservedactivationnetwork(Figure4).
We found 77 pairs of genes showing conserved ordering in their activation,
involving 40 of the 57 genes in the robust set. FOS was the first gene to be
activated (lacking a predecessor in the ordering) and SDC4, EHD1 and
TMEM185B were the last. The number of conserved temporal connections
observed overall is statistically significant (p < 5e-3) by comparison to the
distribution of expected connections given 1,000,000 permuted datasets
(Methods). This appears to reflect a conserved coordination in promoter
activationduring the IERand furthersupports thecandidacyof thenovel IEGs
detected.Manygenes in thisnetworkareknown toparticipate inwell-studied
pathways active in the IER such as the MAPK signalling pathway as we now
discuss.
KnownIEGsandcandidateIEGsparticipateincommonsignallingpathways
HavingshownthatthepeakmodeldescribedthebehaviourofknownIEGs,we
speculated that the other genes assigned to this model might include novel
16
candidateIEGs.Ofthe42genesintherobustsetmorethantwothirds(29genes)
arenotknowntobeIEGsandcan,therefore,beconsideredtobecandidatenovel
IEGs(henceforthcandidate IEGs).Pathwayanalysis [28]recoversmanyknown
relationships among known IEGs as expected, centred on heavily studied IEGs
suchasFOSand JUN.However, thesameanalysissuggests thatmore thanhalf
(17)of candidate IEGsalsoparticipate incommonpathwayswithknown IEGs,
involving a densely inter-connected network of 83 significantly over-
representedpathways (electronicsupplementarymaterial,TableS6), including
signalling cascades known to mediate the IER, such as the Ca2+-dependent
pathwaysandthemitogen-activatedprotein(MAP)kinasenetwork[29,30].
Thedynamicsof theexpressionofpeak-classifiedgenescanbevisualizedbya
scatterplotoffoldchangeagainstpeaktime(electronicsupplementarymaterial,
Figure S7). These quantitative features along with the conserved temporal
orderings described above showFOS as the earliest peaking IEG, EHD1 as the
last, with an array of conserved orderings subsequent to, and prior to the
peaking of these genes respectively (selected genes plotted in Figure 5). The
TSSs of known IEGs CAGE tend to show the greatest fold changes (electronic
supplementary material, Figure S8A, Wilcoxon p < 2.2e-16), however, some
candidate IEGs promoters show notably similar timing (electronic
supplementarymaterial,FigureS8C,Wilcoxonp=0.89).Thetimeofpeakingis
significantly earlier for known IEGs relative to the other protein-coding
promoters in only three time series: PMDM_LPS, MCF7_EGF1 and PEC_VEGF.
Fold changes inpeakncRNApromoters tend tobe lower than forknown IEGs
(electronic supplementary material, Figure S8B, Wilcoxon p < 0.05) but they
17
occurearlier thanknownIEGs(electronicsupplementarymaterial,FigureS8D,
Wilcoxonp<0.05foralldatasets).
AmongthecandidateIEGsintherobustset,XBP1isespeciallynoteworthy.This
gene encodes a transcription factor and is relatively short in length (6Kb
compared with the mean of 58Kb for all Ensembl protein-coding genes)
consistentwiththeIEGarchetype[1].XBP1isahighlyconservedcomponentof
the Unfolded Protein Response (UPR) signalling pathways, activated by
unconventionalsplicinguponEndoplasmicReticulum(ER)stressornonclassical
anticipatoryactivation[31-33],andregulatesadiversearrayofgenes involved
in ER homeostasis, adipogenesis, lipogenesis and cell survival [34, 35].
Interestingly,genes intherobustsetaresignificantlyenrichedfortheGOterm
GO:003497responsetoendoplasmicreticulumstress(FDR<0.05,alltestedgenes
asthebackground),andfourofthefivegenesintherobustsetsharingthisterm
peakinconservedorderacrossthedatasets.Furthermore,wefoundasignificant
enrichment(FDR<0.05)oftheXBP1bindingmotifinthepromoterregions(see
Methods)of therobustsetofgenes(electronicsupplementarymaterial,Figure
S9).
Discussion
Exploiting the precision of FANTOM5 CAGE times series data, we discover a
robust set of 42protein-coding genesdrivenbypromoters showing rapid and
transient activation in response to multiple stimuli. This set contains 13
previously known IEGs and 29 candidate IEGs, which are likely to be core
componentsoftheIER.ApplyingourapproachtotheCAGETSSsofncRNAswe
18
also discovered a set of 15 ncRNAs peaking across at least seven datasets,
comprising miRNAs and lncRNAs, suggesting regulatory roles for particular
miRNAsandlncRNAsspeciesintheIER[7].
FOSexpressionhas longbeen considered to lead the IERafter cell stimulation
[36,37],ourresultson the IERconservedactivationnetworksupport this,but
alsosimilarlyconservedrelationshipsextendingtoanadditional39codingand
non-codinggenes.Furthermore,weobservedmanyknownandnovelIEGsinthis
networkknowntobeinvolvedinarangeofsignallingpathwaysactiveintheIER,
such as the MAPK and the EGF/EGFR signalling pathways. This suggests the
variable constellations of genes involved in the IER to any particular stimulus
maybeunderpinnedbyadeeper levelof conservation in the regulationof the
IERacrossstimuli.
One of themost interesting candidate IEGs, XBP1, can be rapidly activated by
alternative splicing minutes after cell stimulation with mitogenic hormones,
activating peptides such as LPS and cytokines [31-33]. This key event of the
induced unfolded protein response (UPR) pathway is a conserved eukaryotic
response to cellular stress, and is thought to cooperate in the regulation of
immediateearlygeneexpression[32].However,thedynamicsofXBP1promoter
induction in the context of the IER have not been studied previously.
InterestinglywefoundasignificantenrichmentforXBP1TFbindingsitesinthe
promoter regions of 11 genes in the IER conserved activated network. The
presenceofXBP1andXBP1-respondinggenesinthisnetworksuggeststhisgene
mayactasanimportantlinkbetweentheIERandtheUPRpathway.
19
Materialsandmethods
Datasets
The eight datasets used (Figure 1) are themost densely sampled human time
series produced by the FANTOM5Project,with all time points represented by
threereplicates[38].Detailedinformationonthegenerationofthesedatasetsis
available from Arner et al. (2015) [10], including CAGE library preparation,
quality control, sequencing and qRT-PCR validation, as well as protocols for
CAGEreadclusteringandTSSdetection.AllCAGEclustersrepresentingTSSsof
protein-coding genes were conservatively thresholded to >10 TPM (tags per
million),while CAGE clusters corresponding toncRNAwere thresholded to>2
TPM, allowing for their generally lower expression levels. FANTOM5 data
downloads,browsersandgenomic toolsareavailable fromtheprojectwebsite
(http://fantom.gsc.riken.jp/5/).
Model-basedclassificationofTSSexpressionprofiles
Toclassify time-seriesdata foreachCAGEdefinedTSSwerefinedapreviously
published method [7] which fits different mathematical models (kinetic
signatures)to individualexpressionprofiles,assessingthebest fitusingnested
sampling [39] to compute the marginal likelihood, logZ. All time series were
normalizedsuchthatthemediumminimumandmaximumacrossthetimeseries
wassetto0and10,respectively.
The kinetic signatures considered are: linear, decay, dip and delayed peak
(electronic supplementary material, Figure S1). The peak kinetic signature
considered in the previous method was modified to allow a delay before
expressionstartstoincreaseinexponentialfashion(td).Parametertsisthetime
20
durationoftheinitialincreaseinexpression,p1istheexpressionattime0,and
p2istheincreaseinexpressionatthetimeofpeaking,tp=td+ts.
δ =log 0.3
t*
(1)
y = p-; t ≤ t0
(2)
y = p- + p2 1 − e6 7879 ; t0 < t ≤ t0 + t*
(3)
y = p- + p2 1 − e6 7879 − p2 1 − e6 787987; ; t > t0 + t*
(4)
However, an alternative rateδ = =>? @.-
79was also used to model the slower
dynamicsoftranscriptspeakinglaterintime,andthebestfittingmodelselected
during the decision step. Normalising the data such that expression lies in the
range0-10allowedthepriorrangesofparameterstoberestrictedtoplausible
valuesthatappliedtoalltimeseries.Thefitofmodelstodatawasimprovedasa
result.ToaccountforanyimpactonthelogZcalculation,wegeneratedsynthetic
time series datasets using parameter values drawn at random from the prior
rangestogenerateonereplicate,andgeneratedtwootherreplicatesbyadding
andsubtracting(respectively)agivenamountofnoisetothefirst.Modelfitting
wasapplied to1000suchdatasetspermodel (using the samenoisevalues for
eachmodeloneachof the1000 iterations)andweobservedanadvantage for
eachofthemorecomplexmodelsincomparisonwiththelinearmodelthatwas
consistentovertherangeoflogZvaluesobtainedforthelinearmodel.Tooffset
21
this effect for each complex model, the advantage (mean difference plus two
standarddeviationsobserved in syntheticdata)was subtracted from the logZ
valuescalculatedforCAGETSSdatawhenmakingthecategorisationdecision.
Transcriptionfactorbindingsiteidentification
Weassessedtheenrichmentoftranscriptionfactorbindingsite(TFBS)motifsin
the JASPARdatabase [40] (January2017 release) for all CAGETSSassigned to
genes in the robust set relative to those assigned to the 12,132 genes tested
across all the datasets. Motif matches (FDR ≤ 0.05) were sought in flanking
400bpwindowscentredonthemiddleofeachCAGETSSanalysed),usingFIMO
[41]fromtheMEMEpackage(version4.11.2patch2).Enrichmentofeachmotif
intherobustsetrelativetothetotalsetwasassessedwithFisher’sexacttests,
correctingformultipletesting(FDR≤0.05).
PathwayandGeneOntologyenrichment
Functional and pathway enrichments were assessed using GOrilla [13] and
InnateDB[28]respectively(FDR≤0.05),usingthetotal12,132genesanalysed
acrosstheeightdatasetsasthebackgroundset.
The listof234knownIEGs[10]wasassembled from20publishedhumanand
mousedatasets fromthe literature; it isexpectedtocontain fewfalsepositives
butdoes includeanumberof IEGsonlyreported incellsand/orresponsesnot
examined in this study. To compute the enrichment of known IEGs in each
datasetwecomparedtheproportionofpeakingCAGETSSsassignedtoIEGswith
the proportion of peaking CAGE TSSs assigned to candidate IEGs. For the
enrichmentofknownIEGsineachsetofsharedpeakinggeneswecomparedthe
proportionofpeakingCAGETSSsassigned to the IEGsshared ineachgroupof
sharedgeneswiththeproportionofpeakingCAGETSSsassignedtoIEGsinthe
22
remaining tested genes. The odds ratio and the p-value was assigned using
Fisher’sexacttest.
Networkconservation
A total of 57 protein-coding and non-coding candidate IEGs (corresponding to
known Ensembl genes) were considered for construction of the conserved
activation network. For genes with multiple peaking CAGE TSS we chose the
earliest peaking CAGE TSS (smallest tp) in each dataset, then the relative
pairwiseorderofeachgenewascomputedwithrespecttoalltheothergenesin
therobustset.Forexample,ifindataset-1,gene-Apeaksbeforegene-B(tpgene-A<
tpgene-B), and this order is observed in 6 or more of the other 7 dataset, the
temporalprecedence isdefinedtobeconserved.Applyingthisproceduretoall
57 coding and non-coding genes of the robust set we discovered 40 genes
temporally connected by 77 conserved relative orderings (Figure 4). The
significance of the number of temporal connections observed was measured
relative tonulldistribution, constructedbypermuting tp forall theCAGETSSs
1,000,000times;withtheproportionofpermuteddatasetswithatleastasmany
conserved orderings as the observed taken as an empirically derived p-value.
The observed value (77)was observed or exceeded in 4,516 out of 1,000,000
permutations indicating that the number of temporal connections was
statisticallysignificant(p<5e-3).
Acknowledgments
WethankallmembersoftheFANTOMconsortiumforcontributingtogeneration
of samples and analysis of the FANTOM5 dataset and thank GeNAS for data
production.The coremembersof theFANTOM5phase2projectwere:Alistair
23
R.R. Forrest, Albin Sandelin, Carsten O. Daub, ChristineWells, David A. Hume,
Erik Arner, Hideya Kawaji, Kim M Summers, Kristoffer Vitting-Seerup, Piero
Carninci,RobinAndersson,YoshihideHayashizaki. FinnDrabløs(Departmentof
Clinical and Molecular Medicine, Laboratory Center, Erling Skjalgsons gt. 1,
Faculty ofMedicine andHealth Sciences, NorwegianUniversity of Science and
Technology, N-7006 Trondheim, Norway) curated the list of IEGs within the
FANTOM5consortium.
AuthorsContributions
AVcarriedoutthecomputationalanalysisanddraftedthemanuscript;CASand
SAdesignedthestudyanddraftedthemanuscript;MI,HK,EA,TL,COD,PCand
ARRF generated and processed the CAGE data coordinated by YH. All authors
gavefinalapprovalforpublication.
24
References
1. FowlerT,SenR,RoyAL.Regulationofprimaryresponsegenes.Molecularcell.2011;44(3):348-60.2. HealyS,KhanP,DavieJR.Immediateearlyresponsegenesandcelltransformation.Pharmacology&therapeutics.2013;137(1):64-77.3. GreenbergME,ZiffEB.Stimulationof3T3cellsinducestranscriptionofthec-fosproto-oncogene.Nature.1984;311:433-8.4. BahramiS,DrabløsF.Generegulationintheimmediate-earlyresponseprocess.AdvancesinBiologicalRegulation.2016.5. O'DonnellA,OdrowazZ,SharrocksAD.Immediate-earlygeneactivationbytheMAPKpathways:whatdoanddon'tweknow?:PortlandPressLimited;2012.6. FowlerT,SuhH,BuratowskiS,RoyAL.RegulationofprimaryresponsegenesinBcells.JournalofBiologicalChemistry.2013;288(21):14906-16.7. AitkenS,MagiS,AlhendiAM,ItohM,KawajiH,LassmannT,etal.Transcriptionaldynamicsrevealcriticalrolesfornon-codingRNAsintheimmediate-earlyresponse.PLoSComputBiol.2015;11(4):e1004217.8. AnderssonR,GebhardC,Miguel-EscaladaI,HoofI,BornholdtJ,BoydM,etal.Anatlasofactiveenhancersacrosshumancelltypesandtissues.Nature.2014;507(7493):455-61.9. Bar-JosephZ,GitterA,SimonI.Studyingandmodellingdynamicbiologicalprocessesusingtime-seriesgeneexpressiondata.NatureReviewsGenetics.2012;13(8):552-64.10. ArnerE,DaubCO,Vitting-SeerupK,AnderssonR,LiljeB,DrabløsF,etal.Transcribedenhancersleadwavesofcoordinatedtranscriptionintransitioningmammaliancells.Science.2015;347(6225):1010-4.11. HubbardT,BarkerD,BirneyE,CameronG,ChenY,ClarkL,etal.TheEnsemblgenomedatabaseproject.Nucleicacidsresearch.2002;30(1):38-41.12. SupekF,BošnjakM,ŠkuncaN,ŠmucT.REVIGOsummarizesandvisualizeslonglistsofgeneontologyterms.PloSone.2011;6(7):e21800.13. EdenE,NavonR,SteinfeldI,LipsonD,YakhiniZ.GOrilla:atoolfordiscoveryandvisualizationofenrichedGOtermsinrankedgenelists.BMCbioinformatics.2009;10(1):1.14. KalamH,FontanaMF,KumarD.AlternatesplicingoftranscriptsshapemacrophageresponsetoMycobacteriumtuberculosisinfection.PLoSpathogens.2017;13(3):e1006236.15. AvrahamR,Sas-ChenA,ManorO,SteinfeldI,ShalgiR,TarcicG,etal.EGFdecreasestheabundanceofmicroRNAsthatrestrainoncogenictranscriptionfactors.SciSignal.2010;3(124):ra43.16. LlorensF,HummelM,PantanoL,PastorX,VivancosA,CastilloE,etal.Microarrayanddeepsequencingcross-platformanalysisofthemirRNomeandisomiRvariationinresponsetoepidermalgrowthfactor.BMCgenomics.2013;14(1):371.17. LiuX,CaiJ,SunY,GongR,SunD,ZhongX,etal.MicroRNA-29ainhibitscellmigrationandinvasionviatargetingRoundabouthomolog1ingastriccancercells.Molecularmedicinereports.2015;12(3):3944-50.
25
18. ZhangY,ZhouS.MicroRNA‑29ainhibitsmesenchymalstemcellviabilityandproliferationbytargetingRoundabout1.MolecularMedicineReports.2015;12(4):6178-84.19. CimminoA,CalinGA,FabbriM,IorioMV,FerracinM,ShimizuM,etal.miR-15andmiR-16induceapoptosisbytargetingBCL2.ProceedingsoftheNationalAcademyofSciencesoftheUnitedStatesofAmerica.2005;102(39):13944-9.20. LernerM,HaradaM,LovénJ,CastroJ,DavisZ,OscierD,etal.DLEU2,frequentlydeletedinmalignancy,functionsasacriticalhostgeneofthecellcycleinhibitorymicroRNAsmiR-15aandmiR-16-1.Experimentalcellresearch.2009;315(17):2941-52.21. GaoS-m,XingC-y,ChenC-q,LinS-s,DongP-h,YuF-j.miR-15aandmiR-16-1inhibittheproliferationofleukemiccellsbydown-regulatingWT1proteinlevel.JournalofExperimental&ClinicalCancerResearch.2011;30(1):1.22. Gökmen-PolarY,ZavodszkyM,ChenX,GuX,KodiraC,BadveS.AbstractP2-06-05:LINC00478:Anoveltumorsuppressorinbreastcancer.CancerResearch.2016;76(4Supplement):P2-06-5-P2--5.23. MillsJD,KavanaghT,KimWS,ChenBJ,WatersPD,HallidayGM,etal.Highexpressionoflonginterveningnon-codingRNAOLMALINCinthehumancorticalwhitematterisassociatedwithregulationofoligodendrocytematuration.Molecularbrain.2015;8(1):1.24. MüllerS,RaulefsS,BrunsP,Afonso-GrunzF,PlötnerA,ThermannR,etal.Next-generationsequencingrevealsnoveldifferentiallyregulatedmRNAs,lncRNAs,miRNAs,sdRNAsandapiRNAinpancreaticcancer.Molecularcancer.2015;14(1):1.25. Marín-BéjarO,MarcheseFP,AthieA,SánchezY,GonzálezJ,SeguraV,etal.PintlincRNAconnectsthep53pathwaywithepigeneticsilencingbythePolycombrepressivecomplex2.Genomebiology.2013;14(9):1.26. WangL,HanS,JinG,ZhouX,LiM,YingX,etal.Linc00963:Anovel,longnon-codingRNAinvolvedinthetransitionofprostatecancerfromandrogen-dependencetoandrogen-independence.Internationaljournalofoncology.2014;44(6):2041-9.27. StelzerG,RosenN,PlaschkesI,ZimmermanS,TwikM,FishilevichS,etal.TheGeneCardssuite:fromGenedataminingtodiseasegenomesequenceanalyses.CurrentProtocolsinBioinformatics.2016:1.30.1-1..3.28. BreuerK,ForoushaniAK,LairdMR,ChenC,SribnaiaA,LoR,etal.InnateDB:systemsbiologyofinnateimmunityandbeyond—recentupdatesandcontinuingcuration.Nucleicacidsresearch.2012:gks1147.29. SchrattG,WeinholdB,LundbergAS,SchuckS,BergerJ,SchwarzH,etal.Serumresponsefactorisrequiredforimmediate-earlygeneactivationyetisdispensableforproliferationofembryonicstemcells.Molecularandcellularbiology.2001;21(8):2933-43.30. TreismanR.RegulationoftranscriptionbyMAPkinasecascades.Currentopinionincellbiology.1996;8(2):205-15.31. AndruskaN,ZhengX,YangX,HelferichWG,ShapiroDJ.AnticipatoryEstrogenActivationoftheUnfoldedProteinResponseisLinkedtoCellProliferationandPoorSurvivalinEstrogenReceptorαPositiveBreastCancer.Oncogene.2015;34(29):3760.
26
32. ShapiroDJ,LivezeyM,YuL,ZhengX,AndruskaN.AnticipatoryUPRactivation:Aprotectivepathwayandtargetincancer.TrendsinEndocrinology&Metabolism.2016;27(10):731-41.33. SkaletAH,IslerJA,KingLB,HardingHP,RonD,MonroeJG.RapidBcellreceptor-inducedunfoldedproteinresponseinnonsecretoryBcellscorrelateswithpro-versusantiapoptoticcellfate.JournalofBiologicalChemistry.2005;280(48):39762-71.34. HeY,SunS,ShaH,LiuZ,YangL,XueZ,etal.EmergingrolesforXBP1,asUPeRtranscriptionfactor.Geneexpression.2010;15(1):13-25.35. PiperiC,AdamopoulosC,PapavassiliouAG.XBP1:apivotaltranscriptionalregulatorofglucoseandlipidmetabolism.TrendsinEndocrinology&Metabolism.2016;27(3):119-22.36. HuE,MuellerE,OlivieroS,PapaioannouV,JohnsonR,SpiegelmanB.Targeteddisruptionofthec-fosgenedemonstratesc-fos-dependentand-independentpathwaysforgeneexpressionstimulatedbygrowthfactorsoroncogenes.TheEMBOjournal.1994;13(13):3094.37. FeiJ,ViedtC,SotoU,ElsingC,JahnL,KreuzerJ.Endothelin-1andSmoothMuscleCells.Arteriosclerosis,thrombosis,andvascularbiology.2000;20(5):1244-9.38. LizioM,HarshbargerJ,ShimojiH,SeverinJ,KasukawaT,SahinS,etal.GatewaystotheFANTOM5promoterlevelmammalianexpressionatlas.Genomebiology.2015;16(1):22.39. AitkenS,AkmanOE.Nestedsamplingforparameterinferenceinsystemsbiology:applicationtoanexemplarcircadianmodel.BMCsystemsbiology.2013;7(1):72.40. MathelierA,ZhaoX,ZhangAW,ParcyF,Worsley-HuntR,ArenillasDJ,etal.JASPAR2014:anextensivelyexpandedandupdatedopen-accessdatabaseoftranscriptionfactorbindingprofiles.Nucleicacidsresearch.2013;42(D1):D142-D7.41. GrantCE,BaileyTL,NobleWS.FIMO:scanningforoccurrencesofagivenmotif.Bioinformatics.2011;27(7):1017-8.
27
Figure 1. Time course datasets demonstrating the immediate earlyresponse.(a)Schematicoftheeighttimecoursedatasetsconsidered,horizontallines indicate the time span and symbols show the sampling times. Time zerocorresponds to inactivated or quiescent cells in all cases. (b) The time courseexpressionprofileofFOS(left)andJUN(right)inalleightdatasets.Cageclusterexpression(meanTPMofthreereplicates)isplottedagainsttime.(c)TheextenttowhichtheclassificationofaTSSasapeakisuniquetoonedataset(3515TSS)orsharedbetween2ormoredatasets.
ImmediateEarlyResponse
0
1000
2000
3000
0 200 400 600Time (minutes)
mea
n ex
pres
sion
(TPM
)
PAC_FGF2PAC_IL1BPEC_VEGFPMDM_LPSMCF7_EGF1MCF7_HRGPMSC_MIXSAOS2_OST
FOS time course expression
40
80
120
0 200 400 600Time (minutes)
mea
n ex
pres
sion
(TPM
)
JUN time course expression
(a) (c)
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ●
PAC_FGF2
PAC_IL1B
PEC_VEGF
PMDM_LPS
MCF7_EGF1
MCF7_HRG
PMSC_MIX
SAOS2_OST
0 30 60 90 120 150 180 210 240 270 300 330 360 390 420 450 480 510 540 570 600 630 660 690 720time of sampling (minutes)
ds$dataset●
●
●
●
●
●
●
●
PAC_FGF2PAC_IL1BPEC_VEGFPMDM_LPSMCF7_EGF1MCF7_HRGPMSC_MIXSAOS2_OST
Pie Chart of peaking genes
3515 in 1 datasets: 40%
2388 in 2 datasets:27.2%
1578 in 3 datasets: 17.9%
797 in 4 datasets: 9.1%
325 in 5 datasets: 3.7%140 in 6 datasets : 1.6%:
37 in 7 datasets: 0.4%
5 in 8 datasets: 0.1%
(b)
28
Figure 2. Broad trends in peak expression times across datasets. (a)Identification of the peak time parameter (tp) of FOS estimated from thePMDM_LPS time series (filled symbols indicate the median TPM; unfilledsymbols are individual replicates; green lines represent tp and one standarddeviation above and below). (b)Heatmap of the times of peakTSS expression(tp)forTSSsinthepermissivesetforalldatasets.HeatmapcoloursreflectthetpforeachCAGETSS(within100min:darkgreen;100-150min:lightgreen;150-200min:yellow;beyond200min:red).KnownIEGsareindicatedontheleftbyblackcells.
IEGsothers
(b)
(a)
Expr
essi
on (T
PM)
29
Figure 3. Promoter usage across time series datasets. For representativegenes,barchartsshowthenumberofdatasetswhereeachTSSpeakstoillustratethe diversity of TSS usage and commonality of the peaking response. KnownIEGSareshowninblue,transcriptionfactorsinyellowandothergenesingreen.FOSBhasasingleTSSthatpeaksin8datasets,JUNhasthreeTSSeachpeakingin4ormoredatasets,andXBP1has6TSSthatpeakinbetween1and6datasets.
0
2
4
6
8
chr19:45971246..45971265,+
FOSB
0
2
4
6
8
chr5:172198190..172198206,−
DUSP1
0
2
4
6
8
chr19:49375669..49375684,+
PPP1R15A
0
2
4
6
8
chr11:64645991..64646005,−
chr11:64646086..64646101,−
EHD1
0
2
4
6
8
chr14:75745523..75745537,+
chr14:75746722..75746777,+
chr14:75747296..75747329,+
FOS
0
2
4
6
8
chr1:59249688..59249703,−
chr1:59249707..59249727,−
chr1:59249732..59249749,−
JUN
0
2
4
6
8
chr12:57482665..57482699,+
chr12:57482882..57482907,+
chr12:57482918..57482935,+
chr12:57482936..57482965,+
NAB2
0
2
4
6
8
chr6:134495992..134496010,−
chr6:134496823..134496840,−
chr6:134496842..134496884,−
chr6:134498969..134499006,−
chr6:134499037..134499048,−
chr6:134551475..134551489,−
SGK1
0
2
4
6
8
chr5:115177247..115177279,−
chr5:115177496..115177558,−
ATG12
0
2
4
6
8
chr2:120980665..120980750,−
chr2:120980939..120980983,−
TMEM185B
0
2
4
6
8
chr9:33167149..33167170,−
chr9:33167178..33167205,−
chr9:33167308..33167378,−
B4GALT1
0
2
4
6
8
chr17:49242884..49242925,+
chr17:49243759..49243787,+
chr17:49243792..49243828,+
NME2
0
2
4
6
8
chr3:57583064..57583079,−
chr3:57583085..57583121,−
chr3:57583130..57583152,−
chr3:57583171..57583182,−
ARF4
0
2
4
6
8
chr5:138609782..138609826,+
chr5:138629349..138629386,+
chr5:138629389..138629416,+
chr5:138629417..138629446,+
chr5:138629628..138629688,+
MATR3
0
2
4
6
8
chr22:29196030..29196075,−
chr22:29196433..29196445,−
chr22:29196457..29196462,−
chr22:29196492..29196503,−
chr22:29196511..29196541,−
chr22:29196547..29196563,−
XBP1
0
2
4
6
8
chr20:57464200..57464214,+
chr20:57464223..57464241,+
chr20:57466086..57466166,+
chr20:57466401..57466412,+
chr20:57466435..57466456,+
chr20:57466552..57466565,+
chr20:57466569..57466597,+
chr20:57466599..57466609,+
chr20:57466629..57466640,+
chr20:57466679..57466711,+
chr20:57466759..57466785,+
chr20:57466793..57466807,+
chr20:57467204..57467240,+
chr20:57478731..57478779,+
chr20:57484578..57484609,+
chr20:57484789..57484811,+
GNAS
count
count
30
Figure4.Conservedactivationnetwork.(a)Schematicprofilesoftwopeakinggenes, with temporal precedence indicated by the arrow. (b) Conservedtemporal precedence between IEGs (light blue nodes), transcription factors(yellow nodes) non-coding RNA (grey nodes) and other protein-coding genes(greennodes) isshownbydirectededges.AsubsetofIEGsinthisnetworkarealsotranscriptionfactors(FOS,KLF6,FOSB,BHLHE40,JUN,andFOSL1).
0 50 100 200 300
010
2030
4050
Peak
Time (min)
Expr
essio
n
ts1 ts2
ARF4 ATG12
B4GALT1BHLHE40
CUL3
DKC1 DUSP1
EHD1
FLNA
FOS
FOSB
FOSL1GNASGNB2L1
HNRNPL
ITM2B
JUN
KLF6
MATR3
MIR3654 NAB2
NME2
OCIAD1
PFKFB3PHLDA1
PLEC
PTGES3
PTP4A1RPSA
SDC4
SEC11A
SGK1
SLC3A2
SMARCA5
THBS1
TMBIM6
TMEM185B
UBE2D3
XBP1
AC005332.1
AC058791.2
AL928646.1
DLEU2
LINC00263
LINC00476PLK1S1
RP11−323F5.2
SEC22B
SNORD65
LINC00478
MIR21
SCARNA17
RP11−492E3.1PPP1R15ASRSF11
SNORD82AC058791.1
B
A(a)
(b)
31
Figure5.Transcriptionaldynamicsof genes classified to thepeakmodel.Scatterplots of log fold change against the time of peaking for selected genes,withconservedtemporalprecedenceindicatedbyarrowsforPMDM_LPS(a)andMCF7_EGF1(b).FOSpeaksearliestandhasmanyconservedtemporalrelationstolaterpeakinggenes,whileEHD1peakslateandhasmanyconservedtemporalorderingswithearlierpeakinggenes.
●
●
●
●
●
●
●
●
●
●
ATG12
EHD1
FOS
FOSB
GNAS
JUN
MIR3654
XBP1
DLEU2
LINC00476
10
1000
50 100 150 200 250Time of peaking (minutes)
Fold
cha
nge
expr
essi
on
biotypeaaaa
IEG
ncRNA
other
TF
PMDM_LPS
●●
●
●
●
●
●
●
●
●
ATG12 EHD1
FOS
FOSB
GNAS
JUN
MIR3654
XBP1
DLEU2LINC00476
10
1000
50 100 150 200Time of peaking (minutes)
Fold
cha
nge
expr
essi
on
biotypeaaaa
IEG
ncRNA
other
TF
MCF7_EGF1
IEGncRNATFother
IEGncRNATFother
(a) (b)