32
Edinburgh Research Explorer Conserved temporal ordering of promoter activation implicates common mechanisms governing the immediate early response across cell types and stimuli Citation for published version: Vacca, A, Itoh, M, Kawaji, H, Arner, E, Lassmann, T, Daub, C, Carninci, P, Forrest, ARR, Hayashizaki, Y, FANTOM consortium, T, Aitken, J & Semple, C 2018, 'Conserved temporalorderingof promoter activation implicates common mechanisms governing the immediate early response across cell types andstimuli', Open Biology. https://doi.org/10.1098/rsob.180011 Digital Object Identifier (DOI): 10.1098/rsob.180011 Link: Link to publication record in Edinburgh Research Explorer Document Version: Peer reviewed version Published In: Open Biology General rights Copyright for the publications made accessible via the Edinburgh Research Explorer is retained by the author(s) and / or other copyright owners and it is a condition of accessing these publications that users recognise and abide by the legal requirements associated with these rights. Take down policy The University of Edinburgh has made every reasonable effort to ensure that Edinburgh Research Explorer content complies with UK legislation. If you believe that the public display of this file breaches copyright please contact [email protected] providing details, and we will remove access to the work immediately and investigate your claim. Download date: 01. Jan. 2021

Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

Edinburgh Research Explorer

Conserved temporalorderingof promoter activation implicatescommon mechanisms governing the immediate early responseacross cell types andstimuli

Citation for published version:Vacca, A, Itoh, M, Kawaji, H, Arner, E, Lassmann, T, Daub, C, Carninci, P, Forrest, ARR, Hayashizaki, Y,FANTOM consortium, T, Aitken, J & Semple, C 2018, 'Conserved temporalorderingof promoter activationimplicates common mechanisms governing the immediate early response across cell types andstimuli',Open Biology. https://doi.org/10.1098/rsob.180011

Digital Object Identifier (DOI):10.1098/rsob.180011

Link:Link to publication record in Edinburgh Research Explorer

Document Version:Peer reviewed version

Published In:Open Biology

General rightsCopyright for the publications made accessible via the Edinburgh Research Explorer is retained by the author(s)and / or other copyright owners and it is a condition of accessing these publications that users recognise andabide by the legal requirements associated with these rights.

Take down policyThe University of Edinburgh has made every reasonable effort to ensure that Edinburgh Research Explorercontent complies with UK legislation. If you believe that the public display of this file breaches copyright pleasecontact [email protected] providing details, and we will remove access to the work immediately andinvestigate your claim.

Download date: 01. Jan. 2021

Page 2: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

1

Title: Conserved temporal ordering of promoter activation

implicatescommonmechanismsgoverningtheimmediateearly

responseacrosscelltypesandstimuli

Shorttitle:Promoteractivationintheimmediateearlyresponse

Annalaura Vacca1, Masayoshi Itoh2, Hideya Kawaji3, Erik Arner4, Timo

Lassmann5, CarstenO. Daub6, Piero Carninci4, Alistair R. R. Forrest7, Yoshihide

Hayashizaki2,theFANTOMConsortium^,StuartAitken1,Colin A. Semple1*

1MRCHumanGeneticsUnit,MRC InstituteofGeneticsandMolecularMedicine,

UniversityofEdinburgh,CreweRoad,Edinburgh,EH42XU,UK

2RIKEN Preventive Medicine and Diagnosis Innovation Program, 2F Main

ResearchBuilding,2-1Hirosawa,Wako,Japan

3RIKENAdvancedCenterforComputingandCommunication,RIKENYokohama

Campus,Yokohama,230-0045,Japan

4RIKEN Center for Life Sciences Technologies, RIKEN Yokohama Campus,

Yokohama,230-0045,Japan

5Telethon Kids Institute, The University of Western Australia, Roberts Road,

Subiaco,WesternAustralia,Australia

6Department of Biosciences and Nutrition, Karolinska Institutet, 141 86,

Stockholm,Sweden

7Harry Perkins Institute of Medical Research, 6 Verdun St, Nedlands,Western

Australia6009,Australia

Page 3: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

2

*Corresponding Author

Email:[email protected]

^See Acknowledgements for further information on the FANTOM Consortium.

Page 4: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

3

Abstract

Thepromotersofimmediateearlygenes(IEGs)arerapidlyactivatedinresponse

to an external stimulus. These genes, also known as primary response genes,

havebeenidentifiedinarangeofcelltypes,underdiverseextracellularsignals,

andusingvaryingexperimentalprotocols.Whereasgenomicdissectiononacase

bycasebasishasnotresultedinacomprehensivecatalogueofimmediateearly

genes, a rigorous meta-analysis of eight genome-wide FANTOM5 CAGE (Cap

AnalysisofGeneExpression)timecoursedatasetsrevealssuccessivewavesof

promoter activation in IEGs, recapitulating known relationships between cell

types and stimuli: We obtain a set of 57 (42 protein-coding) candidate IEGs

possessing promoters that consistently drive a rapid but transient increase in

expressionovertime.ThesegenesshowsignificantenrichmentforknownIEGs

reported previously, pathways associated with the immediate early response

and include a number of non-coding RNAs with roles in proliferation and

differentiation.Surprisingly,wealsofindstrongconservationoftheorderingof

activationforthesegenes,suchthat77pair-wisepromoteractivationorderings

are conserved. Using the leverage of comprehensive CAGE time series data

acrosscelltypeswealsodocumenttheextensivealternativepromoterusageby

suchgenes,which is likely tohavebeenabarrier to theirdiscoveryuntilnow.

The common activation ordering of the core set of early-responding geneswe

identifymayindicateconservedunderlyingregulatorymechanisms.Incontrast,

theconsiderablylargernumberoftransientlyactivatedgenesthatarespecificto

eachcelltypeandstimulusillustratesthebreadthoftheprimaryresponse.

Page 5: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

4

Introduction

Human cells respond to a broad range of extracellular stimuli with a

characteristic burst of transcription within minutes at many sites across the

genome, known as the immediate-early response (IER). The IER has been

observed as an initiating event in many cellular processes, notably during

differentiation,inresponsestocellularstress,andininflammation.Theearliest

events in the IER involve the activationof thepromoters of a particular set of

genes, known as immediate-early genes (IEGs). The promoters of IEGs are

activatedrapidly,andtheiractivation is transient innormalcells [1].However,

IEGs are often dysregulated in cancers where they can become continuously

activated, accordingly someof thebest-studied IEGsareknownoncogenes [2].

For example, theexpressionof theFOSproto-oncogenenormallypeakswithin

60minutesofastimulusandsubsidesafter90minutes[3], incontrastwithits

continuousoverexpressioninmanycancers.

Immediate-earlygenespossessunusuallyaccessiblepromotersthatallowrapid

transcriptionalactivation inresponsetoastimuluswithout therequirementof

denovoproteinsynthesis.VariousfeaturesarethoughttodiscriminateIEGssuch

astheshortertranscriptstheygenerateandenrichmentsofcertaintranscription

factor (TF) binding sites at their promoters [4]. However, current knowledge

aboutIEGsisderivedmainlyfromstudiesofindividualgenesorpathways,and

often considers a specific cell type and stimulus. Thismeans that comparison

acrossstudiescanbeconfoundedbyexperimentalandtechnicalvariation,anda

comprehensive catalogue of IEGs remains elusive. There is also controversy

abouttheregulatorymechanismsgoverningtheresponseofevenrelativelywell-

Page 6: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

5

studied IEGs [5]. Beyond the induction of protein-coding IEG promoters, the

featuresandunderlyingmechanismsof the IERareeven lesswellunderstood.

Some studies have implicated altered patterns of IEG splicing as playing

importantrolesintheIER[6],whileothershavesuggestedaprominentrolefor

lncRNA activation [7] and transcribed enhancers [8]. Approximately 20% of

knownIEGsaretranscriptionfactors, includingsomeofthebestcharacterized:

EGR1-EGR4,FOS,FOSB,FOSL1,JUN,JUNBandMYC.

TheFANTOM5CapAnalysisofGeneExpression(CAGE)dataofferanumberof

advantages for expression profiling since they are based upon single-molecule

sequencingtoavoidPCR,digestionandcloningbiases.Theyprovideuptosingle

basepairresolutionoftranscriptionstartsites(TSSs)andpromoterregions,and

provideasensitive,quantitativereadoutoftranscriptionaloutputaccountingfor

thealternativepromotersofeachgene.Theoutputofindividualpromotersisnot

confounded by splicing variation, andmany novel lowly expressed transcripts

including non-coding RNAs (ncRNAs) can be readily detected (FANTOM

Consortium, Nature, 2014; http://fantom.gsc.riken.jp/5/). CAGE data are thus

ideallysuitedtostudyingthestrongburstoftranscriptionatpromotersseenin

immediate-early responses. FANTOM5 data include eight CAGE time course

datasetsemployingunusuallydensesamplingat timepointswithin300minof

stimulation, for a variety of stimuli treating a variety of cell types. These

heterogeneous datasets, produced using a common experimental platform,

should be fertile ground for novel insights into the IER, but a comprehensive

meta-analysishasnotbeenperformeduntilnow.

Page 7: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

6

Manypreviousapproachestotimeseriesanalysisofexpressiondatahavebeen

based upon differential expression between successive time points, or have

clusteredgenesaccordingtothesimilarityoftheirexpressionprofilesovertime

[9]. Both of these approaches present problems for the analysis of CAGEdata.

Differentialexpressionbetweentimepointsprovidespoorsensitivity for lowly

expressed transcripts (possessing too few reads to generate significant

differences in expression), and presents serious difficulties when comparing

expressionprofilesfromdatasetswithsomewhatdifferentsamplingpointsover

time. Clustering approaches often rely upon arbitrary thresholds (e.g. based

uponclustersizeorsignificantenrichmentoffunctionalannotationterms)and,

bydefinition,willmisstranscriptsthatcannotbeassignedtoaclusterbutmay

neverthelessshowdynamicsofinterest.Hencewerefineapreviouslysuccessful

Bayesian model selection algorithm to classify promoter responses to pre-

definedmathematicalmodels[7].

Hereweperformextensivemeta-analysesofpromoteractivityinthehumanIER,

encompassingunusuallydiversecelltypesandstimuli,torigorouslyclassifyIEGs

andestimatethecoreIEGrepertoireactiveacrosscellularresponses.Weshow

thatcomputationalclassificationofthetemporalactivitypatternsofpromoters

provides a potent basis for meta-analyses across time courses, exposing the

combinedactivityofknownIEGsandcompellingnewIEGcandidatesintheIEG

corerepertoire.Wealsoshowthatthetimingofthepeakexpressionofacoreset

of transiently activated genes has a conserved order. This surprising outcome

indicatesapreviouslyunidentifiedregulatorymechanismthat issharedamong

celltypesandcommontodiversestimuli.

Page 8: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

7

Results

Weconsideredeightdenselysampled,andwellreplicated,FANTOM5CAGEtime

course datasets obtained following diverse stimuli: calcification in an

osteosarcomacelllineinresponsetoosteocalcin(SAOS2_OST),differentiationof

adipose-derivedprimarymesenchymalstemcellsinresponsetoadrugmixture

(3-isobutyl-1-methylxanthine, dexamethasone, and rosiglitazone) (PMSC_MIX),

differentiation of primary lymphatic endothelial cells in response to VEGF

(PEC_VEGF),MCF7breastcancercell lineresponsestoEGF1(MCF7_EGF1)and

to HRG (MCF7_HRG), primary aortic smooth muscle cells response to IL1b

(PAC_IL1B)andFGF2(PAC_FGF2),andprimarymonocyte-derivedmacrophage

cellsactivation inresponsetoLPS(PMDM_LPS).Thus,we includedavarietyof

primaryandcell linesamples, trackingresponses toarangeofstimuli:growth

factors, hormones, drugs, pro-inflammatory cytokines and bacterial endotoxin

(Figure 1a). These diverse data provided a potent resource to discover core

featuresoftheIERconservedacrosscelltypesandstimuli.AllTSSsforprotein-

coding transcripts were represented by conservatively selected CAGE read

clusters (at least 10 TPM) followingArner et al. (2015) [10]. As expected, the

responses of known immediate early genes often showed characteristic

expressionpeaksearly in the time seriesdatasets - as exemplifiedbyFOSand

JUN – though even for these well-established IEGs we observed substantial

variation in themagnitude, timing anddurationofpeaks across cell types and

stimuli(Figure1b).TheseobservationsillustratethechallengespresentedinIEG

detection, even when studying known IEGs using a uniform experimental

platform.

Page 9: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

8

OptimisingandrefiningtheapproachdevelopedbyAitkenetal(2015)[7](see

Methods), we defined four mathematical models representing archetypical

expressionprofilesofinterestovertime:peak,linear,dipanddecay(electronic

supplementarymaterial, Figure S1), and assessed the fit of eachmodel to the

expressionprofileofeachgeneusingnestedsamplingtocomputethemarginal

likelihood, log Z [7]. Where sufficient evidence exists (given the variation

betweenreplicates)thealgorithmreturnsaclassificationofaninputtranscript

toamodel,andalsocomputesrelevantparametersofthefittedmodels(e.g.time

and magnitude of peak expression). These parameter estimates provide a

reliable basis for comparisons across time series datasets, evenwith different

samplingdensities[7]astheyarenotrestrictedtosamplingtimesorexpression

valuesatthosetimes.

CAGEtimeseriesmeta-analysisrevealsacorecomplementoftransiently

activatedpromoters

Across the eight time series datasets we considered all CAGE clusters

corresponding to the TSSs of known Ensembl [11] transcripts, encompassing

between 10,513 (corresponding to 7,706 Ensembl genes) and 14,376 (8,951

genes)protein-codingCAGETSSs,dependingonthedataset,andbetween1,202

(692 genes) and 1,640 (858 genes) non-coding RNA CAGE TSSs (electronic

supplementary material, Table S1). Between 15% and 42% of protein-coding

CAGE TSSs, and between 15% and 33% of non-coding TSSs were confidently

classified to one of the four models, depending on the dataset (electronic

supplementary material, Figure S2, Table S2). The remainder could not be

rigorously classified toa singlemodel andwereomitted from furtheranalysis.

Page 10: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

9

Thepeakmodelhad thehighestnumberof assignments in all thedatasets for

both protein-coding and ncRNA genes; for example, of 12,132 total Ensembl

protein-codinggenestested,wefound8,785Ensemblgenes(72%)topeakinat

leastoneofthedatasets.Incontrast,fewgeneswereclassifiedtothepeakmodel

in multiple datasets, with only 42 of such genes shared across at least seven

datasets(Figure1c),underliningthehighvariabilityoftranscriptionalresponses

seenforthesamepromotersacrosstimeseries.These42genesconstitutedour

‘robust’ set of candidate IEGs (genes, TSSs and peak times listed in electronic

supplementarymaterial, File S1).We also defined a less stringent ‘permissive’

setof1,304candidatessharedacrossatleastfouroutofeightdatasets.

We then explored the overlap in peaking genes outside of the robust set

(electronicsupplementarymaterial,TableS3)andfoundthat,foreachdataset,at

least8%ofpeakinggenesaresharedwithanotherdataset(range8%-16%)and

up to 52% of peaking genes are shared (range 19%-52%). The intersections

betweensetsof3datasetsbecamesmallerconsequently.Notably,approximately

50% of peaking genes are shared between datasetswhere the cell type is the

same(MCF7andPAC).

Ourmodelfittingapproachprovidedparameterestimatesforallpromoters

assignedtothesamemodel,providingastraightforwardandintuitivebasisfor

meta-analysis.Forexample,comparisonofthepeaktimes(tp)(Figure2a)forall

promotersclassifiedaspeaksinatleastfourdatasets(thepermissiveset)

readilydemonstratedcommonpatternsacrossdatasets(Figure2b).Wavesof

promoteractivationwereevident,withcertainpromoters,particularlyknown

IEGs,activatedinthesameearlytimewindowinmultipledatasets.Hierarchical

clusteringofthedatasetsbasedonthesepeakclasspromoters(9%ofall

Page 11: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

10

promotersassayed)alsorecapitulatedknownrelationshipsbetweencelltypes

andstimuli(Figure2b).Thetwodatasetsderivedfromthesamebreastcancer

cellline(MCF7_EGF1andMCF7_HRG)andstimulatedwithdifferentligandsof

thesameErbBreceptorfamilyclusteredtogetherasmightbeexpected.We

observedsimilarbehaviourforthetwoprimaryaorticcellsamplesexposedtoa

growthfactororactivatedbyapro-inflammatorycytokine(PAC_FGF2and

PAC_IL1Brespectively).Thus,similaritiesinpromoteractivationdynamics

(reflectedintpparameterestimates)betweendatasetsmayreflectunderlying

commonalitiesintheirunderlyingbiology.

The extent of alternative promoter usage across the robust set of IEGs and

candidateIEGsisshowninFigure3(seealsoelectronicsupplementarymaterial,

Figure S3). Candidate IEGs show slightly greater variability in the TSSs they

activate across datasets compared with known IEGs, with a greater median

numberTSS found topeak(3.5comparedwith2 forknown IEGs). Inaddition,

known IEGs tend to possess TSSs that are successfully classified to the peak

modelacrossalargernumberofdatasets(meanproportionofdatasetsclassified

as peak per TSS for known IEGs in the robust set = 4; candidate IEG mean

proportion = 2.5). Thus, known IEGs tend to possess smaller numbers of

alternativeTSSsthatalsotendtoshowdiscerniblepeaks inthemajorityofthe

time series datasets. It is possible that these relatively stereotypical

transcriptional characteristics of known IEGs may, in some cases, have led to

theirstatusaswellestablishedIEGs.Similarly,theincreasedvariabilityseenfor

the TSSs of candidate IEGs could have led to a failure to detect their IEG-like

behaviourinformerstudies.

Page 12: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

11

We investigated the nature of our promoter classifications by testing the

enrichmentofknownIEGs(seeMethods)withineachclass,foreachdataset.The

peak class was enriched for known IEGs in all datasets (electronic

supplementary material, Figure S4, Table S4), but failed to reach statistical

significance in PMSC_MIX (OR = 1.3, p = 0.2). Peaking genes shared across

datasetsweregenerallyassociatedwithsignificantenrichmentsofknownIEGs

(Table1),withthepermissiveset(sharedacross4ormoredatasets)expectedto

contain higher numbers of false positives than the robust set (7 or more

datasets).GenespossessingTSSsassignedtothepeakclassshowedenrichments

forGeneOntology(GO)processesassociatedwith transcription,cellactivation,

cellproliferation,celldifferentiationandcancerrelatedtermssuchascelldeath

andapoptosis(FDR<0.05;Methods)[12,13].Thesetermswerealsoconsistent

withpreviousstudiesofIEGs[4]asgenesintherobustsetshowedenrichment

for285GOterms,over30%(88)ofwhichweresharedwiththelistof773GO

termsofallknownIEGs(electronicsupplementarymaterial,TableS5).

Table1.EnrichmentofknownIEGsforgenesclassifiedtothepeakmodelinmultipledatasets

Shareddatasets

IEGsenrichment #CAGETSSs(median)#

genes#IEGs

#CAGETSSs(across8datasets)

#IEGCAGETSSs(across8datasets)

OR p

1to8(allpeakinggenes)

8,785 204 102,496 913 - - 1

2to8 5,270 171 71,384 853 6.3 2.2e-16 13to8 2,882 128 45,360 751 5.9 2.2e-16 24to8 1,304 86 24,616 590 5.9 2.2e-16 25to8 507 56 11,528 433 7.4 2.2e-16 36to8 182 35 4,896 299 10.3 2.2e-16 37to8 42 13 1,376 124 12.6 2.2e-16 48 5 2 264 18 8.3 4.6e-11 5Enrichment (expressedasodds ratios)andpvalues forgenesclassifiedacrossdifferentnumbersoftimeseriesdatasets.

Page 13: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

12

Novelnon-codingRNAcandidatesintheimmediateearlyresponse

Wenextappliedour classification topromotersdriving theexpressionofnon-

coding transcripts and found peak promoters driving the expression of 20

ncRNA genes (across at least seven datasets), constituting the robust set of

ncRNA candidate IEGs. These included promoters associated with the cellular

splicingmachinery,suchassmallnuclearRNAmulti-genefamilies(U1,U2,and

U4), which are part of the spliceosome, and SCARNA17, a small nuclear RNA

which contributes to the post transcriptional modification of many snRNPs.

Kalam et al. have shown that macrophage infection with Mycobacterium

tuberculosisresultsinthesystematicperturbationinsplicingpatterns[14],and

our results suggest more general roles for alternative splicing in the IER.

However, multigene families, such as these small nuclear RNAs, present

particularchallengesforreliablesequencereadmapping.Althoughprobabilistic

approaches to mapping ambiguously mapped reads were developed in

FANTOM5[10]wehavechosentoconservativelyremovethesegenesfromthe

robustset, leavingagroupof15non-codinggeneswithamedianof5peaking

TSS(Table2;electronicsupplementarymaterial,FigureS5andS6).

ThreemiRNAs are present in the robust set (Table 2) including the oncogene

miR-21 which was previously reported to show IEG-like behaviour in the

PAC_FGF2, PAC_IL1B and MCF7_HRG time series [7]. Here we find similar

behaviour in the MCF7_EGF1, PEC_VEGF, PMSC_MIX and SAOS2_OST datasets.

This extends previous studies reporting that the miR-21 mature transcript is

upregulatedonEGFtreatmentinMCF10A[15]andHeLa[16]cells.miR-29Ahas

Page 14: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

13

been associated with the viability and proliferation of mesenchymal stem cell

andgastriccancercells[17,18]andDLEU2isaputativetumoursuppressorgene

thathoststwomiRNAs,miR-15AandmiR-16-1whichareknowntoinhibitcell

proliferationand the colony-formingabilityof tumour cell lines, and to induce

apoptosis [19-21]. Seven lncRNAs also appear in the robust set (Table 2) and

among them LINC00478 is particularly interesting, as it has already been

reportedtoshowIEG-likebehaviour[7],isimplicatedinbreastcancerandhosts

an intronic cluster ofmiRNAs comprising let-7c,miR-99a, andmiR-125b [22].

Although poorly characterised, LINC00263, LINC-PINT and LINC00963 are

thought tobe involved inbiological processesoften triggeredby IEGs, such as

cellmaturation,cellproliferationandtheexpressionofgrowthfactorreceptors

[23-26].

Page 15: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

14

Table2.Non-codinggenespeakinginatleast7outof8datasets.

GeneID NºSharedDatasets

Description(PuMedref.)

LINC00478(MIR99AHG)

7 Ithasaroleincellproliferationanddifferentiationanditisconsideredaregulatorofoncogenesinleukemia(PMID:25027842)

LINC00263 7 Regulationofoligodendrocytematuration(PMID:25575711)

LINC-PINT 8 Putativetumorsuppressor(PMID:24070194)LINC00963 7 Involvedintheprostatecancertransition

fromandrogen-dependenttoandrogen-independentandmetastasisviatheEGFRsignalingpathway(PMID:24691949)

LINC00476 8 UncharacterizedlincRNALINC00674 7 UncharacterizedlincRNASTX18-AS1 7 UncharacterizedlincRNADLEU2 7 Criticalhostgeneofthecellcycleinhibitory

microRNAsmiR-15aandmiR-16-1(PMID:19591824)

MiR-29A 7 TheexpressionofthemiR-29familyhasantifibroticeffectsinheart,kidney,andotherorgans.miR-29shavealsobeenshowntoinduceapoptosisandregulatecelldifferentiation(PMID:22214600)

MiR-3654 7 InvolvedinProstateCancerprogression(PMID:27297584)

MiR-21 7 Oncogenicpotential(PMID:18548003)AL928646 7 UncharacterizedncRNASCARNA17 7 scaRNAInvolvedinthematurationofother

RNAmolecules(PMID:12032087)SNORD65 7 BelongstotheSmallnucleolarRNAs,C/D

family.InvolvedinrRNAmodificationandalternativesplicing(PMID:26957605)

SNORD82 7 BelongstotheSmallnucleolarRNAs,C/Dfamily.InvolvedinrRNAmodificationandalternativesplicing(PMID:26957605)

Theshortdescriptionsofthemolecularfunctionarefromthegenecarddatabase[27].

KnownIEGpromotersshowconservedtemporalorderofactivationacross

datasets

Having established common patterns of peak gene induction at similar times

acrossdatasets(Figure2b),wehypothesisedthatIEGsmayalsobeinducedina

conservedorderovertime.Toourknowledgetheextentofconservedordering

Page 16: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

15

ingeneinductionisunstudiedingeneral,andintheIERitisofparticularinterest

for two main reasons. Firstly, the presence of conserved gene orderings, in

addition to common gene classifications, provides an additional test for

functional similarity between datasets. Secondly, strongly conserved ordering

may suggest the existence of conserved regulatorymechanisms governing the

inductionof thesegenes.Toanalyse the relativeorderof activationacross the

eightdatasetswecomparedthepeaktimeofeachgenetothatofallothersinthe

peakclass.Iftherelativetemporalorderoftwogeneswasconservedinatleast

sevenoftheeightdatasetstheorderingforthispairwasconsideredconserved

andrepresentedbyanedgeintheconservedactivationnetwork(Figure4).

We found 77 pairs of genes showing conserved ordering in their activation,

involving 40 of the 57 genes in the robust set. FOS was the first gene to be

activated (lacking a predecessor in the ordering) and SDC4, EHD1 and

TMEM185B were the last. The number of conserved temporal connections

observed overall is statistically significant (p < 5e-3) by comparison to the

distribution of expected connections given 1,000,000 permuted datasets

(Methods). This appears to reflect a conserved coordination in promoter

activationduring the IERand furthersupports thecandidacyof thenovel IEGs

detected.Manygenes in thisnetworkareknown toparticipate inwell-studied

pathways active in the IER such as the MAPK signalling pathway as we now

discuss.

KnownIEGsandcandidateIEGsparticipateincommonsignallingpathways

HavingshownthatthepeakmodeldescribedthebehaviourofknownIEGs,we

speculated that the other genes assigned to this model might include novel

Page 17: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

16

candidateIEGs.Ofthe42genesintherobustsetmorethantwothirds(29genes)

arenotknowntobeIEGsandcan,therefore,beconsideredtobecandidatenovel

IEGs(henceforthcandidate IEGs).Pathwayanalysis [28]recoversmanyknown

relationships among known IEGs as expected, centred on heavily studied IEGs

suchasFOSand JUN.However, thesameanalysissuggests thatmore thanhalf

(17)of candidate IEGsalsoparticipate incommonpathwayswithknown IEGs,

involving a densely inter-connected network of 83 significantly over-

representedpathways (electronicsupplementarymaterial,TableS6), including

signalling cascades known to mediate the IER, such as the Ca2+-dependent

pathwaysandthemitogen-activatedprotein(MAP)kinasenetwork[29,30].

Thedynamicsof theexpressionofpeak-classifiedgenescanbevisualizedbya

scatterplotoffoldchangeagainstpeaktime(electronicsupplementarymaterial,

Figure S7). These quantitative features along with the conserved temporal

orderings described above showFOS as the earliest peaking IEG, EHD1 as the

last, with an array of conserved orderings subsequent to, and prior to the

peaking of these genes respectively (selected genes plotted in Figure 5). The

TSSs of known IEGs CAGE tend to show the greatest fold changes (electronic

supplementary material, Figure S8A, Wilcoxon p < 2.2e-16), however, some

candidate IEGs promoters show notably similar timing (electronic

supplementarymaterial,FigureS8C,Wilcoxonp=0.89).Thetimeofpeakingis

significantly earlier for known IEGs relative to the other protein-coding

promoters in only three time series: PMDM_LPS, MCF7_EGF1 and PEC_VEGF.

Fold changes inpeakncRNApromoters tend tobe lower than forknown IEGs

(electronic supplementary material, Figure S8B, Wilcoxon p < 0.05) but they

Page 18: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

17

occurearlier thanknownIEGs(electronicsupplementarymaterial,FigureS8D,

Wilcoxonp<0.05foralldatasets).

AmongthecandidateIEGsintherobustset,XBP1isespeciallynoteworthy.This

gene encodes a transcription factor and is relatively short in length (6Kb

compared with the mean of 58Kb for all Ensembl protein-coding genes)

consistentwiththeIEGarchetype[1].XBP1isahighlyconservedcomponentof

the Unfolded Protein Response (UPR) signalling pathways, activated by

unconventionalsplicinguponEndoplasmicReticulum(ER)stressornonclassical

anticipatoryactivation[31-33],andregulatesadiversearrayofgenes involved

in ER homeostasis, adipogenesis, lipogenesis and cell survival [34, 35].

Interestingly,genes intherobustsetaresignificantlyenrichedfortheGOterm

GO:003497responsetoendoplasmicreticulumstress(FDR<0.05,alltestedgenes

asthebackground),andfourofthefivegenesintherobustsetsharingthisterm

peakinconservedorderacrossthedatasets.Furthermore,wefoundasignificant

enrichment(FDR<0.05)oftheXBP1bindingmotifinthepromoterregions(see

Methods)of therobustsetofgenes(electronicsupplementarymaterial,Figure

S9).

Discussion

Exploiting the precision of FANTOM5 CAGE times series data, we discover a

robust set of 42protein-coding genesdrivenbypromoters showing rapid and

transient activation in response to multiple stimuli. This set contains 13

previously known IEGs and 29 candidate IEGs, which are likely to be core

componentsoftheIER.ApplyingourapproachtotheCAGETSSsofncRNAswe

Page 19: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

18

also discovered a set of 15 ncRNAs peaking across at least seven datasets,

comprising miRNAs and lncRNAs, suggesting regulatory roles for particular

miRNAsandlncRNAsspeciesintheIER[7].

FOSexpressionhas longbeen considered to lead the IERafter cell stimulation

[36,37],ourresultson the IERconservedactivationnetworksupport this,but

alsosimilarlyconservedrelationshipsextendingtoanadditional39codingand

non-codinggenes.Furthermore,weobservedmanyknownandnovelIEGsinthis

networkknowntobeinvolvedinarangeofsignallingpathwaysactiveintheIER,

such as the MAPK and the EGF/EGFR signalling pathways. This suggests the

variable constellations of genes involved in the IER to any particular stimulus

maybeunderpinnedbyadeeper levelof conservation in the regulationof the

IERacrossstimuli.

One of themost interesting candidate IEGs, XBP1, can be rapidly activated by

alternative splicing minutes after cell stimulation with mitogenic hormones,

activating peptides such as LPS and cytokines [31-33]. This key event of the

induced unfolded protein response (UPR) pathway is a conserved eukaryotic

response to cellular stress, and is thought to cooperate in the regulation of

immediateearlygeneexpression[32].However,thedynamicsofXBP1promoter

induction in the context of the IER have not been studied previously.

InterestinglywefoundasignificantenrichmentforXBP1TFbindingsitesinthe

promoter regions of 11 genes in the IER conserved activated network. The

presenceofXBP1andXBP1-respondinggenesinthisnetworksuggeststhisgene

mayactasanimportantlinkbetweentheIERandtheUPRpathway.

Page 20: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

19

Materialsandmethods

Datasets

The eight datasets used (Figure 1) are themost densely sampled human time

series produced by the FANTOM5Project,with all time points represented by

threereplicates[38].Detailedinformationonthegenerationofthesedatasetsis

available from Arner et al. (2015) [10], including CAGE library preparation,

quality control, sequencing and qRT-PCR validation, as well as protocols for

CAGEreadclusteringandTSSdetection.AllCAGEclustersrepresentingTSSsof

protein-coding genes were conservatively thresholded to >10 TPM (tags per

million),while CAGE clusters corresponding toncRNAwere thresholded to>2

TPM, allowing for their generally lower expression levels. FANTOM5 data

downloads,browsersandgenomic toolsareavailable fromtheprojectwebsite

(http://fantom.gsc.riken.jp/5/).

Model-basedclassificationofTSSexpressionprofiles

Toclassify time-seriesdata foreachCAGEdefinedTSSwerefinedapreviously

published method [7] which fits different mathematical models (kinetic

signatures)to individualexpressionprofiles,assessingthebest fitusingnested

sampling [39] to compute the marginal likelihood, logZ. All time series were

normalizedsuchthatthemediumminimumandmaximumacrossthetimeseries

wassetto0and10,respectively.

The kinetic signatures considered are: linear, decay, dip and delayed peak

(electronic supplementary material, Figure S1). The peak kinetic signature

considered in the previous method was modified to allow a delay before

expressionstartstoincreaseinexponentialfashion(td).Parametertsisthetime

Page 21: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

20

durationoftheinitialincreaseinexpression,p1istheexpressionattime0,and

p2istheincreaseinexpressionatthetimeofpeaking,tp=td+ts.

δ =log 0.3

t*

(1)

y = p-; t ≤ t0

(2)

y = p- + p2 1 − e6 7879 ; t0 < t ≤ t0 + t*

(3)

y = p- + p2 1 − e6 7879 − p2 1 − e6 787987; ; t > t0 + t*

(4)

However, an alternative rateδ = =>? @.-

79was also used to model the slower

dynamicsoftranscriptspeakinglaterintime,andthebestfittingmodelselected

during the decision step. Normalising the data such that expression lies in the

range0-10allowedthepriorrangesofparameterstoberestrictedtoplausible

valuesthatappliedtoalltimeseries.Thefitofmodelstodatawasimprovedasa

result.ToaccountforanyimpactonthelogZcalculation,wegeneratedsynthetic

time series datasets using parameter values drawn at random from the prior

rangestogenerateonereplicate,andgeneratedtwootherreplicatesbyadding

andsubtracting(respectively)agivenamountofnoisetothefirst.Modelfitting

wasapplied to1000suchdatasetspermodel (using the samenoisevalues for

eachmodeloneachof the1000 iterations)andweobservedanadvantage for

eachofthemorecomplexmodelsincomparisonwiththelinearmodelthatwas

consistentovertherangeoflogZvaluesobtainedforthelinearmodel.Tooffset

Page 22: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

21

this effect for each complex model, the advantage (mean difference plus two

standarddeviationsobserved in syntheticdata)was subtracted from the logZ

valuescalculatedforCAGETSSdatawhenmakingthecategorisationdecision.

Transcriptionfactorbindingsiteidentification

Weassessedtheenrichmentoftranscriptionfactorbindingsite(TFBS)motifsin

the JASPARdatabase [40] (January2017 release) for all CAGETSSassigned to

genes in the robust set relative to those assigned to the 12,132 genes tested

across all the datasets. Motif matches (FDR ≤ 0.05) were sought in flanking

400bpwindowscentredonthemiddleofeachCAGETSSanalysed),usingFIMO

[41]fromtheMEMEpackage(version4.11.2patch2).Enrichmentofeachmotif

intherobustsetrelativetothetotalsetwasassessedwithFisher’sexacttests,

correctingformultipletesting(FDR≤0.05).

PathwayandGeneOntologyenrichment

Functional and pathway enrichments were assessed using GOrilla [13] and

InnateDB[28]respectively(FDR≤0.05),usingthetotal12,132genesanalysed

acrosstheeightdatasetsasthebackgroundset.

The listof234knownIEGs[10]wasassembled from20publishedhumanand

mousedatasets fromthe literature; it isexpectedtocontain fewfalsepositives

butdoes includeanumberof IEGsonlyreported incellsand/orresponsesnot

examined in this study. To compute the enrichment of known IEGs in each

datasetwecomparedtheproportionofpeakingCAGETSSsassignedtoIEGswith

the proportion of peaking CAGE TSSs assigned to candidate IEGs. For the

enrichmentofknownIEGsineachsetofsharedpeakinggeneswecomparedthe

proportionofpeakingCAGETSSsassigned to the IEGsshared ineachgroupof

sharedgeneswiththeproportionofpeakingCAGETSSsassignedtoIEGsinthe

Page 23: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

22

remaining tested genes. The odds ratio and the p-value was assigned using

Fisher’sexacttest.

Networkconservation

A total of 57 protein-coding and non-coding candidate IEGs (corresponding to

known Ensembl genes) were considered for construction of the conserved

activation network. For genes with multiple peaking CAGE TSS we chose the

earliest peaking CAGE TSS (smallest tp) in each dataset, then the relative

pairwiseorderofeachgenewascomputedwithrespecttoalltheothergenesin

therobustset.Forexample,ifindataset-1,gene-Apeaksbeforegene-B(tpgene-A<

tpgene-B), and this order is observed in 6 or more of the other 7 dataset, the

temporalprecedence isdefinedtobeconserved.Applyingthisproceduretoall

57 coding and non-coding genes of the robust set we discovered 40 genes

temporally connected by 77 conserved relative orderings (Figure 4). The

significance of the number of temporal connections observed was measured

relative tonulldistribution, constructedbypermuting tp forall theCAGETSSs

1,000,000times;withtheproportionofpermuteddatasetswithatleastasmany

conserved orderings as the observed taken as an empirically derived p-value.

The observed value (77)was observed or exceeded in 4,516 out of 1,000,000

permutations indicating that the number of temporal connections was

statisticallysignificant(p<5e-3).

Acknowledgments

WethankallmembersoftheFANTOMconsortiumforcontributingtogeneration

of samples and analysis of the FANTOM5 dataset and thank GeNAS for data

production.The coremembersof theFANTOM5phase2projectwere:Alistair

Page 24: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

23

R.R. Forrest, Albin Sandelin, Carsten O. Daub, ChristineWells, David A. Hume,

Erik Arner, Hideya Kawaji, Kim M Summers, Kristoffer Vitting-Seerup, Piero

Carninci,RobinAndersson,YoshihideHayashizaki. FinnDrabløs(Departmentof

Clinical and Molecular Medicine, Laboratory Center, Erling Skjalgsons gt. 1,

Faculty ofMedicine andHealth Sciences, NorwegianUniversity of Science and

Technology, N-7006 Trondheim, Norway) curated the list of IEGs within the

FANTOM5consortium.

AuthorsContributions

AVcarriedoutthecomputationalanalysisanddraftedthemanuscript;CASand

SAdesignedthestudyanddraftedthemanuscript;MI,HK,EA,TL,COD,PCand

ARRF generated and processed the CAGE data coordinated by YH. All authors

gavefinalapprovalforpublication.

Page 25: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

24

References

1. FowlerT,SenR,RoyAL.Regulationofprimaryresponsegenes.Molecularcell.2011;44(3):348-60.2. HealyS,KhanP,DavieJR.Immediateearlyresponsegenesandcelltransformation.Pharmacology&therapeutics.2013;137(1):64-77.3. GreenbergME,ZiffEB.Stimulationof3T3cellsinducestranscriptionofthec-fosproto-oncogene.Nature.1984;311:433-8.4. BahramiS,DrabløsF.Generegulationintheimmediate-earlyresponseprocess.AdvancesinBiologicalRegulation.2016.5. O'DonnellA,OdrowazZ,SharrocksAD.Immediate-earlygeneactivationbytheMAPKpathways:whatdoanddon'tweknow?:PortlandPressLimited;2012.6. FowlerT,SuhH,BuratowskiS,RoyAL.RegulationofprimaryresponsegenesinBcells.JournalofBiologicalChemistry.2013;288(21):14906-16.7. AitkenS,MagiS,AlhendiAM,ItohM,KawajiH,LassmannT,etal.Transcriptionaldynamicsrevealcriticalrolesfornon-codingRNAsintheimmediate-earlyresponse.PLoSComputBiol.2015;11(4):e1004217.8. AnderssonR,GebhardC,Miguel-EscaladaI,HoofI,BornholdtJ,BoydM,etal.Anatlasofactiveenhancersacrosshumancelltypesandtissues.Nature.2014;507(7493):455-61.9. Bar-JosephZ,GitterA,SimonI.Studyingandmodellingdynamicbiologicalprocessesusingtime-seriesgeneexpressiondata.NatureReviewsGenetics.2012;13(8):552-64.10. ArnerE,DaubCO,Vitting-SeerupK,AnderssonR,LiljeB,DrabløsF,etal.Transcribedenhancersleadwavesofcoordinatedtranscriptionintransitioningmammaliancells.Science.2015;347(6225):1010-4.11. HubbardT,BarkerD,BirneyE,CameronG,ChenY,ClarkL,etal.TheEnsemblgenomedatabaseproject.Nucleicacidsresearch.2002;30(1):38-41.12. SupekF,BošnjakM,ŠkuncaN,ŠmucT.REVIGOsummarizesandvisualizeslonglistsofgeneontologyterms.PloSone.2011;6(7):e21800.13. EdenE,NavonR,SteinfeldI,LipsonD,YakhiniZ.GOrilla:atoolfordiscoveryandvisualizationofenrichedGOtermsinrankedgenelists.BMCbioinformatics.2009;10(1):1.14. KalamH,FontanaMF,KumarD.AlternatesplicingoftranscriptsshapemacrophageresponsetoMycobacteriumtuberculosisinfection.PLoSpathogens.2017;13(3):e1006236.15. AvrahamR,Sas-ChenA,ManorO,SteinfeldI,ShalgiR,TarcicG,etal.EGFdecreasestheabundanceofmicroRNAsthatrestrainoncogenictranscriptionfactors.SciSignal.2010;3(124):ra43.16. LlorensF,HummelM,PantanoL,PastorX,VivancosA,CastilloE,etal.Microarrayanddeepsequencingcross-platformanalysisofthemirRNomeandisomiRvariationinresponsetoepidermalgrowthfactor.BMCgenomics.2013;14(1):371.17. LiuX,CaiJ,SunY,GongR,SunD,ZhongX,etal.MicroRNA-29ainhibitscellmigrationandinvasionviatargetingRoundabouthomolog1ingastriccancercells.Molecularmedicinereports.2015;12(3):3944-50.

Page 26: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

25

18. ZhangY,ZhouS.MicroRNA‑29ainhibitsmesenchymalstemcellviabilityandproliferationbytargetingRoundabout1.MolecularMedicineReports.2015;12(4):6178-84.19. CimminoA,CalinGA,FabbriM,IorioMV,FerracinM,ShimizuM,etal.miR-15andmiR-16induceapoptosisbytargetingBCL2.ProceedingsoftheNationalAcademyofSciencesoftheUnitedStatesofAmerica.2005;102(39):13944-9.20. LernerM,HaradaM,LovénJ,CastroJ,DavisZ,OscierD,etal.DLEU2,frequentlydeletedinmalignancy,functionsasacriticalhostgeneofthecellcycleinhibitorymicroRNAsmiR-15aandmiR-16-1.Experimentalcellresearch.2009;315(17):2941-52.21. GaoS-m,XingC-y,ChenC-q,LinS-s,DongP-h,YuF-j.miR-15aandmiR-16-1inhibittheproliferationofleukemiccellsbydown-regulatingWT1proteinlevel.JournalofExperimental&ClinicalCancerResearch.2011;30(1):1.22. Gökmen-PolarY,ZavodszkyM,ChenX,GuX,KodiraC,BadveS.AbstractP2-06-05:LINC00478:Anoveltumorsuppressorinbreastcancer.CancerResearch.2016;76(4Supplement):P2-06-5-P2--5.23. MillsJD,KavanaghT,KimWS,ChenBJ,WatersPD,HallidayGM,etal.Highexpressionoflonginterveningnon-codingRNAOLMALINCinthehumancorticalwhitematterisassociatedwithregulationofoligodendrocytematuration.Molecularbrain.2015;8(1):1.24. MüllerS,RaulefsS,BrunsP,Afonso-GrunzF,PlötnerA,ThermannR,etal.Next-generationsequencingrevealsnoveldifferentiallyregulatedmRNAs,lncRNAs,miRNAs,sdRNAsandapiRNAinpancreaticcancer.Molecularcancer.2015;14(1):1.25. Marín-BéjarO,MarcheseFP,AthieA,SánchezY,GonzálezJ,SeguraV,etal.PintlincRNAconnectsthep53pathwaywithepigeneticsilencingbythePolycombrepressivecomplex2.Genomebiology.2013;14(9):1.26. WangL,HanS,JinG,ZhouX,LiM,YingX,etal.Linc00963:Anovel,longnon-codingRNAinvolvedinthetransitionofprostatecancerfromandrogen-dependencetoandrogen-independence.Internationaljournalofoncology.2014;44(6):2041-9.27. StelzerG,RosenN,PlaschkesI,ZimmermanS,TwikM,FishilevichS,etal.TheGeneCardssuite:fromGenedataminingtodiseasegenomesequenceanalyses.CurrentProtocolsinBioinformatics.2016:1.30.1-1..3.28. BreuerK,ForoushaniAK,LairdMR,ChenC,SribnaiaA,LoR,etal.InnateDB:systemsbiologyofinnateimmunityandbeyond—recentupdatesandcontinuingcuration.Nucleicacidsresearch.2012:gks1147.29. SchrattG,WeinholdB,LundbergAS,SchuckS,BergerJ,SchwarzH,etal.Serumresponsefactorisrequiredforimmediate-earlygeneactivationyetisdispensableforproliferationofembryonicstemcells.Molecularandcellularbiology.2001;21(8):2933-43.30. TreismanR.RegulationoftranscriptionbyMAPkinasecascades.Currentopinionincellbiology.1996;8(2):205-15.31. AndruskaN,ZhengX,YangX,HelferichWG,ShapiroDJ.AnticipatoryEstrogenActivationoftheUnfoldedProteinResponseisLinkedtoCellProliferationandPoorSurvivalinEstrogenReceptorαPositiveBreastCancer.Oncogene.2015;34(29):3760.

Page 27: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

26

32. ShapiroDJ,LivezeyM,YuL,ZhengX,AndruskaN.AnticipatoryUPRactivation:Aprotectivepathwayandtargetincancer.TrendsinEndocrinology&Metabolism.2016;27(10):731-41.33. SkaletAH,IslerJA,KingLB,HardingHP,RonD,MonroeJG.RapidBcellreceptor-inducedunfoldedproteinresponseinnonsecretoryBcellscorrelateswithpro-versusantiapoptoticcellfate.JournalofBiologicalChemistry.2005;280(48):39762-71.34. HeY,SunS,ShaH,LiuZ,YangL,XueZ,etal.EmergingrolesforXBP1,asUPeRtranscriptionfactor.Geneexpression.2010;15(1):13-25.35. PiperiC,AdamopoulosC,PapavassiliouAG.XBP1:apivotaltranscriptionalregulatorofglucoseandlipidmetabolism.TrendsinEndocrinology&Metabolism.2016;27(3):119-22.36. HuE,MuellerE,OlivieroS,PapaioannouV,JohnsonR,SpiegelmanB.Targeteddisruptionofthec-fosgenedemonstratesc-fos-dependentand-independentpathwaysforgeneexpressionstimulatedbygrowthfactorsoroncogenes.TheEMBOjournal.1994;13(13):3094.37. FeiJ,ViedtC,SotoU,ElsingC,JahnL,KreuzerJ.Endothelin-1andSmoothMuscleCells.Arteriosclerosis,thrombosis,andvascularbiology.2000;20(5):1244-9.38. LizioM,HarshbargerJ,ShimojiH,SeverinJ,KasukawaT,SahinS,etal.GatewaystotheFANTOM5promoterlevelmammalianexpressionatlas.Genomebiology.2015;16(1):22.39. AitkenS,AkmanOE.Nestedsamplingforparameterinferenceinsystemsbiology:applicationtoanexemplarcircadianmodel.BMCsystemsbiology.2013;7(1):72.40. MathelierA,ZhaoX,ZhangAW,ParcyF,Worsley-HuntR,ArenillasDJ,etal.JASPAR2014:anextensivelyexpandedandupdatedopen-accessdatabaseoftranscriptionfactorbindingprofiles.Nucleicacidsresearch.2013;42(D1):D142-D7.41. GrantCE,BaileyTL,NobleWS.FIMO:scanningforoccurrencesofagivenmotif.Bioinformatics.2011;27(7):1017-8.

Page 28: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

27

Figure 1. Time course datasets demonstrating the immediate earlyresponse.(a)Schematicoftheeighttimecoursedatasetsconsidered,horizontallines indicate the time span and symbols show the sampling times. Time zerocorresponds to inactivated or quiescent cells in all cases. (b) The time courseexpressionprofileofFOS(left)andJUN(right)inalleightdatasets.Cageclusterexpression(meanTPMofthreereplicates)isplottedagainsttime.(c)TheextenttowhichtheclassificationofaTSSasapeakisuniquetoonedataset(3515TSS)orsharedbetween2ormoredatasets.

ImmediateEarlyResponse

0

1000

2000

3000

0 200 400 600Time (minutes)

mea

n ex

pres

sion

(TPM

)

PAC_FGF2PAC_IL1BPEC_VEGFPMDM_LPSMCF7_EGF1MCF7_HRGPMSC_MIXSAOS2_OST

FOS time course expression

40

80

120

0 200 400 600Time (minutes)

mea

n ex

pres

sion

(TPM

)

JUN time course expression

(a) (c)

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ●

PAC_FGF2

PAC_IL1B

PEC_VEGF

PMDM_LPS

MCF7_EGF1

MCF7_HRG

PMSC_MIX

SAOS2_OST

0 30 60 90 120 150 180 210 240 270 300 330 360 390 420 450 480 510 540 570 600 630 660 690 720time of sampling (minutes)

ds$dataset●

PAC_FGF2PAC_IL1BPEC_VEGFPMDM_LPSMCF7_EGF1MCF7_HRGPMSC_MIXSAOS2_OST

Pie Chart of peaking genes

3515 in 1 datasets: 40%

2388 in 2 datasets:27.2%

1578 in 3 datasets: 17.9%

797 in 4 datasets: 9.1%

325 in 5 datasets: 3.7%140 in 6 datasets : 1.6%:

37 in 7 datasets: 0.4%

5 in 8 datasets: 0.1%

(b)

Page 29: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

28

Figure 2. Broad trends in peak expression times across datasets. (a)Identification of the peak time parameter (tp) of FOS estimated from thePMDM_LPS time series (filled symbols indicate the median TPM; unfilledsymbols are individual replicates; green lines represent tp and one standarddeviation above and below). (b)Heatmap of the times of peakTSS expression(tp)forTSSsinthepermissivesetforalldatasets.HeatmapcoloursreflectthetpforeachCAGETSS(within100min:darkgreen;100-150min:lightgreen;150-200min:yellow;beyond200min:red).KnownIEGsareindicatedontheleftbyblackcells.

IEGsothers

(b)

(a)

Expr

essi

on (T

PM)

Page 30: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

29

Figure 3. Promoter usage across time series datasets. For representativegenes,barchartsshowthenumberofdatasetswhereeachTSSpeakstoillustratethe diversity of TSS usage and commonality of the peaking response. KnownIEGSareshowninblue,transcriptionfactorsinyellowandothergenesingreen.FOSBhasasingleTSSthatpeaksin8datasets,JUNhasthreeTSSeachpeakingin4ormoredatasets,andXBP1has6TSSthatpeakinbetween1and6datasets.

0

2

4

6

8

chr19:45971246..45971265,+

FOSB

0

2

4

6

8

chr5:172198190..172198206,−

DUSP1

0

2

4

6

8

chr19:49375669..49375684,+

PPP1R15A

0

2

4

6

8

chr11:64645991..64646005,−

chr11:64646086..64646101,−

EHD1

0

2

4

6

8

chr14:75745523..75745537,+

chr14:75746722..75746777,+

chr14:75747296..75747329,+

FOS

0

2

4

6

8

chr1:59249688..59249703,−

chr1:59249707..59249727,−

chr1:59249732..59249749,−

JUN

0

2

4

6

8

chr12:57482665..57482699,+

chr12:57482882..57482907,+

chr12:57482918..57482935,+

chr12:57482936..57482965,+

NAB2

0

2

4

6

8

chr6:134495992..134496010,−

chr6:134496823..134496840,−

chr6:134496842..134496884,−

chr6:134498969..134499006,−

chr6:134499037..134499048,−

chr6:134551475..134551489,−

SGK1

0

2

4

6

8

chr5:115177247..115177279,−

chr5:115177496..115177558,−

ATG12

0

2

4

6

8

chr2:120980665..120980750,−

chr2:120980939..120980983,−

TMEM185B

0

2

4

6

8

chr9:33167149..33167170,−

chr9:33167178..33167205,−

chr9:33167308..33167378,−

B4GALT1

0

2

4

6

8

chr17:49242884..49242925,+

chr17:49243759..49243787,+

chr17:49243792..49243828,+

NME2

0

2

4

6

8

chr3:57583064..57583079,−

chr3:57583085..57583121,−

chr3:57583130..57583152,−

chr3:57583171..57583182,−

ARF4

0

2

4

6

8

chr5:138609782..138609826,+

chr5:138629349..138629386,+

chr5:138629389..138629416,+

chr5:138629417..138629446,+

chr5:138629628..138629688,+

MATR3

0

2

4

6

8

chr22:29196030..29196075,−

chr22:29196433..29196445,−

chr22:29196457..29196462,−

chr22:29196492..29196503,−

chr22:29196511..29196541,−

chr22:29196547..29196563,−

XBP1

0

2

4

6

8

chr20:57464200..57464214,+

chr20:57464223..57464241,+

chr20:57466086..57466166,+

chr20:57466401..57466412,+

chr20:57466435..57466456,+

chr20:57466552..57466565,+

chr20:57466569..57466597,+

chr20:57466599..57466609,+

chr20:57466629..57466640,+

chr20:57466679..57466711,+

chr20:57466759..57466785,+

chr20:57466793..57466807,+

chr20:57467204..57467240,+

chr20:57478731..57478779,+

chr20:57484578..57484609,+

chr20:57484789..57484811,+

GNAS

count

count

Page 31: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

30

Figure4.Conservedactivationnetwork.(a)Schematicprofilesoftwopeakinggenes, with temporal precedence indicated by the arrow. (b) Conservedtemporal precedence between IEGs (light blue nodes), transcription factors(yellow nodes) non-coding RNA (grey nodes) and other protein-coding genes(greennodes) isshownbydirectededges.AsubsetofIEGsinthisnetworkarealsotranscriptionfactors(FOS,KLF6,FOSB,BHLHE40,JUN,andFOSL1).

0 50 100 200 300

010

2030

4050

Peak

Time (min)

Expr

essio

n

ts1 ts2

ARF4 ATG12

B4GALT1BHLHE40

CUL3

DKC1 DUSP1

EHD1

FLNA

FOS

FOSB

FOSL1GNASGNB2L1

HNRNPL

ITM2B

JUN

KLF6

MATR3

MIR3654 NAB2

NME2

OCIAD1

PFKFB3PHLDA1

PLEC

PTGES3

PTP4A1RPSA

SDC4

SEC11A

SGK1

SLC3A2

SMARCA5

THBS1

TMBIM6

TMEM185B

UBE2D3

XBP1

AC005332.1

AC058791.2

AL928646.1

DLEU2

LINC00263

LINC00476PLK1S1

RP11−323F5.2

SEC22B

SNORD65

LINC00478

MIR21

SCARNA17

RP11−492E3.1PPP1R15ASRSF11

SNORD82AC058791.1

B

A(a)

(b)

Page 32: Edinburgh Research Explorer€¦ · 3RIKEN Advanced Center for Computing and Communication, RIKEN Yokohama Campus, Yokohama, 230-0045 ... 7Harry Perkins Institute of Medical Research,

31

Figure5.Transcriptionaldynamicsof genes classified to thepeakmodel.Scatterplots of log fold change against the time of peaking for selected genes,withconservedtemporalprecedenceindicatedbyarrowsforPMDM_LPS(a)andMCF7_EGF1(b).FOSpeaksearliestandhasmanyconservedtemporalrelationstolaterpeakinggenes,whileEHD1peakslateandhasmanyconservedtemporalorderingswithearlierpeakinggenes.

ATG12

EHD1

FOS

FOSB

GNAS

JUN

MIR3654

XBP1

DLEU2

LINC00476

10

1000

50 100 150 200 250Time of peaking (minutes)

Fold

cha

nge

expr

essi

on

biotypeaaaa

IEG

ncRNA

other

TF

PMDM_LPS

●●

ATG12 EHD1

FOS

FOSB

GNAS

JUN

MIR3654

XBP1

DLEU2LINC00476

10

1000

50 100 150 200Time of peaking (minutes)

Fold

cha

nge

expr

essi

on

biotypeaaaa

IEG

ncRNA

other

TF

MCF7_EGF1

IEGncRNATFother

IEGncRNATFother

(a) (b)