60
ACTA UNIVERSITATIS UPSALIENSIS UPPSALA 2018 Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1636 Ribosome standby sites and other structural aspects of translation initiation regions in Escherichia coli MAAIKE M. STERK ISSN 1651-6214 ISBN 978-91-513-0249-2 urn:nbn:se:uu:diva-343082

Ribosome standby sites and other structural aspects of …uu.diva-portal.org/smash/get/diva2:1185479/FULLTEXT01.pdf · I artikel II och III har jag analyserat hur så kallade standby-ställen

Embed Size (px)

Citation preview

ACTAUNIVERSITATIS

UPSALIENSISUPPSALA

2018

Digital Comprehensive Summaries of Uppsala Dissertationsfrom the Faculty of Science and Technology 1636

Ribosome standby sites and otherstructural aspects of translationinitiation regions in Escherichia coli

MAAIKE M. STERK

ISSN 1651-6214ISBN 978-91-513-0249-2urn:nbn:se:uu:diva-343082

Dissertation presented at Uppsala University to be publicly examined in Room A1:111a,Biomedicinskt centrum (BMC), Husargatan 3, Uppsala, Friday, 13 April 2018 at 09:00 forthe degree of Doctor of Philosophy. The examination will be conducted in English. Facultyexaminer: Professor Isabella Moll (University of Vienna).

AbstractSterk, M. M. 2018. Ribosome standby sites and other structural aspects of translationinitiation regions in Escherichia coli . Digital Comprehensive Summaries of UppsalaDissertations from the Faculty of Science and Technology 1636. 58 pp. Uppsala: ActaUniversitatis Upsaliensis. ISBN 978-91-513-0249-2.

Translation initiation, which is rate-limiting in protein synthesis, is often the step at whichregulation occurs. Here, we investigated several mechanisms of translation initiation inEscherichia coli, including their control. First, we showed that translation of the transcriptionalregulator CsgD is inhibited by two sRNAs through a direct antisense mechanism.

In some bacterial mRNAs, the ribosome binding site (RBS) is sequestered in a stable structure,which generally generates very low protein output. Yet, these mRNAs are often efficientlytranslated, which suggested the requirement for “ribosome standby sites”. Here, we investigatedthe structure and sequence features of an effective standby site using plasmid-borne GFPreporter constructs, and showed that relatively short, single-stranded regions near a structurallysequestered RBS can profoundly increase translation rates. Both the length and the sequenceof these single-stranded regions are important for standby site efficiency, and the standby siteneeds to be single-stranded. This work serves as a proof-of-principle study of the ribosomestandby model.

To investigate the sequence-dependency of standby sites further, we used an unbiasedapproach, creating plasmid libraries containing millions of different standby sites in the samereporter plasmid as before. Cells were sorted by fluorescence according to translational levels,and standby sites analyzed by deep sequencing. This analysis showed that efficient standby siteshave a low GC-content and rarely contain Shine-Dalgarno sequences. Additionally, nucleotidesnear the 3’-border of the standby region affect translation efficiency more than those closer tothe 5’-end. Mutational and structure-probing experiments are planned to verify these findings.

Keywords: Ribosome standby sites, translation initiation, mRNA structure, bacteria

Maaike M. Sterk, Department of Cell and Molecular Biology, Microbiology, Box 596,Uppsala University, SE-75124 Uppsala, Sweden.

© Maaike M. Sterk 2018

ISSN 1651-6214ISBN 978-91-513-0249-2urn:nbn:se:uu:diva-343082 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-343082)

For Owen and Julia

The most wonderful personsthat happened to me

List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I Holmqvist, E., Reimegård, J., Sterk, M., Grantcharova, N., Römling, U., Wagner, E.G.H. (2010) Two antisense RNAs tar-get the transcriptional regulator CsgD to inhibit curli synthesis. EMBO Journal, 29(11):1840-50

II Sterk, M., Romilly, C., Wagner, E.G.H. (2018) Unstructured 5'-tails act through ribosome standby to override inhibitory structure at ribosome binding sites. Nucleic Acids Research, Feb 6. [Epub ahead of print]

III Sterk, M.*, Kjellin, J.*, Söderbom, F., Wagner, E.G.H. (2018) Cell sorting and next-generation sequencing reveals sequence preferences of ribosome standby sites. Manuscript

* These authors contributed equally.

Reprints were made with permission from the respective publishers.

Contents

Svensk sammanfattning ................................................................................ 11

Nederlandse samenvatting ............................................................................ 15

Introduction ................................................................................................... 19 The cell and its protein production ........................................................... 19

Protein translation ................................................................................ 20 Translation initiation in bacteria .......................................................... 22

Translation efficiency ............................................................................... 25 Elements of the initiation region .......................................................... 26 Translation initiation efficiency – an interplay of factors .................... 31 mRNA structure and translational control ........................................... 34

Ribosome standby sites ............................................................................ 37 Structured mRNAs and a paradox ....................................................... 37 A solution: the ribosome standby model ............................................. 39 Support for the standby model ............................................................. 41 Standby binding – a universal feature of TIRs? .................................. 44

Current investigations ................................................................................... 45 Regulation of CsgD by sRNAs (Paper I) ................................................. 45 Ribosome standby sites (Paper II) ............................................................ 46 Sequence preferences of standby sites (Paper III) .................................... 47

Acknowledgements ....................................................................................... 49

References ..................................................................................................... 51

Abbreviations

30SIC 70SIC A site CAI CDS

30S initiation complex 70S initiation complex aminoacyl site codon adaptation index coding sequence

E site EM FACS IC IF lmRNA mRNA OB

P site PSK RBP RBS rRNA SD sequence SELEX

sRNA TA system tAI TIR tRNA UTR

exit site electron microscopy fluorescence-activated cell sorting initiation complex initiation factor leaderless mRNA messenger RNA oligonucleotide-/oligosaccharide-binding peptidyl site post-segregational killing RNA-binding protein ribosome binding site ribosomal RNA Shine-Dalgarno sequence systematic evolution of ligands by exponential enrichment small RNA toxin-antitoxin system tRNA adaptation index translation initiation region transfer RNA untranslated region

11

Svensk sammanfattning

DNA, RNA, och proteiner är huvudaktörer i de viktigaste processerna i en levande cell. DNA är arvsmaterialet som bär på de gener som framförallt kodar för proteiner. Från genernas DNA görs kopior som består av RNA – likt DNA består RNA av långa kedjor av byggstenar som vi kallar nukleotider. Denna kopiering kallas transkription (överskrivning). RNA innehåller koden till proteinerna. Proteiner är de viktigaste arbetshästarna i en cell; de bygger upp strukturer och fungerar ofta som enzymer. Informationsströmmen i varje cell går alltså från DNA via RNA till protein.

DNA är en lång dubbelspiral med fyra olika bokstäver – nukleotiderna vi kallar A, T, C, och G. Kedjorna hålls ihop genom att “komplementära” nukleotidbaser parar med varandra – A med T och C med G. RNA, den kortare DNA-kopian, är enkelsträngad, men dess baser (där T ersatts av den snarlika nukleotiden U) kan para med andra baser i samma RNA-sträng, vilket kan ge upphov till komplicerande strukturer som är viktiga för funktionen av RNA. En annan konsekvens är att två enkelsträngade RNA molekyler kan interagera med varandra.

Det finns olika slags RNA. Det så kallade mRNA (budbärar-RNA) innehåller de kodord som stavar till ett protein. Proteinerna tillverkas av ribosomer, stora molekylära maskiner som avläser koden i en mRNA-molekyl. Denna process kallas för translation. Ett mRNA innehåller även bokstavssignaler som talar om för ribosomen var avläsningen skall starta, och sluta. Andra RNA-molekyler, rRNA och tRNA, är komponenter i proteinsyntesapparaten. Ytterligare en klass av RNA-molekyler styr geners aktivitet. Vi kallar dem styr-RNA eller regulatoriska RNA.

Varje cell måste se till att rätt mängd av rätt protein blir producerat vid rätt tidpunkt. Då DNA är detsamma i varje kroppscell, behövs genreglering för att den ena cellen blir till en nervcell och den andra till en muskelcell. Likadant är det för bakterier. Här slås gener av och på beroende på tillväxtfasen, men också som svar på externa signaler som näringstillgång eller stress. Proteinproduktionen i varje cell styrs alltså noggrant för optimal tillväxt.

I tarmbakterien Escherichia coli, som jag studerade, bestäms syntesen av ett protein främst av frekvensen med vilken en ribosom binder till startsignalen på mRNA molekylen. Detta steg kallas initiation: ribosomen binder till startregionen, och sedan börjar translationen.

12

Initiatieringsfrekvensen kan påverkas av en rad faktorer. Om till exempel ett styr-protein eller styr-RNA binder till en region i mRNA-molekylen kan detta leda till att startsignalen blir gömd för ribosomen. Eller så kan tvärtom start-signalen bli lättare att binda till. I båda fallen pratar vi om translationsregulering. Min forskning handlar om lokala strukturer i mRNA-molekyler runt initiationsregionen, och hur de kan påverka proteinproduktion i E. coli bakterien.

I publikation I undersökte vi translation och reglering av proteinet CsgD, som har en viktig roll i bildning av biofilm – ett ihopbundet samhälle av bakterier. Translation av csgD mRNA kontrolleras av två korta styr-RNA (small RNA; sRNA). När dessa sRNA bas-parar med mRNA-molekylen förhindras translation och mindre CsgD protein tillverkas. Bindningen mellan csgD mRNA och dessa sRNA sker en bit uppströms av startsignalen av csgD. Trots att vi förstår interaktionen mellan mRNA-molekylen och dessa sRNA väl, finns det fortfarande frågetecken kvar som rör den exakta molekylära mekanismen.

Ibland bilder mRNA strukturer genom bas-parning. Om detta sker kan startsignalen bli “osynliga” för ribosomen – därför att ribosomen endast kan binda till enkelsträngade strartsignaler. Märkligt nog finns det dock exempel på fall där en stabil struktur vid startsignalen inte har en negativ effekt, och proteinet produceras effektivt. För drygt 15 år sedan föreslogs en förklaring för denna observation. Hypotesen kallas på engelska för ribosome standby model, och föreslår att en ribosom kan vänta (“on standby”) på ett ställe i mRNA-molekylen, i närheten av den strukturellt otillgängliga startsignalen. Just när startstrukturen öppnas en kort stund – alla RNA strukturer “andas” – och därmed blir enkelsträngad, kan ribosomen förflytta sig mot startsignalen för att producera proteinet. Enkelt uttryckt: en ribosom i närheten har en mycket större chans att nå den kortvarigt öppnade strukturen än en ribosom som befinner sig fritt i en cell. Standby modellen kan alltså förklara oförväntat effektiv translation.

I artikel II och III har jag analyserat hur så kallade standby-ställen i mRNA fungerar. Det gjorde jag mest genom kloning – att tillföra cellen en “extrakromosom” som kallas plasmid, som bär på de DNA-bitar vars funktion vi vill testa. Bakterierna läser av detta DNA och producerar mRNA, och sedan protein. I mitt fall är proteinet GFP (green fluorescent protein) vars syntes lätt kan kvantifieras genom dess fluorescens. Genom att jag introducerade olika DNA-bitar i cellen – mest sådana som jag antog skulle innehålla standby-sekvenser – framför en GFP-gen med en stabil struktur i början av mRNA-molekylen, kunde standby-effekten mätas. Att använda celler för dessa försök är både en styrka och en svaghet. Å ena sidan vet man att allt sker i den normala cellulära miljön, å andra sidan är det omöjligt att vara säker att det sker på translationsnivån. Därför använde jag även provrörsförsök där jag tillsatte artificiellt framställda mRNA-molekyler med

13

alla mina varianter av standby-ställen i början av molekylen, samt alla komponenter som behövs för proteinsyntes. Mina resultat visade att 12-18 nukleotider långa svansar av RNA framför en blockerande struktur ger ribosomen den hjälp som behövs för att börja translationen (arikel II).

Artikel II visar också att en svagare struktur behöver mindre hjälp av standby-funktionen. Effektiviteten av ett standby-ställe beror på både längden av svansen, men även i hög grad av vilka nukleotider som den består av. Till exempel så är svansar som består av CA-CA-CA etc., mycket mera effektiva än de som har AU-AU-AU etc. Mitt arbete bekräftar också att standby-ställen måste vara enkelsträngade; när vi låter en DNA-bit baspara med standby-stället hämmas translationen. Artikel II ger tydligt experimentellt stöd till den gamla ribosom standby-modellen.

Efter att lägga grunden i artikel II fortsatte vi med en mer förutsättningslös strategi. Min fråga var vilka av alla teoretiskt möjliga sekvenser som kan fungera som standby-ställen. Vi byggde ett så kallat bibliotek med miljontals olika slumpmässiga standby sekvenser. Alla var 12 bokstäver (nukleotider) långa, och hade klonats in i samma typ av plasmid som i artikel II. Plasmidbiblioteket – alla med olika 12-nukleotids DNA-bitar – flyttades in i E. coli-celler. Sedan sorterades bakterierna i ett instrument(flödescytometer) baserad på hur starkt fluorescerande de var. Tanken var atteffektiva standby-ställen ger hög fluorescens, svagare ger lägre. Från dessasorterade celler tog vi fram DNA som sedan sekvensbestämdes (artikel III).

Denna globala analys visade att effektiva standby-ställen består av mindre C och G, och mer A och T (som motsvarar U i RNA) nukleotider. Detta var förväntat, eftersom C och G bildar stabilare strukturer inom RNA än A och U. Vidare upptäckte vi att nukleotider som medför ökad stabilitet avstrukturen runt startsignalen ger lägre translationseffektivitet. I framtidaprojekt vill vi analysera förekomsten av olika kombinationer av sekvensersom finns i effektiva och ineffektiva standby-ställen, och även undersöka omvissa sekvensmotif spelar en roll.

14

15

Nederlandse samenvatting

De hoofdrolspelers in de belangrijkste processen in een levende cel zijn DNA, RNA en eiwitten. De eiwitten zijn de belangrijkste werkpaarden van een cel; alles wat in een cel gedaan moet worden, wordt door eiwitten gedaan. DNA is het erfelijk materiaal en bevat de genen. Van de genen in het DNA worden kopieën gemaakt die uit RNA bestaan – net zoals DNA bestaat RNA uit lange ketens van bouwstenen die we nucleotiden noemen. Het kopiëren noemen we transcriptie (overschrijving). Het RNA bevat dan de code die zal leiden tot eiwitproductie. De informatiestroom in elke cel gaat dus van DNA via RNA naar eiwit.

DNA is een lange dubbele keten gemaakt van vier verschillende letters – nucleotiden die we afgekort A, T, G, en C noemen. De ketens worden bijeen gehouden doordat “complementaire” nucleotiden met elkaar binden – dit proces heet paren; A met T en G met C. RNA, de kortere DNA-kopie, is een enkele keten, maar de nucleotiden (waarbij T vervangen is door de soortgelijke U) kunnen paren met andere nucleotiden in hetzelfde RNA, wat kan zorgen voor gecompliceerde structuren die belangrijk zijn voor de functie van het RNA. Een andere consequentie is dat twee enkele ketens RNA met elkaar kunnen paren.

Er bestaan verschillende soorten RNA: het zogenaamde mRNA (boodschapper RNA) bevat de code voor de productie (synthese) van een eiwit. De eiwitten worden gemaakt door ribosomen, grote moleculaire machines die de code in het mRNA aflezen. Dit proces heet translatie (vertaling). Een mRNA molecuul bevat tevens bepaalde signalen die het ribosoom vertellen waar de translatie moet beginnen (start-signaal) en eindigen. Andere RNA moleculen, rRNA en tRNA, zijn componenten in het eiwit-synthese-apparaat. Verder bestaat er nog een klasse RNA die de activiteit van genen aanstuurt. Die worden regulatie RNA (regulatory RNA) genoemd.

Het DNA in elke cel in het lichaam is hetzelfde, maar omdat niet elke cel dezelfde functie heeft, is er genregulatie (het aan- en uitzetten van genen) nodig, zodat elke cel de juiste hoeveelheid van het juiste eiwit op het juiste moment produceert. Hetzelfde geldt voor bacteriën. Daar worden genen aan- en uitgezet afhankelijk van de groeifase, maar ook als reactie op externe signalen, zoals de beschikbaarheid van voedingsstoffen of de aanwezigheid

16

van stressfactoren. De eiwitproductie in elke cel wordt daarom in grote mate gereguleerd voor optimale groei.

In de darmbacterie Escherichia coli (E. coli), die ik bestudeerd heb, wordt de synthese van een eiwit vooral bepaald door de frequentie waarmee een ribosoom aan het start-signaal op het mRNA bindt. Deze stap wordt initiatie genoemd: het ribosoom bindt aan het start-signaal, en daarna begint de translatie. De initiatiefrequentie kan beïnvloed worden door allerlei factoren. Een regulatie-eiwit of regulatie-RNA kan bijvoorbeeld binden aan dezelfde regio in het mRNA, zodat het startsignaal verborgen is voor het ribosoom. Of andersom, dat het startsignaal juist makkelijker te herkennen en te binden is. In beide gevallen spreken we van regulatie van de translatie. Mijn onderzoek gaat over lokale structuren in het mRNA rond de initiatie regio, en hoe die structuren de eiwitproductie kunnen beïnvloeden in de bacterie E. coli.

In artikel I analyseren we de translatie en regulatie van het eiwit CsgD, dat een belangrijke rol speelt in de vorming van biofilm – een samenleving van aan elkaar gebonden bacteriën. Translatie van het csgD mRNA wordt geregeld door twee korte stukjes regulatie-RNA (small RNA; sRNA). Wanneer deze sRNA’s met het mRNA base-paren, wordt de translatie verhinderd en wordt er minder CsgD eiwit geproduceerd. De binding tussen sRNA en mRNA gebeurt een stukje vóór het startsignaal van csgD. Ondanks dat we de interactie tussen het mRNA en de sRNA’s goed begrijpen, evenals het effect van die interactie, is het onduidelijk wat het exacte moleculaire mechanisme van deze regulatie is.

In sommige gevallen vormt het mRNA structuren door middel van base-paring. Als dat gebeurt kan het startsignaal “onzichtbaar” worden voor de ribosomen, want ribosomen moeten binden met enkelstrengs start-signalen. Verrassend genoeg zijn er gevallen waarbij een stabiele structuur rond een startsignaal geen negatief effect heeft, en het eiwit efficiënt geproduceerd wordt. Ruim 15 jaar geleden werd een verklaring voor dit verschijnsel voorgesteld. Die hypothese wordt in het Engels ribosome standby model genoemd en beschrijft dat een ribosoom kan wachten (on standby) op een plek op het mRNA, in de buurt van het ontoegankelijke startsignaal. Alle RNA structuren zijn constant in beweging en precies wanneer de structuur heel even opent en daarmee beschikbaar wordt als enkelstrengs RNA, kan het ribosoom zich verplaatsen naar het startsignaal om het eiwit te produceren. Eenvoudig gezegd, een ribosoom in de buurt heeft een veel grotere kans om het startsignaal te bereiken dan een ribosoom dat zich vrij in de cel bevindt. Het standby model kan dus de onverwachts efficiënte translatie verklaren.

In artikel II en III heb ik geanalyseerd hoe zogenaamde standby sites in mRNA fungeren. Dat is met name gedaan door middel van kloneren – men geeft de cel een “extra chromosoom” dat plasmide genoemd wordt, en dat de

17

stukken DNA bevat waarvan we de functie willen testen. De bacteriën lezen dat DNA, maken er RNA van, en daarna eiwit. In dit geval is dat het eiwit green fluorescent protein (GFP). De synthese van GFP kan makkelijk gekwantificeerd worden vanwege de fluorescentie van dit eiwit. Het standby effect kon gemeten worden door verschillende stukken DNA te maken –waarvan te verwachten was dat ze als standby sites zouden kunnen werken – en die in te voegen vóór het GFP gen met een stabiele structuur aan het begin. Het gebruik van cellen voor deze experimenten is zowel een kracht als een zwakte van dit onderzoek. De kracht is dat men weet dat alles gebeurt in de normale cellulaire omgeving, maar de zwakte is dat het onmogelijk is om zeker te zijn dat het effect van standby sites op translatieniveau plaatsvindt. Daarom heb ik ook reageerbuis experimenten gedaan, waarin gezuiverd mRNA met alle standby varianten toegevoegd werden, samen met alle componenten die nodig zijn voor eiwitsynthese. De resultaten in artikel II tonen aan dat 12-18 nucleotide lange staarten van RNA (de zogenaamde standby sites) vóór een geblokkeerde structuur de ribosomen de hulp biedt die ze nodig hebben voor de translatie.

Artikel II toont ook aan dat bij een zwakkere structuur rond het startsignaal minder hulp nodig is van de standby functie. De effectiviteit van een standby site hangt af van de lengte van de standby site, maar vooral van uit welke nucleotiden die bestaat. Zo zijn standby sites die bestaan uit CA-CA-CA etc. veel effectiever dan standby sites die bestaan uit AU-AU-AU etc. Mijn werk benadrukt ook dat standby sites enkelstrengs moeten zijn; als we een stukje DNA laten base-paren met de standby site, wordt translatie verhinderd. Artikel II geeft duidelijke experimentele ondersteuning voor het oude ribosoom standby model.

Na de basis gelegd te hebben in artikel II, gingen we verder met een meer exploratieve strategie. Hierbij was de vraag welke van alle theoretisch mogelijke nucleotide-volgordes (sequenties) kunnen fungeren als standby sites. We creëerden een zogenaamde bibliotheek van miljoenen verschillende willekeurige standby sites. Die waren allemaal 12 nucleotides lang, en waren gekloond in hetzelfde soort plasmide als in artikel II. De plasmide-bibliotheek werd in E. coli cellen gestopt. Vervolgens werden de bacteriën gesorteerd in een apparaat (flow cytometer) gebaseerd op hoe sterk fluorescerend ze waren. Het idee was dat effectieve standby sites hoge fluorescentie geven, en minder effectieve lagere fluorescentie. Het DNA werd uit de gesorteerde cellen gehaald en de sequenties werden bepaald.

Deze globale analyse in artikel III toonde aan dat effectieve standby sites bestaan uit minder G en C, en meer A en U nucleotiden. Dit resultaat was verwacht, omdat G en C stabielere structuren vormen binnen RNA dan A en U. Verder zagen we dat nucleotiden die zorgen voor een verhoogdestabiliteit van de structuur rond het startsignaal, een lagere translatie geven.Toekomstige experimenten zullen analyseren welke combinaties vannucleotiden aanwezig zijn in effectieve en ineffectieve standby sites.

18

19

Introduction

The cell and its protein production The central dogma in molecular biology describes the flow of genetic information in a cell – although the terms theory or hypothesis would be a more fitting description than dogma. Very simply, the central dogma states that DNA makes RNA and RNA makes protein. These processes are known as transcription and translation, respectively. It is an oversimplification of the processes occurring in a cell, though, as information is transferred in other ways as well. For example, DNA makes DNA is called DNA replication, RNA makes RNA in a process called RNA replication, and RNA makes DNA is known as reverse transcription. However, the simplified description of the central dogma points to an essential concept: that genetic information is stored in the form of DNA, and that this information is necessary for protein synthesis.

DNA and RNA can be seen as relatively passive components, simply storing and transferring genetic information, whereas proteins are the workhorses of the cell. This is an oversimplification as well, but it is true that proteins perform most cellular functions. Also, proteins are the most abundant macromolecules of a cell and a cell uses probably most of its ATP to produce proteins (Buttgereit and Brand, 1995; Russell and Cook, 1995). Proteins consist of a chain of 20 different amino acids that are encoded by the genetic code. They are folded into certain conformations that allow the proteins to perform their functions. Many proteins are enzymes that catalyze reactions. Other protein functions include a wide variety of, for example, structural or mechanical purposes (as in the cytoskeleton), cell signaling, or the immune response (for example antibodies).

Regulation of protein synthesis Nowadays, we know quite well how proteins are made from the genetic information stored in the DNA. However, the regulation of protein synthesis – how many proteins of which kind are produced at which moment and inwhich cell type – is not as well understood. Protein synthesis in a cell ishighly regulated, so as not to waste valuable resources. This regulationoccurs on different levels, which can be roughly divided into three stages: 1)regulation on the transcriptional level, determining how many messengerRNA (mRNA) molecules of which kind are produced at each moment; 2)

20

regulation on the translational level, meaning how many proteins are pro-produced from a given mRNA molecule; and 3) post-translational regulation, by protein degradation (when and how fast proteins are degraded) or modifications (which render proteins active or inactive). Regulation at the translational level has the important advantages over the other two that it is fast and reversible. Let us first look at translation in more detail.

Protein translation Proteins are made by protein synthesis machineries called ribosomes or ribosomal complexes. The ribosomes “read” the mRNA code, consisting of codons (combinations of three RNA nucleotides), and produce the corresponding chain of amino acids. The amino acids are delivered to the ribosome by transfer RNA (tRNA) species, which link an anticodon to its corresponding amino acid.

Translation in prokaryotes and eukaryotes There are some differences between ribosomes and translation mechanisms in eukaryotes (cells containing a cell nucleus) and prokaryotes (cells that do not contain a nucleus or other compartments), although the main principles of protein translation are universal. Firstly, transcription in eukaryotes takes place inside the nucleus, after which the mRNA is transported into the cytosol where translation takes place. Prokaryotes do not contain a nucleus, and transcription and translation both happen in the same place, the cytoplasm. Transcription and translation can take place on the same mRNA molecule, a process called transcription-translation coupling. Secondly, the ribosomal complexes and other factors needed for translation differ between prokaryotes and eukaryotes. Eukaryotic ribosomes are slightly bigger than prokaryotic ribosomes, and more protein factors are needed for translation in eukaryotes than in prokaryotes. Thirdly, the mRNA elements that dictate the translational start site differ between prokaryotes and eukaryotes, and also between different species.

Here, I will mainly focus on translation in bacteria, and more specifically on translation as it happens in Escherichia coli. However, some parallels will be drawn to translation in other bacteria and eukaryotes.

Components of the ribosome Ribosomes are composed of two subunits: the small and the large ribosomal subunits. In bacteria, the small and large subunits are called the 30S and 50S subunit, respectively, and together form the 70S ribosomal complex. In eukaryotes, the small subunit is called 40S and the large subunit 60S, and when associated with each other, they form the 80S ribosomal complex. The subunits consist of both ribosomal proteins and ribosomal RNA (rRNA) and are thus called ribonucleoproteins. Ribosomes from all three domains of life

21

are very similar; they differ slightly in size, structure, sequence, and pro-protein/rRNA composition. The small subunit of E. coli ribosomes consists of 16S rRNA and 21 proteins, among which protein S1. The 50S subunit consists of 5S and 23S rRNA and 34 ribosomal proteins. Ribosomes contain three sites for tRNA binding: the aminoacyl (A), the peptidyl (P) and the exit (E) site.

Steps of translation Translation is conceptually divided into four steps, each of which requires a different set of components. In short, translation starts with the formation of the initiation complex, followed by elongation of the amino acid sequence. Then, elongation is terminated and ribosomes are recycled for translation of the next mRNA.

Initiation Initiation is the formation of the initiation complex (IC), followed by translocation of the ribosome to start elongation. The initiation complex is the assembly of the ribosomal subunits, the initiator fMet-tRNAfMet, the initiation factors (IFs), and GTP as the energy source. The IC is formed at the start codon on the mRNA, by which the reading frame is set. Bacteria have three initiation factors – IF1, IF2 and IF3 – while eukaryotes contain many more. Initiation complex formation in E. coli will be discussed in more detail later.

In eukaryotes, initiation is usually cap-dependent: the small subunit binds to the 5’-end and scans the mRNA until it encounters the first start codon. Prokaryotic initiation, however, can occur at any start site present on the mRNA. Prokaryotic mRNAs are often polycistronic, which means that several proteins are encoded by the same mRNA, which is possible because initiation can start at different places.

Elongation During elongation the mRNA coding sequence (CDS) is read 5’ to 3’, while the chain of amino acids (or polypeptide chain) is synthesized. In short, after binding of fMet-tRNAfMet to the P site, a codon-matching charged tRNA in complex with EF-Tu/GTP is loaded into the A site. The amino acid is then transferred from the tRNA in the P site to that in the A site, thus forming a peptide bond. This peptide bond formation is catalyzed by the 23S rRNA, which is a ribozyme. In bacteria, the now uncharged tRNA in the P site and the tRNA in the A site move to the E site and the P site, respectively. This is called translocation. The mRNA moves along with the tRNAs and the next charged tRNA is selected by the next codon in the A site. Bacteria require only a few elongation factors, while eukaryotes again contain several more.

22

Termination and recycling When the ribosome encounters a stop codon, this codon is recognized by a release factor, and translation is terminated. The binding of a release factor to the ribosome triggers the hydrolysis that releases the polypeptide chain. Then, the ribosomal subunits and the release factors dissociate, which allows for ribosome recycling. In some cases, however, ribosomes are not released from the mRNA, but instead initiate at a start codon nearby in a process called reinitiation.

Ribosome assembly on the mRNA to form the IC occurs at the order of seconds (Kennell and Riezman, 1977; Mitarai et al., 2008). Elongation of the polypeptide chain is faster – up to 20 amino acids per second in E. coli (Sørensen and Pedersen, 1991). Although elongation rates can also affect translation rates (as will be discussed later), initiation is the rate-limiting step of protein synthesis (Ingolia et al., 2009; Kozak, 2005; Schlax and Worhunsky, 2003). Therefore, regulation of protein synthesis often occurs at the initiation step. Bacterial initiation efficiency and its regulation are the main subjects of this thesis.

Translation initiation in bacteria Textbook knowledge suggests that canonical translation initiation occurs through the assembly of 30S ribosomal subunits onto an initiation site of the mRNA, and that certain features are inherent of bacterial mRNAs. However, translation in E. coli and many other bacteria often occurs through other modes of initiation. The canonical and alternative initiation modes are discussed shortly.

The canonical mode of initiation The canonical mode of translation initiation in E. coli involves three intermediary initiation complexes: the 30S pre-initiation complex (pre-30SIC), the active 30S initiation complex (30SIC), and the active 70S initiation complex (70SIC). The pre-30SIC is formed between the mRNA, the 30S subunit containing the IFs, and sometimes fMet-tRNAfMet. This complex is transient, reversible, and not yet ready for translation. The active ternary 30SIC forms when the start codon of the mRNA is directed to the ribosome’s P site, and the initiator tRNA enters the P site with IF2. When the codon-anticodon interaction is formed between the start codon and the initiator tRNA, the complex undergoes a conformational change that makes the 30SIC more stable than the pre-30SIC, and the initiation factors IF1 and IF3 are ejected. Finally, the 50S subunit is joined and IF2 released after GTP hydrolysis, to form the 70SIC. The irreversible 70SIC is now ready to enter the elongation stage.

23

The initiation factors have different roles during translation initiation. In short, IF1 stimulates binding of the other IFs to the 30S subunit. IF1 cooperates with IF2, which specifically interacts with charged initiator tRNA, to promote its correct binding to the P site in the 30SIC. IF2 also stimulates 50S association to the 30SIC, and is only released after 70SIC formation. IF3 is mainly involved in the binding between 30S and 50S subunits. It binds strongly to 30S, and triggers dissociation of 70S into 30S and 50S during ribosome recycling. However, the roles of the IFs are not entirely clear – for example, IF1 and IF3 can be found on intact 70S ribosomes (Yamamoto et al., 2016).

An important aspect in start site selection is the Shine-Dalgarno (SD) sequence (Shine and Dalgarno, 1974). This sequence is specific to prokaryotes and is often compared with the Kozak consensus sequence in eukaryotes. The purine-rich SD sequence is located upstream of the start codon, and base-pairs with the antiSD sequence near the 3’-end of the 16S rRNA during 30SIC formation (Steitz and Jakes, 1975). Although the SD sequence helps proper start site selection in bacteria and is thought to be part of canonical mRNAs, its presence is not universal in E. coli and other Gram-negatives (see later). The SD sequence and its influence on translation efficiency will be discussed in detail below. First, we take a look at alternative modes of initiation.

Alternative modes of initiation Ribosomes can exhibit variations in their composition, both in rRNA and their ribosomal proteins [(Kaberdina et al., 2009; Vesper et al., 2011); reviewed in (Byrgazov et al., 2013)]. Several alternative modes of initiation have been suggested. For example, there is evidence for a 70S-scanning mode of initiation, in which a 70S ribosome does not dissociate, but scans the mRNA for the initiation site of a downstream cistron. The presence of an SD sequence is essential for 70S-scanning initiation (Yamamoto et al., 2016). Another pathway is the translation of leaderless mRNA (lmRNA), whose 5’-end is at, or close to, the start codon and thus lacks an SD sequence. lmRNAs are selectively initiated and translated by 70S ribosomes (Moll et al., 2004; Udagawa et al., 2004). Similarly, lmRNAs cannot be initiated by ribosomal subunits (Yamamoto et al., 2016). Initiation on lmRNAs is promoted by excess initiator tRNA (Udagawa et al., 2004). Finally, there is evidence for another 70S-related initiation mode, where initiation depends on an upstream AUG. This upstream sequence is not translated, however, but seems to serve as a recruitment signal for 70S ribosomes, after which an IC can be formed at a downstream start site (Beck and Janssen, 2017). Several additional examples of upstream AUG signals (Beck et al., 2016) suggest that this mode of initiation is more widespread.

An interesting case of selective translation involves the induction of a pool of specialized ribosomes by the chromosomal toxin-antitoxin (TA)

24

system mazEF (Vesper et al., 2011). The stress-induced toxin MazF is an endoribonuclease that cleaves single-stranded RNA at ACA-sequences. Under stress conditions, the stable MazF toxin cleaves some mRNAs at ACA sites closely upstream of their AUG start codon, thus creating leaderless mRNAs. Moreover, MazF also cleaves and removes part of the 3’-terminus of 16S rRNA, creating a population of ribosomes that lack the antiSD sequence – so-called stress-ribosomes. The lmRNAs are selectively translated by the stress-ribosomes (Vesper et al., 2011).

There are some fundamental differences between the different initiation modes. For example, 70S ribosomes cannot initiate at internal RBSs, which was shown by annealing an antisense oligo to the 5’-UTR of an mRNA, suppressing translation initiation. 30S initiation, on the other hand, can occur directly at internal initiation sites (Yamamoto et al., 2016). Also, the three initiation modes – standard 30S-binding, initiation of leaderless mRNAs, and 70S-scanning – display different requirements for IFs (Yamamoto et al., 2016), corroborating a natural variability in ribosomal composition in bacteria. Although such differences in initiation modes are interesting and important to fully understanding translation initiation in bacteria, they are less relevant for the work described in this thesis. Therefore, I will only refer to alternative initiation modes when needed.

Definitions of RBS and TIR In 1994, ribosomal footprinting on initiation complexes showed that the region of the mRNA that is protected from endonucleolytic cleavage in the presence of the 30S ribosome and the initiator tRNA spans from -35 to +19 relative to the translational start site (Hüttenhofer and Noller, 1994). Crystal structures showed that single-stranded mRNA enters the ribosome through a tunnel that accommodates about 30 nucleotides of mRNA (Yusupova et al., 2001), slightly shorter than the region protected by footprinting (Hüttenhofer and Noller, 1994).

The ribosome binding site (RBS) is defined roughly as the region that is protected from cleavage by the ribosome. Whatever happens with the mRNA in this region – structural propensity, binding of proteins or small RNAs (sRNAs) – affects translation initiation most strongly (e.g. Bouvier et al., 2008; Espah Borujeni et al., 2017). However, the translation initiation region (TIR) is the entire region of the mRNA that is involved in translation initiation. The exact length depends on the mRNA in question, and can be much longer than the RBS. The TIR could potentially extend far upstream of the start codon, but not as far downstream, since initiating ribosomes far downstream of the start codon would interfere with translating ribosomes (Adhin and van Duin, 1990). I aim to use the terms RBS and TIR to indicate the physical space that the 30S ribosome takes on the mRNA, and the total region of the mRNA that affects translation initiation, respectively.

25

Translation efficiency Translation efficiency is determined by both initiation and elongation rate. When both are high, protein synthesis is efficient. If the initiation rate is low, protein synthesis cannot be high, even if elongation would be fast. Conversely, if the initiation rate is high, but elongation proceeds slowly, ribosomes will run into each other causing ribosome jamming. This not only causes low protein output, but also a depletion of the free ribosome pool. So, although translation initiation is the major rate-limiting step (Ingolia et al., 2009; Kozak, 2005; Schlax and Worhunsky, 2003), elongation rates can also affect total translation efficiency.

Codon usage and translation rate As the genetic code is degenerate, amino acids can be encoded by multiple codons. Codon bias refers to the bias in the relative frequency of synonymous codons used in genes. The codon usage of a gene is said to be optimized when its codons reflect the available tRNA pool. A widely used metric is the codon adaptation index (CAI), which measures the deviation from a reference set of genes (Sharp and Li, 1987). Another measure is the tRNA adaptation index (tAI), which, inspired by the CAI, additionally takes into account the available pool of tRNAs (dos Reis et al., 2004). Besides general codon bias, genes also contain a codon pair bias, where certain codon pairs are favored or disfavored, and a codon co-occurrence bias, which takes into account the clustering of codons recognized by the same tRNA. (Quax et al., 2015) and other reviews contain more extensive information on codon bias.

Since the use of synonymous codons does not change the amino acid sequence, such mutations are called silent mutations, and were initially thought to not affect protein expression. However, the use of alternative codons can have effects. Although effects of codon usage on, for example, mRNA stability and termination have been observed, I will here only discuss effects on translation. Firstly, synonymous codons can affect the mRNA structural propensity in the coding region. Since this occurs mainly in the N-terminal region and primarily affects translation initiation (e.g. Boël et al., 2016; Goodman et al., 2013; Kudla et al., 2009), this will be discussed in more detail in the next section. Secondly, codon usage can affect the speed of elongation locally, causing variable translation rates across a gene. For example, highly expressed genes in pro- and eukaryotes contain a so-called codon ramp – a sequence of 30-50 relatively rare codons at the 5’-end of the coding sequence (Tuller et al., 2010) – that is correlated with a higher ribosome density (Ingolia et al., 2009). This codon ramp is thought to cause a lower elongation rate in the beginning of the gene, in order to prevent ribosome traffic jams further on (Tuller et al., 2010). Also, the occurrence of SD-like sequences in the coding region can cause translational pausing, as

26

they bind to the 16S rRNA and thus slow down the ribosome (Li et al., 2012). Similarly, the use of rare codons can locally slow down elongation, which is important for proper co-translational protein folding, especially between different protein domains, or for proper insertion of the protein into the membrane (e.g. Pechmann and Frydman, 2013; Zhang et al., 2009).

Finally, codon bias is thought to affect the overall speed of elongation, and thus translation rates. Several papers have shown a correlation between codon usage and translation rate (e.g. Boël et al., 2016; Sørensen and Pedersen, 1991). However, since many of those studies involved overexpression and non-endogenous conditions, and because overexpression of a protein can create imbalances in the available pool of charged tRNAs, the impact of codon adaptation on translation rates might have been overestimated. Indeed, ribosome-profiling – deep sequencing of ribosome-protected mRNA fragments – has not supported differences in elongation rate, irrespective of codon usage (Li et al., 2012, 2014; Quax et al., 2013). It seems that codon usage correlates only weakly with translation efficiency under physiological conditions and for endogenous genes (Burkhardt et al., 2017). Instead, it seems more likely that codon usage – especially in highly expressed genes – has been adapted to expression levels in order to optimize overall protein synthesis and to increase cellular fitness (Andersson and Kurland, 1990).

All in all, although codon bias can be a means to fine-tune gene expression (Quax et al., 2015), codon usage does not seem to be a major determinant for elongation speed (Burkhardt et al., 2017; Li et al., 2012, 2014). Rather, most studies show the predominant effect to be due to mRNA structure, especially near the start of genes (Burkhardt et al., 2017; Del Campo et al., 2015; Kudla et al., 2009; Quax et al., 2013), a subject that will be discussed extensively below.

Elements of the initiation region Several elements and features inherent to the mRNA – the start codon, SD sequence, secondary structure, and translational enhancer sequences – affect its initiation efficiency. In this section, these factors will be discussed separately.

One mRNA element, the downstream box, was originally proposed to enhance translation initiation by annealing to 16S rRNA (Sprengart et al., 1990). However, later evidence suggested that the downstream box does not base-pair with 16S rRNA, nor affect translation efficiency (La Teana et al., 2000; Moll et al., 2001). Thus, the contribution of this element to translation initiation remains controversial.

27

Start codons The standard start codon in E. coli is AUG, which is used in 83% of its genes. Initiation also often occurs from two near-cognate start codons: GUG, used in 14% of genes, for example in lacI; and UUG, in 3% of the E. coli genes, for example in lacA (Blattner et al., 1997). Only a few annotated non-canonical start codons are known, for example AUU in infC (Sacerdot et al., 1982). Interestingly, infC encodes IF3, which is implicated in accuracy control against the start codon (Hartz et al., 1989). So, starting from the non-cognate AUU, the infC gene is preferentially translated when IF3 concentrations are low – a charming example of negative feedback regulation (Sacerdot et al., 1996). Initiation efficiency depends on which start codon is used and decreases with decreasing occurrence, at least for the three mostly used start codons (Ringquist et al., 1992).

Shine-Dalgarno sequence A well-known feature of bacterial translational start sites is the Shine-Dalgarno (SD) sequence (Shine and Dalgarno, 1974). Proper spacing between the SD sequence and the start codon is essential (Chen et al., 1994; Shine and Dalgarno, 1974). The SD sequence increases the affinity of 30S ribosomal subunits to the mRNA (Calogero et al., 1988) and was thought to be the major factor determining translation efficiency.

However, bacteria differ greatly in the fraction of genes that contain an SD sequence, for example 11% in Mycoplasma genitalium, 57% in E. coli, and 89% in Bacillus subtilis (Ma et al., 2002). Whereas translation in B. subtilis depends almost exclusively on the presence of an SD, and expression levels strongly correlate with SD strength – defined as the level of SD-antiSD complementarity – (Ma et al., 2002), SD sequences are not essential for correct initiation in E. coli (Calogero et al., 1988).

Also, the correlation between SD and translation efficiency is not as straightforward as initially thought. In B. subtilis, but not in E. coli, the length of an SD sequence positively correlates with expression (Ma et al., 2002). Rather, increasing the length of the SD sequence above a moderate length, decreases translation activity in E. coli (Komarova et al., 2002; Vimberg et al., 2007). Probably, a very stable SD-antiSD duplex traps the ribosome in the initiation complex, thus decreasing translation efficiency (Komarova et al., 2002). This is corroborated by the altered preference of SD length upon changing growth temperature (Vimberg et al., 2007). Additionally, binding of model mRNAs to 30S subunits in vitro showed that the length of an SD sequence does not affect association kinetics, but that a long SD decreases the dissociation rate of the 30S from the model mRNAs (Studer and Joseph, 2006). This may explain why SD-like sequences in the CDS slow down translating ribosomes (Li et al., 2012).

28

Recent global analyses have shown that the SD strength is not a good predictor of translation efficiency in E. coli (Del Campo et al., 2015; Quax et al., 2013), although the presence of an SD can positively affect translation of individual cistrons (Komarova et al., 2002; Vimberg et al., 2007). All in all, the canonical model of bacterial translation seems to describe translation initiation in B. subtilis rather than in E. coli, and a universal model would therefore be too simplistic.

Ribosomal protein S1 S1, with its 557 amino acids and 61 kDa, is the largest ribosomal protein in E. coli (Subramanian, 1983). It can be free in the cytoplasm or loosely associated with the ribosome (Subramanian and van Duin, 1977), and is located at the junction of the platform, head and main body of the 30S subunit (Sengupta et al., 2001). S1, encoded by the rpsA gene, is essential in Gram-negatives like E. coli (Duval et al., 2013; Sørensen et al., 1998). B. subtilis, a model organism for Gram-positive bacteria with low GC-content, contains a homologue of rpsA, but its product is not essential (Kobayashi et al., 2003; Sorokin et al., 1995). Although S1 likely plays a role in rescuing stalled ribosomes, and in protection of single-stranded RNA from RNase E degradation [see (Hajnsdorf and Boni, 2012) and references therein], I only focus on its major role: that in translation initiation.

In E. coli, ribosomal protein S1 consists of six homologous OB (oligonucleotide/oligosaccharide-binding)-fold repeats, the S1 domains or S1 motifs (Bycroft et al., 1997; Salah et al., 2009; Subramanian, 1983). Some organisms have different numbers of S1 motifs (Salah et al., 2009). OB-fold domains specifically bind single-stranded nucleic acids, explaining the high affinity of S1 to mRNA (Draper and von Hippel, 1978). The S1 repeats are not equivalent: the first two domains bind protein instead of RNA, thus anchoring S1 on the ribosome, whereas the other four bind RNA (Duval et al., 2013; Salah et al., 2009). The RNA-binding domain is connected via a flexible hinge to the protein-binding part of S1 (Subramanian, 1983). This might account for the fact that, unlike the SD, S1 binding regions on the mRNA have no strict spatial preferences [suggested in (Komarova et al., 2002)].

S1 interacts with 10-11 nucleotides of single-stranded RNA (Qu et al., 2012; Sengupta et al., 2001). It does not have strict sequence preferences (Subramanian, 1983), although it binds preferentially and with high affinity to A/U-rich sites (Boni et al., 1991; Subramanian, 1983). S1 binding can however be highly specific, as it recognizes its own mRNA to negatively autoregulate its own synthesis (Skouv et al., 1990; Tchufistova et al., 2003).

It is well-established that S1 can unwind RNA (Bear et al., 1976; Kolb et al., 1977; Rajkowitsch and Schroeder, 2007; Thomas et al., 1978). More recently, a single-molecule study showed that purified S1 is able to melt long RNA stem-loop structures step-by-step, independent of ATP or GTP, by

29

binding to the single-stranded pieces of RNA that become available upon unzipping of the RNA (Qu et al., 2012). The kinetics of unwinding are dependent on the RNA sequence, and mainly the GC-content (Qu et al., 2012).

mRNA structure and initiation efficiency For the initiation complex to form, several base-pairing interactions need to take place: the initiation codon with the initiator tRNA, and the SD sequence with the antiSD sequence in 16S rRNA. This implies that mRNA structure in this region would interfere with translation initiation. The relationship between mRNA structure and translation efficiency was studied extensively for the coat protein cistron of the RNA phage MS2 (de Smit and van Duin, 1990). Its SD sequence and start codon are sequestered in a stable stem-loop. The stability of this stem-loop was changed by site-directed mutagenesis, and the translation efficiency determined. As expected, translation efficiency decreased upon stabilizing the stem-loop, and compensatory mutations restored expression (de Smit and van Duin, 1990).

The inhibitory effect of mRNA structure on translation is not limited to those comprising the SD and start codon. For example, one study investigated the influence of stable hairpins in N-terminal regions of mRNAs in E. coli (Espah Borujeni et al., 2017). When a hairpin overlapped with the initiating ribosome’s footprint – identified as extending 13 nucleotides into the coding region – translation was strongly inhibited. Additionally, these authors calculated that the extent of translational repression depends on the free energy needed to unfold the overlapping region (Espah Borujeni et al., 2017).

Furthermore, a library of synonymous mutations in the coding region of gfp showed that the mRNA folding energy in the region up to nucleotide +37 was the major predictor of translation efficiency (Kudla et al., 2009). Also, analyzing the effect of rare codons at the N-terminus on gene expression levels in a large synthetic library, it was concluded that reduced mRNA structure accounts for most of the differences in expression (Goodman et al., 2013). Although a few examples suggest that structure in the RBS does not impede, or even enhances, translation (Jagodnik et al., 2017; Sacerdot et al., 1998), structure within the whole RBS region usually countercorrelates with protein synthesis rates.

Global analyses Thus, not only the SD and the start codon, but the entire RBS needs to be accessible for the ribosome to bind efficiently. Several global studies have shown a low secondary structure propensity around RBS sites. For example, the analysis of more than 4000 E. coli genes found the secondary structures to be significantly weaker from nucleotide -4 to +37, relative to the start codon, than downstream in the CDS (Kudla et al., 2009). Also, an analysis

30

of RNA structure and ribosome profiling in E. coli demonstrated that the regions ~20 nucleotides upstream and 10-30 nucleotides downstream of the start codon are less structured (Del Campo et al., 2015). Generally, bacterial mRNAs exhibit reduced structural stability around the start sites, and this bias is stronger in GC-rich bacteria (Bentele et al., 2013), implying the importance of mRNA structure. Finally, a genome-wide analysis of 340 species (bacteria, archaea, and eukaryotes) found a reduced stability of mRNA structure in the first 30 nucleotides of the coding region in most species analyzed (Gu et al., 2010). Interestingly, the mRNA stability around the start site correlated with GC-content and optimal growth temperature – the higher the GC-content or the lower the growth temperature, the stronger the reduction in stability (Gu et al., 2010).

Codon bias The reduced secondary structure around start sites and its correlation with expression levels seems striking. However, it has been argued that the role of mRNA structure around the start site is overrated, since several studies (Goodman et al., 2013; Kudla et al., 2009) used reporter systems with unusually strong secondary structures. In these reports (Supek and Šmuc, 2010; Tuller and Zur, 2015), the authors argued that secondary structure is generally weak around the start site, and that in these cases codon bias is relevant. When secondary structure is strong, it limits protein expression, and codon bias becomes irrelevant. In other words, when initiation is not limited by mRNA structure, the limiting factor may shift to elongation rate.

On the other hand, a global study showed that codon content influences expression more strongly than mRNA structure, except in the first 16 codons, where structure has more impact (Boël et al., 2016). Another study distinguished between the codon ramp hypothesis (Tuller et al., 2010) mentioned above and the structure hypothesis – predicting that rare codons are selected to reduce secondary structure (Bentele et al., 2013). Comparing two sets of rare codons near the 5’-ends of E. coli genes – those with G/C (GC3) versus those with A/U in the third codon positions (AU3) – they found that AU3 codons are selectively enriched, and that abundant GC3 codons are strongly depleted, supporting the structure hypothesis (Bentele et al., 2013).

The debate on the extent to which mRNA structure and codon bias determine translation efficiency is ongoing. In any case, the majority of genes have likely been optimized for high translation rates – reflected in the general pattern of low structure content around the start site – and mRNA structure around the start site is probably the first or major determinant of translation efficiency. Secondary structure also plays an important role in translational regulation, which is the subject of a later section.

31

Translational enhancer sequences Several studies have shown that short sequence motifs, near the 5’-end of the mRNA, can serve as translational enhancers, both upstream (Hook-Barnard et al., 2007; Sharma et al., 2007) and downstream (Martin-Farmer and Janssen, 1999; Qing et al., 2003) of the start codon. These translational enhancement effects could be due to various mechanisms. For example, A/U-rich sequences upstream of SD sequences can enhance translation, probably by serving as a binding target for ribosomal protein S1 (Komarova et al., 2002, 2005). The deletion of an A/U-rich enhancer in the 5’-UTR of fepB decreased its expression five-fold, by affecting ribosome binding at this enhancer region (Hook-Barnard et al., 2007). Interestingly, it was shown that Spot42 inhibits translation of sdhC mRNA by recruiting Hfq to an A/U-rich sequence in the RBS, which then prevents ribosome binding to this region (Desnoyers and Massé, 2012).

Additionally, CA-motifs can serve as translational enhancers. For example, multimers of CA, 3' of the start codon of a leaderless lacZ gene, increase expression with increasing numbers of CA-repeats (Martin-Farmer and Janssen, 1999). This enhanced expression is independent of coding sequence, as several other genes showed similar increases in expression when CA-repeats were added downstream of the start codon. Toeprint analyses showed that mRNAs rich in these CA-repeats bind ribosomes more efficiently, suggesting that the CA-repeats enhance initiation. Interestingly, adding (CA)5 greatly enhanced expression whereas C5A5 and A5C5 barely had an effect (Martin-Farmer and Janssen, 1999).

Another example of CA-elements acting as translation enhancers is provided by a study of the sRNA GcvB which regulates more than 40 targets (Sharma et al., 2011). GcvB represses several of its targets (oppA, dppA and gltI) by binding to CA-rich elements inside and upstream of ribosome binding sites, as shown by in vivo and in vitro experiments (Sharma et al., 2007). The translation-enhancing effect of the CA-rich region was shown more directly by removing the CA-rich element from one mRNA, giving a threefold reduction in translation, and inserting it into an unrelated mRNA, which increased expression twofold in vitro (Sharma et al., 2007). Yet another study showed that GcvB represses yifK, another target, also via a CA-rich element (Yang et al., 2014). Mutagenesis experiments suggested that these ACA motifs enhance yifK translation, and that strict positioning of this element is not necessary (Yang et al., 2014).

Translation initiation efficiency – an interplay of factors As we have seen, translation initiation efficiency depends on various factors that are inherent in the mRNA: start codon, SD sequence, mRNA secondary structure, and translational enhancer sequences. Additionally, protein S1 is a

32

major determinant of translation efficiency of many genes. The contribution of each of these factors to the initiation efficiency is not always clear, and depends on the combined presence and strength of them. Several studies have investigated the interdependence of various factors and their combined contribution to protein output.

B. subtilis versus E. coli Bacterial translation is not a universal mechanism; there are differences between species. As mentioned before, B. subtilis ribosomes translate almost exclusively mRNAs with an extended SD and which lack secondary structure in the initiation region (Ma et al., 2002), whereas E. coli ribosomes translate all kinds of mRNAs (Calogero et al., 1988; Ma et al., 2002). Interestingly, B. subtilis has a lower GC-content than E. coli, which leads to a less stable mRNA structure in general. A major difference between E. coli and B. subtilis ribosomes is the presence of ribosomal protein S1 which is essential in E. coli (Duval et al., 2013; Sørensen et al., 1998), but not in B. subtilis (Kobayashi et al., 2003; Sorokin et al., 1995). Hence, S1 may be particularly required for the translation of structured mRNAs, or mRNAs that lack or have poor SD sequences.

SD sequence and protein S1 Several studies have analyzed the combined effects of SD and enhancer sequences. The translation-enhancing effects of A/U-rich regions are thought to be due to the binding of ribosomal protein S1, since A/U-rich translational enhancers are S1 binding sites (Komarova et al., 2002, 2005). One study showed that A/U-rich sequences enhanced translation both in the presence and absence of an SD sequence. Both SD sequence and A/U-rich sequences enhance translation in the tested mRNAs, but the translation-enhancing effects of A/U regions were most strongly in highly translated mRNAs (Osterman et al., 2013). Another study used a different set of constructs with different SD sequences, combined with an A/U-rich enhancer, a weak enhancer or no enhancer. The enhancer strongly amplified the translation-differences between SD sequences. From this, cooperativity between the SD sequence and the enhancer was inferred (Vimberg et al., 2007). Yet another study examined the effects of extended SD sequences (six to ten nucleotides), combined with upstream S1-binding regions. Protein synthesis became less efficient upon increasing the strength of the SD, and S1 binding increased expression. Mobility-shift assays showed that S1-mRNA affinity correlated with expression levels, and toeprinting assays showed the SD-antiSD duplex with S1-depleted ribosomes, but not with ribosomes containing S1. This suggests that S1 can compensate for a too strong SD interaction that traps the ribosome on the RBS (Komarova et al., 2002).

33

SD sequence and mRNA structure Translation of the coat protein gene of bacteriophage MS2 has been studied extensively using site-directed mutagenesis. Since the start site forms a simple stem-loop structure, the SD sequence and the secondary structure at the start site could be manipulated in a predictable way. Weakening the SD sequence reduced translation, but only if translation was inhibited by secondary structure (de Smit and van Duin, 1994a). Interestingly, E. coli mRNAs that lack an SD are considerably less structured around the start codon than mRNAs with an SD sequence (Scharff et al., 2011).

These studies suggest that SD and secondary structure can compensate for each other – when an SD sequence is weak or hidden, the ribosome becomes more dependent on the presence of an unstructured region in the mRNA, and vice versa.

Leaderless translation and protein S1 One particular difference between leaderless translation and canonical translation is the requirement for protein S1. Although S1 is essential for the translation of many mRNAs in E. coli (Sørensen et al., 1998), it is dispensable for the translation of leaderless mRNAs (Moll et al., 2002, 2004; Tedin et al., 1997). Even more so, lmRNAs can be selectively translated by 70S ribosomes lacking S1 (Kaberdina et al., 2009; Moll et al., 2004). However, a lack of secondary structure in lmRNAs might partially account for the differences in requirement of S1 (Tedin et al., 1997). Since S1 interacts with single-stranded RNA, and, when located on the ribosome, interacts with regions 5’ of the start codon, the presence of S1 on a ribosome should disfavor start site selection of an lmRNA.

SD sequence, protein S1, and mRNA structure Two interesting studies explored the triplicate interplay between SD sequence, mRNA structure, and protein S1. One of them investigated the role of S1 in the initiation of structured mRNAs. Here, the S1 dependence on three different mRNAs from E. coli was analyzed. Using toeprinting in the presence and absence of S1, and mutations to change the strength of the SD sequence, they show that S1 is necessary for initiation on structured mRNAs, but dispensable for translation of mRNAs with an unstructured RBS and a strong SD sequence (Duval et al., 2013).

In another study, SELEX (systematic evolution of ligands by exponential enrichment) was used to evolve RNA ligands against 30S ribosomes, 30S ribosomes depleted of S1, and S1 alone. Interestingly, they found two classes of ligands: unstructured RNAs with an SD sequence, which were generated against S1-depleted ribosomes; and RNAs containing a pseudoknot, which were generated against both intact ribosomes and purified S1 protein (Ringquist et al., 1995). Apparently, the presence of the

34

SD sequence and the absence of strong secondary structure become more important for start site recognition by the ribosome in the absence of S1. At this point, we return to B. subtilis with its low GC-content. Here, protein S1 is not essential, but start site recognition is highly dependent on the SD and the absence of secondary structure (Ma et al., 2002).

All in all, translation efficiency is determined by several effects – start codon, SD strength, mRNA secondary structure, and S1 binding sites – and the translation efficiency of a certain gene depends on the combination of the effects. However, mRNA structure around the start site is a major determinant of translation efficiency in E. coli, and the main subject of this thesis. In the next section, we focus on translational control through mRNA structure.

mRNA structure and translational control Bacteria must adjust to their own needs, responding to internal changes (for example growth phase) and changes in their environment (like the availability of nutrients, changes in temperature, and the presence of other microbes). Regulation of the responses to these kinds of different circumstances occurs on different levels: the transcriptional level, the regulation of the amount of mRNA that is produced; the post-transcriptional or translational level, regulation of the stability of mRNA or the amount of protein produced from each mRNA; and the post-translational level, regulation of the stability or degradation of proteins. Regulation often occurs on multiple levels simultaneously.

Regulation of translation is fast and reversible, enabling the bacteria to respond quickly to changing internal and external factors. Here, I will focus on translational regulation mechanisms that target the initiation region, and thus control translation initiation. The mRNA’s structure at the initiation site often plays a major role in regulation.

Protein-mediated regulation Regulation of translation initiation by proteins is often autoregulation and occurs in several ways. Mechanisms by which proteins bind to mRNA in order to prevent ribosome binding to the initiation site can be divided into a competition and an entrapment model. A well-characterized example of the competition model is the threonyl-tRNA synthetase (ThrRS), which represses translation of its own mRNA thrS (Springer et al., 1985). Part of the thrS mRNA structure mimics the anticodon domain of tRNAThr (Graffe et al., 1992; Romby et al., 1992). Binding of ThrRS to its mRNA competes with ribosome binding and thus inhibits translation initiation (Moine et al., 1990).

35

A classic example of entrapment is the translation of protein S15, encoded by rpsO. Protein S15 binds to specific motifs on its rpsO mRNA, which are structurally similar to the motifs that S15 recognizes on 16S rRNA during ribosome assembly (Ehresmann et al., 2004; Mathy et al., 2004). When S15 binds to its mRNA, rpsO adopts a pseudoknot structure (Ehresmann et al., 1995; Philippe et al., 1994). Although the SD sequence is accessible to 30S subunits, the bound S15 stabilizes the pseudoknot, and inhibits the formation of the real IC (Philippe et al., 1993). This complex is trapped in a pre-initiation form, and called the stalled 30SIC.

Another example of autoregulation is IF3, which discriminates against non-cognate initiation codons (La Teana et al., 1993). Its own mRNA, infC, is translated from the non-canonical AUU. When IF3 levels are high, it thus represses its own translation (Butler et al., 1986).

In addition to autoregulation of proteins, several specialized RNA-binding proteins (RBPs) are able to regulate translation. One well-characterized example in E. coli is CsrA (carbon storage regulator A), which has multiple targets (Holmqvist et al., 2016). CsrA can cause either up- or downregulation of its target proteins, often by controlling access to the RBS (e.g. Baker et al., 2002; Dubey et al., 2003).

RNA sensors RNA sensors comprise a variety of elements that sense external factors, like small molecules (riboswitches), temperature (thermosensors), or pH (pH sensors). Here, I will just give a few examples of RNA sensors and how they affect translation initiation.

Thermosensors RNA thermosensors are structures that respond to temperature to control gene expression. They are often involved in heat- or cold-shock responses, but also play a role in temperature-dependent virulence of pathogens. RNA thermosensors often contain a stable RBS structure that is melted upon elevated temperature, thus increasing the accessibility for 30S ribosomes. For example, the E. coli heat-shock transcription factor σ32 is encoded by rpoH. Increased rpoH expression results from heat-induced destabilization of the inhibitory RBS structure (Morita et al., 1999). The increased σ32

expression then induces heat-shock gene transcription.

Riboswitches Riboswitches are mRNA elements whose aptamer domains bind small molecules (ligands) to regulate the mRNA in cis (e.g. reviewed in Tucker and Breaker, 2005). Regulation by means of riboswitches is often found in metabolic pathways – the mRNA responds to small molecules in the environment, and binding confers mRNA structure change that promotes ON or OFF states. Riboswitch mechanisms can differ, resulting in induced

36

transcription termination or anti-termination, or inhibition/activation of translation.

sRNA-mediated regulation Bacterial small RNAs (sRNAs) or antisense RNAs exert their functions primarily by base-pairing to mRNAs, thereby influencing their target’s translation and/or degradation rates (e.g. reviewed in: Storz et al., 2011; Urban and Vogel, 2007; Wagner and Romby, 2015). Antisense RNAs can be divided into cis-encoded sRNAs (synthesized from the opposite strand of their target mRNA), or trans-encoded sRNAs (transcribed from different loci). Cis-encoded sRNAs are fully complementary to their targets, whereas trans-encoded sRNAs are only partially complementary and have the potential to target multiple mRNAs. The RNA chaperone protein Hfq is often required for the activity of trans-encoded sRNAs (e.g. reviewed in Vogel and Luisi, 2011). On the translational level, antisense sRNAs can either inhibit or activate initiation.

sRNA-mediated activation of initiation most often occurs through relieving inhibitory structure containing the RBS, which is the case for regulation of rpoS by DsrA. Under normal growth conditions, translation of RpoS, encoding the Sigma factor σ38, is inhibited by a secondary structure containing the SD sequence. Binding of DsrA to the 5’-UTR of rpoS promotes translation by opening the inhibitory structure (Majdalani et al., 1998).

Most sRNAs inhibit translation by preventing ribosome access to the RBS. Superoxide dismutase, which protects the cell from oxygen damage by removing oxygen radicals, is encoded by sodB. Under iron-limiting conditions, the sRNA RyhB is transcribed, binds to the RBS of sodB, and inhibits translation by competing with ribosome binding (Geissmann and Touati, 2004; Massé and Gottesman, 2002).

Antisense sRNAs can also inhibit by binding at regions upstream or downstream of the SD sequence and start codon. One study investigated the inhibitory effect of antisense RNAs on translation initiation in Salmonella. Here, the sRNAs did not overlap with the SD sequence or start codon, but still IC formation was strongly inhibited upon sRNA binding within 14 +/- 2 nucleotides into the coding region, suggesting a five codon window for translational repression by sRNAs (Bouvier et al., 2008). Another example is the regulation of replication frequency of plasmid R1. Synthesis of the replication initiator protein RepA is prevented by an antisense RNA, CopA, that binds to its target, CopT (Nordström et al., 1988). Binding occurs upstream of the SD sequence of tap (translational activator peptide), a short leader peptide located upstream of repA (Blomberg et al., 1992). Although CopA binding does not occlude the SD or start codon, the RNA duplex immediately upstream prevents translation (Malmgren et al., 1996).

37

These examples show that initiation can be inhibited by antisense sRNAs binding upstream or downstream of the SD sequence and start codon. These examples are similar to mRNA structure in the coding region being inhibitory (Espah Borujeni et al., 2017), and indicate that ribosomes interact with a region, the RBS, that is larger than the SD sequence and the start codon.

Translational coupling Translational coupling is common for bacterial genes, which are often polycistronic. In cases of translational coupling, translating ribosomes unfold the inhibitory structure at the following RBS. This is different from the previously-discussed 70S-scanning mode of initiation, in which the ribosome scans the mRNA for another start site. Scanning ribosomes are not translating, do not require energy, and are probably hindered by strong secondary structures.

Ribosome standby sites The ribosome standby model was proposed about 15 years ago. Ribosome standby sites can be considered part of an initiation site, and a way to regulate protein expression. Since much of the work described in this thesis is built on the standby model, it deserves a separate chapter.

Structured mRNAs and a paradox

An equilibrium model As mentioned before, the start site of the phage MS2 coat gene has been studied extensively for the relationship between secondary structure around the RBS and translational efficiency. The SD and start codon of the MS2 coat protein are sequestered in a simple hairpin structure. Mutational analysis showed that translation efficiency depends on the stability of this stem-loop – when the stem-loop is destabilized, translation increases, up to a certain point at which the stem-loop is no longer inhibitory and translation is at its maximum. The mutations only change protein expression if they change the stability of the stem-loop, and the availability of cellular components is not limiting for translation (de Smit and van Duin, 1990).

The relationship between stability of the stem-loop and translation efficiency follows a simple curve predicted by an equilibrium model. In this model, the association/dissociation of the 30S ribosome to the single-stranded RBS (the first equilibrium) competes with spontaneous folding/unfolding of the RBS hairpin (the second equilibrium). Here, we assume that the subsequent step of the ribosome entering translation is

38

relatively slow, so the preceding steps can reach equilibrium. This model predicts that a more stable RBS structure implies that fewer ribosomes per unit time can be bound to the start site, and expression decreases. When the hairpin is weaker, expression increases, until at a certain stability of the stem-loop (the wild type case) virtually all mRNAs were thought to contain a bound ribosome (de Smit and van Duin, 1990) and expression has reached its maximum.

Another study on the expression of the coat protein, investigating the roles of SD sequence and secondary structure, found that a stronger SD sequence increases expression, but only in the presence of impeding secondary structure (de Smit and van Duin, 1994a). If the secondary structure is weakened by mutations so that it no longer limits translation, changing the SD complementarity hardly has an effect (de Smit and van Duin, 1994a). As increasing complementarity between SD and antiSD increases ribosome binding to the mRNA, this study corroborates that intramolecular base-pairing competes with ribosome binding.

As it was argued that the MS2 coat protein would not be representative for E. coli translation, because the mRNA is part of the RNA phage genome, the model was quantitatively compared with four other studies in which changes in RBS structure altered translation efficiency. By calculating the secondary structure stability and normalizing the expression level, it was shown that the data from these studies could be fitted to a curve with the same shape as that for the coat protein (de Smit and van Duin, 1994b), supporting the generality of the proposed model. Even more so, the curves from the various RBSs were remarkably similar, indicating that the affinity of the ribosomes for the start sites is comparable in all cases (de Smit and van Duin, 1994b).

A paradox Several years later, when measured 30S affinity for mRNA and updated parameters to calculate the stability of mRNA structures were available, the earlier results from the coat protein system (de Smit and van Duin, 1990) were re-examined (de Smit and van Duin, 2003a). It turned out that the 30S subunit competes much better than expected with the stem-loop structure, as if its affinity for the message was about 105-fold higher than the value measured in vitro. In other words, the stem-loop of the coat RBS is so strong that it should be completely inhibitory, but still the mRNA is efficiently translated (de Smit and van Duin, 2003a). It was a paradox.

Calculations The originally used equilibrium model predicts a curve describing protein expression as a function of the structure stability around the RBS. Based on the curve with the best fit to the experimental data from the mutational analysis, a value for the binding constant of the 30S on the mRNA was

39

calculated (de Smit and van Duin, 2003a). This association constant of 1012 M-1 is roughly 105 times higher than that measured in vitro [107 M-1(de Smit and van Duin, 2003a)].

Additionally, the hairpin stability was recalculated based on updated parameters. The rate of folding was now estimated to be about 105 s-1, the average lifetime of the unfolded state being 10 µs (de Smit and van Duin, 2003a). Alternatively, the lifetime of the unfolded state was calculated based on the known initiation frequency and the hairpin stability, leading to the even lower estimated value of 0.1 µs (de Smit and van Duin, 2003a). Therefore, the binding of a 30S subunit needs to occur within a timeframe in the order of microseconds. Based on estimates of the binding-rate constant of 30S and the concentration of free subunits in the cell (de Smit and van Duin, 2003a), it would take about 4 ms for the RBS to collide with a free 30S subunit (de Smit and van Duin, 2003b), which makes initiation within the order of microseconds highly unlikely.

So, in short, the time frame during which the coat protein RBS is in its open state is much too short to capture a 30S ribosome from solution. Instead, the 30S subunit needs to be already at or near the RBS in its folded state, in order to allow the observed translation rates. This notion has led to the proposal of the ribosome standby model (de Smit and van Duin, 2003a).

A solution: the ribosome standby model

The model – from thermodynamics to kinetics Because of the conflicting values between predicted and observed coat translation rates, the concept of translational standby sites was introduced. In the standby model, the ribosome is already present at or near the RBS in its folded state, waiting for the transient opening of the structure, after which the ribosome snaps into place to form the bona fide initiation complex (de Smit and van Duin, 2003a). For the ribosome standby model to explain the high translation rates, the original equilibrium model needed to be replaced by a kinetic model, taking time factors into consideration.

To arrive at the kinetic model, several modifications were necessary (de Smit and van Duin, 2003a). First, equilibrium constants were replaced by component rate constants, so no assumptions were made on reactions being in equilibrium. Secondly, return routes for 30S and the RBS were introduced, to not deplete the system of those components. Thirdly, standby binding was introduced as a separate complex: the 30S bound to the folded state of the mRNA. And finally, the step of initiation complex formation was separated from the subsequent remaining steps of initiation. The system was assumed to be in steady state, i.e. all concentrations remain unchanged over time. With this new model, the quantitative consequences of the introduction of standby binding could be calculated (de Smit and van Duin, 2003a).

40

Quantitative effects of the standby model Translation initiation on a structured mRNA can be considered a second order reaction, since it depends on both the rate of unfolding of the inhibitory structure, and on the concentration of free 30S subunits. In the standby model, a 30S subunit is already present at or near the start site in its folded state. Upon transiently unfolding of the RBS, the 30S can slide into place and form the initiation complex. This is now a first order reaction, as it is independent of the concentration of free 30S. This should increase the initiation efficiency on structured mRNAs. But how much does it increase?

We go back to the graph that shows translation efficiency as a function of stem-loop stability for the MS2 coat gene (de Smit and van Duin, 1990). Replacing the original equilibrium model with a kinetic model both with and without the standby binding complex, it turned out that the kinetic model predicts translation rates that are several orders of magnitude higher than those predicted by the equilibrium model. Even more so, the kinetic standby model predicts translation rates that are quantitatively consistent with the data from the mutational experiments (de Smit and van Duin, 2003a). All in all, the standby model explains the paradoxical observation of high translation rates from a stably sequestered RBS.

The nature of standby sites A previous study (de Smit and van Duin, 1993) – a follow-up on the mutational analysis of the MS2 coat protein stem-loop (de Smit and van Duin, 1990) – had investigated the role of the native upstream sequences of the MS2 coat protein, as they were shown to enhance translation. In the absence of the upstream sequences, destabilizing mutations increased expression, whereas those mutations had no effect in the presence of the upstream region. The extent of enhancement of the native upstream sequences was similar to a slight destabilization of the stem-loop (de Smit and van Duin, 1993). Although the standby model was proposed only years later, this study fits the model, and the upstream regions could function as standby sites.

As the RBSs of several RNA phages’ coat proteins are stably sequestered, their secondary structures were examined. These RNAs are largely structured. However, the region around the RBS of the coat protein of all these phages is unusually unstructured. These relatively unstructured regions are about 120 nucleotides long and contain roughly 45 unpaired nucleotides – similar to the length of a ribosome footprint (Hüttenhofer and Noller, 1994) – and were proposed to function as standby sites to promote high coat translation (de Smit and van Duin, 2003b). Also the region around the replicase gene of several RNA phages is strongly structured. However, ribosomes translating the coat protein gene open up a region around the replicase RBS that is similar to the proposed standby sites of the coat

41

protein. Although the replicase RBS itself is still sequestered in a stable structure, efficient translation could theoretically occur through standby binding at the nearby single-stranded region (de Smit and van Duin, 2003b).

Because of the observations mentioned above, and since ribosomes do not bind to double-stranded RNA, standby sites are most likely single-stranded in nature. However, the 30S ribosome can accommodate some structure (see below) and standby sites could be any RNA sequence or structure that increases the life-time of a 30S on the mRNA.

Support for the standby model

Experimental support Since the proposal of the standby model in 2003 (de Smit and van Duin, 2003a), there have been only few studies providing experimental support for the model. One of these is an in vitro study that used fluorescence-based assays to monitor the binding of 30S subunits to various model mRNAs and the role of the initiation factors therein (Studer and Joseph, 2006). By examining both association and dissociation kinetics, it was shown that 30S binding to structured mRNAs is a two-step process. First, the 30S subunit binds to a single-stranded region of the mRNA. This step is independent of the presence of an SD sequence, initiator tRNA, or the initiation factors. Instead, this step requires the presence of a single-stranded region. The initial step is followed by a slower second step, which requires unfolding of the mRNA structure comprising the SD. This step depends on the SD sequence, the codon-anticodon interaction, and GTP-bound IF2. Dissociation of the formed 30S initiation complex depends on SD strength, mRNA structure stability, and the presence of initiator tRNA and initiation factors. Once the true initiation complex is formed, dissociation is slow (Studer and Joseph, 2006).

As it was shown that 30S association to structured mRNAs occurs in two steps (Studer and Joseph, 2006), this study strongly supports the standby model. The first step – solely depending on the presence of a single-stranded region – corresponds to standby binding, whereas the second step depicts the unfolding of the inhibitory structure. An interesting follow-up experiment would have been an additional set of kinetics assays in the presence versus absence of S1, in order to examine the role of this protein in the mRNA unfolding.

A second study supporting the standby model comes from our own lab (Darfeuille et al., 2007). In a type I TA system, the tisB locus encodes a toxin, TisB, that is induced by the SOS response (Vogel et al., 2004). Upon insertion into the membrane, TisB causes a loss of membrane potential (Unoson and Wagner, 2008), and this is implicated in persister formation (Berghoff et al., 2017). IstR1 is the constitutively synthesized antisense sRNA that counteracts toxicity (Vogel et al., 2004).

42

tisB mRNA is present in three different forms. When the inactive primary transcript +1 is transcribed, it gets endonucleolytically processed to yield the translationally active +42 mRNA. Binding of the antitoxin IstR1 induces RNaseIII cleavage, resulting in the translationally inactive +106 mRNA (Darfeuille et al., 2007). The 3’-domains of all three mRNA species are structurally identical and contain the highly sequestered RBS; only the 5’-domains differ (Darfeuille et al., 2007). Whereas most sRNAs target TIRs by directly base-pairing to the SD sequence and/or initiation codon, IstR1 binds to a region ~100 nucleotides upstream of the tisB RBS. Although the binding of IstR1 to tisB induces mRNA cleavage, its primary effect is inhibition of translation. Additionally, the +106 mRNA is translationally inactive, even though its RBS remains completely intact. So, how does binding of IstR1 or truncation into +106 render the active +42 translationally inactive? Translational coupling, as well as secondary structure changes were ruled out as mechanisms (Darfeuille et al., 2007). Instead, IstR1-dependent regulation can only be explained by the ribosome standby model: processing of +1 exposes a single-stranded region ~100 nucleotides upstream of the RBS. This single-stranded region (a standby site) is blocked by binding of IstR1, thus inhibiting translation (Darfeuille et al., 2007). Similar to the MS2 coat protein mutations (de Smit and van Duin, 1990, 1993), destabilizing mutations in the RBS of tisB , which made the SD sequence more accessible, were made. These mutations enhanced translation of +1 and +106, but not +42 mRNA, showing that standby sites can relieve the need for destabilizing the RBS (Darfeuille et al., 2007). These experiments exhibit standby binding as a means to overcome inhibitory structure at the RBS.

Ribosomes and mRNA structure RNA is rarely completely structure-free, and standby sites are likely to include some secondary structure, as in the proposed standby sites of bacteriophages’ coat and replicase proteins (de Smit and van Duin, 2003b). Several studies have shown that 30S ribosomes are in fact able to accommodate mRNA structure when binding to single-stranded regions. For example, the 5’-UTR of thrS, encoding ThrRS, contains 4 domains: one containing the SD sequence and initiation codon, one single-stranded region, and two stable hairpin structures (Sacerdot et al., 1998). The hairpin domains are recognized by ThrRS to regulate its own translation (Springer et al., 1985). The single-stranded upstream region and the RBS are brought together by the presence of one of the stable hairpins, and thrS thus contains a so-called split ribosome binding site. The hairpin, however, is not inhibitory for translation (Sacerdot et al., 1998), suggesting that initiating ribosomes are able to accommodate mRNA structure.

Another example is the binding of rpsO mRNA on the ribosome, which was studied both in the presence and absence of protein S15 (Marzi et al., 2007), known to inhibit its own translation (Philippe et al., 1993). Cryo-EM

43

pictures of several complexes were compared in order to deduce the confor-conformation of rpsO mRNA on the ribosome (Marzi et al., 2007). The rpsO-S15 complex binds to the platform of the 30S ribosome. The mRNA adopts a pseudoknot conformation that is stabilized by S15, thus preventing initiation complex formation, as was shown by the cryo-EM structures and toeprinting experiments. In the absence of protein S15, the mRNA is single-stranded and forms the active initiation complex, contrary to the stalled one in the presence of S15 (Marzi et al., 2007). This study is very much in line with the previously-mentioned in vitro study showing that binding of structured mRNAs to 30S occurs in two steps – the docking of structured messenger, followed by unfolding and relocation of the ribosome (Studer and Joseph, 2006).

In addition to the 30S ribosome’s ability to accommodate structure, ribosomal protein S1 potentially plays a major role in standby. Recalling the functions of S1, it is 1) a single-stranded RNA-binding protein that preferably binds to A/U-rich sequences, but also to pseudoknots, 2) an RNA-unwinding protein, 3) essential in E. coli, but not in B. subtilis and other Gram-positives with a low GC-content, and in which translation is dependent on unstructured mRNAs with a strong SD sequence, and 4) necessary for the translation of structured mRNAs. It is therefore likely that S1 has two distinct functions in translation initiation: first the binding of the mRNA, then the unfolding of structures comprising the RBS [suggested in (Duval et al., 2013)].

Ribosome scanning For the standby model to work, the ribosome needs to translocate from the standby to the initiation site. Whereas translating ribosomes consume energy and are capable of melting strong secondary structure, the translocation of standby ribosomes should occur by scanning, and without using energy. There is evidence for the scanning potential of ribosomes. Firstly, the standby mechanism in the TisB/IstR1 TA system suggests ribosome scanning as a way to reach the start site. When a primer was annealed between the proposed standby site and the start site, not interfering with either standby binding or initiation complex formation, toeprint formation was nevertheless inhibited (Darfeuille et al., 2007). Thus, the primer seems to act as a roadblock, preventing standby ribosomes from sliding into place.

Additionally, studies on reinitiation of, for example, the lysis genes of RNA bacteriophages MS2 and fr illustrate ribosome scanning. Here, translation of the coat gene is thought to expose a standby site for the translation of the subsequent lysis gene. Terminated, but not disassociated, ribosomes can reach the lysis start codon through scanning. Inserting alternative start codons resulted in initiation from the start codon located closest to the termination site (Adhin and van Duin, 1990). Reinitiation can occur both upstream and downstream of the stop codon, indicating a

44

bidirectional ribosome scanning potential. However, reinitiation at upstream regions is possible only at shorter distances than downstream reinitiation, probably because scanning ribosomes are met by translating ribosomes (Adhin and van Duin, 1990). More recently, several studies suggested alternative modes of initiation, in which scanning plays a central role (Beck and Janssen, 2017; Yamamoto et al., 2016). Therefore, ribosome scanning may be more common than previously thought.

Standby binding – a universal feature of TIRs? Until now, we have thought about standby sites as they were originally defined – mRNA structures or sequences that are needed to overcome inhibitory structure at the RBS (de Smit and van Duin, 2003a). However, the role of standby sites has been addressed in a general way, without specific structure-dependent limitations around the RBS (Espah Borujeni et al., 2014). Using biophysical modeling, it was shown that the combined length of an accessible surface area in a 5’-UTR – defined as standby sites – is one of the components that add up to predict an mRNA’s translation rate. mRNA structures are partially and selectively unfolded to create a greater single-stranded surface area, and the study suggests a potential for ribosome sliding (Espah Borujeni et al., 2014). The biophysical model thus developed (Salis, 2011) accurately predicts translation rates of a wide variety of mRNAs with differently structured TIRs (Espah Borujeni et al., 2014, 2017).

The two definitions – standby binding as a means to overcome inhibitory structure, and standby binding as a general way to enhance expression – are in no way mutually exclusive. In fact, we could very well be looking at the same mechanism. Standby binding could be the general mechanism for translation initiation, representing the first docking step, followed by the unfolding (Marzi et al., 2007; Studer and Joseph, 2006). This first docking or standby binding step might get revealed only when the RBS is sequestered. When the RBS is unstructured, the RBS itself could be viewed as the standby site, in which case the standby step could not be distinguished from IC formation.

As we have seen above, the contribution of each aspect of TIRs to translation efficiency depends on their combination. For example, translation of structured mRNAs in E. coli depends more strongly on the strength of an SD sequence, and protein S1 is dispensable for translation of unstructured mRNAs. The same could be true for standby sites; standby sites could be a general feature of TIRs – its necessity depending on the presence and/or strength of other factors, mRNA structure in particular.

45

Current investigations

In my work as a PhD student I have focused on a few mechanisms of translation initiation in E. coli, including their control. In the first paper, we investigated the regulation of translation initiation of the transcriptional regulator CsgD. In the second paper, we explored the ribosome standby model, as it was proposed in 2003 (de Smit and van Duin, 2003a). In the third paper, we investigated the sequence preferences of ribosome standby sites, by using flow-seq – flow cytometry-based cell sorting followed by next-generation sequencing. In all these papers, we investigated sequence and structural aspects of translation initiation regions, and tried to understand how these mRNA features determine translation rates.

Regulation of CsgD by sRNAs (Paper I) Biofilms are communities of microorganisms in which the cells stick to each other by forming an extracellular matrix. This sessile lifestyle, as opposed to the motile lifestyle, requires different components, and the switching between them is carefully regulated. CsgA and CsgB form curli: fibres on the cell surface that are necessary for biofilm formation. CsgD is a transcriptional regulator that activates the csgBA operon (Hammar et al., 1995). Cellulose, another component of the extracellular matrix, is also dependent on CsgD (Römling, 2005). Thus, CsgD plays a central role in the transition from motility to biofilm formation.

An in-house target-search algorithm, based on conservation of sRNA-mRNA interactions, predicted the sRNAs OmrA and OmrB (OmpR-regulated sRNAs A and B) to potentially target csgD. Interestingly, the putative sRNA binding site is located far upstream of the translational start site of csgD. We investigated the translation of csgD and its regulation by OmrA and OmrB.

The 5’-UTR of csgD contains two distinct stem-loops. One stem-loop (SL2) contains the SD sequence and start codon of csgD. The other stem-loop (SL1) is located far upstream of the translational start site and contains the binding site for OmrA and OmrB, both of which inhibit csgD translation. We showed that OmrA/B inhibit csgD through a direct antisense mechanism, both in vivo and in a translation-only in vitro system. Regulation depends on the presence of Hfq (Holmqvist et al., 2010).

46

However, inspecting the 5’-UTR structure and the OmrA/B binding site, it remained unclear how the sRNAs inhibit csgD expression. Binding of an sRNA opens up part of SL1, increasing the single-stranded surface area of the RBS, but inhibiting csgD expression. Truncations that remove the OmrA/B binding site and unfold SL1 in a similar way as OmrA/B binding does, instead increases csgD expression (Holmqvist et al., 2010; Fig. 7). Several other factors, like translation of an upstream peptide, have been ruled out as possible mechanisms.

In a follow-up study, my colleagues conducted a functional mapping of the 5’-UTR, in order to investigate the elements involved in translation and regulation of csgD (Holmqvist et al., 2013). They created a large number of mutants, translationally fused to gfp, through error-prone PCR. The mutant library was transformed into E. coli cells, after which the cells were sorted for levels of fluorescence, both in the presence and absence of OmrA. The sorted fractions were analyzed by high-throughput sequencing. As expected, mutations in the OmrA binding site were highly enriched in the pools sorted for high fluorescence. Re-creation of several of these mutants showed that the mutations affected regulation by OmrA, but not csgD translation rate per se. Also expectedly, mutations that stabilized SL2 were enriched in the low fluorescence pools, and vice versa, and mutations in the AUG or SD sequence itself were seen in the expected pools (Holmqvist et al., 2013).

Unfortunately, however, it is still unclear how the csgD 5’-UTR is regulated by OmrA/B. At this moment, we have run out of ideas which we think are worthwhile to investigate.

Ribosome standby sites (Paper II) As discussed extensively in a previous chapter, some bacterial mRNAs have a stably sequestered RBS, and yet, often support efficient translation. In such mRNAs, the requirement for ribosome standby sites was inferred (de Smit and van Duin, 2003a). Based on the standby model, our data (Darfeuille et al., 2007), and corroborating evidence from an in vitro study (Studer and Joseph, 2006), we speculated that ribosome standby may be a solution employed by mRNAs with similar structural features. The main part of my PhD studies has been to investigate what the structure and sequence features of an effective ribosome standby site are.

We designed a reporter system based on the inhibitory stem-loop of the bacteriophage MS2 coat protein. The relatively unstructured flanking regions that were proposed to function as standby sites were removed, resulting in decreased translation rates (Sterk et al., 2018). Then we designed shorter and simpler potential standby sites. Both in vivo and in vitro results show that relatively short, single-stranded regions near a structurally sequestered start site can profoundly increase translation rate. Our results suggest that both the

47

length and the sequence of those single-stranded regions are important for standby site efficiency. We also showed that the binding of an antisense oligo to a standby region – which thereby renders this site double-stranded – almost completely blocks both translation and initiation complex formation in vitro. Additionally, destabilizing mutations in the RBS stem-loop increase translation rates, and decrease the need for additional standby potential (Sterk et al., 2018). This work serves as a proof-of-principle study of the ribosome standby model. Not only did we show that unstructured 5’-tails can act as standby sites to overcome inhibitory structure at the RBS, we also found that highly efficient standby sites can be surprisingly short.

Remaining questions about standby binding involve the mRNA features of standby sites, versus, more generally, the nature of standby binding. To investigate the mRNA sequence and structural features of standby sites, we did a follow-up study, which will be discussed below. Remaining questions about the nature of standby binding include the ribosomal components that are required for standby binding. As discussed before, protein S1 is a major candidate, and its requirement for standby activity is currently being investigated in our lab. Another aspect is the nature of the ribosome binding to the standby site. The 30S bound to a standby site has not been shown biochemically, despite strong and unambiguous circumstantial evidence of standby binding. We have tried to do this by a “toeprint” of 30S subunits in the absence of fMet-tRNAfMet, while trying to use conditions that should allow a weakly bound 30S to remain attached to the mRNA. Additionally, we have cooperated with a group with expertise in high-resolution imaging, and tried to visualize 30S binding to mRNAs with and without standby sites. Unfortunately, to no avail. We are still working on it, and hope to gain more insight into the nature of standby binding.

Sequence preferences of standby sites (Paper III) To obtain more systematic insights into the sequence-dependency of standby sites, we used an unbiased approach in which random stretches of 12 nucleotides are inserted upstream of the translationally inert ribosome binding site which drives the translation of a GFP reporter. Additionally, the random stretch of 12 nucleotides was inserted upstream of a destabilized RBS, which was used in the previous study. Both libraries of cells containing the reporter plasmids were sorted according to levels of fluorescence, and their sequences were analyzed by high-throughput sequencing of the sorted pools.

This global analysis showed that efficient standby sites have a lower GC-content than less efficient ones. This is consistent with the lower secondary structure propensity found around many start sites. The analysis also showed that those nucleotides in the 3’-most position of the standby region that can

48

base-pair with the nucleotide immediately 3’ of the inhibitory stem-loop, were counter-selected in the fractions of cells that displayed efficient translation. We plan to perform mutagenesis and structure probing on selected insert-containing mRNAs to verify that translation is indeed inhibited by formation of a longer inhibitory stem.

Additionally, we inspected the frequency of SD sequences in the standby regions from the different sorted fractions. Slightly surprisingly, SD sequences were less frequently present in the high fluorescence fractions. This might imply that the SD-antiSD interaction is not beneficial for standby binding. Alternatively, the position of the SD sequence in the standby region may play a minor role in translation efficiency. We plan to verify these findings by performing mutational studies. Finally, the nucleotides near the 3’-border of the standby region seem to be more influential for translation efficiency than the nucleotides closer to the 5’-end of the region. However, so far we did not find any dinucleotide combinations that were specifically overrepresented in any of the fractions. In future experiments, we want to analyze the presence of nucleotide motifs in efficient and inefficient standby sites.

49

Acknowledgements

What applies to most projects and most achievements, certainly applies here as well: I could not have written this thesis without the help and support of the great many wonderful people around me.

First of all, my supervisors: thank you, Gerhart, for having me do my PhD in your group. Thank you for your continued support throughout all the years. Thank you for your interest in science and for always taking the time to discuss. Thank you for giving me the freedom to find my own ways. I am grateful for everything I have learned these years! To Fredrik and Staffan: thank you both for your support and care the last two years. Thank you for your optimism just when I needed it most.

My colleagues: thank you for the good and fruitful discussions, for all the help in the lab and with science. Thank you for helping create a good atmosphere at work. Thank you for all the support. Thank you all for the good times in the lab, lunch room, and coffee corner. Thank you for the fun inside and outside of BMC. Many of you have become much more than colleagues.

My friends: thank you for listening to me, for talking with me, for eating and drinking together. Thank you for singing with me, for playing basketball together, for hiking together. Thanks for pulling me out of my head and into life. Thank you for being my friends.

My family, de Sterkjes: thank you for all the fantastic time spent together. Thank you for all the evenings of talking, and for supporting me during all those years that I lived far away. Thank you for going through hard times together and ending up even closer than before. I love you!

My parents: thank you for encouraging me and for always supporting me in my decisions. Papa, thank you for talking about anything, everything. Mama, I know you would have loved to see me now. Thank you for believing in me.

My own family: Sjoerd, thank you for coming to Sweden with me. Thank you for our children. Thank you for being a great father for them. Owen and

50

Julia, I am so happy that you came into my life! I can’t describe how much I love you!

51

References

Adhin, M.R., and van Duin, J. (1990). Scanning model for translational reinitiation in eubacteria. J. Mol. Biol. 213, 811–818.

Andersson, S.G., and Kurland, C.G. (1990). Codon preferences in free-living microorganisms. Microbiol. Rev. 54, 198–210.

Baker, C.S., Morozov, I., Suzuki, K., Romeo, T., and Babitzke, P. (2002). CsrA regulates glycogen biosynthesis by preventing translation of glgC in Escherichia coli. Mol. Microbiol. 44, 1599–1610.

Bear, D.G., Ng, R., Van Derveer, D., Johnson, N.P., Thomas, G., Schleich, T., and Noller, H.F. (1976). Alteration of polynucleotide secondary structure by ribosomal protein S1. Proc. Natl. Acad. Sci. U. S. A. 73, 1824–1828.

Beck, H.J., and Janssen, G.R. (2017). Novel Translation Initiation Regulation Mechanism in Escherichia coli ptrB Mediated by a 5’-Terminal AUG. J. Bacteriol. 199.

Beck, H.J., Fleming, I.M.C., and Janssen, G.R. (2016). 5’-Terminal AUGs in Escherichia coli mRNAs with Shine-Dalgarno Sequences: Identification and Analysis of Their Roles in Non-Canonical Translation Initiation. PloS One 11, e0160144.

Bentele, K., Saffert, P., Rauscher, R., Ignatova, Z., and Blüthgen, N. (2013). Efficient translation initiation dictates codon usage at gene start. Mol. Syst. Biol. 9, 675.

Berghoff, B.A., Hoekzema, M., Aulbach, L., and Wagner, E.G.H. (2017). Two regulatory RNA elements affect TisB-dependent depolarization and persister formation. Mol. Microbiol. 103, 1020–1033.

Blattner, F.R., Plunkett, G., Bloch, C.A., Perna, N.T., Burland, V., Riley, M., Collado-Vides, J., Glasner, J.D., Rode, C.K., Mayhew, G.F., et al. (1997). The complete genome sequence of Escherichia coli K-12. Science 277, 1453–1462.

Blomberg, P., Nordström, K., and Wagner, E.G. (1992). Replication control of plasmid R1: RepA synthesis is regulated by CopA RNA through inhibition of leader peptide translation. EMBO J. 11, 2675–2683.

Boël, G., Letso, R., Neely, H., Price, W.N., Wong, K.-H., Su, M., Luff, J., Valecha, M., Everett, J.K., Acton, T.B., et al. (2016). Codon influence on protein expression in E. coli correlates with mRNA levels. Nature 529, 358–363.

Boni, I.V., Isaeva, D.M., Musychenko, M.L., and Tzareva, N.V. (1991). Ribosome-messenger recognition: mRNA target sites for ribosomal protein S1. Nucleic Acids Res. 19, 155–162.

Bouvier, M., Sharma, C.M., Mika, F., Nierhaus, K.H., and Vogel, J. (2008). Small RNA binding to 5’ mRNA coding region inhibits translational initiation. Mol. Cell 32, 827–837.

Burkhardt, D.H., Rouskin, S., Zhang, Y., Li, G.-W., Weissman, J.S., and Gross, C.A. (2017). Operon mRNAs are organized into ORF-centric structures that predict translation efficiency. ELife 6.

52

Butler, J.S., Springer, M., Dondon, J., Graffe, M., and Grunberg-Manago, M. (1986). Escherichia coli protein synthesis initiation factor IF3 controls its own gene expression at the translational level in vivo. J. Mol. Biol. 192, 767–780.

Buttgereit, F., and Brand, M.D. (1995). A hierarchy of ATP-consuming processes in mammalian cells. Biochem. J. 312 ( Pt 1), 163–167.

Bycroft, M., Hubbard, T.J., Proctor, M., Freund, S.M., and Murzin, A.G. (1997). The solution structure of the S1 RNA binding domain: a member of an ancient nucleic acid-binding fold. Cell 88, 235–242.

Byrgazov, K., Vesper, O., and Moll, I. (2013). Ribosome heterogeneity: another level of complexity in bacterial translation regulation. Curr. Opin. Microbiol. 16, 133–139.

Calogero, R.A., Pon, C.L., Canonaco, M.A., and Gualerzi, C.O. (1988). Selection of the mRNA translation initiation region by Escherichia coli ribosomes. Proc. Natl. Acad. Sci. U. S. A. 85, 6427–6431.

Chen, H., Bjerknes, M., Kumar, R., and Jay, E. (1994). Determination of the optimal aligned spacing between the Shine-Dalgarno sequence and the translation initiation codon of Escherichia coli mRNAs. Nucleic Acids Res. 22, 4953–4957.

Darfeuille, F., Unoson, C., Vogel, J., and Wagner, E.G.H. (2007). An antisense RNA inhibits translation by competing with standby ribosomes. Mol. Cell 26, 381–392.

Del Campo, C., Bartholomäus, A., Fedyunin, I., and Ignatova, Z. (2015). Secondary Structure across the Bacterial Transcriptome Reveals Versatile Roles in mRNA Regulation and Function. PLoS Genet. 11, e1005613.

Desnoyers, G., and Massé, E. (2012). Noncanonical repression of translation initiation through small RNA recruitment of the RNA chaperone Hfq. Genes Dev. 26, 726–739.

Draper, D.E., and von Hippel, P.H. (1978). Nucleic acid binding properties of Escherichia coli ribosomal protein S1. I. Structure and interactions of binding site I. J. Mol. Biol. 122, 321–338.

Dubey, A.K., Baker, C.S., Suzuki, K., Jones, A.D., Pandit, P., Romeo, T., and Babitzke, P. (2003). CsrA regulates translation of the Escherichia coli carbon starvation gene, cstA, by blocking ribosome access to the cstA transcript. J. Bacteriol. 185, 4450–4460.

Duval, M., Korepanov, A., Fuchsbauer, O., Fechter, P., Haller, A., Fabbretti, A., Choulier, L., Micura, R., Klaholz, B.P., Romby, P., et al. (2013). Escherichia coli ribosomal protein S1 unfolds structured mRNAs onto the ribosome for active translation initiation. PLoS Biol. 11, e1001731.

Ehresmann, C., Philippe, C., Westhof, E., Bénard, L., Portier, C., and Ehresmann, B. (1995). A pseudoknot is required for efficient translational initiation and regulation of the Escherichia coli rpsO gene coding for ribosomal protein S15. Biochem. Cell Biol. Biochim. Biol. Cell. 73, 1131–1140.

Ehresmann, C., Ehresmann, B., Ennifar, E., Dumas, P., Garber, M., Mathy, N., Nikulin, A., Portier, C., Patel, D., and Serganov, A. (2004). Molecular mimicry in translational regulation: the case of ribosomal protein S15. RNA Biol. 1, 66–73.

Espah Borujeni, A., Channarasappa, A.S., and Salis, H.M. (2014). Translation rate is controlled by coupled trade-offs between site accessibility, selective RNA unfolding and sliding at upstream standby sites. Nucleic Acids Res. 42, 2646–2659.

Espah Borujeni, A., Cetnar, D., Farasat, I., Smith, A., Lundgren, N., and Salis, H.M. (2017). Precise quantification of translation inhibition by mRNA structures that overlap with the ribosomal footprint in N-terminal coding sequences. Nucleic Acids Res. 45, 5437–5448.

53

Geissmann, T.A., and Touati, D. (2004). Hfq, a new chaperoning role: binding to messenger RNA determines access for small RNA regulator. EMBO J. 23, 396–405.

Goodman, D.B., Church, G.M., and Kosuri, S. (2013). Causes and effects of N-terminal codon bias in bacterial genes. Science 342, 475–479.

Graffe, M., Dondon, J., Caillet, J., Romby, P., Ehresmann, C., Ehresmann, B., and Springer, M. (1992). The specificity of translational control switched with transfer RNA identity rules. Science 255, 994–996.

Gu, W., Zhou, T., and Wilke, C.O. (2010). A universal trend of reduced mRNA stability near the translation-initiation site in prokaryotes and eukaryotes. PLoS Comput. Biol. 6, e1000664.

Hajnsdorf, E., and Boni, I.V. (2012). Multiple activities of RNA-binding proteins S1 and Hfq. Biochimie 94, 1544–1553.

Hammar, M., Arnqvist, A., Bian, Z., Olsén, A., and Normark, S. (1995). Expression of two csg operons is required for production of fibronectin- and congo red-binding curli polymers in Escherichia coli K-12. Mol. Microbiol. 18, 661–670.

Hartz, D., McPheeters, D.S., and Gold, L. (1989). Selection of the initiator tRNA by Escherichia coli initiation factors. Genes Dev. 3, 1899–1912.

Holmqvist, E., Reimegård, J., Sterk, M., Grantcharova, N., Römling, U., and Wagner, E.G.H. (2010). Two antisense RNAs target the transcriptional regulator CsgD to inhibit curli synthesis. EMBO J. 29, 1840–1850.

Holmqvist, E., Reimegård, J., and Wagner, E.G.H. (2013). Massive functional mapping of a 5’-UTR by saturation mutagenesis, phenotypic sorting and deep sequencing. Nucleic Acids Res. 41, e122.

Holmqvist, E., Wright, P.R., Li, L., Bischler, T., Barquist, L., Reinhardt, R., Backofen, R., and Vogel, J. (2016). Global RNA recognition patterns of post‐transcriptional regulators Hfq and CsrA revealed by UV crosslinking in vivo. EMBO J. 35, 991–1011.

Hook-Barnard, I.G., Brickman, T.J., and McIntosh, M.A. (2007). Identification of an AU-rich translational enhancer within the Escherichia coli fepB leader RNA. J. Bacteriol. 189, 4028–4037.

Hüttenhofer, A., and Noller, H.F. (1994). Footprinting mRNA-ribosome complexes with chemical probes. EMBO J. 13, 3892–3901.

Ingolia, N.T., Ghaemmaghami, S., Newman, J.R.S., and Weissman, J.S. (2009). Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324, 218–223.

Jagodnik, J., Chiaruttini, C., and Guillier, M. (2017). Stem-Loop Structures within mRNA Coding Sequences Activate Translation Initiation and Mediate Control by Small Regulatory RNAs. Mol. Cell 68, 158–170.e3.

Kaberdina, A.C., Szaflarski, W., Nierhaus, K.H., and Moll, I. (2009). An unexpected type of ribosomes induced by kasugamycin: a look into ancestral times of protein synthesis? Mol. Cell 33, 227–236.

Kennell, D., and Riezman, H. (1977). Transcription and translation initiation frequencies of the Escherichia coli lac operon. J. Mol. Biol. 114, 1–21.

Kobayashi, K., Ehrlich, S.D., Albertini, A., Amati, G., Andersen, K.K., Arnaud, M., Asai, K., Ashikaga, S., Aymerich, S., Bessieres, P., et al. (2003). Essential Bacillus subtilis genes. Proc. Natl. Acad. Sci. U. S. A. 100, 4678–4683.

Kolb, A., Hermoso, J.M., Thomas, J.O., and Szer, W. (1977). Nucleic acid helix-unwinding properties of ribosomal protein S1 and the role of S1 in mRNA binding to ribosomes. Proc. Natl. Acad. Sci. U. S. A. 74, 2379–2383.

54

Komarova, A.V., Tchufistova, L.S., Supina, E.V., and Boni, I.V. (2002). Protein S1 counteracts the inhibitory effect of the extended Shine-Dalgarno sequence on translation. RNA N. Y. N 8, 1137–1147.

Komarova, A.V., Tchufistova, L.S., Dreyfus, M., and Boni, I.V. (2005). AU-rich sequences within 5’ untranslated leaders enhance translation and stabilize mRNA in Escherichia coli. J. Bacteriol. 187, 1344–1349.

Kozak, M. (2005). Regulation of translation via mRNA structure in prokaryotes and eukaryotes. Gene 361, 13–37.

Kudla, G., Murray, A.W., Tollervey, D., and Plotkin, J.B. (2009). Coding-sequence determinants of gene expression in Escherichia coli. Science 324, 255–258.

La Teana, A., Pon, C.L., and Gualerzi, C.O. (1993). Translation of mRNAs with degenerate initiation triplet AUU displays high initiation factor 2 dependence and is subject to initiation factor 3 repression. Proc. Natl. Acad. Sci. U. S. A. 90, 4161–4165.

La Teana, A., Brandi, A., O’Connor, M., Freddi, S., and Pon, C.L. (2000). Translation during cold adaptation does not involve mRNA-rRNA base pairing through the downstream box. RNA N. Y. N 6, 1393–1402.

Li, G.-W., Oh, E., and Weissman, J.S. (2012). The anti-Shine-Dalgarno sequence drives translational pausing and codon choice in bacteria. Nature 484, 538–541.

Li, G.-W., Burkhardt, D., Gross, C., and Weissman, J.S. (2014). Quantifying absolute protein synthesis rates reveals principles underlying allocation of cellular resources. Cell 157, 624–635.

Ma, J., Campbell, A., and Karlin, S. (2002). Correlations between Shine-Dalgarno sequences and gene features such as predicted expression levels and operon structures. J. Bacteriol. 184, 5733–5745.

Majdalani, N., Cunning, C., Sledjeski, D., Elliott, T., and Gottesman, S. (1998). DsrA RNA regulates translation of RpoS message by an anti-antisense mechanism, independent of its action as an antisilencer of transcription. Proc. Natl. Acad. Sci. U. S. A. 95, 12462–12467.

Malmgren, C., Engdahl, H.M., Romby, P., and Wagner, E.G. (1996). An antisense/target RNA duplex or a strong intramolecular RNA structure 5’ of a translation initiation signal blocks ribosome binding: the case of plasmid R1. RNA N. Y. N 2, 1022–1032.

Martin-Farmer, J., and Janssen, G.R. (1999). A downstream CA repeat sequence increases translation from leadered and unleadered mRNA in Escherichia coli. Mol. Microbiol. 31, 1025–1038.

Marzi, S., Myasnikov, A.G., Serganov, A., Ehresmann, C., Romby, P., Yusupov, M., and Klaholz, B.P. (2007). Structured mRNAs regulate translation initiation by binding to the platform of the ribosome. Cell 130, 1019–1031.

Massé, E., and Gottesman, S. (2002). A small RNA regulates the expression of genes involved in iron metabolism in Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 99, 4620–4625.

Mathy, N., Pellegrini, O., Serganov, A., Patel, D.J., Ehresmann, C., and Portier, C. (2004). Specific recognition of rpsO mRNA and 16S rRNA by Escherichia coli ribosomal protein S15 relies on both mimicry and site differentiation. Mol. Microbiol. 52, 661–675.

Mitarai, N., Sneppen, K., and Pedersen, S. (2008). Ribosome collisions and translation efficiency: optimization by codon usage and mRNA destabilization. J. Mol. Biol. 382, 236–245.

55

Moine, H., Romby, P., Springer, M., Grunberg-Manago, M., Ebel, J.P., Ehresmann, B., and Ehresmann, C. (1990). Escherichia coli threonyl-tRNA synthetase and tRNA(Thr) modulate the binding of the ribosome to the translational initiation site of the thrS mRNA. J. Mol. Biol. 216, 299–310.

Moll, I., Huber, M., Grill, S., Sairafi, P., Mueller, F., Brimacombe, R., Londei, P., and Blasi, U. (2001). Evidence against an Interaction between the mRNA Downstream Box and 16S rRNA in Translation Initiation. J. Bacteriol. 183, 3499–3505.

Moll, I., Grill, S., Gründling, A., and Bläsi, U. (2002). Effects of ribosomal proteins S1, S2 and the DeaD/CsdA DEAD-box helicase on translation of leaderless and canonical mRNAs in Escherichia coli. Mol. Microbiol. 44, 1387–1396.

Moll, I., Hirokawa, G., Kiel, M.C., Kaji, A., and Bläsi, U. (2004). Translation initiation with 70S ribosomes: an alternative pathway for leaderless mRNAs. Nucleic Acids Res. 32, 3354–3363.

Morita, M.T., Tanaka, Y., Kodama, T.S., Kyogoku, Y., Yanagi, H., and Yura, T. (1999). Translational induction of heat shock transcription factor sigma32: evidence for a built-in RNA thermosensor. Genes Dev. 13, 655–665.

Nordström, K., Wagner, E.G., Persson, C., Blomberg, P., and Ohman, M. (1988). Translational control by antisense RNA in control of plasmid replication. Gene 72, 237–240.

Osterman, I.A., Evfratov, S.A., Sergiev, P.V., and Dontsova, O.A. (2013). Comparison of mRNA features affecting translation initiation and reinitiation. Nucleic Acids Res. 41, 474–486.

Pechmann, S., and Frydman, J. (2013). Evolutionary conservation of codon optimality reveals hidden signatures of cotranslational folding. Nat. Struct. Mol. Biol. 20, 237–243.

Philippe, C., Eyermann, F., Bénard, L., Portier, C., Ehresmann, B., and Ehresmann, C. (1993). Ribosomal protein S15 from Escherichia coli modulates its own translation by trapping the ribosome on the mRNA initiation loading site. Proc. Natl. Acad. Sci. U. S. A. 90, 4394–4398.

Philippe, C., Bénard, L., Eyermann, F., Cachia, C., Kirillov, S.V., Portier, C., Ehresmann, B., and Ehresmann, C. (1994). Structural elements of rps0 mRNA involved in the modulation of translational initiation and regulation of E. coli ribosomal protein S15. Nucleic Acids Res. 22, 2538–2546.

Qing, G., Xia, B., and Inouye, M. (2003). Enhancement of translation initiation by A/T-rich sequences downstream of the initiation codon in Escherichia coli. J. Mol. Microbiol. Biotechnol. 6, 133–144.

Qu, X., Lancaster, L., Noller, H.F., Bustamante, C., and Tinoco, I. (2012). Ribosomal protein S1 unwinds double-stranded RNA in multiple steps. Proc. Natl. Acad. Sci. U. S. A. 109, 14458–14463.

Quax, T.E.F., Wolf, Y.I., Koehorst, J.J., Wurtzel, O., van der Oost, R., Ran, W., Blombach, F., Makarova, K.S., Brouns, S.J.J., Forster, A.C., et al. (2013). Differential translation tunes uneven production of operon-encoded proteins. Cell Rep. 4, 938–944.

Quax, T.E.F., Claassens, N.J., Söll, D., and van der Oost, J. (2015). Codon Bias as a Means to Fine-Tune Gene Expression. Mol. Cell 59, 149–161.

Rajkowitsch, L., and Schroeder, R. (2007). Dissecting RNA chaperone activity. RNA N. Y. N 13, 2053–2060.

dos Reis, M., Savva, R., and Wernisch, L. (2004). Solving the riddle of codon usage preferences: a test for translational selection. Nucleic Acids Res. 32, 5036–5044.

56

Ringquist, S., Shinedling, S., Barrick, D., Green, L., Binkley, J., Stormo, G.D., and Gold, L. (1992). Translation initiation in Escherichia coli: sequences within the ribosome-binding site. Mol. Microbiol. 6, 1219–1229.

Ringquist, S., Jones, T., Snyder, E.E., Gibson, T., Boni, I., and Gold, L. (1995). High-affinity RNA ligands to Escherichia coli ribosomes and ribosomal protein S1: comparison of natural and unnatural binding sites. Biochemistry (Mosc.) 34, 3640–3648.

Romby, P., Brunel, C., Caillet, J., Springer, M., Grunberg-Manago, M., Westhof, E., Ehresmann, C., and Ehresmann, B. (1992). Molecular mimicry in translational control of E. coli threonyl-tRNA synthetase gene. Competitive inhibition in tRNA aminoacylation and operator-repressor recognition switch using tRNA identity rules. Nucleic Acids Res. 20, 5633–5640.

Römling, U. (2005). Characterization of the rdar morphotype, a multicellular behaviour in Enterobacteriaceae. Cell. Mol. Life Sci. CMLS 62, 1234–1246.

Russell, J.B., and Cook, G.M. (1995). Energetics of bacterial growth: balance of anabolic and catabolic reactions. Microbiol. Rev. 59, 48–62.

Sacerdot, C., Fayat, G., Dessen, P., Springer, M., Plumbridge, J.A., Grunberg-Manago, M., and Blanquet, S. (1982). Sequence of a 1.26-kb DNA fragment containing the structural gene for E.coli initiation factor IF3: presence of an AUU initiator codon. EMBO J. 1, 311–315.

Sacerdot, C., Chiaruttini, C., Engst, K., Graffe, M., Milet, M., Mathy, N., Dondon, J., and Springer, M. (1996). The role of the AUU initiation codon in the negative feedback regulation of the gene for translation initiation factor IF3 in Escherichia coli. Mol. Microbiol. 21, 331–346.

Sacerdot, C., Caillet, J., Graffe, M., Eyermann, F., Ehresmann, B., Ehresmann, C., Springer, M., and Romby, P. (1998). The Escherichia coli threonyl-tRNA synthetase gene contains a split ribosomal binding site interrupted by a hairpin structure that is essential for autoregulation. Mol. Microbiol. 29, 1077–1090.

Salah, P., Bisaglia, M., Aliprandi, P., Uzan, M., Sizun, C., and Bontems, F. (2009). Probing the relationship between Gram-negative and Gram-positive S1 proteins by sequence analysis. Nucleic Acids Res. 37, 5578–5588.

Salis, H.M. (2011). The ribosome binding site calculator. Methods Enzymol. 498, 19–42.

Scharff, L.B., Childs, L., Walther, D., and Bock, R. (2011). Local absence of secondary structure permits translation of mRNAs that lack ribosome-binding sites. PLoS Genet. 7, e1002155.

Schlax, P.J., and Worhunsky, D.J. (2003). Translational repression mechanisms in prokaryotes. Mol. Microbiol. 48, 1157–1169.

Sengupta, J., Agrawal, R.K., and Frank, J. (2001). Visualization of protein S1 within the 30S ribosomal subunit and its interaction with messenger RNA. Proc. Natl. Acad. Sci. U. S. A. 98, 11991–11996.

Sharma, C.M., Darfeuille, F., Plantinga, T.H., and Vogel, J. (2007). A small RNA regulates multiple ABC transporter mRNAs by targeting C/A-rich elements inside and upstream of ribosome-binding sites. Genes Dev. 21, 2804–2817.

Sharma, C.M., Papenfort, K., Pernitzsch, S.R., Mollenkopf, H.-J., Hinton, J.C.D., and Vogel, J. (2011). Pervasive post-transcriptional control of genes involved in amino acid metabolism by the Hfq-dependent GcvB small RNA. Mol. Microbiol. 81, 1144–1165.

Sharp, P.M., and Li, W.H. (1987). The codon Adaptation Index--a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15, 1281–1295.

57

Shine, J., and Dalgarno, L. (1974). The 3’-terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. Proc. Natl. Acad. Sci. U. S. A. 71, 1342–1346.

Skouv, J., Schnier, J., Rasmussen, M.D., Subramanian, A.R., and Pedersen, S. (1990). Ribosomal protein S1 of Escherichia coli is the effector for the regulation of its own synthesis. J. Biol. Chem. 265, 17044–17049.

de Smit, M.H., and van Duin, J. (1990). Secondary structure of the ribosome binding site determines translational efficiency: a quantitative analysis. Proc. Natl. Acad. Sci. U. S. A. 87, 7668–7672.

de Smit, M.H., and van Duin, J. (1993). Translational initiation at the coat-protein gene of phage MS2: native upstream RNA relieves inhibition by local secondary structure. Mol. Microbiol. 9, 1079–1088.

de Smit, M.H., and van Duin, J. (1994a). Translational initiation on structured messengers. Another role for the Shine-Dalgarno interaction. J. Mol. Biol. 235, 173–184.

de Smit, M.H., and van Duin, J. (1994b). Control of translation by mRNA secondary structure in Escherichia coli. A quantitative analysis of literature data. J. Mol. Biol. 244, 144–150.

de Smit, M.H., and van Duin, J. (2003a). Translational standby sites: how ribosomes may deal with the rapid folding kinetics of mRNA. J. Mol. Biol. 331, 737–743.

de Smit, M.H., and van Duin, J. (2003b). A Prelude to Translational (Re)Initiation. In Translation Mechanisms, (Landes Bioscience, Georgetown), pp. 298–321.

Sørensen, M.A., and Pedersen, S. (1991). Absolute in vivo translation rates of individual codons in Escherichia coli. The two glutamic acid codons GAA and GAG are translated with a threefold difference in rate. J. Mol. Biol. 222, 265–280.

Sørensen, M.A., Fricke, J., and Pedersen, S. (1998). Ribosomal protein S1 is required for translation of most, if not all, natural mRNAs in Escherichia coli in vivo. J. Mol. Biol. 280, 561–569.

Sorokin, A., Serror, P., Pujic, P., Azevedo, V., and Ehrlich, S.D. (1995). The Bacillus subtilis chromosome region encoding homologues of the Escherichia coli mssA and rpsA gene products. Microbiol. Read. Engl. 141 ( Pt 2), 311–319.

Sprengart, M.L., Fatscher, H.P., and Fuchs, E. (1990). The initiation of translation in E.coli : apparent base pairing between the 16srRNA and downstream sequences of the mRNA. Nucleic Acids Res. 18, 1719–1723.

Springer, M., Plumbridge, J.A., Butler, J.S., Graffe, M., Dondon, J., Mayaux, J.F., Fayat, G., Lestienne, P., Blanquet, S., and Grunberg-Manago, M. (1985). Autogenous control of Escherichia coli threonyl-tRNA synthetase expression in vivo. J. Mol. Biol. 185, 93–104.

Steitz, J.A., and Jakes, K. (1975). How ribosomes select initiator regions in mRNA: base pair formation between the 3’ terminus of 16S rRNA and the mRNA during initiation of protein synthesis in Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 72, 4734–4738.

Sterk, M., Romilly, C., and Wagner, E.G.H. (2018). Unstructured 5’-tails act through ribosome standby to override inhibitory structure at ribosome binding sites. Nucleic Acids Res.

Storz, G., Vogel, J., and Wassarman, K.M. (2011). Regulation by small RNAs in bacteria: expanding frontiers. Mol. Cell 43, 880–891.

Studer, S.M., and Joseph, S. (2006). Unfolding of mRNA secondary structure by the bacterial translation initiation complex. Mol. Cell 22, 105–115.

Subramanian, A.R. (1983). Structure and functions of ribosomal protein S1. Prog. Nucleic Acid Res. Mol. Biol. 28, 101–142.

58

Subramanian, A.R., and van Duin, J. (1977). Exchange of individual ribosomal proteins between ribosomes as studied by heavy isotope-transfer experiments. Mol. Gen. Genet. MGG 158, 1–9.

Supek, F., and Šmuc, T. (2010). On relevance of codon usage to expression of syn-thetic and natural genes in Escherichia coli. Genetics 185, 1129–1134.

Tchufistova, L.S., Komarova, A.V., and Boni, I.V. (2003). A key role for the mRNA leader structure in translational control of ribosomal protein S1 synthesis in gamma-proteobacteria. Nucleic Acids Res. 31, 6996–7002.

Tedin, K., Resch, A., and Bläsi, U. (1997). Requirements for ribosomal protein S1 for translation initiation of mRNAs with and without a 5’ leader sequence. Mol. Microbiol. 25, 189–199.

Thomas, J.O., Kolb, A., and Szer, W. (1978). Structure of single-stranded nucleic acids in the presence of ribosomal protein S1. J. Mol. Biol. 123, 163–176.

Tucker, B.J., and Breaker, R.R. (2005). Riboswitches as versatile gene control ele-ments. Curr. Opin. Struct. Biol. 15, 342–348.

Tuller, T., and Zur, H. (2015). Multiple roles of the coding sequence 5’ end in gene expression regulation. Nucleic Acids Res. 43, 13–28.

Tuller, T., Carmi, A., Vestsigian, K., Navon, S., Dorfan, Y., Zaborske, J., Pan, T., Dahan, O., Furman, I., and Pilpel, Y. (2010). An evolutionarily conserved mecha-nism for controlling the efficiency of protein translation. Cell 141, 344–354.

Udagawa, T., Shimizu, Y., and Ueda, T. (2004). Evidence for the translation initia-tion of leaderless mRNAs by the intact 70 S ribosome without its dissociation into subunits in eubacteria. J. Biol. Chem. 279, 8539–8546.

Unoson, C., and Wagner, E.G.H. (2008). A small SOS-induced toxin is targeted against the inner membrane in Escherichia coli. Mol. Microbiol. 70, 258–270.

Urban, J.H., and Vogel, J. (2007). Translational control and target recognition by Escherichia coli small RNAs in vivo. Nucleic Acids Res. 35, 1018–1037.

Vesper, O., Amitai, S., Belitsky, M., Byrgazov, K., Kaberdina, A.C., Engelberg-Kulka, H., and Moll, I. (2011). Selective translation of leaderless mRNAs by spe-cialized ribosomes generated by MazF in Escherichia coli. Cell 147, 147–157.

Vimberg, V., Tats, A., Remm, M., and Tenson, T. (2007). Translation initiation region sequence preferences in Escherichia coli. BMC Mol. Biol. 8, 100.

Vogel, J., and Luisi, B.F. (2011). Hfq and its constellation of RNA. Nat. Rev. Mi-crobiol. 9, 578–589.

Vogel, J., Argaman, L., Wagner, E.G.H., and Altuvia, S. (2004). The small RNA IstR inhibits synthesis of an SOS-induced toxic peptide. Curr. Biol. CB 14, 2271–2276.

Wagner, E.G.H., and Romby, P. (2015). Small RNAs in bacteria and archaea: who they are, what they do, and how they do it. Adv. Genet. 90, 133–208.

Yamamoto, H., Wittek, D., Gupta, R., Qin, B., Ueda, T., Krause, R., Yamamoto, K., Albrecht, R., Pech, M., and Nierhaus, K.H. (2016). 70S-scanning initiation is a novel and frequent initiation mode of ribosomal translation in bacteria. Proc. Natl. Acad. Sci. U. S. A. 113, E1180-1189.

Yang, Q., Figueroa-Bossi, N., and Bossi, L. (2014). Translation enhancing ACA motifs and their silencing by a bacterial small regulatory RNA. PLoS Genet. 10, e1004026.

Yusupova, G.Z., Yusupov, M.M., Cate, J.H., and Noller, H.F. (2001). The path of messenger RNA through the ribosome. Cell 106, 233–241.

Zhang, G., Hubalewska, M., and Ignatova, Z. (2009). Transient ribosomal attenua-tion coordinates protein synthesis and co-translational folding. Nat. Struct. Mol. Biol. 16, 274–280.

Acta Universitatis UpsaliensisDigital Comprehensive Summaries of Uppsala Dissertationsfrom the Faculty of Science and Technology 1636

Editor: The Dean of the Faculty of Science and Technology

A doctoral dissertation from the Faculty of Science andTechnology, Uppsala University, is usually a summary of anumber of papers. A few copies of the complete dissertationare kept at major Swedish research libraries, while thesummary alone is distributed internationally throughthe series Digital Comprehensive Summaries of UppsalaDissertations from the Faculty of Science and Technology.(Prior to January, 2005, the series was published under thetitle “Comprehensive Summaries of Uppsala Dissertationsfrom the Faculty of Science and Technology”.)

Distribution: publications.uu.seurn:nbn:se:uu:diva-343082

ACTAUNIVERSITATIS

UPSALIENSISUPPSALA

2018