
2. RNA and transcription - ualberta.cacampbell/resources/Bioanalytical...



Biomolecular chemistry

2. RNA and transcription

Primary Source Material
• Biochemistry: Berg, Jeremy M.; Tymoczko, John L.; and Stryer, Lubert (courtesy of the NCBI bookshelf)
• Molecular Cell Biology: Lodish, Harvey; Berk, Arnold; Zipursky, S. Lawrence; Matsudaira, Paul; Baltimore, David; Darnell, James E. (courtesy of the NCBI bookshelf)
• Many figures and the descriptions for the figures are from the educational resources provided at the Protein Data Bank (http://www.pdb.org/)
• Most of these figures and accompanying legends have been written by David S. Goodsell of the Scripps Research Institute and are being used with permission. I highly recommend browsing the Molecule of the Month series at the PDB (http://www.pdb.org/pdb/101/motm_archive.do)

Some objectives for this section:
• you will appreciate the many roles of RNA
• you will understand the mechanism of RNA polymerase
• you will know some differences between prokaryotic and eukaryotic RNA processing
• you will know what exons and introns are
• you will appreciate the structures that RNA can adopt
• you will know what reverse transcriptase does

The Central Dogma

U.S. Department of Energy Human Genome Program (http://www.ornl.gov/hgmis)


• There are many ways of stating the central dogma of molecular biology. Apparently Francis Crick originally defined it like this:

The central dogma of molecular biology deals with the detailed residue-by-residue transfer of sequential information. It states that information cannot be transferred back from protein to either protein or nucleic acid. (http://en.wikipedia.org/wiki/Central_dogma_of_molecular_biology)

• The way that I think of the Central Dogma is: genetic information tends to flow from DNA to RNA to proteins.

• The information stored as DNA becomes useful through gene expression.
• Gene expression means the production of a protein or a functional RNA from its gene.
• Gene expression involves several steps:

• Transcription: A DNA strand is used as the template to synthesize an RNA strand, which is called the primary transcript.

• RNA processing:  This step involves modifications of the primary transcript to generate a mature mRNA (for protein genes) or a functional tRNA or rRNA. For RNA genes (tRNA and rRNA), the expression is complete after a functional tRNA or rRNA is generated.  However, protein genes require additional steps.

• Nuclear transport: mRNA has to be transported from the nucleus to the cytoplasm for protein synthesis.

• Protein synthesis: In the cytoplasm, mRNA binds to ribosomes, which can synthesize a polypeptide based on the sequence of the mRNA.

• Epigenetic information can be thought of as flowing the other way: changes in the cell (typically in proteins or caused by proteins) that result in changes in gene expression but not in changes in the genetic sequence itself.
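Two of the gene-expression steps listed above, transcription and protein synthesis, can be sketched as simple string operations. This is a minimal sketch, assuming an intron-free gene (so RNA processing and nuclear transport are skipped), a template strand written 5'→3', and the standard genetic code; `transcribe` and `translate` are hypothetical names chosen for illustration.

```python
# Minimal sketch of DNA -> RNA -> protein for an idealized, intron-free gene.
# Assumptions: the template strand is written 5'->3'; the standard genetic code applies.

# RNA base that pairs with each DNA template base (U replaces T in RNA).
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Standard genetic code: codons enumerated with the first base outermost, in U, C, A, G order.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def transcribe(template_5to3: str) -> str:
    """Return the RNA (5'->3') made from a DNA template strand written 5'->3'.

    The polymerase reads the template 3'->5' and builds RNA 5'->3', so the RNA
    is the reverse complement of the template, with U in place of T.
    """
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(template_5to3))


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):   # only complete codons
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


template = "TTAGCCAAACAT"           # hypothetical template strand, 5'->3'
mrna = transcribe(template)
print(mrna, translate(mrna))        # AUGUUUGGCUAA MFG
```

Writing the template 5'→3' and reversing it inside `transcribe` keeps the antiparallel relationship explicit: the RNA grows 5'→3' while the polymerase moves along the template 3'→5'.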

The roles of RNA: more than just messengers

And one more… catalytic RNA

• Messenger RNA is the template for protein synthesis (translation). An mRNA molecule may be produced for each gene or group of genes that is to be expressed in E. coli, whereas a distinct mRNA is produced for each gene in eukaryotes. In E. coli, the average length of an mRNA molecule is about 1.2 kilobases (kb).

• Transfer RNA carries amino acids in an activated form to the ribosome for peptide-bond formation, in a sequence dictated by the mRNA template. There is at least one kind of tRNA for each of the 20 amino acids. Transfer RNA consists of about 75 nucleotides (having a mass of about 25 kDa), which makes it one of the smallest of the RNA molecules discussed here.

• Ribosomal RNA (rRNA), the major component of ribosomes, plays both a catalytic and a structural role in protein synthesis. In E. coli, there are three kinds of rRNA, called 23S, 16S, and 5S RNA because of their sedimentation behaviour. One molecule of each of these species of rRNA is present in each ribosome.

• The first catalytic RNA was discovered by Cech and coworkers in the early 1980s. Most naturally occurring ribozymes have a role in RNA splicing or processing. However, in vitro evolution has resulted in ribozymes with a variety of different functions.

DNA to RNA (transcription)

David S. Goodsell: The Molecule of the Month appearing at the PDB

The 2006 Nobel Prize in Chemistry was awarded to Roger Kornberg for his work in determining the mechanism of RNA polymerase, including solving this crystal structure.


• RNA synthesis (transcription) is the process of transcribing DNA nucleotide sequence information into RNA sequence information. RNA synthesis is catalyzed by a large enzyme called RNA polymerase. The basic biochemistry of RNA synthesis is common to prokaryotes and eukaryotes, although its regulation is more complex in eukaryotes. Despite substantial differences in size and number of polypeptide subunits, the overall structures of these enzymes are quite similar between prokaryotes and eukaryotes, revealing a common evolutionary origin.

• RNA synthesis takes place in three stages: initiation, elongation, and termination. RNA polymerase performs multiple functions in this process:

• It searches DNA for initiation sites, also called promoter sites or simply promoters.
• It unwinds a short stretch of double-helical DNA to produce a single-stranded DNA template from which it will ‘read’ the sequence.
• It selects the correct ribonucleoside triphosphate and catalyzes the formation of a phosphodiester bond. RNA polymerase is completely processive: a transcript is synthesized from start to end by a single RNA polymerase molecule.
• It detects termination signals that specify where a transcript ends.
• It interacts with activator and repressor proteins that modulate the rate of transcription initiation over a wide dynamic range. These proteins, which play a more prominent role in eukaryotes than in prokaryotes, are called transcription factors.

• RNA polymerase is a huge factory with many moving parts. The one shown here, from PDB entry 1i6h, is from yeast (Saccharomyces cerevisiae). It is composed of a dozen different proteins. Together, they form a machine that surrounds DNA strands, unwinds them, and builds an RNA strand based on the sequence of the DNA. Once the enzyme gets started, RNA polymerase continues along the DNA, copying RNA strands thousands of nucleotides long.

• In contrast with DNA synthesis, RNA synthesis can start de novo, without the requirement for a primer.
• Most newly synthesized RNA chains carry a highly distinctive tag on the 5’ end: the first base at that end is either pppG or pppA.
• RNA chains, like DNA chains, grow in the 5’-3’ direction.
• The Nobel Prize in Chemistry for 2006 went to Roger D. Kornberg of Stanford University, CA, USA "for his studies of the molecular basis of eukaryotic transcription" (http://nobelprize.org/nobel_prizes/chemistry/laureates/2006/index.html)

The transcription bubble

David S. Goodsell: The Molecule of the Month appearing at the PDB

• The region containing RNA polymerase, DNA, and newly synthesized RNA is called a transcription bubble because it contains a locally melted “bubble” of DNA. The newly synthesized RNA forms a hybrid helix with the template DNA strand. This RNA-DNA helix is about 8 bp long, which corresponds to nearly one turn of a double helix (10.4 bp/turn in B-form).

• The 3’ hydroxyl group of the RNA in this hybrid helix is positioned so that it can attack the alpha-phosphate atom of an incoming ribonucleoside triphosphate. The core enzyme also contains a binding site for the other DNA strand. About 17 bp of DNA are unwound throughout the elongation phase, as in the initiation phase. The transcription bubble moves a distance of 170 Å (17 nm) in a second, which corresponds to a rate of elongation of about 50 nucleotides per second. Although rapid, it is much slower than the rate of DNA synthesis, which is about 800 nucleotides per second.
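The rate figures in the paragraph above can be cross-checked with a few lines of arithmetic. This sketch assumes a rise of 3.4 Å per base pair for B-form DNA (a standard value that is not stated on this slide) and reuses the ~1.2 kb average E. coli mRNA length quoted earlier in these notes; all other numbers come from the paragraph itself.

```python
# Back-of-the-envelope check of the elongation figures quoted above.
RISE_PER_BP_ANGSTROM = 3.4        # assumption: rise per base pair in B-form DNA
BP_PER_TURN = 10.4                # B-form DNA, as quoted above
BUBBLE_SPEED_ANGSTROM_PER_S = 170.0
DNA_SYNTHESIS_NT_PER_S = 800.0
HYBRID_BP = 8
AVERAGE_ECOLI_MRNA_NT = 1_200     # ~1.2 kb average E. coli mRNA, quoted earlier

# Distance moved per second / distance per base pair = nucleotides added per second.
rna_rate = BUBBLE_SPEED_ANGSTROM_PER_S / RISE_PER_BP_ANGSTROM
speedup = DNA_SYNTHESIS_NT_PER_S / rna_rate
hybrid_turns = HYBRID_BP / BP_PER_TURN
seconds_per_mrna = AVERAGE_ECOLI_MRNA_NT / rna_rate

print(f"RNA elongation rate: {rna_rate:.0f} nt/s")               # 50 nt/s
print(f"DNA synthesis is ~{speedup:.0f}x faster")                 # ~16x
print(f"8 bp RNA-DNA hybrid: {hybrid_turns:.2f} helical turns")   # 0.77
print(f"Time to transcribe ~1.2 kb: {seconds_per_mrna:.0f} s")    # 24 s
```

At this rate a bacterial transcript of average length takes on the order of half a minute to complete.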

• As you might expect, RNA polymerase needs to be accurate in its copying of genetic information. To improve its accuracy, it performs a simple proofreading step as it builds an RNA strand. The active site is designed to be able to remove nucleotides as well as add them to the growing strand. The enzyme tends to hover around mismatched nucleotides longer than properly added ones, giving the enzyme time to remove them. This process is somewhat wasteful, since proper nucleotides are also occasionally removed, but this is a small price to pay for creating better RNA transcripts. Overall, RNA polymerase makes an error about once in 10,000 nucleotides added, or about once per RNA strand created. This error rate is about 10⁴-fold higher than that of DNA synthesis. The much lower fidelity of RNA synthesis can be tolerated because mistakes are not transmitted to progeny. For most genes, many RNA transcripts are synthesized; a few defective transcripts are unlikely to be harmful.
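To see why an error rate of about 1 in 10,000 is tolerable, the sketch below estimates how many errors a single transcript carries and what fraction of copies are error-free. It assumes each nucleotide is misincorporated independently with probability 10⁻⁴ (a simplification); the two transcript lengths are illustrative, and the function names are hypothetical.

```python
ERROR_RATE = 1e-4   # ~1 error per 10,000 nucleotides, as quoted above


def expected_errors(length_nt: int, rate: float = ERROR_RATE) -> float:
    """Mean number of misincorporated nucleotides in one transcript."""
    return length_nt * rate


def fraction_error_free(length_nt: int, rate: float = ERROR_RATE) -> float:
    """Probability that a transcript has no errors, assuming independent errors."""
    return (1.0 - rate) ** length_nt


# Illustrative lengths: ~average E. coli mRNA, and a 10 kb transcript.
for length in (1_200, 10_000):
    print(f"{length:>6} nt: {expected_errors(length):.2f} errors on average, "
          f"{fraction_error_free(length):.0%} of copies error-free")
```

For a 1.2 kb transcript this gives about 0.12 errors on average and roughly 89% error-free copies; for 10 kb it gives about one error per copy, in line with the "about once per RNA strand" estimate, yet roughly a third of the copies are still perfect.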

• PDB entry 1msw reveals the structure of a very small RNA polymerase that is made by the bacteriophage T7, shown here with blue tubes. A small transcription bubble, composed of two DNA strands and an RNA strand, is bound in the active site. Notice how the two DNA strands form a double helix at the top of the picture. The enzyme separates them in the middle and builds an RNA strand using the DNA on the right. Finally, at the bottom, the two DNA strands come back together.

• This structure was not determined by Roger Kornberg, but by Tom Steitz, a famous x-ray crystallographer. Professor Steitz was awarded the Nobel Prize in Chemistry 2009 "for studies of the structure and function of the ribosome" (http://nobelprize.org/nobel_prizes/chemistry/laureates/2009/index.html)

• Question: What does "many RNA transcripts are synthesized" mean?
• Answer: This statement refers to the fact that a gene is not transcribed into just one mRNA molecule. When a gene is being expressed, many RNA polymerases are copying it and making many mRNA molecules. If a couple of them have a mistake, it is probably not a big deal.

Transcription is a highly regulated process

(Figure panels: lactose present; lactose absent)

For transcription to occur, lactose must bind to the lac repressor. This binding changes the conformation of the protein such that it can no longer bind to the operator site and interfere with the function of RNA polymerase.

Example: the lac operon

• With only a few exceptions, every cell of the body contains a full set of chromosomes and identical genes. Only a fraction of these genes are turned on at any one time, however, and it is the subset that is "expressed" that confers unique properties to each cell type.

• "Gene expression" is the term used to describe the transcription of the information contained within the DNA, the repository of genetic information, into messenger RNA (mRNA) molecules that are then translated into the proteins that perform most of the critical functions of cells.

• Biologists study the kinds and amounts of mRNA produced by a cell to learn which genes are expressed, which in turn provides insights into how the cell responds to its changing needs.

• Gene expression is a highly complex and tightly regulated process that allows a cell to respond dynamically both to environmental stimuli and to its own changing needs. This mechanism acts as both an "on/off" switch to control which genes are expressed in a cell as well as a "volume control" that increases or decreases the level of expression of particular genes as necessary.

• The lac operon shown in this movie is one of the simplest gene regulation mechanisms, but it is actually a bit more complicated than the extremely simplified version shown here.
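The simplified on/off switch described in the figure caption above can be written as a tiny model. This sketch encodes only that simplified version (lactose bound, so the repressor releases the operator, so RNA polymerase can proceed) and none of the extra complexity mentioned in the last bullet; `LacRepressor` and `lac_genes_transcribed` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LacRepressor:
    lactose_bound: bool = False

    @property
    def binds_operator(self) -> bool:
        # Lactose binding changes the repressor's conformation so it releases the operator.
        return not self.lactose_bound


def lac_genes_transcribed(lactose_present: bool) -> bool:
    """Simplified switch: RNA polymerase can transcribe only if the operator is free."""
    repressor = LacRepressor(lactose_bound=lactose_present)
    return not repressor.binds_operator


for lactose in (False, True):
    print(f"lactose present={lactose}: transcribed={lac_genes_transcribed(lactose)}")
```

Modelling the repressor as the only object with state mirrors the caption: the decision is made by the repressor's conformation, not by RNA polymerase itself.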

The mechanisms of DNA and RNA elongation are similar

(Figure panels: active site of DNA polymerase; active site of RNA polymerase)

• The catalytic site of RNA polymerase resembles that of DNA polymerase in that it includes two metal ions in its active form. One metal ion remains bound to the enzyme, whereas the other appears to come in with the nucleoside triphosphate and leave with the pyrophosphate. Three conserved aspartate residues of the enzyme participate in binding these metal ions. Note that the overall structures of DNA polymerase and RNA polymerase are quite different; their similar active sites are the products of convergent evolution.

Transcription is much more complex in eukaryotes than in prokaryotes


• In prokaryotes (bacterial and archaeal cells defined by the fact that they lack a nucleus), translation of mRNA begins while the transcript is still being synthesized.

• In eukaryotes (such as animal, plant, and fungal cells, defined by the fact that they have a nucleus), transcription and translation take place in different cellular compartments: transcription takes place in the membrane-bounded nucleus, whereas translation takes place outside the nucleus in the cytoplasm.

• A second major difference between prokaryotes and eukaryotes is the extent of RNA processing. Eukaryotes extensively process nascent pre-mRNA destined to become mature mRNA. Primary transcripts (pre-mRNA molecules), the products of RNA polymerase action, acquire a cap at their 5’ ends and a poly(A) tail at their 3’ ends. Most importantly, nearly all mRNA precursors in higher eukaryotes are spliced.

• primary transcript: Initial RNA product, containing introns and exons, produced by transcription of DNA. Many primary transcripts must undergo RNA processing to form the physiologically active RNA species.

• Question: Many pictures show only the mRNA. What about tRNA and rRNA? Do they also go through transcription and processing?

• Answer: tRNA and rRNA are encoded in the genome and are synthesized by RNA polymerases just like mRNA is. They also undergo processing, but it is different from the processing that occurs for mRNA.

Many ribosomes can translate a single mRNA simultaneously

• The sequence of amino acids in a protein is translated from the nucleotide sequence in mRNA. In which direction is the message read? The direction of translation is 5’ to 3’ in terms of the reading of the mRNA template. This corresponds to synthesis from the N-to-C terminus in terms of the protein product.

• The direction of translation has important consequences. Recall that transcription also occurs in the 5’-3’ direction. If the direction of translation were opposite that of transcription, only fully synthesized mRNA could be translated.

• In contrast, because the directions are the same, mRNA can be translated while it is being synthesized. In prokaryotes, almost no time is lost between transcription and translation. The 5’ end of mRNA interacts with ribosomes very soon after it is made, well before the 3’ end of the mRNA molecule is finished.

• An important feature of prokaryotic gene expression is that translation and transcription are closely coupled in space and time. Many ribosomes can be translating an mRNA molecule simultaneously. This parallel synthesis markedly increases the efficiency of mRNA translation.

Mature eukaryotic vs. prokaryotic mRNA

(Figure: eukaryotic mRNA)

• The 5' cap (also called an RNA cap, an RNA 7-methylguanosine cap or an RNA m7G cap) is a modified guanine nucleotide that has been added to the 5' end of the messenger RNA shortly after the start of transcription. The 5' cap consists of a terminal 7-methylguanosine residue which is linked through a 5'-5'-triphosphate bond to the first transcribed nucleotide. Its presence is critical for recognition by the ribosome and protection from RNases.

• Coding regions are composed of codons, which are decoded and translated into protein by the ribosome. Coding regions begin with the start codon and end with one of three possible stop codons. In addition to their protein-coding role, portions of coding regions may also serve as regulatory sequences.

• Untranslated regions (UTRs) are sections of the RNA before the start codon and after the stop codon that are not translated, termed the five-prime untranslated region (5' UTR) and three-prime untranslated region (3' UTR), respectively. These regions are transcribed as part of the same transcript as the coding region. Several roles in gene expression have been attributed to the untranslated regions, including mRNA stability, mRNA localization, and translational efficiency. The ability of a UTR to perform these functions depends on the sequence of the UTR and can differ between mRNAs.

• The 3' poly(A) tail is a long sequence of adenine nucleotides (often several hundred) added to the 3' end (the "tail") of the pre-mRNA.

• From http://en.wikipedia.org/wiki/MRNA
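The regions listed above can be located in a mature mRNA sequence with a simple scan. This is a sketch under several assumptions: the coding region starts at the first AUG and ends at the first in-frame stop codon (real start-codon choice depends on sequence context), the 5' cap is recorded only as a label because m7G is not one of the four bases in the string, and the trailing run of A's is treated as the poly(A) tail. `annotate_mrna` and the example sequence are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def annotate_mrna(mrna: str) -> dict:
    """Split a mature mRNA (5'->3', cap not included) into UTRs, coding region and poly(A) tail."""
    # Treat the run of A's at the 3' end as the poly(A) tail. Simplification: any A's at
    # the very end of the 3' UTR (or of a UAA stop with no 3' UTR) would be absorbed too.
    body = mrna.rstrip("A")
    tail = mrna[len(body):]

    start = body.find("AUG")
    if start == -1:
        raise ValueError("no start codon found")
    for i in range(start, len(body) - 2, 3):
        if body[i:i + 3] in STOP_CODONS:
            end = i + 3          # include the stop codon in the coding region
            break
    else:
        raise ValueError("no in-frame stop codon found")

    return {
        "5' cap": "m7G (not encoded in the sequence string)",
        "5' UTR": body[:start],
        "coding region": body[start:end],
        "3' UTR": body[end:],
        "poly(A) tail length": len(tail),
    }


example = "GGC" + "AUGUUUGGCUAA" + "AGC" + "A" * 20   # hypothetical toy mRNA
for region, value in annotate_mrna(example).items():
    print(f"{region}: {value}")
```

The UTRs are defined purely by their position relative to the start and stop codons, which is why the scan only needs to find those two landmarks.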

Splicing of mammalian mRNA: Introns and Exons


The primary transcript is ‘spliced’ to form the correct reading sequence of the gene

• Intron: Part of a primary transcript (or the DNA encoding it) that is removed by splicing during RNA processing and is not included in the mature, functional mRNA, rRNA, or tRNA; also called intervening sequence.

• Exon: Segment of a eukaryotic gene (or of its primary transcript) that reaches the cytoplasm as part of a mature mRNA, rRNA, or tRNA molecule.

• Introns are precisely excised from primary transcripts, and exons are joined to form mature mRNAs with continuous messages. Alternative splicing enlarges the repertoire of proteins in eukaryotes and is a clear illustration of why the proteome is more complex than the genome.

• Right-hand figure and following legend from Nature Reviews Genetics 5, 389-396 (May 2004): “There are several conserved motifs in the nucleotide sequences near the intron–exon boundaries that act as essential splicing signals: GU and AG dinucleotides at the exon–intron and intron–exon junctions, respectively (5'- and 3'-splice sites), a polypyrimidine tract (Py)n and an A nucleotide at the branch site. Splicing takes place in two transesterification steps. In the first step, the 2'-hydroxyl group of the A residue at the branch site attacks the phosphate at the GU 5'-splice site. This leads to cleavage of the 5' exon from the intron and the formation of a lariat intermediate. In the following step, a second transesterification reaction, which involves the phosphate (p) at the 3' end of the intron and the 3'-hydroxyl group of the detached exon, ligates the two exons. This reaction releases the intron, still in the form of a lariat.”
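The conserved GU and AG splice-site dinucleotides described in the legend above suggest a simple sketch of splicing as string surgery. It assumes the intron coordinates are already known (in real pre-mRNA, locating them also depends on the branch site and polypyrimidine tract, which this sketch does not model) and checks only the GU...AG rule. The sequences and the name `splice` are hypothetical.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns given as sorted, non-overlapping half-open (start, end) intervals.

    Each removed segment must begin with GU (5' splice site) and end with AG (3' splice site).
    """
    exons = []
    previous_end = 0
    for start, end in introns:
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} lacks the GU...AG splice-site motifs")
        exons.append(pre_mrna[previous_end:start])   # keep the exon before this intron
        previous_end = end
    exons.append(pre_mrna[previous_end:])             # keep the final exon
    return "".join(exons)


# Hypothetical pre-mRNA: exon1 + intron1 + exon2 + intron2 + exon3
exon1, intron1, exon2, intron2, exon3 = "AUGGCC", "GUAAGUUUUCAG", "UUUGGC", "GUGCUAACAG", "UAA"
pre_mrna = exon1 + intron1 + exon2 + intron2 + exon3

print(splice(pre_mrna, [(6, 18), (24, 34)]))   # all exons joined: AUGGCCUUUGGCUAA
print(splice(pre_mrna, [(6, 34)]))             # exon 2 skipped:   AUGGCCUAA
```

The second call illustrates alternative splicing in miniature: treating intron 1, exon 2 and intron 2 as one removed segment (it still starts with GU and ends with AG) produces a different mature mRNA from the same primary transcript.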

HIV: reverse transcriptase is essential

see ‘HIV life cycle’ animation on webpage

• Retroviruses: these viruses can reverse the flow of genetic information (RNA to DNA rather than from DNA to RNA)! The most famous retrovirus is human immunodeficiency virus 1 (HIV-1), the cause of AIDS. Retroviruses have two identical copies of a single-stranded RNA genome and an outer envelope containing protruding viral glycoproteins.

1. The retroviral envelope fuses directly with the plasma membrane (step 1).

2. Following fusion, the nucleocapsid enters the cytoplasm of the cell; deoxynucleoside triphosphates from the cytosol then enter the nucleocapsid, where viral reverse transcriptase and other proteins copy the ssRNA genome of the virus into a dsDNA copy (step 2).

3. The viral DNA copy is transported into the nucleus (only one host-cell chromosome is depicted) and integrated into one of many possible sites in the host-cell chromosomal DNA (step 3).

4. The integrated viral DNA, referred to as a provirus, is transcribed by the host-cell RNA polymerase, generating mRNAs (dark red) and genomic RNA molecules (light red). The host-cell machinery translates the viral mRNAs into glycoproteins and nucleocapsid proteins (step 4).

5. The nucleocapsid proteins assemble with genomic RNA to form progeny nucleocapsids, which interact with the membrane-bound viral glycoproteins. Eventually the host-cell membrane buds out and progeny virions are pinched off (step 5).

David S. Goodsell: The Molecule of the Month appearing at the PDB

HIV: reverse transcriptase


• Reverse transcriptase performs several different functions. As indicated by the name, it can build DNA strands based on an RNA template. This reaction is performed in the polymerase active site, which is formed by two sets of arms that surround the RNA and DNA. The polymerase site is at the top in this illustration, taken from PDB entry 2hmi. After building the DNA strand, the enzyme then removes the original RNA strand by cleaving it into pieces. This is performed by a nuclease active site, which is located at the opposite end of the enzyme. Finally, it builds a second DNA strand matched to the one that was just created to form the final DNA double helix. This reaction is also performed by the polymerase site.

• Reverse transcriptase performs a remarkable feat, reversing the normal flow of genetic information, but it is rather sloppy in its job. The polymerases used to make DNA and RNA in cells are very accurate and make very few mistakes. This is essential because they are the caretakers of our genetic information, and mistakes may be passed on to our offspring. Reverse transcriptase, on the other hand, makes lots of mistakes, up to about one in every 2,000 bases that it copies. This high error rate turns out to be an advantage for the virus in this era of drug treatment. The errors allow HIV to mutate rapidly, finding drug-resistant strains in a matter of weeks after treatment begins. Fortunately, recently developed treatments that combine several drugs are often effective in combating this problem. Since the virus is simultaneously attacked by several different drugs, it cannot mutate to evade all of them at the same time.
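The three activities of reverse transcriptase described above can be sketched as sequence operations, followed by the error arithmetic implied by the quoted error rate. This sketch tracks base pairing only (no primer, and the nuclease step is represented by simply discarding the RNA); the RNA fragment is hypothetical, the function names are illustrative, and the HIV-1 genome length of about 9.7 kb is an outside approximation rather than a figure from these notes.

```python
# Base pairing used by the polymerase site.
RNA_TEMPLATE_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}   # RNA template -> DNA
DNA_TEMPLATE_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}   # DNA template -> DNA


def copy_strand(template: str, pairing: dict[str, str]) -> str:
    """New strand (5'->3') complementary and antiparallel to a template written 5'->3'."""
    return "".join(pairing[base] for base in reversed(template))


def reverse_transcribe(viral_rna: str) -> tuple[str, str]:
    # Step 1 (polymerase site, RNA template): build the first DNA strand.
    first = copy_strand(viral_rna, RNA_TEMPLATE_PAIR)
    # Step 2 (nuclease site): the RNA strand is cut into pieces; here we simply discard it.
    # Step 3 (polymerase site, DNA template): the second DNA strand completes the double helix.
    second = copy_strand(first, DNA_TEMPLATE_PAIR)
    return first, second


first, second = reverse_transcribe("AUGGCU")   # hypothetical RNA fragment, 5'->3'
print(first, second)                            # AGCCAT ATGGCT

# Error arithmetic using the upper-bound rate quoted above.
ERRORS_PER_BASE = 1 / 2_000
GENOME_NT = 9_700        # assumption: approximate length of the HIV-1 RNA genome
print(f"~{GENOME_NT * ERRORS_PER_BASE:.0f} errors per genome copied")   # ~5
```

The second DNA strand has the same sequence as the viral RNA (with T for U), which is why the integrated provirus can later be transcribed back into viral RNA. At the upper-bound error rate, essentially every copied genome differs from its parent at a few positions.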

RNA can adopt well-defined tertiary structures

http://prion.bchs.uh.edu/bp_type/bp_structure.html


• Unlike DNA, which exists primarily in a single, very long three-dimensional structure, the double helix, the various types of RNA exhibit different conformations. Differences in the sizes and conformations of the various types of RNA permit them to carry out specific functions in a cell.

• The simplest secondary structures in single-stranded RNAs are formed by pairing of complementary bases. “Hairpins” are formed by pairing of bases within ~ 5 to 10 nucleotides of each other, and “stem-loops” by pairing of bases that are separated by ~50 to several hundred nucleotides. These simple folds can cooperate to form more complicated tertiary structures, one of which is termed a “pseudoknot”. As discussed on the next page, tRNA molecules adopt a well-defined three-dimensional architecture in solution that is crucial in protein synthesis.

• Stem-loops, hairpins, and other secondary structures can form by base pairing between distant complementary segments of an RNA molecule. In stem-loops, the single-stranded loop (dark red) between the base-paired helical stem (light red) may be hundreds or even thousands of nucleotides long, whereas in hairpins, the short turn may contain as few as 6 – 8 nucleotides.

• Interactions between the flexible loops may result in further folding to form tertiary structures such as the pseudoknot. This tertiary structure resembles a figure-eight knot, but the free ends do not pass through the loops, so no knot is actually formed.

RNA can adopt well-defined tertiary structures

http://ndbserver.rutgers.edu/atlas/xray/structures/T/tr0001/tr0001.html

>Yeast phenylalanine tRNA
GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA

(Figure panels: primary (1°), secondary (2°) and tertiary (3°) structure)

• Transfer RNA (abbreviated tRNA), is a small RNA chain (73-93 nucleotides) that transfers a specific amino acid to a growing polypeptide chain at the ribosomal site of protein synthesis during translation. It has a 3' terminal site for amino acid attachment. This covalent linkage is catalyzed by an aminoacyl tRNA synthetase. It also contains a three base region called the anticodon that can base pair to the corresponding three base codon region on mRNA. Each type of tRNA molecule can be attached to only one type of amino acid, but because the genetic code contains multiple codons that specify the same amino acid, tRNA molecules bearing different anticodons may also carry the same amino acid.

• http://en.wikipedia.org/wiki/Transfer_RNA
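Because the anticodon and codon pair antiparallel, the codon a tRNA reads is the reverse complement of its anticodon. A minimal sketch, assuming strict Watson-Crick pairing (real decoding tolerates some non-standard pairs that this ignores) and taking the anticodon of the yeast phenylalanine tRNA from positions 34-36 of the sequence shown earlier (an outside but standard numbering); `codon_for_anticodon` is a hypothetical name.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codon_for_anticodon(anticodon_5to3: str) -> str:
    """Codon (5'->3') that pairs with an anticodon (5'->3') by Watson-Crick rules.

    The strands pair antiparallel: the anticodon's 5' base pairs with the codon's
    3' base, so the codon is the reverse complement of the anticodon.
    """
    return "".join(RNA_PAIR[base] for base in reversed(anticodon_5to3))


yeast_phe = "GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA"
anticodon = yeast_phe[33:36]                        # positions 34-36 (1-based)
print(anticodon, codon_for_anticodon(anticodon))    # GAA UUC
```

UUC is a phenylalanine codon, consistent with this tRNA carrying phenylalanine.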


• mFold is a tool that enables the prediction of DNA or RNA secondary structure.
• It has been in operation since 1995, making it one of the oldest bioinformatics tools on the web.
• http://mfold.rna.albany.edu/?q=mfold/RNA-Folding-Form
• mFold was developed primarily by Dr. Michael Zuker, now at the Rensselaer Polytechnic Institute, while he was affiliated with the NRCC and later with Washington University in St. Louis.
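The core idea behind secondary-structure prediction, dynamic programming over nested base pairs, can be sketched with the Nussinov algorithm. This is a swapped-in, much cruder technique than mFold's: Nussinov simply maximizes the number of base pairs, whereas mFold (Zuker's method) minimizes free energy using measured stacking and loop energies. The minimum hairpin loop of 3 nucleotides and the allowed pairs (Watson-Crick plus G·U) are assumptions of this sketch.

```python
# Allowed base pairs: Watson-Crick plus the G-U wobble pair (assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> tuple[int, str]:
    """Return (maximum number of nested base pairs, one optimal dot-bracket structure)."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = max(best[i + 1][j], best[i][j - 1])        # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                score = max(score, best[i + 1][j - 1] + 1)     # i pairs with j
            for k in range(i + 1, j):                          # two independent substructures
                score = max(score, best[i][k] + best[k + 1][j])
            best[i][j] = score

    # Trace back one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
        elif best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and best[i][j] == best[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if best[i][j] == best[i][k] + best[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return best[0][n - 1], "".join(structure)


print(nussinov("GGGAAAACCC"))   # (3, '(((....)))')
```

Because it counts pairs rather than energies, this sketch often returns a different structure from mFold for a real tRNA; it is meant only to show the recursion that both approaches share.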

mFold does a fairly good job of predicting tRNA 2° structure

(Figure annotation: “Rotate and flip”)

• These are the results obtained when I submitted the yeast phenylalanine tRNA sequence to the mfold server

• Yeast phenylalanine tRNA: GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA

• The predicted structures are practically identical to the known structure that has been experimentally determined and verified using multiple techniques.

But the true structure is not always the one predicted to have the lowest energy

>Human phenylalanine tRNA
GCCGAAAUAGCUCAGUUGGGAGAGCGUUAGACUGAAGAUCUAAAGGUCCCUGGUUCGAUCCCGGGUUUCGGCA

(Figure: four mFold-predicted structures with dG = -30.1, -29.2, -29.0 and -28.3 kcal/mol)

• Try it yourself using the human Phe tRNA sequence. This sequence is available on the website as .txt.

• Q. What factor is mFold not taking into account that could explain the difference between theoretical and experimental 2° structures?

• A. The tertiary structure. There could be contacts in three dimensions between regions that are distant in the primary and secondary structure. These contacts could provide additional stabilization to one particular arrangement of secondary structural elements.
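The four dG values in the figure above lie within 2 kcal/mol of each other. A rough sketch of what that means in terms of populations, assuming the values are free energies in kcal/mol at 37 °C, that these four structures are the only ones present, and that they are at equilibrium (all simplifications):

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3              # gas constant
T_KELVIN = 310.15                        # assumption: 37 °C
delta_g = [-30.1, -29.2, -29.0, -28.3]   # kcal/mol, the four predicted structures

rt = R_KCAL_PER_MOL_K * T_KELVIN
# Boltzmann weights relative to the lowest-energy structure (keeps the exponents small).
weights = [math.exp(-(g - min(delta_g)) / rt) for g in delta_g]
total = sum(weights)
fractions = [w / total for w in weights]

for g, f in zip(delta_g, fractions):
    print(f"dG = {g:6.1f}: {f:.0%}")
```

Under these assumptions the lowest-energy structure accounts for only about two thirds of the molecules (roughly 69%, 16%, 12% and 4%), so a modest extra stabilization that mFold does not model, such as the tertiary contacts mentioned in the answer, could plausibly change which structure dominates.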

Summary of RNA and Transcription

• RNA polymerase synthesizes RNA from a DNA template
• RNA polymerase locally unwinds the double-stranded DNA to make a ‘transcription bubble’
• The catalytic mechanisms of RNA polymerase and DNA polymerase are very similar
• RNA has 3 main roles in protein synthesis, but this will be discussed in more detail next class
• RNA production is a highly regulated process
• In eukaryotes, mRNA is initially produced as a series of exons and introns. Splicing out of the introns, plus further modifications, provides the mature mRNA.
• RNA can adopt well-defined tertiary structures. Software is getting pretty good at predicting RNA secondary structures.
• Reverse transcriptase goes against the standard ‘flow of information’: it makes DNA from RNA
