6 Genes and Genomes

• The Structure of Eukaryotic Genes

• Noncoding Sequences

• Chromosomes and Chromatin

Introduction

Understanding gene structure and function is fundamental to understanding the molecular biology of cells.

Advances in DNA sequencing have brought us to the exciting point of knowing the complete genome sequences of hundreds of bacteria, of yeast, and of many species of plants and animals.

The Structure of Eukaryotic Genes

Most eukaryotes have larger and more complex genomes than prokaryotes.

However, genome size does not appear to be related to genetic complexity.

Most eukaryotic genomes contain large amounts of noncoding DNA.

Figure 6.1 Genome size

Differences in the sizes of eukaryotic genomes primarily reflect differences in the amount of noncoding DNA.

Noncoding sequences play roles in regulating gene expression and in expanding coding potential by allowing genes to be expressed in alternative ways.

A gene is a segment of DNA that is expressed to yield a functional product (e.g., rRNA, tRNA, or polypeptide).

Noncoding sequences occur both within and between eukaryotic genes.

Coding sequences within genes (exons) are separated by noncoding sequences (introns).

The entire gene is transcribed to RNA and the introns are then removed by splicing, so only exons are included in the mRNA.

Figure 6.2 The structure of eukaryotic genes
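
The splicing step described above can be sketched in a few lines of Python. This is only an illustration: the transcript sequence and the exon coordinates are hypothetical, and real splicing relies on sequence signals and the spliceosome rather than fixed positions.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments in order, dropping everything between them.

    Coordinates are 0-based and end-exclusive.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical primary transcript: exons in upper case, introns in lower case.
pre_mrna = "AUGGCC" + "guaaguuuucag" + "GAUCUG" + "guaugcccuuag" + "UAA"
exons = [(0, 6), (18, 24), (36, 39)]  # assumed exon positions for this example

mrna = splice(pre_mrna, exons)
print(mrna)  # AUGGCCGAUCUGUAA
```

The whole gene is represented in the primary transcript, but only the three exon segments appear in the spliced mRNA.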

Introns were discovered in 1977, during studies of adenovirus in cultured human cells.

mRNA-DNA hybrids, which are visible using electron microscopy, showed that a single mRNA molecule hybridized to several separated regions of the viral genome.

Key Experiment, Ch. 6, p. 190 (3)

The adenovirus mRNA does not correspond to an uninterrupted transcript of the template DNA.

mRNA is assembled from blocks of sequences from different parts of the viral DNA by RNA splicing.

Figure 6.3 Identification of introns in adenovirus mRNA (Part 1)

Figure 6.3 Identification of introns in adenovirus mRNA (Part 2)

Similar observations were soon made on cloned eukaryotic genes.

Sequencing of cloned DNAs and cDNAs indicated that the coding region of the mouse β-globin gene is interrupted by two introns that are removed from the mRNA by splicing.

Figure 6.4 The mouse β-globin gene

The amount of DNA in introns can be greater than that in exons.

An average human gene has 10 exons distributed over 56,000 base pairs (56 kilobases, or kb).

Exons include regions at both ends of the mRNA that are not translated into protein (the 5′ and 3′ untranslated regions, or UTRs).

Introns make up more than 90% of the average human gene.

Table 6.1 Characteristics of the Average Human Gene
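
The figures quoted above (10 exons, 56 kb, introns more than 90%) imply rough bounds on exon and intron size. The short calculation below is a sketch that treats 90% as a lower bound on the intron share and assumes that introns lie only between exons, so 10 exons are separated by 9 introns.

```python
GENE_LENGTH_BP = 56_000    # from the text: average human gene spans 56 kb
NUM_EXONS = 10             # from the text
MIN_INTRON_PERCENT = 90    # text says "more than 90%", used here as a lower bound

# Assumption: introns sit only between exons, so there is one fewer intron than exons.
num_introns = NUM_EXONS - 1

min_intron_bp = GENE_LENGTH_BP * MIN_INTRON_PERCENT // 100  # least intronic DNA
max_exon_bp = GENE_LENGTH_BP - min_intron_bp                # most exonic DNA

print(f"introns: {num_introns}, at least {min_intron_bp} bp in total")
print(f"exons: at most {max_exon_bp} bp in total")
print(f"average exon: at most {max_exon_bp / NUM_EXONS:.0f} bp")
print(f"average intron: at least {min_intron_bp / num_introns:.0f} bp")
```

Under these assumptions the average intron is at least ten times longer than the average exon.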

Introns are present in most, but not all, eukaryote genes.

Almost all histone genes lack introns.

Introns are rare in prokaryotic genes and in the genes of simple eukaryotes, such as yeasts.

Many introns are conserved in genes of both plants and animals, indicating they arose early in evolution, prior to plant-animal divergence.

In eukaryotes, the frequency of introns increases with genome size and complexity.

Table 6.2 Introns in Representative Genomes

Many introns encode functional products—proteins or noncoding RNAs.

In these cases, one gene is contained within an intron of a larger gene (nested genes).

Some nested genes are pseudogenes, and others encode microRNAs or small nucleolar RNAs.

Figure 6.5 Nested genes

Some introns contain regulatory sequences that control gene expression.

All cells in an organism have the same genes; which genes are expressed determines differences in cell type and function.

Regulatory sequences that control transcription may be located upstream of a gene, within a gene, or at distant locations in the genome.

Figure 6.6 Transcriptional regulatory elements

Introns also allow the exons of a gene to be joined in different combinations, resulting in different proteins from the same gene: alternative splicing.

On average, each human gene yields six alternatively spliced mRNAs.

Regulatory sequences within introns also regulate splicing.

Figure 6.7 Alternative splicing
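
How joining exons in different combinations multiplies the products of one gene can be illustrated with a simplified model. The sketch below assumes that the first and last exons are always retained and that each internal exon is independently included or skipped; real alternative splicing is regulated and uses other patterns as well. The exon names are hypothetical.

```python
from itertools import combinations


def isoforms(exons: list[str]) -> list[list[str]]:
    """List every exon-skipping variant, keeping the first and last exon.

    Exon order is preserved; only internal exons may be left out.
    """
    first, *internal, last = exons
    variants = []
    # Go from "all internal exons kept" down to "all internal exons skipped".
    for kept_count in range(len(internal), -1, -1):
        for kept in combinations(internal, kept_count):
            variants.append([first, *kept, last])
    return variants


exons = ["E1", "E2", "E3", "E4"]  # hypothetical four-exon gene
variants = isoforms(exons)
for variant in variants:
    print("-".join(variant))
```

With two optional internal exons this model gives four mRNAs; each additional optional exon doubles the count.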

Noncoding Sequences

Eukaryote genomes have several types of sequences in addition to those coding for proteins.

Many are important in gene regulation; some are involved in structure and replication of chromosomes.

Understanding the function of noncoding sequences will be key to understanding development and behavior.

The ENCODE project analyzed 147 human cell lines with the goal of defining the functions of different sequence types.

About 75% of the human genome is transcribed, leading to the realization that noncoding RNAs play a major role in gene regulation.

Key Experiment, Ch. 6, p. 195

RNA interference (RNAi) mediated by short double-stranded RNAs is used experimentally to block gene expression at the level of translation.

RNAi is also normally used by cells to control mRNA translation and degradation.

MicroRNAs (miRNAs) mediate RNA interference.

The precursors of miRNAs are longer RNAs that fold into hairpin structures and are then cleaved by the nucleases Drosha and Dicer to yield double-stranded RNAs of about 22 nucleotides.

Figure 6.8 miRNAs (Part 1)

Figure 6.8 miRNAs (Part 2)

miRNAs associate with the RNA-induced silencing complex (RISC).

The miRNA targets RISC to complementary mRNAs, where the complex inhibits translation and stimulates mRNA degradation.
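
Targeting by complementarity can be sketched as a search for the reverse complement of the miRNA within an mRNA. The sequences below are hypothetical and shortened to 8 nucleotides for readability (miRNAs are about 22 nucleotides), and the sketch assumes perfect pairing, which real miRNA targets often lack.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the antiparallel complementary RNA sequence."""
    return rna.translate(COMPLEMENT)[::-1]


def find_target_sites(mirna: str, mrna: str) -> list[int]:
    """Return 0-based start positions in the mRNA that pair with the miRNA."""
    site = reverse_complement(mirna)
    positions = []
    start = mrna.find(site)
    while start != -1:
        positions.append(start)
        start = mrna.find(site, start + 1)
    return positions


mirna = "UGAGGUAG"             # hypothetical guide sequence
mrna = "AUGGCACUACCUCAGGAUAA"  # hypothetical mRNA with one complementary site

sites = find_target_sites(mirna, mrna)
print(sites)  # [6]
```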

Long noncoding RNAs (lncRNAs; more than 200 nucleotides long) also regulate eukaryotic gene expression.

Example: X chromosome inactivation. One of the two X chromosomes in female cells is silenced early in development by a lncRNA called Xist.

Figure 6.9 X chromosome inactivation

The ENCODE project identified more than 50,000 lncRNAs and 9000 small noncoding RNAs, the roles of most of which remain to be determined.

The number of noncoding RNAs that have been identified in human cells is larger than the number of protein-coding genes.

Repetitive sequences: Complex eukaryotic genomes contain highly repeated noncoding DNA sequences that can be present in hundreds of thousands of copies.

Table 6.3 Repetitive Sequences in the Human Genome

Simple-sequence repeats are tandem arrays of short sequences, each 1 to 500 nucleotides long.

They can be separated by equilibrium centrifugation in CsCl density gradients: AT-rich sequences are less dense than GC-rich sequences.

Such repeat-sequence DNAs band as “satellites,” separate from the main band of DNA, and thus are called satellite DNAs.

They are not transcribed but some play important roles in chromosome structure.

Figure 6.10 Satellite DNA
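
Why an AT-rich repeat bands apart from bulk DNA can be illustrated by comparing GC content. The sketch below also uses a commonly cited empirical approximation for buoyant density in CsCl (about 1.660 + 0.098 × GC fraction, in g/cm³); that relation is not given in these slides and is included as an assumption. Both sequences are illustrative stand-ins, not measured data.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def buoyant_density(gc: float) -> float:
    """Approximate CsCl buoyant density in g/cm^3.

    Assumed empirical relation, not taken from the slides.
    """
    return 1.660 + 0.098 * gc


satellite = "ACAAACT" * 20  # illustrative AT-rich tandem repeat
bulk = "ATGCGTAC" * 20      # illustrative stand-in for main-band DNA (50% GC)

for name, seq in [("satellite", satellite), ("bulk", bulk)]:
    gc = gc_fraction(seq)
    print(f"{name}: GC = {gc:.2f}, density = {buoyant_density(gc):.3f} g/cm^3")
```

The AT-rich repeat has the lower GC fraction and therefore the lower predicted density, so it would band above the main DNA band in the gradient.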

Other repetitive DNA sequences are scattered throughout the genome:

SINEs (short interspersed elements): 100–300 base pairs; make up about 13% of human DNA.

LINEs (long interspersed elements): 4–6 kb; make up about 21% of human DNA.

SINEs and LINEs are transposable elements—capable of moving to different sites in genomic DNA.

Both are retrotransposons—their transposition is mediated by reverse transcription.

Figure 6.11 Movement of retrotransposons

Retrovirus-like elements also move within the genome by reverse transcription.

DNA transposons move through the genome by being copied and reinserted as DNA sequences.

Our understanding of retrotransposons derives from studies of retroviruses.

Some retrotransposons are structurally similar to retroviruses. They encode reverse transcriptase and integrase and can move to new chromosomal sites, but cannot move from cell to cell.

LINEs also encode reverse transcriptase, but SINEs do not.

Transposition of SINEs and LINEs induces mutations and is associated with several diseases, such as hemophilia, cystic fibrosis, muscular dystrophy, and cancers.

Some mutations resulting from transposable elements may be beneficial, contributing to evolution of the species.

Example: LINEs can integrate into active genes; the associated transposition of cellular DNA sequences can lead to new combinations of regulatory and/or coding sequences.

Retrotransposons have also shaped the genome through recombination between repetitive sequences.

Rearrangements of chromosomal DNA resulting from recombination between LINEs integrated at different sites in the genome can lead to formation of new genes.

Figure 6.12 Recombination between repetitive sequences

Transposable elements occur within the majority of lncRNAs.

SINE sequences in some lncRNAs have been found to directly contribute to their function in gene regulation.

Many eukaryote genes are present in multiple copies.

Multiple copies of some genes are needed to produce RNAs or proteins in large quantities, such as ribosomal RNAs or histones.

Gene family: a group of related genes, which may be transcribed in different tissues or at different stages of development.

α and β subunits of hemoglobin are encoded by gene families; different members of these families are expressed in embryonic, fetal, and adult tissues.

Figure 6.13 Globin gene families

Gene families may have arisen by duplication of an ancestral gene, followed by mutation and divergence.

This resulted in evolution of proteins optimized for different functions (e.g., fetal globins have a higher affinity for O₂ than do adult globins).

Some mutations result in loss of function.

Pseudogenes are nonfunctional gene copies that increase genome size.

There are about 11,000 pseudogenes in the human genome.

Gene duplication can occur in two ways:

• Duplication of a segment of DNA results in transfer of a block of DNA to a new location in the genome.

Duplication of the entire genome is common in plants. Arabidopsis has probably undergone two full duplications.

• Duplication by reverse transcription of an mRNA, followed by integration of the cDNA copy into a new site on a chromosome (retrotransposition).

Result: An inactive gene copy, or processed pseudogene, which lacks introns and the normal sequences that direct transcription.

Figure 6.14 Formation of a processed pseudogene

Some retrotransposed genes remain functional.

Example: Short legs characteristic of some dog breeds result from retrotransposition of a gene that inhibits bone growth.

The transposed retrogene is abnormally expressed, resulting in premature termination of bone growth.

Figure 6.15 Transposition of a retrogene determines short legs in dog breeds

Chromosomes and Chromatin

Prokaryotes have a single chromosome, usually a circular DNA molecule.

Eukaryotes have multiple chromosomes, each containing a linear DNA molecule.

The basic structure of chromosomes is the same in all eukaryotes, but their number and size vary among species.

Table 6.4 Chromosome Numbers of Eukaryotic Cells

Eukaryote DNA is tightly bound to small proteins (histones) that package the DNA in the nucleus.

The total extended length of human DNA is nearly 2 meters, but it must fit into a nucleus with a diameter of 5 to 10 μm.

Chromatin is a complex of eukaryotic DNA and proteins.

The histones have a high proportion of basic amino acids (arginine and lysine) that facilitate binding to negatively charged DNA.

The major types of histones are very similar in all eukaryotes.

Table 6.5 The Major Histone Proteins

Histones are extremely abundant in eukaryotic cells.

Chromatin also contains many other proteins involved in activities such as replication and gene expression.

The nucleosome is the basic structural unit of chromatin.

Kornberg proposed a nucleosome model in 1974.

Evidence from nuclease digestion of chromatin, gel electrophoresis, and electron microscopy showed that chromatin is composed of repeating units of about 200 base pairs.

Figure 6.16 The organization of chromatin in nucleosomes (Part 1)

Figure 6.16 The organization of chromatin in nucleosomes (Part 2)

Figure 6.16 The organization of chromatin in nucleosomes (Part 3)

In each nucleosome core particle, 147 base pairs of DNA are wrapped around a histone core.

Histone H1 is bound to DNA where it enters the core particle.

This forms a chromatosome.

Figure 6.17 Structure of a chromatosome (Part 1)

Figure 6.17 Structure of a chromatosome (Part 2)
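
The two numbers given above, a repeat of about 200 base pairs and 147 base pairs wrapped around each histone core, allow some quick arithmetic. The sketch below assumes an evenly spaced 200-bp repeat, which is an idealization, and uses a 56-kb stretch of DNA purely as an example length.

```python
REPEAT_BP = 200  # from the text: chromatin repeats about every 200 base pairs
CORE_BP = 147    # from the text: DNA wrapped around one histone core

# DNA per repeat that is not wrapped around the core (linker DNA).
linker_bp = REPEAT_BP - CORE_BP


def nucleosome_count(dna_bp: int) -> int:
    """Number of complete 200-bp repeats that fit in the given DNA length."""
    return dna_bp // REPEAT_BP


example_bp = 56_000  # example length only
print(f"linker per repeat: {linker_bp} bp")
print(f"fraction wrapped on cores: {CORE_BP / REPEAT_BP:.1%}")
print(f"nucleosomes on {example_bp} bp: {nucleosome_count(example_bp)}")
```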

Packaging of DNA with histones yields a chromatin fiber approximately 10 nm in diameter, which shortens its length about sixfold.

It is further condensed by coiling into 30-nm fibers, resulting in a total condensation of about fiftyfold.

Figure 6.18 Chromatin fibers (Part 1)

Figure 6.18 Chromatin fibers (Part 2)

Figure 6.18 Chromatin fibers (Part 3)

The extent of chromatin condensation varies during the life cycle of the cell.

In interphase (nondividing) cells, most of the chromatin (called euchromatin) is relatively decondensed and distributed throughout the nucleus.

Genes that are actively transcribed are in a more decondensed state.

Figure 6.19 Interphase chromatin

About 10% of interphase chromatin (heterochromatin) is highly condensed and resembles the chromatin of cells undergoing mitosis.

Heterochromatin is transcriptionally inactive and contains highly repeated DNA sequences.

As cells enter mitosis, the chromosomes become highly condensed.

At metaphase, the DNA has been condensed nearly ten-thousandfold.

No transcription occurs during mitosis.

Figure 6.20 Chromatin condensation during mitosis
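
The condensation factors quoted in this section (about sixfold for the 10-nm fiber, about fiftyfold for the 30-nm fiber, and nearly ten-thousandfold at metaphase) can be applied to the roughly 2 meters of DNA in a human cell. The sketch below simply divides the extended length by each factor; the results are totals for all of the cell's DNA, not per chromosome.

```python
EXTENDED_LENGTH_M = 2.0  # from the text: nearly 2 meters of DNA per human cell

# Approximate fold condensation at each packaging level, as stated in the text.
CONDENSATION = {
    "10-nm fiber": 6,
    "30-nm fiber": 50,
    "metaphase chromosome": 10_000,
}


def condensed_length_m(extended_m: float, fold: float) -> float:
    """Length remaining after condensing by the given factor."""
    return extended_m / fold


lengths = {
    level: condensed_length_m(EXTENDED_LENGTH_M, fold)
    for level, fold in CONDENSATION.items()
}

for level, length in lengths.items():
    print(f"{level}: {length * 1e6:,.0f} micrometers in total")
```

The totals come to roughly 33 cm, 4 cm, and 200 μm, respectively.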

Electron micrographs show that DNA in metaphase chromosomes is organized into large loops attached to a protein scaffold.

However, the detailed structure of metaphase chromosomes and the mechanism of chromatin condensation are not currently understood.

Figure 6.21 Structure of metaphase chromosomes

Metaphase chromosome morphology can be studied with a light microscope.

Staining yields characteristic patterns of light and dark bands, resulting from preferential binding of stains to AT-rich versus GC-rich DNA sequences.

Genes can be located on specific bands by in situ hybridization.

Figure 6.22 Human metaphase chromosomes

Centromere: specialized region that ensures correct distribution of duplicated chromosomes to daughter cells during mitosis.

DNA is replicated during interphase, resulting in two identical sister chromatids.

During metaphase, chromatids are held together at the centromere.

Figure 6.23 Chromosomes during mitosis

Microtubules of the mitotic spindle attach to the centromere, and the sister chromatids separate and move to opposite poles.

The nuclear membrane re-forms, and the chromosomes decondense.

Each daughter nucleus contains one copy of each parental chromosome.

Centromeres are DNA sequences to which proteins bind, forming a kinetochore.

Spindle microtubules bind to the kinetochore.

Proteins associated with the kinetochore act as “molecular motors” to drive the movement of chromosomes along the spindle fibers.

Figure 6.24 The centromere of a metaphase chromosome

Centromeric sequences were initially defined in yeasts, by studying plasmid segregation.

Plasmids with functional centromeres segregate like chromosomes and are equally distributed to daughter cells.

Plasmids without centromeres do not segregate properly.

Figure 6.25 Assay of a centromere in yeast

These assays have allowed determination of sequences required for centromere function.

Centromere sequences vary considerably in different organisms.

Figure 6.26 Centromeric DNA Sequences (Part 1)

Figure 6.26 Centromeric DNA Sequences (Part 2)

Figure 6.26 Centromeric DNA Sequences (Part 3)

Figure 6.26 Centromeric DNA Sequences (Part 4)

Chromatin at centromeres has a unique structure.

In centromeric nucleosomes, histone H3 is replaced by a variant called CENP-A, which is present at the centromeres of all eukaryotes that have been studied.

CENP-A-containing nucleosomes are required for assembly of the other kinetochore proteins.

Unique chromatin structure allows centromeres to be stably maintained at cell division.

This is an example of epigenetic inheritance—the transfer of information that is not based on DNA sequences. The information is carried by histones.

When chromosomal DNA replicates, the parental nucleosomes are distributed to the progeny strands.

These CENP-A-containing nucleosomes direct the assembly of new CENP-A-containing nucleosomes into chromatin.

Figure 6.27 Epigenetic inheritance of CENP-A

Telomeres: sequences at the ends of chromosomes; required for replication of linear DNA.

Sequences are similar among eukaryotes, with repeats containing clusters of G residues on one strand.

They are repeated hundreds or thousands of times and end with a 3′ overhang of single-stranded DNA.

Table 6.6 Telomeric DNAs
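
The tandem-repeat organization of telomeric DNA can be illustrated by counting repeat units at the end of a sequence. The sketch below uses the human telomere repeat TTAGGG as the default unit; the example chromosome end is hypothetical and far shorter than a real telomere.

```python
import re


def count_terminal_repeats(seq: str, unit: str = "TTAGGG") -> int:
    """Count uninterrupted tandem copies of the unit at the 3' end of seq."""
    match = re.search(f"(?:{re.escape(unit)})+$", seq)
    return len(match.group(0)) // len(unit) if match else 0


# Hypothetical chromosome end: unique sequence followed by five repeats.
chromosome_end = "ACGTTCGA" + "TTAGGG" * 5

repeat_count = count_terminal_repeats(chromosome_end)
print(repeat_count)  # 5
```

Each TTAGGG unit contains a cluster of three G residues, consistent with the G-rich strand described above.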

The telomere sequences of some organisms (including humans) form loops at the ends.

They bind a protein complex (shelterin) that protects the chromosome termini from degradation.

Figure 6.28 Structure of a telomere

The ends of linear chromosomes cannot be replicated by DNA polymerase.

Instead, telomerase, an enzyme with reverse transcriptase activity, replicates telomeric DNA sequences.

Maintenance of telomeres appears to be important in determining the lifespan and reproductive capacity of cells.

Studies of telomeres and telomerase may provide new insights into aging and cancer.