Bioinformatics

Presented by Frank H. Osborne, Ph.D.

© 2005

Bio 2900 Computer Applications in Biology

Bioinformatics

• Bioinformatics is the computational branch of molecular biology.

• It involves using computers in the analysis of DNA, RNA and protein sequences.

• It is part of a larger field of biology called Computational Biology.

Protein Synthesis

• Generally, we begin with DNA.

• DNA is transcribed to produce RNA.

• RNA is then translated to produce protein.

• The protein is the result of the expression of a gene.

Amino Acids

• Proteins are made of amino acids. About 20 are generally used in protein molecules.

• A set of three-letter abbreviations is used for the amino acids in biochemistry.

• The International Union of Pure and Applied Chemistry (IUPAC) has created one-letter abbreviations to ease work in bioinformatics.
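• As a minimal sketch, the two notations can be related with a lookup table. The mapping holds the 20 standard amino acids; the function name to_one_letter and the hyphen-separated input format are assumptions made for illustration.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(peptide: str) -> str:
    """Convert a hyphen-separated three-letter peptide (hypothetical
    input format) into a string of one-letter codes."""
    return "".join(THREE_TO_ONE[residue] for residue in peptide.split("-"))


print(to_one_letter("Met-Ala-Trp-Lys"))  # MAWK
```

The one-letter form is what sequence files and alignment tools normally store, since each residue then occupies exactly one character.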

Amino Acid Table

Additional Amino Acid Codes

• Additionally, IUPAC recognizes other code letters for special situations.

• There are an additional four codes that may be used.

Additional Amino Acid Code Table

DNA

• Deoxyribonucleic acid (DNA) is made up of purine bases (adenine and guanine) and pyrimidine bases (cytosine and thymine). Bases are part of nucleotides, which are built on the sugar deoxyribose. Nucleotides are joined by a condensation reaction that links the 5' phosphate of one nucleotide to the 3' OH of the next.

DNA

• For DNA sequences, the IUPAC has established the one-letter codes shown below.
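• A minimal sketch of how the one-letter DNA codes can be used in a program. The dictionary lists the standard IUPAC nucleotide codes, including the ambiguity codes; the helper names iupac_to_regex and find_sites are assumptions made for illustration.

```python
import re

# IUPAC one-letter DNA codes and the bases each one stands for.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(pattern: str) -> str:
    """Turn an IUPAC pattern into a regular expression string."""
    parts = []
    for code in pattern.upper():
        bases = IUPAC_DNA[code]
        # Single bases stay literal; ambiguity codes become a character class.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_sites(pattern: str, sequence: str) -> list:
    """Return 0-based start positions of non-overlapping matches."""
    regex = re.compile(iupac_to_regex(pattern))
    return [match.start() for match in regex.finditer(sequence.upper())]


print(iupac_to_regex("GAYC"))                 # GA[CT]C
print(find_sites("GAYC", "TTGATCAAGACCGG"))   # [2, 8]
```

Here Y stands for either pyrimidine (C or T), so the pattern GAYC matches both GATC and GACC.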

RNA

• The IUPAC one-letter codes for RNA are shown below.

Gene structure

• A gene is a sequence of bases of DNA. It begins at a location known as a promoter and ends at another location called the terminator.

Gene expression

• Genes are expressed by transcription and translation of DNA. DNA is first transcribed to make messenger RNA. The genetic code of the messenger RNA is translated into protein.

RNA polymerase

• Transcription uses DNA-dependent RNA polymerase. The RNA polymerase holoenzyme consists of a core enzyme of four polypeptides and another factor called σ factor.

• Core enzyme = α2ββ′
– α: 2 identical subunits
– β, β′: similar but different proteins

• Holoenzyme = core enzyme + σ factor

• There are different types of promoters that are recognized by different σ factors.

Transcription

• Transcription consists of three stages called initiation, elongation and termination. Note that these are not the same as initiation, elongation and termination of protein synthesis, which make up the process of translation.

Stages of transcription

• Initiation
– RNA polymerase attaches to the promoter. An open complex forms.

• Elongation
– RNA polymerase moves along the DNA molecule making a molecule of RNA as it travels.

• Termination
– RNA polymerase reaches the terminator. The RNA is released.
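• The elongation step can be sketched in a few lines. This is a simplified model that ignores promoters and terminators and only shows how the RNA sequence relates to the DNA; the function names and the coding/template strand inputs are assumptions made for illustration.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe_coding(coding_strand: str) -> str:
    """mRNA has the same sequence as the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def transcribe_template(template_strand: str) -> str:
    """Build the mRNA from a template strand written 5' to 3'.

    The polymerase reads the template 3' to 5', so the mRNA is the
    reverse complement of the template, with U in place of T.
    """
    reverse_complement = "".join(
        COMPLEMENT[base] for base in reversed(template_strand.upper())
    )
    return reverse_complement.replace("T", "U")


print(transcribe_coding("ATGGCC"))    # AUGGCC
print(transcribe_template("GGCCAT"))  # AUGGCC
```

Both calls give the same mRNA because GGCCAT is the template strand that pairs with the coding strand ATGGCC.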

Translation

• The mRNA molecule is translated into protein using the standard genetic code. There are some exceptions, especially during protein synthesis in mitochondria.
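• A minimal sketch of translation with the standard genetic code. The codon table is built from the conventional UCAG ordering; the function name translate and the choice to start at the first AUG are assumptions made for illustration. The second table shows only one well-known mitochondrial exception (UGA read as tryptophan in vertebrate mitochondria), not the complete mitochondrial code.

```python
from itertools import product

BASES = "UCAG"
# Amino acids for all 64 codons in UCAG order (first base varies slowest);
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Illustrative variant: a single mitochondrial difference, not the full table.
MITO_TABLE = {**CODON_TABLE, "UGA": "W"}


def translate(mrna: str, table: dict = CODON_TABLE) -> str:
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCUUGGUAAGC"))       # MAW
print(translate("AUGUGAUAA"))              # M
print(translate("AUGUGAUAA", MITO_TABLE))  # MW
```

Passing the table as a parameter keeps the reading logic the same while letting the code vary between genetic systems.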

Stages of translation

• Initiation
– Ribosomes bind to the ribosome-binding site on the mRNA molecule, known as the Shine-Dalgarno sequence, which lies just upstream of the AUG start codon.

• Elongation
– Transfer RNA brings each amino acid to the aminoacyl site according to the specified codons.

• Termination
– The completed protein is released from the peptidyl site.
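• The initiation step suggests a simple way to look for likely start codons in a bacterial mRNA: find a Shine-Dalgarno sequence and then an AUG a short distance downstream. This is a rough sketch, not a gene finder. It assumes an exact match to the consensus AGGAGG and a spacer of 5 to 10 bases; real ribosome-binding sites vary, and the function name and parameters are hypothetical.

```python
def find_start_codons(mrna: str, sd: str = "AGGAGG",
                      min_gap: int = 5, max_gap: int = 10) -> list:
    """Return 0-based positions of AUG codons that follow a Shine-Dalgarno
    match by min_gap..max_gap bases (assumed spacing, for illustration)."""
    mrna = mrna.upper()
    starts = []
    position = mrna.find(sd)
    while position != -1:
        sd_end = position + len(sd)
        for gap in range(min_gap, max_gap + 1):
            candidate = sd_end + gap
            if mrna[candidate:candidate + 3] == "AUG":
                starts.append(candidate)
                break  # keep only the nearest AUG for this site
        position = mrna.find(sd, position + 1)
    return starts


# UU + AGGAGG + six-base spacer + AUG + GCU
print(find_start_codons("UUAGGAGGUAACACAUGGCU"))  # [14]
print(find_start_codons("UUUUAUGGCU"))            # []
```

The second call returns nothing because the AUG is not preceded by a Shine-Dalgarno match.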

Gene organization in Bacteria

• A cistron is a distinct region of DNA that codes for a particular polypeptide. The term is used in the context of a protein which is made up of several subunits, each of which is coded by a different gene.

• An operon is a common form of gene organization in bacteria: a group of adjacent cistrons that are transcribed together from a single promoter.

Genotypes and phenotypes

• The genotype is an actual gene in the chromosome. The phenotype is the observed effect of that gene.

• Genotypes are written in italic letters beginning with a lowercase letter. Phenotypes and gene products are written in ordinary, regular letters beginning with a capital letter. Thus, two of the tryptophan genes in E. coli would be trpA and trpB. When expressed, they produce polypeptides. The trpA gene produces the TrpA polypeptide and the trpB gene produces the TrpB polypeptide.
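• In plain text, where italics are not available, the capitalization of the first letter is what separates a gene name from its product. A minimal sketch of that naming rule follows; the function name protein_name is an assumption made for illustration.

```python
def protein_name(gene: str) -> str:
    """Derive the polypeptide name from a bacterial gene name by
    capitalizing the first letter (trpA -> TrpA)."""
    return gene[0].upper() + gene[1:]


for gene in ["trpA", "trpB", "lacZ"]:
    print(gene, "->", protein_name(gene))
# trpA -> TrpA
# trpB -> TrpB
# lacZ -> LacZ
```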

Regulation of gene expression

The lac operon

• The lac operon contains the genes necessary to utilize lactose. Lactose is a β-galactoside sugar in which galactose is joined to glucose by a β(1,4) linkage, as shown below.

Regulation of gene expression

Products of the lac operon

• The lac operon codes for three proteins, LacZ, LacY and LacA, which are directly involved in galactoside (lactose) utilization.
– LacZ - β-D-galactosidase (EC 3.2.1.23)
– LacY - galactoside permease (M protein)
– LacA - galactoside acetyltransferase (EC 2.3.1.18)

• The genes for these proteins lie adjacent to each other on the E. coli chromosome. They are preceded by a region of the chromosome responsible for the regulation of these genes.

Regulation of gene expression

Function of the lac operon

• lacI - gene for the lac repressor protein

• lacPi - promoter for lacI

• lacP - promoter for lac operon

• lacO - operator: binding site for the repressor

LacI is a repressor that binds to the operator (lacO) and prevents the operon from being transcribed. This type of control is known as transcriptional regulation.

Induction and repression

• When lactose is present, it induces the operon: the inducer (strictly, the lactose derivative allolactose) binds to the repressor and changes its shape, causing it to fall off the operator.

• When lactose is removed, the repressor goes back to its original shape and can bind to the operator again.

• Because the repressor binds to the operator rather than to the promoter, the RNA polymerase is said to be primed, meaning that it is ready to begin transcription as soon as the repressor comes off the operator.
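• The induction and repression logic above can be sketched as a small two-state model. This is a deliberate simplification that considers only the repressor and lactose and ignores any other layer of control; the class and attribute names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LacOperon:
    """Toy model of lac operon control by the LacI repressor."""
    lactose_present: bool = False

    @property
    def repressor_on_operator(self) -> bool:
        # With inducer bound, the repressor changes shape and falls off lacO.
        return not self.lactose_present

    @property
    def transcribed(self) -> bool:
        # The polymerase can proceed only when the operator is free.
        return not self.repressor_on_operator

    def products(self) -> list:
        """Proteins made in the current state."""
        return ["LacZ", "LacY", "LacA"] if self.transcribed else []


operon = LacOperon(lactose_present=False)
print(operon.products())  # []

operon.lactose_present = True
print(operon.products())  # ['LacZ', 'LacY', 'LacA']
```

Switching lactose_present back to False restores the repressed state, matching the reversible behaviour of the repressor.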

Structure of the lac operon

Gene Expression in Eukaryotes

• DNA in eukaryotic organisms is organized into chromosomes. The eukaryotic chromosome consists of DNA wound around proteins known as histones.

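• Much eukaryotic DNA has either no function or unknown function. Unlike in bacteria, where most of the genome codes for proteins, only a small fraction of eukaryotic DNA does so (about 10% by this estimate; in humans the figure is closer to 1 to 2%).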

Gene Expression in Eukaryotes

• Eukaryotic DNA has numerous repeated nucleotide sequences. Within a gene, the protein-coding regions are separated by non-coding regions.

• The non-coding regions are called introns.

• The coding sequences that are expressed as protein are called exons.
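• A minimal sketch of how exons and introns relate in a sequence: the introns are removed from the primary transcript and the exons are joined to give the mature mRNA. The coordinate convention (0-based, end-exclusive), the example sequence and the function names are assumptions made for illustration.

```python
def splice(pre_mrna: str, exons: list) -> str:
    """Join the exon segments, given as (start, end) pairs, in order."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


def introns_from_exons(exons: list) -> list:
    """Return the gaps between consecutive exons as (start, end) pairs."""
    ordered = sorted(exons)
    return [
        (previous_end, next_start)
        for (_, previous_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


# Made-up transcript: exon 1 + intron + exon 2
pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "UGGUAA"
exons = [(0, 6), (18, 24)]

print(introns_from_exons(exons))  # [(6, 18)]
print(splice(pre_mrna, exons))    # AUGGCCUGGUAA
```

Only the spliced sequence is read by the ribosome, which is why the exons alone determine the protein.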

Transcription in Eukaryotic Cells

The End