BiologyOverview.ppt


CS 6463: An overview of Molecular Biology 1

21st Century = Biotech Century

• Completion of human genome
• High-throughput microarray and similar devices
• Cloning
• Genetic engineering
• Computational power

Everyone is moving towards Biotech


Explosive growth of biological data

Biology is becoming more computationally intensive: high-throughput bioinformatics produces lots of data.

The Molecular Biology Database Collection: 2005 update. Small excerpt from the A's:
• AARSDB: Aminoacyl-tRNA synthetase sequences
• ABCdb: ABC transporters
• AceDB: C. elegans, S. pombe, and human sequences and genomic information
• ACTIVITY: Functional DNA/RNA site activity
• ALFRED: Allele frequencies and DNA polymorphisms


Opportunities for CS

Possibilities for CS contributions:
• Data integration problem
• Data extraction from literature (natural language processing)
• Database issues (including automation)
• Visualization
• Mining large complex data sets


Objective
• Introduction to basic molecular biology for computer science students, by a computer scientist.
• A survey of databases: NCBI, SwissProt, PDB, Transfac, …
• Introduction to computational techniques for analyzing genomics (and proteomics) data

Basic


Communication is important


Textbooks and course website

Required textbooks:
• Molecular Biology of the Cell (main text)
• Bioinformatics, Genomics and Proteomics (lab)
• Other material

References:
• Human Molecular Genetics (2nd edition available for free)
• Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations (The Morgan Kaufmann Series in Data Management Systems), by Ian H. Witten and Eibe Frank (paperback)
• Microarrays for an Integrative Genomics (Computational Molecular Biology), by Isaac S. Kohane, et al. (paperback)
• Molecular Biology Web Book

Course website: http://www.cs.utsa.edu/~kwek/cs6463f05.html


Intended Audience
• CS graduate students who have an interest in bioinformatics or want to explore it.
• Background assumed: high school biology.
• Not for students who want a filler class in between classes.

Every Tuesday from noon to 1pm, the Human Genome (HuGe) lab meets to discuss current bioinformatics issues. All are welcome, even if you are new to bioinformatics (provided you are taking this course).


Database Search


Course Organization

Overview of Molecular Biology (and project discussion)

Molecular Biology:
• Introduction to Cell: 1. Cells and Genomes; 2. Cell Chemistry and Biosynthesis; 3. Proteins
• Basic Genetic Mechanisms: 4. DNA and Chromosomes; 6. From DNA to Protein; 7. Control of Gene Expression
• Diseases: 23. Cancer; 25. Pathogens
• Others: SNP, RNAi

Bioinformatics/Computational Biology:
• Databases
• Data preprocessing, classification problem, clustering problem, microarray analysis
• Sequence alignment, Hidden Markov Model
• Gene finding, motif finding


Project

Grade distribution:
• 1 quiz – 10%
• 2 tests – 30%
• Homework and lab – 10%
• Project – 50% (+ 10% bonus)

Project:
• Serious about bioinformatics (all HuGe Lab members): mini (NIH-)proposal project. Besides preliminary results, include a proposal for future work (i.e. independent studies, theses). Possible collaborations with UTHSCSA and others.
  • Specific Aim(s): What do you want to do? Why is it important?
  • Background: What has been done previously? (What makes your approach interesting?) Where do you get your data?
  • (Preliminary) Result: To elaborate later.
  • Future Work: To elaborate later.
• A project: Same as above, except that future work is not required.

Office hours (for projects): By appointment (send me an email 24 hours before). Tu, Th 10-3, 5-7, 8:30-10. W 10:30-noon.


Some Important Dates
• September 13: Quiz 1 (there will be a second-chance quiz)
• September 20: Specific aim of project due. [1 meeting to discuss with me]
• October 27: Test 1
• October 18: Background of project due. (You must have already started doing experiments.) [2 meetings to discuss with me]
• November 24: Test 2
• December 10: Final report of project. [2 meetings to discuss with me]

IMPORTANT: If you do not meet with me the required number of times, I will not accept your report. Also, meetings should be at least one week apart.


Your Responsibility

• Read the assigned reading once the material is covered in lecture. Lecture is to make your reading easier.

• Try printing out the slides to take notes.

• Project: Observe the deadline!!!! Come and talk to me.


A. An overview of molecular biology

Read Human Molecular Genetics Ch. 1
A.1. Background
A.2. Macromolecules
A.3. DNA structure
A.4. RNA transcription and Gene Expression
A.5. RNA processing
A.6. Translation, post-translation processing and protein structure
A.7. Project ideas


A.1 Background: Prokaryotic and Eukaryotic Cells

Two types of cells:
1. Prokaryotic (bacteria, e.g. E. coli)
2. Eukaryotic (multicellular organisms, amoebae)


A.1 Background: Prokaryotic and Eukaryotic Cells

http://www-class.unl.edu/bios201a/spring97/group6/


A.2. Building Blocks: Chemical Composition of Eukaryotic Cell

Component                                    E. coli            Mammalian cell
Water                                        70%                70%
Macromolecules:
  DNA (deoxyribonucleic acid)                1%                 0.25%
  RNA (ribonucleic acid)                     6%                 1.1%
  Proteins                                   15%                18%
Inorganic ions (Na+, K+, Mg2+, Ca2+, Cl-)    1%                 1%
Lipids:
  Phospholipids                              2%                 3%
  Other lipids                               -                  0.2%
Polysaccharides                              1%                 0.25%

Volume                                       2 x 10^-12 cm^3    4 x 10^-9 cm^3
Relative volume                              1                  2000

(Relative volume: 4 x 10^-9 / 2 x 10^-12 = 2000.)


A.2 Building Blocks: Structure of bases, nucleosides and nucleotides

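DNA: ‘polymer of A, G, T, C’
RNA: ‘polymer of A, G, U (replaces T), C’

[Figure: structure of a nucleotide, with the sugar and the base labeled; the bases are grouped into purines (A, G) and pyrimidines (C, T, U)]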


A.2. Building Blocks: Common bases found in nucleic acids


A.2 Building Blocks: 20 amino acids

Polypeptides: chains of amino acids

[Figure labels: amino group, carboxyl group]


A.2. Building Blocks: Abbreviation of Amino Acids

Name            Abbreviation   Linear structure
Alanine         ala   A        CH3-CH(NH2)-COOH
Arginine        arg   R        HN=C(NH2)-NH-(CH2)3-CH(NH2)-COOH
Asparagine      asn   N        H2N-CO-CH2-CH(NH2)-COOH
Aspartic Acid   asp   D        HOOC-CH2-CH(NH2)-COOH
Cysteine        cys   C        HS-CH2-CH(NH2)-COOH
Glutamic Acid   glu   E        HOOC-(CH2)2-CH(NH2)-COOH
Glutamine       gln   Q        H2N-CO-(CH2)2-CH(NH2)-COOH
Glycine         gly   G        NH2-CH2-COOH
Histidine       his   H        NH-CH=N-CH=C-CH2-CH(NH2)-COOH
Isoleucine      ile   I        CH3-CH2-CH(CH3)-CH(NH2)-COOH
Leucine         leu   L        (CH3)2-CH-CH2-CH(NH2)-COOH
Lysine          lys   K        H2N-(CH2)4-CH(NH2)-COOH
Methionine      met   M        CH3-S-(CH2)2-CH(NH2)-COOH
Phenylalanine   phe   F        Ph-CH2-CH(NH2)-COOH
Proline         pro   P        NH-(CH2)3-CH-COOH
Serine          ser   S        HO-CH2-CH(NH2)-COOH
Threonine       thr   T        CH3-CH(OH)-CH(NH2)-COOH
Tryptophan      trp   W        Ph-NH-CH=C-CH2-CH(NH2)-COOH
Tyrosine        tyr   Y        HO-Ph-CH2-CH(NH2)-COOH
Valine          val   V        (CH3)2-CH-CH(NH2)-COOH
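Since protein databases store sequences in the one-letter code, the abbreviation table above can be turned into a small lookup. The following is a minimal sketch only: the names THREE_TO_ONE and to_one_letter, and the "-" separator, are assumptions made for illustration and are not part of the course material.

```python
# Three-letter to one-letter amino acid codes, copied from the table above.
THREE_TO_ONE = {
    "ala": "A", "arg": "R", "asn": "N", "asp": "D", "cys": "C",
    "glu": "E", "gln": "Q", "gly": "G", "his": "H", "ile": "I",
    "leu": "L", "lys": "K", "met": "M", "phe": "F", "pro": "P",
    "ser": "S", "thr": "T", "trp": "W", "tyr": "Y", "val": "V",
}


def to_one_letter(three_letter_seq, sep="-"):
    """Convert e.g. 'Met-Ala-Trp' into the one-letter string 'MAW'."""
    residues = three_letter_seq.split(sep)
    return "".join(THREE_TO_ONE[r.strip().lower()] for r in residues)


print(to_one_letter("Met-Ala-Trp"))  # MAW
```

An unknown abbreviation raises a KeyError, which is usually what you want when cleaning data by hand.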


A.2. Building blocks: Properties of Amino Acids I

http://www.russell.embl-heidelberg.de/aas/aas.html


A.2. Building blocks: Some Terms for describing Properties of Amino Acids

Hydrophobic amino acids are those with side-chains that do not like to reside in an aqueous (i.e. water) environment.

Polar amino acids are those with side-chains that prefer to reside in an aqueous (i.e. water) environment.

Strictly speaking, aliphatic implies that the protein side chain contains only carbon or hydrogen atoms.

A side chain is aromatic when it contains an aromatic ring system.


A.2 Building Blocks: Covalent and Non-covalent Bonds

Covalent bonds: stronger. Nucleic acid and protein polymers are formed by covalent bonds connecting nucleotides and amino acids (respectively) into a linear backbone.

Non-covalent bonds: weaker and reversible. 4 types:

1. Hydrogen bonds: N–H–O [double-stranded DNA, protein folding, etc.]

2. Ionic bonds: ionic interaction between charged groups, say Na+ and Cl-

3. Van der Waals: Optimum attraction between two atoms.

4. Hydrophobic forces: Water is a polar molecule, so nonpolar (hydrophobic) groups tend to cluster together away from it.


A.3 DNA Structure: The Phosphodiester Bond


A.3 DNA Structure: base pairing (Watson-Crick Rule).


A.3 DNA Structure: DNA is a double-stranded anti-parallel helix

http://www.sumanasinc.com/webcontent/anisamples/molecularbiology/DNA_structure.html

[Figure labels: upstream, downstream, complementary DNA (cDNA)]

If %GC = 40%, what percentage is G? C? A? T?
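One way to work this out: in double-stranded DNA every G pairs with a C and every A pairs with a T (the Watson-Crick rule), so G = C and A = T. The sketch below assumes exactly that and nothing more; the function name base_percentages is made up for illustration.

```python
def base_percentages(gc_percent):
    """Per-base percentages in double-stranded DNA, given the total %GC.

    Assumes Watson-Crick pairing, so G = C and A = T.
    """
    if not 0 <= gc_percent <= 100:
        raise ValueError("gc_percent must be between 0 and 100")
    g = c = gc_percent / 2          # G and C split the GC fraction evenly
    a = t = (100 - gc_percent) / 2  # A and T split the remainder evenly
    return {"A": a, "C": c, "G": g, "T": t}


print(base_percentages(40))  # {'A': 30.0, 'C': 20.0, 'G': 20.0, 'T': 30.0}
```

The same reasoning does not hold for single-stranded molecules, where G and C need not be equal.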


A.3 DNA Structure: DNA is a double-stranded anti-parallel helix


A.3 DNA Structure: RNA structure

palindrome
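The slide only names the term, so the following rests on an assumption: that "palindrome" is meant in the usual molecular biology sense of a sequence that equals its own reverse complement (the two anti-parallel strands read the same 5’ to 3’). A minimal sketch for DNA, with illustrative function names; for RNA, replace T with U in the translation table.

```python
# Watson-Crick complement for DNA: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_reverse_palindrome(seq):
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


print(reverse_complement("ATGC"))         # GCAT
print(is_reverse_palindrome("GAATTC"))    # True
print(is_reverse_palindrome("GATTACA"))   # False
```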


A.3 DNA Structure: Viral Genomes

Highly variable:
• DNA or RNA
• Single-stranded or double-stranded
• Linear or circular
• Segmented and multipartite

Viruses normally replicate in the cytosol. Retroviruses are unusual: they replicate in the nucleus (using reverse transcriptase).


A.3 DNA Structure: The Central Dogma

Old 1-directional model


A.4 Transcription and Gene Expression: Transcription

[Figure: transcription of a gene into pre-mRNA inside the nucleus.
DNA (5’ to 3’): promoter and TFBS, 5’ UTR, start, exon, intron, exon, intron, exon, stop, 3’ UTR
Pre-mRNA (complementary nucleotides): cap, 5’ UTR, start, exon, intron, exon, intron, exon, stop, 3’ UTR, poly A
Other labels: nuclear membrane, pore; (1st key); (2nd key, may not be there); (almost always there); (mostly for non-housekeeping genes)]

TFBS – Transcription factor binding site


A.4 Transcription and Gene Expression: Gene Regulation

[Figure: template DNA bases A, G, T, C pair with RNA bases U, C, A, G]

http://henge.bio.miami.edu/mallery/movies/transcription.mov

http://www-class.unl.edu/biochem/gp2/m_biology/animation/gene/gene_a2.html
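The pairing on this slide (DNA bases A, G, T, C against RNA bases U, C, A, G) can be written as a one-line translation table. A minimal sketch, assuming the template strand is given 3’ to 5’ so that the RNA comes out 5’ to 3’; the function name transcribe is hypothetical.

```python
# Template DNA base -> RNA base, as on the slide: A->U, G->C, T->A, C->G.
PAIRING = str.maketrans("AGTC", "UCAG")


def transcribe(template_3_to_5):
    """RNA (5'->3') synthesized against a template strand given 3'->5'."""
    return template_3_to_5.upper().translate(PAIRING)


print(transcribe("AGTC"))    # UCAG
print(transcribe("TACCGA"))  # AUGGCU
```

If the template is stored 5’ to 3’ instead, reverse it first (template[::-1]) before calling transcribe.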


A.4 Transcription and Gene Expression: RNA Polymerase

There are three classes of RNA polymerases:
• Polymerase I: Localized in the nucleolus. Transcribes rRNA (ribosomal RNA): 28S, 18S, 5.8S rRNA.
• Polymerase II: All protein-coding genes and most snRNAs. Unique in capping and polyadenylation.
• Polymerase III: tRNA, other rRNAs, snRNAs. [The promoter can be downstream]

Pseudogenes (gene fragments): sequences that were previously genes.

Only 2% of the human genome encodes proteins.


A.4 Transcription and Gene Expression: Trans- and cis-elements

Cis-element                        DNA sequence          Trans-acting factor
GC box                             GGGCGG                Sp1
TATA box                           TATAA                 TFIID (TFIIA stabilizes it)
CAAT box                           CCAAT                 Many
TRE                                GTGAGT(A/C)A          AP-1 family (many)
CRE (cAMP response element)        GTGACGT(A/C)A(A/G)    CREB/ATF family

Important: If the pattern is there, it does not necessarily mean it is a cis-element.
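To make the warning concrete, here is a minimal sketch that scans a sequence for the consensus patterns in the table, writing (A/C) as the regular-expression class [AC]. The names CIS_ELEMENTS and find_candidates are made up for illustration, and the test sequence is a toy string, not real data. Every hit is only a candidate: short patterns such as TATAA occur by chance all over a genome.

```python
import re

# Consensus patterns from the table; (A/C) is written as [AC].
CIS_ELEMENTS = {
    "GC box": "GGGCGG",
    "TATA box": "TATAA",
    "CAAT box": "CCAAT",
    "TRE": "GTGAGT[AC]A",
    "CRE": "GTGACGT[AC]A[AG]",
}


def find_candidates(seq):
    """Return sorted (1-based position, element name, matched text) tuples."""
    seq = seq.upper()
    hits = []
    for name, pattern in CIS_ELEMENTS.items():
        # A lookahead reports overlapping occurrences as well.
        for m in re.finditer(f"(?=({pattern}))", seq):
            hits.append((m.start() + 1, name, m.group(1)))
    return sorted(hits)


for hit in find_candidates("AAGGGCGGTTTATAAGC"):
    print(hit)
# (3, 'GC box', 'GGGCGG')
# (11, 'TATA box', 'TATAA')
```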


A.4 Transcription and Gene Expression: Promoters

Position numbering starts from +1 (the transcription start site), not 0.
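For programmers used to 0-based string indices, this convention is easy to get wrong. The sketch below assumes the usual reading of the note: the transcription start site (TSS) is +1, the base just upstream is -1, and there is no position 0. The function name promoter_coordinate and its arguments are hypothetical.

```python
def promoter_coordinate(index, tss_index):
    """Map a 0-based string index to a coordinate relative to the TSS.

    The TSS itself is +1, the base just upstream is -1; 0 is never returned.
    """
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset


# With the TSS at string index 10:
print(promoter_coordinate(10, 10))  # 1
print(promoter_coordinate(9, 10))   # -1
print(promoter_coordinate(0, 10))   # -10
```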


A.4 Transcription and Gene Expression: Enhancers and Silencers (Transcription Factors)

Many base pairs away


A.4 Transcription and Gene Expression: Tissue Specific Genes

Housekeeping genes: genes encoding histone proteins, ribosomal proteins. Always on.

Tissue- or development-specific (non-housekeeping) genes, and what keeps them off:
• Transcriptionally inactive chromatin
• Methylation of cytosine, replacing a hydrogen (H) with a methyl group (CH3)
• Low expression levels of transcription factors

Microarrays measure the expression levels of genes.


A.4 Transcription and Gene Expression: Transcription

[Figure: from gene to mRNA.
DNA (5’ to 3’): promoter and TFBS, 5’ UTR, start, exon, intron, exon, intron, exon, stop, 3’ UTR
Pre-mRNA (complementary nucleotides): cap, 5’ UTR, start, exon, intron, exon, intron, exon, stop, 3’ UTR, poly A
Messenger RNA (mRNA): cap, 5’ UTR, start, exon, exon, exon, stop, 3’ UTR, poly A
Other labels: nuclear membrane, pore; (1st key); (2nd key, may not be there); (almost always there); (mostly for non-housekeeping genes)]

Splicing the introns: http://www.sumanasinc.com/webcontent/anisamples/molecularbiology/mRNAsplicing.html


A.5 RNA Processing: RNA Splicing

[Figure labels: donor, acceptor]

GT-AG spliceosome
AT-AC spliceosome (rare)
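As a rough illustration of the GT-AG rule (GU-AG at the RNA level), the sketch below removes introns from a pre-mRNA string and checks that an intron begins with GU and ends with AG. It assumes the intron boundaries are already known and given as 0-based, half-open (start, end) pairs; the function names and the toy sequence are invented for the example. Finding the boundaries in the first place is the hard part and is not attempted here.

```python
def splice(pre_mrna, introns):
    """Join the exons of pre_mrna, given introns as (start, end) pairs.

    Coordinates are 0-based and half-open, like Python slices.
    """
    exons = []
    pos = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[pos:start])  # exon before this intron
        pos = end                          # skip over the intron
    exons.append(pre_mrna[pos:])           # last exon
    return "".join(exons)


def follows_gu_ag_rule(pre_mrna, start, end):
    """True if the intron starts with GU (donor) and ends with AG (acceptor)."""
    intron = pre_mrna[start:end].upper()
    return intron.startswith("GU") and intron.endswith("AG")


# Toy pre-mRNA: exon + intron + exon.
pre = "AUGGCC" + "GUAAGUCCCAG" + "UUUUAA"
print(splice(pre, [(6, 17)]))           # AUGGCCUUUUAA
print(follows_gu_ag_rule(pre, 6, 17))   # True
```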


A.5 RNA Processing: Consensus Sequences at splice donor, acceptor and branch sites


A.5 RNA Processing: Mechanism of RNA Splicing (GU-AG introns)

Spliceosome (5 snRNAs)

http://www.nature.com/nrn/journal/v2/n1/animation/nrn0101_043a_swf_MEDIA1.html


A.5 RNA Processing: 5’ End Capping


A.5 RNA Processing: 3’ end polyadenylated.


A.5 RNA Processing: Functions of 5’ End Cap and Poly A tail

Functions of 5’ end cap

1. Protect mRNA molecules from degradation

2. Facilitate transport to cytoplasm

3. Facilitate RNA splicing

4. Facilitate translation

Function of 3’ end poly(A) tail

1. Facilitate transport to cytoplasm

2. Stabilize the mRNA in the cytoplasm

3. Facilitate translation


A.5 RNA Processing: Example of the human -globin gene


A.5 RNA Processing: Export out of the nucleus


A.6 Translation and Post-Translational Processing: The Codon-anticodon Recognition

http://henge.bio.miami.edu/mallery/movies/translation.mov

(almost always) tRNA


A.6 Translation and Post-Translational Processing: Peptide Bond Formation


A.6 Translation and Post-Translational Processing: The Genetic Codes

[Figure labels: N-terminal, C-terminal]


A.6 Translation and Post-Translational Processing: The Genetic Codes

[Figure labels: wobble; mitochondrial]

64 possible codons: 1 start codon (AUG), 3 stop codons, and 61 codons (including AUG) that encode the 20 amino acids.

Signals in mRNAs can lead to alternative interpretations of stop codons: UGA as the 21st amino acid, selenocysteine; UAG as the 22nd amino acid, pyrrolysine.
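The codon counts can be checked with a few lines of code. The sketch below builds the standard nuclear genetic code as a lookup table and translates an mRNA from its first AUG to the first stop codon. It deliberately ignores the exceptions mentioned on this slide (the mitochondrial code and the recoding of UGA and UAG); the names CODON_TABLE and translate are assumptions for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
# '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna):
    """Translate from the first AUG up to (not including) the first stop."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


stops = [codon for codon, aa in CODON_TABLE.items() if aa == "*"]
print(len(CODON_TABLE), sorted(stops))  # 64 ['UAA', 'UAG', 'UGA']
print(translate("GGAUGGCUUGGUAAUU"))    # MAW
```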


A.6 Translation and Post-Translational Processing: Multiple Post-Translational Cleavages of Polypeptide Precursors


A.6 Translation and Post-Translational Processing: Protein Secondary Structure


A.6 Translation and Post-Translational Processing: Quaternary

Amino acid sequence → secondary structure → tertiary structure

Amino acid sequence


A.6 Translation and Post-Translational Processing: Quaternary Structure


A.6 Translation and Post-Translational Processing: Disulfide Bridges


A.6 Translation and Post-Translational Processing: Post-translational Modification

http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=hmg.table.103


A.6 Translation and Post-Translational Processing: Protein Sorting (Localization)

Protein destination                              (Typical) location and form of signal
Endoplasmic reticulum and secretion from cell    N-terminal peptide of 20 or so very hydrophobic amino acids
Mitochondria                                     N-terminal peptide, α-helix; one side hydrophilic and one side hydrophobic
Nucleus                                          Internal sequence of amino acids; often a string of basic amino acids plus prolines; may be bipartite
Lysosome                                         Addition of mannose 6-phosphate residues

Two kinds of sorting signal:
1. Signal peptide
2. Post-translational modification


A.6 Translation and Post-Translational Processing: Cellular Function of Proteins

Diverse cellular functions:
• Enzymes – ‘cut things into pieces’
• Receptors
• Transport
• Transcription factors
• Signaling
• Hormones
• Structural
• etc.


A.7 Summary: Central Dogma Simplified

Enzymes, Receptors,... etc


A.7 Summary: Don’t forget about mitochondria!


A.7 Summary: Life is more complex