62
Ulf Schmitz, Introduction to genomics and proteomics I 1 www. .uni-rostock.de Bioinformatics Bioinformatics Introduction to genomics and proteomics I Introduction to genomics and proteomics I Ulf Schmitz [email protected] Bioinformatics and Systems Biology Group www.sbi.informatik.uni-rostock.de

proteomics and genomics-1

Embed Size (px)

Citation preview

Page 1: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 1

www. .uni-rostock.de

BioinformaticsBioinformaticsIntroduction to genomics and proteomics IIntroduction to genomics and proteomics I

Ulf [email protected]

Bioinformatics and Systems Biology Groupwww.sbi.informatik.uni-rostock.de

Page 2: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 2

www. .uni-rostock.deOutline

Genomics/Genetics1. The tree of life

• Prokaryotic Genomes– Bacteria– Archaea

• Eukaryotic Genomes– Homo sapiens

2. Genes• Expression Data

Page 3: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 3

www. .uni-rostock.deGenomics - Definitions

Genetics: is the science of genes, heredity, and the variation of organisms. Humans began applying knowledge of genetics in prehistory with the domestication and breeding of plants and animals. In modern research, genetics provides tools in the investigation of the function of a particular gene, e.g. analysis of genetic interactions.

Genomics: attempts the study of large-scale genetic patterns across the genome for a given species. It deals with the systematic use of genome information to provide answers in biology, medicine, and industry.

Genomics has the potential of offering new therapeutic methods for the treatment of some diseases, as well as new diagnosticmethods.Major tools and methods related to genomics are bioinformatics, genetic analysis, measurement of gene expression, and determination of gene function.

Page 4: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 4

www. .uni-rostock.deGenes

• a gene coding for a protein corresponds to a sequence of nucleotides along one or more regions of a molecule of DNA

• in species with double stranded DNA (dsDNA), genes may appear on either strand

• bacterial genes are continuous regions of DNA

bacterium:• a string of 3N nucleotides encodes a string of N amino acids• or a string of N nucleotides encodes a structural RNA molecule of Nresidues

eukaryote:• a gene may appear split into separated segments in the DNA• an exon is a stretch of DNA retained in mRNA that the ribosomes translateinto protein

Page 5: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 5

www. .uni-rostock.deGenomics

Genome size comparison

4.1 million5,0001Bacterium(E. coli)

19,000

14,000

14,000

31,000

22.5-30,000

28-35,000

Genes

97 million12Roundworm(C. elegans)

137 million8Fruit Fly(Drosophila melanogaster)

289 million6Malaria mosquito(Anopheles gambiae)

365 million44Puffer fish(Fugu rubripes)

2.7 billion40Mouse(Mus musculus)

3.1 billion46(23 pairs)

Human(Homo sapiens)

Base pairsChrom.Species

Page 6: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 6

www. .uni-rostock.deGenes

exon:A section of DNA which carries the coding sequence for a protein or part of it. Exons are separated by intervening, non-coding sequences (called introns). In eukaryotes most genes consist of a number of exons.

exon:A section of DNA which carries the coding sequence for a protein or part of it. Exons are separated by intervening, non-coding sequences (called introns). In eukaryotes most genes consist of a number of exons.

intron:An intervening section of DNA which occurs almost exclusively within a eukaryotic gene, but which is not translated to amino-acid sequences in the gene product. The introns are removed from the pre-mature mRNA through a process called splicing, which leaves the exons untouched, to form an active mRNA.

intron:An intervening section of DNA which occurs almost exclusively within a eukaryotic gene, but which is not translated to amino-acid sequences in the gene product. The introns are removed from the pre-mature mRNA through a process called splicing, which leaves the exons untouched, to form an active mRNA.

Page 7: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 7

www. .uni-rostock.deGenes

exon intron

Globin gene – 1525 bp: 622 in exons, 893 in introns

Ovalbumin gene - ~ 7500 bp: 8 short exons comprising 1859 bp

Conalbumin gene - ~ 10,000 bp: 17 short exons comprising ~ 2,200 bp

Examples of the exon:intron mosaic of genes

Page 8: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 8

www. .uni-rostock.dePicking out genes in genomes

• Computer programs for genome analysis identify ORFs (open reading frames)

• An ORF begins with an initiation codon ATG (AUG)

• An ORF is a potential protein-coding region

• There are two approaches to identify protein coding regions…

Page 9: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 9

www. .uni-rostock.dePicking out genes in genomes

• Regions may encode amino acid sequences similar to known proteins• Or may be similar to ESTs (correspond to genes known to be

expressed)• Few hundred initial bases of cDNA are sequenced to identify a gene

1. Detection of regions similar to known coding regions from other organisms

2. Ab initio methods, seek to identify genes from the properties of the DNA sequence itself

• Bacterial genes are easy to identify, because they are contiguous• They have no introns and the space between genes is small• Identification of exons in higher organisms is a problem, assembling

them another…

Page 10: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 10

www. .uni-rostock.dePicking out genes in genomes

• The initial (5´) exon starts with a transcription start point, preceded by a core promoter site such as the TATA box (~30bp upstream)– Free of stop codons – End immediately before a GT splice-signal

Ab initio gene identification in eukaryotic genomes

binds and directs RNA polymerase to the correct transcriptional start site

Page 11: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 11

www. .uni-rostock.dePicking out genes in genomes

5' splice signal

3' splice signal

Page 12: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 12

www. .uni-rostock.dePicking out genes in genomes

• Internal exons are free of stop codons too– Begin after an AG splice signal– End before a GT splice signal

Ab initio gene identification in eukaryotic genomes

Page 13: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 13

www. .uni-rostock.dePicking out genes in genomes

• The final (3´) exon starts after a an AG splice signal– Ends with a stop codon (TAA,TAG,TGA)– Followed by a polyadenylation signal sequence

Ab initio gene identification in eukaryotic genomes

Page 14: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 14

www. .uni-rostock.deHumans havespliced genes…

Page 15: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 15

www. .uni-rostock.deDNA makes RNA makes Protein

Page 16: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 16

www. .uni-rostock.deTree of life

Pro

kary

otes

Page 17: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 17

www. .uni-rostock.deGenomics – Prokaryotes

• the genome of a prokaryote comes as a single double-stranded DNA molecule in ring-form

– in average 2mm long– whereas the cells diameter is only

0.001mm– < 5 Mb

• prokaryotic cells can have plasmids as well (see next slide)

• protein coding regions have no introns

• little non-coding DNA compared to eukaryotes

– in E.coli only 11%

Page 18: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 18

www. .uni-rostock.deGenomics - Plasmids• Plasmids are circular double stranded DNA molecules that are separate

from the chromosomal DNA.

• They usually occur in bacteria, sometimes in eukaryotic organisms

• Their size varies from 1 to 250 kilo base pairs (kbp). There are from one copy, for large plasmids, to hundreds of copies of the same plasmid present in a single cell.

Page 19: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 19

www. .uni-rostock.deProkaryotic model organisms

E.coli (Escherichia coli)

Methanococcus jannaschii (archaeon)

Mycoplasma genitalium(simplest organism known)

Page 20: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 20

www. .uni-rostock.deGenomics

• DNA of higher organisms is organized into chromosomes(human – 23 chromosome pairs)

• not all DNA codes for proteins

• on the other hand some genes exist in multiple copies

• that’s why from the genome size you can’t easily estimate the amount of protein sequence information

Page 21: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 21

www. .uni-rostock.deGenomes of eukaryotes

• majority of the DNA is in the nucleus, separated into bundles (chromosomes)– small amounts of DNA appear in organelles (mitochondria and

chloroplasts)• within single chromosomes gene families are common

– some family members are paralogues (related)• they have duplicated within the same genome• often diverged to provide separate functions in descendants

(Nachkommen)• e.g. human α and β globin

– orthologues genes• are homologues in different species• often perform the same function• e.g. human and horse myoglobin

– pseudogenes• lost their function• e.g. human globin gene cluster

pseudogene

Page 22: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 22

www. .uni-rostock.deEukaryotic model organisms

• Saccharomyces cerevisiae (baker’s yeast)

• Caenorhabditis elegans (C.elegans)

• Drosophila melanogaster (fruit fly)

• Arabidopsis thaliana (flower)

• Homo sapiens (human)

Page 23: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 23

www. .uni-rostock.deThe human genome

• ~3.2 x 109 bp (thirty time larger than C.elegans or D.melongaster)• coding sequences form only 5% of the human genome• Repeat sequences over 50%• Only ~32.000 genes• Human genome is distributed over 22 chromosome pairs plus X and

Y chromosomes• Exons of protein-coding genes are relatively small compared to

other known eukaryotic genomes• Introns are relatively long• Protein-coding genes span long stretches of DNA (dystrophin,

coding a 3.685 amino acid protein, is >2.4Mbp long)

• Average gene length: ~ 8,000 bp• Average of 5-6 exons/gene• Average exon length: ~200 bp• Average intron length: ~2,000 bp• ~8% genes have a single exon• Some exons can be as small as 1 or 3 bp.

Page 24: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 24

www. .uni-rostock.de

0.03Enzyme activator

20.62.92.55.31.8

3242457403839295

EnzymePeptidase

EndopeptidaseProtein kinaseProtein phosphatase

3.8603Defense/immunity protein

0.8129Actin binding

0.585Motor

0.9154Chaperone

0.475Cell Cycle regulator

0.06Transcription factor binding

14.010.50.20.06.22.40.80.2

22071656

457

98638013744

Nucleic acid bindingDNA binding

DNA repair proteinDNA replication factorTranscription factor

RNA bindingStructural protein of ribosomeTranslation factor

%NumberFunction

100.015683Total

30.64813Unclassified

0.05Tumor suppressor

9.70.20.3

15363350

Ligand binding or carrierElectron transfer

Cytochrome P450

4.31.70.1

68226919

TransporterIon channelNeurotransmitter transporter

4.50.9

714145

Structural proteinCytoskeletal structural protein

1.2189Cell adhesion

0.07Storage protein

11.48.47.63.10.0

17901318120248971

Signal transductionReceptorTransmembrane receptorG-protein link receptorOlfactory receptor

0.8132Apoptosis inhibitor

%NumberFunction

The human genomeThe human genomeTop categories in a function classification:

Page 25: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 25

www. .uni-rostock.deThe human genome

• Repeated sequences comprise over 50% of the genome:– Transposable elements, or interspersed repeats include LINEs and

SINEs (almost 50%)– Retroposed pseudogenes– Simple ‘stutters’ - repeats of short oligomers (minisatellites and

microsatellites)– Segment duplication, of blocks of ~10 - 300kb– Blocks of tandem repeats, including gene families

3300.00080-3000DNA Transposon fossils

8450.00015.000 -110.000Long Terminal Repeats

21850.0006000-8000Long Interspersed Nuclear Elements (LINEs)

131.500.000100-300Short Interspersed Nuclear Elements (SINEs)

Fraction of genome %

Copy numberSize (bp)Element

Page 26: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 26

www. .uni-rostock.deThe human genome

• All people are different, but the DNA of different people only varies for 0.2% or less.

• So, only up to 2 letters in 1000 are expected to be different.

• Evidence in current genomics studies (Single Nucleotide Polymorphisms or SNPs) imply that on average only 1 letter out of 1400 is different between individuals.

• means that 2 to 3 million letters would differ between individuals.

Page 27: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 27

www. .uni-rostock.de

TERTIARY STRUCTURE (fold)TERTIARY STRUCTURE (fold)

Genome

Expressome

Proteome

Metabolome

Functional GenomicsFrom gene to functionFrom gene to function

Page 28: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 28

www. .uni-rostock.deDNA makes RNA makes Protein:Expression data

• More copies of mRNA for a gene leads to more protein

• mRNA can now be measured for all the genes in a cell at ones through microarray technology

• Can have 60,000 spots (genes) on a single gene chip

• Color change gives intensity of gene expression (over- or under-expression)

Page 29: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 29

www. .uni-rostock.de

Page 30: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 30

www. .uni-rostock.deGenes and regulatory regions

regulatory mechanisms organize theexpression of genes

– genes may be turned on or off in response to concentrations of nutrients or to stress

– control regions often lie near the segments coding for proteins

– they can serve as binding sites for molecules that transcribe the DNA

– or they bind regulatory molecules that can block transcription

Page 31: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 31

www. .uni-rostock.deExpression data

Page 32: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 32

www. .uni-rostock.deOutlook – coming lecture

Proteomics– Proteins– post-translational modification– Key technologies

• Maps of hereditary information• SNPs (Single nucleotide polymorphisms)• Genetic diseases

Page 33: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics I 33

www. .uni-rostock.de

Thanks for your attention!

Page 34: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 1

www. .uni-rostock.de

BioinformaticsBioinformaticsIntroduction to genomics and proteomics IIIntroduction to genomics and proteomics II

[email protected]

Bioinformatics and Systems Biology Groupwww.sbi.informatik.uni-rostock.de

Page 35: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 2

www. .uni-rostock.deOutline

1. Proteomics• Motivation• Post -Translational Modifications• Key technologies• Data explosion

2. Maps of hereditary information3. Single nucleotide polymorphisms

Page 36: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 3

www. .uni-rostock.deProtomics

Proteomics:• is the large-scale study of proteins, particularly their structures and functions

• This term was coined to make an analogy with genomics, and is often viewed as the "next step",

• but proteomics is much more complicated than genomics. • Most importantly, while the genome is a rather constant entity,the proteome is constantly changing through its biochemical interactions with the genome.

• One organism will have radically different protein expression indifferent parts of its body and in different stages of its life cycle.

Proteome:The entirety of proteins in existence in an organism are referred to as the proteome.

Page 37: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 4

www. .uni-rostock.deProteomics

If the genome is a list of the instruments in an orchestra, the proteome is the orchestra playing a symphony.R.Simpson

Page 38: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 5

www. .uni-rostock.deProteomics• Describing all 3D structures of proteins in the cell is called Structural

Genomics• Finding out what these proteins do is called Functional Genomics

GENOME

PROTEOME

DNA Microarray Genetic Screens

Protein – LigandInteractions

Protein – ProteinInteractions

Structure

Page 39: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 6

www. .uni-rostock.deProteomics

• What kind of data would we like to measure?

• What mature experimental techniques exist to determine them?

• The basic goal is a spatio-temporal description of the deployment of proteins in the organism.

Motivation:

Page 40: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 7

www. .uni-rostock.deProteomics

• the rates of synthesis of different proteins vary among different tissues and different cell types and states of activity

• methods are available for efficient analysis of transcription patterns of multiple genes

• because proteins ‘turn over’ at different rates, it is also necessary to measure proteins directly

• the distribution of expressed protein levels is a kinetic balance between rates of protein synthesis and degradation

Things to consider:

Page 41: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 8

www. .uni-rostock.de

Page 42: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 9

www. .uni-rostock.deWhy do Proteomics?

• are there differences between amino acid sequences determined directly from proteins and those determined by translation from DNA?– pattern recognition programs addressing this questions have following

errors:• a genuine protein sequence may be missed entirely• an incomplete protein may be reported• a gene may be incorrectly spliced• genes for different proteins may overlap• genes may be assembled from exons in different ways in different tissues

– often, molecules must be modified to make a mature protein that differs significantly from the one suggested by translation

• in many cases the missing post-translational- modifications are quite important and have functional significance

• post-transitional modifications include addition of ligands, glycosylation, methylation, excision of peptides, etc.

– in some cases mRNA is edited before translation, creating changes in the amino acid sequence that are not inferrable from the genes

• a protein inferred from a genome sequence is a hypothetical object until an experiment verifies its existence

Page 43: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 10

www. .uni-rostock.dePost-translational modification• a protein is a polypeptide chain composed of 20 possible amino acids

• there are far fewer genes that code for proteins in the human genome than there are proteins in the human proteome (~33,000 genes vs ~200,000 proteins).

• each gene encodes as many as six to eight different proteins– due to post-translational modifications such as phosphorylation, glycosylation or cleavage

(Spaltung)

• posttranslational modification extends the range of possible functions a protein can have

– changes may alter the hydrophobicity of a protein and thus determine if the modified protein is cytosolic or membrane-bound

– modifications like phosphorylation are part of common mechanisms for controlling the behavior of a protein, for instance, activating or inactivating an enzyme.

Page 44: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 11

www. .uni-rostock.dePost-translational modification

• phosphorylation is the addition of a phosphate (PO4) group to a protein or a small molecule (usual to serine, tyrosine, threonine or histidine)

• In eukaryotes, protein phosphorylation is probably the most important regulatory event

• Many enzymes and receptors are switched "on" or "off" by phosphorylation and dephosphorylation

• Phosphorylation is catalyzed by various specific protein kinases, whereas phosphatases dephosphorylate.

Phosphorylation

Acetylation• Is the addition of an acetyl group, usually at the N-terminus of the protein

Farnesylation• farnesylation, the addition of a farnesyl groupGlycosylation• the addition of a glycosyl group to either asparagine, hydroxylysine,

serine, or threonine, resulting in a glycoprotein

Page 45: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 12

www. .uni-rostock.deProteomics

Page 46: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 13

www. .uni-rostock.deKey technologies for proteomics

1. 1-D electrophoresis and 2-D electrophoresis• are for the separation and visualization of proteins.

2. mass spectrometry, x-ray crystallography, and NMR(Nuclear magnetic resonance ) • are used to identify and characterize proteins

3. chromatography techniques especially affinity chromatography• are used to characterize protein-protein interactions.

4. Protein expression systems like the yeast two-hybrid and FRET (fluorescence resonance energy transfer) • can also be used to characterize protein-protein interactions.

Page 47: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 14

www. .uni-rostock.deKey technologies for proteomics

Reference map of lympphoblastoid cell linePRI, soluble proteins.• 110 µg of proteins loaded• Strip 17cm pH gradient 4-7, SDS

PAGE gels 20 x 25 cm, 8-18.5% T. • Staining by silver nitrate method(Rabilloud et al.,)

• Identification by mass spectrometry.The pinks labels on the spots indicatethe ID in Swiss-prot database

browse the SWISS-2DPAGE database for more 2d PAGE images

High-resolution two-dimensional polyacrylamide gel electrophoresis (2D PAGE) shows the pattern of protein content in a sample.

Page 48: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 15

www. .uni-rostock.deProteomics

Typically, a sample is purified to homogeneity, crystallized, subjected to an X-ray beam and diffraction data are collected.

X-ray crystallography is a means to determine the detailed molecular structure of a protein, nucleic acid or small molecule.

With a crystal structure we can explain the mechanism of an enzyme, the binding of an inhibitor, the packing of protein domains, the tertiary structure of a nucleic acid molecule etc..

Page 49: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 16

www. .uni-rostock.deHigh-throughput Biological Data

• Enormous amounts of biological data are being generated by high-throughput capabilities; even more are coming– genomic sequences– gene expression data (microarrays)– mass spec. data– protein-protein interaction (chromatography)– protein structures (x-ray christallography)– ......

Page 50: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 17

www. .uni-rostock.deProtein structural data explosionProtein Data Bank (PDB): 33.367 Structures (1 November 2005)28.522 x-ray crystallography, 4.845 NMR

Page 51: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 18

www. .uni-rostock.deMaps of hereditary information

1. Linkage maps of genesmini- / microsatellites

2. Banding patterns of chromosomesphysical objects with visible landmarks called banding patterns

3. DNA sequencesContig maps (contigous clone maps)Sequence tagged site (STS)SNPs (Single nucloetide polymorphisms)

Following maps are used to find out how hereditary information is stored, passed on, and implemented.

Page 52: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 19

www. .uni-rostock.de

Linkage map

Page 53: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 20

www. .uni-rostock.deMaps of hereditary information

• regions, 8-80bp long, repeated a variable number of times• the distribution and the size of repeats is the marker• inheritance of VNTRs can be followed in a family and

mapped to a pathological phenotype• first genetic data used for personal identification

– Genetic fingerprints; in paternity and in criminal cases

Variable number tandem repeats (VNTRs, also minisatellites)

Short tandem repeat polymorphism (STRPs, also microsatellites)

• Regions of 2-7bp, repeated many times– Usually 10-30 consecutive copies

Page 54: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 21

www. .uni-rostock.de

centromere

CGTCGTCGTCGTCGTCGTCGTCGT...GCAGCAGCAGCAGCAGCAGCAGCA...

3bp

Page 55: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 22

www. .uni-rostock.deMaps of hereditary information

Banding patterns of chromosomes

Page 56: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 23

www. .uni-rostock.deMaps of hereditary information

Banding patterns of chromosomes

peti te – armcentromere

queue - arm

Page 57: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 24

www. .uni-rostock.deMaps of hereditary information

• Series of overlapping DNA clones of known order along a chromosome from an organism of interest, stored in yeast or bacterial cells as YACs (Yeast Artificial Chromosomes) or BACs (Bacterial Artificial Chromosomes)

• A contig map produces a fine mapping (high resolution) of a genome

• YAC can contain up to 106bp, a BAC about 250.000bp

Contig map (also contiguous clone map)

Sequence tagged site (STS)

• Short, sequenced region of DNA, 200-600bp long, that appears in a unique location in the genome

• One type arises from an EST (expressed sequence tag), a piece of cDNA

Page 58: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 25

www. .uni-rostock.deMaps of hereditary information

1. if we know the protein involved, we can pursue rational approaches to therapy

2. if we know the gene involved, we can devise tests to identify sufferers or carriers

3. wereas the knowledge of the chromosomal location of the gene is unnecessary in many cases for either therapy or detection; • it is required only for identifying the gene, providing a

bridge between the patterns of inheritance and the DNA sequence

Imagine we know that a disease results from a specific defective protein:

Page 59: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 26

www. .uni-rostock.deSingle nucleotide polymorphisms (SNPs)Single nucleotide polymorphisms (SNPs)

• SNP (pronounced ‘snip’) is a genetic variation between individuals

• single base pairs that can be substituted, deleted or inserted

• SNPs are distributed throughout the genome– average every 2000bp

• provide markers for mapping genes• not all SNPs are linked to diseases

Page 60: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 27

www. .uni-rostock.deSingle nucleotide polymorphisms (SNPs)

• nonsense mutations: – codes for a stop, which can truncate the

protein• missense mutations:

– codes for a different amino acid • silent mutations:

– codes for the same amino acid, so has no effect

Page 61: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 28

www. .uni-rostock.deOutlook – coming lecture

• Bioinformatics Information Resources And Networks– EMBnet – European Molecular Biology Network

• DBs and Tools– NCBI – National Center For Biotechnology Information

• DBs and Tools

– Nucleic Acid Sequence Databases – Protein Information Resources– Metabolic Databases– Mapping Databases– Databases concerning Mutations– Literature Databases

Page 62: proteomics and genomics-1

Ulf Schmitz, Introduction to genomics and proteomics II 29

www. .uni-rostock.de

Thanks for your attention!