Chapter 24 topics: Genomics, Proteomics, Bioinformatics

Student learning outcomes:
• Describe tools to obtain DNA sequences of genomes
• Explain how microarrays analyze the transcriptome
• Describe how proteomics studies proteins of cells
• Define how bioinformatics manages vast stores of DNA data

Figures: 1, 3-13, 16, 17, 19, 20, 23, 24, 27, 28, 30; Tables 1, 2, 3

Problems: 1, 2*, 3-7, 9,12*, 15, 17,18, 20*, 22, 23*, 24, AQ3*,424-1

24.1 Positional Cloning
Positional cloning: discover genes for genetic traits
• Mapping studies roughly locate gene of interest to a relatively small region of DNA on chromosome
• Physical landmarks - relate to gene position:

• Restriction Fragment Length Polymorphisms (RFLP): lengths of restriction fragments from a specific enzyme vary among individuals

• CpG Islands: DNA with unmethylated CpG is often actively expressed; find with methylation-sensitive restriction enzymes (HpaII vs. MspI for CCGG)

Southern blots detect RFLPs

Fig. 1 People differ in presence of particular HindIII site

Classic example: Identifying Gene Mutated in Human Huntington’s Disease (HD)

• Dominant disease, late onset, degenerative
• Wexler and Gusella used RFLPs with huge family groups having the disease to map HD gene near end of chromosome 4

• Mutation causing disease is expansion of CAG repeat from normal range of 11-34 copies to abnormal range of > 38 copies (triplet expansion)

• Extra repeats -> extra Gln inserted into huntingtin, product of HD gene

• Huntingtin has normal role in brain: interferes with transcription factor SP1 binding TAF130

• Mouse knockout: heterozygotes have neuro problems; null are dead

RFLPs helped locate Huntington’s disease gene

Fig. 3 Combinations of RFLP distinguish 4 possible haplotypes

Fig. 4 Southern blot defines haplotype genotypes of members

Fig. 5 Haplotype C is associated with disease - predictive

HD gene identified from studies of large families

Pedigree studies, molecular studies of haplotypes, and correlation with disease led to cloning of gene and prediction of disease (variable age of onset)

24.2 Sequencing Genomes
• Information from genome sequences:
– Location of exact coding regions for all genes
– Spatial relationships among genes, exact distances between them in bp
– Sanger dideoxy sequencing, 1977 (φX174 phage)
• How is coding region recognized?
– Contains an ORF long enough to code for protein
– ORF (open reading frame) must start with ATG triplet and end with stop codon
– Phage or bacterial ORF same as coding region
– Eukaryotic ORF definition is more difficult: introns
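The ORF rule above (start at ATG, read triplets in frame, end at a stop codon, be long enough) can be sketched in a few lines of Python. This is a minimal illustration only: the function name `find_orfs`, the `min_codons` cut-off and the example sequence are assumptions made here, only one strand is scanned, and introns are ignored, so it fits the phage or bacterial case rather than the eukaryotic one.

```python
# Minimal ORF scan on one DNA strand (illustrative sketch, not a gene finder).
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) tuples for ATG...stop ORFs.

    start is 0-based, end is exclusive and includes the stop codon.
    min_codons (an assumed cut-off) counts codons before the stop, ATG included.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):                      # three reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                    # look for the next ORF
    return orfs

example = "CCATGAAATTTGGGTAACC"                 # made-up sequence
for start, end, frame in find_orfs(example):
    print(frame, start, end, example[start:end])
```

A real search would also scan the reverse complement and use a much larger length threshold.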

Genome Results (Table 1 examples)

Numerous RNA or DNA genome sequences of viruses and organisms have been obtained:
– Phages, viruses
– Bacteria
– Animals
– Plants
– Human, Neanderthal

Comparison of related genomes (close or distant) sheds light on evolution of species: phylogeny from combination of traditional and molecular data

Human Genome Project (3 × 10^9 bp haploid)
A. Original plan systematic and conservative (1990):
– Funded by NIH, Dept. of Energy
– Prepare genetic, physical maps with markers, then piece DNA sequences together in proper order
– Plan most sequencing after mapping complete
– [Also many model organisms sequenced to compare]

• Celera, a private, for-profit company (J.C. Venter) vowed to complete rough draft of genome by 2000

B. Celera method was shotgun sequencing:
– Whole genome chopped up and cloned
– Clones sequenced randomly
– Sequences pieced together by computer programs
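To make "sequences pieced together by computer programs" concrete, here is a toy greedy overlap assembler. It is a sketch under strong assumptions (error-free reads, one strand, no repeats) and is not the method Celera actually used; the function names and the three example reads are invented for illustration.

```python
# Toy greedy assembly: repeatedly merge the pair of reads with the longest overlap.
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    reads = list(dict.fromkeys(reads))          # drop exact duplicate reads, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:                         # nothing overlaps: separate contigs remain
            break
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads

print(greedy_assemble(["ATGCGT", "CGTACC", "ACCTTG"]))
```

Repeats longer than a read defeat this simple approach, which is one reason mapped landmarks help in ordering large genomes.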

Vectors for Large-Scale Genome Projects

• Two high-capacity vectors for Human Genome Project
– Mapping mostly used yeast artificial chromosomes (YACs), which accept about a million base pairs
– Sequencing used bacterial artificial chromosomes (BACs), which accept about 300,000 bp
• BACs are more stable, easier to work with than YACs

Figs. 7, 8: BAC and YAC vectors

A. Clone-by-Clone Strategy
• Mapping requires set of physical landmarks to relate positions of cloned genes, then sequence
• Some markers are genes; many are nameless stretches of DNA (must organize it all)
– RFLPs: want polymorphic regions
• Ideally different pattern for people with disease vs. normal people locates disease genes (like HD)
– VNTRs, variable number tandem repeats of small sequences
• Minisatellites, highly polymorphic, useful for forensics
– STSs (sequence-tagged sites, unique), expressed-sequence tags and microsatellites

Sequence-Tagged Sites - Physical Maps

Fig. 9

• STSs are unique sequences
– 60-1000 bp long
– Detectable by PCR
• Need sequence information for primers
• Need not be in a gene
• Design short primers
– Hybridize a few hundred bp apart
– Amplify predictable length of DNA, seen on gel

Sequence-Tagged Sites - Physical Maps

Fig. 10

Align cloned sequences to form contigs (contiguous overlapping DNA sequences)

Shotgun-Sequencing Method used by Celera

Fig. 11: Connect overlapping BAC clones by identification of STCs, sequence-tagged connectors

Human Genome Project

• Working draft (2001) reported by Venter (Celera) and NIH/DOE consortium:

• Estimated genome contained fewer genes than anticipated – 25,000 to 30,000

• 2007 completed version

• About half of genome from action of transposons
• Bacteria also donated dozens of genes
• Provides information about human evolution: chimpanzee, Neanderthal, many other genomes

Findings from Chromosome 22 (first human chromosome sequenced)
679 annotated genes:
– 274 Known genes, previously identified
– 150 Related genes, homologous to known genes
– 134 Pseudogenes, sequences homologous to known genes, but defects preclude proper expression

Coding regions of genes only a tiny fraction
• Annotated genes 39% of total length
• Exons only 3%
• Repeat sequences (Alu, LINEs, etc.) are 41%

Large chunks of human chromosome 22q conserved in several different mouse chromosomes

Homologs
• Orthologs: homologous genes in different species evolved from common ancestor:
– 8 regions to 7 mouse chromosomes

• Paralogs: homologous genes that evolved by gene duplication within a species

• Homologs: any kind of homologous genes, both orthologs and paralogs

Fig. 13 Large chunks of human chromosome 22q conserved in several different mouse chromosomes (113 genes)

Chromosome 21
• Relatively few genes
– 225 genes
– 59 pseudogenes

• All 24 genes shared with mouse chromosome 10 are in same order in both chromosomes

• Disease genes associated with chromosome 21:
– Down syndrome is extra chromosome
– Alzheimer’s, ALS (Lou Gehrig’s disease) genes

The X Chromosome
• Sequence of 151 Mb of human X chromosome:
– 1098 protein-encoding genes
– 168 genes governing X-linked phenotypes
– Genes for 173 noncoding RNAs
– Many genes identified for human disease (sex-linked)
• Chromosome rich in LINE1 repetitive elements
– Involved in X inactivation mechanism in female cells
• XIST RNA (X-inactive specific transcript)
– 32-kb RNA responsible for X-inactivation, heterochromatin

X (and partner Y) evolved from ancestral autosomes

Other Vertebrate Genomes

Fig. 14 Mouse, human

• Comparing human genome with other vertebrates:
– helped identify many human genes
– helps identify defective genes for human genetic diseases

• Studies in closely related species (mouse) show when and where genes are expressed; use these to predict when and where human genes are likely expressed

The Minimal Genome – J. Craig Venter
• Define essential gene set of simple organism
– Mutate one gene at a time; see which are required for life
• In theory, could define minimal genome: set of genes required for life
– Minimal genome likely larger than essential gene set
• Sequence a small genome, then delete genes: Mycoplasma genitalium, 580 kb (480 protein-coding genes)
• No cell wall, intracellular parasite, only glycolysis
• 2010: placed synthetic minimal genome (1 × 10^6 bp) into Mycoplasma cell lacking genes:
– new life form that can live and reproduce under lab conditions
– controversial approaches

The Barcode of Life
• CBOL (Consortium for the Barcode of Life): plan to create barcode to identify any species of life on earth
• First such barcode: sequence of 648-bp piece of mitochondrial COI gene from each organism
– Cytochrome c oxidase (subunit I)
– Isolate mitochondrial DNA, sequence

• Sequence can uniquely identify most organisms

• Other sequences needed for plants and bacteria, since less variation among their COI genes
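Identification by barcode comes down to comparing a query sequence against a reference library and taking the closest match. The sketch below shows that idea with percent identity on pre-aligned sequences; the species names and 10-nt sequences are invented placeholders (real COI barcodes are 648 bp), and real pipelines align sequences first.

```python
# Toy barcode lookup: assign a query to the reference with the highest percent identity.
def percent_identity(a, b):
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

def identify(query, references):
    """Return (name, percent identity) of the best-matching reference barcode."""
    scored = ((name, percent_identity(query, seq)) for name, seq in references.items())
    return max(scored, key=lambda item: item[1])

# Hypothetical reference library (placeholder names and sequences).
references = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTTCGTAA",
}
print(identify("ACGAACGTAC", references))
```

When COI varies too little between species, as noted for plants and bacteria, the top two scores sit close together and the call is unreliable.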

24.3 Applications: Functional Genomics

• Functional genomics deals with function or expression of genomes

• Transcriptome: all transcripts an organism makes at any given time

• Genomic functional profiling: use of genomic information to block expression systematically

• Proteomics: study structures and functions of protein products of genomes

Transcriptomics
• Study all transcripts organism makes
• Create DNA microarrays (microchips) that hold 1000s of cDNAs or oligos
– Hybridize labeled RNAs (cDNAs) from cells to chips
– Intensity of hybridization at each spot reveals the extent of expression of corresponding gene
• Arrays measure expression of many genes at once
• Clustered expression of genes in time and space suggests products of these genes collaborate in some process -> function

• Affymetrix makes chips, 25-mer unique sequences

DNA chips: Oligonucleotides on a Glass Substrate

Fig. 16

Fig. 17

Serum-starved human cells cDNA (labeled green); serum-fed cells cDNA (red)
Equal expression of mRNA = yellow
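The red/green/yellow readout is usually summarized as a log2 ratio of the two channel intensities per spot. The sketch below shows that calculation; the intensities are made-up numbers, and the two-fold cut-off (`threshold=1.0`) is an assumed convention rather than anything stated on the slide.

```python
import math

# Two-color spot classification: red = serum-fed channel, green = serum-starved channel.
def classify_spot(red, green, threshold=1.0):
    """Return (log2 ratio, color call) for background-corrected intensities.

    threshold is the |log2 ratio| needed to call a difference (assumed: 1.0 = two-fold).
    """
    ratio = math.log2(red / green)
    if ratio >= threshold:
        return ratio, "red: higher in serum-fed cells"
    if ratio <= -threshold:
        return ratio, "green: higher in serum-starved cells"
    return ratio, "yellow: roughly equal expression"

# Hypothetical spot intensities (red, green).
for red, green in [(800, 100), (100, 100), (100, 400)]:
    print(red, green, classify_spot(red, green))
```

Using a log ratio makes a two-fold increase and a two-fold decrease symmetric around zero.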

Genomic Functional Profiling

– Deletion analysis - mutants created by replacing genes with antibiotic resistance gene flanked by oligomers serving as barcode for that mutant

– Functional profile can be obtained by growing whole group of mutants together under various conditions to see which mutants disappear most rapidly

Fig. 21 Growth of yeast mutants on galactose C source
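Finding "which mutants disappear most rapidly" amounts to comparing each barcode's share of the pooled culture before and after growth. The sketch below ranks barcodes by log2 change in relative abundance; the barcode names, counts and the pseudocount are hypothetical values chosen for illustration.

```python
import math

# Rank barcoded deletion mutants by how strongly they drop out of a pooled culture.
def depletion_ranking(before, after, pseudocount=1):
    """Return (barcode, log2 fold change) pairs, most depleted first.

    before/after map barcode -> count; a pseudocount (assumed) avoids log of zero
    when a mutant vanishes completely.
    """
    total_before = sum(before.values())
    total_after = sum(after.values())
    scores = {}
    for barcode, count in before.items():
        f0 = (count + pseudocount) / total_before
        f1 = (after.get(barcode, 0) + pseudocount) / total_after
        scores[barcode] = math.log2(f1 / f0)
    return sorted(scores.items(), key=lambda item: item[1])

# Hypothetical barcode counts at the start and after growth under one condition.
before = {"bc1": 100, "bc2": 100, "bc3": 100}
after = {"bc1": 150, "bc2": 140, "bc3": 10}
for barcode, score in depletion_ranking(before, after):
    print(barcode, round(score, 2))
```

A strongly negative score under one condition but not others points to a gene needed specifically for that condition.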

RNAi Analysis
• Genomic functional analysis: RNAi inactivates genes
• Ex. genes involved in early embryogenesis in C. elegans:
– 661 important genes (early embryo defect)
– 326 involved in embryogenesis

Fig. 22: initial screen showed which genes were mutated with RNAi; then see which stage of embryogenesis is affected

* Locating Target Sites for Transcription Factors (ChIP-chip)

• Chromatin immunoprecipitation (ChIP) followed by DNA microarray analysis can identify DNA-binding sites for activators and other proteins

• Small genome organisms - all intergenic regions can be included in microarray

• If genome is large, not practical
• To narrow areas of interest, can use CpG islands
– Non-methylated CpG associated with gene control region
– If timing/conditions of activator’s activity are known, control regions of genes known to be activated at those times, or under those conditions, can be used

ChIP-chip assays locate target sites for specific transcription factors

Fig. 24

• ChIP with specific antibody
• PCR adding generic primer to all
• Fluorescent label
• Microarray

• See Fig. 25 Yeast Gal4 protein binding sites

In Situ Expression Analysis

‘Mouse blots’

• Mouse as human surrogate in large-scale expression studies (ethically impossible in humans)

• Studied expression of almost all mouse orthologs of genes on human chromosome 21– Followed stages of embryonic development (E)– Catalogued embryonic tissues in which genes expressed

Fig. 26

Single-Nucleotide Polymorphisms; pharmacogenomics

• Single-nucleotide polymorphisms (SNPs) are single bp differences between people; account for many genetic conditions caused by single genes, even multiple genes

• Might be able to predict response to a drug
• New focus for therapeutics

• Haplotype map with > 1 million SNPs: sort out important SNPs from those with no effect
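As a concrete picture of "single bp differences between people", the sketch below lists the positions where an individual's sequence differs from a reference. The sequences are invented and assumed to be already aligned with no insertions or deletions; `find_snps` is a hypothetical helper name.

```python
# List single-base differences between a reference and one individual's sequence.
def find_snps(reference, sample):
    """Return (1-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to equal length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]

print(find_snps("ACGTACGT", "ACGTGCGT"))
```

Sorting out which of these differences matter for disease or drug response is the hard part and needs association data, not just the list.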

24.4 Proteomics
• Proteome: all proteins produced by an organism
• Proteomics: study of all proteins, or subsets
• More accurate picture of gene expression than transcriptomics studies:
– Sometimes mRNA is degraded, not translated
• First separate proteins, often on massive scale
– 2-D gel electrophoresis is good tool
• After separation, identify proteins
– Digest proteins with proteases
– Identify peptides by mass spectrometry
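The digest-then-weigh step can be mimicked in silico: cut a protein sequence where a protease would and compute each peptide's mass, which is what gets matched against the mass spectrum. The sketch below assumes trypsin (cuts after K or R, not before P) as the protease, which the slide does not specify, and uses standard monoisotopic residue masses; the protein sequence is made up.

```python
# In-silico tryptic digest with monoisotopic peptide masses (illustrative sketch).
MONO = {  # monoisotopic residue masses in Da
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water per peptide for the free N- and C-termini

def trypsin_digest(protein):
    """Cut after K or R unless the next residue is P (assumed trypsin rule)."""
    peptides, current = [], ""
    for i, aa in enumerate(protein):
        current += aa
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides

def peptide_mass(peptide):
    return sum(MONO[aa] for aa in peptide) + WATER

for pep in trypsin_digest("MKWVTFRGAK"):     # made-up sequence
    print(pep, round(peptide_mass(pep), 4))
```

Comparing the observed set of peptide masses with the predicted sets for every protein in a database is what identifies the protein.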

MALDI-TOF Mass Spectrometry

Fig. 27

Matrix-assisted laser desorption ionization – time of flight
Peptides ionized; time to reach detector accurately reflects mass
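Why flight time reflects mass: an ion of charge z accelerated through a voltage U gains kinetic energy zeU = ½mv², so over a drift length L the flight time is t = L·sqrt(m / (2zeU)). The sketch below evaluates that idealized relation; the 1 m drift length and 20 kV voltage are assumed illustrative values, not parameters from the slide, and real instruments need calibration.

```python
import math

# Idealized time-of-flight: t = L * sqrt(m / (2 * z * e * U)).
ELEMENTARY_CHARGE = 1.602176634e-19   # coulombs
DALTON = 1.66053906660e-27            # kilograms per Da

def flight_time(mass_da, charge=1, voltage=20_000.0, length_m=1.0):
    """Flight time in seconds for an ion; voltage and length are assumed values."""
    mass_kg = mass_da * DALTON
    energy = 2 * charge * ELEMENTARY_CHARGE * voltage
    return length_m * math.sqrt(mass_kg / energy)

for mass in (1000.0, 2000.0, 4000.0):
    print(mass, flight_time(mass))
```

Flight time scales with the square root of m/z, so a peptide four times heavier takes twice as long to reach the detector.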

Detecting Protein-Protein Interactions

Fig. 28

Epitope tag on one protein (from gene level) permits isolation of complex containing that protein using affinity resins

Common epitopes: His6-tag, HA-tag, Flag-tag, TAP-tag

** In future, microchips with antibodies may allow analysis of proteins in complex mixtures without separation

Identifying Protein Interactions, Networks
• Most proteins function with other proteins
• Yeast two-hybrid analysis
• Protein microarrays
• Immunoaffinity chromatography with mass spectrometry

Fig. 29. Identifying proteins binding kinases using Flag-tagged Kss1 or Cdc28

24.5 Bioinformatics
• Bioinformatics: building and using biological databases
– DNA sequences of genomes
– mining massive amounts of biological data for meaningful knowledge about gene structure and expression

• National Center for Biotechnology Information (NCBI) website: vast store of biological information (genomic and proteomic)

• Start with DNA sequence, discover gene, then compare that sequence with that of similar genes or organisms

• View 3D protein structures on computer

Review questions

2. What kind of mutation gave rise to Huntington disease?

12. Compare/contrast the clone-by-clone sequencing strategy with the shotgun sequencing strategy for large genomes.

15. The pufferfish genome is nine times smaller than human genome, but contains as many genes. How can that be?

20. Describe hypothetical experiment using DNA microarray to measure transcription from SV40 viral genes at two stages of infection of cells by the virus. Show example results.
