Genomics & Proteomics

I. What is genomics?
   A. GOALS of Genomics
   B. How are genomes mapped?
   C. Methods for sequencing genomes
   D. The Human Genome Project
II. Finding answers w/advanced technology!
   A. Bioinformatics
   B. Comparative genomics & model organisms
   C. DNA chips aka microarrays
III. Proteomics
Genome = the total genetic composition of an organism.
Genomics = the molecular analysis of the entire genome of a species.
Genome mapping = segments of chromosomes are cloned and analyzed in progressively smaller pieces, the locations of which are known on the intact chromosomes.
This ultimately leads to the determination of the complete DNA sequence, and an understanding of that sequence.
Genomes sequenced so far: 800+ organisms (3181 sequenced or in progress…)
http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj
I. What is genomics?
Haemophilus influenzae – first bacterial genome sequenced, in 1995. Its genome is a single circular chromosome of 1.83 Mb (about 1.83 million bp) with 1,743 genes.
A. GOALS of Genomics
1) Compile the genomic sequences of organisms
2) Establish the location of all genes and annotate the gene set in a genome
   - Find ORFs (start codon – stop codon)
   - Tells us the spatial relationships among genes
3) Establish the function of all genes
4) Generate gene expression profiles for cells under differing conditions
5) Compare genes and proteins between organisms to establish evolutionary relationships
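The ORF search named under goal 2 (start codon to stop codon) can be sketched in a few lines. This is a toy illustration, not a gene finder: the function name `find_orfs` and the `min_codons` parameter are made up for this example, only the three forward reading frames are scanned, and it assumes the standard start codon ATG and stop codons TAA, TAG, TGA.

```python
# Toy ORF finder (illustrative only): scans the three forward reading frames.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for each ORF on the forward strand.

    start is 0-based, end is exclusive and includes the stop codon.
    min_codons counts the start codon but not the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == START:
                # Walk codon by codon until the first in-frame stop codon.
                j = i + 3
                while j + 3 <= len(dna) and dna[j:j + 3] not in STOPS:
                    j += 3
                if j + 3 <= len(dna):  # a stop codon was found
                    if (j - i) // 3 >= min_codons:
                        orfs.append((i, j + 3, dna[i:j + 3]))
                    i = j + 3  # resume scanning after the stop codon
                    continue
            i += 3
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATGGTAACC"))
    # -> [(2, 14, 'ATGAAATGGTAA')]
```

Real annotation pipelines also scan the reverse strand, apply a much larger minimum length, and weigh other evidence; the sketch only shows the start-to-stop idea.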
Genomics has 3 subfields:
1) Structural genomics = genetic mapping, physical mapping, and sequencing of entire genomes
2) Functional genomics = comprehensive analysis of the functions of genes and nongene sequences in genomes
3) Comparative genomics = comparison of genomes of different species to determine the function of each genome and understand evolutionary relationships
B. How are genomes mapped?
1) Cytogenetic mapping
   - FISH (fluorescence in situ hybridization)
2) Linkage mapping
   a) Testcross & pedigree analysis
   b) Molecular markers = segments of DNA found at a specific site along a chromosome that can be recognized using molecular tools; they provide higher resolution.
      - RFLP (restriction fragment length polymorphism)
      - VNTR (variable number of tandem repeats)
      - STR (short tandem repeats)
      - SNP (single nucleotide polymorphism)
   The distance between two linked markers can be determined by making crosses and analyzing the offspring (parentals v. nonparentals).
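As a worked example of the parentals v. nonparentals comparison, the sketch below converts offspring counts into a map distance, using the standard convention that 1% recombinant offspring corresponds to 1 map unit (centimorgan). The function name `map_distance_cm` and the counts in the usage line are hypothetical.

```python
def map_distance_cm(parental, recombinant):
    """Estimate map distance (in cM) as the percentage of recombinant offspring.

    parental and recombinant are offspring counts from a testcross.
    """
    total = parental + recombinant
    if total == 0:
        raise ValueError("no offspring were scored")
    return 100.0 * recombinant / total


if __name__ == "__main__":
    # Hypothetical testcross: 800 parental and 200 nonparental offspring.
    print(map_distance_cm(800, 200))  # -> 20.0
```

The estimate is only reliable for markers that are fairly close together; double crossovers between distant markers go unseen and make the distance look shorter than it is.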
3) Physical mapping
   A physical map of a chromosome is constructed by creating a contiguous series of overlapping clones from a chromosome-specific library.
   Contig = collection of clones, found as overlapping regions within a group of vectors
   - The contigs are arranged relative to each other by comparing restriction maps & DNA markers
   - Vectors such as YACs and BACs can carry large genomic inserts
   - A genomic library is generated; clones are individually isolated and arranged in a grid pattern
   - Identify adjacent members that contain overlapping regions:
     - Southern blotting
     - Molecular markers
[Figure: YAC]
[Figure: A comparison of linkage, cytogenetic & physical maps]
C. Methods for sequencing genomes
1. Clone by clone method (“top-down”)
   Genomic libraries of fragments covering the total DNA of an organism are constructed. Then, using genetic markers, overlapping clones are assembled to establish maps that encompass the entire genome.
2. Shotgun method
   Genomic libraries are prepared and randomly selected clones are sequenced until all clones in the library are analyzed. Software packages organize the nucleotide sequences.
[Figure labels: clone by clone = overlapping clones are assembled to establish maps; shotgun = libraries are prepared and randomly selected clones are sequenced]
Compiling the sequence = the genome is sequenced multiple times to ensure the sequence is accurate
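How software might "organize the nucleotide sequences" from a shotgun run can be illustrated with a greedy overlap merge. This is a minimal sketch on made-up reads, not how production assemblers work: the names `overlap` and `greedy_assemble` are invented here, reads are assumed to be error-free and from one strand, and repeats (which defeat a greedy approach) are ignored.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: the reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    # Three hypothetical reads drawn from the sequence ATGCGTACGTTAGC.
    print(greedy_assemble(["GTTAGC", "ATGCGTAC", "GTACGTTA"]))
    # -> ['ATGCGTACGTTAGC']
```

Sequencing the genome multiple times, as the slide notes, is what makes such overlaps plentiful enough to join reads and to outvote individual sequencing errors.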
D. The Human Genome Project
International research effort to characterize the genome of the human. GOALS:
- Complete mapping and sequencing of the DNA
- Identify all the genes & store this information in a public database
- Develop technologies for genome analysis & transfer these tools to the private sector
- Examine ethical, legal and social implications of human genetics research
In the U.S., funded by the NIH and DOE; a global cooperative effort (GB, France, and Japan). Additionally, Celera (Craig Venter) was involved: http://www.tigr.org/
Human Genome Project
- Began in 1990… as of 2003, 99% of the euchromatic region was published: October issue of Nature
- 2.85 billion nucleotides
- ~45% of the genome consists of repetitive DNA
- At least 50% is derived from transposable elements
- < 5% protein-coding genes
- 20,500 known protein-coding genes and a further 4,000 sections of DNA predicted to be putative protein-coding genes
- Chromosome 19 has the highest gene density (55.8 million bases, ~1500 genes); the Y chromosome has the lowest (78 genes)
http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?taxid=9606
II. Finding answers w/advanced technology!
A. Bioinformatics = Annotating genomes
Emerging field concerned with the development and application of computer software to the acquisition, storage, analysis and visualization of biological information
- Identifying protein-coding genes (ORFs = open reading frames)
- Identifying non-protein-coding regions
- Characterizing mobile elements
- Analyzing homologous regions – comparative genomics
- BLAST = basic local alignment search tool, a computer program that allows you to start with a DNA or protein sequence and then locate homologous sequences in a huge database.
Annotation = identifying genes, their regulatory sequences, their function + identifying non-protein coding regions
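To give a feel for how a tool like BLAST can search a huge database quickly, the sketch below shows the seed-and-extend idea: index short words (k-mers) from the database, look up the query's words, and grow each exact hit outward. It is a simplified illustration, not BLAST itself. The function names, the tiny `database` dictionary, and the word size `k=4` are assumptions for this example, and the extension step here stops at the first mismatch, whereas BLAST scores mismatches and reports alignment statistics.

```python
from collections import defaultdict


def build_index(database, k):
    """Map every k-mer to the (sequence name, offset) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    return index


def extend(query, subject, q, s, k):
    """Grow an exact seed match left and right while the letters agree."""
    left = 0
    while (q - left > 0 and s - left > 0
           and query[q - left - 1] == subject[s - left - 1]):
        left += 1
    right = k
    while (q + right < len(query) and s + right < len(subject)
           and query[q + right] == subject[s + right]):
        right += 1
    return q - left, s - left, left + right


def search(query, database, k=4):
    """Return (name, query_start, subject_start, length) hits, longest first."""
    index = build_index(database, k)
    hits = set()
    for q in range(len(query) - k + 1):
        for name, s in index.get(query[q:q + k], []):
            q0, s0, length = extend(query, database[name], q, s, k)
            hits.add((name, q0, s0, length))
    return sorted(hits, key=lambda h: (-h[3], h[0], h[1], h[2]))


if __name__ == "__main__":
    database = {"seqA": "TTGACCGTAGGCTA", "seqB": "CCCCCCCC"}  # made-up sequences
    print(search("ACCGTAGG", database))
    # -> [('seqA', 0, 3, 8)]
```

Indexing words first is the design choice that matters: the query is compared only against database positions that already share a short exact match, rather than against every position.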
B. Comparative genomics & model organisms
- So far, confirmed a common ancestor and similar gene sets for basic cellular functions
- Can study inherited disorders, gene interactions and environment
- To date: yeast, Drosophila, C. elegans, mouse, dog
Orthologous = descended from a common ancestral gene, have the same function
- 240 between M. genitalium and H. influenzae
Paralogous = arise from a gene duplication event
Dogs share many genetic disorders w/humans: 400 single-gene disorders, sex chromosome aneuploidies, multifactorial diseases
Pan troglodytes & Homo sapiens
- Last common ancestor = 6 mya, we’ve been diverging ever since!
- Chromosome 22 – 24, only a 1.44% difference, however there are 68,000 indels
- Tissue expression patterns differed – several genes found to be expressed in either one or the other, but not both
- Patterns of evolution in human and chimpanzee protein-coding genes are highly correlated and dominated by the fixation of neutral and slightly deleterious alleles
C. DNA chips aka microarrays
Analyzing genome-wide expression patterns in different tissues
- Used for studying genome-wide patterns of gene expression
- Can view cells as an array of expressed genes
- DNA complementary to genes of interest is generated and laid out in microscopic quantities on a slide
- Sample cDNA binds to its complement; the presence of bound DNA is detected by fluorescence
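One common way to turn fluorescence readings into an expression profile is to compare a sample against a reference on the same spot and take the log2 ratio. The sketch below assumes such a two-channel setup, which the slides do not specify; the function names, the 2-fold threshold, and the intensity values are all hypothetical.

```python
import math


def log2_ratio(sample, reference, floor=1.0):
    """log2 of sample/reference intensity; floor guards against zero readings."""
    return math.log2(max(sample, floor) / max(reference, floor))


def classify(spots, threshold=1.0):
    """Label each gene 'up', 'down' or 'unchanged'.

    spots maps gene name -> (sample intensity, reference intensity).
    threshold=1.0 on the log2 scale corresponds to a 2-fold change.
    """
    calls = {}
    for gene, (sample, reference) in spots.items():
        ratio = log2_ratio(sample, reference)
        if ratio >= threshold:
            calls[gene] = "up"
        elif ratio <= -threshold:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


if __name__ == "__main__":
    # Made-up fluorescence intensities for three spots on a slide.
    spots = {"geneA": (4000, 1000), "geneB": (500, 2000), "geneC": (1200, 1000)}
    print(classify(spots))
    # -> {'geneA': 'up', 'geneB': 'down', 'geneC': 'unchanged'}
```

Real analyses also subtract background and normalize across the slide before calling changes; those steps are left out here.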
http://www.dnalc.org/ddnalc/resources/dnaarray.html
Functional genomics seeks to understand the function of genes and how they determine phenotypes.
DNA chips identify the genes that are active within a cell and help identify mutated genes.
http://www.dnalc.org/ddnalc/resources/dnachip.html
A hand-held DNA Chip device, (Nanogen, Inc). The circles at the top are sample ports. The wires guide electric fields over the DNA array, located on the light blue diamond.
III. Proteomics
Genomics is the 1st step…
Proteomics = the cataloging and analysis of the proteome (a complete set of expressed proteins in a cell at a particular time) to determine when a protein is expressed, how much is made, and with what other proteins the protein can interact.
Goals: to identify every protein in the proteome, to determine the sequence of each protein, and to analyze protein levels globally in different cell types and at different stages in development.
Dot matrix can compare the degree of similarity between two primary sequences. Regions of homology are recognized, gaps can be inserted to align sequences.
The dot matrix is too simple for very long sequences; dynamic programming methods are used instead: multiple sequence alignment.
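Both ideas can be sketched briefly. The first function builds a dot matrix for two sequences; the second is a pairwise global alignment by dynamic programming (the Needleman-Wunsch algorithm), chosen here as the simplest member of the dynamic programming family that alignment tools build on. The scoring values (match +1, mismatch -1, gap -1) and function names are assumptions for illustration.

```python
def dot_matrix(a, b):
    """Rows follow a, columns follow b; True marks identical residues."""
    return [[x == y for y in b] for x in a]


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global pairwise alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    # Fill the table: each cell is the best of diagonal, up, or left.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    for row in dot_matrix("GATTACA", "GATACA"):
        print("".join("*" if cell else "." for cell in row))
    print(needleman_wunsch("GATTACA", "GATACA"))
```

In the dot matrix, a run of marks along a diagonal is a region of similarity; the dynamic programming table makes the same comparison but also decides where to place gaps, which is what the dot matrix leaves to the eye.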
Proteomics, big challenge!
Proteome – larger than the genome
- Changes in pre-mRNA structure may affect the primary sequence of a protein
  - Alternative splicing
  - RNA editing
- Post-translational modifications
Methods:
- Two-dimensional gel electrophoresis, used to separate cellular proteins
- Mass spectrometry, used to identify proteins
- Protein microarrays, used to study protein expression & function