Chapter 7 DNA Sequencing and the Evolution of the Omics

Chapter 7

DNA Sequencing and the Evolution of the "Omics"

Sequencing

Important component of many projects: there are several "generations" of sequencing methods

Data produced –> extensive databases

Two methods were developed in 1977 - First generation methods: 1) Sanger 2) Maxam and Gilbert

Both use same basic approaches

Sanger still used, but relatively expensive compared to second- and third-generation methods

Each generation type has pros and cons

First-Generation Sequencing

Computer tools needed to analyze data

DNA and RNA can be sequenced

cDNA –> less information. WHY?

Length that can be sequenced in a single first-generation method (Sanger) reaction varies from 200 to ~ 1000 bases

Vectors can hold different insert sizes

First-Generation Sequencing

Yeast artificial chromosomes = up to 1 million bp inserts

Cosmids can contain 30,000 to 45,000 bp inserts (30-45 kb)

Cloned DNA thus must usually be subcloned for Sanger sequencing

Dideoxy or Chain-terminating Method

Developed by Sanger

Involves de novo synthesis of labeled DNA fragments from a ss DNA template

Denature ds DNA by heating, or clone it into a vector that produces ss DNA

ss DNA serves as template for synthesis of a new labeled DNA strand

Dideoxy or Chain-terminating Method

Labeling originally by 32P or 35S

Because DNA synthesis is involved, DNA polymerase, labeled dNTPs, a primer and ddNTPs are needed in Sanger sequencing

Structure of labeled dNTP with 35S
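
The logic of reading a chain-termination ladder can be sketched in a few lines. This is only an idealized illustration, not a model of the chemistry: it assumes every possible terminated fragment is produced and perfectly resolved by length. The function names `sanger_lanes` and `read_ladder` are hypothetical.

```python
def sanger_lanes(new_strand):
    """Group chain-terminated fragment lengths by the ddNTP that ended them."""
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand.upper(), start=1):
        lanes[base].append(length)  # a fragment of this length ends in `base`
    return lanes


def read_ladder(lanes):
    """Read the ladder from the shortest to the longest fragment."""
    base_at_length = {}
    for base, lengths in lanes.items():
        for length in lengths:
            base_at_length[length] = base
    return "".join(base_at_length[n] for n in sorted(base_at_length))


lanes = sanger_lanes("GATTACA")
print(lanes)               # fragment lengths found in each ddNTP reaction
print(read_ladder(lanes))  # sequence of the newly synthesized strand
```

Each of the four reactions contributes the fragment lengths ending in its own base; sorting all fragments by length recovers the sequence of the synthesized strand.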

Dideoxy or Chain-terminating Method: Sanger Sequencing

Dideoxy or Chain-terminating Method

Fluorescence labeling is now used, and reads are made by capillary electrophoresis rather than slab gels and autoradiography.

Acrylamide gel photo of radiolabeled DNA

Maxam-Gilbert Method: Another First-generation Method

Uses chemical reagents to generate base-specific cleavages of DNA

Less used today because chemicals are toxic and method is labor intensive

Advantage: sequences are obtained from original DNA molecule (not synthesized copy)

Maxam-Gilbert Method

Need: pure DNA cut by restriction endonucleases –> fragments of a specific length and with known sequences at one end

Each fragment is then labeled at one end with a 32P-phosphate group so that 4 reactions can be carried out

Maxam-Gilbert Method

Result is a set of end-labeled fragments of different lengths that show up as a ladder of bands on a gel

Each reaction is limited so not all Gs (or Ts, Cs, or As) are modified in a reaction

The 4 different reaction samples were run side by side on a sequencing gel and visualized by autoradiography

Maxam and Gilbert Chemical Cleavage Method

Shotgun Strategy

DNA is digested with a restriction enzyme, and subfragments are cloned and sequenced

A computer is used to determine how the fragments overlap to form contigs –> original sequence

Shotgun method may underrepresent some fragments, so sequencing must be redundant
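
How a computer turns overlapping fragments into a contig can be sketched with a greedy merge. This is a toy illustration under strong assumptions (error-free reads, one strand only, no repeats), not the algorithm used by any real assembler. The names `overlap`, `greedy_assemble` and the `min_len` parameter are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no remaining overlaps: several contigs are left
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

If a region is missing from the reads, the loop stops with more than one contig, which is one reason redundant sequencing is needed.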

Sequencing Strategies

Directed

Random

Sequencing by the PCR

PCR allows a specific segment of DNA to be amplified a million-fold or more
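
The arithmetic behind a million-fold amplification is simple doubling. The sketch below assumes each cycle multiplies the copy number by (1 + efficiency), with efficiency = 1.0 meaning perfect doubling; the function name and the efficiency parameter are illustrative assumptions.

```python
import math


def cycles_needed(fold, efficiency=1.0):
    """Smallest number of cycles giving at least `fold` amplification."""
    return math.ceil(math.log(fold) / math.log(1 + efficiency))


print(cycles_needed(1_000_000))       # perfect doubling
print(cycles_needed(1_000_000, 0.9))  # 90% efficiency per cycle
```

With perfect doubling, 20 cycles suffice because 2^20 is just over one million; lower efficiency requires a few more cycles.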

Fragments can be amplified from genomic DNA or RNA (cDNA)

REQUIRES primers to flank region to be amplified

Several methods developed to sequence PCR products

Automated DNA Sequencers

Fluorescent labeling made early large-scale genome projects feasible (gels no longer used, radiolabeling no longer used)

Steps: DNA isolation, cloning, PCR, preparing sequencing reactions, purifying DNA, separating and detecting labeled DNA fragments

Many large-scale facilities used a random shotgun phase and a directed finishing phase to complete genome analysis

Automated DNA Sequencers

Automation has reduced costs of Sanger sequencing: $0.20 to $0.30 per base at an accuracy of less than one error in 10,000 bases

The Drosophila Genome Project was completed in 2000 using Sanger sequencing, which was also used for the Human Genome Project

Sequencing Data

Huge amounts of data

Computers are essential for assembly of sequences and analysis

Sequences can be searched to discover:

tandem repeats and inverted repeats

open reading frames (ORFs)

similarities with other DNA sequences in databases

introns and exons

transposable elements
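
As one example of such a search, a simple open reading frame (ORF) scan is sketched below. It is a minimal illustration that assumes the standard start codon ATG and stop codons TAA, TAG and TGA, scans only the three forward frames, and ignores introns. The function name and the `min_codons` threshold (which counts the stop codon) are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for ATG...stop ORFs; `end` is exclusive."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


seq = "CCATGAAATTTTAGGG"
for start, end, frame in find_orfs(seq):
    print(frame, seq[start:end])
```

A real gene finder would also scan the reverse complement and weigh codon usage and splice signals; this sketch only shows why the reading frame matters.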

Homology

Controversial term: multiple definitions

Usually assumes organisms have descended from a common ancestor

Percent homology NOT correct unless descent is involved

Better to use percentage SIMILARITY
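
Percentage similarity is a measured quantity, whereas homology is an inference about descent. A minimal sketch of the measurement, here as percent identity over two already aligned sequences, follows. It assumes gaps are written as "-" and that gapped columns are excluded from the comparison; other conventions exist, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of ungapped aligned positions with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


print(round(percent_identity("ATGC-TA", "ATGGCTA"), 1))
```

The result says nothing by itself about common ancestry; two sequences can be 83% similar without being homologous.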

DNA Sequence Data Banks

DNA Data Bank of Japan (DDBJ)

European Molecular Biology Laboratory Nucleotide Sequence Data Library (EMBL)

GenBank Genetic Sequence Data Bank at NCBI

Database subsets: mt, promoters, proteins, genomes, introns, restriction endonucleases etc.

Drosophila Genome Project

Part of the Human Genome Project

Both controversial:

1st "big" biology project

Time

Resources (individuals vs. teams)

Necessary? (substantial amount already known about D. melanogaster's genome)

Drosophila Genome Project

Drosophila genome: 180 Mb; 4 pr chromosomes (X, Y and 3 pr autosomes)

One-third is heterochromatin

Cytogenetic map available (5000 bands)

Approximately 3800 genes already mapped

In situ hybridization of salivary gland chromosomes –> 3000 transcription units

1300 genes already cloned & sequenced

Drosophila Polytene Chromosomes

Light micrograph of stained salivary gland chromosomes: X, 2, 3 and 4 are joined at centromeres (within circle)

Cytogenetic Map on Polytene X

Fine Map of Drosophila salivary gland X chromosome

Original Drosophila Genome Project

Several steps:

Physical map as a basis for sequencing and detailed functional studies: overlapping clones for which info is available on sequences at ends and location on chromosomes

Feasibility studies for large-scale sequencing, focusing on regions of great biological interest (3 megabases of contiguous sequences within 3 yrs)

Develop bioinformatic techniques to identify coding sequences and analyze data

ACTUAL Project

Completed more quickly and by a different strategy

D. melanogaster second multicellular organism after C. elegans to have complete genome sequenced

Initial project initiated in 1990, and only partly completed in 1996 when Venter et al. proposed using the "shotgun" method

Shotgun cloning had never been attempted with such a complex genome: Venter et al. used MASSIVE Sanger sequencing methods

Celera Sequencing and Analysis

ACTUAL Drosophila Genome Project

Began in May 1999 at Celera: completed by late fall of 1999 !!!!!

This created controversy and discord between Venter and the Drosophila genome consortium

Published in Science in March 2000

Major milestone for insect molecular genetics

Joint endeavor with Berkeley Drosophila project

Drosophila Genome Project

Sequencing only first step: what are the sequences (coding, noncoding, introns, exons, centromeres, telomeres) and what do the "genes" do?

Genome analysis, called "annotating", uses different methods

Accuracy of different methods assessed by GASP, Genome Annotation Assessment Project, using a well studied region

GASP

Coding regions better identified (95% success)

Correct intron / exon structures (40%)

Half of genes recognized and assigned functions by homology with known genes

Promoter predictions highly inaccurate: < 1/3 correctly identified

Annotation methods have improved since then

Surprises

13,600 genes identified initially

Fewer than in C. elegans

Expected number : 30,000

Overlapping genes may lead to higher count

Many genes left to be studied (despite previous efforts)

Surprises

Drosophila surprisingly relevant to study of genes and pathways in tumor formation and development in humans

At least 76 Drosophila genes homologous to mammalian cancer genes

Furthermore, 178 (62%) of known human disease genes appear conserved, including genes causing Alzheimer's disease, Huntington's, Duchenne muscular dystrophy, Parkinson's

Bioinformatics

Current analysis methods are not completely accurate in identifying structural and functional genes

What is a gene? Sequences encoding a protein, encoding RNA, producing a phenotype?

Computer skills are improving

Isochores –> 300-kb segments that are homogeneous on the basis of GC frequencies and usually are rich in genes
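
GC frequency along a sequence can be profiled with fixed windows, which is the basic measurement behind identifying isochores. The sketch below is illustrative only: real analyses use windows of hundreds of kilobases, while the example uses a 4-base window so the numbers are easy to check. The function names and the `window` and `step` parameters are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of bases in `seq` that are G or C."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq) if seq else 0.0


def gc_windows(seq, window, step=None):
    """GC fraction for each full window; non-overlapping by default."""
    step = step or window
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


for start, gc in gc_windows("GGCCATATGCGC", window=4):
    print(start, gc)
```

Runs of adjacent windows with similar GC fractions would be candidates for a homogeneous segment.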

Bioinformatics

Splice sites and junctions (introns and exons) can be difficult to identify

Start and stop codons can be useful in predicting exons but reading frame must be known

As more genes are identified in other organisms, homologous genes can be found in Drosophila and other insects

It is getting easier

However, it assumes that similar sequences = similar function, which MAY NOT be true

Next-Generation Methods

Newer sequencing methods have revolutionized genetics

Second-generation or Next-generation methods were developed

Allow high-throughput

Less expensive than Sanger sequencing

Limitations: sequences produced are shorter than those from Sanger sequencing, so ASSEMBLY can be more difficult
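
The effect of read length on the number of reads can be made concrete with a coverage calculation. This sketch assumes reads land uniformly at random on the genome, so the expected fraction of bases hit by no read is approximately e^(-coverage) (a Poisson approximation). The 180 Mb genome size is the Drosophila figure given earlier; the 10x coverage target and the function names are illustrative assumptions.

```python
import math


def reads_needed(genome_size, read_length, coverage):
    """Reads required so that total bases = coverage x genome size."""
    return math.ceil(coverage * genome_size / read_length)


def fraction_unsequenced(coverage):
    """Expected fraction of bases covered by no read (Poisson model)."""
    return math.exp(-coverage)


GENOME = 180_000_000  # bp
for read_length in (800, 35):
    print(read_length, reads_needed(GENOME, read_length, coverage=10))
print(fraction_unsequenced(10))
```

At the same coverage, 35-base reads require over twenty times as many reads as 800-base reads, and each short read spans fewer repeats, which is why assembly becomes harder.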

Next-Generation Methods

Platform            Sequencing by synthesis    Read length
Roche 454           Pyrosequencing             500-100 bp
Illumina (Solexa)   Reversible terminators     20-40 bp
SOLiD               Ligase                     35 bp
Polonator           Ligase                     13 bp
HeliScope           Polymerase                 30 bp

Next-Generation Methods

Roche (454) was first NextGen platform

Pyrosequencing involves incubating DNA-bearing beads with Bacillus stearothermophilus DNA polymerase, ss binding protein, ATP and luciferase

When incorporation of a nucleotide occurs, pyrophosphate is released (hence the name), resulting in a burst of light detectable by a recording device

This method cannot reliably sequence long runs of the same base (homopolymers)
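
The difficulty with runs of identical bases follows from how the light signal is read. The sketch below is an idealized simulation, not the instrument's software: it assumes nucleotides are offered one at a time in a fixed cyclic order and that signal strength equals the number of bases incorporated in that flow. The flow order "TACG" and the function names are assumptions.

```python
def flowgram(new_strand, flow_order="TACG", max_flows=100):
    """Return (nucleotide, signal) pairs for an idealized pyrosequencing run."""
    seq = new_strand.upper()
    signals = []
    pos = 0
    for flow in range(max_flows):
        if pos >= len(seq):
            break
        base = flow_order[flow % len(flow_order)]
        run = 0
        while pos < len(seq) and seq[pos] == base:
            run += 1  # every base in a homopolymer is added in one flow
            pos += 1
        signals.append((base, run))
    return signals


def call_bases(signals):
    """Convert signals back to sequence, trusting each signal exactly."""
    return "".join(base * run for base, run in signals)


signals = flowgram("TTTAGC")
print(signals)
print(call_bases(signals))
```

A run of three T's gives one signal of strength 3. In practice signal strength is measured with noise, so telling a run of 8 from a run of 9 is much harder than telling 1 from 2.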

Cost: 1000 bp for $0.05

Next-Generation Methods

Illumina (Solexa) uses bridge PCR to amplify sequences

Fluorescent labels identify each nucleotide

Read lengths ca. 20-100 nt

Cost: 1000 bp for $0.002

Next-Generation Methods

Applied Biosystems SOLiD sequencer

Results in reads of ca. 35 bp

Read lengths ca. 20-100 nt

Cost: 1000 bp for $0.002

[If you want to learn details of the biochemistry of these methods, go to the company websites]
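
The quoted per-base costs can be compared directly. The sketch below uses only figures from these slides ($0.20 per base as the low end for automated Sanger, $0.05 per 1000 bp for 454, $0.002 per 1000 bp for Illumina, and a 180 Mb Drosophila genome). It computes raw sequence cost at 1x coverage only; real projects need many-fold coverage plus assembly and annotation, so these are not project costs. The names `COST_PER_BASE` and `raw_cost` are hypothetical.

```python
COST_PER_BASE = {
    "Sanger (automated)": 0.20,         # dollars per base
    "Roche 454": 0.05 / 1000,           # $0.05 per 1000 bp
    "Illumina (Solexa)": 0.002 / 1000,  # $0.002 per 1000 bp
}


def raw_cost(genome_size_bp, cost_per_base, coverage=1):
    """Raw sequencing cost in dollars for the given coverage."""
    return genome_size_bp * coverage * cost_per_base


GENOME = 180_000_000  # bp
for platform, cost in COST_PER_BASE.items():
    print(f"{platform}: ${raw_cost(GENOME, cost):,.0f}")
```

On these figures, 1x raw sequence of a 180 Mb genome costs about $36 million by Sanger, $9,000 by 454 and $360 by Illumina.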

Third-Generation Methods

Sequencing methods continue to be developed so that sequencing is faster, cheaper and provides longer reads

The ultimate goal is to produce the $1000 genome for humans (which will provide less expensive insect genomes, as well)

Several methods: Ion Torrent, Single Molecule Real Time Sequencer, Nanopore Sequencing

Bioinformatics

Methods of analyzing sequenced genomes continue to improve

Genome annotation is challenging: The $1000 genome can result in the $100,000 analysis (or more)

With shorter read lengths, assembly (ordering reads into scaffolds of longer and longer length) is more difficult

Many people sequencing arthropod genomes have limited bioinformatics training, but a "point-and-click" process is not yet available for such novices

Bioinformatics

The amount of data being produced is stressing the system

A single sequencing run can produce as much data as did entire genome centers a few years ago

Gene ontology (GO) is a bioinformatics project with the goal of standardizing the representation of gene and gene-product attributes across species and databases

The goal is to provide a specific vocabulary of terms for genes and gene products: Organized into Molecular function, Biological process, Cellular component
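
The three-part organization can be pictured as a small data structure. This is only a sketch of how annotations might be stored, not the format used by the Gene Ontology project. The gene name "geneX" is hypothetical, while the three GO identifiers shown are existing terms chosen for illustration.

```python
from dataclasses import dataclass

ASPECTS = ("molecular_function", "biological_process", "cellular_component")


@dataclass(frozen=True)
class GOAnnotation:
    gene: str    # gene or gene-product identifier (hypothetical here)
    go_id: str   # GO term identifier
    term: str    # term name from the controlled vocabulary
    aspect: str  # one of the three ASPECTS

    def __post_init__(self):
        if self.aspect not in ASPECTS:
            raise ValueError(f"unknown GO aspect: {self.aspect}")


def group_by_aspect(annotations):
    """Collect term names under each of the three aspects."""
    grouped = {aspect: [] for aspect in ASPECTS}
    for annotation in annotations:
        grouped[annotation.aspect].append(annotation.term)
    return grouped


example = [
    GOAnnotation("geneX", "GO:0003677", "DNA binding", "molecular_function"),
    GOAnnotation("geneX", "GO:0006355",
                 "regulation of DNA-templated transcription", "biological_process"),
    GOAnnotation("geneX", "GO:0005634", "nucleus", "cellular_component"),
]
print(group_by_aspect(example))
```

Because every annotation must use one of the three aspects and a term from the shared vocabulary, annotations from different species and databases become directly comparable.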

Bioinformatics

Gene Ontology: the model organisms (human, yeast and Drosophila) have been annotated using GO terms

This has become a standard for new genomes

HOWEVER, until FUNCTION of a "gene" is documented with functional analysis, an annotated gene is only SIMILAR to a known gene, and this does not necessarily equate to the same function

Evolution can modify gene functions

Other Insect Genomes

Anopheles gambiae, vector of malaria, had its genome sequenced (2002)

Cost ca. $10 million using Sanger sequencing

Other genomes completed include Acromyrmex echinatior, Acyrthosiphon pisum, Aedes aegypti, Apis mellifera, Bombyx mori, Culex quinquefasciatus, Danaus plexippus, Glossina species, Heliconius melpomene, Ixodes scapularis, Metaseiulus occidentalis, Nasonia species, Pediculus humanus, Pogonomyrmex barbatus, Rhodnius prolixus, Tetranychus urticae, Tribolium castaneum, Varroa

Other Insect Genomes

This list is incomplete and will continue to increase

Most were sequenced using Next-Gen sequencing methods

Project to sequence 5000 arthropod genomes is underway (i5K)

Should result in a transformation of our understanding of insect biology and evolution, as well as our ability to manage pests

What have we learned?

Three mosquitoes sequenced: genome sizes vary, details found in VectorBase

Culex quinquefasciatus has expansion of olfactory and gustatory receptors, salivary gland genes and detoxification genes

Bombyx mori has 1874 genes related to silk production

Apis mellifera has few TEs and fewer genes for innate immunity and detoxification enzymes: related to social behavior?

What have we learned?

Tetranychus urticae feeds on > 250 plants and produces silk. It has a small genome (90 Mbp), but many detoxification gene families, perhaps associated with broad host range

Carotenoid biosynthesis genes in genome were horizontally transferred from fungi to the mite, as they were in the pea aphid (below)

Acyrthosiphon pisum has many gene duplications and has lost some genes; duplications include sugar-transporter proteins, amino-acid transport, antiapoptosis genes, perception of smell, and olfactory behavior

Gene losses include defense response, immune response, taste perception, antimicrobial responses

What have we learned?

Danaus plexippus has a full repertoire of circadian clock genes and expanded chemoreceptors, genes for defense against cardenolide glycosides

Heliconius species (3) analysis indicates this genus of butterflies hybridizes and exchanges genes that provide protective color patterns

7 complete ant genomes (Atta cephalotes, Acromyrmex echinatior, Linepithema humile, Pogonomyrmex barbatus, Harpegnathos saltator, Camponotus floridanus, Solenopsis invicta) and more on the way

Of special interest: origin of eusociality

What have we learned?

Rhodnius prolixus, a vector of Chagas' disease, may provide new methods of control

Ixodes scapularis is a vector of Lyme disease and has a huge and complex genome (in process of being published)

Drosophila species (12) also sequenced in order to compare evolution within the genus

What do you need to do to sequence your insect?

Ideally, you will know the size of the genome

You will be able to develop an inbred line (or have sufficient DNA to sequence the genome from a single haploid individual) so that assembly is easier

Develop an effective DNA extraction protocol to obtain clean DNA with little fragmentation

Annotate the DNA, using automated and manual methods

What do you need to do to sequence your insect?

Ideally, you will have a transcriptome to aid in delimiting exons, introns, and splice variants

Not all genome products are equal: quality can vary

Standard draft, High-quality draft, Improved high-quality draft, Annotation-directed improvement quality draft, Noncontiguous Finished, Finished

The annotations are provisional

One-third to one-half of your "genes" may have no sequence similarity to known genes (= orphan genes)

What do you need to do to sequence your insect?

Functional gene analysis is needed to confirm gene function

Once the genome is published, many years of work may be necessary to understand the biology, behavior, evolution of this species

TEs as Agents of Genome Evolution

Genome analyses allow better understanding of extent and function of TEs

"natural genetic engineering systems" (Kidwell and Lisch 2001)

No longer just "selfish" or "junk"

TEs carry costs –> deleterious mutations

Abundant and ancient components of genomes

TEs as Agents of Genome Evolution

TEs can acquire a functional role

HeT-A and TART retrotransposons form the telomeres in D. melanogaster

TEs cause inversions in Drosophila spp.

Inversions "tie up" gene combinations to maintain useful combinations

TEs can be activated by environmental and population factors –> increased variability

TEs as Agents of Genome Evolution

TEs have provided novel regulatory regions to preexisting host genes

Allow new proteins to be developed

Function more like symbiosis than parasitism?

TEs and Genome Evolution

Three outcomes possible

Coevolution of TE-derived mechanisms to minimize negative effects of TEs on their hosts (TE self-regulation, tissue specificity, targeting)

Evolution of host-defense mechanisms (suppressors)

Evolution of new and altered functions of TEs in hosts (regulatory, structural, enzymatic)

Transcriptomics

Transcriptome analyses: Analysis of transcripts of genes, including large and small RNAs

Conducted to discover genes using Sanger or Next-Gen sequencing methods

And to annotate coding and noncoding regions of a sequenced genome

Often conducted before sequencing a genome

Metagenomics

Sampling genomes of a community of microorganisms inhabiting a common environment: Useful for analysis of symbionts of arthropods

Culture-independent method of identifying microbes Can also provide clues as to function

Proteomics

Proteomics: the genome-wide analysis of proteins

Characterization of proteins and their posttranslational modifications

Comparison of protein levels and types

Studies of protein-protein interactions

Drosophila: 2297 proteins in 10,969 interactions (Guruharsha et al. 2011)
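
An interaction map of this kind is naturally stored as a network. The sketch below shows one possible representation (an adjacency mapping) and the average number of partners per protein implied by the figures above. It assumes each interaction joins two distinct proteins and is counted once; the protein names P1 to P3 and the function names are hypothetical.

```python
from collections import defaultdict


def build_network(interactions):
    """Map each protein to the set of its interaction partners."""
    network = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)
    return network


def mean_degree(n_proteins, n_interactions):
    """Average partners per protein: each interaction has two ends."""
    return 2 * n_interactions / n_proteins


toy = build_network([("P1", "P2"), ("P2", "P3")])
print(sorted(toy["P2"]))
print(round(mean_degree(2297, 10969), 2))
```

On these numbers each protein has, on average, between nine and ten interaction partners.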

Proteomics

Databases allow comparisons of

Sequence similarity

Protein function

Structure (secondary, quaternary)

Similarities to other proteins

Diseases associated with protein deficiencies

Posttranslational modifications

Functional Genomics

Assignment of function to genes, including understanding the organizational control of genetic pathways that make up the physiology of an organism

Can use DNA microarrays or transcriptomics

TILLING (targeting induced local lesions in genomes): mutagenesis with a chemical mutagen followed by identification of single-base mutations: a type of reverse genetics (analysis from genotype to phenotype)
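
The identification step in TILLING amounts to finding single-base differences between a reference sequence and the same region in a mutagenized individual. The sketch below assumes the two sequences are already the same length and aligned position by position; it does not model how the mutations are detected in the laboratory. The function name is hypothetical.

```python
def point_mutations(reference, mutant):
    """Return (position, reference_base, mutant_base), positions 1-based."""
    if len(reference) != len(mutant):
        raise ValueError("sequences must be the same length")
    return [
        (index + 1, ref_base, mut_base)
        for index, (ref_base, mut_base) in enumerate(zip(reference, mutant))
        if ref_base != mut_base
    ]


print(point_mutations("ATGGCCTAA", "ATGGTCTAA"))
```

Each reported position is a candidate lesion whose phenotypic effect can then be studied, which is the genotype-to-phenotype direction of reverse genetics.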

Structural Genomics

Large-scale analysis of protein structures and functions based on gene sequences

Developed after genome projects began

Attempts to determine sequence, structure and function in order to predict unknown structures by homology modeling

Probably more difficult than genome analysis

Comparative Genomics

Comparing whole genomes to understand how genomes evolve

How many protein families are there?

How many gene duplications?

How similar are the protein domains and families?

About 30% of all proteins are encoded by orphan genes, with no homology to known genes: these could be the most interesting?

Interactomes or Reactomes

What are the networks of protein-protein interactions?

How do signaling cascades affect cell biology?

Functional Genomics

The assignment of function to genes, including the organization and control of genetic pathways that make up the physiology of an organism

May use gene chips to measure mRNA abundance for tens of thousands of genes simultaneously, resulting in "piles of information but only flakes of knowledge"

The Post-Genomic Era

For past 50 yrs, biology has become ever more "reductionist"

Reductionist approaches allow us to gain detailed information about gene structure, function, regulation, expression

Now biology "is in the midst of an intellectual and experimental sea change"

The Post-Genomic Era

"The future will be the study of the genes and proteins of organisms in the context of their informational pathways or networks" (Leroy Hood, 2000)

In the future it may be possible to monitor simultaneously the expression of genes at the RNA or protein level, all possible protein-protein interactions, all alleles of all genes that affect a trait and all protein-binding sites in a genome

The Post-Genomic Era

An integrative approach to biology provides new challenges to biologists

Mathematical models and computer simulations may be needed to study the integrated function of multiple genes

Bioinformatics methods and systems analyses become more important

Emergent properties -- properties that arise from the whole rather than the individual parts