
Page 1: Viral Genomics

Viral Genomics

Page 2: Viral Genomics

Outline of today’s lecture

• Introduction
• Classification of Viruses
• Diversity and Evolution of Viruses
• Metagenomics and Virus Diversity
• Bioinformatics Approaches to Problems in Virology
• Influenza Virus
• Herpesvirus: From Phylogeny to Gene Expression
• Human Immunodeficiency Virus
• Bioinformatic Approaches to HIV-1
• Measles Virus

Page 3: Viral Genomics

Learning objectives for today’s lecture

• Describe how viruses are classified

• Explain bioinformatics approaches to virology

• Describe the influenza virus genome including the new H1N1 virus

• Provide a description of the Herpesviruses

• Use NCBI and LANL resources to identify the function and evolution of Human Immunodeficiency Virus (HIV-1)

Page 4: Viral Genomics
Page 5: Viral Genomics

Viruses are small, infectious, obligate intracellular parasites. They depend on host cells to replicate. Because they lack the resources for independent existence, they exist on the borderline of the definition of life.

The virion (virus particle) consists of a nucleic acid genome surrounded by coat proteins (capsid) that may be enveloped in a host-derived lipid bilayer.

Viral genomes consist of either RNA or DNA. They may be single-stranded, double-stranded, or partially double-stranded, and they may be circular, linear, or segmented.

Introduction to viruses

Page 567

Page 6: Viral Genomics

Viruses have been classified by several criteria:

-- based on morphology (e.g. by electron microscopy)

-- by type of nucleic acid in the genome

-- by size (rubella virus is about 10 kb; HIV-1 about 9 kb; poxviruses are several hundred kb). Mimivirus (for "mimicking microbe") has a linear double-stranded DNA genome of 1.2 megabases (Mb).

-- based on human disease

Page 568

Introduction to viruses

Page 7: Viral Genomics

Fig. 14.1 (page 569)

Page 8: Viral Genomics

Fig. 14.2 (page 570)

The International Committee on Taxonomy of Viruses (ICTV) offers a website, accessible via NCBI's Entrez site:

http://www.ncbi.nlm.nih.gov/ICTVdb/

Page 9: Viral Genomics

Mimivirus is the sole member of the Mimiviridae family of nucleocytoplasmic large DNA viruses (NCLDVs).

It was isolated from amoebae growing in England.

The mature particle has a diameter of ~400 nanometers, comparable to a small bacterium (e.g. a mycoplasma). Thus, mimivirus is by far the largest virus identified to date.

Mimivirus: mimicking microbe

Page 569

Page 10: Viral Genomics

The mimivirus genome is 1.2 Mb (1,181,404 base pairs). It is a double-stranded DNA virus.

► Two inverted repeats of 900 base pairs at the ends (thus it may circularize)
► 72% AT content (~28% GC content)
► 1262 putative open reading frames (ORFs) of length >100 amino acids; 911 of these are predicted to be protein-coding genes
► Unique features include genes predicted to encode proteins that function in protein translation. The inability to perform protein synthesis has been considered a prime feature of viruses, in contrast to most life forms.

See Raoult D et al. (2004) Science 306:1344.

Mimivirus: mimicking microbe

Page 569
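The GC/AT content and the ORF count above are basic sequence statistics. The Python below is a minimal sketch, not the annotation pipeline Raoult et al. actually used. It computes GC content and scans both strands for ATG-to-stop open reading frames longer than a given number of codons. The function names are hypothetical, and so is the simplification of using the first ATG after each stop.

```python
"""Sketch: GC content and a naive two-strand ORF scan (illustrative only)."""

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_aa: int = 100):
    """Return (strand, start, end) for ATG...stop ORFs longer than min_aa codons.

    Coordinates are 0-based and end-exclusive on the strand that was scanned.
    Assumed simplification: only the first ATG after the previous stop in each
    frame starts an ORF, so nested ORFs are not reported.
    """
    orfs = []
    forward = seq.upper()
    for strand, s in (("+", forward), ("-", reverse_complement(forward))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    # (i - start) // 3 counts codons from ATG up to, not including, the stop
                    if (i - start) // 3 > min_aa:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs
```

On the 1,181,404-bp mimivirus sequence, `gc_content` should come out near 0.28 (72% AT). A naive scan like this one need not reproduce the published count of 1,262 ORFs, because that number depends on the gene-calling rules used.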

Page 11: Viral Genomics

Viral metagenomics refers to the sampling of representative viral genomes from the environment.

A typical viral genome is ~50 kilobases (in comparison, a typical microbial genome is ~2.5 megabases). A sample is collected (e.g. seawater, fecal material, or soil). Cellular material is excluded. Viral DNA is extracted, cloned, and sequenced.

Viral metagenomics

Page 573

Page 12: Viral Genomics

Comparison of viral metagenomic libraries to the GenBank non-redundant database. Viral metagenomic sequences from human faeces, a marine sediment sample and two seawater samples were compared to the GenBank non-redundant database at the date of publication and in December 2004. The percentage of each library that could be classified as Eukarya, Bacteria, Archaea, viruses or showed no similarities (E-value >0.001) is shown. Edwards RA, Rohwer F. Nature Reviews Microbiology 3, 504-510 (2005)
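The classification in this figure amounts to thresholding each sequence's best database hit by E-value and tallying the categories. The sketch below assumes the best hits have already been computed (for example with BLAST) and labelled with a domain. The input format and the function name are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Threshold from the figure legend: E-value > 0.001 counts as "no similarity".
E_VALUE_CUTOFF = 1e-3


def classify_reads(
    best_hits: Iterable[Tuple[str, Optional[str], Optional[float]]],
    cutoff: float = E_VALUE_CUTOFF,
) -> dict:
    """best_hits: (read_id, domain of best hit or None, E-value or None).

    Returns the percentage of reads in each category (e.g. "Viruses",
    "Bacteria", "no similarity").
    """
    counts = Counter()
    total = 0
    for _read_id, domain, evalue in best_hits:
        total += 1
        if domain is None or evalue is None or evalue > cutoff:
            counts["no similarity"] += 1
        else:
            counts[domain] += 1
    if total == 0:
        return {}
    return {category: 100.0 * n / total for category, n in counts.items()}
```

Re-running the same tally against a newer database release mirrors the figure's comparison at two dates: as more genomes are deposited, reads move out of the "no similarity" category.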

Page 13: Viral Genomics

Edwards RA, Rohwer F. Nature Reviews Microbiology 3, 504-510 (2005)

“The Phage Proteomic Tree is a whole-genome-based taxonomy system that can be used to identify similarities between complete phage genomes and metagenomic sequences. This new version of the tree contains 167 phage genomes. Phages in black cannot be classified into any clade. In the key, each phage is defined in a clockwise direction.”

Page 14: Viral Genomics

Genomic overview of the uncultured viral community from human feces based on TBLASTX sequence similarities. (A) Numbers of sequences with significant matches (E values of <0.001) in GenBank. (B) Distribution of significant matches among major classes of biological entities. (C) Types of mobile elements recognized in the library. (D) Families of phages identified in the fecal library.

Breitbart M, et al. (2003) Metagenomic analyses of an uncultured viral community from human feces. J Bacteriol. 185: 6220–6223.

Page 15: Viral Genomics

Categories of phage proteins with significant matches in the uncultured human fecal viral library

Breitbart M, et al. (2003) Metagenomic analyses of an uncultured viral community from human feces. J Bacteriol. 185: 6220–6223.

Page 16: Viral Genomics

Vaccine-preventable viral diseases include:

• Hepatitis A
• Hepatitis B
• Influenza
• Measles
• Mumps
• Poliomyelitis
• Rubella
• Smallpox

Page 571

Human disease relevance of viruses

Source: Centers for Disease Control website

Page 17: Viral Genomics

Disease          Virus
Hepatitis A      Hepatitis A virus
Hepatitis B      Hepatitis B virus
Influenza        Influenza type A or B
Measles          Measles virus
Mumps            Rubulavirus
Poliomyelitis    Poliovirus (three serotypes)
Rotavirus        Rotavirus
Rubella          Genus Rubivirus
Smallpox         Variola virus
Varicella        Varicella-zoster virus

Page 571. Source: Centers for Disease Control website

Human disease relevance of viruses


Page 19: Viral Genomics

Some of the outstanding problems in virology include:

-- Why does a virus such as HIV-1 infect one species (human) selectively?

-- Why do some viruses change their natural host? In 1997 a chicken influenza virus killed six people.

-- Why are some viral strains particularly deadly?

-- What are the mechanisms of viral evasion of the host immune system?

-- Where did viruses originate?

Bioinformatic approaches to viruses

Page 574

Page 20: Viral Genomics

The unique nature of viruses presents special challenges to studies of their evolution.

• viruses tend not to survive in historical samples
• viral polymerases of RNA genomes typically lack proofreading activity
• viruses undergo an extremely high rate of replication
• many viral genomes are segmented; shuffling may occur
• viruses may be subjected to intense selective pressures (host immune responses, antiviral therapy)
• viruses invade diverse species
• the diversity of viral genomes precludes us from making comprehensive phylogenetic trees of viruses

Diversity and evolution of viruses

Page 574

Page 21: Viral Genomics

[Figure labels: archaea, bacteria, eukaryota, viruses (influenza, SARS)]

Page 22: Viral Genomics

Overview of viral complete genomes

PASC

Page 23: Viral Genomics

PASC: pairwise sequence comparison of viruses
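PASC summarizes a virus family as the distribution of pairwise identities between its complete genomes, which is intended to help set species and genus demarcation thresholds. The sketch below illustrates that idea only. It uses a plain Needleman-Wunsch global alignment and defines identity as identical columns divided by alignment length. NCBI's PASC uses its own alignment procedure, and the scoring values, bin width, and function names here are assumptions.

```python
from collections import Counter
from itertools import combinations


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the two aligned strings (gaps as '-'). Uses O(len(a) * len(b))
    time and memory, so it is only suitable for short toy sequences.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Identical, non-gap columns as a percentage of alignment length."""
    x, y = global_align(a, b)
    matches = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * matches / len(x) if x else 0.0


def pasc_distribution(genomes, bin_width=5):
    """Histogram {lower bin edge: number of genome pairs} over all pairs."""
    counts = Counter()
    for (_, s1), (_, s2) in combinations(genomes.items(), 2):
        pid = percent_identity(s1, s2)
        counts[int(pid // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))
```

For real genomes of tens to hundreds of kilobases, a quadratic-memory aligner like this is impractical. The point is only the shape of the computation: align every pair, then look at the distribution of identities.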

Page 24: Viral Genomics

Overview of viral complete genomes


Page 26: Viral Genomics

Influenza viruses belong to the family Orthomyxoviridae.

The viral particles are about 80-120 nm in diameter and can be spherical or pleomorphic. They have a lipid membrane envelope that contains two glycoproteins: hemagglutinin (H) and neuraminidase (N). These two proteins determine the subtypes of influenza A virus.

Influenza virus

Influenza A

Influenza causes about 200,000 hospitalizations and ~36,000 deaths in the U.S. each year.

Page 574

Page 27: Viral Genomics

Since 1976, the H5N1 avian influenza virus has infected at least 232 people (mostly in Asia), of whom 134 have died.

A major concern is that if a human influenza virus and the H5N1 avian influenza strain were to combine, a new lethal virus could emerge and cause a human pandemic. In a pandemic, 20% to 40% of the population may be infected per year.

► 1918 (H1N1): the Spanish influenza virus killed tens of millions of people
► 1957 (H2N2)
► 1968 (H3N2)
► Asia, 2003-2005 (H5N1)
► Current, 2009 (H1N1, "swine flu")

Influenza virus

Page 575

Page 28: Viral Genomics

There are three types: A, B, and C.
► A and B cause flu epidemics.
► Influenza A: 20 subtypes; occurs in humans and other animals. For example, in birds there are nine subtypes based on the type of neuraminidase expressed (group 1: N1, N4, N5, N8; group 2: N2, N3, N6, N7, N9). The structure of the H5N1 avian influenza neuraminidase has been reported (Russell RJ et al., Nature 443:45, 2006).
► The influenza A genome consists of eight single-stranded, negative-sense RNA segments (890 to 2340 nucleotides). Each RNA segment encodes one or two proteins.

Influenza virus

Page 575
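Subtype names such as H5N1 encode the hemagglutinin and neuraminidase numbers. The small helper below parses such names and looks up the neuraminidase group, using only the grouping listed above. It is a hypothetical illustration, not part of any NCBI tool.

```python
import re

# Neuraminidase groups exactly as listed on the slide (Russell et al., 2006).
NA_GROUP = {1: 1, 4: 1, 5: 1, 8: 1, 2: 2, 3: 2, 6: 2, 7: 2, 9: 2}

SUBTYPE_RE = re.compile(r"^H(\d+)N(\d+)$", re.IGNORECASE)


def parse_subtype(name: str):
    """Split an 'HxNy' subtype name into (hemagglutinin, neuraminidase) numbers."""
    match = SUBTYPE_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not an HxNy subtype name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def na_group(name: str) -> int:
    """Neuraminidase group (1 or 2) for a subtype name; KeyError if N is unlisted."""
    _, n = parse_subtype(name)
    return NA_GROUP[n]
```

For example, `na_group("H5N1")` gives group 1, and `na_group("H3N2")` gives group 2.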

Page 29: Viral Genomics

Page 576

Page 30: Viral Genomics

NCBI offers an Influenza Virus Resource (http://www.ncbi.nlm.nih.gov/genomes/FLU/FLU.html)

Page 31: Viral Genomics

Growth of Influenza Virus Sequences in GenBank

10/08 http://www.ncbi.nlm.nih.gov/genomes/FLU/growth.html

Page 32: Viral Genomics

Holmes et al. (2005) performed phylogenetic analyses of 156 complete genomes of human H3N2 influenza A viruses collected over time (1999-2004) in one location (New York State).

Phylogenetic analysis revealed multiple reassortment events. One clade of H3N2 virus, present since 2002, is the source for the HA gene in all subsequently sampled viruses.

Large-scale influenza virus genome analysis

Holmes EC, et al. Whole-genome analysis of human influenza A virus reveals multiple persistent lineages and reassortment among recent H3N2 viruses. PLoS Biol. 2005 Sep;3(9):e300.

Page 576
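Holmes et al. inferred reassortment from incongruent phylogenies of the different segments. The toy sketch below assumes each isolate's eight segments have already been assigned to clades, for example from per-segment trees. It flags isolates whose segments disagree with their own majority clade. The data layout and function name are hypothetical, and this is a much cruder test than the tree-based analysis in the paper.

```python
from collections import Counter

# The eight influenza A segments, used here only as dictionary keys.
SEGMENTS = ("PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS")


def flag_reassortants(assignments):
    """assignments: {isolate: {segment: clade}}.

    Returns {isolate: (majority clade, [discordant segments])} for isolates
    whose segments are not all assigned to the same clade.
    """
    flagged = {}
    for isolate, seg_clades in assignments.items():
        majority, _ = Counter(seg_clades.values()).most_common(1)[0]
        discordant = [seg for seg, clade in seg_clades.items() if clade != majority]
        if discordant:
            flagged[isolate] = (majority, discordant)
    return flagged
```

An isolate whose HA comes from one clade while its other seven segments come from another would be flagged with `["HA"]`. This is the kind of pattern described above, where one clade became the source of the HA gene in later viruses.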

Page 33: Viral Genomics

Evolutionary Relationships of Concatenated Major Coding Regions of Influenza A Viruses Sampled in New York State during 1999–2004. The maximum likelihood phylogenetic tree is mid-point rooted for purposes of clarity, and all horizontal branch lengths are drawn to scale. Bootstrap values are shown for key nodes. Isolates assigned to clade A (light blue), clade B (yellow), and clade C (red) are indicated, as are those isolates involved in other reassortment events: A/New York/11/2003 (orange), A/New York/182/2000 (dark blue), and A/New York/137/1999 and A/New York/138/1999 (green).

Holmes EC, et al. Whole-genome analysis of human influenza A virus reveals multiple persistent lineages and reassortment among recent H3N2 viruses. PLoS Biol. 2005 Sep;3(9):e300.

Page 34: Viral Genomics

Holmes EC, et al. Whole-genome analysis of human influenza A virus reveals multiple persistent lineages and reassortment among recent H3N2 viruses. PLoS Biol. 2005 Sep;3(9):e300.

Page 35: Viral Genomics

Ghedin et al. (2005) sequenced 209 complete genomes of human influenza A virus (sequencing 2,821,103 nucleotides). See Nature 437:1162.

Large-scale influenza virus genome analysis
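As a quick consistency check on these numbers, here is simple arithmetic on the figures quoted above:

```latex
\frac{2{,}821{,}103\ \text{nt}}{209\ \text{genomes}} \approx 13{,}498\ \text{nt per genome}
```

This agrees with eight RNA segments of 890 to 2,340 nucleotides that together make up an influenza A genome of roughly 13.5 kb.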

Page 36: Viral Genomics

Each row represents a single amino acid position in one protein. Amino acids (single-letter abbreviations are used) are colour-coded as shown in the key, so that mutations can be seen as changes in colour when scanning from left to right along a row. For simplicity, only amino acids that showed changes in at least three isolates are shown. Each column represents a single isolate, and columns are only a few pixels wide in order to display all 207 H3N2 isolates in this figure. Isolates are ordered along the columns chronologically according to the date of collection; boundaries between influenza seasons are indicated by gaps between columns. A more detailed version of this figure, showing positions that experienced any amino acid change and showing identifiers for the isolates in each column, is available as Supplementary Fig. 1.

Ghedin E, et al. Large-scale sequencing of human influenza reveals the dynamic nature of viral genome evolution. Nature. 2005 Oct 20;437(7062):1162-6.
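The figure keeps only amino acid positions that changed in at least three isolates. The sketch below implements one plausible reading of that filter. It assumes equal-length aligned protein sequences, one per isolate in chronological order, and counts how many isolates differ from each column's most common residue. The exact criterion Ghedin et al. applied may differ, and the function name is hypothetical.

```python
from collections import Counter


def variable_positions(aligned, min_changed=3):
    """Return 1-based alignment positions where at least `min_changed`
    isolates carry a residue different from the column's most common residue.

    aligned: list of equal-length protein sequences (one per isolate).
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    positions = []
    for col in range(length):
        residues = [s[col] for s in aligned]
        _, top_count = Counter(residues).most_common(1)[0]
        if len(residues) - top_count >= min_changed:
            positions.append(col + 1)
    return positions
```

Each returned position corresponds to one row of the colour-coded figure. Reading a row left to right across the chronologically ordered columns then shows when a substitution appeared and spread.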

Page 37: Viral Genomics

[Figure: columns = 207 H3N2 isolates; rows = amino acid positions in influenza proteins]

Page 38: Viral Genomics

Page 40: Viral Genomics

Herpesviruses are double-stranded DNA viruses that include herpes simplex virus, cytomegalovirus, and Epstein-Barr virus.

The genomic DNA is packaged inside an icosahedral capsid; including the surrounding lipid bilayer (envelope), the virion is ~200 nanometers in diameter.

Herpesvirus

Page 578

Page 41: Viral Genomics

Phylogenetic analysis suggests three major groups that originated about 180-220 million years ago (MYA).

Mammalian herpesviruses are in all three subfamilies. Avian and reptilian herpesviruses are all in the Alphaherpesvirinae.

Page 578

Herpesvirus

Page 42: Viral Genomics

Fig. 14.6 (page 578); time axis in millions of years before present

Herpesvirus: three main groups

Page 43: Viral Genomics

McGeoch et al. (Virus Res. 117:90-104, 2006) describe a new herpesvirus taxonomy.

Family Herpesviridae: subfamilies Alpha-, Beta-, and Gammaherpesvirinae

New family Alloherpesviridae (piscine, amphibian herpesviruses)

Herpesvirus taxonomy

Page 578

Page 44: Viral Genomics

[Figure labels: Alphaherpesvirinae; Betaherpesvirinae; Gammaherpesvirinae; Alloherpesviridae (piscine, amphibian); Malacoherpesviridae (invertebrate herpesviruses); protein-coding regions; blocks of core genes (I–VII); putative ATPase subunit of the terminase]

McGeoch DJ et al. (Virus Res. 117:90-104, 2006)

Page 45: Viral Genomics

Genome sizes range from 124 kb (simian varicella virus from Alphaherpesvirinae) to 241 kb (chimpanzee cytomegalovirus from Betaherpesvirinae).

► GC content ranges from 32% to 75%.
► Protein-coding regions occur at a density of one gene per 1.5 to 2 kb of herpesvirus DNA.
► There are immediate-early genes, early genes (nucleotide metabolism, DNA replication), and late genes (encoding proteins comprising the virion).
► Introns occur in some herpesvirus genes.
► Noncoding RNAs have been described (e.g. latency-associated transcripts in HSV-1).

Herpesvirus taxonomy
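Combining the genome-size range with the gene density gives a rough expected gene count. The smallest genome is paired with the sparsest density and the largest with the densest, and the result can be checked against HHV-8 (about 140,000 bp encoding about 80 proteins):

```latex
\frac{124\ \text{kb}}{2\ \text{kb/gene}} = 62\ \text{genes}
\qquad
\frac{241\ \text{kb}}{1.5\ \text{kb/gene}} \approx 161\ \text{genes}
\qquad
\frac{140\ \text{kb}}{80\ \text{genes}} = 1.75\ \text{kb/gene}
```

The HHV-8 value falls inside the stated 1.5 to 2 kb per gene. These are order-of-magnitude bounds, not predictions for any particular genome.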

Page 46: Viral Genomics

Consider human herpesvirus 8 (HHV-8) (family Herpesviridae; subfamily Gammaherpesvirinae). Its genome is ~140,000 base pairs and encodes ~80 proteins. Its RefSeq accession number is NC_003409.

We can explore this virus at the NCBI website. Try NCBI Entrez Genomes, then Viruses (on the right sidebar), then dsDNA viruses.

Bioinformatic approaches to herpesvirus

Page 579
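Besides browsing, a RefSeq record such as NC_003409 can be retrieved programmatically. The sketch below builds an NCBI E-utilities EFetch request (`db`, `id`, `rettype`, `retmode` are standard EFetch parameters) and parses the returned FASTA text. The helper names are hypothetical. NCBI also asks scripted users to identify themselves (e.g. with an `email` parameter) and to respect its rate limits.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="nuccore", rettype="fasta", email=None):
    """Build an EFetch URL for one accession, returning plain text."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    if email:
        params["email"] = email
    return EFETCH + "?" + urlencode(params)


def parse_fasta(text):
    """Parse FASTA text into {header line without '>': sequence}."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def fetch_genome(accession, email=None):
    """Download a nucleotide record as FASTA (requires network access)."""
    with urlopen(efetch_url(accession, email=email)) as response:
        return parse_fasta(response.read().decode())
```

Calling `fetch_genome("NC_003409")` and taking `len()` of the returned sequence is one way to confirm the genome length quoted above.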

Page 47: Viral Genomics

Page 579


NCBI virus site includes tools (e.g. “Protein clusters”) to analyze herpesviruses

Page 48: Viral Genomics

Fig. 14.7 (page 579)

NCBI virus site includes tools (e.g. “Protein clusters”) to analyze herpesviruses

Page 49: Viral Genomics

HHV-8 proteins include structural and metabolic proteins. There are also viral homologs of human host proteins such as the apoptosis inhibitor Bcl-2, an interleukin receptor, and a neural cell adhesion-related adhesin.

Mechanisms by which viruses may acquire host genes include recombination, transposition, and splicing. A blastp search using the HHV-8 interleukin-8 (IL-8) receptor as a query reveals several other viral IL-8 receptor molecules.

Viruses can acquire host genes

Page 579
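A blastp search like the one described here can also be submitted through NCBI's BLAST Common URL API. A `CMD=Put` request submits the search and returns a request ID (RID), which is later used in a `CMD=Get` request to retrieve results. The sketch below only builds those URLs. Restricting the search with an Entrez query such as `Viruses[Organism]` is an assumption about how one might focus on viral homologs, not necessarily how the slide's search was run, and the function names are hypothetical.

```python
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blastp_put_url(query, database="nr", entrez_query=None):
    """URL that submits a blastp search.

    query: a protein accession or FASTA-formatted sequence.
    entrez_query: optional Entrez filter string (hypothetical example:
    "Viruses[Organism]" to limit hits to viral proteins).
    """
    params = {"CMD": "Put", "PROGRAM": "blastp", "DATABASE": database, "QUERY": query}
    if entrez_query:
        params["ENTREZ_QUERY"] = entrez_query
    return BLAST_URL + "?" + urlencode(params)


def blast_get_url(rid, format_type="Text"):
    """URL that retrieves results for a request ID returned by the Put step."""
    return BLAST_URL + "?" + urlencode({"CMD": "Get", "RID": rid, "FORMAT_TYPE": format_type})
```

In practice the RID is parsed from the Put response, and the Get request is polled until the search finishes. The web BLAST form wraps the same steps.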

Page 50: Viral Genomics

Fig. 14.11 (page 581)

Page 51: Viral Genomics

Functional genomics approaches have been applied to human herpesvirus 8 (HHV-8). For example, microarrays have been used to define changes in viral gene expression at different stages of infection (Paulose-Murphy et al., 2001). Conversely, gene expression changes have been measured in human cells following viral infection.

Bioinformatic approaches to herpesvirus

Page 582

Page 52: Viral Genomics

Fig. 14.12 (page 582)

Paulose-Murphy et al. (2001) described HHV-8 viral genes that are expressed at different times post infection
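A time-course microarray experiment can be summarized by asking when each viral gene is first induced. The sketch below assigns a kinetic class from the first time point at which a gene's fold change reaches a threshold. The threshold and the hour cutoffs are hypothetical. In practice, immediate-early and early genes are usually distinguished with additional criteria (for example, sensitivity to protein-synthesis or DNA-replication inhibitors), so timing alone is only a first approximation and not the method of Paulose-Murphy et al.

```python
def kinetic_class(profile, threshold=2.0, ie_cutoff_h=2.0, early_cutoff_h=12.0):
    """Classify a gene from (hours_post_infection, fold_change) pairs.

    Returns 'immediate-early', 'early', 'late', or 'not induced' based on the
    first time point whose fold change is >= threshold. All cutoffs are
    illustrative assumptions.
    """
    for hours, fold in sorted(profile):
        if fold >= threshold:
            if hours <= ie_cutoff_h:
                return "immediate-early"
            if hours <= early_cutoff_h:
                return "early"
            return "late"
    return "not induced"
```

Applying this to every viral gene on the array gives groups of genes comparable in spirit to the temporal clusters shown in the figure.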

Page 53: Viral Genomics

Paulose-Murphy et al. (2001)


Page 55: Viral Genomics

Human Immunodeficiency Virus (HIV) is the cause of AIDS. An estimated 33 million people were infected with HIV (2006).

HIV-1 and HIV-2 are primate lentiviruses. The HIV-1 genome is 9181 bases in length. Note that there are >300,000 Entrez nucleotide records for this genome (but only one RefSeq entry).

Phylogenetic analyses suggest that HIV-2 arose through cross-species transmission of a simian virus, SIVsm (sooty mangabey). Similarly, HIV-1 arose from the simian immunodeficiency virus of the chimpanzee (SIVcpz).

Bioinformatic approaches to HIV

Page 583

Page 56: Viral Genomics

Fig. 14.13 (page 584)

HIV phylogeny based on pol suggests five clades

Hahn et al., 2000

1. Simian immunodeficiency virus from the chimpanzee Pan troglodytes (SIVcpz) with HIV-1

Page 57: Viral Genomics

HIV phylogeny based on pol suggests five clades

Hahn et al., 2000

2. SIV from the sooty mangabey Cercocebus atys (SIVsm), with HIV-2 and SIV from the macaque (genus Macaca; SIVmac)

Fig. 14.13 (page 584)

Page 58: Viral Genomics

HIV phylogeny based on pol suggests five clades

Hahn et al., 2000

3. SIV from African green monkeys (genus Chlorocebus)(SIVagm)

Fig. 14.13 (page 584)

Page 59: Viral Genomics

HIV phylogeny based on pol suggests five clades

Hahn et al., 2000

4. SIV from Sykes’ monkeys, Cercopithecus albogularis (SIVsyk)

Fig. 14.13 (page 584)

Page 60: Viral Genomics

HIV phylogeny based on pol suggests five clades

Hahn et al., 2000

5. SIV from l'Hoest monkeys (Cercopithecus lhoesti), from sun-tailed monkeys (Cercopithecus solatus), and from mandrills (Mandrillus sphinx)

Page 61: Viral Genomics

NCBI offers a retrovirus resource with reference genomes and protein sets, and several tools (alignment, genotyping).

Bioinformatic approaches to HIV: NCBI

Page 585

Page 62: Viral Genomics

10/08

Page 63: Viral Genomics

Example of genotyping tool from NCBI retrovirus resource

reference sequence with the highest score
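The genotyping idea is to score a query against a panel of reference sequences and report the reference with the highest score. As we understand it, NCBI's tool is BLAST-based. For brevity, the sketch below swaps in a different, simpler score: Jaccard similarity of k-mer sets. The k-mer length, the reference panel, and the function names are assumptions for illustration.

```python
def kmers(seq, k=8):
    """Set of all overlapping k-mers in an uppercase copy of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """|A intersect B| / |A union B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_reference(query, references, k=8):
    """Return (name of highest-scoring reference, {name: score}).

    references: {subtype name: reference sequence}, e.g. one sequence per
    HIV-1 subtype (a hypothetical panel supplied by the user).
    """
    query_kmers = kmers(query, k)
    scores = {name: jaccard(query_kmers, kmers(seq, k)) for name, seq in references.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

Scoring the query in sliding windows rather than as a whole would let a sketch like this detect recombinant (mosaic) genomes, where different regions match different subtypes.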

Page 64: Viral Genomics

Los Alamos National Laboratory (LANL) databases provide a major HIV resource.

See http://hiv-web.lanl.gov/

LANL offers:
-- an HIV BLAST server
-- a synonymous/non-synonymous analysis program
-- a multiple alignment program
-- a PCA-like tool
-- a geography tool

Bioinformatic approaches to HIV: LANL

Page 586
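Synonymous/non-synonymous analysis starts by classifying codon differences between aligned coding sequences. As we understand it, LANL's program follows Nei and Gojobori and also counts synonymous and non-synonymous sites. It handles codons that differ at several positions by averaging over mutational pathways. The sketch below does only the first step: it counts differences for codons that differ at exactly one position and skips the rest. The function name is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_differences(seq1, seq2):
    """Return (synonymous, non-synonymous, skipped) codon differences.

    seq1 and seq2 must be aligned, in frame, and the same length. Codons that
    differ at more than one position, or contain non-ACGT characters, are
    counted as skipped rather than resolved.
    """
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be codon-aligned and the same length")
    syn = nonsyn = skipped = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 == c2:
            continue
        diffs = sum(x != y for x, y in zip(c1, c2))
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE or diffs > 1:
            skipped += 1
        elif CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn, skipped
```

Turning these counts into dS and dN requires dividing by the number of synonymous and non-synonymous sites, which this sketch does not compute.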

Page 65: Viral Genomics
Page 66: Viral Genomics

LANL offers many HIV tools including analysis algorithms

Page 67: Viral Genomics

Fig. 14.17 (page 588); http://resdb.lanl.gov/Resist_DB/protease_mutation_map.htm