In silico Footprinting and Genomic Signature Analysis for Encephalic and Hemorrhagic Viruses

Willy A. Valdivia-Granda
Orion Integrated Biosciences, Inc.

Willy.Valdivia@orionbiosciences.com

Outline of this talk

Binary Clustering Analysis of Genomic Signatures
Examples: Flavivirus and Filoviruses

Encephalitic and Hemorrhagic Viruses
Arenaviridae, Bunyaviridae, Flaviviridae, Filoviridae

In silico Genomic Footprinting
Genomic Signature Detection

Directions in the Use of Genomic Signatures
Flavi-chip, Arena-chip, Bio-detection, Multimeric Vaccines

BioScience, May 2003: One million tons of dust may contain 10 quadrillion microbes (USGS).

Dissemination of Infectious Diseases

Encephalic and Hemorrhagic Viruses: NIAID Cat. A-C BSL4

Flaviviridae: Hepatitis, Dengue, Yellow fever, Japanese encephalitis, and West Nile viruses

Arenaviridae: Argentine, Bolivian, and Venezuelan hemorrhagic fevers; Lassa fever

Bunyaviridae: Hantavirus, Crimean-Congo hemorrhagic fever virus, Rift Valley fever virus

Filoviridae: Ebola and Marburg viruses

Ecological Genomics and Biocomplexity of Viruses

Flaviviridae Family

[Figure: electron micrographs (100,000X and 123,000X, with 25 Å and 4 Å labels) and genome organization maps for three genera: Hepaciviruses (~9.4 kb), Pestiviruses (~12 kb), and Flavivirus (~11 kb; 70 species transmitted by mosquitoes, ticks, or no vector). Gene labels on the maps include C, PreM, E, E1, E2, NS1, NS2, NS2A, NS2B, NS2-3, NS3, NS4, NS4A, NS4B, NS5A, and NS5B, flanked by a 5' UTR (cap or IRES) and a 3' UTR.]

Molecular Detection Methods for Viruses

Taxonomic resolution: Family, Genus, Species, Sub-species, Strain

Methods: DNA sequencing, DNA-DNA reassociation, RT-PCR, degenerate PCR, immunology, microarray

Viral Sequence Recovery Using DNA Microarrays

Viral sequences were physically scraped, amplified, cloned, and sequenced.

Wang, DeRisi, et al. 2003. PLoS Biology, Volume 1, Issue 2. http://biology.plosjournals.org

Prototypical Coronavirus Genome Structure

Murine Hepatitis Virus (MHV): 52/157 AA

Murine Hepatitis Virus (MHV): 33/37 AA

Infectious Bronchitis Virus (IBV): 32/32 nt

[Figure: per-position plot along the Flavivirus genome (~11 kb); y-axis ticks 0 to 20, x-axis positions 1 to 225. Genome map labels as extracted: 5' UTR, IRES, C, PreM, E, NS1, NS2A, NS2B, NS3, NS4A, NS4B, NS5B, 3' UTR, "Non Structural Protein".]

Nucleotide substitution: 10^-3 per site per year

Why amino acids?

Information content 3:1
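The 3:1 ratio can be read as three nucleotides encoding one amino acid; this is an interpretation, since the slide does not spell it out. Under that reading, a minimal Python sketch of translation with the standard genetic code shows a 12-nucleotide toy sequence collapsing to 4 residues. The names `translate` and `CODON_TABLE` and the toy sequence are illustrative and do not come from the talk.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides):
    """Translate a nucleotide string in frame 0; unknown codons become 'X'."""
    seq = nucleotides.upper().replace("U", "T")  # accept RNA as well as DNA
    usable = len(seq) - len(seq) % 3             # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


toy = "ATGGCCAAGTAA"  # made-up sequence, not viral data
protein = translate(toy)
print(len(toy), len(protein), protein)  # 12 4 MAK*
```

One practical consequence of working at the amino acid level is that a 12-residue signature corresponds to a 36-nucleotide window, so synonymous nucleotide changes do not break a match.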

[Figure: phylogenetic tree with bootstrap values (21 to 100) and scale bar 0.2; axis label: nucleotide substitution. Leaf labels as extracted: Louping, Tick-borne, Omsk, Langat, Powassan, Deer, Dengue, Murray, Japanese, West, Yellow, Alkhurma, Montana, Rio, Apoi, Modoc, Tamana, Kamiti, Cell.]

Ungapped Whole Genomic Footprinting

Flavivirus (~11 kb): C, PreM, E, NS1, NS2A, NS2B, NS3, NS4A, NS4B, NS5B

Workflow stages shown: 20 genomes, target generation, profile generation, profile comparison, genomic footprinting

[Figure: ungapped whole-genome footprint. Numbered genomic signature identifiers are plotted along the Flavivirus genome (~11 kb; 5' UTR, C, PreM, E, NS1, NS2A, NS2B, NS3, NS4A, NS4B, NS5B, 3' UTR) on an x-axis running from 1 to 275. Rows as extracted: Japanese, West Nile, Yellow, Tick-borne, Louping, Murray, Deer, Modoc, Rio, Apoi, Powassan, Langat, Montana, Alkhurma, Dengue, Tamana, Cell. Rows are grouped as tick-borne, mosquito-borne, and non-vector; annotated regions: core genome, adaptation region, cyclic region.]

Flavivirus Genomic Signature for NS5

[Figure: phylogenetic tree built from the NS5 genomic signature; scale bar 0.1. Leaf labels include dengue strains (DEN1 to DEN4 and chimeric derivatives), Kamiti River, West Nile and Kunjin strains, Powassan isolates, Tembusu strains, yellow fever strains (YFV), Japanese encephalitis (JEV-1 to JEV-3), Langat, Murray, Modoc, Rio, Apoi, Omsk, Zika, Usutu, and other members of the genus.]

59 Species of the Flavivirus Genus

Binary classification of Flaviviruses

Presence (1) or absence (0) of each signature filter; columns are FGS-<n>.filter.

Virus               1 10 11 12 13 14 15 16 17 18 19  2 20 21 22 23 24 25 26 27
DENV                1  1  1  1  1  1  1  1  1  1  1  1  1  0  1  1  1  1  1  0
YOKV                1  1  1  0  0  1  1  1  1  1  0  1  0  0  1  1  1  0  0  0
JEV                 1  1  1  1  1  1  1  1  1  1  1  1  1  0  1  1  1  1  1  0
MVEV                1  1  1  1  1  1  1  1  1  1  1  1  1  0  1  1  1  1  1  0
WNV                 1  1  1  1  1  1  1  1  1  1  1  1  1  0  1  1  1  1  1  0
APOIV               1  1  1  1  1  1  1  1  1  1  1  1  1  0  1  1  1  1  1  1
MODV                1  1  1  0  1  1  1  1  1  1  1  1  1  0  1  1  0  1  1  1
MMLV                1  1  1  0  1  1  1  1  1  1  1  1  1  0  1  1  1  1  1  1
RBV                 1  1  1  0  1  0  1  1  1  1  1  1  1  0  1  1  1  1  1  1
CFAV                1  1  0  0  1  0  1  1  1  1  0  1  0  0  1  0  0  0  0  0
TABV                0  0  0  0  0  0  1  1  1  1  0  0  0  0  1  0  0  0  0  0
LGTV                1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
LIV                 1  1  1  1  1  1  0  1  1  1  1  1  1  1  1  1  1  1  1  1
OHFV                1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
POWV                1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
TBEV                1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
YFV                 1  1  1  1  1  1  1  1  1  1  0  1  1  0  1  1  1  0  1  0
Alkhurma virus      1  1  1  1  1  1  1  1  1  1  1  1  1  0  1  1  1  1  1  1
Deer tick virus     1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
Kamiti River virus  1  1  0  0  1  0  1  1  1  1  0  1  0  0  0  0  0  0  0  0
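A presence/absence matrix like this one can be clustered directly. The sketch below is one possible approach, not necessarily the method used in the talk: it copies nine rows of the matrix, measures Hamming distance between profiles, and merges viruses by single linkage below a chosen threshold. The function names and the threshold are assumptions made for illustration.

```python
from itertools import combinations

# Rows copied from the binary matrix (20 signature filters per virus).
ROWS = {
    "DENV": "1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0",
    "JEV":  "1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0",
    "WNV":  "1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0",
    "YFV":  "1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 0 1 0",
    "LGTV": "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1",
    "POWV": "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1",
    "TBEV": "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1",
    "CFAV": "1 1 0 0 1 0 1 1 1 1 0 1 0 0 1 0 0 0 0 0",
    "TABV": "0 0 0 0 0 0 1 1 1 1 0 0 0 0 1 0 0 0 0 0",
}
PROFILES = {name: tuple(int(v) for v in row.split()) for name, row in ROWS.items()}


def hamming(a, b):
    """Number of filters on which two binary profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


def single_linkage(profiles, max_distance):
    """Group names whose profiles are linked by distances <= max_distance."""
    parent = {name: name for name in profiles}

    def find(name):
        while parent[name] != name:
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if hamming(profiles[a], profiles[b]) <= max_distance:
            parent[find(a)] = find(b)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


print(hamming(PROFILES["TABV"], PROFILES["LGTV"]))  # 15
print(single_linkage(PROFILES, max_distance=0))
```

With a threshold of 0 (identical profiles only), the three mosquito-borne rows DENV, JEV, and WNV fall into one group and the tick-borne rows LGTV, POWV, and TBEV into another, while YFV, CFAV, and TABV stay separate.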

>NP_476520.1|Deer tick virus|ctb30|8160|NS3

>NP_476520.1|Deer tick virus|ctb30|9651|NS5

Arising Biological Questions About Genomic Signatures

Are genomic signatures relevant for pathogen replication?

Silencing of host genes. MHC?

Competitive advantage over other viral serotypes

Role in virus recombination and the generation of new variability

Do some genomic signature duplications define host range, and are they related to vector transmission?

Binary classification of Mosquito Borne Flaviviruses

[Figure: tree of Powassan and tick-borne isolates; scale bar 0.05. Leaf labels as extracted:]

Powassan-AF310944.1|64-7062
Powassan-AF310945.1|64-7062
Powassan-AF310943.1|T18-23-81
Powassan-AF310942|1427-62
Powassan-AF310941|M1409
Powassan-AF310940|M1409
Powassan-AF310939.1|1982-64
Powassan-AF310937.1|M11665
Powassan-NC_003687.1
Powassan|AF310938.1|SPO/B10
Powassan|AF310950.1|791A-52
Powassan-AF310946.1|CTB30
Powassan-AF310947.1|IP5001
Powassan-AF310948.1|R59266
Powassan-AF310949.1|12542
Tick-borne|U27496.1|RK1424
Tick-borne-U27492.1|TEU27492
Louping|Y07863.1|LIVGEN
Tick-borne Louping|NC_001809.1
Tick-borne|L40361.3|L40361
Langat|AF253419.1-TP21
Langat|AF253420.1|attenuated
Langat|NC_003690.1
Tick-borne|U27493.1|TEU27493
Tick-borne|NC_001672.1
Tick-borne|U27490.1

Homologous Recombination Regions

- Natural selection
- Mechanistic/ecological
- Genome segment reassortment

NS5 Genomic Signature Phylogenetic Incongruence

J. Virol., April 1, 2004; 78(7): 3319-3324.
J. Virol., February 15, 2004; 78(4): 2114-2120.

Genomic Footprinting Hemorrhagic Viruses

Ebola Virus

[Figure: ungapped whole-genome footprint for filoviruses. Numbered genomic signature identifiers are plotted on an x-axis running from 1 to 275. Rows as extracted: Ebola-Reston, Marburg, Ebola-Zaire, Marburg Lake Victoria.]

[Figure: tree of Ebola and Marburg isolates; bootstrap values 89, 68, 67; scale 0.00 to 0.10. Leaf labels as extracted: Ebola-Zaire-Mayinga-subtype-Zaire, Ebola-Zaire-Mayinga-C, Ebola-Zaire-Mayinga-B, Ebola-Zaire-Mayinga-A, Ebola-Zaire-1995, Ebola-EBLPROTG-Zaire, Ebola-Zaire-Mayinga, Ebola-Reston, Ebola-Reston-Pennsylvania, Ebola-Reston-A, Ebola-Maleo, Marburg-1975/Ozolin, Marburg-MRVMBGL, Marburg-Lake.Victoria-pp3, Marburg-MVREPCYC, Marburg-MAVSPAB, Marburg-MVVIRPR, Marburg-NC_001608.2, Marburg-Lake.Victoria-pp4.]


Phylogenomic Analysis of Flavivirus Genomic Signatures

Valdivia-Granda et al. 2002.

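[Figure: phylogenomic tree of flavivirus genomic signatures; scale bar 0.1. Highlighted leaf labels as extracted: DEN4r2Adel30, WNV-hISR2000, WNV-VLG-4, Tembusu-MM1775, YFV-TN-96, Saumarez.]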

Data sources: Oracle, DB2, Sybase, Flat File, XML, Other

Metadatabase layer (JDBC)

Data adapters are specific implementations of data drivers for different genomic databases.

Data Mapping Function

The data mapping function maps objects and their attributes to specific databases.

Process Flow

a. The authorization process begins at the client and is passed to the Security and Administration API, which selects the services API.

b. The Administration API selects the Data Analysis API, and the data request is passed to the Data Abstraction Layer (DAL).

c. The data mapping function is invoked; the specified application and URI are referenced, and the proper driver is selected.

d. The various data drivers implement the Metadatabase layer (MDL) and produce common requests; result sets are selectively cached in the Data Abstraction Server.

e. Once data is delivered from one of the databases, it may be sent to the analysis application.
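Steps c to e can be sketched in a few lines. The slides describe a JDBC-based system; the Python below is only a language-neutral illustration of the idea, with a mapping function that picks a driver from the request URI and a cache for result sets. Every name in it (`DataRequest`, `DataAbstractionLayer`, the `flatfile` and `xml` schemes, the driver functions) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DataRequest:
    application: str  # e.g. "sequence-analysis" (hypothetical name)
    uri: str          # e.g. "flatfile://genomes" (hypothetical scheme)
    query: str


Driver = Callable[[DataRequest], List[str]]


def flat_file_driver(request: DataRequest) -> List[str]:
    # Stand-in for a real flat-file reader.
    return [f"flatfile:{request.query}"]


def xml_driver(request: DataRequest) -> List[str]:
    # Stand-in for a real XML reader.
    return [f"xml:{request.query}"]


class DataAbstractionLayer:
    """Maps a request to a driver and caches the result set."""

    def __init__(self, drivers: Dict[str, Driver]) -> None:
        self._drivers = drivers
        self._cache: Dict[DataRequest, List[str]] = {}

    def map_to_driver(self, request: DataRequest) -> Driver:
        # Step c: the URI scheme decides which driver handles the request.
        scheme = request.uri.split("://", 1)[0]
        if scheme not in self._drivers:
            raise LookupError(f"no driver registered for {scheme!r}")
        return self._drivers[scheme]

    def fetch(self, request: DataRequest) -> List[str]:
        # Steps d and e: run the driver once, then serve from the cache.
        if request not in self._cache:
            self._cache[request] = self.map_to_driver(request)(request)
        return self._cache[request]


dal = DataAbstractionLayer({"flatfile": flat_file_driver, "xml": xml_driver})
request = DataRequest("sequence-analysis", "flatfile://genomes", "NS5")
print(dal.fetch(request))  # ['flatfile:NS5']
```

Keeping the driver lookup separate from `fetch` means a new database type only needs one new entry in the driver table, which matches the role the slides give to data adapters.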

[Figure: system architecture diagram. Components as labeled:]

Global Schema
Security and Administration Application
Sequence Analysis Application
Microarray Analysis Application
Proteomic Analysis Application
2D and 3D Sequence Visualization
Cytogenomic Map Visualization
Transcriptional Network Visualization
Client
Firewall

Development of New Detection Devices

[Figure: bar chart; y-axis: number of published papers (ticks 0 to 1400); categories: Malaria, HIV, Viruses, Dengue, Cancer; value labels as extracted: 15840, 16, 1, 1320.]

Collaborators

UC Berkeley
Centers for Disease Control
University of Zurich, Switzerland
MIT
Walter Reed Army Institute of Research
San Diego Supercomputing Center
UMass Med School
Pasteur Institute, France
Pasteur Institute, Senegal
Sandia National Laboratories

August 2003

December 2002

KDD-cup

(A) (B) (T)

(T+A+B) (T:A:B)

(2T)

Arising Biological Questions About Genomic Signatures

Genomic signatures and the Origin of Life?

Repeated self-replication and simple evolvability: http://www.eastman.ucl.ac.uk/~thutton/Evolution/

Hutton, T. J. (2002). Evolvable Self-Replicating Molecules in an Artificial Chemistry. Artificial Life 8(4): 341-356.

Lee, D. H.; Granja, J. R.; Martinez, J. A.; Severin, K.; Ghadiri, M. R. (1996). A Self-Replicating Peptide. Nature 382: 525-528.

Nowak, M. A.; Sigmund, K. (1999). Phage-lift for game theory. Nature 398: 367-368.

Turner, P. E.; Chao, L. (1999). Prisoner's dilemma in an RNA virus. Nature 398: 441-443.

Ungapped Whole Genomic Footprinting

[Figure: ungapped whole-genome footprint for the hantaviruses Seoul, Sin Nombre, and Dobrava; numbered genomic signature identifiers plotted on an x-axis running from 1 to 275.]

Phylogenomic Analysis of Hemorrhagic Fevers

[Figure: the tree of Ebola (Zaire, Reston, Maleo) and Marburg isolates, shown again under this title; bootstrap values 89, 68, 67; scale 0.00 to 0.10.]

[Figure: schematic plot with labels Lethality, Dispersion, and Time; annotations: viral isolation and viral extinction threshold.]

Risk for dengue fever (DF) among travelers to Thailand, 2002.

Christina Frank,* Irene Schöneberg,* Gérard Krause,* Hermann Claus,* Andrea Ammon,* and Klaus Stark* *Robert Koch Institute, Berlin, Germany

http://www.cdc.gov/ncidod/EID/vol10no5/03-0495-G2.htm

The Central Dogma
• Gene finding algorithms
• Mutation
• Alternative splicing
• Folding dynamics

Pathways
• Directionality
• Association accuracy

Functional Modules
• Module directionality
• Visualization

Large Scale Organization
• Evolutionary perspective
• Ecological level?

Viral Life Cycle

Viral Adaptation

Viral Structural Changes

Sequence Space

Selection

Evolutionary Dynamics

90% of the Zairian cases and 50% of the Sudanese cases resulted in death.

The case fatality rate of Marburg hemorrhagic fever is between 23% and 25%.

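[Figure: bar chart of genomic signature count (ticks 0 to 5) for OIB FGS1 across flavivirus groups. Category labels are truncated in the extracted text: Dengue v, Entebbe, Japanese, Kokobera, Modoc vi, Mosquito, Ntaya vi, Rio Brav, Seabird, Spondwei, Tentativ, Tick-bor, Yellow f, zunclass.]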

HLA-A1 HLA-A68.1

qvpfcsnhftel Lupoid hepatitis restricted hepatitis B core antigen

Conclusions

A comprehensive genomic analysis of the genus Flavivirus suggests the existence of a core viral genome composed of 47 elements, each 12 amino acids long. Seven of the viral genomes carry at least one copy of each element, and several genomic signatures are duplicated up to three times. It remains unclear whether the generation of genomic signatures is a cyclic event.

Our analysis shows that duplication and mutation of genomic signatures remain relevant processes in viral genome evolution and could be involved in viral recombination and self-interaction.

As mutation pressures selected the fittest individuals, new species with novel characteristics emerged. However, there is a tradeoff between viral pathogenesis and dispersal.
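The idea of a core set of 12-amino-acid elements shared across genomes, with some elements present in several copies, can be illustrated with exact k-mer matching. This is a simplified sketch: the talk describes profile generation and comparison, so exact matching is an assumption, and the sequences below are made-up toy strings (with k = 5 to keep them short), not viral data.

```python
def kmers(sequence, k=12):
    """Set of all ungapped windows of length k in a protein sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_signatures(proteomes, k=12):
    """Windows of length k present in every sequence of the collection."""
    window_sets = [kmers(seq, k) for seq in proteomes.values()]
    return set.intersection(*window_sets) if window_sets else set()


def copy_number(sequence, signature):
    """Count occurrences of a signature, allowing overlaps."""
    count, start = 0, sequence.find(signature)
    while start != -1:
        count += 1
        start = sequence.find(signature, start + 1)
    return count


# Toy sequences, invented for illustration only.
TOY = {
    "virusA": "MKTAYIAKQRGDLLAYIAK",
    "virusB": "GGAYIAKQRPP",
    "virusC": "AYIAKWW",
}

core = shared_signatures(TOY, k=5)
print(core)                                                    # {'AYIAK'}
print({name: copy_number(seq, "AYIAK") for name, seq in TOY.items()})
```

In this toy set the element AYIAK is shared by all three sequences and occurs twice in virusA, which mirrors the duplicated signatures described in the conclusions.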

Each of these particles is about 4.5 microns long, about one-twentieth the diameter of a human hair, which is about 100 microns. Alternating gold and silver stripes create the "barcode" pattern on these tiny particles. When viewed in blue light under a microscope, silver is much more reflective than gold, making particles with different patterns easy to identify.

4.5 microns

Amino Acid Usage in Flaviviruses and Filoviruses

[Figure: amino acid usage (%) compared between Filovirus and Flavivirus.]
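Amino acid usage of this kind is a percentage composition per residue. A minimal sketch of the calculation follows; the function names and the toy strings are illustrative, and real input would be the concatenated protein sequences of each virus group.

```python
from collections import Counter


def amino_acid_usage(protein):
    """Percentage of each residue in a protein string (letters only)."""
    residues = [aa for aa in protein.upper() if aa.isalpha()]
    if not residues:
        return {}
    counts = Counter(residues)
    return {aa: 100.0 * n / len(residues) for aa, n in sorted(counts.items())}


def usage_difference(protein_a, protein_b):
    """Per-residue difference in usage (a minus b), in percentage points."""
    usage_a = amino_acid_usage(protein_a)
    usage_b = amino_acid_usage(protein_b)
    residues = sorted(set(usage_a) | set(usage_b))
    return {aa: usage_a.get(aa, 0.0) - usage_b.get(aa, 0.0) for aa in residues}


# Toy strings, not viral sequences.
print(amino_acid_usage("AAGL"))          # {'A': 50.0, 'G': 25.0, 'L': 25.0}
print(usage_difference("AAGL", "GGLL"))  # {'A': 50.0, 'G': -25.0, 'L': -25.0}
```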

Genomic Footprinting and Biological Complexity

[Figure: the flavivirus ungapped whole-genome footprint shown again; numbered genomic signature identifiers on an x-axis running from 1 to 275 for Japanese, West Nile, Yellow, Tick-borne, Louping, Murray, Deer, Modoc, Rio, Apoi, Powassan, Langat, Montana, Alkhurma, Dengue, Tamana, and Cell.]

Microarray Detection Analysis