
Data Vaccination - University of Minnesota (kumar/cbcb/powerpoint/lisa.pdf)


Data Vaccination:
Computational Biology Enhances Infectious Disease Research

Lisa Herron-Olson
Microbiologist
Syntiron LLC
Saint Paul, MN

SARS Spreads From China Markets
West Nile Virus Arrives in New York
Monkeypox Acquired From Prairie Dogs
Canadian Cow Tests Positive for BSE
Weaponized Anthrax Sent in US Mail
Experts warn of avian flu pandemic


Areas of Discussion

Computational tools in infectious disease research

• ID database to include molecular biology

• CASE STUDY: Staphylococcus aureus

• Comparative genomics
  • Assembly
  • Sequence analysis

• Functional genomics
  • Gene expression
  • Proteomics

• Significance: moving closer to a vaccine

Messages

• Existing tools are good.
  • Databases
  • Algorithms
  • Associated software tools

• More/better tools are continually needed.
  • Even if not necessarily requested…
  • Even if not necessarily requested in clear language…
  • Even if the current tools work…
  • Even if the newest version was launched an hour ago…


Biological Problem:

How can we improve our resistance to microbial infection in a changing world?

Focusing research on zoonotic pathogens will increase the likelihood of survival for human and other animal hosts.


Bacterial host specificity

Evidence for genetic similarity among clones associated with a specific host

– Salmonella spp. – human, cow, pig, chicken, fish
– Rhizobium spp., Bradyrhizobium spp., etc. – legumes and other plants
– Pseudomonas syringae – tomato, potato, tobacco, bean, pear

What is the genetic basis for host specificity and what can it tell us about the pathogenesis of specific diseases?

Understanding infectious disease ecology: overall challenge


Understanding infectious disease ecology: overall challenge

Tracking infectious disease: current systems

NEDSS (Natl. Electronic Disease Surveillance System)

Strengths:
• well-developed and funded
• includes some non-human hosts

Weaknesses:
• clinically focused
• no molecular biology*
• various levels of implementation

ProMED-mail (Program for Monitoring Emerging Diseases)

Strengths:
• rapid
• global
• moderated by human ‘expert’ teams

Weaknesses:
• Internet-based
• human only*
• human moderators do not scale up


• Variation in amount and type of data collected during a specific case, inability to guarantee ‘complete’ records

• Variation in data format

• Integration of local (county/state/province/region) and federal health and surveillance systems

• Uneven distribution of resources

• Human focus

Understanding infectious disease ecology: overall challenge

Problem

For hundreds of years, surveillance systems relied upon the concept of ‘symptom’ as the entry point for tracking data.

[Diagram: Host, Disease, Symptom, Pathogen; targets listed alongside: Tracking, Prediction, Therapy, Vaccine, Control]


Resolution

Improving the resolution between ‘pathogen’ and ‘symptom’ includes vital information for improving success on the right side.

[Diagram: Host, Disease, Symptom, Pathogen, with added levels between Symptom and Pathogen: host-pathogen interaction, component, protein, transcript, virulence/protection gene, genomic sequence; targets on the right: Tracking, Prediction, Therapy, Vaccine, Control]


Objectives

The model is designed to remember important data types involved with infection, with an emphasis on keeping track of multiple hosts and including molecular biological data.

Through this, we aim to accomplish two major objectives:

Objective 1: Identify molecular mechanisms underlying pathogenesis

Objective 2: Improve epidemiological surveillance

Core of the model: Host – Pathogen – Disease Triangle (HPD)

[Diagram: triangle with Disease, Host and Pathogen at the corners, set within Environment]

The two objectives could be combined in a single model because the important data to remember about each objective are anchored by a common core.
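A minimal sketch of how a shared host, pathogen and disease core might anchor both objectives, using Python's built-in sqlite3 module. The table layout, the `hpd_case` link table, the function names and the demo rows are all assumptions made for illustration; they are not the schema presented in the talk.

```python
import sqlite3

# Hypothetical, heavily reduced schema: three core tables plus one link table.
SCHEMA = """
CREATE TABLE host     (host_id INTEGER PRIMARY KEY, common_name TEXT);
CREATE TABLE pathogen (pathogen_id INTEGER PRIMARY KEY, common_name TEXT);
CREATE TABLE disease  (disease_id INTEGER PRIMARY KEY, clinical_name TEXT);
-- Assumed link table: one row per observed host/pathogen/disease combination.
CREATE TABLE hpd_case (
    host_id     INTEGER NOT NULL REFERENCES host(host_id),
    pathogen_id INTEGER NOT NULL REFERENCES pathogen(pathogen_id),
    disease_id  INTEGER NOT NULL REFERENCES disease(disease_id),
    environment TEXT
);
"""


def build_demo_db():
    """Create an in-memory database holding a few made-up demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO host VALUES (?, ?)", [(1, "cow"), (2, "human")])
    conn.execute("INSERT INTO pathogen VALUES (1, 'Staphylococcus aureus')")
    conn.executemany(
        "INSERT INTO disease VALUES (?, ?)",
        [(1, "mastitis"), (2, "toxic shock syndrome")],
    )
    # The environment values are placeholders, not data from the talk.
    conn.executemany(
        "INSERT INTO hpd_case VALUES (?, ?, ?, ?)",
        [(1, 1, 1, "dairy farm"), (2, 1, 2, "hospital")],
    )
    return conn


def hosts_for_pathogen(conn, pathogen_name):
    """Return (host, disease) pairs recorded for one pathogen, sorted by host."""
    return conn.execute(
        """SELECT h.common_name, d.clinical_name
           FROM hpd_case c
           JOIN host h ON h.host_id = c.host_id
           JOIN pathogen p ON p.pathogen_id = c.pathogen_id
           JOIN disease d ON d.disease_id = c.disease_id
           WHERE p.common_name = ?
           ORDER BY h.common_name""",
        (pathogen_name,),
    ).fetchall()
```

Calling `hosts_for_pathogen(build_demo_db(), "Staphylococcus aureus")` lists every host and disease tied to that pathogen, which is the kind of multi-host question the common core is meant to answer. Molecular data (genes, arrays, samples) would hang off the same three core tables.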


[Entity-relationship diagram of the data model. Entities and their attributes:]

host: host id, host date of birth, host date of death, host sex, host reproductive status, host current gender, host time of birth, host birth order number, host time of death, host # reproduction events, host common name, host siblings flag, host ethnicity, host contact phone number, host infertility flag, eye color, hair/fur color, coat pattern, leaf pattern, flowering flag, seeding flag, host photograph, handicapped flag, religious affiliation, host contact phone number, guardian flag, number of offspring under care, marital status, mother maiden name, known inbreeding/self poll flag
  (relationships: mothers / be mothered by; fathers / be fathered by)

disease: disease id, disease clinical name, disease common name

pathogen: pathogen id, pathogen common name, pathogen original discoverer, reproductive rate, number of life stages, pathogen reproduction type, colony/community color, colony/community morphology, aerobic/anaerobic respiration flag, identified in host organism flag, pathogen gender, motile flag, sporulating flag, pathogen adult size

kind of living thing: KOLT id
genus: genus id
species: species id
sub-species: sub-species id

genome sequence: genome sequence id, GS length, GS chromosome count, GS plasmid count, GS GC content, GS insertion seq. count, GS ribosome count, GS gap count, GS nucleotide seq.
gene in sequence: start position, end position
gene: gene id, gene name, gene nucleotide length, gene amino acid length, gene nucleotide seq., gene amino acid seq., gene GC content, gene EC, gene phage flag
gene presence: GP presence flag, GP copy count
PCR product: PCRP id, PCRP length, PCRP seq., PCRP purified, PCRP forward primer, PCRP reverse primer

microarray: MA id, MA content desc, MA system, MA spot count
chip: chip id, chip mfg.
array coordinate: array x, array y
microarray spot: spot id
MA location: plate id, MAL x, MAL y
chip experiment: CE id, CE date, CE objective (uses / be used for)
CE-HS pair
hybridization sample: hybrid. sample id, RNA extraction date, hybrid. sample label type, hybrid. sample control flag, RNA source*
expression of transcript: EoT quantitation
expression of protein: EoP quantitation
PE-PT pair
protein expression experiment?: PEE id, PEE type
transcript expression experiment: TEE id, TEE type

sample: sample id, sample type, sample storage type, sample storage temp, sample harvest date, sample harvest time
host-sample pair
pathogen-sample pair
composite sample
composite sample component: CSC description (comprises / is comprised of)

symptom: symptom id, symptom clinical name, symptom common name
symptom severity: symptom severity seq, symptom severity desc.
exhibited symptom: clinical/subjective flag, ex. symptom diagnosis date
affliction: diagnosis reason
disease-symptom pair
susceptibility: susceptibility id, susceptibility age*

transmission event: TE id
vector host in TE
isolation event: IE date
location: location id
host in location: HiL alive flag
pathogen in location: PiL alive flag
location time component: LTC id, LTC type value
LTC type: LTC type name
LTC-LTC pair (be subject / subjectify; be object / objectify)
LTC-pair preposition: LTCP-preposition value, LTCP-preposition desc
location place component: LPC id, LPC type value
LPC type: LPC type name
LPC-LPC pair (be subject / subjectify; be object / objectify)
LPC-pair preposition: LPCP-preposition value, LPCP-preposition desc

Other relationship labels in the diagram: be, has, be .. of

Challenge solution: HPD Triangle

Touring the Data Model: Molecular epidemiology


Touring the Data Model: Pathogenesis

[Data model diagram, with the same entities and attributes as in the molecular epidemiology tour]

Share: GenBank, NCBI Taxonomy, Stanford MicroarrayDB, NEDSS


CASE STUDY: Staphylococcus aureus Research

Comparative Genomics

Part I: Sequencing, assembly and annotation


Staphylococcus aureus: the bug

• Gram-positive cocci
• family Micrococcaceae
• grapelike clusters
• yellow colonies
• coagulase positive
• carried by 30-40% of healthy human adults

Staphylococcus aureus infections

SA infects multiple hosts and causes many diseases

HUMANS
• septicemia
• endocarditis
• TOXIC SHOCK SYNDROME
• osteomyelitis
• pneumonia
• purpura fulminans
• food poisoning
• furuncles
• impetigo
• scalded skin syndrome
• arthritis

ANIMALS
• MASTITIS
• septicemia
• toxic shock syndrome
• pneumonia
• osteomyelitis
• snuffles
• wound infection


Staphylococcus aureus: an ideal pathogen

• Metabolic diversity
• Toxins
• Immune effectors
• Biofilms
• Clumping
• Adhesins
• Regulators

GOAL: Identify the genomic differences between bovine and human Staphylococcus aureus

HYPOTHESIS: Host-specific pathogenesis is enhanced by a subset of host-tailored virulence-related genes


Why whole-genome sequencing?

Advantages:
• Complete set of potential genes
  • Virulence factors
  • Vaccine components
  • Therapeutic targets
• Regulatory elements
• Genomic organization

Challenges:
• Data management
• Time expense
• Complexity of conducting thorough analyses

Comparative genomics methods

Sequencing branch:
1. Culture RF122, MSA553
2. Generate small-insert genomic library in E. coli
3. Isolate plasmids and sequence the inserts
4. Assemble sequence reads
5. Close contig gaps
6. Annotate open reading frames
7. Analyze amino acid substitution rates; compare genome content and organization

Array branch:
1. Spot 70mer oligonucleotides representing SA ORFs onto glass slides
2. Hybridize fluorescently labeled SA DNA (multiple strains) to array, scan, analyze


Strain       RF122      Mu50       N315       MW2
Size (bp)    2,703,713  2,878,040  2,814,816  2,820,462
GC content   33.4%      32.8%      32.8%      32.9%
ORFs         2,406      2,714      2,593      2,632
No. Reads    23,125     64,000     63,000     64,000

The Tool That Saved the Thesis


S. aureus strain RF122 genome sequence

Size (nt)       2,742,531
ORFs            2590
GC%             32.78%
rRNA            5
tRNAs           60
Plasmids        0
Path Islands    2
Phages          2
Unique genes    60
Pseudogenes     74

S. aureus strain MSA553 genome sequence

Size (nt)       2,856,447
ORFs            2702
GC content      33%
Ribosomal ops   5
tRNAs           59
Plasmids        1 (int)
Path Islands    2
Phages          1
Unique genes    25
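For reference, GC content figures like those above are the percentage of G and C bases in a sequence. The following is a small sketch of that calculation; the function name and the short test sequences are illustrative and are not taken from the RF122 or MSA553 genomes.

```python
def gc_content(seq):
    """Return the GC content of a nucleotide sequence as a percentage."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)
```

For example, `gc_content("ATGC")` counts two of four bases and returns 50.0. A whole-genome value would be computed the same way over the full chromosome sequence.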


Completely sequenced SA isolates for comparison

Isolates: Mu50, N315, MW2, MSSA476, MRSA252, NCTC8325, MSA553, COL, RF122

[Table, one column per isolate; the column headers were rotated labels and their order is not recoverable from the extraction]

Key
Size (mbp)   2.74  2.86  2.90  2.80  2.82  2.81  2.87  2.81  2.82
Plasmids     0     0     1     1     1     0     0     1     0
ORFs         2590  2702  2671  2565  2632  2595  2697  UA    UA
IS1181       0     1     2     3     3     8     11    1     1

RF122 Genome Comparison

[Same table, with ORFs listed as 2589 and 2685 in the first two columns]

Key
Size (mbp)   2.74  2.86  2.90  2.80  2.82  2.81  2.87  2.81  2.82
Plasmids     0     0     1     1     1     0     0     1     0
ORFs         2589  2685  2671  2565  2632  2595  2697  UA    UA
IS1181       0     1     2     3     3     8     11    1     1

[Figure: Staphylococcus aureus RF122, compared with MSSA476, N315, Mu50, COL, NCTC8325, MRSA252, *MSA553 and MW2]


MSA553 Genome Comparison

[Figure: Staphylococcus aureus MSA553]

Gene analysis: amino acid substitution rates

1. Obtain RF122 sequence
2. Line up raw RF122 sequence with sequence of human isolate
3. Identify substitutions based on algorithm of Nei, Gojobori
4. Calculate synonymous and nonsynonymous substitution rates per gene

Example alignment:
CAT GCA AGT CGC CGT ATT    H A S R R I
CAT GCG AGT CGC CAT ATT    H A S R H I

GCA to GCG is synonymous (A in both); CGT to CAT is nonsynonymous (R to H).
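The example alignment can be checked with a short sketch that translates both sequences and labels each differing codon as synonymous or nonsynonymous. This is a simplification, not the Nei-Gojobori method itself: that method also counts synonymous and nonsynonymous sites and averages over mutation pathways when a codon differs at more than one position. The function names are illustrative; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a DNA sequence codon by codon, ignoring a trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def classify_differences(seq_a, seq_b):
    """Count differing codons that keep (synonymous) or change (nonsynonymous)
    the encoded amino acid. Returns (synonymous, nonsynonymous)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    syn = nonsyn = 0
    for i in range(0, len(seq_a) - 2, 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn
```

Applied to the two sequences on the slide, `classify_differences("CATGCAAGTCGCCGTATT", "CATGCGAGTCGCCATATT")` finds one synonymous and one nonsynonymous codon difference.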


RF122 genes with elevated rates of nonsynonymous substitution

PRODUCT                                                    FUNCTION
staphylocoagulase                                          adhesion
fibronectin binding protein A                              adhesion
host factor binding protein                                adhesion
host factor binding protein                                adhesion
host factor binding protein                                adhesion
clumping factor A                                          adhesion
secreted von Willebrand factor-binding protein precursor   adhesion
IgG-binding protein                                        immune evasion
staphylococcal enterotoxin 11                              virulence
staphylococcal enterotoxin 9                               virulence
hypothetical membrane protein (6)                          unknown

Identification of gene deletions

[Figure: Staphylococcus aureus MSA553]


Ebh is not conserved in RF122

Ebh is a 1.1-megadalton cell wall protein capable of binding human fibronectin

[Figure: Ebh across EMRSA-16, TSS, Mu50, N315, MW2, MSSA476, COL, OK8325 and RF122; scale 0 to 10K AA; annotation: mobile element inserted within Ebh sequence]

High genome homology between MSA553 and MRSA252

A. phage MSA553
B. SaPI5 containing TSST-1 and Etx
C. SCCmec encoding methicillin resistance
D. phage Sa2


Strain   Type    State  Year   TSST, MRSA, etx(a) entries
MSA553   mTSS    PA     1978   tsst1+(b), +
PSHA     mTSS    -      1978   +, +
PSMN     mTSS    MN     1986   +, +
PSPA     mTSS    TN     1986   +, MRSA
PS58     mTSS    CDC    1980   +, +
PSWH     mTSS    MN     2000   +, MRSA
PSJO     nmTSS   MN     2005   +
PSEB     nmTSS   MN     2005   +, +
PSHO     nmTSS   MN     2005   +, +
PSLA     nmTSS   MN     2005   +, +

(a) Presence of gene confirmed by PCR = +
(b) Presence of gene confirmed; protein production unconfirmed

Novel exfoliative toxin detected in multiple SA isolated from

Equivalent gene content ≠ equivalent gene position


Plug ‘N’ Play™
Get the LATEST in Mobile Technology!

We carry the best in:

• Adhesion factors
• Invasion assistance
• Antibiotic resistance
• Altered regulators
• General nuisances

and

TOXINS TOXINS TOXINS!

Sick of your current job? Alter your host specificity with Plug ‘N’ Play HS!

Easiest Install!
Just acquire and go!

Staph & Co.®
Since long ago

Comparative Genomics

Part II: Genomic DNA hybridization


Comparative genomic DNA hybridization of diverse isolates

RF122 versus PSA1001 RF122 versus PSA72

SA oligonucleotide microarray contains 3800 probes corresponding to all of the genes from 9 sequenced SA genomes

Discovery: tools for CGH analysis are limited!

Most array tools designed for gene expression

CGH has a different set of challenges:

Normalization:
• Global won’t work on many comprehensive arrays
• Housekeeping set must be genetically conserved

Statistics:
• The concept is binary, but the reality isn’t
• Establish cut-off for present, absent, divergent

Reliability:
• What if genes are locally divergent where probe hits?
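One way to picture the cut-off problem is a three-way call on the log2 ratio of test signal to reference signal for each probe. The sketch below assumes two thresholds; the values -1.0 and -3.0 are hypothetical placeholders, not cut-offs used in this work, and would need to be calibrated against genes of known presence or absence.

```python
import math


def classify_probe(test_signal, reference_signal,
                   present_cutoff=-1.0, absent_cutoff=-3.0):
    """Call a gene present, divergent or absent from one probe's signals.

    Both cut-offs are hypothetical log2(test/reference) thresholds.
    """
    if test_signal <= 0 or reference_signal <= 0:
        raise ValueError("signals must be positive")
    ratio = math.log2(test_signal / reference_signal)
    if ratio >= present_cutoff:
        return "present"
    if ratio <= absent_cutoff:
        return "absent"
    return "divergent"
```

With these placeholder thresholds, equal signals give "present", a tenfold drop gives "absent", and anything in between is reported as "divergent" rather than forced into a binary answer.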


Gene content of SA isolated from humans and cows

[Figure: CGH gene content map. Gene clusters: vSaBov, SCCcap, BoPhi12, SaPIbov, OPTrans, SaPIbovB, eta, Ebh. Isolates: PSA1, RF120, RF122, PSA72, PSA6, PSA10, PSA13, Mu50, N315, MW2, MSA476, PSA20, PSA1001, PSA17, PSA4, NCTC8325, MRSA252, MSA553, MSA553A, COL. Host labels: B B B B B B B H H H H B B B B H H H H H. Annotation: most successful bovine isolates (ET3)]

Comparative genomics summary

There are genetic differences between SA isolates routinely recovered from human and bovine hosts.

• Novel toxins

• Unique mobile genetic elements, genome organization

• Genes showing sequence divergence or deletion
  • Transcriptional regulators
  • Membrane proteins
  • Adhesion factors
  • Toxins


A smattering of DBs used for these analyses

Comparative genomics in the IDDB


Comparative genomics in the IDDB

Functional genomics

Part I: Gene expression analysis


Iron availability in the host

Mastitis:
• bLactoferrin
• Citrate

Toxic shock syndrome:
• hLactoferrin (mTSS)
• Transferrin
• Hemoglobin
• Ferritin


Metal metabolism of SA

[Figure: metal metabolism systems of SA. Legend: transcriptional regulator, repressor, inducer, cytosolic protein, membrane protein, surface component, divergent component, non-iron primary function. Labeled components: Fur, PerR, Zur, MntR, CzrA, Fhu, Frp, Sir, Isd, Sst, Mnt, Sbn, Cad, Czr, KatA, SrtA, SrtB, AhpFC, FeoB, TrxB, Ftn, MrgA, AdcA, StbA (marked "?"). * FrpG = sortase B]

Iron studies in Staphylococcus aureus

Phenotypic response of bovine SA to iron limitation

[Figure panels: CDM + iron; CDM – iron, log phase; CDM – iron, stationary phase; CDM iron added back (2 hours); CDM + iron versus CDM - iron]


Gene expression analysis methods

1. Grow MSA553, RF122 in chemically defined media
2. Add lactoferrin or ferric citrate during mid-log phase
3. Isolate RNA at 5, 30, 60, 120 minutes
4. Reverse transcribe to cDNA, construct fluorescently labeled probe
5. Hybridize to oligonucleotide array
6. Scan array
7. Identify strain-specific iron-responsive genes for further genetic analyses

Strain and source-specific transcriptional response clusters

[Heat map: color scale -12 to +12, log(2) expression ratio. Columns: RF122 Fc (5, 30, 60, 120), RF122 Lf (5, 30, 60, 120), MSA553 Lf (5, 30, 60). Cluster annotations: Lf-induced, both; Lf-induced, MSA553; Fc-induced; low-iron induced, stronger Fc response; low-iron induced, matching RF122 Fc, Lf response]

A. Induced by bLf
B. Induced by bLf
C. Induced by Fc
D. Induced by iron starvation
E. Induced by iron starvation

Heat map rows (locus, gene, product):

SAB0180 ldh1 L-lactate dehydrogenase 1
SAB0164 pflB formate acetyltransferase
SAB1667 probable specificity determinant
SAB0165 pflA formate acetyltransferase activating enzyme
SAB2246 hypothetical protein
SAB0557 adh1 alcohol dehydrogenase I
SAB0242 probable formate/nitrite transport protein
SAB2363 hypothetical protein
SAB2267 narK nitrite extrusion protein
SAB2491 anaerobic ribonucleotide reductase large subunit
SAB2245 lldP1 L-lactate permease
SAB2490 anaerobic ribonucleotide reductase small subunit
SAB2029 czrA zinc and cobalt transport repressor protein
SAB2030 czrB cation-efflux system membrane protein
SAB2492 mntH probable magnesium citrate secondary transporter
SAB2280 nasD nitrite reductase
SAB0049 lldP2 L-lactate permease
putative resolvase
cadA cadmium exporting ATPase protein
cadX cadmium efflux accessory protein
SAB0209 rbsC ribose permease transport protein
SAB0210 rbsD ribose transporter
SAB1940 ilvN acetolactate synthase small subunit
SAB1901 hypothetical protein
SAB0072 sodM superoxide dismutase
SAB0361 hypothetical bovine pathogenicity island protein
SAB0332 probable nitro/flavin reductase
SAB1219 probable DNA damage repair protein
SAB0355 bovine pathogenicity island protein Orf9
SAB0354 bovine pathogenicity island protein Orf10
SAB0356 bovine pathogenicity island protein Orf8
SAB0358 bovine pathogenicity island protein Orf6
SAB0344 bovine pathogenicity island protein Orf19
SAB0235 hypothetical protein
SAB0057 sbnC probable siderophore biosynthesis protein
SAB2495 bsaA glutathione peroxidase
SAB2346 oligopeptide transporter putative ATPase domain
SAB2251 conserved hypothetical protein
SAB2057 sirB ferrichrome ABC transporter
SAB2153 hypothetical protein
SAB2154 conserved hypothetical protein
SAB0107 conserved hypothetical protein
SAB0598 fhuD ferrichrome transport permease
SAB0582 mntB cation ABC transporter
SAB2296 gpmA 2,3-bisphosphoglycerate-dependent phosphoglycerate mutase
SAB0583 mntA cation ATP-binding ABC transporter
SAB2155 iunH inosine-uridine preferring nucleoside hydrolase
SAB0597 fhuB ferrichrome transport permease
SAB2156 fhuD2 probable ferrichrome-binding protein
SAB2058 sirA ferrichrome ABC transporter lipoprotein
SAB2056 sirC ferrichrome ABC transporter
SAB0761 conserved hypothetical protein
SAB0309 metB cystathionine gamma-synthase
SAB0596 fhuA ferrichrome transport ATP-binding protein
SAB0059 sbnE probable siderophore biosynthesis protein
SAB0812 conserved hypothetical protein
SAB2286 adcA probable zinc-binding lipoprotein
SAB2345 oligopeptide transporter system protein
SAB1651 hypothetical protein
SAB2349 opp1B oligopeptide transporter putative membrane permease domain
SAB2347 opp1D oligopeptide transporter putative ATPase domain
SAB2351 oligopeptide transporter system protein
SAB2353 oligopeptide transporter system protein
SAB1316 fnbB fibronectin binding protein 2 domain
SAB2352 oligopeptide transporter system protein
SAB2350 opp1A oligopeptide transporter putative substrate binding domain
SAB2348 opp1C oligopeptide transporter putative membrane permease domain


Analysis of microarray data

• Global normalization
• Filtering
• Triplicate spots, duplicate experiments, dye-swap
  • 2 biological replicates
  • 6 inputs per datapoint
• Clustering
  • Hierarchical (Euclidean distance, avg. linkage, UPGMA)
  • K-means (uncentered based on meas. distance)
• Statistics
  • Significance Analysis of Microarrays (SAM)
  • Median-centered log ratios, one-class model
  • Stringent delta value
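Two of the steps listed above, median-centered log ratios and Euclidean distance between expression profiles, can be sketched in a few lines of standard-library Python. The function names and the two-channel input layout are assumptions for illustration; the analysis software actually used is not named on the slide.

```python
import math
from statistics import median


def median_centered_log_ratios(test_channel, reference_channel):
    """Return log2(test/reference) per spot, shifted so the array median is 0."""
    logs = [math.log2(t / r) for t, r in zip(test_channel, reference_channel)]
    center = median(logs)
    return [value - center for value in logs]


def euclidean(profile_a, profile_b):
    """Euclidean distance between two equally long expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))
```

Median centering is one simple form of global normalization: it assumes that most genes on the array do not change, so the typical log ratio should sit at zero. The distance function is the measure that hierarchical clustering with average linkage would then apply to pairs of gene profiles.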

Overlap of significantly* differentially expressed genes
Strain and source-specific responses

[Two Venn diagrams over RF122 Fc, RF122 Lf and MSA553 Lf. Region counts as extracted, assignment to regions not recoverable: 11, 1, 1033 14, 21, 12 10, 12, 0, 91, 0. Callouts: adhesion proteins, DNA damage repair, purine biosynthesis]

* Significantly different across ALL timepoints by SAM analysis
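Region counts for an overlap diagram of this kind can be derived from the per-condition gene lists with plain set operations. The sketch below is illustrative only: the function name is an assumption, and the gene identifiers in the usage note are placeholders rather than results from this study.

```python
def venn_regions(gene_sets):
    """Group genes by the exact combination of conditions they appear in.

    gene_sets maps a condition name to a set of gene identifiers. The result
    maps a frozenset of condition names to the genes found in exactly those
    conditions.
    """
    regions = {}
    all_genes = set().union(*gene_sets.values())
    for gene in all_genes:
        key = frozenset(name for name, genes in gene_sets.items() if gene in genes)
        regions.setdefault(key, set()).add(gene)
    return regions
```

For instance, with placeholder inputs `{"RF122 Fc": {"g1", "g2"}, "RF122 Lf": {"g1"}, "MSA553 Lf": {"g1", "g3"}}`, the gene `g1` lands in the three-way region while `g2` and `g3` fall in single-condition regions. Taking `len()` of each region gives the numbers drawn in the diagram.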


Genes upregulated in iron-deplete conditions

[Heat map columns: RFc, RLf, MLf]

Spot   Product                                      Locus     Gene
10D2   fibronectin binding protein 2 domain         mult.
IJ21   ferrichrome ABC transporter                  SAB2056   sirC
IJ22   ferrichrome ABC transporter                  SAB2057   sirB
IJ23   ferrichrome ABC transport lipoprotein        SAB2058   sirA
IO22   conserved hypothetical protein               SAB2153
IO23   conserved hypothetical protein               SAB2154
IO24   conserved hypothetical protein               SAB2155
IP2    ferrichrome binding protein                  SAB2156   fhuD2
IP3    conserved hypothetical protein               SAB0107

2I22   probable antibiotic resistance protein       SAB2345
2I23   oligopeptide transporter
2I24   oligopeptide transporter ATPase domain       SAB2347   opp1D
2J1    oligopeptide transporter membrane dom.       SAB2348   opp1C
2J2    oligopeptide transporter membrane dom.       SAB2349   opp1B
2J3    oligopeptide transporter membrane dom.       SAB2350   opp1A
2J5    oligopeptide transporter                     SAB2351
2J6    oligopeptide transporter                     SAB2352
2J7    oligopeptide transporter                     SAB2353

4F19   cation ABC-transporter                       SAB0582   mntB
4F21   cation ATP-binding ABC transporter           SAB0583   mntA
4G14   ferrichrome transport ATP-binding trans.     SAB0596   fhuA
4G15   ferrichrome transporter permease             SAB0597   fhuB
4G16   ferrichrome transporter permease             SAB0598   fhuG
4G21   siderophore biosynthesis protein             SAB0059   sbnE

Signal sequence for iron-regulated genes

gataatgataatcattatc   E. coli consensus
gataatgataatcattatc   B. subtilis dhb
gataatgattctcattgtc   S. aureus sirA
gttcatgataatcattatc   S. aureus fhu
cattgcacctttcattatc   S. aureus opp1X
tatcgtatcattcattatc   S. aureus opp1Z
tttaatttccttcattatc   S. aureus opuCC

Genes upregulated in response to ferric citrate but not lactoferrin

[Heat map columns: RFc, RLf, MLf]

Spot   Product                               Locus
1B11   SaPIbov protein                       SAB0358
1B13   SaPIbov protein                       SAB0356
1B14   SaPIbov protein                       SAB0355
1B15   SaPIbov protein                       SAB0354
2J21   hypothetical protein                  SAB0235
3A19   glutathione peroxidase                SAB2495
3J16   nitro/flavin reductase                SAB0332
3K8    SaPIbov protein                       SAB0344
3K16   SaPIbov protein                       SAB0361
4F20   siderophore biosynthesis protein      SAB0057
4N11   superoxide dismutase [Mn/Fe]          SAB0072
5K18   DNA damage repair protein             SAB1219

Summary

Ferric citrate, but not lactoferrin, induces antioxidant response and increased transcription of SaPIbov pathogenicity island genes


Genes upregulated in response to lactoferrin but not ferric citrate

Expression columns: RFc, RLf, MLf

Clone   Description                                 Locus
2D14    L-lactate permease                          SAB2245
2D16    hypothetical protein                        SAB2246
2E19    nitrite extrusion protein                   SAB2267
2F10    nitrite reductase                           SAB2280
2J22    hypothetical protein                        SAB2363
3A12    anaerobic ribonucleotide reductase          SAB2490
3A15    anaerobic ribonucleotide reductase          SAB2491
3A16    probable Mg2+ transporter                   SAB2492
4A23    L-lactate permease                          SAB0049
4E13    alcohol dehydrogenase I                     SAB0557
7B13    formate acetyltransferase                   SAB0164
7B24    formate acetyltransferase activating enz.   SAB0165
7C1     specificity determinant                     SAB1667
7J5     L-lactate dehydrogenase                     SAB0180

Summary

Lactoferrin, but not ferric citrate, induces transcription of fermentation and anaerobic respiration system components

Summary of SA response to iron depletion and specific iron sources

• Multiple iron transport systems are significantly upregulated in low iron

• Steady metabolic/cellular gene expression
  Emphasizes ability of SA to withstand iron depletion

• New iron-regulated transport operon (Opp) and Fur signal sequence

• Ferric citrate induces antioxidant response and increased transcription of pathogenicity island genes

• Lactoferrin induces transcription of fermentation and anaerobic respiration system components


Functional genomics

Part II: Proteomics

[Figure: diagram of S. aureus iron- and metal-related proteins and regulators. Labels: Fur, PerR, Zur, MntR, CzrA; Fhu, Frp, Sir, Isd, Sst, Mnt, Sbn, Cad, Czr; KatA, AhpFC, FeoB, TrxB, Ftn, MrgA, AdcA, SrtA, SrtB, StbA (marked "?"). Footnote: * FrpG = sortase B]


Gene expression analysis methods

Grow S. aureus in rich media (Fe+) and rich media + iron chelator (Fe-)

Extract and purify membrane proteins

Separate membrane proteins by SDS-PAGE

Extract membrane proteins

Use MALDI-TOF to identify peptide mass fingerprint

Run MASCOT search to identify corresponding protein

[Figure: mass spectrum; x-axis m/z (500 to 3000), y-axis a.i. (2000 to 14000)]


[Figure, panels A to C: gel of SA membrane proteins induced during iron-restriction, with lanes for strains SAAV1, 19636, 1477, and 2176, each grown +Fe and -Fe, alongside a M.W. STD lane; mass spectrum with x-axis m/z (500 to 3000) and y-axis a.i. (2000 to 14000); protein match table]

Protein match   GI#        Mowse score              Peptide coverage
SirA            25329962   158 (p = 2.2 x 10^-10)   67%

Identification of SA membrane proteins
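The idea behind a peptide mass fingerprint search can be illustrated with a small Python sketch: digest a candidate protein in silico with trypsin rules, compute each peptide's monoisotopic [M+H]+ mass, and report what fraction of the sequence is covered by peptides that match observed peaks. This is only an illustration under stated assumptions. It does not reproduce MASCOT or its Mowse scoring; the function names, the 0.2 Da tolerance, the toy sequence, and the peak list are all hypothetical, and missed cleavages and modifications are ignored.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-termini)
PROTON = 1.00728   # singly protonated ion, as typically seen in MALDI


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def mh_plus(peptide):
    """Monoisotopic [M+H]+ m/z of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def coverage(protein, peaks, tol=0.2):
    """Fraction of residues in tryptic peptides matching a peak within tol Da."""
    matched = [p for p in tryptic_digest(protein)
               if any(abs(mh_plus(p) - m) <= tol for m in peaks)]
    return sum(len(p) for p in matched) / len(protein)


if __name__ == "__main__":
    toy_protein = "MKRPAKGGK"          # hypothetical sequence, not SirA
    toy_peaks = [278.15, 261.16]       # hypothetical observed m/z values
    print(tryptic_digest(toy_protein))
    print(f"coverage = {coverage(toy_protein, toy_peaks):.0%}")
```

A coverage figure such as the 67% reported for SirA is this kind of ratio: residues in matched peptides over the total protein length. The probability attached to the Mowse score comes from the search engine's own statistical model, which this sketch does not attempt.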


S. aureus proteins upregulated under iron restriction match gene expression study

ldh         L-lactate dehydrogenase I
pflB        formate acetyltransferase
pflA        formate acetyltransferase act. enzyme
adh1        alcohol dehydrogenase I

SAV2177     ferrichrome ABC transporter homolog
SAR1869     putative exported protein
sirA        iron-regulated lipoprotein
SACOL0688   ABC transporter homolog
opp1-A      oligopeptide transporter

[Figure: expression data for the genes above; scale -12 to +12 log(2) expression ratio; column groups RF122 Fc, RF122 Lf, and MSA553 Lf, with time-point labels 5, 30, 60, 120]
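The scale on the expression figure is a log(2) expression ratio. As a minimal sketch of how such a value is computed from two intensities, the Python below takes a test and a reference signal and returns log2(test / reference). The function name and the example intensities are hypothetical, and the normalization actually applied to the arrays is not stated on the slide.

```python
import math


def log2_ratio(test, reference):
    """Return log2(test / reference) for two positive signal intensities."""
    if test <= 0 or reference <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(test / reference)


if __name__ == "__main__":
    # Hypothetical intensities: a 4-fold increase and a 4-fold decrease.
    print(log2_ratio(4000.0, 1000.0))   # 2.0
    print(log2_ratio(250.0, 1000.0))    # -2.0
```

On this scale a value of 0 means no change, and equal fold-changes up and down are symmetric about 0, which is why the figure's axis runs from -12 to +12.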

Functional genomics summary

• Identified genes and operons not previously associated with iron metabolism in S. aureus

• Identified strain-specific differences in iron metabolism between a bovine and human isolate of SA

• Identified conserved iron-induced membrane proteins

FUTURE WORK

• Evaluate vaccine potential of membrane proteins
  • Different compositions for humans and bovines?

• Confirm regulation and function of newly identified genes

MORE DATA


Update on what we have in local databases

• Genomic sequences (2 full in-house; 7 outside sequences)
  • Full sequence (Total of 25.2 million nucleotides)

• Annotated genes (aa and nt)

• 60,000 clones catalogued (id, location, date, sequence)

• Sequence similarity reports for each gene (aa and nt versus 2x8 targets)

• Substitution rates for each gene (2x8 targets)

• Microarray hybridization data for 3800 genes in 12 strains (6 reps)

• Microarray based expression data for 3800 genes in 3 strains under 5 different environmental conditions (6 reps)

• Protein gel data for 3200 proteins under 2 conditions, 2 strains

… actually, a great deal of this is stored in Excel spreadsheets.

>1,000,000 data points

Utilizing the IDDB for S. aureus studies
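Since much of this inventory is said to live in Excel spreadsheets, it may help to picture how one of the datasets could sit in a relational table instead. The Python sketch below uses the standard-library `sqlite3` module with an in-memory database. The schema, table and column names, and the two demo measurements are all hypothetical and are not the IDDB design; only the locus tag, description, and strain name are taken from the slides.

```python
import sqlite3

# Hypothetical schema: one row per gene x strain x condition x replicate.
SCHEMA = """
CREATE TABLE gene (
    gene_id     INTEGER PRIMARY KEY,
    locus_tag   TEXT UNIQUE NOT NULL,
    description TEXT
);
CREATE TABLE strain (
    strain_id INTEGER PRIMARY KEY,
    name      TEXT UNIQUE NOT NULL
);
CREATE TABLE growth_condition (
    condition_id INTEGER PRIMARY KEY,
    name         TEXT UNIQUE NOT NULL
);
CREATE TABLE expression (
    gene_id      INTEGER NOT NULL REFERENCES gene(gene_id),
    strain_id    INTEGER NOT NULL REFERENCES strain(strain_id),
    condition_id INTEGER NOT NULL REFERENCES growth_condition(condition_id),
    replicate    INTEGER NOT NULL,
    log2_ratio   REAL,
    PRIMARY KEY (gene_id, strain_id, condition_id, replicate)
);
"""


def expected_rows(genes, strains, conditions, replicates):
    """Row count if every combination is measured once per replicate."""
    return genes * strains * conditions * replicates


def build_demo_db():
    """Create the schema in memory and load two made-up measurements."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO gene VALUES "
                "(1, 'SAB2058', 'ferrichrome ABC transport lipoprotein')")
    con.execute("INSERT INTO strain VALUES (1, 'RF122')")
    con.execute("INSERT INTO growth_condition VALUES (1, 'lactoferrin')")
    con.executemany("INSERT INTO expression VALUES (1, 1, 1, ?, ?)",
                    [(1, 2.0), (2, 3.0)])  # hypothetical log2 ratios
    return con


def mean_log2_ratio(con, locus_tag):
    """Average log2 ratio over all stored replicates of one gene."""
    row = con.execute(
        "SELECT AVG(e.log2_ratio) FROM expression AS e "
        "JOIN gene AS g ON g.gene_id = e.gene_id WHERE g.locus_tag = ?",
        (locus_tag,),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    con = build_demo_db()
    print(mean_log2_ratio(con, "SAB2058"))
    print(expected_rows(3800, 3, 5, 6))
```

Taking the slide's numbers at face value, and assuming the 6 reps apply to every gene, strain, and condition combination, the expression dataset alone would come to 3800 x 3 x 5 x 6 = 342,000 rows, which is consistent with the overall total exceeding 1,000,000 data points.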


Host-specialized adaptations mined from comparative genomic and functional analysis

Human TSS

Novel: Exfoliative toxin X, SaPI5

Divergent: Membrane proteins, LytR regulator, Exotoxin 3, Staphopain protease

Pseudogenes: Hla

Bovine mastitis

Novel: Streptolysins, SaPIbov genes

Divergent: Adhesins, Agr locus

Pseudogenes: Ebh, Spa, ClfA, SdrC, SstC

Are we any closer to a vaccine?


Iron-induced proteins are cross-protective against SA challenge in mice

[Figure: four panels (A to D) covering homologous and heterologous challenge. Mortality model: percent survival vs. time (hrs), with curves for 19636 vaccinated vs. placebo and for 1477 vaccinated vs. placebo. Lesion model: lesion diameter (mm), comparing placebo vs. vaccinated and placebo vs. 19636-vaccinated. p-values shown: 0.020, 0.012, 0.015, 0.020]

Messages

• Existing tools are good.

  • Databases
  • Algorithms
  • Associated software tools

• More/better tools are continually needed.

  • Even if not necessarily requested…
  • Even if not necessarily requested in clear language…
  • Even if the current tools work…
  • Even if the newest version was launched an hour ago…


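Conclusions

[Diagram: entity Biologist (B_id) and entity Computer Scientist (CSci_id), each linked to the other by a "be" relationship. Caption: Nice, but not likely, nor necessary]

Conclusions

[Diagram: entities Biologist (B_id) and Computer Scientist (CSci_id) joined in a Bio-CSci Pair carrying a "Good communication flag". Caption: Imperative!]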


Acknowledgements

Vivek Kapur, Rajit Chakravarty

Dan Wolf, Akash Kumar

Advanced Genetic Analysis Center

Computational Analysis Projects: Nick Bollweg, John Carlis, Chris Dwan

John Freeman (3M), Wayne Xu

SYNTIRON: Laura Wonderling

Daryll Emery

Collaborators: James Musser, JR Fitzgerald