13
SEPSIS Robust classification of bacterial and viral infections via integrated host gene expression diagnostics Timothy E. Sweeney, 1,2 * Hector R. Wong, 3,4 Purvesh Khatri 1,2 * Improved diagnostics for acute infections could decrease morbidity and mortality by increasing early antibiotics for patients with bacterial infections and reducing unnecessary antibiotics for patients without bacterial infections. Sev- eral groups have used gene expression microarrays to build classifiers for acute infections, but these have been hampered by the size of the gene sets, use of overfit models, or lack of independent validation. We used multicohort analysis to derive a set of seven genes for robust discrimination of bacterial and viral infections, which we then vali- dated in 30 independent cohorts. We next used our previously published 11-gene Sepsis MetaScore together with the new bacterial/viral classifier to build an integrated antibiotics decision model. In a pooled analysis of 1057 samples from 20 cohorts (excluding infants), the integrated antibiotics decision model had a sensitivity and specificity for bac- terial infections of 94.0 and 59.8%, respectively (negative likelihood ratio, 0.10). Prospective clinical validation will be needed before these findings are implemented for patient care. INTRODUCTION Early and accurate diagnosis of infection is key to improving patient outcomes and reducing antibiotic resistance. The mortality rate of bac- terial sepsis increases by 8% for each hour by which antibiotics are delayed (1); however, indiscriminate prescription of antibiotics to pa- tients without bacterial infections increases rates of morbidity and anti- microbial resistance. The rate of inappropriate antibiotic prescriptions in the hospital setting is estimated at 30 to 50% and would be decreased by improved diagnostics ( 2, 3). In broader use, up to 95% of outpatients given antibiotics for suspected enteric fever have negative cultures (4). There is currently no gold standard point-of-care diagnostic that can broadly deter- mine the presence and type of infection. Thus, the White House has established the National Action Plan for Combating Antibiotic-Resistant Bacteria, which called for point-of-need diagnostic tests to distinguish rapidly between bacterial and viral infections( 5). Although new polymerase chain reaction (PCR)based molecular diagnostics can profile pathogens directly from a blood culture (6), such methods rely on the presence of adequate numbers of pathogens in the blood. Moreover, they are limited to detecting a discrete range of patho- gens. As a result, there is a growing need for molecular diagnostics that profile the host gene response. These include diagnostics that can dis- tinguish the presence of infection as compared to inflammation in the absence of infection, such as our 11-gene Sepsis MetaScore(SMS) (7), which has been validated across multiple cohorts ( 8), among others ( 9, 10). Other groups have focused on gene sets that can distinguish between types of infections, such as bacterial versus viral infections (1113); however, these gene sets often contain too many genes to translate into a useful clinical tool. Tsalik et al. described a model that distinguishes among all three groupsnoninfected patients and those with bacterial or viral illnessbut this model required the measurement of 122 probes, presenting an implementation challenge (14). Similarly, we have de- scribed a meta-virus signaturethat describes a common response to viral infection but contained too many genes (396) for clinical application (15). Overall, although great promise has been shown in this field, no pragmatic infection diagnostic based on host gene expression has yet made it into clinical practice. The data from these biomarker studies and dozens of other genome- wide expression studies in sepsis and acute infections have been pub- lished and deposited for further study in public databases such as the National Institutes of Health (NIH) Gene Expression Omnibus (GEO) and the European Bioinformatics Institute (EBI) ArrayExpress. These data are a largely untapped resource that can be used for both biomarker discovery and validation. We have previously shown that our integrated multicohort analysis of gene expression produces robust diagnostic tools for organ transplant ( 16), sepsis (7), specific types of viral infections (15), and active tuberculosis ( 17). Furthermore, these data are also useful as a benchmarking and validation tool for new host gene expression diag- nostics. However, such validation using public data has previously been limited to only those cohorts that contain at least two classes of interest (those in which a direct comparison between classes is possible), because interstudy technical differences preclude direct comparison of diagnostic scores between cohorts. Here, we sought to improve the diagnostic power of the SMS by adding the ability to discriminate bacterial from viral infections. Thus, to derive an improved biomarker for discriminating infection types, we applied our multicohort analysis framework to clinical microarray co- horts that compared the host response to bacterial and viral infections. We further developed a method to conormalize gene expression data among multiple cohorts, allowing direct comparison of a diagnostic score among multiple cohorts. Finally, we combined the previous SMS and the bacterial/viral diagnostic described here into an integrated antibiotics decision model (IADM) that can determine whether a patient with acute inflammation has an underlying bacterial infection. RESULTS Derivation of the seven-gene bacterial/viral metascore Our previously published 11-gene SMS cannot reliably distinguish be- tween bacterial and viral infections, showing mostly nonsignificant differences in score distribution between patients with bacterial and 1 Stanford Institute for Immunity, Transplantation and Infection, Stanford University School of Medicine, Stanford, CA 94305, USA. 2 Biomedical Informatics Research, Stan- ford University School of Medicine, Stanford, CA 94305, USA. 3 Division of Critical Care Medicine, Cincinnati Childrens Hospital Medical Center and Cincinnati Childrens Re- search Foundation, Cincinnati, OH 45223, USA. 4 Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45267, USA. *Corresponding author. Email: [email protected] (T.E.S.); [email protected] (P.K.) RESEARCH ARTICLE www.ScienceTranslationalMedicine.org 6 July 2016 Vol 8 Issue 346 346ra91 1 by guest on July 17, 2020 http://stm.sciencemag.org/ Downloaded from

Robust classification of bacterial and viral infections ... · Here, we sought to improve the diagnostic power of the SMS by adding the ability to discriminate bacterial from viral

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Robust classification of bacterial and viral infections ... · Here, we sought to improve the diagnostic power of the SMS by adding the ability to discriminate bacterial from viral

R E S EARCH ART I C L E

SEPS I S

Do

Robust classification of bacterial and viral infections viaintegrated host gene expression diagnosticsTimothy E. Sweeney,1,2* Hector R. Wong,3,4 Purvesh Khatri1,2*

Improved diagnostics for acute infections could decrease morbidity and mortality by increasing early antibiotics forpatients with bacterial infections and reducing unnecessary antibiotics for patients without bacterial infections. Sev-eral groups have used gene expression microarrays to build classifiers for acute infections, but these have beenhampered by the size of the gene sets, use of overfit models, or lack of independent validation. We used multicohortanalysis to derive a set of seven genes for robust discrimination of bacterial and viral infections, which we then vali-dated in 30 independent cohorts.We next used our previously published 11-gene SepsisMetaScore togetherwith thenew bacterial/viral classifier to build an integrated antibiotics decision model. In a pooled analysis of 1057 samplesfrom 20 cohorts (excluding infants), the integrated antibiotics decisionmodel had a sensitivity and specificity for bac-terial infections of 94.0 and 59.8%, respectively (negative likelihood ratio, 0.10). Prospective clinical validation will beneeded before these findings are implemented for patient care.

wn

by guest on July 17, 2020http://stm

.sciencemag.org/

loaded from

INTRODUCTION

Early and accurate diagnosis of infection is key to improving patientoutcomes and reducing antibiotic resistance. The mortality rate of bac-terial sepsis increases by 8% for each hour by which antibiotics aredelayed (1); however, indiscriminate prescription of antibiotics to pa-tients without bacterial infections increases rates of morbidity and anti-microbial resistance. The rate of inappropriate antibiotic prescriptions inthe hospital setting is estimated at 30 to 50% and would be decreased byimproved diagnostics (2, 3). In broader use, up to 95% of outpatients givenantibiotics for suspected enteric fever have negative cultures (4). There iscurrently no gold standard point-of-care diagnostic that can broadly deter-mine the presence and type of infection. Thus, the White House hasestablished the National Action Plan for Combating Antibiotic-ResistantBacteria, which called for “point-of-need diagnostic tests to distinguishrapidly between bacterial and viral infections” (5).

Although new polymerase chain reaction (PCR)–based moleculardiagnostics can profile pathogens directly from a blood culture (6), suchmethods rely on the presence of adequate numbers of pathogens in theblood. Moreover, they are limited to detecting a discrete range of patho-gens. As a result, there is a growing need for molecular diagnostics thatprofile the host gene response. These include diagnostics that can dis-tinguish the presence of infection as compared to inflammation in theabsence of infection, such as our 11-gene “Sepsis MetaScore” (SMS) (7),which has been validated across multiple cohorts (8), among others (9, 10).Other groups have focused on gene sets that can distinguish betweentypes of infections, such as bacterial versus viral infections (11–13);however, these gene sets often contain too many genes to translate intoa useful clinical tool. Tsalik et al. described a model that distinguishesamong all three groups—noninfected patients and those with bacterialor viral illness—but this model required the measurement of 122 probes,presenting an implementation challenge (14). Similarly, we have de-scribed a “meta-virus signature” that describes a common response to

1Stanford Institute for Immunity, Transplantation and Infection, Stanford UniversitySchool of Medicine, Stanford, CA 94305, USA. 2Biomedical Informatics Research, Stan-ford University School of Medicine, Stanford, CA 94305, USA. 3Division of Critical CareMedicine, Cincinnati Children’s Hospital Medical Center and Cincinnati Children’s Re-search Foundation, Cincinnati, OH 45223, USA. 4Department of Pediatrics, Universityof Cincinnati College of Medicine, Cincinnati, OH 45267, USA.*Corresponding author. Email: [email protected] (T.E.S.); [email protected] (P.K.)

www.S

viral infection but contained too many genes (396) for clinical application(15). Overall, although great promise has been shown in this field, nopragmatic infection diagnostic based on host gene expression has yetmade it into clinical practice.

The data from these biomarker studies and dozens of other genome-wide expression studies in sepsis and acute infections have been pub-lished and deposited for further study in public databases such as theNational Institutes of Health (NIH) Gene Expression Omnibus (GEO)and the European Bioinformatics Institute (EBI) ArrayExpress. Thesedata are a largely untapped resource that can be used for both biomarkerdiscovery and validation. We have previously shown that our integratedmulticohort analysis of gene expression produces robust diagnostic toolsfor organ transplant (16), sepsis (7), specific types of viral infections (15),and active tuberculosis (17). Furthermore, these data are also useful as abenchmarking and validation tool for new host gene expression diag-nostics. However, such validation using public data has previously beenlimited to only those cohorts that contain at least two classes of interest(those in which a direct comparison between classes is possible), becauseinterstudy technical differences preclude direct comparison of diagnosticscores between cohorts.

Here, we sought to improve the diagnostic power of the SMS byadding the ability to discriminate bacterial from viral infections. Thus,to derive an improved biomarker for discriminating infection types, weapplied our multicohort analysis framework to clinical microarray co-horts that compared the host response to bacterial and viral infections.We further developed a method to conormalize gene expression dataamong multiple cohorts, allowing direct comparison of a diagnostic scoreamong multiple cohorts. Finally, we combined the previous SMS andthe bacterial/viral diagnostic described here into an integrated antibioticsdecision model (IADM) that can determine whether a patient withacute inflammation has an underlying bacterial infection.

RESULTS

Derivation of the seven-gene bacterial/viral metascoreOur previously published 11-gene SMS cannot reliably distinguish be-tween bacterial and viral infections, showing mostly nonsignificantdifferences in score distribution between patients with bacterial and

cienceTranslationalMedicine.org 6 July 2016 Vol 8 Issue 346 346ra91 1

Page 2: Robust classification of bacterial and viral infections ... · Here, we sought to improve the diagnostic power of the SMS by adding the ability to discriminate bacterial from viral

R E S EARCH ART I C L E

viral infections (fig. S1). Having previously shown that there is a con-served host gene response to viral infections (15), we hypothesized thata classifier for bacterial versus viral infections would allow for an im-proved diagnostic model. We thus performed a systematic search for geneexpression microarray cohorts that studied patients with viral and/orbacterial infections. We identified eight cohorts (11, 18–26) [both wholeblood and peripheral blood mononuclear cells (PBMCs)] that includedn > 5 patients with both viral and bacterial infections (Table 1A). Theeight cohorts were composed of 426 patient samples (142 viral and284 bacterial infections), including children and adults, medical andsurgical patients, and those with multiple sites of infection. We performedmulticohort analysis on the eight cohorts as described previously (fig. S2)

www.S

(7, 15–17). We set significance thresholds at an effect size >2-fold anda false discovery rate (FDR) <1% in leave–one–data set–out round-robinanalysis. However, to make sure that neither tissue type (whole blood orPBMCs) biased our results, we further selected only those genes that alsohad an effect size >1.5-fold in separate analyses of both PBMC andwhole-blood cohorts. This process resulted in 72 differentially expressedgenes significant at the above thresholds (table S1). We used a greedyforward search (7) to find a gene set optimized for diagnosis, resultingin seven genes [higher in viral infections (IFI27, JUP, and LAX1) andhigher in bacterial infections (HK3, TNIP1, GPAA1, and CTSB); fig. S3)].As expected, a “bacterial/viral metascore” based on these seven genesrobustly distinguished viral from bacterial infections in all eight of the

Dow

Table 1. Data sets used in the discovery and direct validation of the bacterial/viral metascore. CAP, community-acquired pneumonia; HHV6,human herpesvirus 6; RSV, respiratory syncytial virus; HSV, herpes simplex virus; CMV, cytomegalovirus; MPV, metapneumovirus; PICU, pediatricintensive care unit; LRTI, lower respiratory tract infection; ARI, acute respiratory infection; COPD, chronic obstructive pulmonary disease.

nloaded

Accession

Author Tissue Platform Demographic

Bacteria

cienceTranslationalMe

Viruses

dicine.org 6 July 201

Numberhealthy

6 Vol 8 Iss

Numberbacterial

ue 346 346r

Numberviral

from

A. Discovery data sets

http

GSE6269

Ramilo PBMC GPL2507

://stm.s

Childrenadmittedwith

infection

cie

Escherichia coli,Staphylococcus

aureus,Streptococcuspneumoniae

Influenza

0 16 8

nce

GPL570

ma

S. aureus,S. pneumoniae

Influenza

0 12 10

g.o

GPL96

rg

S. aureus,S. pneumoniae

Influenza

6 73 18

by /

GSE20346

Parnell Whole blood GPL6947

gu

Adults withCAP

es

Unknownbacterial

pneumonia

Influenza

36 12 8

t on

GSE40012 Parnell Whole blood GPL6947

Ju

Adults withCAP

Unknown bacterialpneumonia

Influenza

18 36 11

ly 1

GSE40396 Hu Whole blood GPL10558

7, 2020

Febrilechildren inemergencydepartment

Multiple

Adenovirus,enterovirus,rhinovirus,

HHV6

22

8 35

GSE42026

Herbeg Whole blood GPL6947 Children admittedwith infection

Streptococcus andStaphylococcus spp.

Influenza, RSV

33 18 41

GSE66099

Wong Whole blood GPL570 Septic childrenin PICU

Multiple

Influenza, HSV,CMV, BK,adenovirus

47

109 11

B. Validation data sets

GSE15297

Popper Whole blood GPL8328 Febrilechildren

Scarlet fever(Streptococcus)

Adenovirus

0 5 8

GSE25504

Smith Whole blood GPL13667 Septicneonates

Multiple

Rhinovirus,CMV

6

11 3

GPL6947

Multiple CMV 35 26 1

GSE60244

Suarez Whole blood GPL10558 Adults hospitalizedwith LRTI

Gram-positive and atypical

Influenza, RSV, MPV 40 22 71

GSE63990

Tsalik Whole blood GPL571 Adults with ARI Multiple Multiple 0 70 115

E-MEXP-3589

Almansa Whole blood GPL10332 Adults withCOPD withinfection

Gram-positive,Gram-negative,

atypical

Influenza, RSV, MPV

4 4 5

a91 2

Page 3: Robust classification of bacterial and viral infections ... · Here, we sought to improve the diagnostic power of the SMS by adding the ability to discriminate bacterial from viral

R E S EARCH ART I C L E

Table 2. Validation data sets that matched inclusion criteria and have a single known pathogen type (viral or bacterial). DHF, denguehemorrhagic fever; DSS, dengue shock syndrome.

Accession

Author Tissue Platform Demographic

www.ScienceTranslat

Specificpathogens

ionalMedicine.org 6 July 201

Numberhealthy

6 Vol 8 Is

Numberbacterial

sue 346 346r

Numberviral

E-MEXP-3567

Irwin Wholeblood

GPL96

Malawian children withbacterial meningitis or

pneumonia

S. pneumoniae,Neisseria meningitidis,

or Haemophilusinfluenzae

3

12 0

GSE11755

Emonts Wholeblood

GPL570

Children in PICU withmeningococcal sepsis

N. meningitidis

3 6 0

GSE13015

Pankla

D

Wholeblood

GPL6106GPL6947

Adults with bacterial sepsis

Burkholderia pseudomalleiand others

1010

4515

00

own

GSE22098 Berry

lo

Wholeblood

GPL6947

Children with Gram-positiveinfections

Staphylococcus andStreptococcus

81

52 0

ade

GSE28750 Sutherland

d f

Wholeblood

GPL570

Adults with community-acquiredbacterial sepsis

Multiple bacteria

20 10 0

rom

GSE29161 Thuny

ht

Wholeblood

GPL6480

Adults with native valve-infectedendocarditis

Staphylococcus andStreptococcus

5

5 0

tp://

GSE33341 Ahn

stm

Wholeblood

GPl571

Adults with septicbloodstream infections

S. aureus or E. coli

43 51 0

.sc

GSE40586 Lill

ien

Wholeblood

GPL6244

Bacterial meningitis Multiple bacteria 18 21 0

cem

GSE42834 Bloom

a

Wholeblood

GPL10558

Bacterial pneumonia Unknown 118 19 0

g.or

GSE57065 Cazalis

g/

Wholeblood

GPL570

Adults with bacterialseptic shock

Multiple bacteria

25 82 0

by g

GSE69528 Conejero

ue

Wholeblood

GPL10558

Adults withbacterial sepsis

B. pseudomallei and others

55 83 0

st o

E-MTAB-3162 van de Weg

n J

Wholeblood

GPL570

Indonesian patients >14 years old withuncomplicated and severe dengue

Dengue

15 0 30

uly

GSE17156 Zaas

17

Wholeblood

GPL571

Volunteers with viralchallenge peak symptoms

Influenza, RSV, rhinovirus

56 0 27

, 20

GSE21802 Bermejo-Martin

20

Wholeblood

GPL6102

Adults withseptic influenza

Influenza (H1N1)

4 0 12

GSE27131

Berdal Wholeblood

GPL6244

Adults with septic influenza withmechanical ventilation

Influenza (H1N1)

7 0 7

GSE38900

Mejias Wholeblood

GPL10558GPL6884

Children with acute LRTI

RSVInfluenza, RSV, rhinovirus

831

00

28153

GSE51808

Kwissa Wholeblood

GPL13158

Children and adults withuncomplicated dengue

and DHF

Dengue

9 0 28

GSE68310

Zhai Wholeblood

GPL10558

Adults with ARIs Mostly influenzaand rhinovirus

243

0 211

GSE16129

Ardura PBMC GPL6106GPL96

Children with invasivestaph infections

S. aureus

910

946

00

GSE23140

Liu PBMC GPL6254 Children with acute otitis media S. pneumoniae 4 4 0

GSE34205

Ioannidis PBMC GPL570 Infants and children with ARIs Influenza, RSV 22 0 79

GSE38246

Popper PBMC GPL15615 Nicaraguan children with un-complicated dengue, DHF, and DSS

Dengue

8 0 95

GSE69606

Brand PBMC GPL570 Children with mild-to-severe RSV RSV 17 0 26

a91 3

Page 4: Robust classification of bacterial and viral infections ... · Here, we sought to improve the diagnostic power of the SMS by adding the ability to discriminate bacterial from viral

R E S EARCH ART I C L E

by guest on Juhttp://stm

.sciencemag.org/

Dow

nloaded from

discovery cohorts [summary receiver operating characteristic (ROC) areaunder the curve (AUC), 0.97; 95% confidence interval (CI), 0.89 to0.99; Fig. 1 and fig. S4].

We next tested the seven-gene set in the six remaining independentclinical cohorts (13, 14, 27–29) that directly compared bacterial and viralinfections (138 bacterial and 203 viral infections, totaling 341 samples)and found a summary ROC AUC of 0.91 (95% CI, 0.82 to 0.96) (Fig. 1,Table 1B, and fig. S5; individual test characteristics in table S2). Tomeasure the generalizability of our signature, we also tested whethercells stimulated in vitro with lipopolysaccharide (LPS) or influenzavirus could be separated with the bacterial/viral metascore [GSE53166(30), n = 75; AUC, 0.99; fig. S6].

Global validation via COCONUT conormalizationThere are dozens of microarray cohorts in the public domain that studiedeither bacterial or viral infections, but not both, thus precluding a direct(within data set) estimate of diagnostic power for separating bacterialand viral illness. To apply and compare a gene score across these cohorts,we needed a method that could remove inter–data set batch effects whileremaining unbiased to the diagnosis of the diseased patients. We de-signed and implemented a modified type of array normalization that usesthe ComBat (31) empirical Bayes normalization methods on healthy con-trols to obtain bias-free corrections of disease samples (a method we callCOmbat CO-Normalization Using conTrols or “COCONUT”; fig. S7).Housekeeping genes remained invariant across both diseases and cohortsafter COCONUT conormalization, and each gene still retained the samedistribution between diseases and controls within each data set (fig. S8).Because our method assumes that all healthy samples are derived fromthe same distribution, we separately normalized the whole-blood andPBMC samples because different immune cell types have differentbaseline gene expression distributions. Using COCONUT conormaliza-tion, the bacterial/viral metascore has a global AUC of 0.92 (95% CI,0.89 to 0.96) in the whole-blood discovery cohorts (figs. S9 to S10). Wethen applied this method to test the bacterial/viral metascore in all publicdomain microarray cohorts that matched inclusion criteria and usedwhole blood. These data sets included the four direct validation cohorts

www.S

that included control patients and an additional 20 cohorts that mea-sured either bacterial or viral infections but not both (n = 143 + 897 =1040) (32–49). These data sets represent a wide variety of clinical con-ditions, including a range of infection types (Gram-positive, Gram-negative,atypical bacterial, common respiratory viruses, and dengue) and severities(mild infections to septic shock). The bacterial/viral metascore showedan overall ROC AUC of 0.93 (95% CI, 0.91 to 0.94) across these data,which allowed us to set a single global score cutoff (Fig. 2, Table 1B, andfig. S11). Finally, we performed the same procedure on PBMC validationcohorts [six cohorts (50–54), n = 259; global AUC, 0.92 (95% CI, 0.87to 0.97; figs. S12 to S13)]. All three global ROC AUCs using COCO-NUT conormalization (discovery whole blood, 0.92; validation wholeblood, 0.93; validation PBMCs, 0.92) approximately matched the summa-ry AUC of the direct validation cohorts (0.91), giving high confidence inthe diagnostic power of this method.

Integrated antibiotics decision modelA key clinical need is diagnosing whether a patient with signs andsymptoms of inflammation has an underlying bacterial infection, be-cause rapid and judicious administration of antibiotics is key to im-proving patient outcomes. Neither the SMS nor the bacterial/viralmetascore alone can robustly distinguish between all three classes of(i) noninfected inflammation, (ii) bacterial illness, and (iii) viral illness.Thus, to increase clinical relevance, we developed an “integrated anti-biotics decision model” (IADM), whereby we first apply our previouslydescribed SMS (7) to test for the presence of an infection and then applythe bacterial/viral metascore to the samples that test positive for infec-tion (Fig. 3A). As described previously, the only way to establish testcharacteristics for the IADM simultaneously across cohorts is to useCOCONUT conormalization. We found that the SMS in COCONUT-conormalized data is strongly influenced by age, which could becaused by age-dependent differences in healthy subjects, infected pa-tients, or both (fig. S14). We thus excluded cohorts focused on infants(children <1 year old) from the IADM. We also removed the low-severity outpatient viral illness cohorts (GSE17156 and GSE68310)because in outpatient settings, the history and physical exam findings

cienceTranslationalMedicine.org

ly 17, 2020

make noninfectious causes of acute in-flammation less likely. This resulted in atotal of 20 cohorts for testing the IADM(n = 1057). The resulting global AUC forthe SMS across the available data was 0.86(95%CI, 0.84 to 0.89) (fig. S15 and table S3).We set global thresholds for an SMS sen-sitivity for infection of 95% and a bacterial/viral metascore sensitivity for bacterialinfection of 95%. Considering all threeclasses of noninfectious inflammation,bacterial infection, and viral infection,this yielded an overall sensitivity andspecificity of 94.0 and 59.8% for bacterialinfections and 53.0 and 90.6% for viralinfections, respectively (Fig. 3). Theseperformance characteristics were largelyunchanged if healthy patients were in-cluded in the noninfected class (fig. S16).The overall positive and negative likeli-hood ratios for bacterial infection in theIADMwere thus 2.34 (LR+) and 0.10 (LR−),

0.00

0.25

0.50

0.75

1.00

0.00 0.25 0.50 0.75 1.00False-positive rate (1−specificity)

True

pos

itive

rat

e (s

ensi

tivity

)

GSE6269 gpl2507 AUC = 0.99 (95% CI 0.97–1)GSE6269 gpl570 AUC = 1 (95% CI 1–1)GSE6269 gpl570 AUC = 0.98 (95% CI 0.95–1)GSE20346 AUC = 1 (95% CI 1–1)GSE40012 AUC = 1 (95% CI 1–1)GSE40396 AUC = 0.88 (95% CI 0.83–0.93)GSE42026 AUC = 0.94 (95% CI 0.91–0.97) GSE66099 AUC = 0.84 (95% CI 0.77–0.92)Summary AUC = 0.97 (95% CI 0.89–0.99)

Bacterial versus viral, Discovery

0.00 0.25 0.50 0.75 1.00False-positive rate (1−specificity)

GSE15297 AUC = 0.88 (95% CI 0.78–0.97)GSE25504 gpl13667 AUC = 0.94 (95% CI 0.84–1)GSE25504 gpl6947 AUC = 1 (95% CI 1–1)GSE60244 AUC = 0.92 (95% CI 0.89–0.94)GSE63990 AUC = 0.89 (95% CI 0.87–0.91)E-MEXP-3589 AUC = 0.9 (95% CI 0.79–1)Summary AUC = 0.91 (95% CI 0.82–0.96)

Bacterial versus viral, Validation

Fig. 1. Summary ROC curves for discovery anddirect validationdata sets for thebacterial/viralmeta-score. Summary ROC curve is shown in black, with 95% CIs in dark gray.

6 July 2016 Vol 8 Issue 346 346ra91 4

Page 5: Robust classification of bacterial and viral infections ... · Here, we sought to improve the diagnostic power of the SMS by adding the ability to discriminate bacterial from viral

R E S EARCH ART I C L E

by guest on July 17, 2020http://stm

.sciencemag.org/

Dow

nloaded from

−10

−8

−6

−4

−2

0

Sco

re

Validation whole blood, COCONUT-conormalized global AUC = 0.93 (95% CI 0.91–0.94)

1

2

−−

3

4

5

6

7

8

9

10

11

12

13

14

15

16−

17

18

19

20

21

22

23

24

46

810

1214

16

Gen

e E

xp.

35

7

Gen

e E

xp.

POLGATP6V1B1PEG10

POLGATP6V1B1PEG10

IFI27JUPLAX1HK3TNIP1GPAA1CTSB

Two-class data sets Bacterial infection data sets Viral infection data sets

Bacterial infection

Viral infection

1E

-ME

XP

-3589 gpl103322

GS

E25504 gpl13667

3

GS

E25504 gpl6947

4

GS

E60244 gpl10558

5

EM

EX

P3567 gpl96

6

GS

E11755 gpl570

7

GS

E13015 gpl6106

8

GS

E13015 gpl6947

9

GS

E22098 gpl6947

10

GS

E28750 gpl570

11

GS

E29161 gpl6480

12

GS

E33341 gpl571

13

GS

E40586 gpl6244

14

GS

E42834 gpl10558

15

GS

E57065 gpl570

16

GS

E69528 gpl10558

17

EM

TAB

3162 gpl570

18

GS

E17156 gpl571

19

GS

E21802 gpl6102

20

GS

E27131 gpl6244

21

GS

E38900 gpl10558

22

GS

E38900 gpl6884

23

GS

E51808 gpl13158

24

GS

E68310 gpl10558

Fig. 2. Bacterial/viral score in COCONUT-conormalizedwhole-blood validation data sets. The global AUC across all whole-blood discovery data sets is0.93. Top: Score distribution by data set (blue, bacterial; red, viral). Middle: Individual gene expression (exp.). Bottom: Housekeeping genes (grayscale). The

dotted line at the top shows a possible global threshold for discriminating infection type.

www.ScienceTranslationalMedicine.org 6 July 2016 Vol 8 Issue 346 346ra91 5

Page 6: Robust classification of bacterial and viral infections ... · Here, we sought to improve the diagnostic power of the SMS by adding the ability to discriminate bacterial from viral

R E S EARCH ART I C L E

by guest on July 17, 2020http://stm

.sciencemag.org/

Dow

nloaded from

respectively, compared to a recent meta-analysis of procalcitonin thatshowed a negative likelihood ratio of 0.29 (95% CI, 0.22 to 0.38) (55).We plotted negative predictive value (NPV) and positive predictive val-ue (PPV) versus prevalence for these test characteristics; the NPV andPPV for bacterial infection at a prevalence of 15% were 98.3 and 29.2%,respectively (fig. S17).

There was only one data set [GSE63990 (14)] that included non-infectious inflammation patients and patients with both bacterial andviral illness but did not include healthy controls, precluding its addi-tion to the global calculations. We thus tested the IADM on this cohortwith locally derived test thresholds. We found an overall bacterialinfection sensitivity and specificity of 94.3 and 52.2%, respectively(fig. S18).

Validation in independent samples usingNanoString nCounterFinally, we used targeted quantitative NanoString nCounter (56) geneexpression assays to prospectively validate these results in independentwhole-blood samples from children with sepsis from the Genomics ofPediatric SIRS and Septic Shock Investigators (GPSSSI) cohort (totaln = 96, with 36 SIRS, 49 bacterial sepsis, and 11 viral sepsis patients;Fig. 4 and table S4). The GPSSSI cohort was also used by data setGSE66099, but the children profiled here were never profiled via micro-array and are thus not part of the discovery data sets. In the NanoStringvalidation cohort, the SMS AUC was 0.81 (AUC of 0.80 in GSE66099).Similarly, the bacterial/viral metascore AUC was 0.84 (AUC of 0.83 inGSE66099). The microarray AUCs were thus preserved when tested witha targeted, quantitative gene expression assay in new patients. Applyingthe same IADM, the sensitivity and specificity were 89.7 and 70.0% for

www.ScienceTranslationalMedicine.org

bacterial infections (LR−, 0.15; LR+, 3.0)and 54.5 and 96.5% for viral infections(LR−, 0.47; LR+, 15.6), respectively.

DISCUSSION

Better diagnostics for acute infections areneeded in both the inpatient and outpatientsettings. In low-acuity outpatient settings,a simple diagnostic that can discriminatebacterial from viral infections may beenough to assist in appropriate antibioticusage. In higher-acuity settings, causes ofnoninfectious inflammation becomemoreimportant to rule out, and so a decisionmodel for antibiotic prescriptions mustinclude a noninfected, nonhealthy case.Thus, a reliable diagnostic needs to dis-tinguish all three cases (noninfected in-flammation, bacterial infection, and viralinfection). Here, using 426 samples from8 cohorts, we derived a parsimonious setof only seven genes that can accurately dis-criminate bacterial from viral infectionsacross a very broad range of clinical con-ditions in independent cohorts (total of30 cohorts composed of 1299 patients).We further demonstrated that by inte-

grating our published SMS (7) (to distinguish the presence or absenceof infection) with the bacterial/viral metascore (to determine infectiontype) into a single IADM, we can determine with high accuracy whichpatients do not require antibiotics. Finally, we confirmed the diagnosticpower of both the seven-gene set and the IADM in independentsamples using a targeted NanoString assay, showing that the signaturesretain diagnostic power when not relying on microarrays.

The IADM has a low negative likelihood ratio (0.10) and high es-timated NPV, meaning it would be potentially effective as a rule-outtest. A meta-analysis of procalcitonin that included 3244 patients from30 studies resulted in an overall estimated negative likelihood ratio of0.29 (95% CI, 0.22 to 0.38) (55). Thus, the IADM negative likelihoodratio is much lower than the estimate for procalcitonin (on the basis ofnonoverlapping 95% CIs), indicating clinical utility. Moreover, thesetest characteristics assume no knowledge of the patient and so are onlyestimates of the real-world clinical utility of such a test, because patienthistory, physical examination, vital signs, and laboratory values wouldall assist in a diagnosis as well. Even given these caveats, a recent eco-nomic decision model of screening ICU patients for hospital-acquiredinfections suggested that a test such as the IADM that can accuratelydiagnose bacterial and viral infections could be cost-effective (57).Ultimately, only interventional trials will be able to establish cost-effectiveness and clinical utility of a diagnostic test.

We validated our diagnostic in pediatric sepsis patients from theGPSSSI cohort using a NanoString assay. NanoString is highly accu-rate and is a useful tool for measuring the expression of multiple genesat once; however, it is also likely too slow for clinical application (4 to6 hours per assay). Thus, although the assay confirms that our gene setis robust in targeted measurements, further work will be needed to

C

A

B

0 2 4 6 8 10

−15

−10

−5

0

Sepsis MetaScore

Bac

teria

l/vira

l met

asco

re

Noninfected SIRSBacterial sepsisViral sepsis

Fig. 3. IADM across COCONUT-conormalized public gene expression data that matched inclusioncriteria. (A) IADM schematic. (B) Distribution of scores and cutoffs for IADM in COCONUT-conormalized

data. SIRS, systemic inflammatory response syndrome. (C) Confusionmatrix for diagnosis. Bacterial infectionsensitivity, 94.0%; bacterial infection specificity, 59.8%; viral infection sensitivity, 53.0%; viral infection spec-ificity, 90.6%.

6 July 2016 Vol 8 Issue 346 346ra91 6

Page 7: Robust classification of bacterial and viral infections ... · Here, we sought to improve the diagnostic power of the SMS by adding the ability to discriminate bacterial from viral

R E S EARCH ART I C L E

by guest on July 17, 2020http://stm

.sciencemag.org/

Dow

nloaded from

improve the turnaround time. There are multiple possibilities for de-veloping a commercial assay based on rapid multiplexed quantitativePCR that meets the time-sensitive demands of an infection diagnostictest. However, this technical hurdle is something that all gene expression–based infection diagnostics must overcome to gain clinical relevance.

A simple linear score generally cannot adequately separate the threeclasses (noninfected inflammation, bacterial infections, and viral infec-tions); other machine-learning techniques, such as multiple regressionor tree-based methods, are typically used. However, because the locationand scale of different genes vary greatly between microarray types, suchtechniques cannot usually be truly validated across microarray platforms.Although COCONUT partially circumvents this issue by allowing for dis-covery of a model across several cohorts with application of the samemodel in COCONUT-conormalized validation cohorts, such methodwould be forced to leave out data sets that did not include healthy con-trols. We thus opted to use our simple, scale-free, difference-of-geometric-means scores. This allowed us to both discover and validate our diag-nostic models across multiple cohorts. It may be possible to discover agene set using multiple classification techniques, instead of our currentmethod of starting with the SMS and then adding the bacterial/viralmetascore. However, to discover and then validate in multiple cohorts

www.S

requires that all cohorts be appropriate for COCONUT normalization,and we have instead erred on the side of maximizing the utility ofmultiple clinically heterogeneous cohorts.

Several groups have published models for diagnosing infections onthe basis of host gene expression; none have yet made it into clinicalpractice. Most of these classifiers were not tested in multiple independentcohorts, had too many genes to allow rapid profiling necessary for usefuldiagnosis, or both. For instance, Suarez et al. created a 10-gene k-nearest-neighbor classifier but did not test it outside their published data set(GSE60244) (13). Tsalik et al. created a 122-probe (120-gene) classifier onthe basis of multiple regression models, but in testing it in external GEOcohorts, they retrained their regression coefficients in each new data set(14). Such model retraining results in a strong upward bias to thesevalidation numbers (assuming that a final model would not be locallyretrained). Other groups have made gene expression classifiers for infectionbut did not include models for discriminating viral infections (7, 9, 10).Our IADM is robust across a wide range of disease types and severitiesbut has a relatively lower sensitivity for viral infections. Non–gene expressionbiomarkers have also been used for infection diagnosis. Procalcitonin hasbeen studied extensively in the setting of sepsis diagnosis but cannot dis-tinguish between noninfected individuals and those with viral infections

−6 −4 −2 0 2 4

−12

−10

−8

−6

−4

−2

Sepsis MetaScore

Bac

teria

l/vira

l met

asco

re

Noninfected SIRSBacterial sepsisViral sepsis

0.00

0.25

0.50

0.75

1.00

0.00 0.25 0.50 0.75 1.00False-positive rate (1−specificity)

True

-pos

itive

rat

e (s

ensi

tivity

)

AUC=0.81 (95%CI 0.85–0.77)

SIRS versus sepsis, Nanostring data

0.00

0.25

0.50

0.75

1.00

0.00 0.25 0.50 0.75 1.00False-positive rate (1−specificity)

True

Pos

itive

Rat

e (S

ensi

tivity

)

AUC=0.84 (95%CI 0.92−0.77)

Viral versus bacterial, Nanostring dataB C

D

E

Fig. 4. TargetedNanoString gene expression data for childrenwith SIRS/sepsis from theGPSSSI cohort never testedwithmicroarrays. Total n= 96,of which SIRS = 36, bacterial sepsis = 49, and viral sepsis = 11. (A) Breakdownof infected patients by organism type. (B and C) ROC curves for the SMS and the

bacterial/viral metascore. (D) Distribution of scores and cutoffs for IADM. (E) Confusion matrix for IADM. Bacterial infection sensitivity, 89.7%; bacterial infec-tion specificity, 70.0%; viral infection sensitivity, 54.5%; viral infection specificity, 96.5%.

cienceTranslationalMedicine.org 6 July 2016 Vol 8 Issue 346 346ra91 7

Page 8: Robust classification of bacterial and viral infections ... · Here, we sought to improve the diagnostic power of the SMS by adding the ability to discriminate bacterial from viral

R E S EARCH ART I C L E

by guest on July 17, 2020http://stm

.sciencemag.org/

Dow

nloaded from

(58). Protein-panel assays have been shown to discriminate bacterial fromviral infections but cannot discriminate patients with noninfectious inflam-mation (59, 60). Thus, all of these classifiers have certain strengths andweaknesses that will become more apparent with further prospectivetesting and direct comparison.

Although our goal in this study was to identify biomarkers and notnecessarily mechanistic biology, it is still important for a biomarker set tohave biological plausibility. Of the seven genes in the bacterial/viral meta-score, six have previously been linked to infections or leukocyte activation.Both IFI27 and JUPwere shown in single-cohort genome-wide expressionstudies to be induced in response to viral infection (52, 61), whereas TNIP1and CTSB are important in modulating the nuclear factor kB and necroticresponses to bacterial infection (62, 63). Finally, LAX1 (up-regulated inviral infections) is involved in activation of T and B cells (64), and HK3is instrumental in the neutrophil differentiation pathway (65). Thus, therole of these transcripts as biomarkers for infection type is not coincidental.

Here, we developed a method, COCONUT, to directly compareour model across a large pool of one-class cohorts that would otherwisebe unusable for benchmarking a diagnostic gene set. COCONUT assumesthat all controls come from the same distribution; that is, the genes in eachgroup of controls are reset to have the same mean and variance, withbatch parameters learned empirically from gene groups. This method cor-rects for microarray and batch processing differences between cohortsand thus allows for the creation of a global ROC curve with a singlethreshold. This is a more “real-world”measure of diagnostic power thanreporting multiple validation ROC curves, because no single cutoff couldattain the same test characteristics in the different cohorts (17). How-ever, the COCONUT-conormalized data showed differences in infantsas compared to older children and adults. Therefore, infants were ex-cluded from the validation, meaning that the efficacy of the IADM ininfants remains unknown. These data also had few children with viral in-fections between the toddler and teenage years; this is an age group that willrequire further study. Themost important takeaway from the COCONUT-conormalized data is that both the bacterial/viral metascore and the IADMretain diagnostic power across a very broad range of infection types andseverities, with overall AUCs that are similar to the summary AUCsfrom head-to-head comparisons within cohorts.

Overall, we have leveraged our proven multicohort analysis pipe-line to derive a highly robust model for improving infection diagnosis.Using our method, we were able to validate this in dozens of independentmicroarray cohorts. We have also validated using a targeted NanoStringassay in pediatric sepsis patients. Although the IADM still needs toundergo optimization for rapid turnaround as well as a prospectiveinterventional trial, it seems clear that molecular profiling of the hostgenome will become part of the clinical toolkit in the future.

MATERIALS AND METHODS

Study designThe purpose of this study was to use an integrated multicohort analysisframework to analyze multiple gene expression data sets to identify abiomarker that can classify patients with bacterial or viral infections.This framework has been described previously (7, 16, 17).

Systematic search and multicohort analysisWe performed a systematic search in NIH GEO and EBI ArrayExpressfor public humanmicroarray genome-wide expression studies using thefollowing search terms: bact[wildcard], vir[wildcard], infection, sepsis,

www.S

SIRS, ICU, nosocomial, fever, and pneumonia. Abstracts were screenedto remove all studies that were (i) nonclinical, (ii) performed using tis-sues other than whole blood or PBMCs, or (iii) comparing patients whowere not matched for clinical time.

In all data sets that included sample-level microbiology data, weretained only those samples for which pathogens of a single type (eitherbacterial or viral) had been identified. Data sets for which sample-levelmicrobiology data were not available were still retained if the corres-ponding paper described the cohort as being infected with only bacteriaor only viruses. In three cohorts, the diagnosis was not necessarily micro-biologically confirmed: GSE11755 (one of six patients described asculture-negative meningitis), GSE42834 (described as a cohort of bacterialpneumonia meeting clinical criteria), and GSE57065 (described as a co-hort of bacterial sepsis for which 86% of patients had microbiologicalconfirmation). All other samples used in our analysis had confirmedmicrobiological diagnoses at the sample or cohort level.

All microarray data were renormalized from raw data (when avail-able) using standardized methods. Affymetrix arrays were renormalizedusing GC robust multiarray average (gcRMA) (on arrays with mismatchprobes) or RMA. Illumina, Agilent, GE, and other commercial arrayswere renormalized via normal-exponential background correction fol-lowed by quantile normalization. Custom arrays were not renormalized.Data were log2-transformed, and a fixed-effect model was used to sum-marize probes to genes within each study. Within each study, cohortsassayed with different microarray types were treated as independent.

We performed multicohort meta-analysis as described previously(7, 15–17). Briefly, genes were summarized using Hedges’ g, and theDerSimonian-Laird random-effects model was used for meta-analysis,followed by Benjamini-Hochberg multiple hypothesis correction (66).Patients with bacterial infections were compared to patients with viralinfections within studies, such that a positive effect size indicates that agene was more highly expressed in virus-infected patients and a negativeeffect size indicates that a gene was more highly expressed in bacteria-infected patients. All data sets that matched criteria with n > 5 in bothbacterial and viral cohorts that were published by the time of the initialsearch (1 April 2015) were used for discovery; data sets published afterthis time were used in validation.

To find a set of genes highly conserved in differential expression be-tween bacterial and viral infections, we selected all cohorts that directlycompared patients with bacterial and viral infections. Patients with do-cumented co-infections (both bacterial and viral) were removed. Co-horts were required to have >5 patients in each group to be included inthe meta-analysis. Both PBMC and whole-blood cohorts were included.We only included genes present in 7 (k − 1) data sets; thus, there were14,729 genes included in the analysis. Significant genes were those thathad an effect size >2-fold and an FDR <1% in a leave–one–data set–outround-robin analysis. However, to ensure that genes from both wholeblood and PBMCs were represented in the final gene set, we also per-formed separate meta-analyses of the PBMC and whole-blood cohortsand removed all genes that had an effect size <1.5-fold in either whole bloodor PBMCs separately. The remaining genes were considered significant.

Derivation of the seven-gene setTo find a set of highly diagnostic genes, the significant genes from themeta-analysis were run through a greedy forward search as describedpreviously (7). Briefly, this algorithm starts with zero gene and adds onegene in each cycle that best improves the AUC for diagnosis in the dis-covery cohorts, until a new gene cannot improve the discovery AUCs

cienceTranslationalMedicine.org 6 July 2016 Vol 8 Issue 346 346ra91 8

Page 9: Robust classification of bacterial and viral infections ... · Here, we sought to improve the diagnostic power of the SMS by adding the ability to discriminate bacterial from viral

R E S EARCH ART I C L E

by guest on July 17, 2020http://stm

.sciencemag.org/

Dow

nloaded from

more than some threshold (an increase in the weighted discovery AUCof ≥1). The resulting genes are used to calculate a single bacterial/viralmetascore, calculated as the geometric mean of the “viral” response genesminus the geometric mean of the “bacterial” response genes, times theratio of the number of genes in each set. The resulting continuous scorecan then be tested for diagnostic power using ROC curves.

Direct validation of the seven-gene setThe resulting gene set was first validated in the remaining public geneexpression cohorts that directly compared bacterial to viral infectionsbut were too small to be used for meta-analysis. Two cohorts[GSE60244 (13) and GSE63990 (14)] were made public after ourmeta-analysis was completed and so were used for validation. To showgeneralizability, we also examined one large in vitro data set comparingLPS to influenza exposure in monocyte-derived dendritic cells, but thiswas not included in the summary AUC because it is not expected tocome from the same distribution as the clinical studies.

Summary ROC curvesFor both discovery and validation cohorts, summary ROC curves wereconstructed according to the method of Kester and Buntinx (67) and aspreviously described (17). Briefly, linear-exponential models weremadefor each ROC curve, and the parameters of these individual curves weresummarized using a random-effects model to estimate the overall sum-mary ROC curve parameters. The a-parameter controls AUC (in par-ticular, distance of the line from the line of identity), and the b-parametercontrols skewness of the ROC curve. Summary AUCCIs were estimatedfrom the SE of the a and b in meta-analysis.

COCONUT conormalizationThere are dozens of public microarray cohorts that profiled patientswith either bacterial or viral infections, but not both. It would be advan-tageous to be able to compare a gene score across these cohorts, but ithas not previously been possible because each different microarray haswidely different background measurements for each gene, and amongstudies using the same types ofmicroarrays, there are large batch effects.To make use of these data, we needed to conormalize these cohorts insuch a way that (i) no bias is introduced that could influence final clas-sification (the normalization protocol should be blind to diagnosis), (ii)there should be no change to the distribution of a gene within a study,and (iii) a gene should show the samedistributions between studies afternormalization. A method with these characteristics would allow ourgene score to be calculated and compared across multiple studies andthus allow us to broadly test its generalizability.

The ComBat empirical Bayes normalization method (31) is popu-lar for cross-platform normalization but crucially falls short of ourdesired criteria because it assumes an equal distribution across diseasestates. We thus developed a modified version of the ComBat method,which conormalizes control samples from different cohorts to allowfor direct comparison of diseased samples from those same cohorts.We call this method COmbat CO-Normalization Using conTrols orCOCONUT. COCONUT makes one strong assumption that it forcescontrol/healthy patients from different cohorts to represent the samedistribution. Briefly, all cohorts are split into healthy and diseasedcomponents. The healthy components undergo ComBat conormaliza-tion without covariates. The ComBat estimated parameters a

ˇ

, b

ˇ

, s

ˇ

, d*,and g* are obtained for each data set for the healthy component and thenapplied to the diseased component (fig. S7). This forces the diseased com-

www.S

ponents of all cohorts to be from the same background distribution butretains their relative distance from the healthy component (t statisticswithin data sets are only different post-COCONUT because of floating-point math). It also does not require any a priori knowledge of diseaseclassification (such as bacterial or viral infection), thus meeting ourprespecified criteria. This method does have the notable requirementthat healthy/control patients are required to be present in a data set for itto be pooled with other available data. Also, because healthy/control pa-tients are set to be in the same distribution, it should only be used wheresuch an assumption is reasonable (such as within the same tissue type,among the same species, etc.).

The ComBat model and the COCONUT methodAs described by Johnson et al., the ComBat model corrects for loca-tion and scale of each gene by first solving an ordinary least-squaresmodel for gene expression and then shrinking the resulting param-eters using an empirical Bayes estimator, solved iteratively (31). For-mally, each gene expression level Yijg (for gene g for sample j in batch i)is assumed to be composed of overall gene expression ag, design matrixof sample conditions X with regression coefficients bg, additive andmultiplicative batch effects gig and dig, and an error term eijg

Yijg ¼ ag þ Xbg þ gig þ digeijg

Estimating parameters using ordinary least-squares regressionstandardizes Yijg to a new term Zijg (where s

ˇ

g is the SD of eijg)

Zijg ¼Yijg � a

ˇ

g � Xb

ˇ

g

s

ˇ

g

The standardized data are now distributed according toZijg ∼ nðgig; d2igÞ, where gig ∼ nðYi; t2i Þ and d2ig ∼ inverse g ðli; qiÞ.

The inverse g is assumed as a standard uninformative Bayesianprior. The remaining hyperparameters are estimated empirically, withthe derivation and solution found in the original reference (31). Theestimated batch effects g�ig and d

2�ig can then be used to adjust the stan-

dardized data to an empirical Bayes batch-adjusted final output Y�ijg

Y�ijg ¼

s

ˇ

g

d�igZijg � g�ig

� �þ a

ˇ

g þ Xb

ˇ

g

In our modified version of this method (COCONUT), all of theabove are performed according to the original method without modifi-cation. However, it is applied to only the healthy/control patients ineach data set (Y is a matrix of only healthy patient samples). The esti-mated parametersa

ˇ

,b

ˇ

,s

ˇ

, d*, and g* are all taken and applied directly toamatrixD that consists only of diseased patient sample (whichmust beordered in the same manner as Y)

Eikg ¼Dikg � a

ˇ

g � Xb

ˇ

g

s

ˇ

g

D�ikg ¼

s

ˇ

g

d�igEikg � g�ig

� �þ a

ˇ

g þ Xb

ˇ

g

We can thus obtain a batch-corrected version of diseased samplesD*, which corrects for the differences between healthy controls but doesnot change each submatrix Di with respect to each Yi.

cienceTranslationalMedicine.org 6 July 2016 Vol 8 Issue 346 346ra91 9

Page 10: Robust classification of bacterial and viral infections ... · Here, we sought to improve the diagnostic power of the SMS by adding the ability to discriminate bacterial from viral

R E S EARCH ART I C L E

by guest on July 17, 2020http://stm

.sciencemag.org/

Dow

nloaded from

Global ROCsWe used COCONUT conormalization to test (i) all discovery cohortsand (ii) all validation cohorts, even those containing only bacterial oronly viral illness. We did this separately for the PBMC and whole-blooddata, for reasons described previously. After conormalization, the distri-butions for the individual cohorts were plotted together to allow for di-rect comparison. For each plot, we show (i) the distribution of scores foreach data set, (ii) the normalized gene expression for each gene withinthe diagnostic test, and (iii) the housekeeping genes that are expected toshow no difference between classes based on meta-analysis. The healthypatients have been removed from these plots. However, to show that thedistributions of genes between healthy and diseased patients withincohorts do not change after COCONUT conormalization, we have alsoshown plots with both patient types with both target genes andhousekeeping genes (fig. S8). Genes with minimal effect size and min-imal variance in meta-analysis were selected as housekeeping genes.

For each comparison, a single global ROCAUCwas calculated, anda single threshold was set to allow for an estimate of the real-world di-agnostic performance of the tests. Thresholds for the cutoffs for bacterialversus viral infection were set to approximate a sensitivity of 90% forbacterial infection, because a bacterial infection false-negative (the recom-mendation not to give antibiotics when antibiotics are needed) can bedevastating.

Integrated antibiotics decision modelThe SMS can discriminate between patients with severe acute infectionsand those with inflammation from other sources, but it cannot distin-guish between types of infection (fig. S1). We thus tested an IADM, inwhich the 11-gene SMS is applied, followed by the 7-gene bacterial/viralmetascore. This model thus identifies (i) whether a patient has an infec-tion and, if so, (ii) what type of infection is present (bacterial or viral).We were unable to identify enough validation cohorts with patientswith noninfected inflammation that also included healthy controls;thus, in constructing the global ROCs, we used both discovery and val-idation cohorts. Using COCONUT conormalization, we set globalthresholds across all included cohorts, and these were applied to eachindividual data set to test the ability of the IADMto correctly distinguishpatients with noninfectious inflammation, bacterial infection, and viralinfection. Healthy patients were not included as a diagnostic class be-cause they were used in the conormalization procedure. The IADMwasalso applied separately to all cohorts that had no healthy controls butthat included (i) noninfected SIRS patients and (ii) patients with bothbacterial and viral infections.

Because the PPV and NPV are dependent on prevalence and theprevalence of infections in the data used here does not match the prev-alenceof infections inahospital setting,wecalculatedPPVandNPVcurveson the basis of the sensitivity and specificity for bacterial infections attainedwith the IADM. Formally, NPV = specificity × (1 − prevalence)/((1 −sensitivity)×prevalence+specificity× (1−prevalence)); PPV=sensitivity ×prevalence/(sensitivity × prevalence + (1− specificity) × (1− prevalence)).

NanoString validationWe tested 96 samples from independent patients (those never profiledviamicroarray) fromtheGPSSSI trials (18–22) using a targetedNanoString(56) digital multiplex gene quantitation assay. The 18 genes were re-normalized to housekeeping genes (FRAS1 and LRRC17). The SMS andbacterial/viral metascore genes were both assayed, and the diagnosticperformance of the IADM was calculated.

www.Sc

Data and source code availabilityAll analyses were conducted in the R statistical computing language(version 3.1.1). Code to recreate the multicohort meta-analysis, theCOCONUT R package source code, and the COCONUT-normalizeddata used here have been deposited and are available at http://khatrilab.stanford.edu/sepsis.

SUPPLEMENTARY MATERIALS

www.sciencetranslationalmedicine.org/cgi/content/full/8/346/346ra91/DC1Fig. S1. The SMS and pathogen type.Fig. S2. Study schematic.Fig. S3. Forest plots of the seven-gene set.Fig. S4. Summary ROC forest plots for discovery data.Fig. S5. Summary ROC forest plots for direct validation data.Fig. S6. Bacterial/viral metascore ROC in GSE53166.Fig. S7. Schematic of COCONUT conormalization.Fig. S8. COCONUT conormalization of whole-blood discovery data sets.Fig. S9. Bacterial/viral score in global ROC of non-conormalized whole-blood discovery data sets.Fig. S10. Bacterial/viral score in global ROC of COCONUT-conormalized whole-blood discoverydata sets.Fig. S11. Bacterial/viral score in global ROC of non-conormalized whole-blood validation data sets.Fig. S12. Bacterial/viral score in global ROC of non-conormalized PBMC validation data sets.Fig. S13. Bacterial/viral score in global ROC of COCONUT-conormalized PBMC validation data sets.Fig. S14. The effects of age on SMS in COCONUT-conormalized data.Fig. S15. SMS across all COCONUT-conormalized whole-blood data (both discovery and validation).Fig. S16. IADM across COCONUT-conormalized public gene expression data including healthy controls.Fig. S17. NPV and PPV for the IADM.Fig. S18. GSE63990, adults with ARIs.Table S1. Significant gene list.Table S2. Test characteristics of the bacterial/viral metascore in direct validation data sets.Table S3. Data sets with noninfected inflammatory conditions used to test the IADM.Table S4. NanoString data.Acknowledgments

REFERENCES AND NOTES

1. R. Ferrer, I. Martin-Loeches, G. Phillips, T. M. Osborn, S. Townsend, R. P. Dellinger, A. Artigas,C. Schorr, M. M. Levy, Empiric antibiotic treatment reduces mortality in severe sepsis andseptic shock from the first hour: Results from a guideline-based performance improve-ment program. Crit. Care Med. 42, 1749–1755 (2014).

2. S. Fridkin, J. Baggs, R. Fagan, S. Magill, L. A. Pollack, P. Malpiedi, R. Slayton, K. Khader,M. A. Rubin, M. Jones, M. H. Samore, G. Dumyati, E. Dodds-Ashley, J. Meek, K. Yousey-Hindes,J. Jernigan, N. Shehab, R. Herrera, C. L. McDonald, A. Schneider, A. Srinivasan; Centers forDisease Control and Prevention (CDC), Vital signs: Improving antibiotic use among hospita-lized patients. Morb. Mortal. Wkly. Rep. 63, 194–200 (2014).

3. C. G. Grijalva, J. P. Nuorti, M. R. Griffin, Antibiotic prescription rates for acute respiratorytract infections in US ambulatory settings. JAMA 302, 758–766 (2009).

4. J. Andrews, in 9th International Conference on Typhoid and Invasive NTS Disease (Bali, Indonesia,2015).

5. National Strategy and Action Plan for Combating Antibiotic Resistant Bacteria (Nova Publishers,New York, 2015).

6. O. Liesenfeld, L. Lehman, K.-P. Hunfeld, G. Kost, Molecular diagnosis of sepsis: New aspectsand recent developments. Eur. J. Microbiol. Immunol. 4, 1–25 (2014).

7. T. E. Sweeney, A. Shidham, H. R. Wong, P. Khatri, A comprehensive time-course–basedmulticohort analysis of sepsis and sterile inflammation reveals a robust diagnostic geneset. Sci. Transl. Med. 7, 287ra271 (2015).

8. T. E. Sweeney, P. Khatri, Comprehensive validation of the FAIM3:PLAC8 ratio in time-matched public gene expression aata. Am. J. Respir. Crit. Care Med. 192, 1260–1261 (2015).

9. L. McHugh, T. A. Seldon, R. A. Brandon, J. T. Kirk, A. Rapisarda, A. J. Sutherland, J. J. Presneill,D. J. Venter, J. Lipman, M. R. Thomas, P. M. C. Klein Klouwenberg, L. van Vught, B. Scicluna,M. Bonten, O. L. Cremer, M. J. Schultz, T. van der Poll, T. D. Yager, R. B. Brandon, A molecularhost response assay to discriminate between sepsis and infection-negative systemic inflam-mation in critically ill patients: Discovery and validation in independent cohorts. PLOS Med. 12,e1001916 (2015).

ienceTranslationalMedicine.org 6 July 2016 Vol 8 Issue 346 346ra91 10

Page 11: Robust classification of bacterial and viral infections ... · Here, we sought to improve the diagnostic power of the SMS by adding the ability to discriminate bacterial from viral

R E S EARCH ART I C L E

by guest on July 17, 2020http://stm

.sciencemag.org/

Dow

nloaded from

10. B. P. Scicluna, P. M. C. Klein Klouwenberg, L. A. van Vught, M. A. Wiewel, D. S. Y. Ong,A. H. Zwinderman, M. Franitza, M. R. Toliat, P. Nürnberg, A. J. Hoogendijk, J. Horn, O. L. Cremer,M. J. Schultz, M. J. Bonten, T. van der Poll, A molecular biomarker to diagnose community-acquired pneumonia on intensive care unit admission. Am. J. Respir. Crit. Care Med. 192, 826–835(2015).

11. X. Hu, J. Yu, S. D. Crosby, G. A. Storch, Gene expression profiles in febrile children withdefined viral and bacterial infection. Proc. Natl. Acad. Sci. U.S.A. 110, 12792–12797 (2013).

12. A. K. Zaas, T. Burke, M. Chen, M. McClain, B. Nicholson, T. Veldman, E. L. Tsalik, V. Fowler,E. P. Rivers, R. Otero, S. F. Kingsmore, D. Voora, J. Lucas, A. O. Hero, L. Carin, C. W. Woods,G. S. Ginsburg, A host-based RT-PCR gene expression signature to identify acute respiratoryviral infection. Sci. Transl. Med. 5, 203ra126 (2013).

13. N. M. Suarez, E. Bunsow, A. R. Falsey, E. E. Walsh, A. Mejias, O. Ramilo, Superiority of transcrip-tional profiling over procalcitonin for distinguishing bacterial from viral lower respiratory tractinfections in hospitalized adults. J. Infect. Dis. 212, 213–222 (2015).

14. E. L. Tsalik, R. Henao, M. Nichols, T. Burke, E. R. Ko, M. T. McClain, L. L. Hudson, A. Mazur,D. H. Freeman, T. Veldman, R. J. Langley, E. B. Quackenbush, S. W. Glickman, C. B. Cairns,A. K. Jaehne, E. P. Rivers, R. M. Otero, A. K. Zaas, S. F. Kingsmore, J. Lucas, V. G. Fowler Jr.,L. Carin, G. S. Ginsburg, C. W. Woods, Host gene expression classifiers diagnose acuterespiratory illness etiology. Sci. Transl. Med. 8, 322ra311 (2016).

15. M. Andres-Terre, H. M. McGuire, Y. Pouliot, E. Bongen, T. E. Sweeney, C. M. Tato, P. Khatri,Integrated, multi-cohort analysis identifies conserved transcriptional signatures acrossmultiple respiratory viruses. Immunity 43, 1199–1211 (2015).

16. P. Khatri, S. Roedder, N. Kimura, K. De Vusser, A. A. Morgan, Y. Gong, M. P. Fischbein, R. C. Robbins,M. Naesens, A. J. Butte, M. M. Sarwal, A common rejection module (CRM) for acute rejectionacross multiple organs identifies novel therapeutics for organ transplantation. J. Exp. Med. 210,2205–2221 (2013).

17. T. E. Sweeney, L. Braviak, C. M. Tato, P. Khatri, Genome-wide expression for diagnosis ofpulmonary tuberculosis: A multicohort analysis. Lancet Respir. Med. 4, 213–224 (2016).

18. T. P. Shanley, N. Cvijanovich, R. Lin, G. L. Allen, N. J. Thomas, A. Doctor, M. Kalyanaraman,N. M. Tofil, S. Penfil, M. Monaco, K. Odoms, M. Barnes, B. Sakthivel, B. J. Aronow, H. R. Wong,Genome-level longitudinal expression of signaling pathways and gene networks in pediatricseptic shock. Mol. Med. 13, 495–508 (2007).

19. H. R. Wong, T. P. Shanley, B. Sakthivel, N. Cvijanovich, R. Lin, G. L. Allen, N. J. Thomas, A. Doctor,M. Kalyanaraman, N. M. Tofil, S. Penfil, M. Monaco, M. A. Tagavilla, K. Odoms, K. Dunsmore,M. Barnes, B. J. Aronow; Genomics of Pediatric SIRS.Septic Shock Investigators, Genome-levelexpression profiles in pediatric septic shock indicate a role for altered zinc homeostasis in pooroutcome. Physiol. Genomics 30, 146–155 (2007).

20. N. Cvijanovich, T. P. Shanley, R. Lin, G. L. Allen, N. J. Thomas, P. Checchia, N. Anas, R. J. Freishtat,M. Monaco, K. Odoms, B. Sakthivel, H. R. Wong; Genomics Pediatric SIRS/Septic Shock Inves-tigators, Validating the genomic signature of pediatric septic shock. Physiol. Genomics 34, 127–134(2008).

21. H. R. Wong, N. Cvijanovich, G. L. Allen, R. Lin, N. Anas, K. Meyer, R. J. Freishtat, M. Monaco,K. Odoms, B. Sakthivel, T. P. Shanley Genomics of Pediatic SIRS/Septc Shock Investigators,Genomic expression profiling across the pediatric systemic inflammatory response syn-drome, sepsis, and septic shock spectrum. Crit. Care Med. 37, 1558–1566 (2009).

22. H. R. Wong, N. Z. Cvijanovich, M. Hall, G. L. Allen, N. J. Thomas, R. J. Freishtat, N. Anas, K. Meyer,P. A. Checchia, R. Lin, M. T. Bigham, A. Sen, J. Nowak, M. Quasney, J. W. Henricksen, A. Chopra,S. Banschbach, E. Beckman, K. Harmon, P. Lahni, T. P. Shanley, Interleukin-27 is a novelcandidate diagnostic biomarker for bacterial infection in critically ill children. Crit. Care16, R213 (2012).

23. O. Ramilo, W. Allman, W. Chung, A. Mejias, M. Ardura, C. Glaser, K. M. Wittkowski, B. Piqueras,J. Banchereau, A. K. Palucka, D. Chaussabel, Gene expression patterns in blood leukocytesdiscriminate patients with acute infections. Blood 109, 2066–2077 (2007).

24. G. Parnell, A. McLean, D. Booth, S. Huang, M. Nalos, B. Tang, Aberrant cell cycle and apo-ptotic changes characterise severe influenza A infection – a meta-analysis of genomicsignatures in circulating leukocytes. PLOS One 6, e17186 (2011).

25. G. P. Parnell, A. S. McLean, D. R. Booth, N. J. Armstrong, M. Nalos, S. J. Huang, J. Manak,W. Tang, O.-Y. Tam, S. Chan, B. M. Tang, A distinct influenza infection signature in theblood transcriptome of patients with severe community-acquired pneumonia. Crit.Care 16, R157 (2012).

26. J. A. Herberg, M. Kaforou, S. Gormley, E. R. Sumner, S. Patel, K. D. J. Jones, S. Paulus, C. Fink,F. Martinon-Torres, G. Montana, V. J. Wright, M. Levin, Transcriptomic profiling in childhoodH1N1/09 influenza reveals reduced expression of protein synthesis genes. J. Infect. Dis. 208,1664–1668 (2013).

27. S. J. Popper, V. E. Watson, C. Shimizu, J. T. Kanegaye, J. C. Burns, D. A. Relman, Genetranscript abundance profiles distinguish Kawasaki disease from adenovirus infection.J. Infect. Dis. 200, 657–666 (2009).

28. C. L. Smith, P. Dickinson, T. Forster, M. Craigon, A. Ross, M. R. Khondoker, R. France, A. Ivens,D. J. Lynn, J. Orme, A. Jackson, P. Lacaze, K. L. Flanagan, B. J. Stenson, P. Ghazal, Identifi-cation of a human neonatal immune-metabolic network associated with bacterial infection.Nat. Commun. 5, 4649 (2014).

www.Sc

29. R. Almansa, L. Socias, M. Sanchez-Garcia, I. Martín-Loeches, M. del Olmo, D. Andaluz-Ojeda,F. Bobillo, L. Rico, A. Herrero, V. Roig, C. A. San-Jose, S. Rosich, J. Barbado, C. Disdier, R. O. deLejarazu, M. C. Gallegos, V. Fernandez, J. F. Bermejo-Martin, Critical COPD respiratory illnessis linked to increased transcriptomic activity of neutrophil proteases genes. BMC Res. Notes5, 401 (2012).

30. M. N. Lee, C. Ye, A.-C. Villani, T. Raj, W. Li, T. M. Eisenhaure, S. H. Imboywa, P. I. Chipendo,F. A. Ran, K. Slowikowski, L. D. Ward, K. Raddassi, C. McCabe, M. H. Lee, I. Y. Frohlich,D. A. Hafler, M. Kellis, S. Raychaudhuri, F. Zhang, B. E. Stranger, C. O. Benoist, P. L. De Jager,A. Regev, N. Hacohen, Common genetic variants modulate pathogen-sensing responses inhuman dendritic cells. Science 343, 1246980 (2014).

31. W. E. Johnson, C. Li, A. Rabinovic, Adjusting batch effects in microarray expression datausing empirical Bayes methods. Biostatistics 8, 118–127 (2007).

32. Y. Zhai, L. M. Franco, R. L. Atmar, J. M. Quarles, N. Arden, K. L. Bucasas, J. M. Wells, D. Niño,X. Wang, G. E. Zapata, C. A. Shaw, J. W. Belmont, R. B. Couch, Host transcriptional responseto influenza and other acute respiratory viral infections – a prospective cohort study. PLOSPathog. 11, e1004869 (2015).

33. M. Kwissa, H. I. Nakaya, N. Onlamoon, J. Wrammert, F. Villinger, G. C. Perng, S. Yoksan,K. Pattanapanyasat, K. Chokephaibulkit, R. Ahmed, B. Pulendran, Dengue virus infectioninduces expansion of a CD14+CD16+ monocyte population that stimulates plasmablastdifferentiation. Cell Host Microbe 16, 115–127 (2014).

34. A. Mejias, B. Dimo, N. M. Suarez, C. Garcia, M. C. Suarez-Arrabal, T. Jartti, D. Blankenship,A. Jordan-Villegas, M. I. Ardura, Z. Xu, J. Banchereau, D. Chaussabel, O. Ramilo, Wholeblood gene expression profiles to assess pathogenesis and disease severity in infantswith respiratory syncytial virus infection. PLOS Med. 10, e1001549 (2013).

35. J.-E. Berdal, T. E. Mollnes, T. Wæhre, O. K. Olstad, B. Halvorsen, T. Ueland, J. H. Laake,M. T. Furuseth, A. Maagaard, H. Kjekshus, P. Aukrust, C. M. Jonassen, Excessive innate immuneresponse and mutant D222G/N in severe A (H1N1) pandemic influenza. J. Infect. 63, 308–316(2011).

36. J. F. Bermejo-Martin, I. Martin-Loeches, J. Rello, A. Antón, R. Almansa, L. Xu, G. Lopez-Campos,T. Pumarola, L. Ran, P. Ramirez, D. Banner, D. C. Ng, L. Socias, A. Loza, D. Andaluz, E. Maravi,M. J. Gómez-Sánchez, M. Gordón, M. C. Gallegos, V. Fernandez, S. Aldunate, C. León, P. Merino,J. Blanco, F. Martin-Sanchez, L. Rico, D. Varillas, V. Iglesias, M. Á. Marcos, F. Gandía, F. Bobillo,B. Nogueira, S. Rojo, S. Resino, C. Castro, R. Ortiz de Lejarazu, D. Kelvin, Host adaptive im-munity deficiency in severe pandemic influenza. Crit. Care 14, R167 (2010).

37. A. K. Zaas, M. Chen, J. Varkey, T. Veldman, A. O. Hero III, J. Lucas, Y. Huang, R. Turner, A. Gilbert,R. Lambkin-Williams, N. C. Øien, B. Nicholson, S. Kingsmore, L. Carin, C. W. Woods, G. S. Ginsburg,Gene expression signatures diagnose influenza and other symptomatic respiratory viralinfections in humans. Cell Host Microbe 6, 207–217 (2009).

38. C. A. M. van de Weg, H.-J. van den Ham, M. A. Bijl, F. Anfasa, F. Zaaraoui-Boutahar, B. E. Dewi,L. Nainggolan, W. F. J. van IJcken, A. D. M. E. Osterhaus, B. E. Martina, E. C. M. van Gorp,A. C. Andeweg, Time since onset of disease and individual clinical markers associate withtranscriptional changes in uncomplicated dengue. PLOS Negl. Trop. Dis. 9, e0003522 (2015).

39. L. Conejero, K. Potempa, C. M. Graham, N. Spink, S. Blankley, F. J. Salguero, R. Pankla-Sranujit,P. Khaenam, J. F. Banchereau, V. Pascual, D. Chaussabel, G. Lertmemongkolchai, A. O’Garra,G. J. Bancroft, The blood transcriptome of experimental melioidosis reflects disease severityand shows considerable similarity with the human disease. J. Immunol. 195, 3248–3261 (2015).

40. M.-A. Cazalis, A. Lepape, F. Venet, F. Frager, B. Mougin, H. Vallin, M. Paye, A. Pachot, G. Monneret,Early and dynamic changes in gene expression in septic shock patients: A genome-wideapproach. Intensive Care Med. Exp. 2, 20 (2014).

41. M. Lill, S. Kõks, U. Soomets, L. C. Schalkwyk, C. Fernandes, I. Lutsar, P. Taba, Peripheralblood RNA gene expression profiling in patients with bacterial meningitis. Front. Neurosci.7, 33 (2013).

42. S. H. Ahn, E. L. Tsalik, D. D. Cyr, Y. Zhang, J. C. van Velkinburgh, R. J. Langley, S. W. Glickman,C. B. Cairns, A. K. Zaas, E. P. Rivers, R. M. Otero, T. Veldman, S. F. Kingsmore, J. Lucas, C. W. Woods,G. S. Ginsburg, V. G. Fowler Jr., Gene expression-based classifiers identify Staphylococcus aureusinfection in mice and humans. PLOS One 8, e48979 (2013).

43. F. Thuny, J. Textoris, A. B. Amara, A. E. Filali, C. Capo, G. Habib, D. Raoult, J.-L. Mege, Thegene expression analysis of blood reveals S100A11 and AQP9 as potential biomarkers ofinfective endocarditis. PLOS One 7, e31490 (2012).

44. A. Sutherland, M. Thomas, R. A. Brandon, R. B. Brandon, J. Lipman, B. Tang, A. McLean,R. Pascoe, G. Price, T. Nguyen, G. Stone, D. Venter, Development and validation of a novelmolecular biomarker diagnostic test for the early detection of sepsis. Crit. Care 15, R149 (2011).

45. M. P. R. Berry, C. M. Graham, F. W. McNab, Z. Xu, S. A. A. Bloch, T. Oni, K. A. Wilkinson,R. Banchereau, J. Skinner, R. J. Wilkinson, C. Quinn, D. Blankenship, R. Dhawan, J. J. Cush,A. Mejias, O. Ramilo, O. M. Kon, V. Pascual, J. Banchereau, D. Chaussabel, A. O’Garra, Aninterferon-inducible neutrophil-driven blood transcriptional signature in human tuber-culosis. Nature 466, 973–977 (2010).

46. R. Pankla, S. Buddhisa, M. Berry, D. M. Blankenship, G. J. Bancroft, J. Banchereau,G. Lertmemongkolchai, D. Chaussabel, Genomic transcriptional profiling identifies a candidateblood biomarker signature for the diagnosis of septicemic melioidosis. Genome Biol. 10, R127(2009).

ienceTranslationalMedicine.org 6 July 2016 Vol 8 Issue 346 346ra91 11

Page 12: Robust classification of bacterial and viral infections ... · Here, we sought to improve the diagnostic power of the SMS by adding the ability to discriminate bacterial from viral

R E S EARCH ART I C L E

by guest on July 17, 2http://stm

.sciencemag.org/

Dow

nloaded from

47. A. D. Irwin, F. Marriage, L. A. Mankhambo, G. Jeffers, R. Kolamunnage-Dona, M. Guiver, B. Denis,E. M. Molyneux, M. E. Molyneux, P. J. Day, E. D. Carrol, Novel biomarker combination improvesthe diagnosis of serious bacterial infections in Malawian children. BMC Med. Genomics 5, 13 (2012).

48. C. I. Bloom, C. M. Graham, M. P. R. Berry, F. Rozakeas, P. S. Redford, Y. Wang, Z. Xu,K. A. Wilkinson, R. J. Wilkinson, Y. Kendrick, G. Devouassoux, T. Ferry, M. Miyara, D. Bouvry,D. Valeyre, V. Dominique, G. Gorochov, D. Blankenship, M. Saadatian, P. Vanhems, H. Beynon,R. Vancheeswaran, M. Wickremasinghe, D. Chaussabel, J. Banchereau, V. Pascual, L.-p. Ho,M. Lipman, A. O’Garra, Transcriptional blood signatures distinguish pulmonary tuberculosis,pulmonary sarcoidosis, pneumonias and lung cancers. PLOS One 8, e70630 (2013).

49. M. Emonts, “Polymorphisms in Immune Response Genes in Infectious Diseases andAutoimmune Diseases,” thesis, Erasmus University Rotterdam, Rotterdam, Netherlands(2008).

50. M. I. Ardura, R. Banchereau, A. Mejias, T. Di Pucchio, C. Glaser, F. Allantaz, V. Pascual,J. Banchereau, D. Chaussabel, O. Ramilo, Enhanced monocyte response and decreased centralmemory T cells in children with invasive Staphylococcus aureus infections. PLOS One 4, e5446(2009).

51. K. Liu, L. Chen, R. Kaur, M. Pichichero, Transcriptome signature in young children withacute otitis media due to Streptococcus pneumoniae. Microbes Infect. 14, 600–609(2012).

52. I. Ioannidis, B. McNally, M. Willette, M. E. Peeples, D. Chaussabel, J. E. Durbin, O. Ramilo,A. Mejias, E. Flaño, Plasticity and virus specificity of the airway epithelial cell immuneresponse during respiratory virus infection. J. Virol. 86, 5422–5436 (2012).

53. S. J. Popper, A. Gordon, M. Liu, A. Balmaseda, E. Harris, D. A. Relman, Temporal dynamics ofthe transcriptional response to dengue virus infection in Nicaraguan children. PLOS Negl.Trop. Dis. 6, e1966 (2012).

54. H. K. Brand, I. M. Ahout, D. de Ridder, A. van Diepen, Y. Li, M. Zaalberg, A. Andeweg,N. Roeleveld, R. de Groot, A. Warris, P. W. M. Hermans, G. Ferwerda, F. J. Staal, Olfactomedin 4serves as a marker for disease severity in pediatric Respiratory Syncytial Virus (RSV) infection.PLOS One 10, e0131927 (2015).

55. C. Wacker, A. Prkno, F. M. Brunkhorst, P. Schlattmann, Procalcitonin as a diagnosticmarker for sepsis: A systematic review and meta-analysis. Lancet Infect. Dis. 13, 426–435(2013).

56. M. M. Kulkarni, Digital multiplexed gene expression analysis using the NanoString nCountersystem. Curr. Protoc. Mol. Biol. Chapter 25, Unit25B.10 (2011).

57. E. L. Tsalik, Y. Li, L. L. Hudson, V. H. Chu, T. Himmel, A. T. Limkakeng, J. N. Katz, S. W. Glickman,M. T. McClain, K. E. Welty-Wolf, V. G. Fowler, G. S. Ginsburg, C. W. Woods, S. D. Reed, Potentialcost-effectiveness of early identification of hospital-acquired infection in critically Ill patients.Ann. Am. Thorac. Soc. 13, 401–413 (2016).

58. D. N. Gilbert, Procalcitonin as a biomarker in respiratory tract infection. Clin. Infect. Dis.52 (suppl. 4), S346–S350 (2011).

59. K. Oved, A. Cohen, O. Boico, R. Navon, T. Friedman, L. Etshtein, O. Kriger, E. Bamberger,Y. Fonar, R. Yacobov, R. Wolchinsky, G. Denkberg, Y. Dotan, A. Hochberg, Y. Reiter, M. Grupper,I. Srugo, P. Feigin, M. Gorfine, I. Chistyakov, R. Dagan, A. Klein, I. Potasman, E. Eden, A novelhost-proteome signature for distinguishing between acute bacterial and viral infections. PLOSOne 10, e0120012 (2015).

60. C. Valim, R. Ahmad, M. Lanaspa, Y. Tan, S. Acácio, M. A. Gillette, K. D. Almendinger, D. A. Milner Jr.,L. Madrid, K. Pellé, J. Harezlak, J. Silterra, P. L. Alonso, S. A. Carr, J. P. Mesirov, D. F. Wirth,

www.Sc

R. C. Wiegand, Q. Bassat, Responses to bacteria, virus, and malaria distinguish the etiology ofpediatric clinical pneumonia. Am. J. Respir. Crit. Care Med. 193, 448–459 (2016).

61. T. Tolfvenstam, A. Lindblom, M. J. Schreiber, L. Ling, A. Chow, E. E. Ooi, M. L. Hibberd,Characterization of early host responses in adults with dengue disease. BMC Infect. Dis.11, 209 (2011).

62. T. Vanden Berghe, A. Linkermann, S. Jouan-Lanhouet, H. Walczak, P. Vandenabeele, Regulatednecrosis: The expanding network of non-apoptotic cell death pathways. Nat. Rev. Mol. Cell Biol.15, 135–147 (2014).

63. H. Ashida, M. Kim, M. Schmidt-Supprian, A. Ma, M. Ogawa, C. Sasakawa, A bacterial E3ubiquitin ligase IpaH9.8 targets NEMO/IKKgamma to dampen the host NF-kB-mediatedinflammatory response. Nat. Cell Biol. 12, 66–73 (2010).

64. M. Zhu, O. Granillo, R. Wen, K. Yang, X. Dai, D. Wang, W. Zhang, Negative regulation oflymphocyte activation by the adaptor protein LAX. J. Immunol. 174, 5612–5619 (2005).

65. E. A. Federzoni, P. J. M. Valk, B. E. Torbett, T. Haferlach, B. Löwenberg, M. F. Fey, M. P. Tschan,PU.1 is linking the glycolytic enzyme HK3 in neutrophil differentiation and survival of APL cells.Blood 119, 4963–4970 (2012).

66. Y. Benjamini, Y. Hochberg, Controlling the false discovery rate: A practical and powerfulapproach to multiple testing. J. R. Stat. Soc. Series B 57, 289–300 (1995).

67. A. D. M. Kester, F. Buntinx, Meta-analysis of ROC curves.Med. Decis. Making 20, 430–439 (2000).

Acknowledgments: We thank the patients who contributed clinical samples to the studiesherein and the researchers who gathered, analyzed, and shared their data (see SupplementaryAcknowledgments). We are sure that the push for open data will substantially improvetranslational medicine. Funding: T.E.S. was funded by a Stanford Child Health Research Insti-tute Young Investigator award (through the Institute for Immunity, Transplantation and Infec-tion) and the Society of University Surgeons. P.K. was funded by the Bill & Melinda GatesFoundation and National Institute of Allergy and Infectious Diseases grants 1U19AI109662,U19AI057229, U54I117925, and U01AI089859. Author contributions: T.E.S. and P.K. were incharge of the study conception and design; H.R.W. contributed to the materials; T.E.S. per-formed the experiments; T.E.S. and P.K. drafted the manuscript; and T.E.S., H.R.W., and P.K. criticallyrevised the manuscript. Competing interests: The seven-gene set and the IADM have been dis-closed for possible patent protection to the Stanford Office of Technology and Licensing by T.E.S.and P.K. T.E.S. also serves as a scientific advisor to Multerra Bio, which had no role in this manu-script. H.R.W. declares that he has no competing interests. Data and materials availability: Thepost–COCONUT-normalized data for this study have been deposited at khatrilab.stanford.edu/sepsis.Original MIAME (minimum information about a microarray experiment)–compliant microarray dataare all available under their respective accession numbers in National Center for BiotechnologyInformation GEO or EBI ArrayExpress. COCONUT is available as an R package on CRAN (Comprehen-sive R Archive Network).

Submitted 18 March 2016Accepted 13 June 2016Published 6 July 201610.1126/scitranslmed.aaf7165

Citation: T. E. Sweeney, H. R. Wong, P. Khatri, Robust classification of bacterial and viralinfections via integrated host gene expression diagnostics. Sci. Transl. Med. 8, 346ra91 (2016).

020

ienceTranslationalMedicine.org 6 July 2016 Vol 8 Issue 346 346ra91 12

Page 13: Robust classification of bacterial and viral infections ... · Here, we sought to improve the diagnostic power of the SMS by adding the ability to discriminate bacterial from viral

diagnosticsRobust classification of bacterial and viral infections via integrated host gene expression

Timothy E. Sweeney, Hector R. Wong and Purvesh Khatri

DOI: 10.1126/scitranslmed.aaf7165, 346ra91346ra91.8Sci Transl Med

classifies their infection as bacterial or viral, suggesting appropriate treatment.authors expanded their analysis to create an integrated score that not only identifies infected patients but alsoderive a blood test identifying which patients with sepsis-like symptoms have an underlying infection. Now, the

have previously analyzed large numbers of patients across many cohorts toet al.from antibiotics. Sweeney inflammation from those with bacterial and viral infections, and only those with bacterial sepsis derive any benefitantibiotics for bacterial infections. Unfortunately, it can be difficult to distinguish patients with noninfectious

especiallytherapy combines supportive treatment with interventions directed at the underlying cause of the illness, Sepsis, a severe inflammation caused by infection, is a common and deadly medical condition. Sepsis

Making a gene list, checking it twice

ARTICLE TOOLS http://stm.sciencemag.org/content/8/346/346ra91

MATERIALSSUPPLEMENTARY http://stm.sciencemag.org/content/suppl/2016/07/01/8.346.346ra91.DC1

CONTENTRELATED

http://stm.sciencemag.org/content/scitransmed/11/518/eaax9000.fullhttp://stm.sciencemag.org/content/scitransmed/7/299/299ra122.fullhttp://stm.sciencemag.org/content/scitransmed/7/287/287ra71.fullhttp://stm.sciencemag.org/content/scitransmed/5/195/195ra95.fullhttp://stm.sciencemag.org/content/scitransmed/8/322/322ra11.full

REFERENCES

http://stm.sciencemag.org/content/8/346/346ra91#BIBLThis article cites 64 articles, 9 of which you can access for free

PERMISSIONS http://www.sciencemag.org/help/reprints-and-permissions

Terms of ServiceUse of this article is subject to the

registered trademark of AAAS. is aScience Translational MedicineScience, 1200 New York Avenue NW, Washington, DC 20005. The title

(ISSN 1946-6242) is published by the American Association for the Advancement ofScience Translational Medicine

Copyright © 2016, American Association for the Advancement of Science

by guest on July 17, 2020http://stm

.sciencemag.org/

Dow

nloaded from