BMC Medical Informatics and Decision Making

Research article

Syndromic surveillance: STL for modeling, visualizing, and monitoring disease counts

Ryan P Hafen*1, David E Anderson2, William S Cleveland1, Ross Maciejewski3, David S Ebert3, Ahmad Abusalah4, Mohamed Yakout4, Mourad Ouzzani4 and Shaun J Grannis5,6

Address: 1 Department of Statistics, Purdue University, West Lafayette, Indiana, USA, 2 Department of Mathematics, Xavier University, New Orleans, Louisiana, USA, 3 Department of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana, USA, 4 Department of Computer Science, Purdue University, West Lafayette, Indiana, USA, 5 Regenstrief Institute, Indianapolis, Indiana, USA and 6 School of Medicine, Indiana University, Indianapolis, Indiana, USA

*Corresponding author

Received: 21 January 2009
Accepted: 21 April 2009
Published: 21 April 2009

BMC Medical Informatics and Decision Making 2009, 9:21 doi: 10.1186/1472-6947-9-21

This article is available from: http://www.biomedcentral.com/1472-6947/9/21

© 2009 Hafen et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background: Public health surveillance is the monitoring of data to detect and quantify unusual health events. Monitoring pre-diagnostic data, such as emergency department (ED) patient chief complaints, enables rapid detection of disease outbreaks. There are many sources of variation in such data; statistical methods need to accurately model them as a basis for timely and accurate disease outbreak methods.

Methods: Our new methods for modeling daily chief complaint counts are based on a seasonal-trend decomposition procedure based on loess (STL) and were developed using data from the 76 EDs of the Indiana surveillance program from 2004 to 2008. Square root counts are decomposed into inter-annual, yearly-seasonal, day-of-the-week, and random-error components. Using this decomposition method, we develop a new synoptic-scale (days to weeks) outbreak detection method and carry out a simulation study to compare detection performance to four well-known methods for nine outbreak scenarios.

Result: The components of the STL decomposition reveal insights into the variability of the Indiana ED data. Day-of-the-week components tend to peak Sunday or Monday, fall steadily to a minimum Thursday or Friday, and then rise to the peak. Yearly-seasonal components show seasonal influenza, some with bimodal peaks. Some inter-annual components increase slightly due to increasing patient populations. A new outbreak detection method based on the decomposition modeling performs well with 90 days or more of data. Control limits were set empirically so that all methods had a specificity of 97%. STL had the largest sensitivity in all nine outbreak scenarios. The STL method also exhibited a well-behaved false positive rate when run on the data with no outbreaks injected.

Conclusion: The STL decomposition method for chief complaint counts leads to a rapid and accurate detection method for disease outbreaks, and requires only 90 days of historical data to be put into operation. The visualization tools that accompany the decomposition and outbreak methods provide much insight into patterns in the data, which is useful for surveillance operations.



Background

The development of statistical methods for accurately modeling disease count time series for the early detection of bioterrorism and pandemic events is a very active area of research in syndromic surveillance [1-15]. Early detection requires accurate modeling of the data.

A challenge to modeling is systematic components of time variation whose patterns have a level of predictability – inter-annual (long-term trend), day-of-the-week, and yearly-seasonal components [2,10]. The variation can be ascribed to causal factors: for the inter-annual component the factor can be an increase or decrease in the population of people from which patients come; for the yearly-seasonal component it is changing weather and human activities over the course of a year; and for the day-of-the-week component it is a changing propensity of patients to seek medical attention according to the day of the week. In addition to these components is a noise component: random errors not assignable to causal factors that often can be modeled as independent random variables.

The most effective approach to early outbreak detection is to account for the systematic components of variation and the noise component through a model, and then base a detection method on values of the systematic components and the statistical properties of the random errors. Methods that do not accurately model the components can suffer in performance. The model predicts the recurring behavior of the systematic components so that it can be discounted by the detection method, and specifies the statistical properties of the counts through the stochastic modeling of the noise component. The two dangers are lack of fit, in which patterns in the systematic components are missed, and overfitting, in which the fitted values are unnecessarily noisy. Both are harmful. Lack of fit results in increased prediction squared bias, and overfitting results in increased prediction variance. The prediction mean-squared error is the sum of squared bias and variance.
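The decomposition stated in the last sentence can be written out explicitly. As a sketch, let $\theta$ denote the true value of a systematic component at a given time and $\hat{\theta}$ its fitted value; both symbols are introduced here for illustration and are not notation from the paper.

```latex
\begin{aligned}
\mathrm{MSE}(\hat{\theta})
  &= E\big[(\hat{\theta} - \theta)^2\big] \\
  &= \big(E[\hat{\theta}] - \theta\big)^2
     + E\big[(\hat{\theta} - E[\hat{\theta}])^2\big] \\
  &= \text{bias}^2 + \text{variance}.
\end{aligned}
```

The cross term vanishes because $E\big[\hat{\theta} - E[\hat{\theta}]\big] = 0$. Lack of fit inflates the first term and overfitting inflates the second.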

Many modeling methods require large amounts of historical data. Examples are the extended baseline methods [11-15] of the Early Aberration Reporting System (EARS) [4], which require at least 3 years of data. Another popular example is a seasonal autoregressive integrated moving average (ARIMA) model in [3] that is fitted to 8 years of data. Such methods are not useful for many surveillance systems that have recently come online, so methods have been proposed that require little historical data [16]. They typically employ moving averages of recent data, such as C1-MILD (C1), C2-MEDIUM (C2), and C3-ULTRA (C3), which are used in EARS. But such methods do not provide optimal performance because they do not exploit the full information in the data; they smooth out some component effects and operate locally to avoid others, rather than accounting for the components.

Our research goal is accurate modeling of components of variation, but without requiring a large amount of historical data. The research was carried out as part of our analysis of the daily counts of chief complaints from emergency departments (EDs) of the Indiana Public Health Emergency Surveillance System (PHESS) [17]. The complaints are divided into eight classifications by CoCo [18]. Data for the first EDs go back to November 2004, and new EDs have come online continually since then. There are now 76 EDs in the system.

There are many ways to proceed in modeling chief-complaint time series. One is to develop parametric models. However, our new modeling methods are based on a nonparametric method, seasonal-trend decomposition using loess (STL) [19], which is very flexible and can account for a much wider range of component patterns than any single parametric model. Just as importantly, it can be used with as little as 90 days of data. We have also developed a new synoptic-scale (days to weeks) outbreak detection method based on the STL modeling.

Methods

Data

The STL decomposition was run on all Indiana EDs for respiratory and gastro-intestinal counts. Both are fundamental markers for a number of naturally occurring diseases, and research has shown that diseases from bioweapons are characterized early on by influenza-like illness [20], which typically results in respiratory complaints.

We present results for the respiratory time series for the 30 EDs that came online the earliest; these series end in April 2008, and start at times ranging from November 2004 to September 2005. All analyses were performed using the R statistical software environment [21], and an R package is available as a supplemental download [see additional file 1].

Modeling

STL decomposes a time series into components of variation [19] using a local-regression modeling approach, loess [22]. Our methods use STL to decompose the square root of ED daily counts into four components:

$$\sqrt{Y_t} = T_t + S_t + D_t + N_t, \tag{1}$$

where $t$ is time in units of years and increments daily, $Y_t$ is the respiratory count on day $t$, $T_t$ is an inter-annual component that models long-term trend, $S_t$ is a yearly-seasonal component, $D_t$ is a day-of-the-week component, and $N_t$ is a random-error noise component.

Figure 1 shows the STL decomposition of $\sqrt{Y_t}$ for one ED. $T_t$ is nearly constant. $S_t$ has peaks due to seasonal influenza: single peaks in early 2005, 2007, and 2008, and a double peak in late 2006 and early 2007. While some of these yearly-seasonal effects are visible in the raw data, the effects are much more effectively seen in $S_t$ because other effects including the noise are not present. Variation in $D_t$ is small compared with the total variation of $\sqrt{Y_t}$, but it cannot be ignored in the modeling because its variation is not small compared with the initial growth of critical disease outbreaks. $N_t$ is irregular in behavior, and can be accurately modeled as independent, identically distributed normal random variables with mean 0. The square root transformation results in much simpler statistical properties. For example, the standard deviations of the $N_t$ are nearly constant within and between hospitals, and the marginal distribution is nearly normal with mean 0 and variance $\sigma_N^2$. Neither is true for the untransformed counts. The behavior of the $N_t$ after transformation is not surprising because the raw counts can be thought of as Poisson random variables, and the square root of a Poisson random variable whose mean is not too small is approximately normal with a standard deviation of 0.5 [23].
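The variance-stabilizing effect of the square root can be checked numerically. The following is a minimal Python sketch (the paper's own analyses used R). The function names, the sample size, and the use of Knuth's multiplication method to draw Poisson variates are choices made here for illustration, not part of the paper's method.

```python
import math
import random
import statistics


def poisson_draw(rng, mean):
    """Draw one Poisson variate by Knuth's multiplication method (adequate for modest means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sqrt_count_sd(mean, n=20000, seed=1):
    """Standard deviation of sqrt(Y) over n simulated Poisson counts Y with the given mean."""
    rng = random.Random(seed)
    roots = [math.sqrt(poisson_draw(rng, mean)) for _ in range(n)]
    return statistics.pstdev(roots)


# Means equal to the three baseline daily means quoted later in the text.
for mean in (9.25, 17.33, 23.86):
    print(mean, round(sqrt_count_sd(mean), 3))
```

For means of this size the simulated standard deviations should sit close to 0.5 regardless of the mean, which is the property the text relies on.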

STL components of variation arise from smoothing the data using moving weighted-least-squares polynomial fitting with a moving window bandwidth in days. The degree of the polynomial is 0 (locally constant), 1 (locally linear), or 2 (locally quadratic).
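To make the moving weighted-least-squares idea concrete, here is a hedged Python sketch of a single locally linear fit at one time point. It assumes the tricube weight function that loess conventionally uses and the usual rule of stretching the window when the bandwidth exceeds the number of points. The function names are hypothetical, the bandwidth is assumed to be at least 2, and this is not the authors' R implementation.

```python
def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 for |u| < 1, and 0 otherwise."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear_fit(times, values, t0, bandwidth):
    """Fit a tricube-weighted least-squares line around t0 and return its value at t0."""
    n = len(times)
    distances = sorted(abs(t - t0) for t in times)
    # Half-width of the window: distance to the bandwidth-th nearest point.
    # If the bandwidth exceeds n, the half-width is stretched by bandwidth / n.
    if bandwidth <= n:
        half_width = distances[bandwidth - 1]
    else:
        half_width = distances[-1] * bandwidth / n
    # Weighted sums for the regression of the values on x = t - t0.
    sw = swx = swy = swxx = swxy = 0.0
    for t, y in zip(times, values):
        x = t - t0
        w = tricube(x / half_width)
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    denom = sw * swxx - swx ** 2
    if abs(denom) < 1e-12:
        return swy / sw  # degenerate window: fall back to the weighted mean
    return (swxx * swy - swx * swxy) / denom  # intercept, i.e. the fit at t0
```

Calling `local_linear_fit` once per day, with the window centered on that day, traces out a smooth component. A bandwidth far larger than the series length gives nearly equal weights, so the fit approaches an ordinary straight-line regression.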

The inter-annual component, $T_t$, arises from locally linear fitting with a bandwidth of 1000 days, which makes it a very low-frequency component. (Window bandwidths can be formally larger than the number of available data points.)

For the yearly-seasonal component, $S_t$, a critical idea of our method is not pooling values across years with the same time of year – values of January 1, values of January 2, etc. – as is often done. STL methodology can provide pooling, but because our methods are designed to work with limited data, $S_t$ at a time $t$ uses data in a neighborhood of $t$, and not across years. $S_t$ is a low-frequency component with a bandwidth of 90 days and with a blending of locally quadratic and locally constant fitting. It is possible that this local method would be better in many surveillance applications even with substantially more data if the phase and shape are sufficiently variable from one year to the next. For example, for some Indiana EDs, seasonal influenza peaks are unimodal in some years and bimodal in others. Our modeling tracks this accurately, but averaging across years could easily distort the patterns, for example, making the bimodal peaks merge into unimodal ones. Other research has also found that the "one season fits all" assumption, leading to pooling across years, is not suitable for disease surveillance data [10].

Figure 1. STL decomposition for respiratory square root daily counts. Respiratory square root daily counts and four components of variation of the STL decomposition for an Indiana emergency department (ED). The four components sum to the square root counts. The solid vertical lines show January 1 and the dashed vertical lines show April 1, July 1, and October 1.

The blending for the $S_t$ results in nearly stationary noise, $N_t$. Local smoothing methods tend to produce systematic components that have higher standard deviation as the time approaches the ends of the data, making the noise component have a lower standard deviation at the ends. We countered this with a new smoothing method, blending. $S_t$ results from blending locally quadratic smoothing and locally constant smoothing, a weighted average of the two for the 50 days at each end of the data. The weight for the quadratic smoother is 0.7 at the first or last observation, and increases linearly moving away from each end, becoming 1 at the 50th observation from each end. This means the weight for the constant smoother goes from 0.3 to 0.
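The blending weights described above can be sketched in a few lines of Python. The function and parameter names are hypothetical, and the two smoothed series are assumed to have been computed already, one value per day.

```python
def quadratic_weight(index, n, span=50, end_weight=0.7):
    """Weight on the locally quadratic fit for observation `index` (0-based) of n."""
    from_end = min(index, n - 1 - index) + 1  # 1 for the first or last observation
    if from_end >= span:
        return 1.0
    # Linear ramp from end_weight at the end observation to 1 at the span-th one.
    return end_weight + (1.0 - end_weight) * (from_end - 1) / (span - 1)


def blend(quadratic_fit, constant_fit):
    """Weighted average of the two smoothers; the constant fit gets one minus the weight."""
    n = len(quadratic_fit)
    blended = []
    for i, (q, c) in enumerate(zip(quadratic_fit, constant_fit)):
        w = quadratic_weight(i, n)
        blended.append(w * q + (1.0 - w) * c)
    return blended
```

Away from the two 50-day end zones the result is the locally quadratic fit alone, so blending changes only the ends of the series.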

The fitting algorithm begins with computation of a day-of-the-week component, $D_t$, directly employing the method described in [19]. STL can allow $D_t$ to slowly evolve through time but we found that $D_t$ is a strictly periodic component. $D_t$ results from an iterative process. First, a low-middle-frequency component is fitted using locally linear fitting with a bandwidth of 39 days. Then $D_t$ is the result of means for each day-of-the-week of the $\sqrt{Y_t}$ minus the low-middle-frequency component. Then the current $D_t$ is subtracted from the $\sqrt{Y_t}$ and the low-middle-frequency component is re-computed. This iterative process is continued to convergence. We subtract the final day-of-the-week component from the $\sqrt{Y_t}$, and then use loess smoothing on the result to obtain the inter-annual component $T_t$ as described above. Finally, we subtract the day-of-the-week and inter-annual components from the $\sqrt{Y_t}$ and smooth the result to obtain $S_t$ as described above.
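The iteration for the day-of-the-week component can be illustrated with a simplified Python sketch. Two assumptions are made here that are not from the paper: a centered 39-day moving average stands in for the locally linear loess fit, and the day-of-the-week means are centered at zero on each pass so that they cannot trade a constant offset with the smooth component. The series is assumed to be daily with no missing days, and all names are hypothetical.

```python
def moving_average(values, window=39):
    """Centered moving average, truncated at the ends (a stand-in for loess)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def day_of_week_component(sqrt_counts, window=39, tol=1e-10, max_iter=100):
    """Iterate between a smooth component and per-weekday means of the remainder."""
    n = len(sqrt_counts)
    dow = [0.0] * 7
    for _ in range(max_iter):
        # Remove the current weekday effect, then refit the smooth component.
        adjusted = [sqrt_counts[i] - dow[i % 7] for i in range(n)]
        low = moving_average(adjusted, window)
        # New weekday effect: mean over each weekday of the data minus the smooth.
        new_dow = []
        for d in range(7):
            resid = [sqrt_counts[i] - low[i] for i in range(d, n, 7)]
            new_dow.append(sum(resid) / len(resid))
        center = sum(new_dow) / 7.0
        new_dow = [v - center for v in new_dow]  # centering is an assumption of this sketch
        change = max(abs(a - b) for a, b in zip(new_dow, dow))
        dow = new_dow
        if change < tol:
            break
    return dow
```

The returned list holds one value per weekday, indexed by day number modulo 7, so the periodic component on day `i` is `dow[i % 7]`.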

We selected the stable $D_t$ assumption and the bandwidths of $T_t$ and $S_t$ through extensive use of visualization and numerical methods of model checking in order to prevent overfitting or a lack of fit of the components of variation, and to keep two components from competing for the same variation in the data.

Outbreak Detection

STL modeling provides public health officials with a clear view of yearly-seasonal variation for visual monitoring of the onset and magnitude of seasonal peaks. This is facilitated by the smoothness of $S_t$ as opposed to the raw data.

Another critical task is the detection of outbreaks that rise up on the order of several days to several weeks and are not part of the systematic patterns in disease counts; the causes are bioterrorism and uncommon highly pernicious diseases. Following the terminology of meteorology, we will refer to this as a "synoptic time-scale outbreak".

Our synoptic-scale detection method uses a simple control chart based on the assumption that the daily counts are independently distributed Poisson random variables with a changing mean $\lambda_t$. These assumptions are validated in the results section. The variability in the systematic components $T_t$, $S_t$, and $D_t$ is taken to be negligible, which is reasonable since they are relatively low-frequency components. We take all variability to be in $N_t$. From Equation 1,

$$\lambda_t = E(Y_t) = (T_t + S_t + D_t)^2 + \sigma_N^2. \tag{2}$$

The noise standard deviation $\sigma_N$ is estimated by the sample standard deviation of the $N_t$.

With $Y_t$ as a Poisson random variable with mean $\lambda_t$, our method declares an outbreak alarm when $Y_t$ results in a value $y_t$ for which $P(Y_t \ge y_t; \lambda_t)$ is less than a threshold $r$. We evaluate this synoptic-scale outbreak method using the historical data for baselines and adding artificial outbreaks, as has been done in other research [24,25]. We use three baselines: respiratory daily counts from three EDs with low, medium, and high daily means – 9.25, 17.33, and 23.86.
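The alarm rule can be sketched directly from Equation 2 and the tail probability above. This is a minimal Python illustration with hypothetical function names. It takes the fitted components for the day as given and computes the Poisson tail by direct summation, which is adequate for daily means of the size reported here.

```python
import math


def poisson_mean(trend, seasonal, day_of_week, noise_sd):
    """Equation 2: lambda_t = (T_t + S_t + D_t)^2 + sigma_N^2."""
    return (trend + seasonal + day_of_week) ** 2 + noise_sd ** 2


def poisson_upper_tail(y, mean):
    """P(Y >= y) for Y ~ Poisson(mean), computed as 1 - P(Y <= y - 1)."""
    if y <= 0:
        return 1.0
    term = math.exp(-mean)  # P(Y = 0)
    cdf = term
    for k in range(1, y):
        term *= mean / k    # P(Y = k) from P(Y = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def is_alarm(count, trend, seasonal, day_of_week, noise_sd, threshold):
    """Declare an alarm when the tail probability of the observed count is below the threshold."""
    lam = poisson_mean(trend, seasonal, day_of_week, noise_sd)
    return poisson_upper_tail(count, lam) < threshold
```

For example, with components summing to 3 on the square root scale and a noise standard deviation of 0.5, the Poisson mean is 9.25, so a count near 9 is unremarkable while a count of 30 has a negligible tail probability.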

The outbreak model is a lognormal epicurve of Sartwell [26]. A starting day is selected and becomes day 1. A total number of cases, $O$, is selected; the bigger the value of $O$, the more readily the outbreak can be detected. The day of each case is a random draw, rounded to the nearest positive integer, from a lognormal distribution whose minimum is 0.5. For our evaluation, with the lognormal density as

$$f(x) = \frac{1}{x\,\sigma\sqrt{2\pi}}\; e^{-\frac{(\log x - \zeta)^2}{2\sigma^2}}, \tag{3}$$

we use $\zeta = 2.401$ and $\sigma = 0.4626$ to approximate temporal behavior of an anthrax outbreak [27]. The lognormal density times a constant is shown in Figure 2, which will be fully explained later.
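Drawing one simulated outbreak from this epicurve might look like the following Python sketch. The function names are hypothetical. Draws that would round to 0 are mapped to day 1, which is how "rounded to the nearest positive integer" is read here; the paper does not spell out that detail.

```python
import math
import random

ZETA = 2.401    # mean of log(x), from the text
SIGMA = 0.4626  # standard deviation of log(x), from the text


def outbreak_days(total_cases, rng):
    """Day of onset (1, 2, ...) for each of total_cases simulated outbreak cases."""
    days = []
    for _ in range(total_cases):
        draw = rng.lognormvariate(ZETA, SIGMA)
        days.append(max(1, math.floor(draw + 0.5)))  # round half up, never below day 1
    return days


def daily_counts(days):
    """Number of outbreak cases on each day, starting at day 1."""
    counts = [0] * max(days)
    for day in days:
        counts[day - 1] += 1
    return counts


cases = daily_counts(outbreak_days(76, random.Random(0)))
print(cases)
```

The resulting list is what would be added to the baseline counts, aligned so that its first element falls on the chosen starting day.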

Detectability at a point in time depends on the number of outbreak cases per day compared with the variability of the baseline counts over the outbreak period. As we have emphasized earlier, the variability of an ED count time series is larger when the level of the series is larger than when the level is smaller. Because $O$ is fixed, detectability changes through time.

We used three values of $O$ for each baseline, selected based on a method in [1]. There are many ways to make the selection, but we chose this one to remain consistent with past work. The maximum of our lognormal epicurve is 0.087 and occurs at $e^{\zeta - \sigma^2} = 8.9$ days. If we suppose the density is this value for 8.9 ± 0.5 days, then the expected number of cases in this peak interval is $0.087\,O$. Detectability can be controlled by making the expected peak-interval cases equal to a magnitude factor $f$ times the sample standard deviation of the residual counts $Y_t - (T_t + S_t + D_t)^2$. The parameter $f$ controls the detectability.

The three values of $O$ in our evaluation for each baseline were chosen by this method with $f$ = 1, 1.5, and 2. For example, for the baseline with the smallest mean daily counts, the standard deviation of the residual counts is 3.324. For $f$ = 2, we have 2 × 3.324/0.087 = 76.414, and rounding to the nearest integer, $O$ = 76. This case is shown in Figure 2. The curve is the lognormal density times 76, and the histogram shows 76 draws from the lognormal.
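The arithmetic behind the choice of $O$ can be reproduced in a short Python sketch. The function names are hypothetical. The peak density is computed from Equation 3 at the mode rather than taken as the rounded value 0.087, which gives the same integer for the worked case in the text.

```python
import math

ZETA = 2.401
SIGMA = 0.4626


def lognormal_density(x, zeta=ZETA, sigma=SIGMA):
    """Equation 3."""
    exponent = -((math.log(x) - zeta) ** 2) / (2.0 * sigma ** 2)
    return math.exp(exponent) / (x * sigma * math.sqrt(2.0 * math.pi))


def outbreak_size(magnitude, residual_sd):
    """Total cases O such that (peak density * O) equals magnitude * residual_sd."""
    mode = math.exp(ZETA - SIGMA ** 2)  # day at which the density peaks
    peak = lognormal_density(mode)
    return round(magnitude * residual_sd / peak)


mode = math.exp(ZETA - SIGMA ** 2)
print(round(mode, 1), round(lognormal_density(mode), 3))
print(outbreak_size(2.0, 3.324))
```

The first line printed gives the mode and peak density that the text quotes as 8.9 days and 0.087, and the second reproduces $O$ = 76 for $f$ = 2 on the low-mean baseline.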

We compared this synoptic-scale method to other methods. Because some of our surveillance EDs have a rather small number of observations, we compared to the EARS C1, C2, and C3 methods [4]. These methods are based on a simple control chart approach with the daily test statistic calculated as

$$S_t = \max\big(0,\; (Y_t - (\mu_t + \sigma_t))/\sigma_t\big), \tag{4}$$

where $\mu_t$ and $\sigma_t$ are a 7-day baseline mean and standard deviation. For the C1 method, the baseline window for $\mu_t$ and $\sigma_t$ is days $t-7, \ldots, t-1$ and for the C2 method, the baseline window is days $t-9, \ldots, t-3$. The test statistic for the C3 method is $S_t + S_{t-1} + S_{t-2}$, where $S_{t-1}$ and $S_{t-2}$ are added if they do not exceed the threshold.
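A minimal Python sketch of the three statistics follows. Several choices here are assumptions rather than details from the paper: the sample standard deviation is used, a flat baseline with zero standard deviation is handled by a simple rule, and C3 is built from the C2 statistics, which is how EARS C3 is usually described. The function names are hypothetical.

```python
import statistics


def ears_statistic(counts, t, lag):
    """Equation 4 for day t (0-based), with a 7-day baseline ending `lag` days before t."""
    start = t - lag - 6
    if start < 0:
        raise ValueError("not enough history for the baseline window")
    baseline = counts[start : t - lag + 1]
    mean = statistics.mean(baseline)
    sd = statistics.stdev(baseline)
    if sd == 0:
        # Assumption of this sketch: a flat baseline flags any count above it.
        return float("inf") if counts[t] > mean else 0.0
    return max(0.0, (counts[t] - (mean + sd)) / sd)


def c1(counts, t):
    """Baseline window t-7, ..., t-1."""
    return ears_statistic(counts, t, lag=1)


def c2(counts, t):
    """Baseline window t-9, ..., t-3."""
    return ears_statistic(counts, t, lag=3)


def c3(counts, t, threshold):
    """C2 statistic for day t plus those for days t-1 and t-2 that did not exceed the threshold."""
    total = c2(counts, t)
    for back in (1, 2):
        previous = c2(counts, t - back)
        if previous <= threshold:
            total += previous
    return total
```

With this indexing, C1 needs at least 7 earlier days, C2 needs 9, and C3 needs 11. The two-day gap in C2 keeps the first days of an outbreak out of its own baseline.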

Since some EDs have more than 3 years of data, we also considered some of the EARS longer historical baseline methods. However, we chose not to implement any of them since it has been shown that these do not offer much of an advantage over the limited baseline methods [16]. Instead, we compared with a Poisson generalized linear model (GLM) method [24], which has terms for day-of-the-week, month, linear trend, and holidays. We did not include holidays since we were not able to find the details of its implementation. We also compared with a seasonal ARIMA model [3] but found its results unsatisfactory, so we do not include it here. This could be due to insufficient data for pooling across years, or to the fact that such pooling through the seasonal terms of the model cannot accommodate the substantial change from one year to the next in the seasonal patterns.

Outbreak detection was tested using 1004 days of each baseline. The first 365 days acted as historical data for the GLM method to have sufficient data to estimate coefficients. For each of the 9 combinations of outbreak detectability and baseline, we simulated 625 different outbreaks; the first started on day 366, the second on day 367, and so forth to the final on day 990.

Each outbreak was a different random sample from the lognormal epicurve. One sample is shown in Figure 2. The counts of each outbreak were added to the baseline counts to form $Y_t$. We ran each method through each day of an outbreak as would be done in real practice, and tracked whether the outbreak was detected and, if so, how long it took to detect. Time until detection is measured by the number of days after first exposure until the day of detection. Any evaluation that did not result in detection on or before day 14 of the outbreak was classified as not-detected.

We also investigated the minimum amount of data needed by the STL method to achieve good performance. Because $T_t$ is very stable, and the window for $S_t$ is 90 days, we would expect STL to do well for 90 days or more. To test this, we ran our methods at each time point from 366 to 990 using just the most recent 90 days of baseline data for each outbreak scenario.

We ran all methods at a false positive rate of 0.03, achieved empirically by choosing a cutoff, $r$, for each method from the historical data with no outbreaks injected. For the C1, C2, and C3 methods, this means choosing $r$ to be the 97th percentile of the daily test statistic values. Since the STL and GLM methods update past fitted values as they progress, choosing $r$ for these methods cannot be done by simply fitting the entire time series and retrospectively choosing $r$. Instead, we use the fitted values obtained at the last day of each fit over time to choose $r$.
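Choosing a cutoff empirically can be sketched as follows in Python. The percentile convention used here (an order statistic, with no interpolation) is one of several reasonable choices and is an assumption, as are the function names. The sketch applies to statistics for which large values signal an outbreak, such as C1 to C3; for a method that alarms on small tail probabilities the same idea would be applied to the lower end of the distribution.

```python
import math


def empirical_cutoff(baseline_statistics, false_positive_rate=0.03):
    """Cutoff exceeded by at most the given fraction of outbreak-free daily statistics."""
    ordered = sorted(baseline_statistics)
    n = len(ordered)
    # Number of outbreak-free days allowed to exceed the cutoff.
    allowed = math.floor(false_positive_rate * n + 1e-9)
    return ordered[n - allowed - 1]


def observed_false_positive_rate(baseline_statistics, cutoff):
    """Fraction of outbreak-free days whose statistic is strictly above the cutoff."""
    exceed = sum(1 for s in baseline_statistics if s > cutoff)
    return exceed / len(baseline_statistics)
```

An alarm is then declared on any day whose statistic is strictly greater than the returned cutoff, so that on the outbreak-free data the alarm rate does not exceed the target.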

Figure 2. Outbreak sample. Randomly generated outbreak of 76 cases injected according to the Sartwell model [26] is shown by the histogram. The lognormal density of the model multiplied by 76 is shown by the curve. The parameters of the lognormal are the mean, $\zeta = 2.401$, and the standard deviation, $\sigma = 0.4626$, both on the natural log scale.


In practice, limits for the STL and GLM methods would be set by using the theoretical false positive rate. If a model fits the data, the observed rates should be consistent with the theoretical rate. We compared the GLM and STL methods by studying their observed rates for the 30 EDs using $r$ = 0.03.

Results

STL Modeling

The four components of variation – $D_t$, $T_t$, $S_t$, and $N_t$ – of the STL modeling of the square root respiratory counts, $\sqrt{Y_t}$, for the 30 EDs are presented in Figures 3, 4, 5, and 6. To maintain anonymity of the hospitals, names have been replaced by three letter words, and each $T_t$ has been altered by subtracting its mean.

The day-of-the-week components, $D_t$, in Figure 3 tend to peak on Sunday or Monday, fall to a minimum Thursday or Friday, and then rise. For some EDs, $D_t$ is very high over the weekend but for others is considerably lower.

Figure 4 shows the yearly-seasonal component, $S_t$. The flu season peaks for 2004–2005 and 2007–2008 tend to be larger than those for 2005–2006 and 2006–2007. Many EDs show a double peak for 2006–2007.

Some $T_t$ in Figure 5 show an increase and others are nearly constant. We attribute the increase to a growing population using an ED, and no growth to a stable population. Our conclusion is in part based on observing a similar pattern for gastro-intestinal counts.

Diagnostic plots validated the STL modeling. One example, Figure 6, shows normal quantile plots of the $N_t$. The lines on the plots are drawn through the lower and upper quartile points. Because the data of the panels are close to the lines, the normal is a good approximation of the distribution. Plots of the autocorrelation functions of the $N_t$ showed the noise component can be modeled as independent random variables. Loess smoothing of the $N_t$ against $t$ verified that no variation that should have been in $T_t$ or $S_t$ leaked into the $N_t$. Separate loess plots of the $N_t$ for each day-of-the-week verified that all significant day-of-the-week effects were captured by $D_t$.

Figure 3. Day-of-the-week component. Day-of-the-week component, $D_t$, for square root respiratory counts for 30 Indiana EDs. The general pattern is U-shaped with a Monday or Sunday maximum and a Thursday or Friday minimum.

Figure 4. Yearly-seasonal component. Yearly-seasonal component, $S_t$, for square root respiratory daily counts for 30 Indiana EDs. Overall, patterns are similar, but detailed behavior varies with unimodal and bimodal peaks, different times of onset of yearly-seasonal disease, and different times of disease peaks.

Modeling on the Counts Scale

The three systematic components – $T_t$, $S_t$, and $D_t$ – are each weighted averages of a large number of observations, so the values are comparatively stable. The statistical variability in $\sqrt{Y_t}$ is chiefly the result of $N_t$, modeled as independent, identically distributed normal random variables with variance $\sigma_N^2$. Thus the expected value of $Y_t$ is $\lambda_t$ from Equation 2.

The square root of a Poisson variable whose mean is not too small is approximately normal with a standard deviation of 0.5 [23]. The sample standard deviations $\hat{\sigma}_N$ of the 30 noise components are all greater than 0.5, indicating a systematic departure, but are not far from 0.5. The upper quartile of the estimates is 0.58 and the median is 0.54; two estimates reach as high as 0.63 but this is due to a few outliers. The consistency of the values can be seen in Figure 6; the slopes of the lines on the display are nearly the same.

The near normality of $N_t$ and the closeness of the $\hat{\sigma}_N$ to 0.5 are consistent with the $Y_t$ having a distribution that is well approximated by the Poisson with mean $\lambda_t$. However, in carrying out the detection method, we use $\hat{\sigma}_N$ in place of the theoretical value 0.5 to provide a measure of robustness to the Poisson model.

Outbreak DetectionTable 1 reports the observed sensitivity and averagenumber of days until detection for each baseline,outbreak magnitude, and method. Figure 7 graphs thesensitivity versus magnitude. The days until detection arerecorded including the incubation period, and referringto Figure 2, outbreak cases usually do not begin until day3 or 4. Detection before day 6 or day 7 is before theexpected outbreak peak. The C1 method typically has theearliest detection, but the sensitivity is poor. The GLMand STL methods have the best sensitivity, with the STLmethod clearly ahead in all cases. The STL method

Figure 5
Inter-annual component. Inter-annual component, Tt, for respiratory square root counts for 30 Indiana EDs. Each component has been centered around zero to protect anonymity. The long-term trend for each ED is either nearly constant or has a small increase due to a growing patient population.

Figure 6
Normal probability plots for Nt. Normal quantile plots for the noise component, Nt, of the square root respiratory counts for 30 EDs. The sample distribution of the noise is well approximated by the normal distribution. This is a result of the square root transformation of the counts.


detects outbreaks somewhat faster than the GLM method.

At the lowest outbreak magnitude, the STL method outperforms all other methods by more than 10 percentage points of sensitivity. At higher magnitudes, the gap closes, but this is not surprising because all reasonable methods should eventually converge to 100% detection as magnitude increases. Comparisons of sensitivity across baselines within each method and magnitude show quite a bit of variability. This may be due to unique characteristics of each ED, the random outbreak generation, or the fact that the cutoff values were set empirically within each baseline.
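One way an empirical cutoff of this kind could be chosen is sketched below. This is an illustration under stated assumptions rather than the procedure used in the study: it assumes a daily detection statistic has already been computed on outbreak-free baseline data, and the names empirical_cutoff and observed_false_positive_rate are hypothetical.

```python
import math

def empirical_cutoff(baseline_stats, false_positive_rate=0.03):
    """Pick a cutoff so that the share of outbreak-free days whose statistic
    exceeds it is at most the target false positive rate (ties aside)."""
    if not 0.0 <= false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be in [0, 1)")
    ordered = sorted(baseline_stats)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one baseline statistic")
    # Number of baseline days allowed to raise an alarm; the small tolerance
    # guards against floating point products such as 2.9999999999999996.
    allowed_alarms = min(math.floor(false_positive_rate * n + 1e-9), n - 1)
    # Alarms are days strictly above the cutoff, so return the value that
    # leaves exactly `allowed_alarms` larger observations.
    return ordered[n - 1 - allowed_alarms]

def observed_false_positive_rate(baseline_stats, cutoff):
    """Fraction of outbreak-free days that would raise an alarm."""
    return sum(s > cutoff for s in baseline_stats) / len(baseline_stats)

# Toy baseline of 100 distinct statistic values.
stats = list(range(100))
cutoff = empirical_cutoff(stats)
print(cutoff, observed_false_positive_rate(stats, cutoff))
```

Because the cutoff is tuned separately to each baseline, the sensitivity achieved at a fixed false positive rate can differ from one baseline to another, which is one of the sources of variability noted above.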

The results of running the STL method on the same date range, but using only 90 days of baseline data for each scenario, are shown in the final column of Table 1. These numbers are comparable to the results using all available data. This indicates that the STL detection method is effective when 90 days of data are available.

Figure 8 shows quantile plots of the observed false positive rates from the GLM and STL methods for the historical data with no outbreaks and a theoretical false positive rate of 0.03 for all 30 EDs. The observed rates for the GLM method deviate from 0.03 by far more than those for STL.

To better understand the difference in performance across outbreak methods, we investigated the model fitted values (predicted systematic values) and residuals (data minus the fits) from each method. As we have emphasized, the methods of statistical modeling upon which each detection method is based have a large impact on performance. For the EARS methods, the fitted value for day t is the seven-day baseline raw data mean μt from Equation 4. For the GLM and the STL methods, the fitted value for day t is the evaluation of the systematic components at time t for a fit through time t, just as it would be used in practice for outbreak detection. GLM bases this on data for a number of years; STL bases it on 90 days to a good approximation unless there is substantial long-term trend, which there is not in our Indiana surveillance data.
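The EARS fitted value and residual described here can be sketched as follows. This is a minimal illustration, assuming that "lag 2" (as in the caption of Figure 9) means the two days immediately before day t are skipped, so that the seven-day window covers days t-9 through t-3. The function names are hypothetical, and Equation 4 itself is not reproduced.

```python
def baseline_mean(counts, t, window=7, lag=2):
    """Mean of the `window` daily counts that end `lag` + 1 days before day t.

    With window=7 and lag=2 this averages days t-9 through t-3.
    """
    start = t - lag - window
    stop = t - lag  # exclusive upper bound of the slice
    if start < 0:
        raise ValueError("not enough history before day t")
    return sum(counts[start:stop]) / window

def baseline_residual(counts, t, window=7, lag=2):
    """Observed count on day t minus its lagged baseline mean."""
    return counts[t] - baseline_mean(counts, t, window, lag)

# Toy series with a steady upward drift of one visit per day.
counts = list(range(20))
print(baseline_mean(counts, 12), baseline_residual(counts, 12))
```

Because the fitted value is an average of only seven raw counts, it inherits much of their day-to-day noise and contains no day-of-the-week term, which is relevant to the diagnostics that follow.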

Figures 9 and 10 are two residual diagnostic plots to check for lack of fit in the EARS, GLM, and STL methods

Table 1: Outbreak detection simulation results

                      Sensitivity (mean days until detection)
Baseline  Magnitude   C1           C2           C3           GLM          STL          STL(90)

1         1.0         0.57 (5.44)  0.50 (5.50)  0.46 (5.48)  0.60 (6.58)  0.71 (6.44)  0.66 (6.65)
1         1.5         0.57 (5.42)  0.55 (5.65)  0.55 (6.05)  0.70 (6.80)  0.73 (6.54)  0.83 (6.57)
1         2.0         0.69 (5.39)  0.70 (5.64)  0.72 (6.13)  0.88 (6.24)  0.89 (6.22)  0.93 (6.07)

2         1.0         0.48 (5.43)  0.56 (5.90)  0.56 (5.89)  0.65 (6.01)  0.80 (6.31)  0.82 (6.90)
2         1.5         0.57 (6.03)  0.72 (6.68)  0.75 (6.81)  0.81 (6.72)  0.90 (6.63)  0.89 (5.96)
2         2.0         0.68 (5.11)  0.81 (5.39)  0.84 (5.53)  0.91 (5.36)  0.95 (5.19)  0.91 (5.93)

3         1.0         0.59 (5.51)  0.60 (5.74)  0.62 (5.80)  0.58 (6.20)  0.73 (6.62)  0.70 (6.74)
3         1.5         0.68 (6.01)  0.71 (6.37)  0.74 (6.49)  0.76 (7.00)  0.83 (6.82)  0.81 (5.93)
3         2.0         0.71 (5.47)  0.76 (5.82)  0.80 (5.87)  0.84 (6.21)  0.87 (6.00)  0.87 (5.70)

Outbreak detection simulation results for three baselines and three outbreak magnitudes. Sensitivity is reported with mean days until detection in parentheses. The false positive rate was set empirically for each method and baseline to be 0.03. The superiority of the sensitivity of the STL method for these scenarios is evident, especially at the smallest outbreak magnitude. The results in the final column, STL(90), are from allowing only a 90 day historical baseline for each outbreak scenario.

Figure 7
Outbreak simulation results. Outbreak detection simulation results. The false positive rate was set empirically for each method and baseline to be 0.03. For each baseline, the STL method detects more than 10% more outbreaks than the other methods at the smallest magnitude.


for the daily respiratory counts for one ED. Figure 9 graphs the residuals against t for each model; a loess curve shows the local average fluctuations of the residuals around 0. Figure 10 graphs the residuals against week for each day-of-week, also with loess curves. Figure 9 shows substantial lack of fit for GLM, in the form of systematic fluctuations from 0. This is due to pooling across years to estimate the yearly seasonal component, which can be quite different from one year to the next. Lack of fit revealed for EARS and STL is minor. Figure 10 shows systematic lack of fit by day of the week for EARS, but not for STL and GLM. The reason is simply that EARS does not model day of the week variation.

Figure 11 checks the variability in the EARS, GLM, and STL fitted values for the same ED as in the previous two figures. We want the fitted values to be as smooth as possible without introducing lack of fit. For EARS, the fitted values are graphed against t. To make comparisons

Figure 8
Observed false positive rates for STL and GLM. Quantile plots of observed false positive rates for the STL and GLM methods based on a theoretical false positive rate of 0.03, from respiratory counts for each of the 30 EDs. The dashed lines represent the median value for each method.

Figure 9
Residuals for model fits. Residuals for model fits to daily respiratory counts for one ED. The EARS residuals are the observed count minus the 7 day baseline mean with lag 2. The GLM and STL residuals are obtained from the model predicted values. The smooth curve is the local mean of the residuals.

Figure 10
Residuals by day-of-the-week. Residuals for model fits to daily respiratory counts for one ED, by day-of-the-week. The smooth curve is the local mean of the residuals.


of EARS with GLM and STL commensurate, we graph the GLM and STL fitted values minus their day-of-week components against t. The EARS fit is much noisier than that of STL, the result of overfitting. In particular there is a large amount of synoptic-scale variation, making it more difficult to detect synoptic-scale outbreaks. GLM is locally quite smooth at most points in time, but has large precipitous jumps at certain time points because the yearly seasonal component is modeled as a step function, constant within months, but changing discontinuously from one month to the next.

Discussion

For our data, the same smoothing parameters were successfully used across all EDs. We advise those interested in applying these methods to their syndromic data to perform their own parameter selection and model validation. For outbreak detection, our data behaved like Poisson random variables, but this may not be the case in other applications. We advise checking this assumption and, if it does not hold, investigating other possibilities such as the over-dispersed Poisson. Another alternative is validating distributional properties of the remainder term and using this term for monitoring.
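A rough check of the Poisson assumption can be based on the square root scale residuals. The sketch below is not part of the authors' method; it rests on the delta-method approximation that, if the variance of a count is φ times its mean and the mean is not too small, the variance of the square root count is about φ/4, so φ is roughly four times the residual variance. The function names are hypothetical.

```python
import statistics

def dispersion_from_sqrt_residuals(residuals):
    """Approximate dispersion factor phi, where Var(count) = phi * mean,
    from residuals of the square root counts."""
    # Delta method: Var(sqrt(Y)) is about phi / 4 for means that are not
    # too small, so phi is about four times the sample variance.
    return 4.0 * statistics.variance(residuals)

def dispersion_from_sd(sigma_n):
    """Same approximation, starting from a reported standard deviation."""
    return 4.0 * sigma_n ** 2

# Standard deviations of 0.5 (the Poisson benchmark), 0.54, and 0.58.
for sd in (0.5, 0.54, 0.58):
    print(sd, round(dispersion_from_sd(sd), 2))
```

Under this approximation a standard deviation of exactly 0.5 corresponds to φ = 1, the pure Poisson case, and a value of 0.54 corresponds to φ of about 1.17, a mild degree of over-dispersion.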

Although our detection performance results favor the STL method, this does not necessarily mean that it is the best method available. There are certainly other methods with which we could have compared. We chose the EARS methods since they are widely used and can be applied to very short disease time series. Our detection study was limited to one type of outbreak. Further use and study of this method will determine its merit. The visualization capabilities of STL modeling make it a useful tool for visual analytics.

The STL method should not be used for small counts close to and including zero. While the methods work well with the large counts of the respiratory and gastro-intestinal categories, many other categories such as botulinic have counts that are too small for the square roots to be approximately normally distributed. Future work can investigate employing STL ideas for small counts by replacing the square root normal distribution and local least-squares fitting with the Poisson distribution and local Poisson maximum likelihood fitting.

Conclusion

The STL decomposition methods presented here effectively model chief complaint counts for syndromic surveillance without significant lack of fit or undue noise, and lead to a synoptic time-scale disease outbreak detection method that in our testing scenarios performs better than the other methods to which it is compared: the EARS C1, C2, and C3 methods [4], as well as a Poisson GLM modeling method [24]. The methods can be used for disease time series as short as 90 days, which is important because many surveillance systems have started only recently and have a limited number of observations. Visualization of the components of variation (inter-annual, yearly-seasonal, day-of-the-week, and random-error) provides much insight into the properties of disease time series. We recommend the methods be considered by those who manage public health surveillance systems.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

DSE, WSC, and SJG designed the research project, obtained funding, and recruited other members. RPH, DA, and WSC developed and implemented the statistical methods and wrote the first draft of the manuscript. RPH developed the disease outbreak detection method and carried out the simulations. RPH, DA, WSC, and RM conducted the literature review. AA, MY, MO, and SJG designed the data collection, the database schema, and the form of the continuous data feed. All authors participated in the review of the manuscript and approved it.

Figure 11
Fitted components. Fitted components for daily respiratory counts for one ED. The EARS fitted value at day t is the 7 day baseline mean with lag 2. The GLM and STL fitted values are the predicted values from fitting the models up to day t but with the day-of-the-week components removed to make comparison with the variability of the EARS fitted values commensurate.


Additional material

Additional file 1
R code and documentation. Contains R source code, examples, data, and documentation for carrying out the STL procedure.
[http://www.biomedcentral.com/content/supplementary/1472-6947-9-21-S1.zip]

Acknowledgements

This work has been funded by the Regional Visualization and Analytics Center (RVAC) program of the U.S. Department of Homeland Security.

References

1. Burkom H: Development, adaptation, and assessment of alerting algorithms for biosurveillance. Johns Hopkins APL Technical Digest 2003, 24(4):335–342.
2. Burkom H, Murphy S and Shmueli G: Automated time series forecasting for biosurveillance. Stat in Med 2007, 26(22):4202–4218.
3. Reis B and Mandl K: Time series modeling for syndromic surveillance. BMC Med Inform Decis Mak 2003, 3:2.
4. Hutwagner L, Thompson W, Seeman G and Treadwell T: The bioterrorism preparedness and response Early Aberration Reporting System (EARS). J Urban Health 2003, 80:i89–i96.
5. Wallenstein S and Naus J: Scan statistics for temporal surveillance for biologic terrorism. MMWR Morb Mortal Wkly Rep 2004, 53 Suppl:74–78.
6. Brillman J, Burr T, Forslund D, Joyce E, Picard R and Umland E: Modeling emergency department visit patterns for infectious disease complaints: results and application to disease surveillance. BMC Med Inform Decis Mak 2005, 5(4):1–14.
7. Dafni U, Tsiodras S, Panagiotakos D, Gkolfinopoulou K, Kouvatseas G, Tsourti Z and Saroglou G: Algorithm for statistical detection of peaks – syndromic surveillance system for the Athens 2004 Olympic Games. MMWR Morb Mortal Wkly Rep 2004, 53:86–94.
8. Kleinman K, Abrams A, Kulldorff M and Platt R: A model-adjusted space-time scan statistic with an application to syndromic surveillance. Epidemiology and Infection 2005, 133(03):409–419.
9. Wong W, Moore A, Cooper G and Wagner M: Rule-based anomaly pattern detection for detecting disease outbreaks. Proc. 18th Nat. Conf. on Artificial Intelligence Menlo Park, CA; Cambridge, MA; London; AAAI Press; MIT Press; 1999, 2002:217–223.
10. Burr T, Graves T, Klamann R, Michalak S, Picard R and Hengartner N: Accounting for seasonal patterns in syndromic surveillance data for outbreak detection. BMC Med Inform Decis Mak 2006, 6:40.
11. Stroup D, Wharton M, Kafadar K and Dean A: Evaluation of a method for detecting aberrations in public health surveillance data. Am J Epidemiol 1993, 137(3):373–380.
12. Hutwagner L, Maloney E, Bean N, Slutsker L and Martin S: Using laboratory-based surveillance data for prevention: an algorithm for detecting salmonella outbreaks. Emerg Infect Dis 1997, 3:395–400.
13. Farrington C, Andrews N, Beale A and Catchpole M: A statistical algorithm for the early detection of outbreaks of infectious disease. J R Stat Soc Ser A Stat Soc 1996, 159(3):547–563.
14. Stern L and Lightfoot D: Automated outbreak detection: a quantitative retrospective analysis. Epidemiol Infect 1999, 122(01):103–110.
15. Simonsen L: The impact of influenza epidemics on mortality: introducing a severity index. American Journal of Public Health 1997, 87(12):1944–1950.
16. Hutwagner L, Browne T, Seeman G and Fleischauer A: Comparing aberration detection methods with simulated data. Emerg Infect Dis 2005, 11(2):314–316.
17. Grannis S, Wade M, Gibson J and Overhage J: The Indiana public health emergency surveillance system: Ongoing progress, early findings, and future directions. AMIA Annu Symp Proc 2006, 304–308.
18. Olszewski R: Bayesian classification of triage diagnoses for the early detection of epidemics. Proceedings of the Sixteenth International Florida Artificial Intelligence Research Society Conference Menlo Park, CA: AAAI; 2003, 412–7.
19. Cleveland R, Cleveland W, McRae J and Terpenning I: STL: A seasonal-trend decomposition procedure based on loess (with discussion). J Offic Stat 1990, 6:3–73.
20. Franz D, Jahrling P, Friedlander A, McClain D, Hoover D, Bryne W, Pavlin J, Christopher G and Eitzen E: Clinical recognition and management of patients exposed to biological warfare agents. JAMA 1997, 278(5):399–411.
21. R Development Core Team: R: A language and environment for statistical computing R Foundation for Statistical Computing, Vienna, Austria; 2005 http://www.R-project.org.
22. Cleveland W and Devlin S: Locally weighted regression: an approach to regression analysis by local fitting. JASA 1988, 83(403):596–610.
23. Johnson NL, Kotz S and Kemp A: Univariate Discrete Distributions New York: Wiley; 1992.
24. Jackson M, Baer A, Painter I and Duchin J: A simulation study comparing aberration detection algorithms for syndromic surveillance. BMC Med Inform Decis Mak 2007, 7:6–6.
25. Mandl K, Reis B and Cassa C: Measuring outbreak-detection performance by using controlled feature set simulations. MMWR Morb Mortal Wkly Rep 2004, 53:130–6.
26. Sartwell P: The distribution of incubation periods of infectious disease. Am J Hyg 1950, 51(3):310–318.
27. Meselson M, Guillemin J, Hugh-Jones M, Langmuir A, Popova I, Shelokov A and Yampolskaya O: The Sverdlovsk anthrax outbreak of 1979. Science 1994, 266(5188):1202–1208.

Pre-publication history

The pre-publication history for this paper can be accessed here:

http://www.biomedcentral.com/1472-6947/9/21/prepub
