2005/4/12 1
Space-Time Modeling and Application to Emerging Infectious Diseases
李正宇
National Health Research Institutes
Division of Biostatistics and Bioinformatics
July 26th, 2005
Outline
• Introduction
• STARMA Models
• Methods for STARMA Modeling and Software IEAST
• Modeling Emerging Infectious Diseases using STARMA and IEAST
• Conclusion
Introduction
Tobler’s First Law of Geography
“Everything is related to everything else, but near things are more related than distant things.”
Introduction
• Biological and ecological processes are often organized and correlated in both space and time.
• Why use space-time data and space-time analyses?
• Various space-time models
– STKF, KKF, VARMA, STARMA, etc.
• Why STARMA models?
• Are emerging infectious diseases the only application?
Scope of the Work
• An efficient and robust STARMA modeling method
– Space-time extensions of optimization algorithm and model fitness measures
– Refinement of the space-time modeling procedure
• Software development -- IEAST
– The first general-purpose STARMA modeling and analysis software
– Integrated Environment for Analyzing STARMA models
• Application to the spread of WNV in an epidemic in Detroit
– Modeling and analysis of Dead Crow Data
– Modeling and analysis of Human Case Data
– Cross analysis of Human Case Data and Dead Crow Data
– Statistical inferences from these space-time analyses
STARMA Models
Space-Time Variables Evolving over Time
• $z_{t,\mathbf{x}}$ : some ecological variable at spatial coordinate vector $\mathbf{x}$ at time $t$. For a fixed location $\mathbf{x}$, the values $z_{t,\mathbf{x}}$ form a time series.
• These time series are not independent, but influence each other via spatial proximity.
(Figure: the lattice, with axes X, Y, and time, at time = t-1 and time = t, showing $z_{t,(2,2)}$, $z_{t,(1,2)}$, $z_{t,(2,1)}$, $z_{t,(0,0)}$, and random noise)
General STARMA Models
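• The general STARMA model has the stochastic equation:
$$z_{t,\mathbf{x}} = \sum_{k}\sum_{\mathbf{b}} \phi_{k,\mathbf{b}}\, z_{t-k,\mathbf{x}+\mathbf{b}} \;-\; \sum_{k}\sum_{\mathbf{b}} \theta_{k,\mathbf{b}}\, e_{t-k,\mathbf{x}+\mathbf{b}} \;+\; e_{t,\mathbf{x}}$$
(the first double sum collects the AR terms, the second the MA terms)
• Model types:
– STAR model (when $\theta_{k,\mathbf{b}} = 0$)
– STMA model (when $\phi_{k,\mathbf{b}} = 0$)
– Mixed model (when $\phi_{k,\mathbf{b}} \neq 0$ and $\theta_{k,\mathbf{b}} \neq 0$)
The strength of the autoregressive components is measured by $\phi_{k,\mathbf{b}}$, and the strength of the shared moving-average stochastic inputs by $\theta_{k,\mathbf{b}}$.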
A Useful Form for STARMA Modeling
• By introducing the spatial weight matrices $W^{(l)}$, we can express the general STARMA model in the following form:
$$\mathbf{z}_t = \sum_{k}\sum_{l} \phi_{kl}\, W^{(l)} \mathbf{z}_{t-k} \;-\; \sum_{k}\sum_{l} \theta_{kl}\, W^{(l)} \mathbf{e}_{t-k} \;+\; \mathbf{e}_t$$
• This is the equation actually used for the implementation of IEAST and its applications.
where
– $l$ : spatial lag, $k$ : temporal lag;
– $\mathbf{z}_t$ is the observation vector at time $t$;
– $W^{(l)}$ is the weight matrix for the $l$-th order;
– $\phi_{kl}$ are the parameters of the autoregressive terms;
– $\theta_{kl}$ are the parameters of the moving average terms;
– $\mathbf{e}_t$ is the random noise vector at time $t$.
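The weight-matrix form of the model can be simulated directly. The following is a minimal Python sketch, not IEAST code: the helper names (`rook_weights`, `simulate_starma`), the row-normalised four-neighbour $W^{(1)}$, the unit-variance Gaussian noise, the zero start-up values, and the minus sign on the MA terms are all assumptions made for illustration. The coefficients are those of the STAR process used for Simulation Data 1 in these slides.

```python
import random


def rook_weights(n):
    """Row-normalised first-order weights on an n x n lattice (assumed: 4 nearest neighbours)."""
    size = n * n
    w = [[0.0] * size for _ in range(size)]
    for r in range(n):
        for c in range(n):
            nbrs = [(r + dr, c + dc)
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= r + dr < n and 0 <= c + dc < n]
            for rr, cc in nbrs:
                w[r * n + c][rr * n + cc] = 1.0 / len(nbrs)
    return w


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def simulate_starma(weights, phi, theta, steps, seed=0):
    """Simulate z_t = sum phi[k-1][l] W(l) z_{t-k} - sum theta[k-1][l] W(l) e_{t-k} + e_t.

    weights[l] is W(l), with weights[0] the identity matrix.
    """
    rng = random.Random(seed)
    size = len(weights[0])
    p = max(len(phi), len(theta))
    z = [[0.0] * size for _ in range(p)]  # zero start-up values (assumption)
    e = [[0.0] * size for _ in range(p)]
    for _ in range(steps):
        e_t = [rng.gauss(0.0, 1.0) for _ in range(size)]
        z_t = list(e_t)
        for k, row in enumerate(phi, start=1):
            for l, coef in enumerate(row):
                if coef:
                    z_t = [a + coef * b for a, b in zip(z_t, matvec(weights[l], z[-k]))]
        for k, row in enumerate(theta, start=1):
            for l, coef in enumerate(row):
                if coef:
                    z_t = [a - coef * b for a, b in zip(z_t, matvec(weights[l], e[-k]))]
        z.append(z_t)
        e.append(e_t)
    return z[p:]


n = 5
identity = [[1.0 if i == j else 0.0 for j in range(n * n)] for i in range(n * n)]
w1 = rook_weights(n)
# z_t = 0.50 z_{t-1} + 0.30 W(1) z_{t-1} + 0.10 z_{t-2} + 0.05 W(1) z_{t-2} + e_t
series = simulate_starma([identity, w1], [[0.50, 0.30], [0.10, 0.05]], [], steps=200)
print(len(series), "time steps on", len(series[0]), "cells")
```

Passing a non-empty `theta` in the same layout as `phi` gives an STMA or mixed process under the same assumptions.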
Spatial Correlation Structure and Weight Matrices
• Spatial weight matrices are used to construct the spatial correlation structure among locations.
• The following ordering is an example of the definition of the spatial correlation structure (up to 4th-order neighbors) in a 2D system.
(Figure panels: 1st order, W(1); 2nd order, W(2); 3rd order, W(3); 4th order, W(4))
Some Limitations of STARMA Modeling
• Raster based
• Requires a massive amount of space-time data
• Models generally may not be fully mechanistic
Assumptions:
• Stationarity
• “Spatial Regularity”
• Effects are “constant”
• Effects are “linearly” correlated
Methods for STARMA Modeling and Software IEAST
Box-Jenkins Modeling Method
(Flowchart: Data → Model Identification → Parameter Estimation → Diagnostic Check → Good? If yes, End; if no, Modify Model and repeat)
Model Identification
To determine the model type and orders.
• Conventionally, space-time autocorrelations (i.e. STACF/STPACF) are used (Pfeifer and Deutsch, 1980).
• In this research, space-time extensions of model fitness measures (e.g. AIC, BIC) are used to assist identification when the autocorrelation-based method does not work. These measures are more objective and computationally efficient.
Model Identification—using Space-Time Autocorrelation Functions
Example 1: STAR (MaxT=2, MaxS=1)
– STACF tails off
– STPACF cuts off at T-lag=2 & S-lag=1
Example 2: STMA (MaxT=1, MaxS=1)
– STACF cuts off at T-lag=1 & S-lag=1
– STPACF tails off
(Figure: STACF and STPACF plots for each example)
STACF      STPACF     Suggested Model Type
Tail-off   Cut-off    STAR model
Cut-off    Tail-off   STMA model
Tail-off   Tail-off   Mixed model
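To read tail-off and cut-off patterns, the STACF has to be computed from the data first. The sketch below assumes the sample estimator usually attributed to Pfeifer and Deutsch (1980), $\hat\rho_{l0}(s) = \hat\gamma_{l0}(s) / \sqrt{\hat\gamma_{ll}(0)\,\hat\gamma_{00}(0)}$ with $\hat\gamma_{lk}(s)$ the average of $[W^{(l)} z_t]^\top [W^{(k)} z_{t+s}]$ over $t$; the exact normalisation used by IEAST is not stated in the slides, and the function name `stacf` and the toy data are illustrative.

```python
import math
import random


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def stacf(series, weights, max_t):
    """Sample space-time autocorrelations rho[l][s] for spatial lag l and temporal lag s.

    series is a list of mean-removed observation vectors z_t;
    weights[l] is W(l), with weights[0] the identity matrix.
    """
    lagged = [[matvec(w, z) for z in series] for w in weights]  # lagged[l][t] = W(l) z_t
    n_t = len(series)

    def gamma(l, k, s):
        total = sum(sum(a * b for a, b in zip(lagged[l][t], lagged[k][t + s]))
                    for t in range(n_t - s))
        return total / (n_t - s)

    g00 = gamma(0, 0, 0)
    return [[gamma(l, 0, s) / math.sqrt(gamma(l, l, 0) * g00)
             for s in range(max_t + 1)]
            for l in range(len(weights))]


# Toy data (hypothetical): white noise on 4 sites arranged in a line.
n = 4
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
w1 = [[0.0] * n for _ in range(n)]
for i in range(n):
    nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
    for j in nbrs:
        w1[i][j] = 1.0 / len(nbrs)

rng = random.Random(0)
series = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(50)]
rho = stacf(series, [identity, w1], max_t=3)
print([round(v, 2) for v in rho[0]])
```

For white noise the values at nonzero lags should be small; for a STAR process they would decay gradually, which is the tail-off pattern in the table.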
Model Identification— using Space-Time Autocorrelation Functions
Simulation Data 1, based on a STAR process:
$z_t = 0.50\,z_{t-1} + 0.30\,W^{(1)} z_{t-1} + 0.10\,z_{t-2} + 0.05\,W^{(1)} z_{t-2} + e_t$
(Figure: STACF tails off; STPACF cuts off)
Simulation Data 2, based on a STMA process:
$z_t = e_t - (-0.6)\,e_{t-1} - (-0.4)\,W^{(1)} e_{t-1}$
(Figure: STACF cuts off; STPACF tails off)
Model Identification— using Model Fitness Measures
Accuracies of model type selection using (1) variance of residuals, (2) AIC, (3) BIC, and (4) -AIC*BIC, based on 150 Monte Carlo simulated datasets. Rows give the process the datasets are based on, columns give the model type identified; the accuracies (in red on the slide) are the diagonal entries.
Using Variance of Residuals
Datasets based on   STAR   STMA   Mixed
STAR                 4%     0%     96%
STMA                 4%     6%     90%
Mixed                8%     6%     90%
Using AIC
Datasets based on   STAR   STMA   Mixed
STAR                16%     0%     84%
STMA                 4%     6%     90%
Mixed                8%     2%     90%
Using BIC
Datasets based on   STAR   STMA   Mixed
STAR               100%     0%      0%
STMA                 4%    86%     10%
Mixed               18%    16%     66%
Using -AIC*BIC
Datasets based on   STAR   STMA   Mixed
STAR               100%     0%      0%
STMA                 4%    78%     18%
Mixed               16%     4%     80%
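The slides do not give the formulas of the space-time extensions of AIC and BIC. As a rough illustration of why the criteria behave differently, the sketch below assumes the standard Gaussian forms $\mathrm{AIC} = n \ln \hat\sigma^2 + 2p$ and $\mathrm{BIC} = n \ln \hat\sigma^2 + p \ln n$, with $n$ the number of space-time observations and $p$ the number of parameters. The residual variances and parameter counts are made-up values, not results from the study.

```python
import math


def aic(residual_variance, n_obs, n_params):
    """Gaussian AIC up to an additive constant (assumed form)."""
    return n_obs * math.log(residual_variance) + 2 * n_params


def bic(residual_variance, n_obs, n_params):
    """Gaussian BIC up to an additive constant (assumed form)."""
    return n_obs * math.log(residual_variance) + n_params * math.log(n_obs)


# Hypothetical candidates: model type -> (residual variance, number of parameters).
candidates = {"STAR": (1.002, 4), "STMA": (1.100, 4), "Mixed": (1.000, 8)}
n_obs = 5000  # hypothetical number of space-time observations

best_by_variance = min(candidates, key=lambda m: candidates[m][0])
best_by_aic = min(candidates, key=lambda m: aic(candidates[m][0], n_obs, candidates[m][1]))
best_by_bic = min(candidates, key=lambda m: bic(candidates[m][0], n_obs, candidates[m][1]))
print(best_by_variance, best_by_aic, best_by_bic)
```

With these numbers the residual variance and AIC both favour the larger Mixed model, while the heavier BIC penalty selects STAR, which is the same direction of effect as in the tables.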
Parameter Estimation
To calculate coefficients of a candidate model for given model type and orders.
• Two methods are needed for two kinds of models:
– Linear models (i.e. STAR): linear ML estimator.
– Non-linear models (i.e. STMA and Mixed): multivariate nonlinear optimization.
• The multivariate and non-linear nature raises problems during optimization:
– Convergence to local optima
– Very time-consuming
• A good starting point is crucial for optimization
– Extra step: ‘Pre-estimation’
– A space-time extension of the Hannan-Rissanen algorithm is used.
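For the linear (STAR) case, estimation reduces to a regression of $z_t$ on the lagged terms $W^{(l)} z_{t-k}$. The sketch below assumes Gaussian noise, under which the conditional maximum likelihood estimate coincides with ordinary least squares; it is written in plain Python with hypothetical helper names (`solve`, `fit_star`) and is not the IEAST implementation. The demo data are noise-free so that the recovered coefficients can be checked against the true ones.

```python
import random


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_star(series, weights, max_t):
    """Least-squares estimates phi[k-1][l] for temporal lags 1..max_t and each W(l)."""
    rows, targets = [], []
    for t in range(max_t, len(series)):
        lagged = [matvec(w, series[t - k]) for k in range(1, max_t + 1) for w in weights]
        for i, target in enumerate(series[t]):
            rows.append([col[i] for col in lagged])
            targets.append(target)
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    flat = solve(xtx, xty)
    n_l = len(weights)
    return [flat[k * n_l:(k + 1) * n_l] for k in range(max_t)]


# Demo on 6 sites in a line (hypothetical layout), noise-free STAR recursion.
n = 6
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
w1 = [[0.0] * n for _ in range(n)]
for i in range(n):
    nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
    for j in nbrs:
        w1[i][j] = 1.0 / len(nbrs)

true_phi = [[0.50, 0.30], [0.10, 0.05]]
rng = random.Random(1)
series = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(2)]
for _ in range(30):
    z_t = [0.0] * n
    for k, row in enumerate(true_phi, start=1):
        for w, coef in zip((identity, w1), row):
            z_t = [a + coef * b for a, b in zip(z_t, matvec(w, series[-k]))]
    series.append(z_t)

estimate = fit_star(series, [identity, w1], max_t=2)
print([[round(v, 3) for v in row] for row in estimate])
```

The STMA and Mixed cases have no such closed form, which is why the slides call for nonlinear optimization with a pre-estimated starting point.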
Diagnostic Check
• To decide the adequacy of a candidate model for representing the given data.
• Methods:
– Variance of residuals
– Space-time autocorrelations of residuals
– Significance testing of parameters
– Space-time extension of AIC/BIC
Modeling Procedures
(Flowchart of the Box-Jenkins method: Data → Model Identification → Parameter Estimation → Diagnostic Check → Good? If yes, End; if no, Modify Model and repeat)
Software for STARMA Modeling -- IEAST
• Developed using GNU Octave v2.1.40; can be used under various popular operating systems, e.g. MS Windows, Mac OS, Unix.
• Two interfaces: menu-driven mode and programming mode.
• Features:
– True spatio-temporal analysis software
– Analyzing 2D lattice space-time datasets
– Full configurability
– Programming environment
– Improved estimation algorithms
– Improved diagnostic measures
– Estimation of spatial correlation structure
– Cross correlation analysis
– 2D/3D plotting abilities
IEAST —Menu-Driven Mode vs Programming Mode
[IEAST v1.30.01 - STARMA Modeling & Analysis]
=============== [ Main Menu ] ===============
 [ 1] Setup
 [ 2] Data Preprocessing
 [ 3] Correlation Analyses
 [ 4] Model Identification
 [ 5] Parameter Estimation
 [ 6] Diagnostic Analysis
 [ 7] ------
 [ 8] Preference
 [ 9] Interpreter
 [10] Exit
=============================================
============== [ Setup ] ==============
 [ 1] > Space-time dataset
 [ 2] > Spatial correlation structure
 [ 3] > Information of datasets
 [ 4] > Return
=======================================
========= [ Data Preprocessing ] =========
 [ 1] > Remove Mean
 [ 2] > De-seasonalize: (1-B^dd)Z(t)
 [ 3] > Diference by one: (1-B)Z(t)
 [ 4] > De-trend
 [ 5] > ------
 [ 6] > Subsequencing/Resampling
 [ 7] > Smoothing
 [ 8] > Missing Data
 [ 9] > Filter with a given STARMA model
 [10] > Undo previous action
 [11] > Return
==========================================
========== [ Correlation Analyses ] ==========
 [ 1] > AutoCorrelation (STACF)
 [ 2] > Partial AutoCorrelation (STPACF)
 [ 3] > Cross Correlation (STXCF)
 [ 4] > Partial Cross Correlation (STPXCF)
 [ 5] > Extended Cross Correlation (ExtSTXCF)
 [ 6] > Plot Correlations versus T-Lag/S-Lag
 [ 7] > Return
==============================================
==============================================
[ Model Identification ]
 [ 1] Automatic Identification (Type,Orders)
 [ 2] Artificial Identification (Type,Orders)
 [ 3] Parameter Masking
 [ 4] ------
 [ 5] Return
==============================================
=================== [ Parameter Estimation ] ===================
 [ 1] > Pre-estimate Model Param -- Linear (STAR)
 [ 2] > Pre-estimate Model Param -- Non-linear (STMA,STARMA)
 [ 3] > Pre-estimate Model Param -- From STACF/STPACF
 [ 4] > Pre-estimate Model Param -- Specified by users
 [ 5] > Estimate Model Param -- Fixed SRM
 [ 6] > Estimate SRM -- Fixed Model Param
 [ 7] > Estimate SRM & Model Param -- Alternatively
 [ 8] > Return
================================================================
==== [ Diagnostic Analysis ] ====
 [ 1] > Statistical Significance
 [ 2] > AICC/BIC Analysis
 [ 3] > STACF of Residuals
 [ 4] > STPACF of Residuals
 [ 5] > ------
 [ 6] > Return
=================================
# list
10 load data demo.dat
20 load weight uniform.wet
30 stacf ST_ACF Z 16 3
40 plotacf ST_ACF 16 3 "ACF"
 : : :
In menu-driven mode, users can conduct the modeling procedure by selecting a series of commands/options from the menu hierarchy.
IEAST —Menu-Driven Mode vs Programming Mode
[IEAST v1.30.01 - STARMA Modeling & Analysis]
=============== [ Main Menu ] =============
 [ 1] Setup
   :   :
 [ 8] Preference
 [ 9] Interpreter
 [10] Exit
=============================================
=============================================
|| Welcome to STARMA analyzing interpreter ||
=============================================
# load program demo.pgm
# list
10 load data demo.dat
20 load weight uniform.wet
30 stacf STACF Z 16 3
………
100 end
# run
IEAST Program ‘demo.pgm’:
10 load data demo.dat
20 load weight uniform.wet
30 stacf STACF Z 16 3
…….
Space-time Dataset ‘demo.dat’:
# name: DatafileZ
# type: matrix
# rows: 100
# columns: 100
 -0.0350001 0.00197952 -0.00635348....
 -0.0886448 0.0504684 -0.00369402....
 0.025101 0.00844576 -0.00743455....
…………………..
Spatial Weighting Matrices ’uniform.wet’:
# name: SOD
# type: global matrix
# rows: 21
# columns: 21
 0 0 0 0 0 0 0 0….
 0 0 0 0 0 0 0 0….
……………….
In programming mode, a set of sophisticated instructions can be used to compose programs to control the modeling flow and to conduct statistical analyses.
Modeling Emerging Infectious Diseases using STARMA and IEAST
State of the Art for Statistical Analyses of Emerging Infectious Diseases
As far as we know, no true spatial-temporal statistical models and methods have been used.
• Space-time cluster analysis is available (Theophilides et al., 2003; Mostashari et al., 2003; Hoebe et al., 2004).
• Spatial models are available (Watson et al., 2004).
• Temporal models are available.
Limitations of Simply Observing How a Spatial Distribution Changes over Time
• For example, expansion of the leading edge of a disease range.
• Is the disease spreading directly over long distances but infrequently, or over short distances frequently?
• This is important for projecting the future spread.
STARMA Has Potential for the Early Characterization of Infectious Diseases.
• STARMA acts as a “prism”: it can separate the spatial-temporal correlations into direct effects with known magnitudes and known spatial and temporal lags.
• It is not generally a complete, mechanistic model, but it puts critical constraints on such models.
West Nile Virus
The West Nile Virus (WNV) was first detected in a woman with a mild fever in the West Nile District of Uganda in 1937. Since then WNV has been spreading to North Africa, Europe, West and Central Asia, and the Middle East.
(A figure from CDC web site)
West Nile Virus in the United States
• Outbreak in NYC in Sep 1999. The vector is Culex mosquitoes.
• Wild birds (89% are American crows) are the principal hosts. Humans, horses, etc. are incidental hosts.
• The incidence rate among crows is high. Infected crows almost always die (68%).
• Surveillance of dead crows has been used as an indicator of WNV epidemics.
Dead Crow Data (DCD) & Human Case Datasets (HCD) in 2002
Time: Summer in 2002 (April~October)
Place: Detroit metro area (Oakland, Macomb, and Wayne)
• DCD were collected systematically before and during an outbreak among humans. Data mainly consisted of locations and dates of reported public sightings.
• HCD were obtained from clinicians in Michigan. Data on address of residence and date of onset of disease were obtained from the case-patient or attending physician through telephone interviews.
Two Datasets Collected in 2002
(Diagram of the data collection for the two datasets; labels: Dead Crows*, Human Cases, Toll-free #, WWW pages, Interview, Data Cleaning & Geocoding, Longitude/Latitude, GIS - ArcMap)
* From www.rci.rutgers.edu/~insects/crowid.htm
Space-Time Analysis for Dead Crow Data
The Dead Crow Data
• In total, 1817 dead crow sightings were scattered within the three counties (red lines), spanning 28 weeks.
• Covered area (after truncation): a rectangular area of 31.6x25.8 mi
• The covered area is divided into 10x10 cells. Cell size: 3.16x2.58 mi
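Turning point sightings into a lattice dataset amounts to counting events per cell and week. A minimal sketch of that step, using the rectangle size, the 10x10 grid, and the 28 weeks stated above; the function name `grid_counts`, the coordinate origin at the lower-left corner of the rectangle, and the four example sightings are hypothetical.

```python
def grid_counts(events, x_range, y_range, n_cells, n_weeks):
    """Count events per (week, row, col) cell; events are (x, y, week) tuples."""
    (x0, x1), (y0, y1) = x_range, y_range
    counts = [[[0] * n_cells for _ in range(n_cells)] for _ in range(n_weeks)]
    for x, y, week in events:
        if not (x0 <= x < x1 and y0 <= y < y1 and 0 <= week < n_weeks):
            continue  # outside the truncated study rectangle or time span
        col = min(n_cells - 1, int((x - x0) / (x1 - x0) * n_cells))
        row = min(n_cells - 1, int((y - y0) / (y1 - y0) * n_cells))
        counts[week][row][col] += 1
    return counts


# Hypothetical sightings: (miles east, miles north, week index).
sightings = [(1.0, 1.0, 0), (3.2, 2.6, 0), (31.5, 25.7, 27), (40.0, 5.0, 3)]
counts = grid_counts(sightings, (0.0, 31.6), (0.0, 25.8), n_cells=10, n_weeks=28)
total = sum(sum(sum(row) for row in week) for week in counts)
print(total, "sightings inside the study rectangle")
```

The fourth sighting lies east of the rectangle and is dropped, mirroring the truncation mentioned on the slide.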
Spatial Correlation Structure and Trends
Preprocessing
– Remove spatio-temporal trend
  • Spatial trend: 4th-order polynomial regression trend surface
  • Temporal trend: averaging over space
– Remove mean
Spatial correlation structure (uniform weighting), shown as the spatial lag order of each cell relative to the center cell (0):
*65556*
6543456
5421245
5310135
5421245
6543456
*65556*
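A lag-order template like the one on this slide can be expanded into one weight matrix per spatial order. The sketch below assumes that “uniform weighting” means equal weights for all neighbours of a given order, normalised so each row sums to one, and that neighbours falling outside the lattice are simply dropped; cells marked `*` are ignored. The function name `weight_matrices` is illustrative, not an IEAST routine.

```python
TEMPLATE = [
    "*65556*",
    "6543456",
    "5421245",
    "5310135",
    "5421245",
    "6543456",
    "*65556*",
]


def weight_matrices(n, template=TEMPLATE):
    """Return [W(0), W(1), ...] for an n x n lattice, cells indexed as row * n + col."""
    half = len(template) // 2
    max_order = max(int(ch) for line in template for ch in line if ch.isdigit())
    size = n * n
    mats = [[[0.0] * size for _ in range(size)] for _ in range(max_order + 1)]
    for r in range(n):
        for c in range(n):
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    ch = template[dr + half][dc + half]
                    rr, cc = r + dr, c + dc
                    if ch.isdigit() and 0 <= rr < n and 0 <= cc < n:
                        mats[int(ch)][r * n + c][rr * n + cc] = 1.0
    for m in mats:
        for row in m:
            total = sum(row)
            if total:
                row[:] = [v / total for v in row]  # equal weights, rows sum to 1
    return mats


mats = weight_matrices(10)
print(len(mats), "weight matrices of size", len(mats[0]), "x", len(mats[0][0]))
```

For an interior cell the first-order matrix gives weight 0.25 to each of the four adjacent cells; at a corner only two neighbours exist, so each receives 0.5 under the renormalisation assumed here.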
Model Identification — STACF
(Figure: STACF of the Dead Crow Data)
The STACF tails off.
Model Identification — STPACF
The STACF/STPACF suggest the model STAR(maxT=3, maxS=4).
(Figure: STPACF of the Dead Crow Data, annotated with the temporal lag and the spatial lag after which it cuts off)
Parameter Estimation
The parameters ($\phi_{ts}$) of this STAR model can be estimated in IEAST by the linear maximum likelihood estimator.
$\phi_{ts}$ (magnitudes; signs are not preserved in this transcript):
       s=0    s=1    s=2    s=3    s=4
t=1    0.26   0.36   0.10   0.09   0.04
t=2    0.04   0.18   0.07   0.04   0.11
t=3    0.02   0.11   0.02   0.02   0.04
• Values in dark blue are nominally significant at the 0.001 level.
• Values in light blue are nominally significant at the 0.01 level.
Diagnostic Check
• Statistical significance of parameters
– The probabilities P that the $\phi_{ts}$ are not significant are:
       s=0     s=1     s=2    s=3    s=4
t=1    0.001   0.001   0.01   0.1    0.4
t=2    0.04    0.001   0.04   0.3    0.01
t=3    0.4     0.01    0.25   0.9    0.6
• Residual autocorrelations
(Figure: STACF and STPACF of the residuals)
Interpretations for the DCD Analysis
• The STAR(3,4) model is the best-fitting one.
• The maximum spatial and temporal lags that are important are still smaller: S=2 (or 6.4 km) and T=2 weeks.
• Compare S=1 to S=2: the value for S=1 is much larger (cell boundary length effects).
• The virus is not spreading very far very fast. Crows are not spreading the virus much spatially, though they probably are amplifying it locally.
• Negative autoregressive effect at S=1 and T=2,3.
– Appears to be a real effect.
– May be due to crow population depletion.
– Suggests there is a mixture of two STAR processes, the dominant one reflecting probability of infection, the other an echo effect from depletion.
$\phi_{ts}$ (magnitudes; signs are not preserved in this transcript):
       s=0    s=1    s=2    s=3    s=4
t=1    0.26   0.36   0.10   0.09   0.04
t=2    0.04   0.18   0.07   0.04   0.11
t=3    0.02   0.11   0.02   0.02   0.04
Additional Analyses and Results
Additional Analyses:
• Using 20x20 and other cell configurations
• Using different lag structures: “Pfeifer’s” vs. “Ring structure”
• Using various polynomials for spatial de-trending
• Using a sub-sample of the data
Results:
• Consistent over various methods of spatial de-trending, except that high-order polynomials resulted in smaller AR values.
• Consistent AR values using different lag structures and cell sizes.
• Consistent implied spatial and temporal scales over which there are significant or substantial AR effects.
Distances for Which There Is Significant Spatial Correlation
• Based on different cell configurations: 10x10, 16x16, and 20x20
– The effective correlation distance in the modeling result is consistently about 10.75 km regardless of cell size.
Configuration   Cell size       Max S order of the estimated model   Equivalent distance
10 x 10         5.08x4.15 km    4                                    10.99 km
16 x 16         3.19x2.59 km    6                                    10.88 km
20 x 20         2.54x2.08 km    7                                    10.38 km
Alternative Spatial Correlation Structures
Ring structure:
3333333
3222223
3211123
3210123
3211123
3222223
3333333
Pfeifer’s:
*65556*
6543456
5421245
5310135
5421245
6543456
*65556*
Space-Time Analysis for Human Case Data
Human Case Data
• Over 500 human cases spanning 13 weeks
• Date of onset: converted to week
• Home addresses (names stripped): converted to “cell”, the same as for DCD
• Used the same arrays of cell sizes and spatial correlation structures as for DCD
• Same spatial and temporal de-trending method
Model Identification — STACF
Model Identification — STPACF
Parameter Estimation
Parameter estimates by spatial lag (columns) and temporal lag in weeks (rows):
       s=0     s=1     s=2     s=3     s=4     s=5     s=6
t=1    0.26    0.06   -0.10   -0.29   -0.05   -0.30   -0.60
t=2    0.12    0.27    0.13   -0.12   -0.11   -0.22   -0.11
t=3    0.07    0.10   -0.15    0.05    0.00    0.06   -0.01
t=4    0.04   -0.17   -0.07   -0.02    0.16    0.25    0.11
t=5   -0.01   -0.10   -0.04    0.10   -0.06    0.11    0.06
t=6   -0.04    0.08    0.09    0.03   -0.03   -0.19   -0.09
• Values in dark blue are nominally significant at the 0.001 level.
• Values in light blue are nominally significant at the 0.01 level.
Diagnostic Check
• STACF and STPACF of the residuals
(Figure: STACF and STPACF of the residuals)
Interpretations for the HCD Analysis
• Most people are getting infected at or near their homes.
• The incidences are highly autocorrelated in space and time.
• The distribution or probability of infection is highly “localized”.
• The WNV “load” and probability of human infection are “spreading” slowly, in the sense of not spreading very far very fast.
• This suggests localized spraying could reduce cases.
• Without a depletion effect, the human case data show parameters that are positive and significantly above zero for T-lag=2 and S-lag>=1, especially at S-lag=1.
       s=0     s=1     s=2
t=1    0.26    0.06   -0.10
t=2    0.12    0.27    0.13
t=3    0.07    0.10   -0.15
Space-Time Cross Analysis for HCD and DCD
Space-Time Data HCD and DCD
• The areas for cross analysis are the same for both datasets.
• The configuration is again 10x10, spanning 28 weeks.
• Cell size is 6.31x6.31 km.
Both Temporal Epidemic Curves
Dead crow reports lead human cases in time.
Space-Time Cross Correlations
(Figure: space-time cross correlations between DCD and HCD, with the peak marked at a temporal lag of -3 weeks)
Interpretations for Space-Time Cross Correlations
• The cross correlations drop smoothly to zero spatially and temporally.
• They are very large (as high as 0.7).
• Across all spatial lags, the maximum cross correlations are aligned at -3 weeks.
• The cross correlations at spatial lag 1 are slightly greater than at spatial lag 0.
• When the temporal lag decreases to -8 or below, the correlations between these two datasets are negligible (<0.1).
• When the spatial lag increases up to 10, the cross correlations are reduced to as low as 0.2.
Are the Cross Correlations Spurious?
The autocorrelation of the DCD can spuriously contribute to cross correlations. To eliminate this effect, both datasets were pre-whitened before calculating cross correlations.
• The result shows that the ‘real’ cross correlations are much larger than the ‘spurious’ components.
(Figure: cross correlation with pre-whitening)
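The slides do not say which filter was used for pre-whitening. A purely temporal sketch of the idea, assuming the classical Box-Jenkins recipe: fit an AR(1) model to the leading series, apply the same filter to both series, and cross-correlate the filtered series. The synthetic “crow” and “human” series, the AR(1) choice, and the sign convention (a positive lag means the first series leads, whereas the slides report the lead as -3 weeks) are all assumptions for illustration.

```python
import math
import random


def centre(x):
    m = sum(x) / len(x)
    return [v - m for v in x]


def ar1_coefficient(x):
    """Least-squares AR(1) coefficient of a mean-centred series."""
    return sum(a * b for a, b in zip(x[1:], x[:-1])) / sum(a * a for a in x[:-1])


def prewhiten(x, phi):
    """Apply the filter (1 - phi B) to x."""
    return [x[t] - phi * x[t - 1] for t in range(1, len(x))]


def cross_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag] over the overlapping weeks."""
    pairs = [(x[t], y[t + lag]) for t in range(len(x)) if 0 <= t + lag < len(y)]
    mx = sum(a for a, _ in pairs) / len(pairs)
    my = sum(b for _, b in pairs) / len(pairs)
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    sx = math.sqrt(sum((a - mx) ** 2 for a, _ in pairs))
    sy = math.sqrt(sum((b - my) ** 2 for _, b in pairs))
    return cov / (sx * sy)


# Synthetic example: an autocorrelated "crow" series and a "human" series
# that repeats it three weeks later.
rng = random.Random(0)
base = [0.0]
for _ in range(202):
    base.append(0.7 * base[-1] + rng.gauss(0.0, 1.0))
crows = centre(base[3:])
humans = centre(base[:-3])  # humans[t] tracks crows[t - 3]

phi = ar1_coefficient(crows)
pw_crows = prewhiten(crows, phi)
pw_humans = prewhiten(humans, phi)
correlations = {lag: cross_correlation(pw_crows, pw_humans, lag) for lag in range(-6, 7)}
best_lag = max(correlations, key=correlations.get)
print("crows lead humans by", best_lag, "weeks")
```

After filtering, the shared autocorrelation no longer inflates the neighbouring lags, so the peak stands out at the true lead; the space-time version in IEAST would additionally involve the spatial weight matrices.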
Summary for Modeling the Spread of WNV
• Crows are not spreading the disease spatially very far very fast.
• Spread is very localized; perhaps other animals or the mosquitoes themselves are spreading it spatially.
• Humans are being infected largely at or near their homes.
• Both crows and humans appear to be responding to local viral loads.
• Dead crow findings precede human cases by two to three weeks. Dead crows can be a good indicator of human epidemics.
Conclusion
• It appears that STARMA modeling could be an important tool for the early characterization of many emerging and re-emerging infectious disease epidemics.
• During the course of an epidemic, it could be used (in principle) for forecasting, under existing conditions or under potential courses of action.
• While not generally a mechanistic model, STARMA does inform spatial and temporal scales of spread, hence places constraints on mechanistic models (which otherwise may have too many parameters).
Funding Acknowledgements
• Michigan Agricultural Experiment Station, Michigan State University.
• Center for Emerging Infectious Diseases, Michigan State University.
• Centers for Disease Control and Prevention, USA.
Thanks for your attention!
& Questions?
References
• C.J.P.A. Hoebe, H. de Melker, L. Spanjaard, J. Dankert, and N. Nagelkerke. Space-time cluster analysis of invasive meningococcal disease. Emerging Infectious Diseases, 10(9):1621-1626, 2004.
• F. Mostashari, M. Kulldorff, J.J. Hartman, J.R. Miller, and V. Kulasekera. Dead bird clustering: A potential early warning system for West Nile virus activity. Emerging Infectious Diseases, 9:641-646, 2003.
• C.N. Theophilides, S.C. Ahearn, S. Grady, and M. Merlino. Identifying West Nile virus risk areas: The dynamic continuous-area space-time system. American Journal of Epidemiology, 157:843-854, 2003.
• J. Watson, R. Jones, K. Gibbs, and W. Paul. Dead crow reports and location of human West Nile virus cases, Chicago, 2002. Emerging Infectious Diseases, 10(5):938-940, 2004.