Bayesian space-time models for surveillance and policy evaluation
using small area data
Nicky Best Department of Epidemiology and Biostatistics
Imperial College, London
Joint work with Guangquan (Philip) Li, Sylvia Richardson, Bob Haining, Anna Hansell,
Mireille Toledano, Lea Fortunato
Outline
Introduction
Policy Evaluation: Evaluating Cambridgeshire Constabulary’s ‘no cold calling’ initiative
Surveillance: Detecting unusual trends in chronic disease rates
Introduction
Bayesian space-time modelling of small-area data is now common in many application areas
  disease mapping
  small area estimation (official statistics)
  mapping crime rates
  modelling population change
  .....
Key feature is that data are sparse
  Bayesian hierarchical model allows smoothing over space and time → improved inference
Introduction
Many different inferential goals
  description
  prediction
  surveillance
  estimation of change / policy impact
  .....
Many different ways of formulating the space-time model
  space + time (separable effects)
  space + time + interaction
  space-time mixture models
  .....
Our set-up
Inferential goals: detection of areas with ‘unusual’ time trends
Goal 1: Policy evaluation
  a policy or intervention has been implemented in a known subset of areas, and we wish to evaluate whether this has had a measurable impact on the event rate in those areas
Goal 2: Surveillance
  no a priori subset of areas of interest; we just wish to identify any areas whose event rate differs markedly from the general time trend
General modelling framework
  Assume most areas exhibit a common temporal trend (separable space and time effects) – the ‘common trend’ model
  For a small subset of areas, assume time trend is unusual (space-time interaction) – the ‘local trend’ model
Goal 1: Policy Evaluation
Evaluating Cambridgeshire Constabulary’s ‘No Cold Calling’ initiative
In collaboration with
Guangquan Li*, Robert Haining+, Sylvia Richardson
+University of Cambridge
*Imperial College, London
Definition of a “cold call”
  A visit or a telephone call to a consumer by a trader, whether or not the trader supplies goods or services, which takes place without the consumer expressly requesting the contact.
Not illegal but often associated with forms of burglary and “rogue trading”.
To discourage cold calling, police have targeted specific neighbourhoods as “no cold calling” (NCC) areas:
  street and house signage
  information packs for residents
  informal follow-up meetings
Cambridgeshire Constabulary initiated NCC scheme in parts of Peterborough in 2005 and extended it in 2006.
Locations of the NCC areas in Peterborough
Summary of NCC-targeted areas
Data for evaluation
All reported “burglary in a dwelling” events (Home Office classification code 18, sub-codes 0-10, and code 29) used as outcome
  Surrogate for rogue trading and distraction burglary (very small number of recorded events)
Data aggregated to annual counts by Census Output Area (COA) in Peterborough
Time period: 2001-2008
Total of 9388 recorded burglaries
Median burglaries per area per year = 2
5th and 95th percentiles: 0 – 8
Raw data: individual and aggregated time trends
Positive impact of policy?
Poisson test
RR01-04 = 1.06, p=0.56
RR05-08 = 0.85, p=0.19
Strategy for evaluation
Compare burglary rates before and after implementation of NCC scheme
  difference between 2 time periods is indicative of impact of policy
Comparison is done after adjustment for systematic changes in burglary rate in other non-NCC areas
  use of ‘control’ areas helps to differentiate how much of the change is due to the policy and how much to other external factors
Deal with sparsity of the data (i.e. small number of burglary events) by
  Data aggregation → assessing overall impact
  Hierarchical modelling of local impacts → assessing both overall and local impacts
→ Separate signal from noise
Constructing the control group
Control areas are selected to have similar local characteristics (e.g. burglary rates; deprivation scores) to those in the NCC-targeted group
Control areas are chosen to be Lower Super Output Areas (LSOA) to obtain reliable control data (results are similar with COA-level controls)

Control criterion   No. of LSOAs   Description
1                   88             All LSOAs in Peterborough
2                   9              Burglary rate within ±10% of the NCC group in 2005
3                   20             Burglary rate within ±20% of the NCC group in 2005
4                   7              Burglary rate within ±30% of the NCC group in both 2004 and 2005
5                   10             LSOAs containing the NCC-targeted COAs (but excluding the NCC-targeted COAs)
6                   46             LSOAs that had “similar” multiple deprivation scores (MDS) to those for the NCC LSOAs in 2004
Evaluation procedure
The impact function
We consider various functional forms for the impact function (Box and Tiao, 1975)
The impact of the policy is quantified through the estimation of the function parameter(s)
Model selection via DIC
Impact functions considered (by name)
  No change
  Step change
  A linear function of time
  A generalization function
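A minimal Python sketch of three of the candidate impact functions, for illustration only. The linear form b∙(t - t0+1) is the one given later in the talk; the zero form for "no change" and the constant-shift form for "step change" are assumptions about what those names mean, and the generalization function is left out because its form is not stated. The function names and the value of b are hypothetical.

```python
def no_change(t, t0, b=0.0):
    """No policy effect: the impact is zero in every year."""
    return 0.0


def step_change(t, t0, b):
    """Assumed step form: a constant shift b from the start year t0 onwards."""
    return b if t >= t0 else 0.0


def linear_change(t, t0, b):
    """Linear form b * (t - t0 + 1) from t0 onwards; assumed zero before t0."""
    return b * (t - t0 + 1) if t >= t0 else 0.0


t0 = 2005   # first year of the NCC scheme in Peterborough
b = -0.2    # hypothetical log-scale effect, not an estimate from the talk

for t in range(2003, 2009):
    print(t, no_change(t, t0), step_change(t, t0, b), round(linear_change(t, t0, b), 2))
```

The step form describes an immediate, constant effect, while the linear form lets the effect build up with each year the scheme has been running.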
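Full model specification
Control areas + NCC areas pre-scheme
\[
\begin{aligned}
y_{it} &\sim \text{Poisson}(n_i \, \mu_{it}) \\
\log(\mu_{it}) &= \alpha + u_i + \gamma_t + \varepsilon_{it} \\
\alpha &\sim N(0, 1000) && \text{(overall intercept)} \\
\gamma_{1:T} &\sim \text{RW}(W, \sigma_\gamma^2) && \text{(time effect)} \\
u_i &\sim N(0, \sigma_u^2) && \text{(area effect)} \\
\varepsilon_{it} &\sim N(0, \sigma_\varepsilon^2) && \text{(overdispersion)}
\end{aligned}
\]
NCC areas post-scheme (t ≥ t0)
\[
\begin{aligned}
y_{kt} &\sim \text{Poisson}(n_k \, \mu_{kt}) \\
\log(\mu_{kt}) &= \alpha^* + u_k^* + \gamma_t^* + I(t \ge t_0)\, f(t, b_k) + \varepsilon_{kt}^* \\
\varepsilon_{kt}^* &\sim N(0, \sigma_\varepsilon^{2*}) \\
u_k^* &= u_i \quad \text{for } k = i \\
f(t, b_k) &= b_k \, (t - t_0 + 1) \\
b_k &\sim N(\mu_b, \sigma_b^2)
\end{aligned}
\]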
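A small Python sketch of how the two linear predictors relate, assuming the Poisson mean is the exposure n times a rate μ with a log link, and that the local trend model adds the impact term to the common trend predictor. All numerical values and the helper name expected_count are hypothetical; nothing here is an estimate from the analysis.

```python
import math


def expected_count(n, alpha, u, gamma_t, eps=0.0, impact=0.0):
    """Expected count n * mu, with log(mu) = alpha + u + gamma_t + eps + impact."""
    return n * math.exp(alpha + u + gamma_t + eps + impact)


# Hypothetical values, for illustration only.
n_k = 120                 # exposure of one NCC area
alpha = math.log(0.02)    # overall log rate
u_k, gamma_t = 0.10, 0.05 # area effect and time effect
b_k, t, t0 = -0.15, 2006, 2005

# Common trend only: what the area would be expected to see without the scheme.
counterfactual = expected_count(n_k, alpha, u_k, gamma_t)
# Local trend: the same predictor plus the linear impact b_k * (t - t0 + 1).
with_policy = expected_count(n_k, alpha, u_k, gamma_t, impact=b_k * (t - t0 + 1))

print(round(counterfactual, 2), round(with_policy, 2))
```

The ratio of the two expected counts is exp(impact), which is why the impact parameter can be read directly as a log rate ratio relative to the common trend.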
Implementation
Model fitted in WinBUGS
Common trend model fitted to control areas (all years) plus NCC areas (years before scheme only)
Local trend model (impact function) fitted to NCC areas (years after scheme, t ≥ t0)
‘Cut’ function used to prevent NCC area (post-scheme) data influencing estimation of common trend model parameters
[Figure: directed graph of the two models. Common trend model: y_it with parents ε_it, u_i, γ_t, α and variances σ_γ², σ_ε², σ_u². Local trend model (t ≥ t0): y_kt with parents b_k (hyperparameters μ_b, σ_b²), σ_ε²* and the starred quantities u_k*, ε_kt*, γ_t*, α*, where u_k* = u_i for k = i. The two parts are joined by a ‘cut’ link; * marks a distributional constant (no learning).]
Results: choice of impact function
Linear impact function has smallest DIC

        No change   Step    Linear   Generalization function
Dbar    15.27       14.32   9.77     11.75
pD      1.21        2.29    2.25     2.57
DIC     16.49       16.61   12.02    14.33
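As a quick check on how the table is read, the sketch below recomputes DIC as Dbar + pD (the standard definition) from the reported values and picks the smallest. Sums can differ from the reported DIC in the last digit because the inputs are rounded.

```python
# Dbar (posterior mean deviance) and pD (effective number of parameters)
# as reported in the table for the four impact functions.
fits = {
    "No change": (15.27, 1.21),
    "Step": (14.32, 2.29),
    "Linear": (9.77, 2.25),
    "Generalization function": (11.75, 2.57),
}

# DIC = Dbar + pD; smaller values indicate a better fit/complexity trade-off.
dic = {name: dbar + pd for name, (dbar, pd) in fits.items()}
best = min(dic, key=dic.get)

for name, value in dic.items():
    print(f"{name}: DIC = {value:.2f}")
print("Smallest DIC:", best)
```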
Posterior probability of “success”
i.e. Pr(bk < 0)
No change
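With MCMC output, a probability such as Pr(bk < 0) is usually estimated as the share of posterior draws below zero. A minimal sketch, assuming a list of draws for one area is available; the draws below are made up for illustration and are not output from the fitted model.

```python
def prob_success(b_draws):
    """Share of posterior draws with b < 0: a Monte Carlo estimate of Pr(b < 0)."""
    return sum(1 for b in b_draws if b < 0) / len(b_draws)


# Hypothetical posterior draws for one area's b_k.
draws = [-0.31, -0.12, 0.04, -0.22, -0.08, 0.10, -0.17, -0.05]
print(prob_success(draws))
```

Values near 1 indicate strong evidence that the burglary rate in the area fell relative to the common trend; values near 0.5 indicate no clear evidence either way.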
Heterogeneity of local impacts
f(t, b_k) = b_k∙(t - t0 + 1); b_k = α + β x_k + δ_k; δ_k ~ N(0, σ²)
  x_k: proportion of properties visited in COA k (coverage)
Some of the variability in local NCC impacts may be due to coverage
The larger the proportion of properties that were visited in a COA, the greater the impact of the NCC scheme
β = -1.1, 95% CI (-2.6, 0.2)
Heterogeneity of local impacts
Two possible explanations for coverage effect
A “threshold” effect
  NCC scheme does not have a measurable impact (in terms of reducing burglary rates) unless a sufficient number of households in the local area are visited
A “dilution” effect
  Because the COA is the unit of analysis, the NCC scheme impact could be diluted when the households that are visited are only a small proportion of the total households in the COA
Neither of these explanations for the coverage effect undermines our overall assessment of the policy’s success
Conclusions: NCC scheme
NCC scheme led to overall “success”
  Overall, NCC-targeted areas experienced a 16% (95% CI: -2% to 34%) reduction in burglary rate per year
  This suggests a positive impact of the NCC policy, which had the effect of stabilizing the burglary rate in the targeted areas while overall burglary rates were going up
Linear impact function is better at describing the data than the other 3, suggesting a gradual and persistent change
There are different impacts between targeted COAs, perhaps due to local differences in implementing the schemes
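Under a linear impact on the log rate, each extra year of the scheme multiplies the rate by exp(b), so a slope b corresponds to a yearly percentage reduction of 100∙(1 - exp(b)). The sketch below shows this mapping, on the assumption that the reported 16% figure can be read this way; the function names are hypothetical.

```python
import math


def pct_reduction_per_year(b):
    """Percentage fall in the rate per year implied by a log-scale slope b."""
    return 100.0 * (1.0 - math.exp(b))


def slope_from_pct_reduction(pct):
    """Inverse mapping: log-scale slope implied by a given yearly % reduction."""
    return math.log(1.0 - pct / 100.0)


b = slope_from_pct_reduction(16.0)
print(round(b, 3), round(pct_reduction_per_year(b), 1))
```

A negative percentage reduction, such as the lower interval limit of -2%, corresponds to a positive slope, i.e. a rate rising relative to the common trend.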
Assessing NCC impact for whole of Cambridgeshire
The NCC scheme was extended to the whole of Cambridgeshire for the period 2005-08
We applied our evaluation model to assess impact of NCC scheme separately for urban and rural areas
Overall, schemes in urban areas were more successful than those in rural areas.
[Figure: % change in burglary rates after 1st year of NCC scheme, with a ‘No change’ reference line. Panels: Urban, Overall (0.96); Rural, Overall (0.38)]
Conclusions: Model
Hierarchical model allows borrowing of strength across NCC areas
  enables evaluation of local impacts even when data are sparse
Joint estimation of common trend and local trend models enables full propagation of uncertainty
  Parameters of common trend model treated as ‘distributional constants’ in local trend model
  Facilitated using ‘cut’ function in WinBUGS
More complex impact functions could be implemented, but need sufficient time points post-policy for reliable estimation
Goal 2: Surveillance
Detecting unusual trends in chronic disease rates
In collaboration with
Guangquan Li, Sylvia Richardson, Anna Hansell, Mireille Toledano, Lea Fortunato
Imperial College, London
Surveillance of small area data
For many areas of application, such as small area estimates of income, unemployment, crime rates and rates of chronic diseases, smooth time changes are expected in most areas
However, policy makers and researchers are often interested in identifying areas that ‘buck’ the national trend and exhibit unusual temporal patterns
These abrupt changes may be due to emergence of localised predictors/risk factor(s) or the impact of a new policy or intervention
Detection of areas with “unusual” temporal patterns is therefore important as a screening tool for further investigations
Motivating example 1: COPD mortality
Chronic Obstructive Pulmonary Disease (COPD) is a common chronic condition characterized by slowly progressive and irreversible decline in lung function
  responsible for approximately 5% of deaths in the UK
Main risk factors include
  Smoking
  Occupational exposure to high levels of dusts and fumes
  Outdoor air pollution
“Umbrella” term for broad range of disease phenotypes
Time trends may reflect variation in risk factors and also variation in diagnostic practice/definitions
Motivating example 1: COPD mortality
Objective 1: Retrospective surveillance
  to highlight areas with a potential need for further investigation and/or intervention (e.g. additional resource allocation)
Objective 2: Policy assessment
  Industrial Injuries Disablement Benefit was made available for miners developing COPD from 1992 onwards in the UK
  As miners with other respiratory problems with similar symptoms (e.g., asthma) could potentially have benefited from this scheme, there was debate on whether this policy may have differentially increased the likelihood of a COPD diagnosis in mining areas
Data
Observed and age-standardized expected annual counts of COPD deaths in males aged 45+ years
  374 local authority districts in England & Wales
  8 years (1990 – 1997)
Difficult to assess departures of the local temporal patterns by eye
Need methods to
  quantify the difference between the common trend pattern and the local trend patterns
  express uncertainty about the detection outcomes
Bayesian Space-Time Detection: BaySTDetect
BaySTDetect (Li et al 2011) is a novel detection method for short time series of small area data using Bayesian model choice between two competing space-time models
  Model 1 assumes space-time separability for all areas → one common temporal pattern across the whole study region
  Model 2 provides local time trend estimates for each spatial unit individually
For each area, a model indicator is introduced to decide whether Model 1 or Model 2 is supported by the data → Quantifying the difference
A Bayesian procedure of controlling the false discovery rate is employed → Expressing uncertainty about detected areas
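BaySTDetect: modelling framework
\[
y_{it} \sim \text{Poisson}(\mu_{it} E_{it})
\]
Model 1, for all i, t: the temporal trend pattern is the same for all areas
\[
\begin{aligned}
\log(\mu_{it}) &= \eta_i + \gamma_t \\
\eta_i &\sim \text{spatial BYM model} && \text{(common spatial pattern)} \\
\gamma_t &\sim \text{random walk (RW}[\sigma_\gamma^2]\text{) model} && \text{(common temporal trend)}
\end{aligned}
\]
Model 2, for all i, t: temporal trends are independently estimated for each area
\[
\begin{aligned}
\log(\mu_{it}) &= u_i + \phi_{it} \\
u_i &\sim N(0, 1000) && \text{(area-specific intercept)} \\
\phi_{it} &\sim \text{random walk (RW}[\sigma_i^2]\text{) model} && \text{(area-specific temporal trend)}
\end{aligned}
\]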
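Model selection
A model indicator z_i indicates for each area whether Model 1 (z_i = 1) or Model 2 (z_i = 0) is supported by the data
Implementation
[Figure: graph of the selection model. Model 1 (common trend): y_it with mean μ_it[C], built from η_i, γ_t and E_it. Model 2 (local trend): y_it with mean μ_it[L], built from u_i, φ_it and E_it. Selection model: the indicator z_i chooses between the two means.]
\[
\mu_{it} = z_i \, \mu_{it}^{[C]} + (1 - z_i) \, \mu_{it}^{[L]}
\]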
Prior on model indicator
For the model indicator z_i, we have
  z_i ~ Bernoulli(0.95)
This prior on z_i
  reflects the surveillance nature of the analysis where we expect to find only a small number of unusual areas a priori
  ensures that a common trend can be meaningfully defined and estimated
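To show how the Bernoulli(0.95) prior combines with the data, here is a deliberately simplified Python sketch that treats the means under both models as known and applies Bayes' rule for one area. In the actual method the indicator and all model parameters are estimated jointly by MCMC, so this plug-in calculation is an illustration of the idea, not the procedure used; the function names and counts are hypothetical.

```python
import math


def poisson_logpmf(y, mean):
    """Log of the Poisson probability of count y given its mean."""
    return y * math.log(mean) - mean - math.lgamma(y + 1)


def prob_common_trend(y, mean_common, mean_local, prior=0.95):
    """Pr(z = 1 | y) when the means under both models are treated as known."""
    log_c = math.log(prior) + sum(poisson_logpmf(a, m) for a, m in zip(y, mean_common))
    log_l = math.log(1 - prior) + sum(poisson_logpmf(a, m) for a, m in zip(y, mean_local))
    top = max(log_c, log_l)  # subtract the maximum for numerical stability
    w_c, w_l = math.exp(log_c - top), math.exp(log_l - top)
    return w_c / (w_c + w_l)


# Hypothetical counts for one area, rising while the common trend falls.
y = [30, 34, 41, 47]
mean_common = [30.0, 29.0, 28.0, 27.0]
mean_local = [30.0, 34.0, 40.0, 46.0]
print(round(prob_common_trend(y, mean_common, mean_local), 4))
```

When the two sets of means coincide the data carry no information about z_i and the result equals the prior, 0.95; the probability only drops when the local trend explains the counts clearly better.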
Classifying areas as “unusual”
Classification of areas as “unusual” is based on the posterior model probabilities p_i = Pr(z_i = 1 | data)
Small values of p_i indicate low probability that area i fits the common trend → high probability of being “unusual”
Need a rule for calibrating the p_i that acknowledges the multiple testing setting
  How low does p_i need to be in order to declare area i as unusual?
False Discovery Rate (FDR) is the proportion of detected areas that are false (i.e. not truly unusual) (Benjamini & Hochberg, 1995)
Various methods to estimate or control FDR
  Here we control the posterior expected FDR (Newton et al 2004)
Detection rule based on FDR control
First rank the areas according to increasing values of p_i
At a nominal FDR level of α, the first k ranked areas are declared as unusual, where k is the maximum integer satisfying
\[
\sum_{j=1}^{k} p_{(j)} \le k \, \alpha
\]
where p_(j) is the jth ranked posterior common-trend model probability
This procedure ensures that the (posterior) expected number of false positives is no more than k × α among the k declared unusual areas
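The ranking rule can be sketched in a few lines of Python. This is a minimal illustration, assuming the posterior common-trend probabilities are already available as a list; the function name detect_unusual and the example probabilities are hypothetical.

```python
def detect_unusual(p, alpha):
    """Return indices of areas declared unusual at nominal FDR level alpha.

    p[i] is the posterior probability that area i follows the common trend.
    """
    order = sorted(range(len(p)), key=lambda i: p[i])  # increasing p
    running_sum, k = 0.0, 0
    for rank, i in enumerate(order, start=1):
        running_sum += p[i]
        if running_sum <= rank * alpha:
            k = rank  # largest rank so far that satisfies the bound
    return order[:k]


# Hypothetical posterior probabilities for five areas.
p = [0.01, 0.02, 0.50, 0.90, 0.03]
print(detect_unusual(p, alpha=0.05))
```

Because the probabilities are summed in increasing order, their running mean never decreases, so the areas that satisfy the bound always form a block at the top of the ranking.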
Simulation study to evaluate operating characteristics of BaySTDetect
Simulated data were based on the observed COPD mortality data
Three departure patterns were considered
When simulating the data, either the original set of expected counts from the COPD data or a reduced set (multiplying the original by 1/5) was used
15 areas (approx. 4%) were chosen to have the unusual trend patterns
  areas were chosen to cover a wide range of expected count values and overall spatial risks
Results were compared to those from the popular SaTScan space-time scan statistic
Simulation Study: Departure patterns
Common trend, exp(γ_t)
Departure pattern, exp(γ_t ∙ θ)
2 different departure magnitudes: θ = 1.5 and θ = 2.0
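A short Python sketch of how a departure trend relates to the common trend, reading the departure formula literally as the log-scale common trend multiplied by the departure magnitude before exponentiating. The trend values below are hypothetical and the three departure patterns of the study are not reproduced.

```python
import math


def trend(gamma, theta=1.0):
    """Relative-risk trend exp(gamma_t * theta); theta = 1 gives the common trend."""
    return [math.exp(g * theta) for g in gamma]


gamma = [0.00, -0.05, -0.10, -0.15]   # hypothetical log-scale common trend

common = trend(gamma)
for theta in (1.5, 2.0):
    departure = trend(gamma, theta)
    print(theta, [round(d / c, 3) for d, c in zip(departure, common)])
```

Under this reading a larger magnitude exaggerates the common trend, so the gap between an unusual area and the rest grows with the size of the underlying trend.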
Simulation Study: expected counts
Table: Summary of the original set of age-adjusted expected counts used in the simulation
Simulation Study: FDR control
Empirical FDR vs corresponding pre-defined level: Pattern 2
[Figure: three panels (Original expected, θ=1.5; Original expected, θ=2.0; Reduced expected, θ=2.0) plotting empirical FDR (0.0 to 1.0) against pre-set FDR level (0.05 to 0.20), showing the mean and the 95% sampling interval]
SaTScan: Empirical FDR = 0.19 (0.00 to 0.78) for scenario with original expected counts and θ = 2.0
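For one simulated data set, the empirical FDR and the sensitivity can be computed from the set of detected areas and the set of areas simulated with an unusual trend. A minimal sketch; the area labels and the detected set are hypothetical, and in a simulation study these quantities would be averaged over many replicates.

```python
def empirical_fdr(detected, truly_unusual):
    """Proportion of detected areas that are not truly unusual (0 if none detected)."""
    detected, truly_unusual = set(detected), set(truly_unusual)
    if not detected:
        return 0.0
    return len(detected - truly_unusual) / len(detected)


def sensitivity(detected, truly_unusual):
    """Proportion of truly unusual areas that were detected."""
    detected, truly_unusual = set(detected), set(truly_unusual)
    return len(detected & truly_unusual) / len(truly_unusual)


truth = set(range(15))   # 15 areas simulated with an unusual trend
detected = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13, 40, 77}   # hypothetical output

print(round(empirical_fdr(detected, truth), 3), round(sensitivity(detected, truth), 3))
```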
Sensitivity of detecting the 15 truly unusual areas
[Figure: four panels for Pattern 2 plotting sensitivity (0.0 to 1.0) against expected count quantiles (E=24, E=33, E=42, E=52, E=80), comparing BaySTDetect (FDR=0.1) with SaTScan (p=0.05) at true departure magnitudes θ=2.0 and θ=1.5]
Sensitivity of detecting the 15 truly unusual areas: reduced expected counts
[Figure: two panels for Pattern 2, true departure magnitude θ=2.0, plotting sensitivity (0.0 to 1.0) against expected count quantiles (E=5, E=6, E=8, E=11, E=16), comparing BaySTDetect (FDR=0.1) with SaTScan (p=0.05)]
COPD application: Detected areas (FDR=0.05)
COPD application: Interpretation
Results provide little support for hypothesis regarding the industrial injuries policy
  only 3 out of 40 ‘mining’ districts detected (Barnsley, Carmarthenshire and Rotherham)
  unusual trend patterns in these areas are not consistent
Two unusual districts (Lewisham and Tower Hamlets) with an increasing trend (against a national decreasing trend) were identified in inner London
  These areas are very deprived, with high in-migration and ethnic minorities → might expect different trends to rest of country
  In fact, Tower Hamlets has been commissioning various local enhanced services to tackle high rates of COPD mortality since 2008.
  This rising trend could potentially have been recognised earlier in the 1990s through using BaySTDetect as a surveillance tool.
COPD application: SaTScan
Primary cluster: North (46 districts) – excess risk of 1.05 during 1990-92
Secondary cluster: Wales (19 districts) – excess risk of 1.12 during 1995-96
Example 2: Data mining of cancer registries
The Thames Cancer Registry (TCR) collects data on newly diagnosed cases of cancer in the population of London and South East England
It is one of the largest cancer registries in Europe, covering a population of over 12 million, and holds nearly 3 million cancer registration records.
We perform a retrospective surveillance of time trends for several cancer types using BaySTDetect
  aim to provide a screening tool to detect areas with “unusual” temporal patterns
  automatically flag up areas warranting further investigation
Cancer data
Cancer incidence for population aged 30+ years
  Breast (female only)
  Colon (males and females combined)
  Lung (males and females, separately)
South East England, ward level (1899 areas)
Period 1981-2008
  Data were aggregated by 4-year intervals
  7 time periods for the detection analysis
Cancer data summary

                    Min    Q1     Median   Mean   Q3     Max
Breast        OBS   0.0    10.0   16.0     17.6   24.0   69.0
              EXP   0.0    11.3   16.5     17.6   23.0   56.5
Colon         OBS   0.0    5.0    8.0      9.1    12.0   42.0
              EXP   0.0    5.7    8.5      9.1    11.8   34.6
Female lung   OBS   0.0    3.0    5.0      6.4    9.0    34.0
              EXP   0.0    4.0    5.9      6.4    8.3    24.5
Male lung     OBS   0.0    6.0    10.0     11.8   16.0   66.0
              EXP   0.0    7.6    11.2     11.8   15.2   39.5

Comparable to reduced expected count scenario in simulation study
Results: Number of detected areas (out of 1899)
Cancer type FDR=0.05 FDR=0.1 FDR=0.15 FDR=0.2
Breast 9 19 35 54
Colon 0 3 5 8
Lung (female) 0 1 2 4
Lung (male) 6 14 24 39
Detected areas: breast cancer
Summarising the unusual trends
With a relatively large number of detected areas (e.g., breast and male lung cancer), examination of the individual trends becomes difficult
For the detected areas, the estimated RR trends from the local trend model are fed into a standard hierarchical clustering method (hclust in R)
The cluster-specific trends are then compared to the overall RR trend
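\[
\log(\mu_{it}) =
\begin{cases}
\eta_i + \gamma_t & \text{model 1} \\
u_i + \phi_{it} & \text{model 2}
\end{cases}
\]
Breast cancer, FDR=0.2
[Figure: five panels showing the detected areas grouped into 1, 2, 3, 4 and 5 clusters. Black line = common trend; coloured lines = average local trend in each cluster]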
BaySTDetect: Conclusions and Extensions
We have proposed a Bayesian space-time model for retrospective detection of unusual time trends
Simulation study has shown good performance of the model in detecting various realistic departures with relatively modest sample sizes
Possible extensions include:
  Spatial prior on z_i to allow for clusters of areas with unusual trends
  Time-specific model choice indicator z_it, to allow longer time series to be analysed
  Alternative approaches to calibrating posterior model probabilities, e.g. decision theoretic approach (Wakefield, 2007; Muller et al., 2007)
References
G. Li, R. Haining, S. Richardson and N. Best. Evaluating Neighbourhood Policing using Bayesian Hierarchical Models: No Cold Calling in Peterborough, England. Submitted.
G. Li, N. Best, A. Hansell, I. Ahmed and S. Richardson. BaySTDetect: detecting unusual temporal patterns in small area data via Bayesian model choice. Submitted.
G. Li, S. Richardson, L. Fortunato, I. Ahmed, A. Hansell and N. Best. Data mining cancer registries: retrospective surveillance of small area time trends in cancer incidence using BaySTDetect. Proceedings of the International Workshop on Spatial and Spatiotemporal Data Mining, 2011.
www.bias-project.org.uk
Funded by ESRC National Centre for Research Methods