HAL Id: hal-00902860
https://hal.archives-ouvertes.fr/hal-00902860
Submitted on 1 Jan 2007

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

To cite this version: Emilie Gay, Rachid Senoussi, Jacques Barnouin. A spatial hazard model for cluster detection on continuous indicators of disease: application to somatic cell score. Veterinary Research, BioMed Central, 2007, 38 (4), pp.585-596. 10.1051/vetres:2007018. hal-00902860


Vet. Res. 38 (2007) 585–596. © INRA, EDP Sciences, 2007. Available online at: www.vetres.org. DOI: 10.1051/vetres:2007018

Original article

A spatial hazard model for cluster detection on continuous indicators of disease: application to somatic cell score

Emilie Gay (a, b, *), Rachid Senoussi (a), Jacques Barnouin (b)

(a) INRA, UR546, Biostatistique et processus spatiaux, Avignon, 84000, France
(b) INRA, UR346, Épidémiologie animale, Saint-Genès-Champanelle, 63122, France

(Received 28 August 2006; accepted 12 February 2007)

Abstract – Methods for spatial cluster detection dealing with diseases quantified by continuous variables are few, whereas several diseases are better approached by continuous indicators. For example, subclinical mastitis of the dairy cow is evaluated using a continuous marker of udder inflammation, the somatic cell score (SCS). Consequently, this study proposed to analyze spatialized risk and cluster components of herd SCS through a new method based on a spatial hazard model. The dataset included annual SCS for 34 142 French dairy herds for the year 2000, and important SCS risk factors: mean parity, percentage of winter and spring calvings, and herd size. The model allowed the simultaneous estimation of the effects of known risk factors and of potential spatial clusters on SCS, and the mapping of the estimated clusters and their range. Mean parity and winter and spring calvings were significantly associated with subclinical mastitis risk. The model with 3 clusters was highly significant, and the 3 clusters were attractive, i.e. closeness to a cluster center increased the occurrence of high SCS. The three locations were the following: close to the city of Troyes in the northeast of France; around the city of Limoges in the center-west; and in the southwest close to the city of Tarbes. The semi-parametric method based on spatial hazard modeling applies to continuous variables, and takes account of both risk factors and potential heterogeneity of the background population. This tool allows quantitative detection but assumes a spatially specified form for clusters.

spatial epidemiology / cluster detection / hazard function / mastitis / dairy herd

1. INTRODUCTION

Spatial aspects of health events are of growing concern in epidemiology. Whether for emerging or endemic disease, regional differences such as heterogeneity of the background population, climatic and landscape conditions, agricultural activities, local health policy and the occurrence of peculiar events such as cattle fairs can have a great influence on disease spread and control. The tools available to explore spatial patterns range from geostatistics to point process approaches. Among these, the issue of cluster detection [7, 30] is of major interest, since targeting possible causes for high disease concentration can assist in control and prevention.

* Corresponding author: [email protected]

The main techniques used in cluster detection rely on scan statistics [1, 21, 22]. The principle is to compare the observed number of cases inside a moving window to the expected number of cases under some distribution assumptions (e.g. Poisson, Bernoulli) [17]. Most of these methods can deal with specific additional information at the individual level and integrate some covariables. Spatial modeling is another way to explore spatial patterns: it quantifies the effects of known disease risk factors, and then focuses on unexplained spatial clustering [20]. Among the several approaches, one handles the concept of infectious potential through Susceptible-Infected-Recovered (SIR) models [10], which can be linked to point-pattern methodology [12]. Other approaches use the classical framework of linear mixed models with risk factors as fixed effects, while spatial variations are included as a random effect [28, 31]. Finally, some methods make the intensity of case events depend on the location of cluster centers [19].

Until recently, the methods available dealt only with binary variables, and cluster detection for diseases measured by continuous variables remained an unexplored field. Nevertheless, several diseases can be better approached through continuous biological indicators, when no internationally recognized threshold value is available, or when the predictive value of the indicator is linear, a frequent case for biological markers [4, 25].

Lately, several researchers have tackled cluster detection for new types of variables, especially continuous ones [15]. Huang et al. proposed a spatial scan statistic with an exponential survival distribution function, extendable to other distributions such as the gamma and log-normal. Besides its potential adaptation to censored survival data, this spatial scan statistic allows adjustment for covariate effects. For this purpose they used a linear regression model for the logarithm of the survival data and assumed the error term to follow an extreme value distribution. In effect, their model reduces to a fully parametric proportional hazard model. An earlier approach by Patil and Taillie [24] used the notion of upper-level sets. The ratio of the number of cases to the expected number of cases was replaced by the ratio of continuous responses to their expected values, possibly adjusted for factors. The new version¹ of the software SaTScan™ allows cluster detection with this exponential model, designed for survival time data, and with a normal model, designed for continuous data².

In this paper, we chose a different approach to detect clusters of high risk of bovine subclinical mastitis. The diagnosis of this disease mainly relies on the determination of milk somatic cell score (SCS), a continuous variable internationally recognized as a good indicator for mastitis control [13]. Risk factors associated with SCS have been widely investigated [5, 6, 26], but in these studies the SCS spatial aspects were not taken into account, while SCS typically presents strong spatial variations [11, 23]. Differences in natural resources, farm structure and market conditions cause different regions of the same country to implement different dairy management systems, and call for the introduction of a spatial component in SCS data analysis.

The purpose of this paper was to propose a new method for spatial cluster detection on continuous variables, with an application to bovine subclinical mastitis. We quantitatively analyzed the spatialized risk of SCS, using a spatial hazard model to simultaneously estimate the effects on SCS of known risk factors and of potential spatial clusters.

¹ Kulldorff M., Information Management Services, Inc. SaTScan™ v7.0: Software for the spatial and space-time scan statistics [on line] (2006) http://www.satscan.org/.
² Kulldorff M., SaTScan™ User Guide for version 7.0 [on line] (2006) http://www.satscan.org/.

2. MATERIALS AND METHODS

2.1. Data

The study population consisted of a cohort of French Holstein dairy herds enrolled in the Dairy Herd Improvement Association (DHIA) in 2000. The dataset included 34 142 farms with at least 20 cows.

Data concerning mastitis were extracted from the national DHIA database, which contained monthly data for every healthy lactating cow. The outcome variable was the annual herd SCS (ASCS), which was computed as the arithmetic mean of all monthly cow SCS values during 2000. ASCS indicated the farm status for subclinical mastitis risk. The other variables of the dataset were mean parity, percentage of calvings during the winter and spring period, and herd size, which had been recognized as herd factors influencing SCS [3]. The geographic coordinates of the farmers' addresses were obtained via the French National Institute of Statistics and Economic Studies. The statistical unit was the herd-year.

2.2. Statistical analysis

Statistical procedures were conducted using the software R 2.0.1³ (descriptive analysis, models and map-making) and SaTScan™ (spatial scan statistic)¹.

³ R Development Core Team, R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, ISBN 3-900051-07-0, [on line] (2007) http://www.R-project.org.

2.2.1. Descriptive analysis of variables and spatial patterns

After a descriptive analysis of the studied variables, we used mapping representations to explore the spatial patterns of the data. The interpolation technique of kernel smoothing [27] was performed to represent ASCS intensity. The presence of spatial correlation was assessed and quantified using a correlogram, which is the graph of empirical autocorrelations of SCS values with respect to distance between farms [9].
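The correlogram described above can be sketched in a few lines. The following is a minimal illustration, not the authors' R code: the function name `correlogram`, the distance bins and the Moran-type estimator (centered cross-products averaged per distance class and divided by the overall variance) are assumptions made for the example.

```python
import math

def correlogram(points, values, bin_edges):
    """Empirical autocorrelation of `values` per distance class.

    points    : list of (x, y) coordinates
    values    : list of observed values (same length as points)
    bin_edges : increasing distances; class b covers [bin_edges[b], bin_edges[b+1])
    Returns one autocorrelation per class (nan for an empty class).
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    n_bins = len(bin_edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            for b in range(n_bins):
                if bin_edges[b] <= d < bin_edges[b + 1]:
                    sums[b] += (values[i] - mean) * (values[j] - mean)
                    counts[b] += 1
                    break
    return [s / (c * var) if c else float("nan") for s, c in zip(sums, counts)]

# Toy transect: values alternate along a line, so neighbours are
# anti-correlated and second neighbours are perfectly correlated.
toy_points = [(float(i), 0.0) for i in range(6)]
toy_values = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
toy_result = correlogram(toy_points, toy_values, [0.5, 1.5, 2.5])
```

The double loop visits every pair of points, so for a dataset of the size used here (34 142 farms) a practical implementation would need spatial indexing or subsampling.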

2.2.2. Spatial hazard model

We explored spatial patterns of ASCS from the point of view of survival analysis [14], considering ASCS values instead of the usual lifetimes. We followed the sequence of the spatial distribution of farms as ASCS increased. The hazard function r(z) of a non-negative random variable, like the probability density function, completely characterizes its probability distribution:

$$r(z) = \frac{f(z)}{1 - F(z)}, \qquad f(t) = r(t)\,\exp\left(-\int_0^t r(s)\,ds\right),$$

where f is the probability density function and F the cumulative distribution function. In our example, r(z)Δz approximates the probability that a farm ASCS belongs to a small interval [z, z + Δz], given that the ASCS is known to be greater than or equal to the value z. The map of the farms whose ASCS was greater than or equal to a given level z, hereafter called the z-level map, described the spatial structure of farms still "at risk" at level z. By analogy with classical survival analysis, r(z) represented the hazard of occurrence of the ASCS value of a farm, i.e. the probability of removing from the (z + Δz)-level map a farm present on the z-level map.
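The link between the hazard, the density and the conditional probability of leaving the z-level map can be written out step by step. The short derivation below is a standard survival-analysis argument; the symbols $Z$ (the ASCS of a farm, assumed continuous and non-negative) and $S$ (the survival function) are notation introduced for this sketch only.

```latex
\begin{align*}
S(z) &= 1 - F(z) = \Pr(Z \ge z), \\
r(z) &= \frac{f(z)}{S(z)} = -\frac{d}{dz}\,\log S(z), \\
S(z) &= \exp\!\left(-\int_0^z r(s)\,ds\right), \\
f(z) &= r(z)\,S(z) = r(z)\,\exp\!\left(-\int_0^z r(s)\,ds\right), \\
\Pr\left(z \le Z < z + \Delta z \mid Z \ge z\right)
     &= \frac{F(z + \Delta z) - F(z)}{S(z)} \approx r(z)\,\Delta z .
\end{align*}
```

The last line is why a low hazard at every level z keeps a farm on the z-level maps for longer, i.e. corresponds to higher ASCS values.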

The hazard function depended on observable local explanatory variables and on the presence of potential clusters, according to the proportional hazard model type. Conditionally on explanatory variables and clusters, the ASCS were independently distributed with a spatial hazard function r:

$$r(z, x, W^x) = r_0(z)\,\exp\left(\sum_{j=1}^{J} \beta_j W_j^x - \phi(\gamma, x)\right) \qquad (1)$$

where $r_0(z)$ is the underlying hazard function at an ASCS value z, x the spatial coordinates of the farms, $W^x = (W_1^x, \ldots, W_J^x)$ the vector of risk factors specific to the farm at location x, $\beta_j$ the coefficient for the jth risk factor, and φ(γ, x) a potential spatial cluster effect specified hereafter. In this spatial hazard function, a unit variation of an explanatory variable $W_j$ with a positive $\beta_j$ coefficient would increase the hazard of occurrence of the ASCS value of a farm at any level z by a factor $\exp(\beta_j) > 1$, and thus would decrease the occurrence of higher ASCS levels. Three explanatory variables were included in the model, each with a regression parameter β, as follows:

– 1 continuous variable for mean parity;
– 1 continuous variable for the percentage of winter and spring calvings;
– 1 binary variable for herd size: 0 codes for the herds with less than 50 cows, and 1 for the herds with 50 or more cows.

The cluster effect φ(γ, x) aimed to take into account the spatial aggregation of the farms sharing approximately the same ASCS values. Instead of including definite spatial zones for clusters, we introduced a soft version of such zones in the form of a smooth parameterized function. Mathematically speaking, one can always approach any point set as a limit of a smooth function by kernel techniques. Consequently, we specified the cluster function as a sum of spatial Gaussian kernels as follows:

$$\phi(\gamma, x) = \sum_{k=1}^{K} \frac{\alpha_k}{2\pi\rho_k^2}\,\exp\left(-\frac{\|x - c_k\|^2}{2\rho_k^2}\right) \qquad (2)$$

where K is a fixed number of clusters defined by a set of parameters γ = (α, ρ, c) written as a (K × 4) matrix. The $\alpha_k$ parameter is the strength of cluster k, $\rho_k$ its positive range, and $c_k$ its two geographic center coordinates. If a point came close to a cluster, the distance $\|x - c_k\|$ was small, the exponential increased towards its maximum value 1, and the cluster effect tended to $\alpha_k / (2\pi\rho_k^2)$. Conversely, if a point moved far from the cluster, the exponential tended to 0 and the cluster effect faded at a Gaussian rate. If $\alpha_k$ was positive, the hazard of occurrence of any ASCS value of a farm at the cluster center was multiplied by a factor $\exp\left(-\alpha_k / (2\pi\rho_k^2)\right) < 1$: the cluster was "attractive" and increased the occurrence of higher ASCS levels. By contrast, a negative $\alpha_k$ meant a "repulsive focus" and decreased the occurrence of high ASCS levels. The cluster effect is actually a generalized regression model where the response value depends only on a vector parameter γ associated with observable covariates, which are the spatial coordinates of farms. Thus, even if the clusters defined here could be interpreted as hidden fields or a type of frailty model, they were not.
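The behavior of the cluster function (2) near and far from a cluster center can be checked numerically. The sketch below is an illustration only: the function names and the toy parameter values are hypothetical and are not the fitted estimates.

```python
import math

def cluster_effect(x, y, clusters):
    """phi(gamma, x) of equation (2): a sum of Gaussian kernels.

    clusters is a list of (alpha, rho, cx, cy) rows, i.e. the K x 4
    parameter matrix gamma.
    """
    total = 0.0
    for alpha, rho, cx, cy in clusters:
        sq_dist = (x - cx) ** 2 + (y - cy) ** 2
        total += alpha / (2 * math.pi * rho ** 2) * math.exp(-sq_dist / (2 * rho ** 2))
    return total

def hazard_multiplier(x, y, clusters):
    """Factor exp(-phi) applied to the hazard at location (x, y)."""
    return math.exp(-cluster_effect(x, y, clusters))

# Hypothetical single cluster: strength 0.05, range 0.1, centered at (0.5, 0.5).
toy_attractive = [(0.05, 0.1, 0.5, 0.5)]
peak = cluster_effect(0.5, 0.5, toy_attractive)   # equals alpha / (2 pi rho^2)
far = cluster_effect(0.0, 0.0, toy_attractive)    # essentially zero

# The same cluster with a negative strength acts as a repulsive focus.
toy_repulsive = [(-0.05, 0.1, 0.5, 0.5)]
```

With a positive strength the multiplier at the center is below 1 (lower hazard, hence higher ASCS levels); with a negative strength it is above 1.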

Having ordered the farm indices i according to increasing ASCS values $z_i$, an adapted Cox conditional likelihood for the model was defined as follows:

$$L^* = \prod_{i=1}^{n} \left[ \frac{\exp\left(\sum_{j=1}^{J} \beta_j W_j^{x_i} - \phi(\gamma, x_i)\right)}{\sum_{l \ge i} \exp\left(\sum_{j=1}^{J} \beta_j W_j^{x_l} - \phi(\gamma, x_l)\right)} \right]. \qquad (3)$$

With regard to statistical estimation and testing, the conditional likelihood $L^*$ asymptotically behaves as a true likelihood function under regularity assumptions. The β coefficients and the vector γ were then estimated by maximization of $L^*$. For the sake of simplicity, we did not develop formula (3) further analytically to maximize the conditional likelihood. We used the function "mle" of the "stats4" package (part of the R base source), which relies on robust and well-known approximation algorithms, and we chose the Nelder-Mead method. We used the Likelihood Ratio Statistic (LRS) to test whether the effects of the covariables were significant [14]. The LRS is the difference between the deviances $D = -2\log(L^*)$ of two nested models $M_i$ and $M_j$. If $M_j$ contained ν more parameters than $M_i$:

$$\mathrm{LRS} = D(M_i) - D(M_j) = -2\left(\log L^*(M_i) - \log L^*(M_j)\right) \sim \chi^2(\nu\ \mathrm{df}). \qquad (4)$$
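To make formula (3) concrete, the following sketch evaluates the logarithm of the conditional likelihood for given linear predictors. It is an illustration under stated assumptions rather than the authors' implementation (which used R): the function name is hypothetical, ties between ASCS values are ignored, and `eta[i]` stands for the quantity $\sum_j \beta_j W_j^{x_i} - \phi(\gamma, x_i)$ computed beforehand.

```python
import math

def log_conditional_likelihood(z, eta):
    """log L* of formula (3), assuming no tied values in z.

    z   : observed ASCS values, one per farm
    eta : linear predictor per farm (risk-factor part minus cluster effect)
    """
    order = sorted(range(len(z)), key=lambda i: z[i])
    log_lik = 0.0
    risk_sum = 0.0
    # Walk from the largest z down so that risk_sum always holds the sum of
    # exp(eta) over the farms still "at risk" (z_l >= z_i).
    for i in reversed(order):
        risk_sum += math.exp(eta[i])
        log_lik += eta[i] - math.log(risk_sum)
    return log_lik

# With all predictors equal, each factor is 1 / (size of the risk set).
flat = log_conditional_likelihood([3.1, 2.4, 3.8], [0.0, 0.0, 0.0])

# Two farms: the one with the lower ASCS has eta = 0, the other eta = log 2.
two = log_conditional_likelihood([1.0, 2.0], [0.0, math.log(2.0)])
```

The negative of this function is what a derivative-free optimizer such as Nelder-Mead would minimize over (β, γ); the deviance of formula (4) is then −2 times the maximized value.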

Starting with K = 0, we increased K progressively until the addition of a new cluster was no longer significant.

To visualize the results obtained by themodel, we mapped the estimated clusters.

The proportional hazards assumption was checked using the Schoenfeld residuals with the function "cox.zph" of the package "survival", as recommended in Hill et al. [14]. To confirm the findings of the survival model concerning the risk factors, we fitted a more classical model, the multiple linear regression (R function "lm"), and paid attention to the agreement of its results with the findings of the spatial hazard model.

2.2.3. Analysis of the model properties

Properties of this spatial hazard model were tested on simulated datasets. We analyzed the number of clusters detected and their position, and compared them to the number and position of the simulated clusters. We used a spatial domain of 1 by 1 with a homogeneous point process, and distributed marks according to the presence of 0 up to 3 attractive clusters in 200 simulated datasets.
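One possible way to generate such a simulated dataset is sketched below. The details of the authors' simulations are not given here, so everything in this example is an assumption: a fixed number of uniformly located points stands in for the homogeneous point process, the baseline hazard is taken constant (so that marks are exponential with rate $r_0 \exp(-\phi(\gamma, x))$), and the cluster parameters are hypothetical.

```python
import math
import random

def cluster_effect(x, y, clusters):
    """Sum of Gaussian kernels; each cluster is (alpha, rho, cx, cy)."""
    return sum(
        a / (2 * math.pi * r * r) * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * r * r))
        for a, r, cx, cy in clusters
    )

def simulate_marked_points(n, clusters, base_rate=1.0, seed=0):
    """Return n tuples (x, y, z) on the unit square.

    Assumes a constant baseline hazard base_rate, so that under the
    proportional hazard model the mark z at (x, y) is exponential with
    rate base_rate * exp(-phi(x, y)).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        rate = base_rate * math.exp(-cluster_effect(x, y, clusters))
        data.append((x, y, rng.expovariate(rate)))
    return data

# One hypothetical attractive cluster in the middle of the domain.
sim = simulate_marked_points(5000, [(0.05, 0.1, 0.5, 0.5)], seed=1)
near = [z for x, y, z in sim if math.hypot(x - 0.5, y - 0.5) < 0.1]
far = [z for x, y, z in sim if math.hypot(x - 0.5, y - 0.5) > 0.4]
```

Marks near the cluster center are larger on average than marks far from it, which is the pattern the model is expected to recover as an attractive cluster.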

2.2.4. Spatial scan statistic with normal model

The new version¹ of the software SaTScan™ allows cluster detection with a normal model, designed for continuous data. SaTScan™ can integrate some covariables, but it is recommended to use external regression software to adjust for quantitative variables. We first performed a linear regression (R function "lm") in order to take into account the risk factors of the disease (mean parity, percentage of calvings during the winter and spring period, and herd size). We then performed the scan statistic on the residuals of the linear model to focus on unexplained clustering and to allow comparison with the results of the spatial hazard model. We then mapped the detected clusters.

3. RESULTS

3.1. Descriptive analysis

Mean (sd) ASCS was 3.12 (0.55), while mean (sd) herd size was 39 cows (16), mean (sd) parity was 2.4 (0.3), and mean (sd) percentage of winter and spring calvings was 38% (16%).

The farm geographic distribution is illustrated in Figure 1, which represents the location of the farms studied in a gray background. The farm density was geographically non-homogeneous, and two areas had higher densities: (1) the northwest, which is the main dairy production area in France, with 61% of the total number of farms, and (2) the north tip, with 6% of the total number of farms. By contrast, the southeast (i.e. the Mediterranean area) had a very low farm density. The map of ASCS spatial intensity (Fig. 2) showed that the north-central area and the southwest had relatively high ASCS values of around 3.5.


Figure 1. Farm location (gray background) and clusters of high annual somatic cell score detected by spatial hazard modeling in the study sample of dairy herds in France (n = 34 142, year 2000). Cluster centres are marked; α: cluster strength; •: main cities close to the clusters; ○: cluster range (ρ).

The correlogram of ASCS (Fig. 3) showed a positive and non-negligible spatial correlation at distances under 150 km, with an approximately exponential form. Beyond this distance, the correlation could be considered constant around 0. The behavior of the correlogram near distance 0 pointed to a strong nugget effect (the autocorrelation would be 1 at a null distance in the absence of a nugget effect), i.e. the presence of a relatively high white noise (non-spatial variation) of about 70% of the total variability.

3.2. Spatial modeling of mastitis risk

The model with K = 3 ($M_{\beta,3}$) was selected since the presence of 3 clusters was highly significant, while the 4th cluster was not (Tab. I). The detailed results of estimations for this model are presented in Table II. The hazard of occurrence of the ASCS value of a farm was significantly decreased by increased mean parity (one additional parity divided the hazard by $e^{0.5749} = 1.78$), and thus this factor was positively associated with the occurrence of high ASCS. The percentage of winter and spring calvings was significant and, although the association was weak, it was positively associated with ASCS. The last risk factor, the herd size, was not significant. The multiple linear regression demonstrated a similar relationship for the 3 covariables (Tab. III).

Figure 2. Intensity of the annual somatic cell score in the study sample of dairy herds in France (n = 34 142, year 2000).

Figure 3. Correlogram of the annual somatic cell score in the study sample.

Table I. Tests of the different hazard models of the annual milk somatic cell score.

| Model | Number of variables | Deviance | LRS test | LRS value (df) | P value |
|---|---|---|---|---|---|
| $M_{0,0}$: no covariable | 0 | 644496.3 | | | |
| $M_{\beta,0}$: risk factors alone | 3 | 642493.2 | $M_{0,0}$ vs. $M_{\beta,0}$ | 2003.1 (3) | P < 0.001 |
| $M_{\beta,1}$: risk factors + 1 cluster | 7 | 641136.1 | $M_{\beta,0}$ vs. $M_{\beta,1}$ | 1357.1 (4) | P < 0.001 |
| $M_{\beta,2}$: risk factors + 2 clusters | 11 | 640719.8 | $M_{\beta,0}$ vs. $M_{\beta,2}$ | 1773.4 (8) | P < 0.001 |
| | | | $M_{\beta,1}$ vs. $M_{\beta,2}$ | 416.30 (4) | P < 0.001 |
| $M_{\beta,3}$: risk factors + 3 clusters | 15 | 640681.4 | $M_{\beta,0}$ vs. $M_{\beta,3}$ | 1811.8 (12) | P < 0.001 |
| | | | $M_{\beta,1}$ vs. $M_{\beta,3}$ | 454.7 (8) | P < 0.001 |
| | | | $M_{\beta,2}$ vs. $M_{\beta,3}$ | 38.4 (4) | P < 0.001 |
| $M_{\beta,4}$: risk factors + 4 clusters | 19 | 640675.3 | $M_{\beta,0}$ vs. $M_{\beta,4}$ | 1817.9 (16) | P < 0.001 |
| | | | $M_{\beta,1}$ vs. $M_{\beta,4}$ | 460.8 (12) | P < 0.001 |
| | | | $M_{\beta,2}$ vs. $M_{\beta,4}$ | 44.5 (8) | P < 0.001 |
| | | | $M_{\beta,3}$ vs. $M_{\beta,4}$ | 6.1 (4) | NS |

LRS: Likelihood Ratio Statistic; df: degrees of freedom; $M_{\beta,k}$: model with risk factors (β part) and k clusters.
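The stepwise choice of K can be retraced from the deviances reported in Table I. The sketch below is an illustration, not the authors' code: the 5% significance threshold is an assumption (the table only reports P < 0.001 or NS), and the chi-square tail probability uses the closed form that is valid for an even number of degrees of freedom, which covers the 4 parameters added per cluster.

```python
import math

# Deviances of the models with risk factors and K clusters (Table I).
DEVIANCE_BY_K = {0: 642493.2, 1: 641136.1, 2: 640719.8, 3: 640681.4, 4: 640675.3}
PARAMS_PER_CLUSTER = 4  # alpha, rho and two center coordinates

def chi2_sf_even(x, df):
    """P(chi2_df > x) for an even df, via the finite-sum closed form."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def select_k(deviances, threshold=0.05):
    """Increase K while adding one more cluster is significant."""
    k = 0
    while k + 1 in deviances:
        lrs = deviances[k] - deviances[k + 1]
        if chi2_sf_even(lrs, PARAMS_PER_CLUSTER) >= threshold:
            break
        k += 1
    return k

selected_k = select_k(DEVIANCE_BY_K)
p_fourth_cluster = chi2_sf_even(DEVIANCE_BY_K[3] - DEVIANCE_BY_K[4], PARAMS_PER_CLUSTER)
```

Under these assumptions the procedure stops at three clusters: the LRS of 6.1 on 4 df for the fourth cluster has a tail probability well above the threshold, while the LRS of 38.4 for the third is far below it.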

The three spatial clusters were attractive. The first one was detected in the northeast (Fig. 1), close to the city of Troyes. The second one spread in the center-west, around the city of Limoges. The third cluster was located in the southwest, close to the city of Tarbes.

3.3. Model properties under simulations

The detailed results of the simulation study are presented in Table IV. In 97% of the cases the right number of clusters was detected. Among those 97%, 5% detected an extra repulsive focus: when high values are concentrated in some areas, it can happen mechanically that low values are concentrated elsewhere as well, forming a repulsive focus. The mean distance between centers of detected and simulated clusters, i.e. the precision of localization, was 0.047 in the spatial domain of 1 by 1 unit.

3.4. Spatial scan statistic with a normal model

Specifying the upper limit for cluster size as a circular geographic region of radius 250 km, 6 significant clusters were detected with this method (Fig. 4).

4. DISCUSSION

4.1. Biological results

The results of the spatial hazard model concerning the introduced risk factors for ASCS were consistent with previously published results, indicating a significant association of parity and calving season with ASCS used as an indicator of subclinical mastitis. Increased mean parity increases the risk of high ASCS levels [18]; this can be due to the rise in persistence and intensity of mammary infections with parity. A high percentage of winter and spring calvings is a risk factor for high ASCS values, since weaker cow body condition and housing hygiene during this period increase the risk of subclinical mastitis [5]. Several studies highlighted that herd size was negatively associated with SCS [2, 23]. In this study, no significant effect was detected, but French herds are small (less than 0.7% of farms have more than 100 cows), and this may be a reason why the effect of herd size did not appear. Some other known risk factors for ASCS were not available in the present work. In particular, information on hygienic and milking conditions was lacking [3, 8]. Nevertheless, the method allows complementary variation factors to be easily integrated in the model if they become available later.

Table II. Spatial hazard model of the annual milk somatic cell score (K = 3 clusters).

| Parameter | Coefficient estimate | Standard deviation | exp(coef) | LRS (df) | P value |
|---|---|---|---|---|---|
| Mean parity | –0.5749 | 0.0176 | 0.56 | 723.30 (1) | P < 0.001 |
| Winter-spring calving | –0.0091 | 0.0004 | 0.99 | 651.70 (1) | P < 0.001 |
| Number of cows | 0.0321 | 0.0140 | 1.03 | 2.10 (1) | NS |
| Cluster 1: α | 0.0783 | 0.0149 | | | |
| Cluster 1: ρ | 0.1105 | 0.0127 | | | |
| Cluster 1: xc | 0.2462 | 0.0081 | | | |
| Cluster 1: yc | 0.0942 | 0.0107 | | | |
| Cluster 2: α | 0.2775 | 0.0373 | | | |
| Cluster 2: ρ | 0.2328 | 0.0126 | | | |
| Cluster 2: xc | 0.0444 | 0.0200 | | | |
| Cluster 2: yc | –0.1766 | 0.0248 | | | |
| Cluster 3: α | 0.1213 | 0.0217 | | | |
| Cluster 3: ρ | 0.1336 | 0.0104 | | | |
| Cluster 3: xc | –0.0597 | 0.0243 | | | |
| Cluster 3: yc | –0.5721 | 0.0257 | | | |

LRS: likelihood ratio statistic; df: degrees of freedom; α: cluster strength; ρ: cluster range; xc: cluster x coordinate; yc: cluster y coordinate.

Table III. Multiple linear regression model of the annual milk somatic cell score.

| Parameter | Coefficient estimate | Standard deviation | P value |
|---|---|---|---|
| Intercept | 2.136 | 0.023 | P < 0.001 |
| Mean parity | 0.3408 | 0.0094 | P < 0.001 |
| Winter-spring calving | 0.0042 | 0.0002 | P < 0.001 |
| Number of cows | 0.0119 | 0.0074 | NS |
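As a worked illustration of how the cluster estimates of Table II translate into hazard multipliers, the peak value of each Gaussian kernel, $\alpha_k / (2\pi\rho_k^2)$, can be evaluated at the cluster center. This sketch assumes that α and ρ are expressed in the rescaled coordinate units used to fit the model (the units are not stated here), and the figures are rounded.

```latex
\begin{align*}
\text{Cluster 1:}\quad & \frac{\alpha_1}{2\pi\rho_1^2} = \frac{0.0783}{2\pi\,(0.1105)^2} \approx 1.02,
  & \exp(-1.02) &\approx 0.36, \\
\text{Cluster 2:}\quad & \frac{\alpha_2}{2\pi\rho_2^2} = \frac{0.2775}{2\pi\,(0.2328)^2} \approx 0.81,
  & \exp(-0.81) &\approx 0.44, \\
\text{Cluster 3:}\quad & \frac{\alpha_3}{2\pi\rho_3^2} = \frac{0.1213}{2\pi\,(0.1336)^2} \approx 1.08,
  & \exp(-1.08) &\approx 0.34 .
\end{align*}
```

Under this reading, the hazard of occurrence of any ASCS value at a cluster center would be multiplied by roughly 0.34 to 0.44, to be compared with the factor 0.56 reported for one additional parity.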


Table IV. Results of the simulation process (spatial domain of 1 by 1 with a homogeneous point process, marks distributed according to the presence of 0 up to 3 "attractive" clusters).

| Number of clusters simulated | Number of simulations | Number of clusters detected | Number of simulations (%) |
|---|---|---|---|
| 0 | 50 | 0 | 50 (100) |
| 1 | 50 | 1 attractive | 43 (86) |
| | | 1 attractive + 1 repulsive | 4 (8) |
| | | 2 attractive | 3 (6) |
| 2 | 50 | 2 attractive | 47 (94) |
| | | 3 attractive | 3 (6) |
| 3 | 50 | 2 attractive | 1 (2) |
| | | 3 attractive | 44 (88) |
| | | 3 attractive + 1 repulsive | 6 (12) |

Figure 4. Farm location (gray background) and clusters of high annual somatic cell score detected by the spatial scan statistic in the study sample of dairy herds in France (n = 34 142, year 2000).

Concerning the second and original part of the model, the presence of 3 clusters was highly significant. The detected clusters, located in areas with low farm density (close to Troyes, Limoges and Tarbes), corresponded to regions mainly focused on bovine and ovine meat and on cereal production. This is consistent with specialization in dairy production being linked in France with lower ASCS [5]. Introducing the farm density as a covariable in the model could be a way to approximate this specialization factor. Nevertheless, a local analysis would be necessary to explain precisely the factors associated with the clusters identified, since only local staff could have accurate and relevant information on local events or singularities having influenced ASCS.

4.2. Comparison with the spatial scan statistic with normal model

The spatial scan statistic method identified six clusters. The two larger ones, in the southwest and the northeast, included approximately the same regions as the three clusters detected with our spatial hazard model, but the other four were different.

The number of clusters detected by the spatial scan is not determined in an objective way, and depends on the chosen upper limit for cluster size. In our dataset the clusters were located in regions with low farm density; consequently, the classical use of 50% of the population as the upper limit was not suitable. We had to try several upper limits and chose the one that seemed to give the best results. Indeed, by the very construction of their alternative hypothesis, scan tests cannot theoretically handle the issue of multiple cluster detection. Moreover, with the spatial scan statistic, the integration of covariables had to be done in a first step before cluster detection.

4.3. Spatial hazard model for cluster detection

The properties of this new model, tested via simulations, demonstrated good detection ability and precision of localization. The spatial hazard model we developed has several advantages. First, the method applies to continuous variables, a possibility of growing recent interest in cluster detection. Secondly, the model is adjusted for risk factors as in Klassen et al. [16], and takes into account the potential heterogeneity of the background population. It focuses on unexplained spatial singularities, which can be detected even in low density areas. Moreover, because the model is parametric, it allows comparative hypotheses to be tested on the two components, risk factors and cluster presence. Like the classical hazard model, this one can easily accommodate censored data. Finally, if the form of the baseline hazard function is available (e.g. exponential, log-Gaussian, Gamma type), a true parametric likelihood can be derived.

Conversely, the present drawback of this method is the need to fix the number of clusters a priori and then to test sub-models to retain only significant cluster components. As for true likelihood methods, one could bypass this issue by introducing an a priori parameterized probability distribution for the number K of cluster components. For example, if K is Poisson distributed with parameter λ, the corresponding log-likelihood in λ, β and γ is written:

$$\mathcal{L} = \sum_{k=0}^{\infty} I_{\{K=k\}} \left( \log L(\lambda, \beta, \gamma^{(k)}) - \lambda + k\log(\lambda) - \log(k!) \right).$$

Clearly, the maximum of $\mathcal{L}$ is achieved for one value k of K, and this simply amounts to penalizing the pseudo-likelihood by the term $-\lambda + k\log(\lambda) - \log(k!)$, in the way the Bayesian Information Criterion (BIC) or the Akaike Information Criterion (AIC) usually do.
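The effect of such a Poisson penalty can be explored with the deviances of Table I, using log L* = −D/2. The sketch below is purely illustrative: the two values of λ are arbitrary assumptions, the function names are hypothetical, and the penalty is applied to the conditional likelihood as a stand-in for a true likelihood.

```python
import math

# Deviances of the models with risk factors and K clusters (Table I).
DEVIANCE_BY_K = {0: 642493.2, 1: 641136.1, 2: 640719.8, 3: 640681.4, 4: 640675.3}

def penalized_loglik(k, deviance, lam):
    """log L* plus the Poisson log-prior term -lam + k log(lam) - log(k!)."""
    return -deviance / 2.0 - lam + k * math.log(lam) - math.lgamma(k + 1)

def best_k(deviances, lam):
    """Number of clusters maximizing the penalized log-likelihood."""
    return max(deviances, key=lambda k: penalized_loglik(k, deviances[k], lam))

k_for_lambda_1 = best_k(DEVIANCE_BY_K, 1.0)
k_for_lambda_01 = best_k(DEVIANCE_BY_K, 0.1)
```

Among the five fitted models, the retained number of clusters depends on the assumed λ: a prior mean of one cluster favors K = 4, whereas a stricter prior (λ = 0.1) favors K = 3, the model retained by the likelihood ratio tests. This sensitivity is a design consideration when replacing the stepwise tests by a prior on K.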

Another consequence of using a parametric model is the need for a definition of the form of the cluster. In this study we chose a standard Gaussian form, characterized by intensity and range parameters and a circular form, a choice which was convenient for the further interpretation of the parameters. However, other parametric forms of the cluster function could better fit the problem [29] if they were supported by particular epidemiological arguments. But without prior assumptions about the possible spread of the disease, a circular form seems to be a good default choice.

This method of cluster detection based on a spatialized hazard model allows combining two important fields of epidemiological studies: the classical analysis of risk factor effects, and the spatial analysis of the disease. Moreover, this method applies to continuous as well as discrete variables, and gives quantitative results.

The approach of the spatialized risk with a cluster component is generic; it is also intended to apply to other diseases and to classical survival models considering death or occurrence times of infection as the variable of interest.

ACKNOWLEDGEMENTS

The authors would like to thank Joël Chadœuf for his helpful advice and his support on statistics.


REFERENCES

[1] Abrial D., Calavas D., Lauvergne N.,Morignat E., Ducrot C., Descriptive spatialanalysis of BSE in western France, Vet. Res.(2003) 34:749–760.

[2] Allore H.G., Oltenacu P.A., Erb H.N., Effectsof season, herd size, and geographic re-gion on the composition and quality ofmilk in the northeast, J. Dairy Sci. (1997)80:3040–3049.

[3] Barkema H.W., Schukken Y.H., LamT.J.G.M., Beiboer M.L., Benedictus G.,Brand A., Management practices associatedwith low, medium, and high somatic cellcounts in bulk milk, J. Dairy Sci. (1998)81:1917–1927.

[4] Barnouin J., Chacornac J.P., Aissaoui C., El Idilbi N., Mazur A., Comment dépister les déséquilibres biologiques et les troubles de santé chez la vache laitière dans le cadre d’études écopathologiques? Vet. Res. (1994) 25:104–109.

[5] Barnouin J., Chassagne M., Bazin S., Boichard D., Management practices from questionnaire surveys in herds with very low somatic cell score through a national mastitis program in France, J. Dairy Sci. (2004) 87:3989–3999.

[6] Busato A., Trachsel P., Schallibaum M., Blum J.W., Udder health and risk factors for subclinical mastitis in organic dairy farms in Switzerland, Prev. Vet. Med. (2000) 44:205–220.

[7] Carpenter T.E., Methods to investigate spatial and temporal clustering in veterinary epidemiology, Prev. Vet. Med. (2001) 48:303–320.

[8] Chassagne M., Barnouin J., Le Guenic M., Expert assessment study of milking and hygiene practices characterizing very low somatic cell score herds in France, J. Dairy Sci. (2005) 88:1909–1916.

[9] Cressie N.A.C., Geostatistics, in: Barnett V., Bradley R., Fisher N., Hunter J., Kadane J., Kendall D. et al. (Eds.), Statistics for spatial data, John Wiley and Sons, New York, 1991, pp. 58–67.

[10] Doran R.J., Laffan S.W., Simulating the spatial dynamics of foot and mouth disease outbreaks in feral pigs and livestock in Queensland, Australia, using a susceptible-infected-recovered cellular automata model, Prev. Vet. Med. (2005) 70:133–152.

[11] Ely L.O., Smith J.W., Oleggini G.H., Regional production differences, J. Dairy Sci. (2003) 86:E28–E34.

[12] Gerbier G., Bacro J.N., Pouillot R., Durand B., Moutou F., Chadoeuf J., A point pattern model of the spread of foot-and-mouth disease, Prev. Vet. Med. (2002) 56:33–49.

[13] Harmon R.J., Physiology of mastitis and factors affecting somatic cell counts, J. Dairy Sci. (1994) 77:2103–2112.

[14] Hill C., Com-Nougué C., Kramar A., Moreau T., O’Quigley J., Senoussi R., Chastang C., Analyse statistique des données de survie, INSERM Médecine-Sciences Flammarion, Paris, 1990.

[15] Huang L., Kulldorff M., Gregorio D., A spatial scan statistic for survival data, Biometrics (2007) 63:109–118.

[16] Klassen A.C., Kulldorff M., Curriero F., Geographical clustering of prostate cancer grade and stage at diagnosis, before and after adjustment for risk factors, Int. J. Health Geogr. (2005) 4:1.

[17] Kulldorff M., A spatial scan statistic, Commun. Stat.-Theory Methods (1997) 26:1481–1496.

[18] Laevens H., Deluyker H., Schukken Y.H., De Meulemeester L., Vandermeersch R., De Muelenaere E., De Kruif A., Influence of parity and stage of lactation on the somatic cell count in bacteriologically negative dairy cows, J. Dairy Sci. (1997) 80:3219–3226.

[19] Lawson A.B., Cluster modelling of disease incidence via RJMCMC methods: a comparative evaluation, Stat. Med. (2000) 19:2361–2375.

[20] Marshall R.J., A review of methods for the statistical analysis of spatial patterns of disease, J. R. Stat. Soc. A (1991) 154:421–441.

[21] Norstrom M., Pfeiffer D.U., Jarp J., A space-time cluster investigation of an outbreak of acute respiratory disease in Norwegian cattle herds, Prev. Vet. Med. (1999) 47:107–119.

[22] Odoi A., Martin S.W., Michel P., Middleton D., Holt J., Wilson J., Investigation of clusters of giardiasis using GIS and a spatial scan statistic, Int. J. Health Geogr. (2004) 3:11.

[23] Oleggini G.H., Ely L.O., Smith J.W., Effect of region and herd size on dairy herd performance parameters, J. Dairy Sci. (2001) 84:1044–1050.


[24] Patil G.P., Taillie C., Geographic and network surveillance via Scan Statistics for critical area detection, Stat. Sci. (2003) 18:457–465.

[25] Ratziu V., Massard J., Charlotte F., Messous D., Imbert-Bismut F., Bonyhay L., Tahiri M., Munteanu M., Thabut D., Cadranel J.F., Le Bail B., de Ledinghen V., Poynard T., Diagnostic value of biochemical markers (FibroTest-FibroSURE) for the prediction of liver fibrosis in patients with non-alcoholic fatty liver disease, BMC Gastroenterol. (2006) 6:6.

[26] Romain H.T., Adesiyun A.A., Webb L.A., Lauckner F.B., Study on risk factors and their association with subclinical mastitis in lactating dairy cows in Trinidad, J. Vet. Med. B Infect. Dis. Vet. Public Health (2000) 47:257–271.

[27] Silverman B.W., The kernel method for univariate data, in: Cox D., Hinkley D., Rubin D., Silverman B. (Eds.), Density estimation for statistics and data analysis, Chapman and Hall, London, 1986, pp. 34–94.

[28] Stevenson M.A., Benard H., Bolger P., Morris R.S., Spatial epidemiology of the Asian honey bee mite (Varroa destructor) in the North Island of New Zealand, Prev. Vet. Med. (2005):241–252.

[29] Tango T., Takahashi K., A flexibly shaped spatial scan statistic for detecting clusters, Int. J. Health Geogr. (2005) 4:11.

[30] Ward M.P., Carpenter T.E., Techniques for analysis of disease clustering in space and in time in veterinary epidemiology, Prev. Vet. Med. (2000) 45:257–284.

[31] Yang G.J., Vounatsou P., Zhou X.N., Tanner M., Utzinger J., A Bayesian-based approach for spatio-temporal modeling of county level prevalence of Schistosoma japonicum infection in Jiangsu province, China, Int. J. Parasitol. (2005) 35:155–162.