Estimating distributions out of qualitative and (semi)quantitative microbiological contamination data for use in risk assessment
P. Busschaert a,b, A.H. Geeraerd a,c, M. Uyttendaele d, J.F. Van Impe a,b
a CPMF2, Flemish Cluster Predictive Microbiology in Foods
b Division of Chemical and Biochemical Process Technology and Control, Department of Chemical Engineering, Katholieke Universiteit Leuven, W. de Croylaan 46, B-3001 Leuven, Belgium
c Division of Mechatronics, Biostatistics and Sensors (MeBioS), Department of Biosystems (BIOSYST), Katholieke Universiteit Leuven, W. de Croylaan 42, B-3001 Leuven, Belgium
d Laboratory of Food Microbiology and Food Preservation, Department of Food Safety and Food Quality, Ghent University, Coupure Links 653, B-9000 Ghent, Belgium
Article info
Article history:
Received 10 July 2009
Received in revised form 7 January 2010
Accepted 17 January 2010
Keywords:
Quantitative microbiological risk assessment
Second order Monte Carlo simulation
Survival analysis
Maximum likelihood estimation (MLE)
Bootstrapping method
Abstract
A framework using maximum likelihood estimation (MLE) is used to fit a probability distribution to a set of
qualitative (e.g., absence in 25 g), semi-quantitative (e.g., presence in 25 g and absence in 1 g) and/or
quantitative test results (e.g., 10 CFU/g). Uncertainty about the parameters of the variability distribution is
characterized through a non-parametric bootstrapping method. The resulting distribution function can be
used as an input for a second order Monte Carlo simulation in quantitative risk assessment. As an illustration,
the method is applied to two sets of in silico generated data. It is demonstrated that correct interpretation of
data results in an accurate representation of the contamination level distribution. Subsequently, two case
studies are analyzed, namely (i) quantitative analyses of Campylobacter spp. in food samples with
nondetects, and (ii) combined quantitative, qualitative, semiquantitative analyses and nondetects of Listeria
monocytogenes in smoked fish samples. The first of these case studies is also used to illustrate what the
influence is of the limit of quantification, measurement error, and the number of samples included in the
data set. Application of these techniques offers a way for meta-analysis of the many relevant yet diverse data
sets that are available in literature and (inter)national reports of surveillance or baseline surveys, and therefore
increases the information input of a risk assessment and, by consequence, the correctness of the outcome of
the risk assessment. © 2010 Elsevier B.V. All rights reserved.
1. Introduction
As was stated by Oscar (2004), predictions of quantitative risk
assessment are only as good as the data used to develop and define
them. Microbiological contamination levels often are strongly
associated with the predicted risk (e.g., Pérez-Rodríguez et al.,
2007). However, since the collection of data generally is expensive
and labor intensive, only a limited amount of data is available in most
cases. Moreover, the character of this kind of data is frequently
affected by other difficulties, such as the low number of bacteria
present in a food sample compared to the limit of detection (LOD) or
limit of quantification (LOQ) of the method of analysis. Due to the
additional laboratory effort necessary to enumerate samples with
generally low numbers of bacteria present, the typically low
prevalence of positive samples, and regulatory requirements that do
not demand enumerative data, laboratories often limit their efforts to
qualitative or semiquantitative analyses. Because microbiological data
can consist of either qualitative (e.g., absence in 25 g), quantitative
(e.g., 10 CFU/g) or semiquantitative results (e.g., presence in 25 g and
absence in 1 g), or any combination of these, they can be hard to
summarize, especially in the case where multiple combinations of test
portion weights are used among different laboratory samples. The
latter is often the case in risk assessment, which uses as an input a
compilation of surveillance and research data from various sources
spread over multiple years (e.g., FAO/WHO, 2004; FDA/USDA/CDC,
2003). In FDA/USDA/CDC (2003), for example, positive presence/absence
tests were substituted by the limit of detection, and the
frequency of positive presence/absence tests together with this substitution
value was used as a quantile of the contamination distribution. The
variance, on the other hand, was determined by enumeration studies, so
that – based on a quantile from presence/absence tests substituted by the
LOD and a variance based on enumeration studies – a concentration
distribution could be estimated.
A maximum likelihood approach can be applied to deal with these
kinds of complex data sets. Helsel (2005, 2006) previously implemented
the maximum likelihood method to deal with measurement
values below a certain LOQ in environmental chemistry analyses.
International Journal of Food Microbiology 138 (2010) 260–269
Corresponding author. Division of Chemical and Biochemical Process Technology
and Control, Department of Chemical Engineering, Katholieke Universiteit, W. de
Croylaan 46, B-3001 Leuven, Belgium.
E-mail address: jan.vanimpe@cit.kuleuven.be (J.F. Van Impe).
URL: http://www.cpmf2.be
0168-1605/$ – see front matter © 2010 Elsevier B.V. All rights reserved.
doi:10.1016/j.ijfoodmicro.2010.01.025
Also, Shorten et al. (2006) and Lorimer and Kiermeier (2007)
suggested this method to deal with nondetects in microbiological
test results, as opposed to biased approaches such as substitution of
nondetects by arbitrary values, e.g., half of the LOQ. Application of this
method can be found in, e.g., Jordan (2005) and Pouillot et al. (2007),
where laboratory samples were counted that were previously shown
to be positive using a qualitative measurement method with higher
sensitivity.
While these authors focussed primarily on applying maximum likelihood estimation (MLE) to deal with quantitative data that are
censored at one side due to an LOQ, the same techniques could be
generalized to combine quantitative, semi-quantitative and qualitative
test results. The first objective of this research is therefore to
illustrate how MLE can be applied to represent complex data sets of
censored microbiological data with parametric distributions.
Moreover, although MLE allows the variation of contamination data
to be represented by means of a parametric distribution, additional
information with regard to variability and uncertainty could be
extracted from the available data as well. This is the second objective
of this research. Because uncertainty can be reduced by collection of
additional information, while variability is inherent to any biological
system, it is useful to know what proportion of the variation in general is
caused by variability and what proportion is caused by uncertainty. In
this case, uncertainty is represented by additional statistical distributions
– defined by hyperparameters – that describe the parameters of
the variability distribution of contamination. Once these distributions
have been constructed, the separation between variability and
uncertainty can be propagated in the course of the risk assessment by
the use of a second order Monte Carlo simulation.
In this research, the application of these techniques to microbiological
contamination data is explored. Also, the bias that originates when
alternative procedures are applied is examined. The methods are
illustrated with in silico – i.e., computer simulated – data in order to
investigate the performance of these techniques. Subsequently, two
case studies based on laboratory measurements are explored, namely a
data set of 656 quantitative measurements of Campylobacter in chicken
meat preparations (of which 59% are below the limit of quantification)
and a data set of 103 measurements of Listeria monocytogenes in smoked fish products, consisting of quantitative, semiquantitative and qualitative
measurements as well as nondetects.
2. Material and methods
2.1. Maximum likelihood estimation and bootstrap
In case of a negative presence/absence test, the concentration of
the pathogen tested for in the food sample is known to be less than the
limit of detection (LOD) of the analysis, although no exact value is
known. Also, when an enumeration method is applied to a food
sample and no colonies are detected, the concentration is known to be
less than the limit of quantification (LOQ). These values are said to be
left-censored. Analogously, a positive presence/absence test results in a right-censored outcome.
Maximum likelihood estimation is used to fit a distribution to a set
of censored data (Cox and Oakes, 1984; Helsel, 2005). A parametric
candidate distribution is assumed to represent the observed data, and
the MLE method estimates values for the parameters that are most
likely to have generated the observed measurements. Parametric
distributions are chosen because they are assumed to correctly
represent the data sets considered here; however, in the case of more
complex data sets, the risk assessor might also consider alternative,
more complex models. Zero-inflated Poisson models or other variants
might be relevant in the case of count data with many nondetects
(e.g., Ridout et al., 1998; Gonzales-Barron et al., 2010). In case of more
dispersed data, mixture models could be fitted as well; see for
example Crépet et al. (2007). However, it should be taken into
consideration that a mixture model should not be used to model the
heterogeneity of the specific data set (for example, when food samples
were taken from 2 food business operators, each with a different
contamination profile) when it is intended to be representative for a
more general situation (for example, representing food samples of all
food business operators). Other studies have implemented hurdle
models with separate distributions for prevalence (presence/absence)
and concentration (CFU/g) of pathogens in food samples (see e.g.
Pouillot et al., 2007; Pérez-Rodríguez et al., 2007). In this research, concentration (CFU/g) is modeled by a parametric distribution as
a hypothetical property of food samples; for example, a Poisson
distribution with the concentration times serving size as a rate
parameter could be implemented afterwards to model the contamination
in a specific serving as a non-negative integer.
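
As a sketch of the last step only, and assuming a normal distribution for the log10 concentration (as adopted for the case studies), the number of cells in one serving could be written as follows. The symbols C (log10 concentration in log10 CFU/g), w (serving size in g) and N (cells per serving) are introduced here for illustration and are not notation from the paper.

\[
C \sim \mathrm{Normal}(\mu, \sigma), \qquad N \mid C \sim \mathrm{Poisson}\left(10^{C} \cdot w\right)
\]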
In the case of quantitative results, the likelihood of obtaining these
results given a set of parameters θ is obtained by multiplication of all
values obtained by the probability density function p(·) corresponding
to each data point x_i given those parameters:

\[
l(X = \{x_1, \dots, x_n\} \mid \theta) = \prod_{i=1}^{n} p(x_i \mid \theta) \tag{1}
\]
In the case of censored data, the likelihood is given by multiplication of areas under the probability density function instead of single
values; hence the cumulative distribution function is used (Cox and
Oakes, 1984), as is depicted in Fig. 1. For computational convenience,
logarithmic values of the likelihood are considered.
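
The log-likelihood described above can be sketched in a few lines. This is an illustrative Python version, not the authors' R code; the function name `log_likelihood` and the convention of storing every result as a (lower, upper) pair in log10 CFU/g are assumptions made for this example, and a normal distribution is assumed.

```python
import math
from statistics import NormalDist

def log_likelihood(observations, mu, sigma):
    """Log-likelihood of a normal model for exact and censored observations.

    Each observation is a (lower, upper) pair in log10 CFU/g (assumed convention):
      (x, x)        exact (quantitative) value
      (-inf, loq)   left-censored result (nondetect)
      (lod, +inf)   right-censored result (positive presence/absence test)
      (a, b)        interval-censored (semi-quantitative) result
    """
    dist = NormalDist(mu, sigma)
    total = 0.0
    for lower, upper in observations:
        if lower == upper:
            part = dist.pdf(lower)                     # density at an exact value
        else:
            part = dist.cdf(upper) - dist.cdf(lower)   # area under the density
        if part <= 0.0:
            return -math.inf   # these parameters cannot explain this observation
        total += math.log(part)
    return total
```

Exact values contribute a density and censored values contribute a probability, so one function covers qualitative, semi-quantitative and quantitative results at once.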
The normal distribution has been chosen for the case studies
because it is generally assumed in food microbiology that contamination
is approximately log-normally distributed (Crépet et al., 2007;
Kilsby and Pugh, 1981; Legan et al., 2001), so that log10-transformed
concentrations are approximately normal. Ridout et al. (1998) also
proposed the Poisson distribution and a Poisson distribution with a
Gamma distributed rate parameter λ; however, it was shown that the
lognormal distribution fitted equally well to the data.
Because these parameters are based on inference from measurements
of food samples, which represent only a subset of the whole
population, uncertainty about these parameters due to the limited data set is considered by using the bootstrapping technique (Efron,
1982). In the parametric bootstrap method, a distribution is fitted to a
data set. Based on this distribution, B new samples (of the same
sample size as the original data set) are sampled from this single
distribution. For each of the B new samples, the parameter of interest
is estimated, and the distribution of all B estimated parameters
represents uncertainty about the estimate. The empirical bootstrap
method is based on a similar procedure, but instead of drawing B
samples from one fitted distribution, the original data set is resampled
with replacement B times and a distribution is fitted to each of the B
new samples.
Fig. 1. Illustration of the likelihood of quantitative data (left) versus interval-censored
data (right).

To include censored data points, the empirical bootstrap approach is
chosen (as was done, for example, by Zhao and Frey, 2004). For a number of
B = 1000 iterations, a bootstrap sample (i.e., sampling with replacement)
is taken from the original censored data set of measurements. Parameters
for that bootstrap sample are determined using MLE, and quantiles of the
obtained distribution are stored in order to express a confidence interval
later on.
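
The empirical bootstrap loop above can be sketched as follows. This is a minimal Python illustration, not the authors' R implementation (which uses the survival package): the compass search in `fit_normal` is a simple stand-in optimizer chosen to keep the example dependency-free, and all function names and the (lower, upper) data convention are assumptions.

```python
import math
import random
from statistics import NormalDist

def log_likelihood(observations, mu, sigma):
    """Normal log-likelihood; each observation is a (lower, upper) pair in log10 CFU/g."""
    dist = NormalDist(mu, sigma)
    total = 0.0
    for lower, upper in observations:
        # Exact value: density. Censored value: probability mass of the interval.
        part = dist.pdf(lower) if lower == upper else dist.cdf(upper) - dist.cdf(lower)
        if part <= 0.0:
            return -math.inf
        total += math.log(part)
    return total

def fit_normal(observations, tolerance=1e-4):
    """Approximate MLE of (mu, sigma) by a simple compass search (illustrative only)."""
    finite = [v for pair in observations for v in pair if math.isfinite(v)]
    mu, sigma = sum(finite) / len(finite), 1.0     # crude starting values
    best = log_likelihood(observations, mu, sigma)
    step = 1.0
    while step > tolerance:
        improved = False
        for d_mu, d_sigma in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            new_mu, new_sigma = mu + d_mu, sigma + d_sigma
            if new_sigma <= 0.0:
                continue                            # keep the standard deviation positive
            value = log_likelihood(observations, new_mu, new_sigma)
            if value > best:
                mu, sigma, best, improved = new_mu, new_sigma, value, True
        if not improved:
            step /= 2.0                             # refine once no neighbour is better
    return mu, sigma

def bootstrap_fits(observations, iterations=1000, seed=None):
    """Refit the model to `iterations` resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(observations)
    return [fit_normal(rng.choices(observations, k=n)) for _ in range(iterations)]
```

Each element of the returned list is one (mean, standard deviation) pair; together they form the joint uncertainty distribution of the parameters. A dedicated optimizer would be preferable in practice, since the compass search is slow and its starting values are crude.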
The joint probability distribution obtained from the bootstrap
samples is used to express uncertainty about the parameters due to
the limited number of samples. Whenever the original data set is
available to the risk assessor, it is possible for the risk assessor to
generate bootstrap samples, so that the obtained values can be used directly as input values of a two-dimensional Monte Carlo simulation.
However, this is not always the case, for example in the case of large
data sets. Therefore, to enable communication of parameter uncertainty,
a parametric distribution that fits best to the bootstrap
estimates is chosen for each parameter, based on visual comparison
with the kernel density plots, quantile-quantile plots, and the χ2 and
Anderson-Darling goodness-of-fit tests. Distributions that could
generate implausible samples (such as negative values for a standard
deviation) should be avoided or truncated. In order to examine the
correlation between parameters among bootstrap samples, scatter
plots, the Pearson correlation coefficient and the Spearman rank order
correlation coefficient are evaluated.
To be able to visually represent the combined variability and
uncertainty, the 95% confidence intervals of all individual quantiles of
the variability distribution are expressed. Among different bootstrap
iterations, the value of each quantile of the variability distribution will
vary. For all quantiles of the variability distribution, an uncertainty
distribution is obtained, of which the 2.5, 50 and 97.5 percentiles (over
all bootstrap iterations) are calculated. For all cumulative density
plots that will follow, the reading key is as follows. The median value
of all quantiles of the variability distribution is indicated with a black
line; the interval between the 2.5 and 97.5 percentiles is indicated with
a grey area. A large grey area in horizontal direction indicates large
uncertainty about the value of the respective quantiles of the variability
distribution, i.e., the values of the quantiles differ considerably among
different bootstrap iterations. A small grey area indicates that the value
of each quantile does not change much between different bootstrap
iterations; hence uncertainty is small. The degree of variability is
indicated by the interval the median line spans from the lower quantiles to the higher quantiles.
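
The reading key can be turned into numbers with a short sketch. It assumes that the bootstrap results are available as a list of (mean, standard deviation) pairs of a normal variability distribution; the names `percentile` and `quantile_band` are hypothetical, and the linear-interpolation percentile is one common convention, not necessarily the one used in the paper.

```python
from statistics import NormalDist

def percentile(sorted_values, q):
    """Percentile q (0-100) of pre-sorted values, using linear interpolation."""
    position = (len(sorted_values) - 1) * q / 100.0
    low = int(position)
    high = min(low + 1, len(sorted_values) - 1)
    fraction = position - low
    return sorted_values[low] + fraction * (sorted_values[high] - sorted_values[low])

def quantile_band(fits, probabilities):
    """For each variability quantile p, summarise its uncertainty over all fits.

    Returns (p, 2.5th percentile, median, 97.5th percentile) per probability:
    the median traces the black line, the outer values bound the grey area.
    """
    band = []
    for p in probabilities:
        values = sorted(NormalDist(mu, sigma).inv_cdf(p) for mu, sigma in fits)
        band.append((p, percentile(values, 2.5), percentile(values, 50.0),
                     percentile(values, 97.5)))
    return band
```

Plotting the second and fourth elements against p gives the grey band, and the third element gives the median line.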
The code for these simulations is written in R (R Development Core
Team, 2009), using functions from the survival package for MLE
procedures, and can be obtained from the authors upon request.
Although not used in the present study, at the time of writing a
package fitdistrplus appeared which can be used for the same purpose
(Delignette-Muller et al., 2008).
2.2. Data sets
For illustrative purposes, a set of left-censored data is pseudorandomly
generated in silico in order to simulate quantitative
measurements with a single LOQ below which no values can be
measured. To begin with, one random sample of 100 data points is pseudorandomly generated from a normal distribution with mean
μ = 0 log10 CFU/g and standard deviation σ = 2 log10 CFU/g. An LOQ
is chosen at the 40th percentile of the normal distribution and the
data set is censored, so that approximately 40% of the data set will fall
below that LOQ and hence will be regarded as nondetects. For this
specific data set, 40 out of 100 data points are regarded as below the
LOQ after censoring.
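
A data set of this kind can be generated along the following lines. The sketch is in Python rather than the authors' R; the seed is arbitrary, so the number of nondetects will generally differ from the 40 reported for the paper's own sample, and the (lower, upper) representation is an assumption of this example.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(2010)                      # arbitrary seed, for repeatability only
true_dist = NormalDist(mu=0.0, sigma=2.0)      # log10 CFU/g
loq = true_dist.inv_cdf(0.40)                  # 40th percentile of the true distribution

# Draw 100 values, then hide everything below the LOQ behind a left-censored interval.
values = [rng.gauss(0.0, 2.0) for _ in range(100)]
censored = [(v, v) if v >= loq else (-math.inf, loq) for v in values]

n_nondetects = sum(1 for lower, upper in censored if lower != upper)
print(f"LOQ = {loq:.3f} log10 CFU/g, nondetects = {n_nondetects} of {len(values)}")
```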
Instead of determining the number of CFU of a pathogen per gram
of a food sample, a portion of 25 g, for example, could be examined for
the mere presence of a micro-organism. A negative test result
indicates a concentration of less than (a hypothetical) 0.04 cells per
gram of that food sample. If the test result is positive, a smaller portion
of that same laboratory sample, stored at a temperature not allowing
for growth, e.g., a portion of 1 g, could be examined again for
presence/absence. If a homogeneous distribution of cells among test
portions is assumed, semi-quantitative results are obtained this way.
A positive 25 g portion and a negative 1 g portion would indicate, for
example, a bacterial concentration of between 0.04 and 1 CFU/g. This
outcome is said to be interval-censored.
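
The translation from test portion weights to concentration bounds can be sketched as below, under the same simplifying assumption as in the text (one cell in a portion of w grams corresponds to 1/w CFU/g). The function name `concentration_bounds` and its argument layout are hypothetical.

```python
import math

def concentration_bounds(positive_portions_g, negative_portions_g):
    """Bounds (CFU/g) implied by presence/absence tests on portions of given weight.

    A positive test on w grams implies at least 1/w CFU/g;
    a negative test on w grams implies fewer than 1/w CFU/g.
    """
    lower = max((1.0 / w for w in positive_portions_g), default=0.0)
    upper = min((1.0 / w for w in negative_portions_g), default=math.inf)
    return lower, upper

# Positive in 25 g and negative in 1 g: between 0.04 and 1 CFU/g.
print(concentration_bounds([25.0], [1.0]))
```

A lower bound of 0 CFU/g corresponds to minus infinity on the log10 scale, which is how a left-censored result would enter the likelihood.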
A similar situation is simulated in the second illustration. The same
original – uncensored – data set of 100 pseudorandom values, generated
from a normal distribution with mean μ = 0 log10 CFU/g and standard
deviation σ = 2 log10 CFU/g, is used for this second illustration. Two detection limits are chosen so that the first one (LOD1) is situated at the
60th percentile of the distribution and the second one (LOD2) at the
80th percentile. Subsequently, the data are transformed into purely
semiquantitative data, i.e., each data point is reduced to either smaller
than the first LOD, between the first and second LOD, or greater than the
second LOD.
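
The transformation into purely semiquantitative data can be illustrated as follows. The function name `to_semiquantitative` and the (lower, upper) output convention are assumptions of this sketch; the treatment of a value exactly equal to a detection limit is also a choice made here, not stated in the paper.

```python
import math
from statistics import NormalDist

true_dist = NormalDist(mu=0.0, sigma=2.0)   # log10 CFU/g
lod1 = true_dist.inv_cdf(0.60)              # first detection limit, 60th percentile
lod2 = true_dist.inv_cdf(0.80)              # second detection limit, 80th percentile

def to_semiquantitative(value, lod1, lod2):
    """Reduce an exact log10 concentration to the interval the two tests reveal."""
    if value < lod1:
        return (-math.inf, lod1)   # first test negative
    if value < lod2:
        return (lod1, lod2)        # first test positive, second test negative
    return (lod2, math.inf)        # both tests positive
```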
As a first case study based on real microbiological measurements,
laboratory analyses of Campylobacter in chicken meat preparations at
the Belgian retail market are analyzed (Habib et al., 2008a). The data
set consists of direct plating results using the ISO (2006) standard
method with a reduced limit of quantification. By plating one milliliter
of the primary 10-fold diluted suspension of the chicken meat sample
on three modified cefoperazone charcoal deoxycholate agar (mCCDA;
Oxoid, Basingstoke, England, UK) spread plates of 90 mm diameter, a
limit of quantification of 10 CFU/g is obtained instead of the usual LOQ
of 100 CFU/g. In 387 out of 656 measurements (59%), the result is left-censored
due to the LOQ.
This first case study is also used to examine the influence of a number
of conditions. In order to check the effect of using a reduced LOQ,
all values below 100 CFU/g (the standard LOQ of the Campylobacter
enumeration method) were assumed to be censored and hence
regarded as nondetects. To determine if measurement error (assumed
to be ±0.5 log10 units) (Habib et al., 2008b) would have a significant
impact on the resulting distributions, another simulation is run with all
quantitative data points x_i replaced by the interval [x_i − 0.5, x_i + 0.5]
log10 CFU/g; in other words, x_i no longer has its quantitative value but
is assumed to be interval-censored. By doing this, it is stated that, due to
measurement error, the real value of measurement x_i is known only to
be within the interval [x_i − 0.5, x_i + 0.5]. Finally, to illustrate what the impact of the size of the data set is, the simulation is also conducted
with only half the number of data points. For that purpose, 328 data
points are pseudorandomly sampled with replacement from the
original data set.
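
The three modifications of the data set can be expressed as small transformations on a list of (lower, upper) pairs in log10 CFU/g. This representation and the function names are assumptions of the sketch; in log10 units the LOQs of 10 and 100 CFU/g correspond to 1.0 and 2.0.

```python
import math
import random

def raise_loq(observations, new_loq):
    """Regard every result lying below `new_loq` as a nondetect at that LOQ."""
    return [(-math.inf, new_loq) if upper < new_loq else (lower, upper)
            for lower, upper in observations]

def add_measurement_error(observations, half_width=0.5):
    """Replace each exact value x by the interval [x - half_width, x + half_width]."""
    return [(lower - half_width, upper + half_width) if lower == upper else (lower, upper)
            for lower, upper in observations]

def subsample(observations, size, rng):
    """Draw `size` observations with replacement, e.g. size=328 for half the data."""
    return rng.choices(observations, k=size)

# Tiny made-up example: one nondetect at LOQ 1.0 and two quantitative results.
example = [(-math.inf, 1.0), (1.5, 1.5), (2.5, 2.5)]
print(raise_loq(example, 2.0))
print(add_measurement_error(example))
print(subsample(example, 2, random.Random(0)))
```

Each transformed list can then be passed to the same fitting and bootstrap procedure as the original data.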
A set of 103 laboratory samples of smoked fish on the Belgian retail
market is used as a second case study (Table 1). The laboratory samples
were analyzed in the period 2005-2007 for a number of food business
operators (Uyttendaele et al., 2009). In most cases, a test portion of
25 g is analyzed qualitatively for the presence of L. monocytogenes
according to the AFNOR validated VIDAS LMO method (Bio-12/9-07/02).
In case of a positive test result, either a smaller test portion of the
same laboratory sample is analyzed (e.g., presence/absence testing per
0.01 g) or a test portion of the same laboratory sample is enumerated
by plating on ALOA (Fiers, Belgium) according to the ISO (1998/Amd. 1,
2004) standard method using a reduced limit of quantification (thus
LOQ 10 CFU/g). In case of execution of quantitative tests, most of the
outcomes are below this LOQ, which again results in semiquantitative
data. For a number of samples, other sample weights were tested
according to the demands of the food business operators, hence
resulting in a rather complex data set.

Table 1
Overview of contamination data of L. monocytogenes in smoked fish on the Belgian
retail market in the period 2005-2007.

Number of samples    Concentration (CFU/g)
54                   <0.04
2                    <100
26                   0.04–10
1                    15
8                    0.04–100
2                    >100
1                    <1
1                    >1
7                    0.04–1
1                    1–100
3. Results and discussion of the in silico generated data
3.1. Illustration 1: left-censored quantitative data
A data set is generated in silico and left-censored, representing
quantitative measurements in log10 CFU/g. A total of 100 data points
are pseudorandomly drawn from a normal distribution with mean μ
equal to 0 and standard deviation σ equal to 2 (see Fig. 2). The (lower)
LOQ is set to −0.507 log10 CFU/g, which is the 40th percentile of a
cumulative normal distribution function with μ = 0 and σ = 2. After
censoring, 40 out of 100 data points are regarded as below the LOQ.
Using MLE, a normal distribution is fitted to these censored data,
with fitted mean μ = −0.05 log10 CFU/g and fitted standard deviation
σ = 1.88 log10 CFU/g. The sample mean of the data set when all
nondetects would have been substituted with the LOQ is 0.49 log10
CFU/g, or 0.29 log10 CFU/g when nondetects would have been
substituted by half of the (logarithmic) LOQ (see Table 2). For
comparison, the mean of the original sampled data – before any
censoring algorithm is applied, and hence the true sample mean – is
0.05 log10 CFU/g. This illustrates the major bias that originates when
alternative practices, such as substitution of nondetects by the LOQ, are
applied to data sets. As is illustrated by Lorimer and Kiermeier (2007),
this has been the case frequently in past research. If nondetects would
have been ignored, the estimated mean is 1.15 log10 CFU/g. This
approach is often intended to model only the positive subpopulation,
along with an additional variable for prevalence; however, this
implies that the sensitivity of the detection method is ignored – e.g.,
when a counting method has an LOQ of 10 CFU/g – and that these
results cannot be transferred to other portion sizes than the measured
one. The true sample standard deviation is 1.77 log10 CFU/g. The values of the standard deviations are estimated to be lower in the case
of substitution methods, because these methods push low data points
(nondetects) towards the center of the sample distribution.
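
The direction of these biases can be checked on any simulated sample. The sketch below uses an arbitrary seed, so its numbers will not match those reported above; only the ordering of the estimates is the point. Variable names are chosen for this example.

```python
import random
from statistics import NormalDist, mean, stdev

rng = random.Random(7)                          # arbitrary seed
loq = NormalDist(0.0, 2.0).inv_cdf(0.40)        # about -0.507 log10 CFU/g
values = [rng.gauss(0.0, 2.0) for _ in range(100)]

detected = [v for v in values if v >= loq]              # nondetects ignored
substituted = [v if v >= loq else loq for v in values]  # nondetects replaced by the LOQ

true_mean = mean(values)
mean_substituted = mean(substituted)
mean_detects_only = mean(detected)
sd_true = stdev(values)
sd_substituted = stdev(substituted)

print(f"true mean          {true_mean:6.2f}   true sd        {sd_true:5.2f}")
print(f"LOQ substitution   {mean_substituted:6.2f}   substituted sd {sd_substituted:5.2f}")
print(f"detects only       {mean_detects_only:6.2f}")
```

Substitution by the LOQ can only raise the mean and shrink the standard deviation, and discarding nondetects raises the mean further, which is the pattern described in the text.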
Subsequently, non-parametric bootstrapping is applied to estimate
the uncertainty distributions for the mean and standard deviation of the
estimated concentration. The original censored data set is resampled
with replacement for B = 1000 bootstrap iterations, and each time a
normal distribution is fitted to the bootstrap sample using MLE. From
these fitted distributions, quantiles are calculated and stored, and the
95% confidence interval for each quantile is subsequently determined
from the variation within bootstraps. The result is plotted in Fig. 3a.
Distributions are fitted to the bootstrap statistics to estimate
hyperparameters. The bootstrap means μ are described by a normal
distribution with mean μ_μ = −0.05 log10 CFU/g and standard
deviation σ_μ = 0.21 log10 CFU/g. The standard deviation of the data sample is represented by a gamma distribution with shape parameter
α_σ = 99.4 and scale parameter β_σ = 1.87·10^−2 (see Fig. 3c and d).
As opposed to the next illustration, the bootstrap means and
standard deviations – belonging to a two-dimensional space – show no
obvious correlation (see Fig. 3b). The Pearson correlation coefficient
between both equals −0.319 and the Spearman rank order correlation
coefficient −0.302.
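
To show how such hyperparameters could feed a second order Monte Carlo simulation, the following sketch uses the values reported for Illustration 1 and, purely as a simplifying assumption for this example, samples the mean and the standard deviation independently. The function name and loop sizes are hypothetical.

```python
import random

def second_order_monte_carlo(n_uncertainty=100, n_variability=1000, seed=None):
    """Two nested loops: uncertainty (outer) and variability (inner).

    Hyperparameters are those reported for Illustration 1; sampling the two
    parameters independently is an assumption made only for this sketch.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_uncertainty):
        mu = rng.gauss(-0.05, 0.21)                 # mean ~ Normal(mu_mu, sigma_mu)
        sigma = rng.gammavariate(99.4, 1.87e-2)     # sd ~ Gamma(shape, scale)
        # Inner loop: contamination levels (log10 CFU/g) for this parameter set.
        results.append([rng.gauss(mu, sigma) for _ in range(n_variability)])
    return results

simulations = second_order_monte_carlo(n_uncertainty=50, n_variability=200, seed=1)
print(len(simulations), "uncertainty iterations of", len(simulations[0]), "draws each")
```

Each inner list is one possible variability distribution; the spread between the lists reflects parameter uncertainty.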
3.2. Illustration 2: semiquantitative data
The same original data set of Illustration 1, generated from a
normal distribution with mean μ equal to 0 and standard deviation σ
equal to 2, is transformed to represent semiquantitative measurements
in log10 CFU/g. The first limit of detection, LOD1, is set to 0.507
log10 CFU/g and the second limit of detection, LOD2, to 1.68 log10 CFU/g.
In this data set, 64 data points fall below the first LOD and hence
would be noted as negative test results. Eighteen data points fall
above the second LOD, i.e., 18 laboratory samples would be positive
for both the first and the second presence/absence test. All other data
points fall between the two limits of detection and hence represent
laboratory samples of which the first presence/absence test is positive
and the second one negative. This is visualized in Fig. 4.
Using MLE, a normal distribution with mean μ = −0.25 log10 CFU/g
and standard deviation σ = 2.11 log10 CFU/g was fitted to the
censored data set. For comparison, the mean and standard
deviation of the original data set, that is, before the censoring
algorithm is applied to it, are respectively 0.05 log10 CFU/g and 1.77
log10 CFU/g. The fitted distribution (after censoring) resembles the
original sample distribution (before censoring) remarkably well, especially considering the fact that the information is reduced rather
drastically compared to purely quantitative measurements. The fact
that random data were initially generated from a normal distribution,
which corresponds to the distribution assumed for maximum
likelihood estimation, naturally contributes to the good results.
Bootstrap samples are subsequently generated to determine the
uncertainty of the parameters of the distribution. The 95% confidence
interval is plotted in Fig. 5a. As can be seen, uncertainty increases for
the lower concentrations, which can be explained by the fact that
approximately 60% of measurements are treated as undetected without
further information. The means of the bootstrap samples are fitted to a
normal distribution with mean μ_μ = −0.30 log10 CFU/g and standard
deviation σ_μ = 0.42 log10 CFU/g. For comparison, in the uncensored
case, the standard error of the mean would be estimated to be
$1.77/\sqrt{100} = 0.177$ according to the central limit theorem. This
illustrates the increase of uncertainty due to censoring. The standard
deviation is fitted to a gamma distribution with shape parameter
α_σ = 18.6 and scale parameter β_σ = 1.17·10^−1.
Fig. 2. Histogram of the in silico generated sample data points of the first illustration,
with the vertical line indicating the lower LOD beneath which all data points are
to be censored.

Table 2
Overview, for all illustrations and case studies, of the results of the maximum likelihood estimation and of the mean and standard deviation calculated with substitution
methods (all units in log10 CFU/g).

Data set                   True          Sample      MLE fitted    Substitution of nondetects
                           distribution  parameters  distribution  by 1/2 LOD  by LOD     ignoring nondetects
Illustration 1:            μ = 0         x̄ = 0.05    μ = −0.05     μ = 0.29    μ = 0.49   μ = 1.15
  quantitative data        σ = 2         s = 1.77    σ = 1.88      σ = 1.44    σ = 1.27   σ = 1.25
Illustration 2:            μ = 0         x̄ = 0.05    μ = −0.25     -           -          -
  semi-quantitative data   σ = 2         s = 1.77    σ = 2.11
Case study 1:              unknown       unknown     μ = 0.73      μ = 1.10    μ = 1.28   μ = 1.68
  Campylobacter data                                 σ = 1.03      σ = 0.64    σ = 0.53   σ = 0.65
Case study 1a:             unknown       unknown     μ = 0.46      μ = 1.16    μ = 2.06   μ = 2.58
  increased LOD                                      σ = 1.22      σ = 0.51    σ = 0.24   σ = 0.50
Case study 1b:             unknown       unknown     μ = 0.72      -           -          -
  measurement error                                  σ = 0.99
Case study 1c:             unknown       unknown     μ = 0.63      μ = 1.07    μ = 1.26   μ = 1.69
  reduced data set                                   σ = 1.07      σ = 0.62    σ = 0.51   σ = 0.64
Case study 2:              unknown       unknown     μ = −1.58     -           -          -
  L. monocytogenes data                              σ = 1.54

Fig. 3. Illustration 1: (a) plot of the 95% confidence interval of the log-normal distribution fitted to the set of left-censored quantitative measurements, (b) scatterplot of the bootstrap
sample means versus standard deviations, and kernel density plot of respectively the means (c) and standard deviations (d) of the bootstrap samples (grey) with the fitted normal and
gamma distributions plotted on top of it (black).

The mean and standard deviation belong to a two-dimensional
parameter space and are generally not to be considered as
independent. If a scatterplot of the fitted mean versus the fitted
standard deviation of each bootstrap sample is examined, it is clear
that a correlation has arisen between both due to the large amount of
values censored below the first LOD. For this particular illustration,
this situation can be explained as follows. When many data points
below the lower limit of detection are selected in a bootstrap sample,
the fitted bootstrap mean is lower and the bootstrap standard
deviation, on the contrary, is estimated higher, which induces a (negative)
correlation between both parameters. This is depicted in Fig. 5b. The
Pearson correlation coefficient between them equals −0.733 and
the Spearman rank order correlation coefficient −0.677. In case the
original data set is available to the risk assessor, the results of the
bootstrap method can be used directly as input values of a two-dimensional
Monte Carlo simulation. However, when only distributions
representing parameter uncertainty are communicated (for example in
the case of data sets too large to include in a report), it is better not to
draw random samples for the mean and standard deviation independently
of one another in a 2D Monte Carlo simulation for risk
assessment, because this would incorrectly influence the representation
of uncertainty intervals for the lower percentiles of the final distribution.
A number of solutions exist to this issue. One could, for example, use
copulas or apply the Iman-Conover method for correlated sampling
from two distributions (Haas et al., 1999; Iman and Conover, 1982).
Here, as an alternative solution, the standard deviation is modelled as a
linear function of the mean with addition of an error term, similarly to
what was done by Calistri and Giovannini (2008). Based on the scatterplot, it
is found reasonable to assume that the mean and standard deviation are
related approximately linearly and that the error term remains of the same
magnitude over the range of the mean. For other case studies, nonlinear
regression could be applied as well.
The standard deviation is formulated as a linear function of the
mean:

\[
\sigma = \beta_0 + \beta_1 \cdot \mu + \varepsilon \tag{2}
\]

with β_0 and β_1 being respectively the intercept and slope of the linear
relation and ε an error term. In this case, the following equation is obtained
by performing linear regression followed by assessment of the
residual values:

\[
\sigma = 1.91 - 0.888 \cdot \mu + \mathrm{Norm}(\mu = 0, \sigma = 0.344) \tag{3}
\]
For comparison, the results obtained by bootstrapping and the results obtained by this linear model are plotted on top of each other; see Fig. 5c.
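As a minimal sketch of how the linear relation in Eq. (3) could be used to generate a standard deviation for a given mean, the following Python fragment may help. It is not the authors' R code; the coefficient values are those read from Eq. (3), and the function name and the example means are hypothetical. Draws that come out negative would still need to be rejected or truncated in practice.

```python
import random

# Coefficients as read from Eq. (3) in the text.
BETA0 = 1.91      # intercept
BETA1 = -0.888    # slope
RESID_SD = 0.344  # standard deviation of the normal error term

def sample_sigma(mu, rng):
    """Draw one standard deviation for a given mean via the linear model."""
    return BETA0 + BETA1 * mu + rng.gauss(0.0, RESID_SD)

rng = random.Random(1)
# Hypothetical bootstrap means, for illustration only (not values from the paper).
example_means = [-0.2, 0.0, 0.3]
pairs = [(mu, sample_sigma(mu, rng)) for mu in example_means]
```

Sampling the standard deviation conditionally on the mean in this way preserves the correlation between the two parameters without requiring a copula.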
4. Results for real food product data
4.1. Case study 1: Campylobacter in chicken meat preparations
In this first case study, the results of Campylobacter analyses in chicken meat preparations are evaluated. The data set comprises quantitative analysis results with an LOQ of 10 CFU/g. In 387 of the 656 measurements (59%), the result is left-censored due to the LOQ. Using MLE, the logarithms of the censored data have been fitted to a normal distribution with mean μ = 0.73 log10 CFU/g and standard deviation σ = 1.03 log10 CFU/g.
For comparison, if the nondetects had been substituted by half of the LOQ, the fitted distribution would have been a normal distribution with mean μ = 1.10 log10 CFU/g and standard deviation σ = 0.64 log10 CFU/g (see Table 2).
After bootstrapping, the mean and standard deviation are represented by a normal distribution and a gamma distribution, respectively. The means of the bootstrap samples are fitted to a normal distribution with hyperparameters mean μ_μ = 0.73 log10 CFU/g and standard deviation σ_μ = 0.06 log10 CFU/g. The standard deviation is fitted by a gamma distribution with shape parameter α_σ = 304 and scale parameter β_σ = 3.39·10^−3.
The resulting distribution with its 95% confidence interval is shown in Fig. 6a. As can be seen, uncertainty about the distribution parameters is rather small compared to variability, as could be expected given the large data set and the many remaining quantitative, non-censored data points.
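To indicate how such hyperparameters could feed a second order Monte Carlo simulation, a minimal Python sketch follows. It is an illustration under stated assumptions, not the authors' implementation: the function name is hypothetical, and the mean and standard deviation are sampled independently here, which is a simplification that should be checked against the bootstrap scatter plot.

```python
import random

# Hyperparameters reported for case study 1 (log10 CFU/g).
MU_MEAN, MU_SD = 0.73, 0.06        # normal distribution for the mean
SD_SHAPE, SD_SCALE = 304, 3.39e-3  # gamma distribution for the standard deviation

def second_order_mc(n_uncertainty, n_variability, seed=0):
    """Outer loop: parameter uncertainty. Inner loop: variability."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_uncertainty):
        # Assumption: mean and standard deviation are drawn independently.
        mu = rng.gauss(MU_MEAN, MU_SD)
        sigma = rng.gammavariate(SD_SHAPE, SD_SCALE)  # second argument is the scale
        results.append([rng.gauss(mu, sigma) for _ in range(n_variability)])
    return results

simulations = second_order_mc(n_uncertainty=100, n_variability=200)
```

Each inner list is one possible variability distribution of log10 concentrations; the spread between the inner lists reflects parameter uncertainty.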
The Campylobacter data set is also used to test the influence of a number of factors. Firstly, to check the influence of the limit of quantification, all data points of the data set are censored to an increased LOQ of 100 CFU/g (the standard LOQ of the Campylobacter enumeration method) instead of 10 CFU/g (the reduced limit of quantification obtained by plating one milliliter over three mCCDA plates). In this new data set, 589 out of 656 values, i.e. 90% (as opposed to 59% in the original data set), are censored. Using maximum likelihood, the new estimated mean and standard deviation are 0.46 and 1.22 log10 CFU/g, respectively. The resulting distribution after bootstrapping is shown in Fig. 6b. As can be seen, this increased LOQ has a strong influence on the parameter estimates as well as on uncertainty. Despite the specificity of this particular case study, it illustrates (in an opposite way) the important impact that a reduction of the limit of quantification of current detection methods, and thus an increase in non-censored values (e.g. Gnanou Besse et al., 2004), might have on the obtained results when data sets include a significant number of nondetects.
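The re-censoring step described above can be sketched as follows. This is a hypothetical Python illustration, not the authors' code: each measurement is assumed to be stored as a (lower, upper) pair in log10 CFU/g, with lower set to None for a left-censored result and lower equal to upper for a quantitative result.

```python
import math

def recensor(records, new_loq_cfu_per_g):
    """Left-censor every record that lies entirely below a new, higher LOQ."""
    limit = math.log10(new_loq_cfu_per_g)
    result = []
    for lower, upper in records:
        if upper is not None and upper < limit:
            result.append((None, limit))    # now only known to be below the new LOQ
        else:
            result.append((lower, upper))   # still quantifiable: keep unchanged
    return result

# Hypothetical records: one nondetect at LOQ = 10 CFU/g and two counts.
example = [(None, 1.0), (1.5, 1.5), (2.3, 2.3)]
recensored = recensor(example, 100)
```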
It is also tested what the effect would be if measurement error were included at a realistic level corresponding to routine laboratory measurements. A measurement error of 0.5 log10 CFU/g is superimposed on all original quantitative measurements, thus replacing each quantitative data point x_i with an interval [x_i − 0.5, x_i + 0.5] log10 CFU/g. The newly obtained estimates of mean and standard deviation are 0.72 and 0.99 log10 CFU/g, respectively. Implementing measurement error appears to have very little impact on the obtained result for this data set, as can be seen in Fig. 6c.
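As a small illustration of this step, the Python sketch below converts a quantitative value into its measurement error interval and evaluates the probability mass a normal model assigns to that interval, which is what the point density is replaced by in the likelihood. The function names are hypothetical; only the half-width of 0.5 log10 CFU/g comes from the text.

```python
from statistics import NormalDist

HALF_WIDTH = 0.5  # measurement error in log10 CFU/g, as stated in the text

def to_interval(x, half_width=HALF_WIDTH):
    """Replace a quantitative value by the interval [x - h, x + h]."""
    return (x - half_width, x + half_width)

def interval_likelihood(interval, mu, sigma):
    """Probability mass of a normal(mu, sigma) model inside the interval."""
    lower, upper = interval
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

# Example with the case study 1 estimates and a hypothetical count of 1.2 log10 CFU/g.
mass = interval_likelihood(to_interval(1.2), mu=0.73, sigma=1.03)
```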
To illustrate the impact of the number of data points on the obtained distribution, a distribution is fitted to half the number of data points: 328 data points are randomly sampled from the original data set of 656 data points and subjected to MLE and bootstrapping. The estimated mean and standard deviation are 0.63 and 1.07 log10 CFU/g, respectively. The resulting distribution is depicted in Fig. 6d. Although the uncertainty intervals do increase somewhat, the deviation of the new distribution from the originally obtained distribution (Fig. 6d) remains rather limited, despite the fact that the number of data points has been reduced drastically. This indicates that the investment of labor and costs in a large number of additional measurements might not always have the expected impact on the resulting output distribution.
The results of all of these test cases are summarized in Table 2. As in the previous illustrations, the bias that arises when substitution methods are used or when nondetects are ignored can be seen.
4.2. Case study 2: Listeria monocytogenes in smoked fish samples
The second case study consists of 103 measurements of Listeria monocytogenes in smoked fish samples. As opposed to the Campylobacter case study, this data set contains merely 1 quantitative measurement (1 laboratory sample enumerated L. monocytogenes > 10 CFU/g, i.e. 15 CFU/g). All other measurements are either interval-, left- or right-censored. Moreover, the data set contains several different LODs, depending on the demands of the food business operator by whom the particular food samples were supplied.
Using MLE, the logarithmic values of the analysis results are fitted to a normal distribution with mean μ = −1.58 log10 CFU/g and standard deviation σ = 1.54 log10 CFU/g. Based on the empirical distributions
Fig. 4. Histogram of the pseudorandom data points used in Illustration 2, with the vertical lines indicating the limits of detection of the first and second presence/absence test.
of the bootstrap estimates of the distribution parameters, the normal distribution is chosen to fit both the mean and the standard deviation. The mean is fitted to a normal distribution with hyperparameters mean μ_μ = −1.58 log10 CFU/g and standard deviation σ_μ = 0.20 log10 CFU/g. The standard deviation is fitted to a normal distribution with hyperparameters mean μ_σ = 1.51 log10 CFU/g and standard deviation σ_σ = 0.28 log10 CFU/g. To avoid sampling negative values for the standard deviation, this distribution is truncated at zero.
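One simple way to sample from a normal distribution truncated at zero is rejection sampling, sketched below in Python. This is an assumption about how the truncation could be implemented, not a description of the authors' code; the function name is hypothetical and the hyperparameters are those reported above.

```python
import random

def sample_truncated_normal(mean, sd, rng, lower=0.0):
    """Draw from a normal distribution, rejecting values at or below `lower`."""
    while True:
        value = rng.gauss(mean, sd)
        if value > lower:
            return value

rng = random.Random(0)
# Standard deviation hyperparameters of case study 2 (log10 CFU/g).
sigma_draws = [sample_truncated_normal(1.51, 0.28, rng) for _ in range(1000)]
```

With these hyperparameters, zero lies more than five standard deviations below the mean, so rejections are rare and the truncation mainly acts as a safeguard.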
As can be seen in the scatter plot of the bootstrap means versus the bootstrap standard deviations (see Fig. 7a), a small number of bootstrap samples in the lower part of the figure deviate from the majority of the samples. This deviation is caused by the absence, in the respective bootstrap samples, of a number of rare intervals from the original data set. These intervals all have concentration values higher than the general mean value, and their inclusion leads to an increase of the standard deviation. This separation between a small cloud of points with low standard deviation and a big cloud containing the majority of the points is hence purely a consequence of the complexity of this particular data set, but it illustrates the limitations of the non-parametric bootstrap method. When a data set has relatively few distinct values (in this particular case, 10 distinct values are present), differences between bootstrap samples can be great. This should always be checked when applying the non-parametric bootstrap. This problem does not occur when the parametric bootstrap method is applied; however, applying the parametric bootstrap method to censored data would incorrectly result in different uncertainty intervals compared to non-parametric bootstrapping, which could lead to a fail-dangerous underestimation; a number of test simulations have confirmed this (results not shown). The parametric bootstrap could, however, be applied by generating bootstrap samples from a parametric distribution and censoring them manually in the specific case that the complete data set has to be compared to one LOQ only (Zhao and Frey, 2004).
The resulting distribution with its 95% confidence interval is shown in Fig. 7b.
5. Discussion for real food product data
The examples presented in this article illustrate how complex data sets, including nondetects and semi-quantitative and qualitative measurements, can be interpreted in an appropriate way for use in microbiological risk assessment. Ignoring nondetects or substituting them with the LOD/LOQ or half of it is a classical source of bias (cf. Lorimer and Kiermeier, 2007) that can and should be avoided using these methods. It has been demonstrated in this paper that even complex data sets, including either very diverse analyses or large amounts of censored values, can lead to very satisfying outcomes. Nevertheless, attention must be paid to the possibilities and limitations of these methods. Blindly fitting a data set with limited information (for example, a data set consisting purely of presence/absence tests, as obtained if analyses are performed for compliance testing against a set legal criterion) to a specific distribution might result in unrealistic outcomes. Moreover, the limited small-sample properties of the non-parametric bootstrap method must be taken into account, as is illustrated in the second case study. This supports the recommendation to set up dedicated baseline surveys for gathering data to be used for risk assessment.
The illustration of the various case studies (both with hypothetical data sets and with real-life microbiological data sets from dedicated or ad hoc combined surveys) shows that it is important, when establishing microbiological baseline surveys, to apply some semi-quantitative methodology and, by preference, a methodology enabling an estimation of the numbers present in the positive laboratory samples. For example, Straver et al. (2007) estimated contamination of chicken breast filet with Salmonella using a combination of prior enrichment of pooled laboratory samples and subsequent enumeration of Salmonella in positive laboratory samples using a Most Probable Number (MPN) assay. The use of MPN methods or enumeration methods (as in the Campylobacter case study) with a reduced limit of quantification, which overall provide rather an estimate of the number of
Fig. 5. Illustration 2: (a) plot of the 95% confidence interval of the fitted log-normal distribution for the set of semi-quantitative measurements, (b) scatter plot of the means versus the standard deviations of the bootstrap samples, and (c) comparison of the parameters obtained by bootstrapping and randomly generated parameters using a linear relationship.
pathogens present instead of an exact value, is not a crucial factor in the frame of estimating distributions of the contamination level. In the present Campylobacter case study, it was shown that inclusion of the measurement error interval for quantitative analyses hardly affects the estimated distributions.
The proportion of nondetects, on the other hand, may have a significant impact on the result, as has been shown in the present Campylobacter case study. This illustrates the positive effect that lowering the limit of quantification of a certain analysis method might have on lowering the uncertainty of the distribution when the microbiologist is confronted with a substantial number of nondetects. It was noticed for the Listeria case study that the uncertainty of the distribution is especially increased at the lower levels (<0.01 CFU/g) (Fig. 7b), as in this concentration range only left-censored data are available. More information on the estimated level of contamination would make it possible to decrease uncertainty. However, overall, the estimated mean level of contamination for Listeria (μ = −1.58 log10 CFU/g) is much lower than for Campylobacter (μ = 0.73 log10 CFU/g), which means that obtaining enumeration data at these very low levels of contamination will take considerable laboratory effort and require adapted methodological procedures, and will thus entail related costs.
On the other hand, it was shown that obtaining a good data set for the estimation of distributions of contamination levels does not necessarily demand a large study. In the present Campylobacter case study, it was illustrated that increasing the number of analyses to a large extent might lead to only a limited additional reduction of uncertainty when the data set is already sufficient and has representative outcomes. The distribution of the Campylobacter contamination level shown in Fig. 6d is based upon 122 enumeration results (obtained from in total 328 laboratory samples analyzed), whereas in Fig. 6a 269 enumeration values were available (obtained from 656 laboratory samples analyzed) for the estimation of the distribution of the contamination level.
Setting up a baseline survey to acquire a data set to serve as the basis for estimating an input distribution for risk assessment thus has to take into account appropriate methodology to provide a sufficient number of detects and estimates of numbers, but also has to provide results that are representative for the objective of the risk assessment (e.g. food product under consideration, stage in the production chain, variability between producers, seasonality, etc.) in order not to introduce bias in the distribution obtained. As such, setting up a baseline survey is a complex exercise. Nevertheless, once the data set is available, appropriate techniques also need to be used to translate the information from the data set into a distribution. In the present study, an approach based upon maximum likelihood estimation was shown to provide good results to present the variation of
Fig. 6. Case study 1: (a) plot of the 95% confidence interval of the normal distribution fitted to the Campylobacter contamination data; (b) influence of an increased LOD on the resulting distribution (the dotted lines represent the original data set, the full lines represent the data set on which an increased LOD has been imposed); (c) influence of inclusion of a measurement error interval on the estimated distribution (full lines) compared to the original data set (dotted lines); (d) influence of the number of data points: comparison of the results of a random subset with N/2 data points (full lines) with the original data set (dotted lines).
contamination data. Additional information with regard to variability and uncertainty could be extracted as well using the bootstrap method. The same methodology can equally be applied with more complex models, such as mixture models or Poisson-like models. Alternative methods such as Bayesian analysis can also be applied and lead to similar outcomes (results not shown). Examples of a Bayesian analysis can be found in Nauta et al. (2009), Clough et al. (2005) and Crépet et al. (2007).
Application of these techniques offers a way to perform meta-analysis of the many relevant yet diverse data sets that are available in literature and in (inter)national reports of surveillance or baseline surveys. It therefore increases the information input of a risk assessment and, by consequence, the correctness of the outcome of the risk assessment.
Acknowledgements
This research is supported in part by the Research Council of the Katholieke Universiteit Leuven (projects OT0925TBA and EF05006 Center-of-Excellence Optimization in Engineering), knowledge platform KP09005 (SCORES4CHEM) of the Industrial Research Fund, the Belgian Program on Interuniversity Poles of Attraction initiated by the Belgian Federal Science Policy Office, and the Fund for Scientific Research-Flanders (FWO-Vlaanderen, project G042409 N). J. Van Impe holds the chair Safety Engineering sponsored by the Belgian chemistry and life sciences federation essencia. Research is conducted utilizing high performance computational resources provided by the University of Leuven, http://ludit.kuleuven.be.
We would like to thank the Ghent University cluster of the Department of Veterinary Public Health and Food Safety, Faculty of Veterinary Medicine, and the Department of Food Safety and Food Quality, Faculty of Bio-Science Engineering, who kindly provided the Campylobacter data derived from a Federal Public Health Service funded project. The staff of the accredited laboratory section of the Laboratory of Food Microbiology and Food Preservation at the Department of Food Safety and Food Quality, Faculty of Bio-Science Engineering, Ghent University, is acknowledged for providing the data on the microbiological analysis and challenge testing for L. monocytogenes.
References
Calistri, P., Giovannini, A., 2008. Quantitative risk assessment of human campylobacteriosis related to the consumption of chicken meat in two Italian regions. International Journal of Food Microbiology 128, 274–287.
Clough, H.E., Clancy, D., O'Neill, P.D., Robinson, S.E., French, N.P., 2005. Quantifying uncertainty associated with microbial count data: a Bayesian approach. Biometrics 61, 610–616.
Cox, D.R., Oakes, D., 1984. Analysis of Survival Data. Monographs on Statistics and Applied Probability. Chapman and Hall.
Crépet, A., Albert, I., Dervin, C., Carlin, F., 2007. Estimation of microbial contamination of food from prevalence and concentration data: application to Listeria monocytogenes in fresh vegetables. Applied and Environmental Microbiology 73 (1), 250–258.
Delignette-Muller, M.L., Pouillot, R., Denis, J.-B., 2008. fitdistrplus: Help to fit of a parametric distribution to non-censored or censored data. R package version 0.1-0. URL http://riskassessment.r-forge.r-project.org.
Efron, B., 1982. The jackknife, the bootstrap and other resampling plans. CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 38.
FAO/WHO, 2004. Risk assessment of Listeria monocytogenes in ready-to-eat foods. Accessed at June 5, 2009. URL http://www.who.int/foodsafety/publications/micro/mra_listeria/en/index.html.
FDA/USDA/CDC, 2003. Quantitative assessment of relative risk to public health from foodborne Listeria monocytogenes among selected categories of ready-to-eat foods. Accessed at June 5, 2009. URL http://www.foodsafety.gov/~dms/lmr2-toc.html.
Gnanou Besse, N., Audinet, N., Beaufort, A., Colin, P., Cornu, M., Lombard, B., 2004. A contribution to the improvement of Listeria monocytogenes enumeration in cold-smoked salmon. International Journal of Food Microbiology 91 (2), 119–127.
Gonzales-Barron, U., Kerr, M., Sheridan, J.J., Butler, F., 2010. Count data distributions and their zero-modified equivalents as a framework for modelling microbial data with a relatively high occurrence of zero counts. International Journal of Food Microbiology 136 (3), 268–277.
Haas, C.N., Thayyar-Madabusi, A., Rose, J.B., Gerba, C.P., 1999. Development and validation of dose-response relationship for Listeria monocytogenes. Quantitative Microbiology 1, 89–102.
Habib, I., Sampers, I., Uyttendaele, M., Berkvens, D., De Zutter, L., 2008a. Baseline data from a Belgium-wide survey of Campylobacter species contamination in chicken meat preparations, and considerations for a reliable monitoring program. Applied and Environmental Microbiology 74 (17), 5483–5489.
Habib, I., Sampers, I., Uyttendaele, M., Berkvens, D., De Zutter, L., 2008b. Performance characteristics and estimation of measurement uncertainty of three plating procedures for Campylobacter enumeration in chicken meat. Food Microbiology 25 (1), 65–74.
Helsel, D.R., 2005. Nondetects and data analysis: statistics for censored environmental data. Wiley Interscience, USA.
Helsel, D.R., 2006. Fabricating data: how substituting values for nondetects can ruin results, and what can be done about it. Chemosphere 65, 2434–2439.
Iman, R.L., Conover, W.J., 1982. A distribution-free approach to inducing rank correlation among input variables. Communications in Statistical Simulations and Computation 11 (3), 311–334.
ISO, 1998/Amd 1:2004. International Standards Organization 10290-2. Microbiology of food and animal feeding stuffs – horizontal method for the detection and enumeration of Listeria monocytogenes – part 2: Enumeration method.
ISO, 2006. International Standards Organization 10272-1. Microbiology of food and animal feeding stuffs – horizontal method for detection and enumeration of Campylobacter spp. – part 2: Enumeration method.
Jordan, D., 2005. Simulating the sensitivity of pooled-sample herd tests for fecal Salmonella in cattle. Preventive Veterinary Medicine 70, 59–73.
Kilsby, D.C., Pugh, M.E., 1981. The relevance of the distribution of micro-organisms within batches of food to the control of microbiological hazards from foods. Journal of Applied Bacteriology 51, 345–354.
Legan, J.D., Vandeven, M.H., Dahms, S., Cole, M.B., 2001. Determining the concentration of microorganisms controlled by attributes sampling plans. Food Control 12 (3), 137–147.
Lorimer, M.F., Kiermeier, A., 2007. Analysing microbiological data: Tobit or not Tobit? International Journal of Food Microbiology 116, 313–318.
Nauta, M.J., van der Wal, F.J., Putirulan, F.F., Post, J., van de Kassteele, J., Bolder, N.M., 2009. Evaluation of the "testing and scheduling" strategy for control of Campylobacter in
Fig. 7. Case study 2: (a) scatter plot of the fitted means versus standard deviations of the bootstrap samples and (b) plot of the 95% confidence interval of the normal distribution fitted to the logarithmic Listeria monocytogenes contamination data.
broiler meat in The Netherlands. International Journal of Food Microbiology 134, 216–222.
Oscar, T.P., 2004. A quantitative risk assessment model for Salmonella and whole chickens. International Journal of Food Microbiology 93, 231–247.
Pérez-Rodríguez, F., van Asselt, E.D., García-Gimeno, R.M., Zurera, G., Zwietering, M.H., 2007. Extracting additional risk managers information from a risk assessment of Listeria monocytogenes in deli meats. Journal of Food Protection 70 (5), 1137–1152.
Pouillot, R., Miconnet, N., Afchain, A.-L., Delignette-Muller, M.L., Beaufort, A., Rosso, L., Denis, J.-B., Cornu, M., 2007. Quantitative risk assessment of Listeria monocytogenes in French cold-smoked salmon: I. Quantitative exposure assessment. Risk Analysis 27 (3), 683–700.
R Development Core Team, 2009. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. URL http://www.R-project.org.
Reinders, R.D., De Jonge, R., Evers, E.G., 2003. A statistical method to determine whether micro-organisms are randomly distributed in a food matrix, applied to coliforms and Escherichia coli O157 in minced beef. Food Microbiology 20, 297–303.
Ridout, M., Demétrio, C.G.B., Hinde, J., 1998. Models for count data with many zeroes. Proceedings of the XIXth International Biometric Conference, pp. 179–192.
Shorten, P.R., Pleasants, A.B., Soboleva, T.K., 2006. Estimation of microbial growth using population measurements subject to a detection limit. International Journal of Food Microbiology 108, 369–375.
Straver, J.M., Janssen, A.F.W., Linnemann, A.R., van Boekel, A.J.S., Beumer, R.R., Zwietering, M.H., 2007. Number of Salmonella on chicken breast filet at retail level and its implications for public health risk. Journal of Food Protection 70 (9), 2045–2055.
Uyttendaele, M., Busschaert, P., Valero, A., Geeraerd, A.H., Vermeulen, A., Jacxsens, L., Goh, K.K., De Loy, A., Van Impe, J.F., Devlieghere, F., 2009. Prevalence and challenge tests of Listeria monocytogenes in Belgian produced and retailed mayonnaise-based deli-salads, cooked meat products and smoked fish between 2005 and 2007. International Journal of Food Microbiology 133, 94–104.
Zhao, Y., Frey, H.C., 2004. Quantification of variability and uncertainty for censored data sets and application to air toxic emission factors. Risk Analysis 24 (4), 1019–1034.
Also, Shorten et al. (2006) and Lorimer and Kiermeier (2007) suggested this method to deal with nondetects in microbiological test results, as opposed to biased approaches such as substitution of nondetects by arbitrary values, e.g. half of the LOQ. Applications of this method can be found in, e.g., Jordan (2005) and Pouillot et al. (2007), where laboratory samples were counted that had previously been shown to be positive using a qualitative measurement method with higher sensitivity.
While these authors focussed primarily on applying maximum likelihood estimation (MLE) to deal with quantitative data that are censored at one side due to an LOQ, the same techniques can be generalized to combine quantitative, semi-quantitative and qualitative test results. The first objective of this research is therefore to illustrate how MLE can be applied to represent complex data sets of censored microbiological data with parametric distributions.
Moreover, although MLE allows the variation of contamination data to be represented by means of a parametric distribution, additional information with regard to variability and uncertainty can be extracted from the available data as well. This is the second objective of this research. Because uncertainty can be reduced by collecting additional information, while variability is inherent to any biological system, it is useful to know what proportion of the variation in general is caused by variability and what proportion is caused by uncertainty. In this case, uncertainty is represented by additional statistical distributions, defined by hyperparameters, that describe the parameters of the variability distribution of contamination. Once these distributions have been constructed, the separation between variability and uncertainty can be propagated in the course of the risk assessment by the use of a second order Monte Carlo simulation.
In this research, the application of these techniques to microbiological contamination data is explored. Also, the bias that originates when alternative procedures are applied is examined. The methods are illustrated with in silico (i.e. computer simulated) data in order to investigate the performance of these techniques. Subsequently, two case studies based on laboratory measurements are explored, namely a data set of 656 quantitative measurements of Campylobacter in chicken meat preparations (of which 59% are below the limit of quantification) and a data set of 103 measurements of Listeria monocytogenes in smoked fish products, consisting of quantitative, semi-quantitative and qualitative measurements as well as nondetects.
2. Material and methods
2.1. Maximum likelihood estimation and bootstrap
In case of a negative presence/absence test, the concentration of the pathogen tested for in the food sample is known to be less than the limit of detection (LOD) of the analysis, although no exact value is known. Also, when an enumeration method is applied to a food sample and no colonies are detected, the concentration is known to be less than the limit of quantification (LOQ). These values are said to be left-censored. Analogously, a positive presence/absence test results in a right-censored outcome.
Maximum likelihood estimation is used to fit a distribution to a set of censored data (Cox and Oakes, 1984; Helsel, 2005). A parametric candidate distribution is assumed to represent the observed data, and the MLE method estimates the parameter values that are most likely to have generated the observed measurements. Parametric distributions are chosen because they are assumed to correctly represent the data sets considered here; however, in the case of more complex data sets, the risk assessor might also consider alternative, more complex models. Zero-inflated Poisson models or other variants might be relevant in the case of count data with many nondetects (e.g. Ridout et al., 1998; Gonzales-Barron et al., 2010). In case of more dispersed data, mixture models could be fitted as well; see for example Crépet et al. (2007). However, it should be taken into consideration that a mixture model should not be used to model the heterogeneity of the specific data set (for example, when food samples were taken from 2 food business operators, each with a different contamination profile) when it is intended to be representative for a more general situation (for example, representing food samples of all food business operators). Other studies have implemented hurdle models with separate distributions for prevalence (presence/absence) and concentration (CFU/g) of pathogens in food samples (see e.g. Pouillot et al., 2007; Pérez-Rodríguez et al., 2007). In this research, concentration (CFU/g) is modeled by a parametric distribution as a hypothetical property of food samples; for example, a Poisson distribution with the concentration times the serving size as a rate parameter could be implemented afterwards to model the contamination in a specific serving as a non-negative integer.
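The last step mentioned above, converting a continuous concentration into an integer number of cells in a serving, can be sketched in Python as follows. This is an illustration only: the function names and the serving size are hypothetical, the Poisson sampler uses Knuth's multiplication method for small rates and a rounded normal approximation for large rates, and the parameter values are the case study 2 estimates reported elsewhere in the paper.

```python
import math
import random

def poisson(rate, rng):
    """Draw a Poisson count (Knuth's method; normal approximation for large rates)."""
    if rate > 500:  # avoids underflow of exp(-rate); approximation, not exact
        return max(0, round(rng.gauss(rate, math.sqrt(rate))))
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def cells_in_serving(mu, sigma, serving_g, rng):
    """Sample a log10 concentration, then the integer number of cells in one serving."""
    log10_conc = rng.gauss(mu, sigma)        # log10 CFU/g
    rate = (10 ** log10_conc) * serving_g    # expected CFU in the serving
    return poisson(rate, rng)

rng = random.Random(0)
# Hypothetical 50 g serving with the Listeria estimates (mu = -1.58, sigma = 1.54).
counts = [cells_in_serving(-1.58, 1.54, 50.0, rng) for _ in range(1000)]
```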
In the case of quantitative results, the likelihood of obtaining these results given a set of parameters θ is obtained by multiplying the values of the probability density function p(·) at each data point x_i, given those parameters:

$$l(X = \{x_1, \dots, x_n\} \mid \theta) = \prod_{i=1}^{n} p(x_i \mid \theta) \qquad (1)$$
In the case of censored data, the likelihood is given by multiplication of areas under the probability density function instead of single values; hence, the cumulative distribution function is used (Cox and Oakes, 1984), as is depicted in Fig. 1. For computational convenience, logarithmic values of the likelihood are considered.
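To make the construction of the censored likelihood concrete, a minimal Python sketch is given below. It is not the authors' implementation (which uses the R survival package): the record layout, the function names and the crude grid-refinement optimizer are all assumptions made for illustration, and a proper numerical optimizer would normally be preferred.

```python
import math
from statistics import NormalDist

def log_likelihood(records, mu, sigma):
    """Log-likelihood of (lower, upper) records, in log10 CFU/g, under a normal model.

    lower == upper -> quantitative value (density)
    lower is None  -> left-censored below `upper`
    upper is None  -> right-censored above `lower`
    otherwise      -> interval-censored between the two bounds
    """
    dist = NormalDist(mu, sigma)
    total = 0.0
    for lower, upper in records:
        if lower is not None and lower == upper:
            contribution = dist.pdf(lower)
        else:
            high = 1.0 if upper is None else dist.cdf(upper)
            low = 0.0 if lower is None else dist.cdf(lower)
            contribution = high - low
        total += math.log(max(contribution, 1e-300))  # guard against log(0)
    return total

def fit_normal(records, mu_range=(-5.0, 5.0), sigma_range=(0.05, 5.0),
               steps=20, rounds=5):
    """Crude MLE: evaluate a grid, then repeatedly zoom in around the best point."""
    (mu_lo, mu_hi), (sd_lo, sd_hi) = mu_range, sigma_range
    best = None
    for _ in range(rounds):
        mu_step = (mu_hi - mu_lo) / steps
        sd_step = (sd_hi - sd_lo) / steps
        grid = [(mu_lo + i * mu_step, sd_lo + j * sd_step)
                for i in range(steps + 1) for j in range(steps + 1)]
        best = max(grid, key=lambda p: log_likelihood(records, p[0], p[1]))
        mu_lo, mu_hi = best[0] - mu_step, best[0] + mu_step
        sd_lo, sd_hi = max(best[1] - sd_step, 1e-3), best[1] + sd_step
    return best

# Hypothetical data: six nondetects below 1.0 log10 CFU/g and four counts.
example = [(None, 1.0)] * 6 + [(1.1, 1.1), (1.2, 1.2), (1.8, 1.8), (2.5, 2.5)]
mu_hat, sigma_hat = fit_normal(example)
```

When all records are quantitative, this estimate reduces to the sample mean and the (population) standard deviation, which offers a simple sanity check.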
The normal distribution, applied to the logarithms of the concentrations, has been chosen for the case studies because it is generally assumed in food microbiology that contamination is approximately log-normally distributed (Crépet et al., 2007; Kilsby and Pugh, 1981; Legan et al., 2001). Ridout et al. (1998) also proposed the Poisson distribution and a Poisson distribution with a Gamma distributed rate parameter λ; however, it was shown that the lognormal distribution fitted equally well to the data.
Because these parameters are based on inference from measurements of food samples, which represent only a subset of the whole population, uncertainty about these parameters due to the limited data set is considered by using the bootstrapping technique (Efron, 1982). In the parametric bootstrap method, a distribution is fitted to a data set. Based on this distribution, B new samples (of the same sample size as the original data set) are drawn from this single distribution. For each of the B new samples, the parameter of interest is estimated, and the distribution of all B estimated parameters represents the uncertainty about the estimate. The empirical bootstrap method is based on a similar procedure, but instead of drawing B samples from one fitted distribution, the original data set is resampled with replacement B times and a distribution is fitted to each of the B new samples.
To include censored data points, the empirical bootstrap approach is chosen (as was done, for example, by Zhao and Frey, 2004). For a number of B = 1000 iterations, a bootstrap sample (i.e. sampling with replacement)
Fig. 1. Illustration of the likelihood of quantitative data (left) versus interval-censored data (right).
is taken from the original censored data set of measurements. Parameters for that bootstrap sample are determined using MLE, and quantiles of the obtained distribution are stored in order to express a confidence interval later on.
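The resampling loop itself can be sketched generically in Python, as below. This is a hypothetical illustration rather than the authors' code: censored records are resampled as whole (lower, upper) pairs, and any fitting routine can be passed in as `estimator`. To keep the fragment short, a simple stand-in estimator (the fraction of left-censored records) is used here instead of a censored MLE.

```python
import random

def empirical_bootstrap(records, estimator, n_boot=1000, seed=0):
    """Resample the records with replacement and re-estimate on each resample."""
    rng = random.Random(seed)
    n = len(records)
    estimates = []
    for _ in range(n_boot):
        resample = [records[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    return estimates

def fraction_left_censored(records):
    """Stand-in estimator: share of records with no lower bound."""
    return sum(lower is None for lower, _ in records) / len(records)

# Hypothetical data: six nondetects below 1.0 log10 CFU/g and four counts.
data = [(None, 1.0)] * 6 + [(1.1, 1.1), (1.2, 1.2), (1.8, 1.8), (2.5, 2.5)]
boot = empirical_bootstrap(data, fraction_left_censored, n_boot=1000)
```

Because whole records are resampled, the censoring status of each measurement travels with it, which is what allows the empirical bootstrap to be used with mixed censored and quantitative data.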
The joint probability distribution obtained by the bootstrap
samples is used to express uncertainty about the parameters due to
the limited number of samples. Whenever the original data set is
available to the risk assessor, it is possible for the risk assessor to
generate bootstrap samples, so that the obtained values can be used
directly as input values of a two-dimensional Monte Carlo simulation.
However, this is not always the case, for example in the case of large
data sets. Therefore, to enable communication of parameter
uncertainty, a parametric distribution that fits best to the bootstrap
estimates is chosen for each parameter, based on visual comparison
with the kernel density plots, quantile-quantile plots, and the χ² and
Anderson-Darling goodness-of-fit tests. Distributions that could
generate implausible samples (such as negative values for a standard
deviation) should be avoided or truncated. In order to examine the
correlation between parameters among bootstrap samples, scatter
plots, the Pearson correlation coefficient and the Spearman rank order
correlation coefficient are evaluated.
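As a minimal illustration of the two correlation measures named above, the following Python sketch computes both from paired lists of bootstrap estimates. It is not the authors' code; the function names are hypothetical, and tied values are assumed to share their average rank.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Applied to the bootstrap means and standard deviations, a Pearson value far from zero signals that the two parameters should not be sampled independently.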
To be able to visually represent the combined variability and
uncertainty, the 95% confidence intervals of all individual quantiles of
the variability distribution are expressed. Among different bootstrap
iterations, the value of each quantile of the variability distribution will
vary. For all quantiles of the variability distribution, an uncertainty
distribution is obtained, of which the 2.5, 50 and 97.5 percentiles (over
all bootstrap iterations) are calculated. For all cumulative density
plots that follow, the reading key is as follows. The median value
of all quantiles of the variability distribution is indicated with a black
line; the interval between the 2.5 and 97.5 percentiles is indicated with
a grey area. A large grey area in the horizontal direction indicates large
uncertainty about the value of the respective quantiles of the variability
distribution, i.e. the values of the quantiles differ considerably among
different bootstrap iterations. A small grey area indicates that the value
of each quantile does not change much between different bootstrap
iterations, hence uncertainty is small. The degree of variability is
indicated by the interval the median line spans from the lower quantiles to
the higher quantiles.
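The reading key above translates into a short calculation: for every quantile of the variability distribution, collect its value over all bootstrap fits and take the 2.5, 50 and 97.5 percentiles. The Python sketch below assumes the bootstrap output is a list of fitted (mean, standard deviation) pairs of a normal distribution; the function names and the linear-interpolation percentile rule are assumptions, not taken from the paper.

```python
from statistics import NormalDist


def percentile(sorted_values, p):
    """Percentile p (0 to 100) of a sorted list, by linear interpolation."""
    k = (len(sorted_values) - 1) * p / 100.0
    lower = int(k)
    upper = min(lower + 1, len(sorted_values) - 1)
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * (k - lower)


def quantile_bands(bootstrap_params, probabilities):
    """Return (probability, 2.5th, 50th, 97.5th percentile) for each quantile."""
    bands = []
    for p in probabilities:
        # value of this quantile in every bootstrap iteration
        values = sorted(NormalDist(mu, sigma).inv_cdf(p)
                        for mu, sigma in bootstrap_params)
        bands.append((p,
                      percentile(values, 2.5),
                      percentile(values, 50.0),
                      percentile(values, 97.5)))
    return bands
```

Plotting the second and fourth entries against the probability gives the borders of the grey area, and the third entry gives the black median line.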
The code for these simulations is written in R (R Development Core
Team, 2009) using functions from the survival package for MLE
procedures, and can be obtained from the authors upon request.
Although not used in the present study, at the time of writing a
package, fitdistrplus, appeared which can be used for the same purpose
(Delignette-Muller et al., 2008).
2.2. Data sets
For illustrative purposes, a set of left-censored data is
pseudorandomly generated in silico in order to simulate quantitative
measurements with a single LOQ below which no values can be
measured. To begin with, one random sample of 100 data points is
pseudorandomly generated from a normal distribution with mean
μ = 0 log10 CFU/g and standard deviation σ = 2 log10 CFU/g. An LOQ
is chosen at the 40th percentile of the normal distribution and the
data set is censored, so that approximately 40% of the data set will fall
below that LOQ and hence will be regarded as nondetects. For this
specific data set, 40 out of 100 data points are regarded as below the
LOQ after censoring.
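The construction of this first illustrative data set can be sketched as follows, in Python rather than the authors' R code. The seed and the function name are arbitrary assumptions, so the number of nondetects in a given run will generally differ from the 40 reported for the authors' sample. Each observation is stored as a (lower, upper) pair in log10 CFU/g.

```python
import random
from statistics import NormalDist


def simulate_left_censored(n=100, mu=0.0, sigma=2.0, p_censor=0.40, seed=1):
    """Draw n normal values and left-censor everything below the LOQ."""
    rng = random.Random(seed)
    loq = NormalDist(mu, sigma).inv_cdf(p_censor)  # LOQ at the chosen percentile
    data = []
    for _ in range(n):
        x = rng.gauss(mu, sigma)
        if x < loq:
            data.append((float("-inf"), loq))  # nondetect: only "below LOQ" is known
        else:
            data.append((x, x))                # quantitative value
    return loq, data


loq, data = simulate_left_censored()
n_nondetects = sum(1 for lower, upper in data if lower != upper)
print(f"LOQ = {loq:.3f} log10 CFU/g, nondetects = {n_nondetects} of {len(data)}")
```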
Instead of determining the number of CFU of a pathogen per gram
of a food sample, a portion of 25 g, for example, could be examined for
the mere presence of a micro-organism. A negative test result
indicates a concentration of less than (a hypothetical) 0.04 cells per
gram of that food sample. If the test result is positive, a smaller portion
of that same laboratory sample, stored at a temperature not allowing
for growth, e.g. a portion of 1 g, could be examined again for
presence/absence. If a homogeneous distribution of cells among test
portions is assumed, semi-quantitative results are obtained this way.
A positive 25 g portion and a negative 1 g portion would indicate, for
example, a bacterial concentration of between 0.04 and 1 CFU/g. This
outcome is said to be interval-censored.
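The translation from presence/absence results to a concentration interval can be written out explicitly. The Python sketch below is an illustration under the same simplifying assumption as the text (a positive portion of w grams implies at least 1/w CFU/g, a negative portion implies less than 1/w CFU/g); the function name is hypothetical.

```python
import math


def portion_tests_to_interval(results):
    """Convert presence/absence results into a (lower, upper) interval.

    results is a list of (portion_in_grams, detected) pairs for one
    laboratory sample; the returned bounds are in log10 CFU/g.
    """
    lower, upper = float("-inf"), float("inf")
    for grams, detected in results:
        limit = math.log10(1.0 / grams)  # one cell in the tested portion
        if detected:
            lower = max(lower, limit)
        else:
            upper = min(upper, limit)
    if lower > upper:
        raise ValueError("test results are mutually inconsistent")
    return lower, upper


# Positive in 25 g, negative in 1 g: between 0.04 and 1 CFU/g
print(portion_tests_to_interval([(25, True), (1, False)]))
```

A single negative 25 g test yields a left-censored observation, and a sample that is positive in every portion tested yields a right-censored one.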
A similar situation is simulated in the second illustration. The same
original (uncensored) data set of 100 pseudorandom values, generated
from a normal distribution with mean μ = 0 log10 CFU/g and standard
deviation σ = 2 log10 CFU/g, is used for this second illustration. Two
detection limits are chosen so that the first one (LOD1) is situated at the
60th percentile of the distribution and the second one (LOD2) at the
80th percentile. Subsequently, the data are transformed into purely
semiquantitative data, i.e. each data point is reduced to either smaller
than the first LOD, between the first and second LOD, or greater than the
second LOD.
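A small Python sketch of this transformation is given below; it is an illustration, not the authors' code, and the function names are hypothetical. The two limits follow directly from the 60th and 80th percentiles of the generating distribution.

```python
from statistics import NormalDist

NEG_INF, POS_INF = float("-inf"), float("inf")


def detection_limits(mu=0.0, sigma=2.0, p1=0.60, p2=0.80):
    """LOD1 and LOD2 as percentiles of the generating normal distribution."""
    dist = NormalDist(mu, sigma)
    return dist.inv_cdf(p1), dist.inv_cdf(p2)


def to_semiquantitative(x, lod1, lod2):
    """Reduce a value to one of three intervals, as (lower, upper) bounds."""
    if x < lod1:
        return (NEG_INF, lod1)   # first presence/absence test negative
    if x < lod2:
        return (lod1, lod2)      # first test positive, second negative
    return (lod2, POS_INF)       # both tests positive


lod1, lod2 = detection_limits()
print(f"LOD1 = {lod1:.3f}, LOD2 = {lod2:.2f} log10 CFU/g")
```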
As a first case study based on real microbiological measurements,
laboratory analyses of Campylobacter in chicken meat preparations at
the Belgian retail market are analyzed (Habib et al., 2008a). The data
set consists of direct plating results using the ISO (2006) standard
method with a reduced limit of quantification. By plating one milliliter
of the primary 10-fold diluted suspension of the chicken meat sample
on three modified cefoperazone charcoal deoxycholate agar (mCCDA;
Oxoid, Basingstoke, England, UK) spread plates of 90 mm diameter, a
limit of quantification of 10 CFU/g is obtained instead of the usual LOQ
of 100 CFU/g. In 387 out of 656 measurements (59%), the result is
left-censored due to the LOQ.
This first case study is also used to examine the influence of a number
of conditions. In order to check the effect of using a reduced LOQ,
all values below 100 CFU/g (the standard LOQ of the Campylobacter
enumeration method) were assumed to be censored and hence
regarded as nondetects. To determine if measurement error (assumed
to be ±0.5 log10 units) (Habib et al., 2008b) would have a significant
impact on the resulting distributions, another simulation is run with all
quantitative data points xi replaced by the interval [xi − 0.5, xi + 0.5]
log10 CFU/g; in other words, xi no longer has its quantitative value but
is assumed to be interval-censored. By doing this, it is stated that, due to
measurement error, the real value of measurement xi is known only to
be within the interval [xi − 0.5, xi + 0.5]. Finally, to illustrate the
impact of the size of the data set, the simulation is also conducted
with only half the number of data points. For that purpose, 328 data
points are pseudorandomly sampled with replacement from the
original data set.
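The three variations described above are simple transformations of the data set. The Python sketch below shows one way to express them, assuming each observation is stored as a (lower, upper) pair in log10 CFU/g with lower equal to upper for a quantitative value; the function names are hypothetical and the code is an illustration rather than the authors' implementation.

```python
import random

NEG_INF = float("-inf")


def raise_loq(data, new_loq):
    """Treat every observation entirely below new_loq as a nondetect."""
    return [(NEG_INF, new_loq) if upper < new_loq else (lower, upper)
            for lower, upper in data]


def add_measurement_error(data, half_width=0.5):
    """Replace each quantitative value x by the interval [x - h, x + h]."""
    return [(lower - half_width, upper + half_width) if lower == upper
            else (lower, upper)
            for lower, upper in data]


def subsample(data, n, seed=0):
    """Draw n observations with replacement from the data set."""
    rng = random.Random(seed)
    return [rng.choice(data) for _ in range(n)]
```

For the Campylobacter data, the raised LOQ of 100 CFU/g corresponds to new_loq = 2.0 on the log10 scale, and the reduced data set to n = 328.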
A set of 103 laboratory samples of smoked fish on the Belgian retail
market is used as a second case study (Table 1). The laboratory samples
were analyzed in the period 2005-2007 for a number of food business
operators (Uyttendaele et al., 2009). In most cases, a test portion of
25 g is analyzed qualitatively for the presence of L. monocytogenes
according to the AFNOR validated VIDAS LMO method (Bio-129-07
02). In case of a positive test result, either a smaller test portion of the
same laboratory sample is analyzed (e.g. presence/absence testing per
0.01 g), or a test portion of the same laboratory sample is enumerated
by plating on ALOA (Fiers, Belgium) according to the ISO (1998/Amd 1
2004) standard method using a reduced limit of quantification (thus
LOQ 10 CFU/g). When quantitative tests are executed, most of the
outcomes are below this LOQ, which again results in semiquantitative
data. For a number of samples, other sample weights were tested
according to the demands of the food business operators, hence
resulting in a rather complex data set.

Table 1
Overview of contamination data of L. monocytogenes in smoked fish on the Belgian
retail market in the period 2005-2007.

Number of samples    Concentration (CFU/g)
54                   <0.04
2                    <100
26                   0.04−10
1                    15
8                    0.04−100
2                    >100
1                    <1
1                    >1
7                    0.04−1
1                    1−100
3. Results and discussion of the in silico generated data
3.1. Illustration 1: left-censored quantitative data
A data set is generated in silico and left-censored, representing
quantitative measurements in log10 CFU/g. A total of 100 data points
are pseudorandomly drawn from a normal distribution with mean μ
equal to 0 and standard deviation σ equal to 2 (see Fig. 2). The (lower)
LOQ is set to −0.507 log10 CFU/g, which is the 40th percentile of a
cumulative normal distribution function with μ = 0 and σ = 2. After
censoring, 40 out of 100 data points are regarded as below the LOQ.
Using MLE, a normal distribution is fitted to these censored data,
with fitted mean μ = −0.05 log10 CFU/g and fitted standard deviation
σ = 1.88 log10 CFU/g. The sample mean of the data set when all
nondetects are substituted with the LOQ is 0.49 log10
CFU/g, or 0.29 log10 CFU/g when nondetects are
substituted by half of the (logarithmic) LOQ (see Table 2). For
comparison, the mean of the original sampled data (before any
censoring algorithm is applied, and hence the true sample mean) is
0.05 log10 CFU/g. This illustrates the major bias that arises when
alternative practices, such as substitution of nondetects by the LOQ, are
applied to data sets. As is illustrated by Lorimer and Kiermeier (2007),
this has frequently been the case in past research. If nondetects are
ignored, the estimated mean is 1.15 log10 CFU/g. This
approach is often intended to model only the positive subpopulation,
along with an additional variable for prevalence; however, this
implies that the sensitivity of the detection method is ignored (e.g.
when a counting method has an LOQ of 10 CFU/g) and that these
results cannot be transferred to portion sizes other than the measured
one. The true sample standard deviation is 1.77 log10 CFU/g. The
values of the standard deviations are estimated to be lower in the case
of substitution methods, because these methods push low data points
(nondetects) towards the center of the sample distribution.
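The direction of these biases can be reproduced with a few lines of Python. The sketch below uses its own pseudorandom sample (seed chosen arbitrarily), so the numbers it prints will not match those in Table 2; it only illustrates that substituting the LOQ raises the mean and lowers the standard deviation, and that ignoring nondetects raises the mean further. Substitution by half the LOQ is left out because the text does not spell out how that value is computed on the log scale.

```python
import random
from statistics import NormalDist, mean, stdev

rng = random.Random(42)
true_values = [rng.gauss(0.0, 2.0) for _ in range(100)]  # uncensored sample
loq = NormalDist(0.0, 2.0).inv_cdf(0.40)                 # about -0.507

detects = [x for x in true_values if x >= loq]
substituted = [x if x >= loq else loq for x in true_values]

summary = {
    "true sample": (mean(true_values), stdev(true_values)),
    "nondetects -> LOQ": (mean(substituted), stdev(substituted)),
    "nondetects ignored": (mean(detects), stdev(detects)),
}
for label, (m, s) in summary.items():
    print(f"{label:20s} mean = {m:6.2f}  sd = {s:5.2f}")
```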
Subsequently, non-parametric bootstrapping is applied to estimate
the uncertainty distributions for the mean and standard deviation of the
estimated concentration. The original censored data set is resampled
with replacement for B = 1000 bootstrap iterations, and each time a
normal distribution is fitted to the bootstrap sample using MLE. From
these fitted distributions, quantiles are calculated and stored, and the
95% confidence interval for each quantile is subsequently determined
from the variation within bootstraps. The result is plotted in Fig. 3a.
Distributions are fitted to the bootstrap statistics to estimate
hyperparameters. The bootstrap means μ are described by a normal
distribution with mean μ_μ = −0.05 log10 CFU/g and standard
deviation σ_μ = 0.21 log10 CFU/g. The standard deviation of the data
sample is represented by a gamma distribution with shape parameter
α_σ = 99.4 and scale parameter β_σ = 1.87·10^−2 (see Fig. 3c and d).
As opposed to the next illustration, the bootstrap means and
standard deviations, which belong to a two-dimensional space, show no
obvious correlation (see Fig. 3b). The Pearson correlation coefficient
between both equals −0.319 and the Spearman rank order correlation
coefficient −0.302.
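To show how such hyperparameters could feed a second order (two-dimensional) Monte Carlo simulation, the Python sketch below draws the mean and standard deviation in an outer uncertainty loop and concentrations in an inner variability loop. The hyperparameter values are those read from the text for Illustration 1; sampling the two parameters independently is an assumption made here for simplicity, and the function name and loop sizes are hypothetical.

```python
import random


def two_dimensional_mc(n_uncertainty=200, n_variability=500, seed=0):
    """Outer loop: parameter uncertainty. Inner loop: variability."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_uncertainty):
        mu = rng.gauss(-0.05, 0.21)              # uncertain mean, log10 CFU/g
        sigma = rng.gammavariate(99.4, 1.87e-2)  # uncertain sd (shape, scale)
        concentrations = [rng.gauss(mu, sigma) for _ in range(n_variability)]
        runs.append((mu, sigma, concentrations))
    return runs
```

Each outer iteration yields one possible variability distribution; the spread between iterations represents the uncertainty.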
3.2. Illustration 2: semiquantitative data
The same original data set of Illustration 1, generated from a
normal distribution with mean μ equal to 0 and standard deviation σ
equal to 2, is transformed to represent semiquantitative
measurements in log10 CFU/g. The first limit of detection LOD1 is set to 0.507
log10 CFU/g and the second limit of detection LOD2 = 1.68 log10
CFU/g. In this data set, 64 data points fall below the first LOD and hence
would be noted as negative test results. Eighteen data points fall
above the second LOD, i.e. 18 laboratory samples would be positive
for both the first and the second presence/absence test. All other data
points fall between the two limits of detection and hence represent
laboratory samples of which the first presence/absence test is positive
and the second one negative. This is visualized in Fig. 4.
Using MLE, a normal distribution was fitted to the censored data set,
with mean μ = −0.25 log10 CFU/g and standard deviation
σ = 2.11 log10 CFU/g. For comparison, the mean and standard
deviation of the original data set, that is, before the censoring
algorithm is applied to it, are 0.05 log10 CFU/g and 1.77
log10 CFU/g, respectively. The fitted distribution (after censoring) resembles the
original sample distribution (before censoring) remarkably well,
especially considering the fact that the information is reduced rather
drastically compared to purely quantitative measurements. The fact
that the random data were initially generated from a normal distribution,
which corresponds to the distribution assumed for maximum
likelihood estimation, naturally contributes to the good results.
Bootstrap samples are subsequently generated to determine the
uncertainty of the parameters of the distribution. The 95% confidence
interval is plotted in Fig. 5a. As can be seen, uncertainty increases for
the lower concentrations, which can be explained by the fact that
approximately 60% of the measurements are treated as undetected without
further information. The means of the bootstrap samples are fitted to a
normal distribution with mean μ_μ = −0.30 log10 CFU/g and standard
deviation σ_μ = 0.42 log10 CFU/g. For comparison, in the uncensored
case the standard error of the mean would be estimated to be
1.77/√100 = 0.177 according to the central limit theorem. This
illustrates the increase of uncertainty due to censoring. The standard
deviation is fitted to a gamma distribution with shape parameter
α_σ = 18.6 and scale parameter β_σ = 1.17·10^−1.
The mean and standard deviation belong to a two-dimensional
parameter space and are generally not to be considered as
independent. If a scatterplot of the fitted mean versus the fitted
standard deviation of each bootstrap sample is examined, it is clear
that a correlation has arisen between both due to the large number of
values censored below the first LOD. For this particular illustration,
this situation can be explained as follows. When many data points
below the lower limit of detection are selected in a bootstrap sample,
the fitted bootstrap mean is lower and the bootstrap standard
deviation, on the contrary, is estimated higher, which induces a (negative)
correlation between both parameters. This is depicted in Fig. 5b. The
Pearson correlation coefficient between them equals −0.733 and
the Spearman rank order correlation coefficient −0.677. In case the
original data set is available to the risk assessor, the results of the
bootstrap method can be used directly as input values of a
two-dimensional Monte Carlo simulation. However, when only distributions
representing parameter uncertainty are communicated (for example in
the case of data sets too large to include in a report), it is better not to
draw random samples for the mean and standard deviation
independently of one another in a 2D Monte Carlo simulation for risk
assessment, because this would incorrectly influence the representation
of uncertainty intervals for the lower percentiles of the final distribution.
A number of solutions exist to this issue. One could, for example, use
copulas or apply the Iman-Conover method for correlated sampling
from two distributions (Haas et al., 1999; Iman and Conover, 1982).
Here, as an alternative solution, the standard deviation is modelled as a
linear function of the mean with addition of an error term, similar to
what was done by Calistri and Giovannini (2008). Based on the scatterplot, it
is found reasonable to assume that the mean and standard deviation are
related approximately linearly and that the error term remains of the same
magnitude over the range of the mean. For other case studies, nonlinear
regression could be applied as well.

Fig. 2. Histogram of the in silico generated sample data points of the first illustration,
with the vertical line indicating the lower LOD beneath which all data points are
to be censored.

Table 2
Overview, for all illustrations and case studies, of the results of the maximum likelihood
estimation and of the mean and standard deviation calculated with substitution
methods (all units in log10 CFU/g). Each cell gives mean; standard deviation.

Data set                               True           Sample       MLE fitted     Substitution   Substitution   Ignoring
                                       distribution   parameters   distribution   by 1/2 LOD     by LOD         nondetects
Illustration 1 (quantitative data)     0; 2           0.05; 1.77   −0.05; 1.88    0.29; 1.44     0.49; 1.27     1.15; 1.25
Illustration 2 (semi-quantitative)     0; 2           0.05; 1.77   −0.25; 2.11    -              -              -
Case study 1 (Campylobacter data)      unknown        unknown      0.73; 1.03     1.10; 0.64     1.28; 0.53     1.68; 0.65
Case study 1a (increased LOD)          unknown        unknown      0.46; 1.22     1.16; 0.51     2.06; 0.24     2.58; 0.50
Case study 1b (measurement error)      unknown        unknown      0.72; 0.99     -              -              -
Case study 1c (reduced data set)       unknown        unknown      0.63; 1.07     1.07; 0.62     1.26; 0.51     1.69; 0.64
Case study 2 (L. monocytogenes data)   unknown        unknown      −1.58; 1.54    -              -              -

Fig. 3. Illustration 1: (a) plot of the 95% confidence interval of the log-normal distribution
fitted to the set of left-censored quantitative measurements; (b) scatterplot of the bootstrap
sample means versus standard deviations; and kernel density plots of the means (c) and
standard deviations (d) of the bootstrap samples (grey), with the fitted normal and
gamma distributions plotted on top (black).
The standard deviation is formulated as a linear function of the
mean:

$$\sigma = \beta_0 + \beta_1 \cdot \mu + \varepsilon \tag{2}$$

with $\beta_0$ and $\beta_1$ being, respectively, the intercept and slope of the linear
relation, and $\varepsilon$ an error term. In this case, the following equation is
obtained by performing linear regression followed by assessment of the
residual values:

$$\sigma = 1.91 - 0.888 \cdot \mu + \mathrm{Norm}(\mu = 0,\ \sigma = 0.344) \tag{3}$$

For comparison, the results obtained by bootstrapping and the
results obtained by this linear model are plotted on top of each other,
see Fig. 5c.
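The linear relation in Eq. (3) gives a direct recipe for drawing correlated parameter pairs. The Python sketch below draws the mean from the normal distribution fitted to the bootstrap means of Illustration 2 and then the standard deviation from Eq. (3). Rejecting non-positive standard deviations is an added safeguard assumed here, not something stated in the text, and the function name is hypothetical.

```python
import random


def sample_correlated_parameters(n, seed=0):
    """Draw (mean, standard deviation) pairs using the linear model of Eq. (3)."""
    rng = random.Random(seed)
    pairs = []
    while len(pairs) < n:
        mu = rng.gauss(-0.30, 0.42)                        # uncertainty about the mean
        sigma = 1.91 - 0.888 * mu + rng.gauss(0.0, 0.344)  # Eq. (3)
        if sigma > 0.0:  # assumption: discard implausible values
            pairs.append((mu, sigma))
    return pairs
```

With these coefficients the implied correlation between the two parameters is about −0.888 · 0.42 / √(0.888² · 0.42² + 0.344²) ≈ −0.73, in line with the Pearson coefficient reported for the bootstrap estimates.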
4. Results for real food product data
4.1. Case study 1: Campylobacter in chicken meat preparations
In this first case study, the results of Campylobacter analyses in
chicken meat preparations are evaluated. The data set comprises
quantitative analysis results with an LOQ of 10 CFU/g. In 387 of the
656 measurements (59%), the result is left-censored due to the LOQ.
Using MLE, the logarithms of the censored data have been fitted to a
normal distribution with mean μ = 0.73 log10 CFU/g and standard
deviation σ = 1.03 log10 CFU/g.
For comparison, if the nondetects had been substituted by
half of the LOQ, the fitted distribution would have been a normal
distribution with mean μ = 1.10 log10 CFU/g and standard deviation
σ = 0.64 log10 CFU/g (see Table 2).
After bootstrapping, the mean and standard deviation are
represented by a normal distribution and a gamma distribution, respectively.
The means of the bootstrap samples are fitted to a normal distribution
with hyperparameters mean μ_μ = 0.73 log10 CFU/g and standard
deviation σ_μ = 0.06 log10 CFU/g. The standard deviation is fitted by a
gamma distribution with shape parameter α_σ = 304 and scale
parameter β_σ = 3.39·10^−3.
The resulting distribution with its 95% confidence interval is
shown in Fig. 6a.
As can be seen, uncertainty about the distribution parameters is
rather small compared to variability, as could be expected due to the
large data set and the many remaining quantitative, non-censored
data points.
The Campylobacter data set is also used to test the influence of a
number of factors. Firstly, to check the influence of the limit of
quantification, all data points of the data set are censored to an
increased LOQ of 100 CFU/g (standard LOQ of the Campylobacter
enumeration method) instead of 10 CFU/g (reduced limit of
quantification obtained by plating one milliliter over three mCCDA plates).
In this new data set, 589 out of 656 values, i.e. 90% (as opposed to 59%
in the original data set), are censored. Using maximum likelihood, the
new estimated mean and standard deviation are 0.46 and 1.22 log10
CFU/g. The resulting distribution after bootstrapping is shown in
Fig. 6b. As can be seen, this increased LOQ has a strong influence on
parameter estimates as well as on uncertainty. Despite the specificity
of this particular case study, it illustrates (in an opposite way) the
important impact that a reduction of the limit of quantification of current
detection methods (and thus an increase of non-censored values)
(e.g. Gnanou Besse et al., 2004) might have on the obtained results
when data sets include a significant number of nondetects.
It is also tested what the effect would be if the measurement
error were included at a realistic level corresponding to routine
laboratory measurements. A measurement error of 0.5 log10 CFU/g is
superimposed on all original quantitative measurements, thus replacing
all quantitative data points xi with an interval [xi − 0.5, xi + 0.5] log10
CFU/g. The newly obtained estimates of mean and standard deviation
are 0.72 and 0.99 log10 CFU/g, respectively. Implementing measurement
error appears to have very little impact on the obtained result for this
data set, as can be seen in Fig. 6c.
To illustrate the impact of the number of data points on the obtained
distribution, a distribution is fitted to half the number of data points:
328 data points are randomly sampled from the original data set of 656
data points and subjected to MLE and bootstrapping. The estimated
mean and standard deviation are 0.63 and 1.07 log10 CFU/g, respectively.
The resulting distribution is depicted in Fig. 6d. Although uncertainty
intervals do increase somewhat, the deviation of the new distribution
compared to the originally obtained distribution (Fig. 6d) remains
rather limited, despite the fact that the number of data points has been
reduced drastically. This indicates that the investment of labor and costs
in a large number of additional measurements might not always have
the expected impact on the resulting output distribution.
The results of all of these test cases are summarized in Table 2.
As in the previous illustrations, the bias that arises when
substitution methods are used or when nondetects are ignored can be seen.
4.2. Case study 2: Listeria monocytogenes in smoked fish samples
The second case study consists of 103 measurements of Listeria
monocytogenes in smoked fish samples. As opposed to the Campylobacter
case study, this data set contains merely 1 quantitative measurement
(1 laboratory sample enumerated L. monocytogenes > 10 CFU/g, i.e.
15 CFU/g). All other measurements are either interval-, left- or
right-censored. Moreover, the data set contains several different LODs,
depending on the demands of the food business operator by which the
particular food samples were supplied.
Fig. 4. Histogram of the pseudorandom data points used in Illustration 2, with the vertical
lines indicating the limits of detection of the first and second presence/absence test.
Using MLE, the logarithmic values of the analysis results are fitted to
a normal distribution with mean μ = −1.58 log10 CFU/g and standard
deviation σ = 1.54 log10 CFU/g. Based on the empirical distributions
of the bootstrap estimates of the distribution parameters, the normal
distribution is chosen to fit both the mean and the standard deviation.
The mean is fitted to a normal distribution with hyperparameters
mean μ_μ = −1.58 log10 CFU/g and standard deviation σ_μ = 0.20 log10
CFU/g. The standard deviation is fitted to a normal distribution with
hyperparameters mean μ_σ = 1.51 log10 CFU/g and standard deviation
σ_σ = 0.28 log10 CFU/g. To avoid sampling of negative values for the
standard deviation, this distribution is truncated at zero.
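Truncation at zero can be implemented by simple rejection: draw from the normal distribution and redraw whenever the value is not positive. The Python sketch below applies this to the hyperparameters quoted above for the Listeria case study. It is an illustration under that assumption, with hypothetical function names, and it samples the mean and standard deviation independently for brevity.

```python
import random


def sample_truncated_normal(rng, mean, sd, lower=0.0):
    """Draw from Normal(mean, sd) restricted to values above `lower`."""
    while True:
        x = rng.gauss(mean, sd)
        if x > lower:
            return x


def sample_listeria_parameters(n, seed=0):
    """Draw n (mean, standard deviation) pairs in log10 CFU/g."""
    rng = random.Random(seed)
    return [(rng.gauss(-1.58, 0.20),
             sample_truncated_normal(rng, 1.51, 0.28))
            for _ in range(n)]
```

With a mean of 1.51 and a standard deviation of 0.28, a rejection is extremely rare, so the truncation mainly serves as a safeguard.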
As can be seen in the scatterplot of the bootstrap means versus
the bootstrap standard deviations (see Fig. 7a), a small number of
bootstrap samples at the lower part of the figure deviate from the
majority of the samples. This deviation is caused by the absence, in the
respective bootstrap samples, of a number of rare intervals from the
original data set. These intervals all have concentration values higher
than the general mean value, and their inclusion leads to an increase of
the standard deviation. This separation between a small cloud of
points with low standard deviation and a big cloud with the majority
of the points is hence purely a consequence of the complexity of
this particular data set, but illustrates the limitations of the
non-parametric bootstrap method. When a data set has relatively few
distinct values (in this particular case, 10 distinct values are present),
differences between bootstrap samples can be large. This should
always be checked for when applying the non-parametric bootstrap. This
problem does not occur when the parametric bootstrap method is
applied; however, applying the parametric bootstrap method to
censored data would incorrectly result in different uncertainty intervals
compared to non-parametric bootstrapping, which could lead to a
fail-dangerous underestimation; a number of test simulations have
confirmed this (results not shown). The parametric bootstrap, however,
could be applied by generating bootstrap samples from a parametric
distribution and censoring them manually in the specific case that the
complete data set has to be compared to one LOQ only (Zhao and Frey,
2004).
The resulting distribution with its 95% confidence interval is shown
in Fig. 7b.
5. Discussion for real food product data
The examples presented in this article illustrate how complex data
sets, including nondetects, semiquantitative and qualitative
measurements, can be interpreted in an appropriate way for use in
microbiological risk assessment. Ignoring nondetects or substituting them with
the LOD/LOQ or half of it is a classical source of bias (cf. Lorimer and
Kiermeier, 2007) that can and should be avoided using these methods. It
has been demonstrated in this paper that even complex data sets,
including either very diverse analyses or large amounts of censored
values, can lead to very satisfying outcomes. Nevertheless, attention
must be paid to the possibilities and limitations of these methods.
Blindly fitting a data set with limited information (for example a data set
consisting of purely presence/absence tests, as obtained if analyses are
performed for compliance testing to a set legal criterion) to a specific
distribution might result in unrealistic outcomes. Moreover, the limited
small-sample properties of the non-parametric bootstrap method must
be taken into account, as is illustrated in the second case study. This
supports the recommendation to set up dedicated baseline surveys for
data gathering to be used for risk assessment.
The illustration of the various case studies (with hypothetical
data sets as well as with real-life microbiological data sets from
dedicated or ad hoc combined surveys) shows that it is important, in the
establishment of microbiological baseline surveys, to apply some
semi-quantitative methodology and, by preference, a methodology
enabling an estimation of the numbers present in the positive laboratory
samples. For example, Straver et al. (2007) estimated contamination
of chicken breast filet with Salmonella using a combination of prior
enrichment of pooled laboratory samples and subsequent enumeration
of Salmonella in positive laboratory samples using a Most Probable
Number (MPN) assay. MPN methods or enumeration
methods (as in the Campylobacter case study) with a reduced limit of
quantification overall provide an estimate of the number of
pathogens present rather than an exact value; in the frame of the
estimation of distributions of the contamination level, this is not a crucial
factor. In the present Campylobacter case study, it was shown that
inclusion of the measurement error interval for quantitative analyses
hardly affects the estimated distributions.

Fig. 5. Illustration 2: (a) plot of the 95% confidence interval of the fitted log-normal
distribution for the set of semiquantitative measurements; (b) scatter plot of the means
versus the standard deviations of the bootstrap samples; and (c) comparison of the
parameters obtained by bootstrapping and randomly generated parameters using a
linear relationship.
The proportion of nondetects, on the other hand, may have a significant
impact on the result, as has been shown in the present Campylobacter case
study. This illustrates the positive effect that lowering the limit of
quantification of a certain analysis method might have on lowering the
uncertainty of the distribution when the microbiologist is confronted with
a substantial number of nondetects. It was noticed for the Listeria case
study that the uncertainty of the distribution is especially increased at the
lower levels (<0.01 CFU/g) (Fig. 7b), as in this concentration range only
left-censored data are available. More information on the estimated level
of contamination would make it possible to decrease uncertainty. However, overall
the estimated mean level of contamination for Listeria (μ = −1.58 log10
CFU/g) is much lower than for Campylobacter (μ = 0.73 log10 CFU/g),
which means that obtaining enumeration data at these very low
levels of contamination will take considerable laboratory effort and require
adapted methodological procedures, with the related costs of obtaining
this type of data set.
On the other hand, it was shown that obtaining a good data set for
estimation of distributions of contamination levels does not
necessarily demand a large study. In the present Campylobacter case
study, it was illustrated that increasing the number of analyses to a
large extent might lead to only a limited additional reduction of
uncertainty in the case of an already sufficient data set with
representative outcomes. The distribution of the Campylobacter
contamination level shown in Fig. 6d is based upon 122 enumeration results
(obtained from in total 328 laboratory samples analyzed), whereas in
Fig. 6a 269 enumeration values were available (obtained from 656
laboratory samples analyzed) for the estimation of the distribution of
contamination level.
Setting up a baseline survey to acquire a data set to serve as the
basis for estimation of an input distribution for risk assessment thus
has to take into account appropriate methodology to provide a suf-1047297cient number of detects and estimates of numbers but also has
to provide results which are representative for the objective of the
risk assessment eg food product under consideration stage in the
production chain variability between producers seasonality etc in
order not to introduce bias in the distribution obtained As such
setting up a baseline survey is a complex exercise Nevertheless if
the data set is available appropriate techniques also need to be used
to translate the information from the data set into a distribution
Fig. 6. Case study 1: (a) plot of the 95% confidence interval of the normal distribution fitted to the Campylobacter contamination data; (b) influence of an increased LOD on the resulting distribution (the dotted lines represent the original data set, the full lines represent the data set on which an increased LOD has been imposed); (c) influence of inclusion of a measurement error interval on the estimated distribution (full lines) compared to the original data set (dotted lines); (d) influence of the number of data points: comparison of the results of a random subset with N/2 data points (full lines) with the original data set (dotted lines).

In the present study, an approach based upon maximum likelihood estimation was shown to provide good results in representing the variation of contamination data. Additional information with regard to variability and uncertainty could be extracted as well, using the bootstrap method. The same methodology can equally be applied with more complex models, such as mixture models or Poisson-like models. Alternative methods such as Bayesian analysis can also be applied and lead to similar outcomes (results not shown). Examples of a Bayesian analysis can be found in Nauta et al. (2009), Clough et al. (2005) and Crépet et al. (2007).
Application of these techniques offers a way for meta-analysis of the many relevant, yet diverse, data sets that are available in literature and in (inter)national reports of surveillance or baseline surveys. It therefore increases the information input of a risk assessment and, by consequence, the correctness of the outcome of the risk assessment.
Acknowledgements
This research is supported in part by the Research Council of the Katholieke Universiteit Leuven (projects OT0925TBA and EF05006 Center-of-Excellence Optimization in Engineering), knowledge platform KP09005 (SCORES4CHEM) of the Industrial Research Fund, the Belgian Program on Interuniversity Poles of Attraction, initiated by the Belgian Federal Science Policy Office, and the Fund for Scientific Research-Flanders (FWO-Vlaanderen, project G042409 N). J. Van Impe holds the chair Safety Engineering sponsored by the Belgian chemistry and life sciences federation essencia. Research is conducted utilizing high performance computational resources provided by the University of Leuven, http://ludit.kuleuven.be.

We would like to thank the Ghent University cluster of the Department of Veterinary Public Health and Food Safety, Faculty of Veterinary Medicine, and the Department of Food Safety and Food Quality, Faculty of Bio-Science Engineering, who kindly provided the Campylobacter data derived from a Federal Public Health Service funded project. The staff of the accredited laboratory section of the Laboratory of Food Microbiology and Food Preservation at the Department of Food Safety and Food Quality, Faculty of Bio-Science Engineering, Ghent University, is acknowledged for providing the data on the microbiological analysis and challenge testing for L. monocytogenes.
References
Calistri, P., Giovannini, A., 2008. Quantitative risk assessment of human campylobacteriosis related to the consumption of chicken meat in two Italian regions. International Journal of Food Microbiology 128, 274–287.
Clough, H.E., Clancy, D., O'Neill, P.D., Robinson, S.E., French, N.P., 2005. Quantifying uncertainty associated with microbial count data: a Bayesian approach. Biometrics 61, 610–616.
Cox, D.R., Oakes, D., 1984. Analysis of Survival Data. Monographs on Statistics and Applied Probability. Chapman and Hall.
Crépet, A., Albert, I., Dervin, C., Carlin, F., 2007. Estimation of microbial contamination of food from prevalence and concentration data: application to Listeria monocytogenes in fresh vegetables. Applied and Environmental Microbiology 73 (1), 250–258.
Delignette-Muller, M.L., Pouillot, R., Denis, J.-B., 2008. fitdistrplus: Help to fit of a parametric distribution to non-censored or censored data. R package version 0.1-0. URL http://riskassessment.r-forge.r-project.org.
Efron, B., 1982. The jackknife, the bootstrap and other resampling plans. CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 38.
FAO/WHO, 2004. Risk assessment of Listeria monocytogenes in ready-to-eat foods. Accessed at June 5, 2009. URL httpwwwwhointfoodsafetypublicationsmicromralisteriaenindexhtml
FDA/USDA/CDC, 2003. Quantitative assessment of relative risk to public health from foodborne Listeria monocytogenes among selected categories of ready-to-eat foods. Accessed at June 5, 2009. URL httpwwwfoodsafetygovdmslmr2-tochtml
Gnanou Besse, N., Audinet, N., Beaufort, A., Colin, P., Cornu, M., Lombard, B., 2004. A contribution to the improvement of Listeria monocytogenes enumeration in cold-smoked salmon. International Journal of Food Microbiology 91 (2), 119–127.
Gonzales-Barron, U., Kerr, M., Sheridan, J.J., Butler, F., 2010. Count data distributions and their zero-modified equivalents as a framework for modelling microbial data with a relatively high occurrence of zero counts. International Journal of Food Microbiology 136 (3), 268–277.
Haas, C.N., Thayyar-Madabusi, A., Rose, J.B., Gerba, C.P., 1999. Development and validation of dose-response relationship for Listeria monocytogenes. Quantitative Microbiology 1, 89–102.
Habib, I., Sampers, I., Uyttendaele, M., Berkvens, D., De Zutter, L., 2008a. Baseline data from a Belgium-wide survey of Campylobacter species contamination in chicken meat preparations and considerations for a reliable monitoring program. Applied and Environmental Microbiology 74 (17), 5483–5489.
Habib, I., Sampers, I., Uyttendaele, M., Berkvens, D., De Zutter, L., 2008b. Performance characteristics and estimation of measurement uncertainty of three plating procedures for Campylobacter enumeration in chicken meat. Food Microbiology 25 (1), 65–74.
Helsel, D.R., 2005. Nondetects and data analysis: statistics for censored environmental data. Wiley Interscience, USA.
Helsel, D.R., 2006. Fabricating data: how substituting values for nondetects can ruin results, and what can be done about it. Chemosphere 65, 2434–2439.
Iman, R.L., Conover, W.J., 1982. A distribution-free approach to inducing rank correlation among input variables. Communications in Statistical Simulations and Computation 11 (3), 311–334.
ISO, 1998/Amd 1:2004. International Standards Organization 10290-2. Microbiology of food and animal feeding stuffs – horizontal method for the detection and enumeration of Listeria monocytogenes – part 2: Enumeration method.
ISO, 2006. International Standards Organization 10272-1. Microbiology of food and animal feeding stuffs – horizontal method for detection and enumeration of Campylobacter spp. – part 2: Enumeration method.
Jordan, D., 2005. Simulating the sensitivity of pooled-sample herd tests for fecal Salmonella in cattle. Preventive Veterinary Medicine 70, 59–73.
Kilsby, D.C., Pugh, M.E., 1981. The relevance of the distribution of micro-organisms within batches of food to the control of microbiological hazards from foods. Journal of Applied Bacteriology 51, 345–354.
Legan, J.D., Vandeven, M.H., Dahms, S., Cole, M.B., 2001. Determining the concentration of microorganisms controlled by attributes sampling plans. Food Control 12 (3), 137–147.
Lorimer, M.F., Kiermeier, A., 2007. Analysing microbiological data: Tobit or not Tobit? International Journal of Food Microbiology 116, 313–318.
Nauta, M.J., van der Wal, F.J., Putirulan, F.F., Post, J., van de Kassteele, J., Bolder, N.M., 2009. Evaluation of the "testing and scheduling" strategy for control of Campylobacter in broiler meat in The Netherlands. International Journal of Food Microbiology 134, 216–222.

Fig. 7. Case study 2: (a) scatter plot of the fitted means versus standard deviations of the bootstrap samples and (b) plot of the 95% confidence interval of the normal distribution fitted to the logarithmic Listeria monocytogenes contamination data.

Oscar, T.P., 2004. A quantitative risk assessment model for Salmonella and whole chickens. International Journal of Food Microbiology 93, 231–247.
Pérez-Rodríguez, F., van Asselt, E.D., García-Gimeno, R.M., Zurera, G., Zwietering, M.H., 2007. Extracting additional risk managers information from a risk assessment of Listeria monocytogenes in deli meats. Journal of Food Protection 70 (5), 1137–1152.
Pouillot, R., Miconnet, N., Afchain, A.-L., Delignette-Muller, M.L., Beaufort, A., Rosso, L., Denis, J.-B., Cornu, M., 2007. Quantitative risk assessment of Listeria monocytogenes in French cold-smoked salmon: I. quantitative exposure assessment. Risk Analysis 27 (3), 683–700.
R Development Core Team, 2009. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. URL http://www.R-project.org.
Reinders, R.D., De Jonge, R., Evers, E.G., 2003. A statistical method to determine whether micro-organisms are randomly distributed in a food matrix, applied to coliforms and Escherichia coli O157 in minced beef. Food Microbiology 20, 297–303.
Ridout, M., Demétrio, C.G.B., Hinde, J., 1998. Models for count data with many zeroes. Proceedings of the XIXth International Biometric Conference, pp. 179–192.
Shorten, P.R., Pleasants, A.B., Soboleva, T.K., 2006. Estimation of microbial growth using population measurements subject to a detection limit. International Journal of Food Microbiology 108, 369–375.
Straver, J.M., Janssen, A.F.W., Linnemann, A.R., van Boekel, A.J.S., Beumer, R.R., Zwietering, M.H., 2007. Number of Salmonella on chicken breast filet at retail level and its implications for public health risk. Journal of Food Protection 70 (9), 2045–2055.
Uyttendaele, M., Busschaert, P., Valero, A., Geeraerd, A.H., Vermeulen, A., Jacxsens, L., Goh, K.K., De Loy, A., Van Impe, J.F., Devlieghere, F., 2009. Prevalence and challenge tests of Listeria monocytogenes in Belgian produced and retailed mayonnaise-based deli-salads, cooked meat products and smoked fish between 2005 and 2007. International Journal of Food Microbiology 133, 94–104.
Zhao, Y., Frey, H.C., 2004. Quantification of variability and uncertainty for censored data sets and application to air toxic emission factors. Risk Analysis 24 (4), 1019–1034.
is taken from the original censored data set of measurements. Parameters for that bootstrap sample are determined using MLE, and quantiles of the obtained distribution are stored in order to express a confidence interval later on.
The joint probability distribution obtained from the bootstrap samples is used to express uncertainty about the parameters due to the limited number of samples. Whenever the original data set is available to the risk assessor, the risk assessor can generate bootstrap samples and use the obtained values directly as input values of a two-dimensional Monte Carlo simulation. However, this is not always the case, for example in the case of large data sets. Therefore, to enable communication of parameter uncertainty, a parametric distribution that fits best to the bootstrap estimates is chosen for each parameter, based on visual comparison with the kernel density plots, quantile-quantile plots and the χ² and Anderson-Darling goodness-of-fit tests. Distributions that could generate implausible samples (such as negative values for a standard deviation) should be avoided or truncated. In order to examine the correlation between parameters among bootstrap samples, scatter plots, the Pearson correlation coefficient and the Spearman rank order correlation coefficient are evaluated.
To visually represent the combined variability and uncertainty, the 95% confidence intervals of all individual quantiles of the variability distribution are expressed. Among different bootstrap iterations, the value of each quantile of the variability distribution will vary. For all quantiles of the variability distribution, an uncertainty distribution is obtained, of which the 2.5, 50 and 97.5 percentiles (over all bootstrap iterations) are calculated. For all cumulative density plots that follow, the reading key is as follows. The median value of all quantiles of the variability distribution is indicated with a black line; the interval between the 2.5 and 97.5 percentiles is indicated with a grey area. A large grey area in the horizontal direction indicates large uncertainty about the value of the respective quantiles of the variability distribution, i.e. the values of the quantiles differ considerably among different bootstrap iterations. A small grey area indicates that the value of each quantile does not vary much between different bootstrap iterations; hence uncertainty is small. The degree of variability is indicated by the interval the median line spans from the lower quantiles to the higher quantiles.
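The reading key above can be illustrated with a minimal sketch. It is written in Python with the standard library only, whereas the authors report using R; the bootstrap estimates in `boot_params` are hypothetical numbers, and the helper names `percentile` and `uncertainty_band` are introduced here for illustration only. For each probability level of the variability distribution, the quantile is evaluated for every bootstrap fit and the 2.5, 50 and 97.5 percentiles over the bootstrap iterations are taken.

```python
from statistics import NormalDist

def percentile(sorted_values, q):
    """Percentile (q in [0, 100]) of a sorted list, using linear interpolation."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])

def uncertainty_band(boot_params, probs):
    """Return (p, 2.5th, 50th, 97.5th percentile) of the p-quantile over bootstrap fits."""
    std_normal = NormalDist()
    band = []
    for p in probs:
        z = std_normal.inv_cdf(p)
        # p-quantile of each fitted normal variability distribution
        qs = sorted(mu + sigma * z for mu, sigma in boot_params)
        band.append((p, percentile(qs, 2.5), percentile(qs, 50), percentile(qs, 97.5)))
    return band

# Hypothetical (mean, standard deviation) pairs, one per bootstrap iteration
boot_params = [(-0.10, 1.9), (0.00, 2.0), (0.10, 2.1), (-0.05, 1.8), (0.05, 2.2)]
band = uncertainty_band(boot_params, [0.1, 0.5, 0.9])
for p, low, median, high in band:
    print(f"p={p:.1f}: median={median:.2f}, 95% interval=[{low:.2f}, {high:.2f}]")
```

Plotting the median column against `p` would give the black line, and the two outer columns would bound the grey area.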
The code for these simulations is written in R (R Development Core Team, 2009), using functions from the survival package for the MLE procedures, and can be obtained from the authors upon request. Although not used in the present study, at the time of writing a package, fitdistrplus, appeared which can be used for the same purpose (Delignette-Muller et al., 2008).
2.2. Data sets
For illustrative purposes, a set of left-censored data is pseudorandomly generated in silico in order to simulate quantitative measurements with a single LOQ below which no values can be measured. To begin with, one random sample of 100 data points is pseudorandomly generated from a normal distribution with mean μ = 0 log10 CFU/g and standard deviation σ = 2 log10 CFU/g. An LOQ is chosen at the 40th percentile of the normal distribution, and the data set is censored so that approximately 40% of the data set will fall below that LOQ and hence will be regarded as nondetects. For this specific data set, 40 out of 100 data points are regarded as below the LOQ after censoring.
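A minimal sketch of how such a left-censored data set could be generated is given below. It is in Python (standard library) and not the authors' R code; the seed and the `(lower, upper)` tuple convention, with `None` marking an unknown bound, are assumptions made for illustration, so the number of nondetects will not necessarily equal the 40 reported for the authors' sample.

```python
import random
from statistics import NormalDist

true_dist = NormalDist(mu=0.0, sigma=2.0)   # log10 CFU/g
loq = true_dist.inv_cdf(0.40)               # 40th percentile, about -0.507

rng = random.Random(1)                      # arbitrary seed (assumption)
values = [rng.gauss(0.0, 2.0) for _ in range(100)]

# A nondetect is only known to lie below the LOQ: (None, loq).
# A quantified result keeps its value: (x, x).
censored = [(None, loq) if x < loq else (x, x) for x in values]

n_nondetects = sum(1 for lower, _ in censored if lower is None)
print(f"LOQ = {loq:.3f} log10 CFU/g, nondetects: {n_nondetects} of {len(censored)}")
```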
Instead of determining the number of CFU of a pathogen per gram of a food sample, a portion of 25 g, for example, could be examined for the mere presence of a micro-organism. A negative test result indicates a concentration of less than (a hypothetical) 0.04 cells per gram of that food sample. If the test result is positive, a smaller portion of that same laboratory sample (stored at a temperature not allowing for growth), e.g. a portion of 1 g, could be examined again for presence/absence. If a homogeneous distribution of cells among test portions is assumed, semi-quantitative results are obtained this way. A positive 25 g portion and a negative 1 g portion would indicate, for example, a bacterial concentration of between 0.04 and 1 CFU/g. This outcome is said to be interval-censored.
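The translation of presence/absence results into censoring bounds can be sketched as follows. This is an illustrative Python helper (the name `interval_from_tests` is hypothetical) that relies on the same simplification as the text: cells are homogeneously distributed, and a positive portion of g grams implies at least 1/g CFU/g while a negative one implies less than 1/g CFU/g.

```python
import math

def interval_from_tests(results):
    """Combine presence/absence tests into (lower, upper) bounds in log10 CFU/g.

    results: list of (portion_grams, positive) tuples for one laboratory sample.
    None means that the bound is unknown (left- or right-censored).
    """
    lower, upper = None, None
    for grams, positive in results:
        limit = math.log10(1.0 / grams)    # one cell in the tested portion
        if positive:
            lower = limit if lower is None else max(lower, limit)
        else:
            upper = limit if upper is None else min(upper, limit)
    return lower, upper

print(interval_from_tests([(25, False)]))             # left-censored below 0.04 CFU/g
print(interval_from_tests([(25, True), (1, False)]))  # between 0.04 and 1 CFU/g
```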
A similar situation is simulated in the second illustration. The same original (uncensored) data set of 100 pseudorandom values generated from a normal distribution with mean μ = 0 log10 CFU/g and standard deviation σ = 2 log10 CFU/g is used for this second illustration. Two detection limits are chosen so that the first one (LOD1) is situated at the 60th percentile of the distribution and the second one (LOD2) at the 80th percentile. Subsequently, the data are transformed into purely semiquantitative data, i.e. each data point is reduced to either smaller than the first LOD, between the first and second LOD, or greater than the second LOD.
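As a small sketch of this transformation (Python, standard library; the function name `to_semiquantitative` and the `(lower, upper)` convention with `None` for an unknown bound are illustrative assumptions), the two limits follow directly from the stated percentiles:

```python
from statistics import NormalDist

true_dist = NormalDist(mu=0.0, sigma=2.0)   # log10 CFU/g
lod1 = true_dist.inv_cdf(0.60)              # about 0.507 log10 CFU/g
lod2 = true_dist.inv_cdf(0.80)              # about 1.68 log10 CFU/g

def to_semiquantitative(x):
    """Reduce a value (log10 CFU/g) to the interval that two tests would report."""
    if x < lod1:
        return (None, lod1)     # first test negative
    if x < lod2:
        return (lod1, lod2)     # first test positive, second negative
    return (lod2, None)         # both tests positive

print(to_semiquantitative(0.0), to_semiquantitative(1.0), to_semiquantitative(3.0))
```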
As a first case study based on real microbiological measurements, laboratory analyses of Campylobacter in chicken meat preparations at the Belgian retail market are analyzed (Habib et al., 2008a). The data set consists of direct plating results using the ISO (2006) standard method with a reduced limit of quantification. By plating one milliliter of the primary 10-fold diluted suspension of the chicken meat sample on three modified cefoperazone charcoal deoxycholate agar (mCCDA; Oxoid, Basingstoke, England, UK) spread plates of 90 mm diameter, a limit of quantification of 10 CFU/g is obtained instead of the usual LOQ of 100 CFU/g. In 387 out of 656 measurements (59%), the result is left-censored due to the LOQ.
This first case study is also used to examine the influence of a number of conditions. In order to check the effect of using a reduced LOQ, all values below 100 CFU/g (the standard LOQ of the Campylobacter enumeration method) were assumed to be censored and hence regarded as nondetects. To determine whether measurement error (assumed to be ±0.5 log10 units) (Habib et al., 2008b) would have a significant impact on the resulting distributions, another simulation is run with all quantitative data points x_i replaced by the interval [x_i − 0.5, x_i + 0.5] log10 CFU/g; in other words, x_i no longer has its quantitative value but is assumed to be interval-censored. By doing this, it is stated that, due to measurement error, the real value of measurement x_i is known only to be within the interval [x_i − 0.5, x_i + 0.5]. Finally, to illustrate the impact of the size of the data set, the simulation is also conducted with only half the number of data points. For that purpose, 328 data points are pseudorandomly sampled with replacement from the original data set.
A set of 103 laboratory samples of smoked fish on the Belgian retail market is used as a second case study (Table 1). The laboratory samples were analyzed in the period 2005-2007 for a number of food business operators (Uyttendaele et al., 2009). In most cases, a test portion of 25 g is analyzed qualitatively for the presence of L. monocytogenes according to the AFNOR validated VIDAS LMO method (Bio-129-07 02). In case of a positive test result, either a smaller test portion of the same laboratory sample is analyzed (e.g. presence/absence testing per 0.01 g), or a test portion of the same laboratory sample is enumerated by plating on ALOA (Fiers, Belgium) according to the ISO (1998/Amd 1:2004) standard method using a reduced limit of quantification (thus LOQ 10 CFU/g). When quantitative tests are executed, most of the outcomes are below this LOQ, which again results in semiquantitative data. For a number of samples, other sample weights were tested according to the demands of the food business operators, hence resulting in a rather complex data set.

Table 1
Overview of contamination data of L. monocytogenes in smoked fish on the Belgian retail market in the period 2005-2007.

Number of samples    Concentration (CFU/g)
54                   <0.04
2                    <100
26                   0.04–10
1                    15
8                    0.04–100
2                    >100
1                    <1
1                    >1
7                    0.04–1
1                    1–100

3. Results and discussion of the in silico generated data
3.1. Illustration 1: left-censored quantitative data
A data set is generated in silico and left-censored, representing quantitative measurements in log10 CFU/g. A total of 100 data points are pseudorandomly drawn from a normal distribution with mean μ equal to 0 and standard deviation σ equal to 2 (see Fig. 2). The (lower) LOQ is set to −0.507 log10 CFU/g, which is the 40th percentile of a cumulative normal distribution function with μ = 0 and σ = 2. After censoring, 40 out of 100 data points are regarded as below the LOQ.
Using MLE, a normal distribution is fitted to these censored data, with fitted mean μ = −0.05 log10 CFU/g and fitted standard deviation σ = 1.88 log10 CFU/g. The sample mean of the data set when all nondetects are substituted with the LOQ is 0.49 log10 CFU/g, or 0.29 log10 CFU/g when nondetects are substituted by half of the (logarithmic) LOQ (see Table 2). For comparison, the mean of the original sampled data (before any censoring is applied, and hence the true sample mean) is 0.05 log10 CFU/g. This illustrates the major bias that originates when alternative practices such as substitution of nondetects by the LOQ are applied to data sets. As is illustrated by Lorimer and Kiermeier (2007), this has been the case frequently in past research. If nondetects are ignored, the estimated mean is 1.15 log10 CFU/g. This approach is often intended to model only the positive subpopulation along with an additional variable for prevalence; however, this implies that the sensitivity of the detection method is ignored (e.g. when a counting method has an LOQ of 10 CFU/g) and that these results cannot be transferred to portion sizes other than the measured one. The true sample standard deviation is 1.77 log10 CFU/g. The values of the standard deviations are estimated to be lower in the case of substitution methods because these methods push low data points (nondetects) towards the center of the sample distribution.
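The idea behind fitting a normal distribution to censored data can be sketched as below. This is not the authors' implementation (they report using the R survival package); it is a Python illustration in which exact values contribute the density, censored values contribute the probability mass of their interval, and a crude pattern search stands in for a proper optimiser. The function names and the `(lower, upper)` data convention with `None` for an open end are assumptions.

```python
import math
from statistics import NormalDist

def log_likelihood(mu, sigma, data):
    """Log-likelihood of a normal distribution for exact and censored observations."""
    dist = NormalDist(mu, sigma)
    total = 0.0
    for lower, upper in data:
        if lower is not None and lower == upper:
            like = dist.pdf(lower)                         # quantified value
        else:
            hi = 1.0 if upper is None else dist.cdf(upper)
            lo = 0.0 if lower is None else dist.cdf(lower)
            like = hi - lo                                 # left-, right- or interval-censored
        total += math.log(max(like, 1e-300))               # guard against log(0)
    return total

def fit_mle(data, mu=0.0, sigma=1.0, step=1.0, tol=1e-4):
    """Maximise the log-likelihood with a simple pattern search (illustrative only)."""
    best = log_likelihood(mu, sigma, data)
    while step > tol:
        improved = False
        for d_mu, d_sigma in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            m, s = mu + d_mu, sigma + d_sigma
            if s <= 0:
                continue
            ll = log_likelihood(m, s, data)
            if ll > best:
                mu, sigma, best, improved = m, s, ll, True
        if not improved:
            step /= 2.0
    return mu, sigma

# Hypothetical data: three nondetects below an LOQ of -0.5 and four quantified values
data = [(None, -0.5)] * 3 + [(0.2, 0.2), (1.1, 1.1), (2.4, 2.4), (3.0, 3.0)]
print(fit_mle(data))
```

For a data set without any censoring, this likelihood reduces to the ordinary normal likelihood, so the fit returns the sample mean and the (population) standard deviation.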
Subsequently, non-parametric bootstrapping is applied to estimate the uncertainty distributions for the mean and standard deviation of the estimated concentration. The original censored data set is resampled with replacement for B = 1000 bootstrap iterations, and each time a normal distribution is fitted to the bootstrap sample using MLE. From these fitted distributions, quantiles are calculated and stored, and the 95% confidence interval for each quantile is subsequently determined from the variation among the bootstrap samples. The result is plotted in Fig. 3a.
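The resampling loop can be sketched generically as follows (Python, standard library). The `fit` argument is a placeholder for whatever estimator is applied to each bootstrap sample; in the study this is the censored-data MLE, whereas the demonstration below uses the plain mean and standard deviation of hypothetical uncensored values, purely so that the sketch runs on its own.

```python
import random
from statistics import mean, pstdev

def bootstrap(data, fit, n_boot=1000, seed=0):
    """Non-parametric bootstrap: resample with replacement and refit each time."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        sample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(fit(sample))
    return estimates

# Hypothetical uncensored values (log10 CFU/g) and a stand-in estimator
values = [-1.2, 0.3, 0.8, 1.5, 2.1, -0.4, 0.0, 1.1, 2.9, -2.0]
boot = bootstrap(values, lambda s: (mean(s), pstdev(s)), n_boot=1000, seed=1)
boot_means = [m for m, _ in boot]
print(f"bootstrap mean of the means: {mean(boot_means):.2f}")
```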
Distributions are fitted to the bootstrap statistics to estimate hyperparameters. The bootstrap means μ are described by a normal distribution with mean μ_μ = −0.05 log10 CFU/g and standard deviation σ_μ = 0.21 log10 CFU/g. The standard deviation of the data sample is represented by a gamma distribution with shape parameter α_σ = 99.4 and scale parameter β_σ = 1.87·10^−2 (see Fig. 3c and d).
As opposed to the next illustration, the bootstrap means and standard deviations (belonging to a two-dimensional space) show no obvious correlation (see Fig. 3b). The Pearson correlation coefficient between both equals −0.319 and the Spearman rank order correlation coefficient −0.302.
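For readers who want to reproduce such a check on their own bootstrap output, both coefficients can be computed with a few lines of Python (standard library only). The functions below are a generic sketch, and the bootstrap estimates are hypothetical numbers rather than the values of this study; the Spearman coefficient is simply the Pearson coefficient of the ranks.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank order correlation coefficient."""
    return pearson(ranks(x), ranks(y))

# Hypothetical bootstrap means and standard deviations
boot_means = [-0.30, -0.10, 0.05, 0.20, -0.45, 0.00]
boot_sds = [2.05, 1.90, 1.85, 1.80, 2.20, 1.95]
print(round(pearson(boot_means, boot_sds), 3), round(spearman(boot_means, boot_sds), 3))
```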
3.2. Illustration 2: semiquantitative data
The same original data set of Illustration 1, generated from a normal distribution with mean μ equal to 0 and standard deviation σ equal to 2, is transformed to represent semiquantitative measurements in log10 CFU/g. The first limit of detection, LOD1, is set to 0.507 log10 CFU/g and the second limit of detection, LOD2, to 1.68 log10 CFU/g. In this data set, 64 data points fall below the first LOD and hence would be noted as negative test results. Eighteen data points fall above the second LOD, i.e. 18 laboratory samples would be positive for both the first and the second presence/absence test. All other data points fall between the two limits of detection and hence represent laboratory samples of which the first presence/absence test is positive and the second one negative. This is visualized in Fig. 4.
Using MLE, a normal distribution was fitted to the censored data set, with mean μ = −0.25 log10 CFU/g and standard deviation σ = 2.11 log10 CFU/g. For comparison, the mean and standard deviation of the original data set, that is, before the censoring algorithm is applied to it, are respectively 0.05 log10 CFU/g and 1.77 log10 CFU/g. The fitted distribution (after censoring) resembles the original sample distribution (before censoring) remarkably well, especially considering that the information is reduced rather drastically compared with purely quantitative measurements. The fact that the random data were initially generated from a normal distribution, which corresponds to the distribution assumed for maximum likelihood estimation, naturally contributes to the good results.
Bootstrap samples are subsequently generated to determine the uncertainty of the parameters of the distribution. The 95% confidence interval is plotted in Fig. 5a. As can be seen, uncertainty increases for the lower concentrations, which can be explained by the fact that approximately 60% of the measurements are treated as undetected without further information. The means of the bootstrap samples are fitted to a normal distribution with mean μ_μ = −0.30 log10 CFU/g and standard deviation σ_μ = 0.42 log10 CFU/g. For comparison, in the uncensored case the standard error of the mean would be estimated to be 1.77/√100 = 0.177 according to the central limit theorem. This illustrates the increase of uncertainty due to censoring. The standard deviation is fitted to a gamma distribution with shape parameter α_σ = 18.6 and scale parameter β_σ = 1.17·10^−1.
The mean and standard deviation belong to a two-dimensional parameter space and are generally not to be considered as independent. If a scatterplot of the fitted mean versus the fitted standard deviation of each bootstrap sample is examined, it is clear that a correlation has arisen between both due to the large number of values censored below the first LOD. For this particular illustration, this situation can be explained as follows. When many data points below the lower limit of detection are selected in a bootstrap sample, the fitted bootstrap mean is lower and the bootstrap standard deviation, on the contrary, is estimated higher, which induces a (negative) correlation between both parameters. This is depicted in Fig. 5b. The Pearson correlation coefficient between them equals −0.733 and the Spearman rank order correlation coefficient −0.677. In case the original data set is available to the risk assessor, the results of the bootstrap method can be used directly as input values of a two-dimensional Monte Carlo simulation. However, when only distributions representing parameter uncertainty are communicated (for example, in the case of data sets too large to include in a report), it is better not to draw random samples for the mean and standard deviation independently of one another in a 2D Monte Carlo simulation for risk assessment, because this would incorrectly influence the representation of uncertainty intervals for the lower percentiles of the final distribution. A number of solutions exist to this issue. One could, for example, use copulas or apply the Iman-Conover method for correlated sampling from two distributions (Haas et al., 1999; Iman and Conover, 1982). Here, as an alternative solution, the standard deviation is modelled as a linear function of the mean with addition of an error term, similar to what was done by Calistri and Giovannini (2008). Based on the scatterplot, it is found reasonable to assume that the mean and standard deviation are related approximately linearly and that the error term remains of the same magnitude over the range of the mean. For other case studies, nonlinear regression could be applied as well.

Fig. 2. Histogram of the in silico generated sample data points of the first illustration, with the vertical line indicating the lower LOD beneath which all data points are to be censored.

Table 2
Overview, for all illustrations and case studies, of the results of the maximum likelihood estimation and of the mean and standard deviation calculated with substitution methods (all units in log10 CFU/g). Entries are given as mean / standard deviation; a dash indicates that no value is reported.

Data set                                   True distribution   Sample parameters      MLE fitted distribution   Substitution by 1/2 LOD   Substitution by LOD   Ignoring nondetects
                                           (μ / σ)             (x̄ / s)                (μ / σ)                   (μ / σ)                   (μ / σ)               (μ / σ)
Illustration 1, quantitative data          0 / 2               0.05 / 1.77            −0.05 / 1.88              0.29 / 1.44               0.49 / 1.27           1.15 / 1.25
Illustration 2, semi-quantitative data     0 / 2               0.05 / 1.77            −0.25 / 2.11              -                         -                     -
Case study 1, Campylobacter data           unknown             unknown                0.73 / 1.03               1.10 / 0.64               1.28 / 0.53           1.68 / 0.65
Case study 1a, increased LOD               unknown             unknown                0.46 / 1.22               1.16 / 0.51               2.06 / 0.24           2.58 / 0.50
Case study 1b, measurement error           unknown             unknown                0.72 / 0.99               -                         -                     -
Case study 1c, reduced data set            unknown             unknown                0.63 / 1.07               1.07 / 0.62               1.26 / 0.51           1.69 / 0.64
Case study 2, L. monocytogenes data        unknown             unknown                −1.58 / 1.54              -                         -                     -

Fig. 3. Illustration 1: (a) plot of the 95% confidence interval of the log-normal distribution fitted to the set of left-censored quantitative measurements; (b) scatterplot of the bootstrap sample means versus standard deviations; and kernel density plot of respectively the means (c) and standard deviations (d) of the bootstrap samples (grey), with the fitted normal and gamma distributions plotted on top of it (black).

The standard deviation is formulated as a linear function of the mean:

\[ \sigma = \beta_0 + \beta_1 \cdot \mu + \varepsilon \qquad (2) \]

with β0 and β1 being respectively the intercept and slope of the linear relation, and ε an error term. In this case, the following equation is obtained by performing linear regression followed by assessment of the residual values:

\[ \sigma = 1.91 - 0.888 \cdot \mu + \mathrm{Norm}(\mu = 0, \sigma = 0.344) \qquad (3) \]

For comparison, the results obtained by bootstrapping and the results obtained by this linear model are plotted on top of each other; see Fig. 5c.
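A sketch of how correlated parameter pairs could be drawn from this linear model in the uncertainty dimension of a 2D Monte Carlo simulation is given below (Python, standard library). The mean is drawn from the normal uncertainty distribution reported for Illustration 2 (μ_μ = −0.30, σ_μ = 0.42) and the standard deviation follows Eq. (3). Rejecting non-positive standard deviations is an assumption added here, in line with the earlier remark that implausible samples should be avoided or truncated; the function name is hypothetical.

```python
import random

def sample_parameters(n, seed=0):
    """Draw n correlated (mean, standard deviation) pairs using Eq. (3)."""
    rng = random.Random(seed)
    pairs = []
    while len(pairs) < n:
        mu = rng.gauss(-0.30, 0.42)                     # uncertainty about the mean
        error = rng.gauss(0.0, 0.344)                   # residual term of Eq. (3)
        sigma = 1.91 - 0.888 * mu + error
        if sigma > 0:                                   # assumption: discard implausible draws
            pairs.append((mu, sigma))
    return pairs

pairs = sample_parameters(1000, seed=1)
mean_mu = sum(m for m, _ in pairs) / len(pairs)
mean_sigma = sum(s for _, s in pairs) / len(pairs)
print(f"average mean: {mean_mu:.2f}, average standard deviation: {mean_sigma:.2f}")
```

Each pair would then parameterise one variability distribution in the inner loop of the simulation.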
4. Results for real food product data
4.1. Case study 1: Campylobacter in chicken meat preparations
In this first case study, the results of Campylobacter analyses in chicken meat preparations are evaluated. The data set comprises quantitative analysis results with an LOQ of 10 CFU/g. In 387 of the 656 measurements (59%), the result is left-censored due to the LOQ. Using MLE, the logarithms of the censored data have been fitted to a normal distribution with mean μ = 0.73 log10 CFU/g and standard deviation σ = 1.03 log10 CFU/g.

For comparison, if the nondetects had been substituted by half of the LOQ, the fitted distribution would have been a normal distribution with mean μ = 1.10 log10 CFU/g and standard deviation σ = 0.64 log10 CFU/g (see Table 2).

After bootstrapping, the mean and standard deviation are represented by a normal distribution and a gamma distribution, respectively. The means of the bootstrap samples are fitted to a normal distribution with hyperparameters mean μ_μ = 0.73 log10 CFU/g and standard deviation σ_μ = 0.06 log10 CFU/g. The standard deviation is fitted by a gamma distribution with shape parameter α_σ = 304 and scale parameter β_σ = 3.39·10^−3.

The resulting distribution, with its 95% confidence interval, is shown in Fig. 6a.

As can be seen, uncertainty about the distribution parameters is rather small compared to variability, as could be expected due to the large data set and the many remaining quantitative, non-censored data points.
The Campylobacter data set is also used to test the influence of a number of factors. Firstly, to check the influence of the limit of quantification, all data points of the data set are censored to an increased LOQ of 100 CFU/g (standard LOQ of the Campylobacter enumeration method) instead of 10 CFU/g (reduced limit of quantification obtained by plating one milliliter over three mCCDA plates). In this new data set, 589 out of 656 values, i.e. 90% (as opposed to 59% in the original data set), are censored. Using maximum likelihood, the new estimated mean and standard deviation are 0.46 and 1.22 log10 CFU/g. The resulting distribution after bootstrapping is shown in Fig. 6b. As can be seen, this increased LOQ has a strong influence on parameter estimates as well as on uncertainty. Despite the specificity of this particular case study, it illustrates (in an opposite way) the important impact that a reduction of the limit of quantification of current detection methods (and thus an increase of non-censored values) (e.g. Gnanou Besse et al., 2004) might have on the obtained results when data sets include a significant number of nondetects.
It is also tested what the effect would be if measurement error were included at a realistic level corresponding to routine laboratory measurements. A measurement error of 0.5 log10 CFU/g is superimposed on all original quantitative measurements, thus replacing all quantitative data points x_i with an interval [x_i − 0.5, x_i + 0.5] log10 CFU/g. The newly obtained estimates of mean and standard deviation are respectively 0.72 and 0.99 log10 CFU/g. Implementing measurement error appears to have very little impact on the obtained result for this data set, as can be seen in Fig. 6c.
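The two data transformations used in these sensitivity tests (imposing a higher LOQ and widening quantified values into measurement-error intervals) can be sketched as follows. This is an illustrative Python fragment; the helper names and the `(lower, upper)` convention in log10 CFU/g, with `None` for an unknown lower bound, are assumptions, and only quantified and left-censored observations are handled.

```python
def raise_loq(obs, new_loq):
    """Re-censor an observation at a higher LOQ (both in log10 CFU/g)."""
    lower, _ = obs
    if lower is None or lower < new_loq:
        return (None, new_loq)      # now only known to be below the new LOQ
    return obs                      # still quantified above the new LOQ

def add_measurement_error(obs, half_width=0.5):
    """Replace a quantified value x by the interval [x - half_width, x + half_width]."""
    lower, upper = obs
    if lower is not None and lower == upper:
        return (lower - half_width, upper + half_width)
    return obs                      # censored observations are left unchanged

# Hypothetical observations: one nondetect at LOQ 10 CFU/g and two quantified values
observations = [(None, 1.0), (1.5, 1.5), (2.3, 2.3)]
print([raise_loq(o, 2.0) for o in observations])        # LOQ raised to 100 CFU/g
print([add_measurement_error(o) for o in observations])
```

The transformed observations can then be passed to the same censored-data fitting and bootstrapping steps as the original data set.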
To illustrate the impact of the number of data points on the obtained distribution, a distribution is fitted to half the number of data points: 328 data points are randomly sampled from the original data set of 656 data points and subjected to MLE and bootstrapping. The estimated mean and standard deviation are 0.63 and 1.07 log10 CFU/g, respectively. The resulting distribution is depicted in Fig. 6d. Although the uncertainty intervals do increase somewhat, the deviation of the new distribution from the originally obtained distribution (Fig. 6d) remains rather limited, despite the fact that the number of data points has been reduced drastically. This indicates that the investment of labor and costs in a large number of additional measurements might not always have the expected impact on the resulting output distribution.
The results of all of these test cases are summarized in Table 2. As in the previous illustrations, the bias that arises when substitution methods are used or when nondetects are ignored can be seen.
4.2. Case study 2: Listeria monocytogenes in smoked fish samples
The second case study consists of 103 measurements of Listeria monocytogenes in smoked fish samples. As opposed to the Campylobacter case study, this data set contains merely one quantitative measurement (one laboratory sample with enumerated L. monocytogenes > 10 CFU/g, i.e. 15 CFU/g). All other measurements are either interval-, left- or right-censored. Moreover, the data set contains several different LODs, depending on the demands of the food business operator by whom the particular food samples were supplied.
Fig. 4. Histogram of the pseudorandom data points used in Illustration 2, with the vertical lines indicating the limits of detection of the first and second presence/absence test.

P. Busschaert et al. / International Journal of Food Microbiology 138 (2010) 260–269

Using MLE, the logarithmic values of the analysis results are fitted to a normal distribution with mean μ = −1.58 log10 CFU/g and standard deviation σ = 1.54 log10 CFU/g. Based on the empirical distributions of the bootstrap estimates of the distribution parameters, the normal distribution is chosen to fit both the mean and the standard deviation. The mean is fitted to a normal distribution with hyperparameters mean μ_μ = −1.58 log10 CFU/g and standard deviation σ_μ = 0.20 log10 CFU/g. The standard deviation is fitted to a normal distribution with hyperparameters mean μ_σ = 1.51 log10 CFU/g and standard deviation σ_σ = 0.28 log10 CFU/g. To avoid sampling negative values for the standard deviation, this distribution is truncated at zero.
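A minimal sketch of how these hyperparameters could feed a second order Monte Carlo simulation is given below. It uses only the Python standard library; the function names and the simple rejection step for the truncation are choices made for this sketch. The mean and standard deviation are drawn independently here, which is an assumption, since the text does not state how the two parameters are combined for this case study.

```python
import random

# Hyperparameters reported for case study 2 (log10 CFU/g).
MU_MEAN, MU_SD = -1.58, 0.20        # uncertainty about the mean
SIGMA_MEAN, SIGMA_SD = 1.51, 0.28   # uncertainty about the standard deviation


def draw_truncated_sigma(rng):
    """Draw sigma from Normal(SIGMA_MEAN, SIGMA_SD) truncated at zero.

    Rejection sampling: redraw until the value is strictly positive.
    """
    while True:
        sigma = rng.gauss(SIGMA_MEAN, SIGMA_SD)
        if sigma > 0.0:
            return sigma


def second_order_monte_carlo(n_uncertainty, n_variability, seed=1):
    """Return one list of simulated log10 concentrations per parameter draw."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_uncertainty):           # outer loop: parameter uncertainty
        mu = rng.gauss(MU_MEAN, MU_SD)
        sigma = draw_truncated_sigma(rng)
        # inner loop: variability between samples, given (mu, sigma)
        runs.append([rng.gauss(mu, sigma) for _ in range(n_variability)])
    return runs


runs = second_order_monte_carlo(n_uncertainty=200, n_variability=500)
```

Each inner list represents one possible variability distribution; the spread between the lists represents the uncertainty about that distribution.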
As can be seen in the scatterplot of the bootstrap means versus the bootstrap standard deviations (see Fig. 7a), a small number of bootstrap samples in the lower part of the figure deviate from the majority of the samples. This deviation is caused by the absence, in the respective bootstrap samples, of a number of rare intervals from the original data set. These intervals all have concentration values higher than the general mean value, and their inclusion leads to an increase of the standard deviation. This separation between a small cloud of points with low standard deviation and a big cloud with the majority of the points is hence purely a consequence of the complexity of this particular data set, but it illustrates the limitations of the non-parametric bootstrap method. When a data set has relatively few distinct values (in this particular case, 10 distinct values are present), differences between bootstrap samples can be large. This should always be checked when applying the non-parametric bootstrap. This problem does not occur when the parametric bootstrap method is applied; however, applying the parametric bootstrap method to censored data would incorrectly result in different uncertainty intervals compared to non-parametric bootstrapping, which could lead to a fail-dangerous underestimation; a number of test simulations have confirmed this (results not shown). The parametric bootstrap could, however, be applied by generating bootstrap samples from a parametric distribution and censoring them manually in the specific case that the complete data set has to be compared to one LOQ only (Zhao and Frey, 2004).
The resulting distribution with its 95% confidence interval is shown in Fig. 7b.
5. Discussion for real food product data
The examples presented in this article illustrate how complex data sets including nondetects, semiquantitative and qualitative measurements can be interpreted in an appropriate way for use in microbiological risk assessment. Ignoring nondetects or substituting them with the LOD/LOQ or half of it is a classical source of bias (cf. Lorimer and Kiermeier, 2007) that can and should be avoided using these methods. It has been demonstrated in this paper that even complex data sets, including either very diverse analyses or large amounts of censored values, can lead to very satisfying outcomes. Nevertheless, attention must be paid to the possibilities and limitations of these methods. Blindly fitting a data set with limited information (for example, a data set consisting purely of presence/absence tests, as obtained if analyses are performed for compliance testing against a set legal criterion) to a specific distribution might result in unrealistic outcomes. Moreover, the limited small-sample properties of the non-parametric bootstrap method must be taken into account, as is illustrated in the second case study. This supports the recommendation to set up dedicated baseline surveys for gathering data to be used for risk assessment.
The illustration of the various case studies (with hypothetical data sets as well as real-life microbiological data sets from dedicated or ad hoc combined surveys) shows that, when establishing microbiological baseline surveys, it is important to apply some semi-quantitative methodology, and preferably a methodology enabling an estimation of the numbers present in the positive laboratory samples. For example, Straver et al. (2007) estimated contamination of chicken breast filet with Salmonella using a combination of prior enrichment of pooled laboratory samples and subsequent enumeration of Salmonella in positive laboratory samples using a Most Probable Number (MPN) assay. The fact that MPN methods or enumeration methods with a reduced limit of quantification (as in the Campylobacter case study) provide an estimate of the number of pathogens present rather than an exact value is not a crucial factor for the estimation of distributions of the contamination level. In the present Campylobacter case study, it was shown that inclusion of the measurement error interval for quantitative analyses hardly affects the estimated distributions.

Fig. 5. Illustration 2: (a) plot of the 95% confidence interval of the fitted log-normal distribution for the set of semiquantitative measurements, (b) scatter plot of the means versus the standard deviations of the bootstrap samples and (c) comparison of the parameters obtained by bootstrapping and randomly generated parameters using a linear relationship.
The proportion of nondetects, on the other hand, may have a significant impact on the result, as has been shown in the present Campylobacter case study. This illustrates the positive effect that lowering the limit of quantification of a certain analysis method might have on lowering the uncertainty of the distribution when the microbiologist is confronted with a substantial number of nondetects. It was noticed for the Listeria case study that the uncertainty of the distribution is especially increased at the lower levels (< 0.01 CFU/g) (Fig. 7b), as in this concentration range only left-censored data are available. More information on the estimated level of contamination would make it possible to decrease uncertainty. However, overall, the estimated mean level of contamination for Listeria (μ = −1.58 log10 CFU/g) is much lower than for Campylobacter (μ = 0.73 log10 CFU/g), which means that obtaining enumeration data at these very low levels of contamination will take considerable laboratory effort and require adapted methodological procedures, and thus entail related costs.
On the other hand, it was shown that obtaining a good data set for the estimation of distributions of contamination levels does not necessarily demand a large study. In the present Campylobacter case study, it was illustrated that increasing the number of analyses to a large extent might lead to only a limited additional reduction of uncertainty in the case of an already sufficient data set with representative outcomes. The distribution of the Campylobacter contamination level shown in Fig. 6d is based upon 122 enumeration results (obtained from a total of 328 laboratory samples analyzed), whereas in Fig. 6a, 269 enumeration values were available (obtained from 656 laboratory samples analyzed) for the estimation of the distribution of the contamination level.
Setting up a baseline survey to acquire a data set that serves as the basis for estimating an input distribution for risk assessment thus has to take into account appropriate methodology to provide a sufficient number of detects and estimates of numbers, but it also has to provide results which are representative for the objective of the risk assessment (e.g. food product under consideration, stage in the production chain, variability between producers, seasonality, etc.) in order not to introduce bias in the distribution obtained. As such, setting up a baseline survey is a complex exercise. Nevertheless, if the data set is available, appropriate techniques also need to be used to translate the information from the data set into a distribution.
In the present study, an approach based upon maximum likelihood estimation was shown to provide good results for representing the variation of contamination data. Additional information with regard to variability and uncertainty could be extracted as well using the bootstrap method. The same methodology can equally be applied with more complex models such as mixture models or Poisson-like models. Alternative methods such as Bayesian analysis can also be applied and lead to similar outcomes (results not shown). Examples of a Bayesian analysis can be found in Nauta et al. (2009), Clough et al. (2005) and Crépet et al. (2007).

Fig. 6. Case study 1: (a) plot of the 95% confidence interval of the normal distribution fitted to the Campylobacter contamination data; (b) influence of an increased LOD on the resulting distribution, where the dotted lines represent the original data set and the full lines represent the data set on which an increased LOD has been imposed; (c) influence of inclusion of a measurement error interval on the estimated distribution (full lines) compared to the original data set (dotted lines); (d) influence of the number of data points: comparison of the results of a random subset with N/2 data points (full lines) with the original data set (dotted lines).
Application of these techniques offers a way for meta-analysis of the many relevant yet diverse data sets that are available in the literature and in (inter)national reports of surveillance or baseline surveys, and therefore increases the information input of a risk assessment and, as a consequence, the correctness of the outcome of the risk assessment.
Acknowledgements
This research is supported in part by the Research Council of the Katholieke Universiteit Leuven (projects OT0925TBA and EF05006 Center-of-Excellence Optimization in Engineering), knowledge platform KP09005 (SCORES4CHEM) of the Industrial Research Fund, the Belgian Program on Interuniversity Poles of Attraction initiated by the Belgian Federal Science Policy Office, and the Fund for Scientific Research-Flanders (FWO-Vlaanderen, project G042409 N). J. Van Impe holds the chair Safety Engineering sponsored by the Belgian chemistry and life sciences federation essencia. Research is conducted utilizing high performance computational resources provided by the University of Leuven, http://ludit.kuleuven.be.
We would like to thank the Ghent University cluster of the Department of Veterinary Public Health and Food Safety, Faculty of Veterinary Medicine, and the Department of Food Safety and Food Quality, Faculty of Bio-Science Engineering, who kindly provided the Campylobacter data derived from a Federal Public Health Service funded project. The staff of the accredited laboratory section of the Laboratory of Food Microbiology and Food Preservation at the Department of Food Safety and Food Quality, Faculty of Bio-Science Engineering, Ghent University, is acknowledged for providing the data on the microbiological analysis and challenge testing for L. monocytogenes.
References
Calistri, P., Giovannini, A., 2008. Quantitative risk assessment of human campylobacteriosis related to the consumption of chicken meat in two Italian regions. International Journal of Food Microbiology 128, 274–287.
Clough, H.E., Clancy, D., O'Neill, P.D., Robinson, S.E., French, N.P., 2005. Quantifying uncertainty associated with microbial count data: a Bayesian approach. Biometrics 61, 610–616.
Cox, D.R., Oakes, D., 1984. Analysis of Survival Data. Monographs on Statistics and Applied Probability. Chapman and Hall.
Crépet, A., Albert, I., Dervin, C., Carlin, F., 2007. Estimation of microbial contamination of food from prevalence and concentration data: application to Listeria monocytogenes in fresh vegetables. Applied and Environmental Microbiology 73 (1), 250–258.
Delignette-Muller, M.L., Pouillot, R., Denis, J.-B., 2008. fitdistrplus: Help to fit of a parametric distribution to non-censored or censored data. R package version 0.1-0. URL http://riskassessment.r-forge.r-project.org
Efron, B., 1982. The jackknife, the bootstrap and other resampling plans. CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 38.
FAO/WHO, 2004. Risk assessment of Listeria monocytogenes in ready-to-eat foods. Accessed at June 5, 2009. URL httpwwwwhointfoodsafetypublicationsmicromralisteriaenindexhtml
FDA/USDA/CDC, 2003. Quantitative assessment of relative risk to public health from foodborne Listeria monocytogenes among selected categories of ready-to-eat foods. Accessed at June 5, 2009. URL httpwwwfoodsafetygovdmslmr2-tochtml
Gnanou Besse, N., Audinet, N., Beaufort, A., Colin, P., Cornu, M., Lombard, B., 2004. A contribution to the improvement of Listeria monocytogenes enumeration in cold-smoked salmon. International Journal of Food Microbiology 91 (2), 119–127.
Gonzales-Barron, U., Kerr, M., Sheridan, J.J., Butler, F., 2010. Count data distributions and their zero-modified equivalents as a framework for modelling microbial data with a relatively high occurrence of zero counts. International Journal of Food Microbiology 136 (3), 268–277.
Haas, C.N., Thayyar-Madabusi, A., Rose, J.B., Gerba, C.P., 1999. Development and validation of dose-response relationship for Listeria monocytogenes. Quantitative Microbiology 1, 89–102.
Habib, I., Sampers, I., Uyttendaele, M., Berkvens, D., De Zutter, L., 2008a. Baseline data from a Belgium-wide survey of Campylobacter species contamination in chicken meat preparations and considerations for a reliable monitoring program. Applied and Environmental Microbiology 74 (17), 5483–5489.
Habib, I., Sampers, I., Uyttendaele, M., Berkvens, D., De Zutter, L., 2008b. Performance characteristics and estimation of measurement uncertainty of three plating procedures for Campylobacter enumeration in chicken meat. Food Microbiology 25 (1), 65–74.
Helsel, D.R., 2005. Nondetects and data analysis: statistics for censored environmental data. Wiley Interscience, USA.
Helsel, D.R., 2006. Fabricating data: how substituting values for nondetects can ruin results, and what can be done about it. Chemosphere 65, 2434–2439.
Iman, R.L., Conover, W.J., 1982. A distribution-free approach to inducing rank correlation among input variables. Communications in Statistical Simulations and Computation 11 (3), 311–334.
ISO, 1998/Amd 1, 2004. International Standards Organization 10290-2. Microbiology of food and animal feeding stuffs – horizontal method for the detection and enumeration of Listeria monocytogenes – part 2: Enumeration method.
ISO, 2006. International Standards Organization 10272-1. Microbiology of food and animal feeding stuffs – horizontal method for detection and enumeration of Campylobacter spp. – part 2: Enumeration method.
Jordan, D., 2005. Simulating the sensitivity of pooled-sample herd tests for fecal Salmonella in cattle. Preventive Veterinary Medicine 70, 59–73.
Kilsby, D.C., Pugh, M.E., 1981. The relevance of the distribution of micro-organisms within batches of food to the control of microbiological hazards from foods. Journal of Applied Bacteriology 51, 345–354.
Legan, J.D., Vandeven, M.H., Dahms, S., Cole, M.B., 2001. Determining the concentration of microorganisms controlled by attributes sampling plans. Food Control 12 (3), 137–147.
Lorimer, M.F., Kiermeier, A., 2007. Analysing microbiological data: Tobit or not Tobit? International Journal of Food Microbiology 116, 313–318.
Nauta, M.J., van der Wal, F.J., Putirulan, F.F., Post, J., van de Kassteele, J., Bolder, N.M., 2009. Evaluation of the "testing and scheduling" strategy for control of Campylobacter in broiler meat in The Netherlands. International Journal of Food Microbiology 134, 216–222.
Oscar, T.P., 2004. A quantitative risk assessment model for Salmonella and whole chickens. International Journal of Food Microbiology 93, 231–247.
Pérez-Rodríguez, F., van Asselt, E.D., García-Gimeno, R.M., Zurera, G., Zwietering, M.H., 2007. Extracting additional risk managers information from a risk assessment of Listeria monocytogenes in deli meats. Journal of Food Protection 70 (5), 1137–1152.
Pouillot, R., Miconnet, N., Afchain, A.-L., Delignette-Muller, M.L., Beaufort, A., Rosso, L., Denis, J.-B., Cornu, M., 2007. Quantitative risk assessment of Listeria monocytogenes in French cold-smoked salmon: I. Quantitative exposure assessment. Risk Analysis 27 (3), 683–700.
R Development Core Team, 2009. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. URL http://www.R-project.org
Reinders, R.D., De Jonge, R., Evers, E.G., 2003. A statistical method to determine whether micro-organisms are randomly distributed in a food matrix, applied to coliforms and Escherichia coli O157 in minced beef. Food Microbiology 20, 297–303.
Ridout, M., Demétrio, C.G.B., Hinde, J., 1998. Models for count data with many zeroes. Proceedings of the XIXth International Biometric Conference, pp. 179–192.
Shorten, P.R., Pleasants, A.B., Soboleva, T.K., 2006. Estimation of microbial growth using population measurements subject to a detection limit. International Journal of Food Microbiology 108, 369–375.
Straver, J.M., Janssen, A.F.W., Linnemann, A.R., van Boekel, A.J.S., Beumer, R.R., Zwietering, M.H., 2007. Number of Salmonella on chicken breast filet at retail level and its implications for public health risk. Journal of Food Protection 70 (9), 2045–2055.
Uyttendaele, M., Busschaert, P., Valero, A., Geeraerd, A.H., Vermeulen, A., Jacxsens, L., Goh, K.K., De Loy, A., Van Impe, J.F., Devlieghere, F., 2009. Prevalence and challenge tests of Listeria monocytogenes in Belgian produced and retailed mayonnaise-based deli-salads, cooked meat products and smoked fish between 2005 and 2007. International Journal of Food Microbiology 133, 94–104.
Zhao, Y., Frey, H.C., 2004. Quantification of variability and uncertainty for censored data sets and application to air toxic emission factors. Risk Analysis 24 (4), 1019–1034.

Fig. 7. Case study 2: (a) scatter plot of the fitted means versus standard deviations of the bootstrap samples and (b) plot of the 95% confidence interval of the normal distribution fitted to the logarithmic Listeria monocytogenes contamination data.
by plating on ALOA (Fiers, Belgium) according to the ISO (1998/Amd 1, 2004) standard method using a reduced limit of quantification (thus LOQ 10 CFU/g). When quantitative tests are executed, most of the outcomes are below this LOQ, which again results in semiquantitative data. For a number of samples, other sample weights were tested according to the demands of the food business operators, hence resulting in a rather complex data set.
3. Results and discussion of the in silico generated data
3.1. Illustration 1: left-censored quantitative data
A data set is generated in silico and left-censored, representing quantitative measurements in log10 CFU/g. A total of 100 data points are pseudorandomly drawn from a normal distribution with mean μ equal to 0 and standard deviation σ equal to 2 (see Fig. 2). The (lower) LOQ is set to −0.507 log10 CFU/g, which is the 40th percentile of a cumulative normal distribution function with μ = 0 and σ = 2. After censoring, 40 out of 100 data points are regarded as below the LOQ.
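The stated LOQ can be checked directly from the generating distribution. The short sketch below uses Python's standard library purely as an illustration (the software used for the paper's own calculations is not stated in this section) and recovers the 40th percentile of a normal distribution with μ = 0 and σ = 2.

```python
from statistics import NormalDist

# Distribution used to generate the in silico data (log10 CFU/g).
true_dist = NormalDist(mu=0.0, sigma=2.0)

# The LOQ is defined as the 40th percentile of that distribution.
loq = true_dist.inv_cdf(0.40)
print(round(loq, 3))  # -0.507
```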
Using MLE, a normal distribution is fitted to these censored data, with fitted mean μ = −0.05 log10 CFU/g and fitted standard deviation σ = 1.88 log10 CFU/g. The sample mean of the data set would have been 0.49 log10 CFU/g if all nondetects had been substituted with the LOQ, or 0.29 log10 CFU/g if nondetects had been substituted by half of the (logarithmic) LOQ (see Table 2). For comparison, the mean of the original sampled data (before any censoring algorithm is applied, and hence the true sample mean) is 0.05 log10 CFU/g. This illustrates the major bias that originates when alternative practices such as substitution of nondetects by the LOQ are applied to data sets. As is illustrated by Lorimer and Kiermeier (2007), this has frequently been the case in past research. If nondetects had been ignored, the estimated mean would have been 1.15 log10 CFU/g. This approach is often intended to model only the positive subpopulation, along with an additional variable for prevalence; however, this implies that the sensitivity of the detection method is ignored (e.g. when a counting method has an LOQ of 10 CFU/g) and that these results cannot be transferred to portion sizes other than the measured one. The true sample standard deviation is 1.77 log10 CFU/g. The values of the standard deviations are estimated to be lower in the case of substitution methods, because these methods push low data points (nondetects) towards the center of the sample distribution.
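The comparison between MLE and the substitution practices can be sketched as follows. This is an illustration under stated assumptions, not the implementation used for the paper: the function name, the random seed and the simple compass search used as optimiser are choices made for this sketch, and the simulated data will therefore not reproduce the reported numbers exactly.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

STD_NORMAL = NormalDist()  # standard normal, used for the censored term


def fit_left_censored_normal(detects, n_censored, loq):
    """Maximum likelihood fit of Normal(mu, sigma) to left-censored data.

    detects    : values observed at or above the LOQ (log10 CFU/g)
    n_censored : number of results only known to lie below the LOQ
    loq        : limit of quantification (log10 CFU/g)
    """
    def log_likelihood(mu, log_sigma):
        sigma = math.exp(log_sigma)  # optimise log(sigma) so sigma stays > 0
        ll = 0.0
        for x in detects:            # density term for each quantified value
            z = (x - mu) / sigma
            ll += -log_sigma - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z
        if n_censored:               # probability mass below the LOQ
            p_below = STD_NORMAL.cdf((loq - mu) / sigma)
            ll += n_censored * math.log(max(p_below, 1e-300))
        return ll

    # Start from the naive estimates of the detects, refine by compass search.
    mu, log_sigma = fmean(detects), math.log(stdev(detects))
    best = log_likelihood(mu, log_sigma)
    step = 1.0
    while step > 1e-6:
        improved = False
        for d_mu, d_ls in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            candidate = log_likelihood(mu + d_mu, log_sigma + d_ls)
            if candidate > best:
                mu, log_sigma, best = mu + d_mu, log_sigma + d_ls, candidate
                improved = True
        if not improved:
            step /= 2.0
    return mu, math.exp(log_sigma)


# Illustrative data: 100 draws from Normal(0, 2), censored at the LOQ.
rng = random.Random(2010)  # arbitrary seed, not the one used in the paper
loq = -0.507
sample = [rng.gauss(0.0, 2.0) for _ in range(100)]
detects = [x for x in sample if x >= loq]
n_censored = len(sample) - len(detects)

mu_hat, sigma_hat = fit_left_censored_normal(detects, n_censored, loq)
mean_subst_loq = fmean(detects + [loq] * n_censored)  # substitution by the LOQ
mean_ignore = fmean(detects)                          # nondetects ignored
print(mu_hat, sigma_hat, mean_subst_loq, mean_ignore)
```

By construction, ignoring nondetects can never give a lower mean than substitution by the LOQ, and substitution by the LOQ can never give a lower mean than the uncensored sample, which is the ordering of the biases reported above.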
Subsequently, non-parametric bootstrapping is applied to estimate the uncertainty distributions for the mean and standard deviation of the estimated concentration. The original censored data set is resampled with replacement for B = 1000 bootstrap iterations, and each time a normal distribution is fitted to the bootstrap sample using MLE. From these fitted distributions, quantiles are calculated and stored, and the 95% confidence interval for each quantile is subsequently determined from the variation between bootstrap samples. The result is plotted in Fig. 3a.
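The resampling scheme described above can be sketched in a few lines. The sketch below is an illustration only: the function names are hypothetical, and `placeholder_fit` is a stand-in that uses the plain mean and standard deviation of uncensored toy data, so that the block stays short. In an actual analysis it would be replaced by the censored maximum likelihood fit.

```python
import random
from statistics import NormalDist, fmean, quantiles, stdev


def bootstrap_quantile_bands(data, fit, probs, n_boot=1000, seed=1):
    """Non-parametric bootstrap of the quantiles of a fitted normal distribution.

    data  : list of observations (any representation that `fit` understands)
    fit   : function mapping a resampled data set to (mu, sigma)
    probs : cumulative probabilities at which quantiles are stored
    Returns (params, bands): the fitted (mu, sigma) of every bootstrap sample
    and, per probability, the 2.5th and 97.5th percentile of its quantile.
    """
    rng = random.Random(seed)
    params = []
    stored = {p: [] for p in probs}
    for _ in range(n_boot):
        resample = rng.choices(data, k=len(data))  # sampling with replacement
        mu, sigma = fit(resample)
        params.append((mu, sigma))
        fitted = NormalDist(mu, sigma)
        for p in probs:
            stored[p].append(fitted.inv_cdf(p))
    bands = {}
    for p, values in stored.items():
        cuts = quantiles(values, n=40)  # 39 cut points: 2.5%, 5%, ..., 97.5%
        bands[p] = (cuts[0], cuts[-1])
    return params, bands


def placeholder_fit(values):
    """Stand-in for the censored MLE: plain mean and standard deviation."""
    return fmean(values), stdev(values)


rng = random.Random(7)
data = [rng.gauss(0.0, 2.0) for _ in range(100)]  # uncensored toy data
params, bands = bootstrap_quantile_bands(data, placeholder_fit, [0.05, 0.5, 0.95])
```

The list `params` holds the bootstrap means and standard deviations to which hyperparameter distributions can be fitted, and `bands` holds the 95% interval per stored quantile.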
Distributions are fitted to the bootstrap statistics to estimate hyperparameters. The bootstrap means μ are described by a normal distribution with mean μ_μ = −0.05 log10 CFU/g and standard deviation σ_μ = 0.21 log10 CFU/g. The standard deviation of the data sample is represented by a gamma distribution with shape parameter α_σ = 99.4 and scale parameter β_σ = 1.87 × 10^−2 (see Fig. 3c and d).
As opposed to the next illustration, the bootstrap means and standard deviations, which belong to a two-dimensional space, show no obvious correlation (see Fig. 3b). The Pearson correlation coefficient between both equals −0.319 and the Spearman rank order correlation coefficient −0.302.
3.2. Illustration 2: semiquantitative data
The same original data set as in Illustration 1, generated from a normal distribution with mean μ equal to 0 and standard deviation σ equal to 2, is transformed to represent semiquantitative measurements in log10 CFU/g. The first limit of detection, LOD1, is set to 0.507 log10 CFU/g and the second limit of detection, LOD2, to 1.68 log10 CFU/g. In this data set, 64 data points fall below the first LOD and hence would be noted as negative test results. Eighteen data points fall above the second LOD, i.e. 18 laboratory samples would be positive for both the first and the second presence/absence test. All other data points fall between the two limits of detection and hence represent laboratory samples of which the first presence/absence test is positive and the second one negative. This is visualized in Fig. 4.
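The translation of concentrations into test outcomes, and the likelihood that follows from the three resulting counts, can be sketched as below. The function names and outcome labels are hypothetical, and the likelihood is the standard one for interval-censored normal data, assumed here because the methods section is not reproduced in this part of the text.

```python
import math
from statistics import NormalDist

LOD1, LOD2 = 0.507, 1.68  # limits of detection (log10 CFU/g)
STD_NORMAL = NormalDist()


def classify(value):
    """Translate a log10 concentration into the outcome of the two tests."""
    if value < LOD1:
        return "neg/neg"  # below LOD1: both tests negative
    if value < LOD2:
        return "pos/neg"  # between the LODs: first positive, second negative
    return "pos/pos"      # above LOD2: both tests positive


def log_likelihood(mu, sigma, n_below, n_between, n_above):
    """Log-likelihood of Normal(mu, sigma) given only the three counts."""
    p_below = STD_NORMAL.cdf((LOD1 - mu) / sigma)
    p_up_to_lod2 = STD_NORMAL.cdf((LOD2 - mu) / sigma)
    p_between = p_up_to_lod2 - p_below
    p_above = 1.0 - p_up_to_lod2
    return (n_below * math.log(p_below)
            + n_between * math.log(p_between)
            + n_above * math.log(p_above))


# Counts reported for Illustration 2: 64 below LOD1, 18 in between, 18 above LOD2.
ll_true = log_likelihood(0.0, 2.0, 64, 18, 18)       # generating parameters
ll_fitted = log_likelihood(-0.25, 2.11, 64, 18, 18)  # reported MLE
```

With three counts and two free parameters, the fitted normal distribution can reproduce the observed proportions exactly, so the reported estimates are expected to score at least as well as the generating parameters on this likelihood.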
Using MLE, the censored data set was fitted to a normal distribution with mean μ = −0.25 log10 CFU/g and standard deviation σ = 2.11 log10 CFU/g. For comparison, the mean and standard deviation of the original data set, that is, before the censoring algorithm is applied to it, are 0.05 log10 CFU/g and 1.77 log10 CFU/g, respectively. The fitted distribution (after censoring) resembles the original sample distribution (before censoring) remarkably well, especially considering that the information is reduced rather drastically compared to purely quantitative measurements. The fact that the random data were initially generated from a normal distribution, which corresponds to the distribution assumed for maximum likelihood estimation, naturally contributes to the good results.
Bootstrap samples are subsequently generated to determine the uncertainty of the parameters of the distribution. The 95% confidence interval is plotted in Fig. 5a. As can be seen, uncertainty increases for the lower concentrations, which can be explained by the fact that approximately 60% of the measurements are treated as undetected without further information. The means of the bootstrap samples are fitted to a normal distribution with mean μ_μ = −0.30 log10 CFU/g and standard deviation σ_μ = 0.42 log10 CFU/g. For comparison, in the uncensored case, the standard error of the mean would be estimated to be \(1.77/\sqrt{100} = 0.177\) according to the central limit theorem. This illustrates the increase of uncertainty due to censoring. The standard deviation is fitted to a gamma distribution with shape parameter α_σ = 18.6 and scale parameter β_σ = 1.17 × 10^−1.
The mean and standard deviation belong to a two-dimensional parameter space and are generally not to be considered independent. If a scatterplot of the fitted mean versus the fitted standard deviation of each bootstrap sample is examined, it is clear that a correlation has arisen between both due to the large number of values censored below the first LOD. For this particular illustration, this can be explained as follows. When many data points below the lower limit of detection are selected in a bootstrap sample, the fitted bootstrap mean is lower and the bootstrap standard deviation, by contrast, is estimated higher, which induces a (negative) correlation between both parameters. This is depicted in Fig. 5b. The Pearson correlation coefficient between them equals −0.733 and the Spearman rank order correlation coefficient −0.677. In case the original data set is available to the risk assessor, the results of the bootstrap method can be used directly as input values of a two-dimensional Monte Carlo simulation. However, when only distributions representing parameter uncertainty are communicated (for example, in the case of data sets too large to include in a report), it is better not to draw random samples for the mean and standard deviation independently of one another in a 2D Monte Carlo simulation for risk assessment, because this would incorrectly influence the representation of uncertainty intervals for the lower percentiles of the final distribution. A number of solutions to this issue exist. One could, for example, use copulas or apply the Iman-Conover method for correlated sampling from two distributions (Haas et al., 1999; Iman and Conover, 1982). Here, as an alternative solution, the standard deviation is modelled as a linear function of the mean with addition of an error term, similar to what was done by Calistri and Giovannini (2008). Based on the scatterplot, it is found reasonable to assume that the mean and standard deviation are related approximately linearly and that the error term remains of the same magnitude over the range of the mean. For other case studies, nonlinear regression could be applied as well.

Fig. 2. Histogram of the in silico generated sample data points of the first illustration, with the vertical line indicating the lower LOD beneath which all data points are to be censored.

Table 2
Overview, for all illustrations and case studies, of the results of the maximum likelihood estimation and of the mean and standard deviation calculated with substitution methods (all units in log10 CFU/g).

Data set                    True          Sample       MLE: fitted    Substitution  Substitution  Ignoring
                            distribution  parameters   distribution   by 1/2 LOD    by LOD        nondetects
Illustration 1:             μ = 0         x̄ = 0.05     μ = −0.05      μ = 0.29      μ = 0.49      μ = 1.15
  quantitative data         σ = 2         s = 1.77     σ = 1.88       σ = 1.44      σ = 1.27      σ = 1.25
Illustration 2:             μ = 0         x̄ = 0.05     μ = −0.25      -             -             -
  semi-quantitative data    σ = 2         s = 1.77     σ = 2.11
Case study 1:               unknown       unknown      μ = 0.73       μ = 1.10      μ = 1.28      μ = 1.68
  Campylobacter data                                   σ = 1.03       σ = 0.64      σ = 0.53      σ = 0.65
Case study 1a:              unknown       unknown      μ = 0.46       μ = 1.16      μ = 2.06      μ = 2.58
  increased LOD                                        σ = 1.22       σ = 0.51      σ = 0.24      σ = 0.50
Case study 1b:              unknown       unknown      μ = 0.72       -             -             -
  measurement error                                    σ = 0.99
Case study 1c:              unknown       unknown      μ = 0.63       μ = 1.07      μ = 1.26      μ = 1.69
  reduced data set                                     σ = 1.07       σ = 0.62      σ = 0.51      σ = 0.64
Case study 2:               unknown       unknown      μ = −1.58      -             -             -
  L. monocytogenes data                                σ = 1.54

Fig. 3. Illustration 1: (a) plot of the 95% confidence interval of the log-normal distribution fitted to the set of left-censored quantitative measurements, (b) scatterplot of the bootstrap sample means versus standard deviations, and kernel density plots of the means (c) and standard deviations (d) of the bootstrap samples (grey), with the fitted normal and gamma distributions plotted on top (black).
The standard deviation is formulated as a linear function of the mean:

\[ \sigma = \beta_0 + \beta_1 \cdot \mu + \varepsilon \qquad (2) \]

with β_0 and β_1 being the intercept and slope of the linear relation, respectively, and ε an error term. In this case, the following equation is obtained by performing linear regression followed by assessment of the residual values:

\[ \sigma = 1.91 - 0.888 \cdot \mu + \mathrm{Norm}(\mu = 0,\ \sigma = 0.344) \qquad (3) \]

For comparison, the results obtained by bootstrapping and the results obtained by this linear model are plotted on top of each other; see Fig. 5c.
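Correlated parameter pairs can be generated from Eq. (3) as sketched below. The sketch assumes that the mean is drawn from the normal distribution reported for the bootstrap means of Illustration 2 (μ_μ = −0.30, σ_μ = 0.42); the function names and the rejection of non-positive standard deviations are additions made for this illustration and are not taken from the paper.

```python
import math
import random

# Values reported for Illustration 2 (log10 CFU/g).
MU_MEAN, MU_SD = -0.30, 0.42                      # normal distribution of the mean
INTERCEPT, SLOPE, RESID_SD = 1.91, -0.888, 0.344  # Eq. (3)


def draw_parameters(rng):
    """Draw one (mu, sigma) pair: mu from its normal, sigma from Eq. (3)."""
    while True:
        mu = rng.gauss(MU_MEAN, MU_SD)
        sigma = INTERCEPT + SLOPE * mu + rng.gauss(0.0, RESID_SD)
        if sigma > 0.0:  # guard added in this sketch; rarely triggered
            return mu, sigma


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(3)
pairs = [draw_parameters(rng) for _ in range(5000)]
r = pearson([m for m, _ in pairs], [s for _, s in pairs])
print(round(r, 2))
```

Under these assumptions the correlation implied by the linear model is β_1 σ_μ / sqrt((β_1 σ_μ)² + 0.344²), which evaluates to approximately −0.735, close to the Pearson coefficient of −0.733 found for the bootstrap samples.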
4. Results for real food product data
4.1. Case study 1: Campylobacter in chicken meat preparations
In this first case study, the results of Campylobacter analyses in chicken meat preparations are evaluated. The data set comprises quantitative analysis results with an LOQ of 10 CFU/g. In 387 of the 656 measurements (59%), the result is left-censored due to the LOQ. Using MLE, the logarithms of the censored data have been fitted to a normal distribution with mean μ = 0.73 log10 CFU/g and standard deviation σ = 1.03 log10 CFU/g.
For comparison, if the nondetects had been substituted by half of the LOQ, the fitted distribution would have been a normal distribution with mean μ = 1.10 log10 CFU/g and standard deviation σ = 0.64 log10 CFU/g (see Table 2).
After bootstrapping, the mean and standard deviation are represented by a normal distribution and a gamma distribution, respectively. The means of the bootstrap samples are fitted to a normal distribution with hyperparameters mean μ_μ = 0.73 log10 CFU/g and standard deviation σ_μ = 0.06 log10 CFU/g. The standard deviation is fitted by a gamma distribution with shape parameter α_σ = 304 and scale parameter β_σ = 3.39 × 10^−3.
The resulting distribution, with its 95% confidence interval, is
shown in Fig. 6a.
As can be seen, uncertainty about the distribution parameters is
rather small compared to variability, as could be expected due to the
large data set and the many remaining quantitative, non-censored
data points.
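
How such hyperparameters feed a second order Monte Carlo simulation can be sketched as follows: the outer loop draws a (μ, σ) pair from the uncertainty distributions, and each pair defines one variability distribution. The example tracks the 95th percentile of the variability distribution. It assumes, for simplicity, that μ and σ are drawn independently, which ignores any correlation between the bootstrap estimates; the loop size and seed are arbitrary.

```python
import random
from statistics import NormalDist, quantiles

rng = random.Random(2010)
N_UNCERTAINTY = 2000  # number of outer-loop (uncertainty) draws, arbitrary

p95_values = []
for _ in range(N_UNCERTAINTY):
    mu = rng.gauss(0.73, 0.06)              # uncertainty about the mean
    sigma = rng.gammavariate(304, 3.39e-3)  # gammavariate(shape, scale)
    # 95th percentile of the variability distribution for this draw.
    p95_values.append(NormalDist(mu, sigma).inv_cdf(0.95))

# 39 cut points at 2.5 %, 5 %, ..., 97.5 %; keep the outer two.
cuts = quantiles(p95_values, n=40)
lower, upper = cuts[0], cuts[-1]
print(f"95% uncertainty band for the 95th percentile: {lower:.2f} to {upper:.2f}")
```

Repeating this for a grid of percentiles gives an uncertainty band around the whole cumulative distribution, comparable in spirit to the band plotted in Fig. 6a.
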
The Campylobacter data set is also used to test the influence of a
number of factors. Firstly, to check the influence of the limit of
quantification, all data points of the data set are censored to an
increased LOQ of 100 CFU/g (the standard LOQ of the Campylobacter
enumeration method) instead of 10 CFU/g (the reduced limit of
quantification obtained by plating one milliliter over three mCCDA plates).
In this new data set, 589 out of 656 values, i.e., 90% (as opposed to 59%
in the original data set), are censored. Using maximum likelihood, the
new estimated mean and standard deviation are 0.46 and 1.22 log10
CFU/g. The resulting distribution after bootstrapping is shown in
Fig. 6b. As can be seen, this increased LOQ has a strong influence on the
parameter estimates as well as on uncertainty. Despite the specificity
of this particular case study, it illustrates (in an opposite way) the
important impact that a reduction of the limit of quantification of current
detection methods (and thus an increase in the number of non-censored
values) (e.g., Gnanou Besse et al., 2004) might have on the obtained results
when data sets include a significant number of nondetects.
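
Re-censoring a data set at a higher LOQ amounts to moving every quantified value below the new limit into the group of nondetects. A minimal sketch, with a hypothetical helper name and made-up example values:

```python
import math

def raise_loq(detects_log10, n_censored, new_loq_cfu_per_g):
    """Re-censor a left-censored data set at a higher limit of quantification.

    Values below the new LOQ lose their quantified status. Treating a value
    exactly at the LOQ as quantified is an assumption of this sketch.
    """
    threshold = math.log10(new_loq_cfu_per_g)
    kept = [x for x in detects_log10 if x >= threshold]
    return kept, n_censored + (len(detects_log10) - len(kept))

# Made-up example values (log10 CFU/g), not taken from the data set.
detects, n_cens = raise_loq([1.2, 1.9, 2.3, 2.7], n_censored=6,
                            new_loq_cfu_per_g=100)
```

The re-censored data can then be passed through the same maximum likelihood and bootstrap steps as the original data.
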
It is also tested what the effect would be if measurement error
were included at a realistic level corresponding to routine
laboratory measurements. A measurement error of 0.5 log10 CFU/g is
superimposed on all original quantitative measurements, thus replacing
every quantitative data point x_i with an interval [x_i − 0.5, x_i + 0.5] log10
CFU/g. The newly obtained estimates of mean and standard deviation
are 0.72 and 0.99 log10 CFU/g, respectively. Implementing measurement
error appears to have very little impact on the obtained result for this
data set, as can be seen in Fig. 6c.
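
Replacing a point by an interval changes its likelihood contribution from a density value to the probability mass inside the interval. The sketch below shows that contribution for a normal model; the function names are illustrative and the example value 1.2 is made up.

```python
import math
from statistics import NormalDist

def interval_loglik(lower, upper, mu, sigma):
    """Log-likelihood contribution of one interval-censored observation."""
    dist = NormalDist(mu, sigma)
    return math.log(dist.cdf(upper) - dist.cdf(lower))

def with_measurement_error(x, half_width=0.5):
    """Turn a quantified log10 value into the interval [x - 0.5, x + 0.5]."""
    return (x - half_width, x + half_width)

# Contribution of a hypothetical measurement of 1.2 log10 CFU/g under the
# parameters reported for this test case (mu = 0.72, sigma = 0.99).
contribution = interval_loglik(*with_measurement_error(1.2), mu=0.72, sigma=0.99)
```

The same function also covers presence/absence and semi-quantitative results, since those are intervals as well, possibly with an infinite bound.
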
To illustrate the impact of the number of data points on the obtained
distribution, a distribution is fitted to half the number of data points:
328 data points are randomly sampled from the original data set of 656
data points and subjected to MLE and bootstrapping. The estimated
mean and standard deviation are 0.63 and 1.07 log10 CFU/g, respectively.
The resulting distribution is depicted in Fig. 6d. Although uncertainty
intervals do increase somewhat, the deviation of the new distribution
from the originally obtained distribution (Fig. 6d) remains
rather limited, despite the fact that the number of data points has been
reduced drastically. This indicates that the investment of labor and costs
in a large number of additional measurements might not always have
the expected impact on the resulting output distribution.
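
A rough rule of thumb helps to see why halving the data has a modest effect. Assuming standard large-sample behaviour of maximum likelihood estimates and an unchanged proportion of censored values (both assumptions of this note, not statements from the case study), the standard error of an estimate scales with the inverse square root of the number of data points:

$$\mathrm{SE}(\hat{\mu}) \propto \frac{1}{\sqrt{n}} \quad\Rightarrow\quad \frac{\mathrm{SE}_{n=328}}{\mathrm{SE}_{n=656}} \approx \sqrt{\frac{656}{328}} = \sqrt{2} \approx 1.41$$

Under these assumptions, uncertainty intervals widen by roughly 40% when the data set is halved, and doubling an already large data set narrows them by only about 30%.
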
The results of all of these test cases are summarized in Table 2.
As in the previous illustrations, the bias that arises when
substitution methods are used or when nondetects are ignored can be seen.
4.2. Case study 2: Listeria monocytogenes in smoked fish samples
The second case study consists of 103 measurements of Listeria
monocytogenes in smoked fish samples. As opposed to the Campylobacter
case study, this data set contains merely one quantitative measurement
(one laboratory sample with an L. monocytogenes count > 10 CFU/g, i.e.,
15 CFU/g). All other measurements are either interval-, left- or right-
censored. Moreover, the data set contains several different LODs,
depending on the demands of the food business operator who supplied
the particular food samples.
Fig. 4. Histogram of the pseudorandom data points used in Illustration 2, with the
vertical lines indicating the limits of detection of the first and second
presence/absence test.

Using MLE, the logarithmic values of the analysis results are fitted to
a normal distribution with mean μ = −1.58 log10 CFU/g and standard
deviation σ = 1.54 log10 CFU/g. Based on the empirical distributions
of the bootstrap estimates of the distribution parameters, the normal
distribution is chosen to fit both the mean and the standard deviation.
The mean is fitted to a normal distribution with hyperparameters
mean μ_μ = −1.58 log10 CFU/g and standard deviation σ_μ = 0.20 log10
CFU/g. The standard deviation is fitted to a normal distribution with
hyperparameters mean μ_σ = 1.51 log10 CFU/g and standard deviation
σ_σ = 0.28 log10 CFU/g. To avoid sampling of negative values for the
standard deviation, this distribution is truncated at zero.
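
Truncation at zero can be implemented in several ways. One simple option, shown here as an assumption rather than as the approach used in the case study, is rejection sampling: draw from the untruncated normal and discard non-positive values.

```python
import random

def sample_truncated_normal(mean, sd, rng, lower=0.0):
    """Draw from a normal distribution truncated below at `lower` by rejection."""
    while True:
        value = rng.gauss(mean, sd)
        if value > lower:
            return value

rng = random.Random(7)
# Hyperparameters reported for the standard deviation in this case study.
sigma_draws = [sample_truncated_normal(1.51, 0.28, rng) for _ in range(1000)]
```

With a mean more than five standard deviations above zero, rejections are extremely rare here, so the truncation mainly acts as a safeguard.
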
As can be seen in the scatterplot of the bootstrap means versus
the bootstrap standard deviations (see Fig. 7a), a small number of
bootstrap samples in the lower part of the figure deviate from the
majority of the samples. This deviation is caused by the absence, in the
respective bootstrap samples, of a number of rare intervals from the
original data set. These intervals all have concentration values higher
than the general mean value, and their inclusion leads to an increase of
the standard deviation. This separation between a small cloud of
points with low standard deviation and a big cloud with the majority
of the points is hence purely a consequence of the complexity of
this particular data set, but it illustrates the limitations of the
non-parametric bootstrap method. When a data set has relatively few
distinct values (in this particular case, 10 distinct values are present),
differences between bootstrap samples can be large. This should
always be checked when applying the non-parametric bootstrap. This
problem does not occur when the parametric bootstrap method is
applied; however, applying the parametric bootstrap method to
censored data would incorrectly result in different uncertainty intervals
compared to non-parametric bootstrapping, which could lead to a
fail-dangerous underestimation; a number of test simulations have
confirmed this (results not shown). The parametric bootstrap could,
however, be applied by generating bootstrap samples from a parametric
distribution and censoring them manually in the specific case that the
complete data set has to be compared to one LOQ only (Zhao and Frey,
2004).
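
The effect of rare values on the non-parametric bootstrap can be quantified with a short calculation. The sketch below gives a generic resampling routine and the probability that a value occurring only a few times is entirely absent from a bootstrap sample. The function names are illustrative, and the number of occurrences of each rare interval in the Listeria data set is not given in the text, so the counts used are hypothetical.

```python
import random

def bootstrap(data, estimator, n_boot, rng):
    """Non-parametric bootstrap: resample with replacement and re-estimate."""
    n = len(data)
    return [estimator(rng.choices(data, k=n)) for _ in range(n_boot)]

def prob_value_missing(n_total, n_occurrences):
    """Chance that a value present `n_occurrences` times is absent from one
    bootstrap sample of size `n_total`."""
    return (1 - n_occurrences / n_total) ** n_total

# For a data set of 103 results, with hypothetical occurrence counts 1 to 3:
for count in (1, 2, 3):
    print(count, round(prob_value_missing(103, count), 3))
```

A result that occurs only once in 103 measurements is missing from roughly 37% of the bootstrap samples, which is consistent with the appearance of a separate cloud of bootstrap estimates when a few rare, influential results are present.
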
The resulting distribution, with its 95% confidence interval, is shown
in Fig. 7b.
5. Discussion for real food product data
The examples presented in this article illustrate how complex data
sets, including nondetects and semi-quantitative and qualitative
measurements, can be interpreted in an appropriate way for use in
microbiological risk assessment. Ignoring nondetects, or substituting them
with the LOD/LOQ or half of it, is a classical source of bias (cf. Lorimer and
Kiermeier, 2007) that can and should be avoided using these methods. It
has been demonstrated in this paper that even complex data sets,
including either very diverse analyses or large amounts of censored
values, can lead to very satisfying outcomes. Nevertheless, attention
must be paid to the possibilities and limitations of these methods.
Blindly fitting a data set with limited information (for example, a data set
consisting purely of presence/absence tests, as obtained if analyses are
performed for compliance testing against a set legal criterion) to a specific
distribution might result in unrealistic outcomes. Moreover, the limited
small-sample properties of the non-parametric bootstrap method must
be taken into account, as is illustrated in the second case study. This
supports the recommendation to set up dedicated baseline surveys for
gathering data to be used for risk assessment.
The illustration of the various case studies (both with hypothetical
data sets and with real-life microbiological data sets from
dedicated or ad hoc combined surveys) shows that it is important, when
establishing microbiological baseline surveys, to apply some
semi-quantitative methodology and, by preference, a methodology
enabling an estimation of the numbers present in the positive laboratory
samples. For example, Straver et al. (2007) estimated contamination
of chicken breast filet with Salmonella using a combination of prior
enrichment of pooled laboratory samples and subsequent enumeration
of Salmonella in positive laboratory samples using a Most Probable
Number (MPN) assay. The fact that MPN methods or enumeration
methods with a reduced limit of quantification (as in the Campylobacter
case study) provide an estimate of the number of pathogens present
rather than an exact value is not a crucial factor for the estimation of
distributions of the contamination level. In the present Campylobacter
case study, it was shown that inclusion of the measurement error interval
for quantitative analyses hardly affects the estimated distributions.

Fig. 5. Illustration 2: (a) plot of the 95% confidence interval of the fitted log-normal
distribution for the set of semi-quantitative measurements, (b) scatter plot of the means
versus the standard deviations of the bootstrap samples and (c) comparison of the
parameters obtained by bootstrapping and randomly generated parameters using a
linear relationship.
The proportion of nondetects, on the other hand, may have a significant
impact on the result, as has been shown in the present Campylobacter case
study. This illustrates the positive effect that lowering the limit of
quantification of a certain analysis method might have on lowering the
uncertainty of the distribution when the microbiologist is confronted with
a substantial number of nondetects. It was noticed for the Listeria case
study that the uncertainty of the distribution is especially increased at the
lower levels (<0.01 CFU/g) (Fig. 7b), as in this concentration range only
left-censored data are available. More information on the estimated level
of contamination would make it possible to decrease uncertainty. However,
overall the estimated mean level of contamination for Listeria (μ = −1.58
log10 CFU/g) is much lower than for Campylobacter (μ = 0.73 log10 CFU/g),
which means that obtaining enumeration data at these very low
levels of contamination will take considerable laboratory effort and require
adapted methodological procedures, with the related costs of obtaining
this type of data set.
On the other hand, it was shown that obtaining a good data set for
estimation of distributions of contamination levels does not
necessarily demand a large study. In the present Campylobacter case
study, it was illustrated that increasing the number of analyses to a
large extent might lead to only a limited additional reduction of
uncertainty when the data set is already sufficient and has
representative outcomes. The distribution of the Campylobacter
contamination level shown in Fig. 6d is based upon 122 enumeration results
(obtained from in total 328 laboratory samples analyzed), whereas in
Fig. 6a 269 enumeration values were available (obtained from 656
laboratory samples analyzed) for the estimation of the distribution of the
contamination level.
Setting up a baseline survey to acquire a data set to serve as the
basis for estimation of an input distribution for risk assessment thus
has to take into account appropriate methodology to provide a
sufficient number of detects and estimates of numbers, but also has
to provide results which are representative for the objective of the
risk assessment (e.g., food product under consideration, stage in the
production chain, variability between producers, seasonality, etc.) in
order not to introduce bias in the distribution obtained. As such,
setting up a baseline survey is a complex exercise. Nevertheless, once
the data set is available, appropriate techniques also need to be used
to translate the information from the data set into a distribution.
In the present study, an approach based upon maximum likelihood
estimation was shown to provide good results for representing the variation
of contamination data. Additional information with regard to variability
and uncertainty could be extracted as well using the bootstrap method.
The same methodology can equally be applied with more complex
models such as mixture models or Poisson-like models. Alternative
methods such as Bayesian analysis can also be applied and lead to
similar outcomes (results not shown). Examples of a Bayesian analysis
can be found in Nauta et al. (2009), Clough et al. (2005) and Crépet et al.
(2007).

Fig. 6. Case study 1: (a) plot of the 95% confidence interval of the normal distribution
fitted to the Campylobacter contamination data; (b) influence of an increased LOD on the
resulting distribution. The dotted lines represent the original data set; the full lines
represent the data set on which an increased LOD has been imposed; (c) influence of
inclusion of a measurement error interval on the estimated distribution (full lines)
compared to the original data set (dotted lines); (d) influence of the number of data
points: comparison of the results of a random subset with N/2 data points (full lines)
with the original data set (dotted lines).
Application of these techniques offers a way for meta-analysis of the
many relevant, yet diverse, data sets that are available in literature and
(inter)national reports of surveillance or baseline surveys, and therefore
increases the information input of a risk assessment and, by
consequence, the correctness of the outcome of the risk assessment.
Acknowledgements
This research is supported in part by the Research Council of the
Katholieke Universiteit Leuven (projects OT/09/25/TBA and EF/05/006
Center-of-Excellence Optimization in Engineering), knowledge
platform KP/09/005 (SCORES4CHEM) of the Industrial Research Fund, the
Belgian Program on Interuniversity Poles of Attraction, initiated by the
Belgian Federal Science Policy Office, and the Fund for Scientific
Research-Flanders (FWO-Vlaanderen, project G.0424.09 N). J. Van
Impe holds the chair Safety Engineering sponsored by the Belgian
chemistry and life sciences federation essencia. Research is conducted
utilizing high performance computational resources provided by the
University of Leuven, http://ludit.kuleuven.be.
We would like to thank the Ghent University cluster of the
Department of Veterinary Public Health and Food Safety, Faculty
of Veterinary Medicine, and the Department of Food Safety and Food
Quality, Faculty of Bio-Science Engineering, who kindly provided the
Campylobacter data derived from a Federal Public Health Service funded
project. The staff of the accredited laboratory section of the Laboratory
of Food Microbiology and Food Preservation at the Department of Food
Safety and Food Quality, Faculty of Bio-Science Engineering, Ghent
University, is acknowledged for providing the data on the microbiological
analysis and challenge testing for L. monocytogenes.
Fig. 7. Case study 2: (a) scatter plot of the fitted means versus standard deviations of the
bootstrap samples and (b) plot of the 95% confidence interval of the normal distribution
fitted to the logarithmic Listeria monocytogenes contamination data.

References
Calistri, P., Giovannini, A., 2008. Quantitative risk assessment of human campylobacteriosis related to the consumption of chicken meat in two Italian regions. International Journal of Food Microbiology 128, 274–287.
Clough, H.E., Clancy, D., O'Neill, P.D., Robinson, S.E., French, N.P., 2005. Quantifying uncertainty associated with microbial count data: a Bayesian approach. Biometrics 61, 610–616.
Cox, D.R., Oakes, D., 1984. Analysis of Survival Data. Monographs on Statistics and Applied Probability. Chapman and Hall.
Crépet, A., Albert, I., Dervin, C., Carlin, F., 2007. Estimation of microbial contamination of food from prevalence and concentration data: application to Listeria monocytogenes in fresh vegetables. Applied and Environmental Microbiology 73 (1), 250–258.
Delignette-Muller, M.L., Pouillot, R., Denis, J.-B., 2008. fitdistrplus: Help to fit of a parametric distribution to non-censored or censored data. R package version 0.1-0. URL http://riskassessment.r-forge.r-project.org.
Efron, B., 1982. The jackknife, the bootstrap and other resampling plans. CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 38.
FAO/WHO, 2004. Risk assessment of Listeria monocytogenes in ready-to-eat foods. Accessed at June 5, 2009. URL http://www.who.int/foodsafety/publications/micro/mra_listeria/en/index.html.
FDA/USDA/CDC, 2003. Quantitative assessment of relative risk to public health from foodborne Listeria monocytogenes among selected categories of ready-to-eat foods. Accessed at June 5, 2009. URL http://www.foodsafety.gov/~dms/lmr2-toc.html.
Gnanou Besse, N., Audinet, N., Beaufort, A., Colin, P., Cornu, M., Lombard, B., 2004. A contribution to the improvement of Listeria monocytogenes enumeration in cold-smoked salmon. International Journal of Food Microbiology 91 (2), 119–127.
Gonzales-Barron, U., Kerr, M., Sheridan, J.J., Butler, F., 2010. Count data distributions and their zero-modified equivalents as a framework for modelling microbial data with a relatively high occurrence of zero counts. International Journal of Food Microbiology 136 (3), 268–277.
Haas, C.N., Thayyar-Madabusi, A., Rose, J.B., Gerba, C.P., 1999. Development and validation of dose-response relationship for Listeria monocytogenes. Quantitative Microbiology 1, 89–102.
Habib, I., Sampers, I., Uyttendaele, M., Berkvens, D., De Zutter, L., 2008a. Baseline data from a Belgium-wide survey of Campylobacter species contamination in chicken meat preparations and considerations for a reliable monitoring program. Applied and Environmental Microbiology 74 (17), 5483–5489.
Habib, I., Sampers, I., Uyttendaele, M., Berkvens, D., De Zutter, L., 2008b. Performance characteristics and estimation of measurement uncertainty of three plating procedures for Campylobacter enumeration in chicken meat. Food Microbiology 25 (1), 65–74.
Helsel, D.R., 2005. Nondetects and data analysis: statistics for censored environmental data. Wiley Interscience, USA.
Helsel, D.R., 2006. Fabricating data: how substituting values for nondetects can ruin results, and what can be done about it. Chemosphere 65, 2434–2439.
Iman, R.L., Conover, W.J., 1982. A distribution-free approach to inducing rank correlation among input variables. Communications in Statistical Simulations and Computation 11 (3), 311–334.
ISO, 1998/Amd 1:2004. International Standards Organization 10290-2. Microbiology of food and animal feeding stuffs – horizontal method for the detection and enumeration of Listeria monocytogenes – part 2: Enumeration method.
ISO, 2006. International Standards Organization 10272-1. Microbiology of food and animal feeding stuffs – horizontal method for detection and enumeration of Campylobacter spp. – part 2: Enumeration method.
Jordan, D., 2005. Simulating the sensitivity of pooled-sample herd tests for fecal Salmonella in cattle. Preventive Veterinary Medicine 70, 59–73.
Kilsby, D.C., Pugh, M.E., 1981. The relevance of the distribution of micro-organisms within batches of food to the control of microbiological hazards from foods. Journal of Applied Bacteriology 51, 345–354.
Legan, J.D., Vandeven, M.H., Dahms, S., Cole, M.B., 2001. Determining the concentration of microorganisms controlled by attributes sampling plans. Food Control 12 (3), 137–147.
Lorimer, M.F., Kiermeier, A., 2007. Analysing microbiological data: Tobit or not Tobit? International Journal of Food Microbiology 116, 313–318.
Nauta, M.J., van der Wal, F.J., Putirulan, F.F., Post, J., van de Kassteele, J., Bolder, N.M., 2009. Evaluation of the "testing and scheduling" strategy for control of Campylobacter in broiler meat in The Netherlands. International Journal of Food Microbiology 134, 216–222.
Oscar, T.P., 2004. A quantitative risk assessment model for Salmonella and whole chickens. International Journal of Food Microbiology 93, 231–247.
Pérez-Rodríguez, F., van Asselt, E.D., García-Gimeno, R.M., Zurera, G., Zwietering, M.H., 2007. Extracting additional risk managers information from a risk assessment of Listeria monocytogenes in deli meats. Journal of Food Protection 70 (5), 1137–1152.
Pouillot, R., Miconnet, N., Afchain, A.-L., Delignette-Muller, M.L., Beaufort, A., Rosso, L., Denis, J.-B., Cornu, M., 2007. Quantitative risk assessment of Listeria monocytogenes in French cold-smoked salmon: I. Quantitative exposure assessment. Risk Analysis 27 (3), 683–700.
R Development Core Team, 2009. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. URL http://www.R-project.org.
Reinders, R.D., De Jonge, R., Evers, E.G., 2003. A statistical method to determine whether micro-organisms are randomly distributed in a food matrix, applied to coliforms and Escherichia coli O157 in minced beef. Food Microbiology 20, 297–303.
Ridout, M., Demétrio, C.G.B., Hinde, J., 1998. Models for count data with many zeroes. Proceedings of the XIXth International Biometric Conference, pp. 179–192.
Shorten, P.R., Pleasants, A.B., Soboleva, T.K., 2006. Estimation of microbial growth using population measurements subject to a detection limit. International Journal of Food Microbiology 108, 369–375.
Straver, J.M., Janssen, A.F.W., Linnemann, A.R., van Boekel, A.J.S., Beumer, R.R., Zwietering, M.H., 2007. Number of Salmonella on chicken breast filet at retail level and its implications for public health risk. Journal of Food Protection 70 (9), 2045–2055.
Uyttendaele, M., Busschaert, P., Valero, A., Geeraerd, A.H., Vermeulen, A., Jacxsens, L., Goh, K.K., De Loy, A., Van Impe, J.F., Devlieghere, F., 2009. Prevalence and challenge tests of Listeria monocytogenes in Belgian produced and retailed mayonnaise-based deli-salads, cooked meat products and smoked fish between 2005 and 2007. International Journal of Food Microbiology 133, 94–104.
Zhao, Y., Frey, H.C., 2004. Quantification of variability and uncertainty for censored data sets and application to air toxic emission factors. Risk Analysis 24 (4), 1019–1034.
Table 2
Overview, for all illustrations and case studies, of the results of the maximum likelihood estimation and of the mean and standard deviation calculated with substitution methods (all units in log10 CFU/g).

| Data set | True distribution | Sample parameters | MLE fitted distribution | Substitution by ½ LOD | Substitution by LOD | Ignoring nondetects |
|---|---|---|---|---|---|---|
| Illustration 1, quantitative data | μ = 0, σ = 2 | x̄ = 0.05, s = 1.77 | μ = −0.05, σ = 1.88 | μ = 0.29, σ = 1.44 | μ = 0.49, σ = 1.27 | μ = 1.15, σ = 1.25 |
| Illustration 2, semi-quantitative data | μ = 0, σ = 2 | x̄ = 0.05, s = 1.77 | μ = −0.25, σ = 2.11 | - | - | - |
| Case study 1, Campylobacter data | unknown | unknown | μ = 0.73, σ = 1.03 | μ = 1.10, σ = 0.64 | μ = 1.28, σ = 0.53 | μ = 1.68, σ = 0.65 |
| Case study 1a, increased LOD | unknown | unknown | μ = 0.46, σ = 1.22 | μ = 1.16, σ = 0.51 | μ = 2.06, σ = 0.24 | μ = 2.58, σ = 0.50 |
| Case study 1b, measurement error | unknown | unknown | μ = 0.72, σ = 0.99 | - | - | - |
| Case study 1c, reduced data set | unknown | unknown | μ = 0.63, σ = 1.07 | μ = 1.07, σ = 0.62 | μ = 1.26, σ = 0.51 | μ = 1.69, σ = 0.64 |
| Case study 2, L. monocytogenes data | unknown | unknown | μ = −1.58, σ = 1.54 | - | - | - |

Fig. 3. Illustration 1: (a) plot of the 95% confidence interval of the log-normal distribution
fitted to the set of left-censored quantitative measurements, (b) scatterplot of the bootstrap
sample means versus standard deviations, and kernel density plot of, respectively, the
means (c) and standard deviations (d) of the bootstrap samples (grey), with the fitted normal
and gamma distributions plotted on top of it (black).

correlation between both parameters. This is depicted in Fig. 5b. The
Pearson correlation coefficient between them equals −0.733 and
the Spearman rank order correlation coefficient −0.677. In case the
original data set is available to the risk assessor, the results of the
bootstrap method can be used directly as input values of a
two-dimensional Monte Carlo simulation. However, when only distributions
representing parameter uncertainty are communicated (for example, in
the case of data sets too large to include in a report), it is better not to
draw random samples for the mean and standard deviation independently
of one another in a 2D Monte Carlo simulation for risk
assessment, because this would incorrectly influence the representation
of uncertainty intervals for the lower percentiles of the final distribution.
A number of solutions exist to this issue. One could, for example, use
copulas or apply the Iman-Conover method for correlated sampling
from two distributions (Haas et al., 1999; Iman and Conover, 1982).
Here, as an alternative solution, the standard deviation is modelled as a
linear function of the mean with addition of an error term, similarly as
was done by Calistri and Giovannini (2008) Based on the scatterplot it
is found reasonable to assumethat themean and standard deviation are
related approximately linearly and the error term remains of the same
magnitude overthe range of the mean For other case studies nonlinear
regression could be applied as well
The standard deviation is formulated as a linear function of the
mean
σ = β0 + β1sdotμ +
eth2
THORNwith β 0 and β 1 being respectively the intercept and slope of the linear
relation and an error term In this case following equation is ob-
tained by performing linear regression followed by assessment of the
residual values
σ = 191minus0888sdotμ + Normethμ = 0σ = 0344THORN eth3THORN
For comparison the results obtained by bootstrapping and the
results obtained by this linear model are plotted on top of each other
see Fig 5c
4 Results for real food product data
41 Case study 1 Campylobacter in chicken meat preparations
In this 1047297rst case study the results of Campylobacter analyses in
chicken meat preparations are evaluated The data set comprises
quantitative analysis results with an LOQ of 10 CFUg In 387 of the
656 measurements (59) the result is left-censored due to the LOQ
Using MLE the logarithms of the censored data have been 1047297tted to a
normal distribution with mean μ =073 log10 CFUg and standard
deviation σ =103 log10 CFUg
For comparison if the nondetects would have been substituted by
half of the LOQ the 1047297tted distribution would have been a normal
distribution with mean μ =110 log10 CFUg and standard deviation
σ =064 log10 CFUg (see Table 2)
After bootstrapping the mean and standard deviation are repre-
sented by a normal distribution and a gamma distribution respectively
The means of the bootstrap samples are 1047297tted to a normal distribution
with hyperparameters mean μ μ =073 log10 CFUg and standard
deviation σ μ =006 log10 CFUg The standard deviation is 1047297tted by a
gamma distribution with shape parameter α σ =304 and scale param-
eter β σ =339middot10minus3
The resulting distribution with its 95 con1047297dence interval is
shown in Fig 6a
As can be seen uncertainty about the distribution parameters is
rather small compared to variability as could be expected due to thelarge data set and the many remaining quantitative non-censored
data points
The Campylobacter data set is also used to test the in1047298uence of a
number of factors Firstly to check the in1047298uence of the limit of
quanti1047297cation all data points of the data set are censored to an
increased LOQ of 100 CFUg (standard LOQ of the Campylobacter
enumeration method) instead of 10 CFUg (reduced limit of quanti-1047297cation obtained by plating one milliliter over three mCCDA plates)
In this new data set 589 out of 656 values ie 90 (as opposed to 59
in the original data set) are censored Using maximum likelihood the
new estimated mean and standard deviation are 046 and 122 log10
CFUg The resulting distribution after bootstrapping is shown in
Fig 6b As can be seen this increased LOQ has a high in1047298uence on
parameter estimates as well as on uncertainty Despite the speci1047297city
of this particular case study it illustrates (in an opposite way) the
important impact a reduction of the limit of quanti1047297cation of current
detection methods (and thus an increase of non-censored values)
(eg Gnanou Besse et al 2004) might have on the obtained results
when data sets include a signi1047297cant amount of nondetects
It is also tested what the effect would be if the measurement
error would be included at a realistic level corresponding to routine
laboratory measurements A measurement error of 05 log10 CFUg is
superimposed on all original quantitative measurements thus replacing
all quantitative data points xi with an interval [ ximinus05 xi+05] log10
CFUg The newly obtained estimations of mean and standard deviation
are respectively 072 and 099 log10 CFUg Implementing measurement
error appears to have very little impact on the obtained result for this
data set as can be seen in Fig 6c
To illustrate theimpact of the number of data points on the obtaineddistribution a distribution is 1047297tted to half the number of data points
328 data points are randomly sampled from the original data set of 656
data points and subjected to MLE and bootstrapping The estimated
mean andstandard deviation are respectively 063 and107 log10 CFUg
The resulting distribution is depicted in Fig 6d Although uncertainty
intervals do increase somewhat the deviation of the new distribution
compared to the originally obtained distribution (Fig 6d) remains
rather limited despite the fact that the number of data points has been
reduced drastically This indicates that the investment of labor and costs
in a large number of additional measurements might not always have
the expected impact on the resulting output distribution
The results of all of these test cases are summarized in Table 2
Similarly as in the previous illustrations the bias that arises when
substitution methods are used or if nondetects are ignored can be seen
42 Case study 2 Listeria monocytogenes in smoked 1047297sh samples
The second case study consists of 103 measurements of Listeria
monocytogenes in smoked1047297sh samples As opposed to the Campylobacter
case study this data set contains merely 1 quantitative measurement
(1 laboratory sample enumerated L monocytogenes gt 10 CFUg ie
15 CFUg) All other measurements are either interval- left- or right-
censored Moreover the data set contains several different LODs
depending on the demands of the food business operator the particular
food samples were supplied by
Using MLE the logarithmic values of the analysis results are1047297tted to
a normal distribution with mean μ =minus158 log10 CFUg and standard
deviation σ =154 log10 CFUg Based on the empirical distributions
Fig 4 Histogramof the pseudorandomdata points usedin Illustration2 withthe vertical
lines indicating the limits of detection of the 1047297rst and second presenceabsence test
265P Busschaert et al International Journal of Food Microbiology 138 (2010) 260 ndash 269
8142019 jijfoodmicro201001025pdf
httpslidepdfcomreaderfulljijfoodmicro201001025pdf 710
of the bootstrap estimates of the distribution parameters the normal
distribution is chosen to 1047297t both the mean and the standard deviation
The mean is 1047297tted to a normal distribution with hyperparameters
mean μ μ =minus158 log10 CFUg and standard deviation σ u=020 log10CFUg The standard deviation is 1047297tted to a normal distribution with
hyperparameters mean μ σ =151 log10 CFUg and standard deviation
σ σ =028 log10 CFUg To avoid sampling of negative values for the
standard deviation this distribution is truncated at zero
As can be seen in the scatterplot of the bootstrap means versus the bootstrap standard deviations (see Fig. 7a), a small number of bootstrap samples at the lower part of the figure deviate from the majority of the samples. This deviation is caused by the absence, in the respective bootstrap samples, of a number of rare intervals from the original data set. These intervals all have concentration values higher than the general mean value, and their inclusion leads to an increase of the standard deviation. This separation between a small cloud of points with low standard deviation and a big cloud with the majority of the points is hence purely a consequence of the complexity of this particular data set, but it illustrates the limitations of the non-parametric bootstrap method. When a data set has relatively few distinct values (in this particular case, 10 distinct values are present), differences between bootstrap samples can be great. This should always be checked when applying the non-parametric bootstrap. This problem does not occur when the parametric bootstrap method is applied; however, applying the parametric bootstrap method to censored data would incorrectly result in uncertainty intervals that differ from those of non-parametric bootstrapping, which could lead to a fail-dangerous underestimation. A number of test simulations have confirmed this (results not shown). The parametric bootstrap could, however, be applied by generating bootstrap samples from a parametric distribution and censoring them manually in the specific case that the complete data set has to be compared to one LOQ only (Zhao and Frey, 2004).
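
The effect of rare values on the non-parametric bootstrap can be checked with a short sketch. The data set below is hypothetical (102 identical left-censored results and one rare interval, chosen only to mirror the sample size of 103), and the function names are illustrative. A single observation is absent from a bootstrap sample of size n with probability (1 − 1/n)^n, which is about 0.37 for n = 103.

```python
import random

def bootstrap_resample(observations, rng):
    """Draw len(observations) items with replacement."""
    n = len(observations)
    return [observations[rng.randrange(n)] for _ in range(n)]

def fraction_missing(observations, rare, n_boot=2000, seed=0):
    """Share of bootstrap samples that do not contain the rare observation."""
    rng = random.Random(seed)
    missing = 0
    for _ in range(n_boot):
        if rare not in bootstrap_resample(observations, rng):
            missing += 1
    return missing / n_boot

# Hypothetical data: (lower, upper) bounds in log10 CFU/g.
common = (float("-inf"), -1.4)   # left-censored result
rare = (1.0, 2.0)                # single interval-censored high result
observations = [common] * 102 + [rare]
share = fraction_missing(observations, rare)
```

A large share of samples lacking the rare observation is a warning sign that the bootstrap estimates may split into separate clouds, as described above.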
The resulting distribution with its 95% confidence interval is shown in Fig. 7b.
5. Discussion for real food product data
The examples presented in this article illustrate how complex data sets including nondetects, semiquantitative and qualitative measurements can be interpreted in an appropriate way for use in microbiological risk assessment. Ignoring nondetects, or substituting them with the LOD/LOQ or half of it, is a classical source of bias (cf. Lorimer and Kiermeier, 2007) that can and should be avoided using these methods. It has been demonstrated in this paper that even complex data sets, including either very diverse analyses or large amounts of censored values, can lead to very satisfying outcomes. Nevertheless, attention must be paid to the possibilities and limitations of these methods. Blindly fitting a data set with limited information (for example, a data set consisting of purely presence/absence tests, as obtained if analyses are performed for compliance testing to a set legal criterion) to a specific distribution might result in unrealistic outcomes. Moreover, the limited small-sample properties of the non-parametric bootstrap method must be taken into account, as is illustrated in the second case study. This supports the recommendation to set up dedicated baseline surveys for data gathering to be used for risk assessment.
The illustration of the various case studies (both with hypothetical data sets and with real-life microbiological data sets from dedicated or ad hoc combined surveys) shows that it is important, in the establishment of microbiological baseline surveys, to apply some semi-quantitative methodology and, by preference, a methodology enabling an estimation of numbers present in the positive laboratory samples. For example, Straver et al. (2007) estimated contamination of chicken breast filet with Salmonella using a combination of prior enrichment of pooled laboratory samples and subsequent enumeration of Salmonella in positive laboratory samples using a Most Probable Number (MPN) assay. The use of MPN methods or enumeration methods (as in the Campylobacter case study) with a reduced limit of quantification, which overall provide an estimate of the number of pathogens present rather than an exact value, is not a crucial factor in the frame of the estimation of distributions of the contamination level. In the present Campylobacter case study, it was shown that inclusion of the measurement error interval for quantitative analyses hardly affects the estimated distributions.
Fig. 5. Illustration 2: (a) plot of the 95% confidence interval of the fitted log-normal distribution for the set of semiquantitative measurements, (b) scatter plot of the means versus the standard deviations of the bootstrap samples and (c) comparison of the parameters obtained by bootstrapping and randomly generated parameters using a linear relationship.
The proportion of nondetects, on the other hand, may have a significant impact on the result, as has been shown in the present Campylobacter case study. This illustrates the positive effect that lowering the limit of quantification of a certain analysis method might have on lowering the uncertainty of the distribution when the microbiologist is confronted with a substantial amount of nondetects. It was noticed for the Listeria case study that the uncertainty of the distribution is especially increased at the lower levels (< 0.01 CFU/g) (Fig. 7b), as in this concentration range only left-censored data are available. More information on the estimated level of contamination would make it possible to decrease uncertainty. However, overall, the estimated mean level of contamination for Listeria (μ = −1.58 log10 CFU/g) is much lower than for Campylobacter (μ = 0.73 log10 CFU/g), which means that obtaining enumeration data at these very low levels of contamination will take considerable laboratory effort and require adapted methodological procedures, with the related costs for obtaining this type of data set.
On the other hand, it was shown that obtaining a good data set for estimation of distributions of contamination levels does not necessarily demand a large study. In the present Campylobacter case study, it was illustrated that increasing the number of analyses to a large extent might lead to only a limited additional reduction of uncertainty in the case of an already sufficient data set with representative outcomes. The distribution of the Campylobacter contamination level shown in Fig. 6d is based upon 122 enumeration results (obtained from in total 328 laboratory samples analyzed), whereas in Fig. 6a 269 enumeration values were available (obtained from 656 laboratory samples analyzed) for the estimation of the distribution of contamination level.
Setting up a baseline survey to acquire a data set to serve as the basis for estimation of an input distribution for risk assessment thus has to take into account appropriate methodology to provide a sufficient number of detects and estimates of numbers, but also has to provide results which are representative for the objective of the risk assessment (e.g., food product under consideration, stage in the production chain, variability between producers, seasonality, etc.) in order not to introduce bias in the distribution obtained. As such, setting up a baseline survey is a complex exercise. Nevertheless, if the data set is available, appropriate techniques also need to be used to translate the information from the data set into a distribution.
In the present study, an approach based upon maximum likelihood estimation was shown to provide good results to present the variation of contamination data. Additional information with regard to variability and uncertainty could be extracted as well, using the bootstrap method. The same methodology can equally be applied with more complex models, such as mixture models or Poisson-like models. Alternative methods, such as Bayesian analysis, can also be applied and lead to similar outcomes (results not shown). Examples of a Bayesian analysis can be found in Nauta et al. (2009), Clough et al. (2005) and Crépet et al. (2007).
Fig. 6. Case study 1: (a) plot of the 95% confidence interval of the normal distribution fitted to the Campylobacter contamination data; (b) influence of an increased LOD on the resulting distribution. The dotted lines represent the original data set, the full lines represent the data set on which an increased LOD has been imposed; (c) influence of inclusion of a measurement error interval on the estimated distribution (full lines) compared to the original data set (dotted lines); (d) influence of the number of data points: comparison of the results of a random subset with N/2 data points (full lines) with the original data set (dotted lines).
Application of these techniques offers a way for meta-analysis of the many relevant yet diverse data sets that are available in literature and (inter)national reports of surveillance or baseline surveys. It therefore increases the information input of a risk assessment and, by consequence, the correctness of the outcome of the risk assessment.
Acknowledgements
This research is supported in part by the Research Council of the Katholieke Universiteit Leuven (projects OT0925TBA and EF05006 Center-of-Excellence Optimization in Engineering), knowledge platform KP09005 (SCORES4CHEM) of the Industrial Research Fund, the Belgian Program on Interuniversity Poles of Attraction, initiated by the Belgian Federal Science Policy Office, and the Fund for Scientific Research-Flanders (FWO-Vlaanderen, project G042409 N). J. Van Impe holds the chair Safety Engineering sponsored by the Belgian chemistry and life sciences federation essencia. Research is conducted utilizing high performance computational resources provided by the University of Leuven, http://ludit.kuleuven.be.
We would like to thank the Ghent University cluster of the Department of Veterinary Public Health and Food Safety, Faculty of Veterinary Medicine, and Department of Food Safety and Food Quality, Faculty of Bio-Science Engineering, who kindly provided the Campylobacter data derived from a Federal Public Health Service funded project. The staff of the accredited laboratory section of the Laboratory of Food Microbiology and Food Preservation at the Department Food Safety and Food Quality, Faculty of Bio-Science Engineering, Ghent University, is acknowledged for providing the data on the microbiological analysis and challenge testing for L. monocytogenes.
References
Calistri, P., Giovannini, A., 2008. Quantitative risk assessment of human campylobacteriosis related to the consumption of chicken meat in two Italian regions. International Journal of Food Microbiology 128, 274–287.
Clough, H.E., Clancy, D., O'Neill, P.D., Robinson, S.E., French, N.P., 2005. Quantifying uncertainty associated with microbial count data: a Bayesian approach. Biometrics 61, 610–616.
Cox, D.R., Oakes, D., 1984. Analysis of Survival Data. Monographs on Statistics and Applied Probability. Chapman and Hall.
Crépet, A., Albert, I., Dervin, C., Carlin, F., 2007. Estimation of microbial contamination of food from prevalence and concentration data: application to Listeria monocytogenes in fresh vegetables. Applied and Environmental Microbiology 73 (1), 250–258.
Delignette-Muller, M.L., Pouillot, R., Denis, J.-B., 2008. fitdistrplus: Help to fit of a parametric distribution to non-censored or censored data. R package version 0.1-0. URL http://riskassessment.r-forge.r-project.org.
Efron, B., 1982. The jackknife, the bootstrap and other resampling plans. CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 38.
FAO/WHO, 2004. Risk assessment of Listeria monocytogenes in ready-to-eat foods. Accessed at June 5, 2009. URL http://www.who.int/foodsafety/publications/micro/mra_listeria/en/index.html.
FDA/USDA/CDC, 2003. Quantitative assessment of relative risk to public health from foodborne Listeria monocytogenes among selected categories of ready-to-eat foods. Accessed at June 5, 2009. URL http://www.foodsafety.gov/~dms/lmr2-toc.html.
Gnanou Besse, N., Audinet, N., Beaufort, A., Colin, P., Cornu, M., Lombard, B., 2004. A contribution to the improvement of Listeria monocytogenes enumeration in cold-smoked salmon. International Journal of Food Microbiology 91 (2), 119–127.
Gonzales-Barron, U., Kerr, M., Sheridan, J.J., Butler, F., 2010. Count data distributions and their zero-modified equivalents as a framework for modelling microbial data with a relatively high occurrence of zero counts. International Journal of Food Microbiology 136 (3), 268–277.
Haas, C.N., Thayyar-Madabusi, A., Rose, J.B., Gerba, C.P., 1999. Development and validation of dose-response relationship for Listeria monocytogenes. Quantitative Microbiology 1, 89–102.
Habib, I., Sampers, I., Uyttendaele, M., Berkvens, D., De Zutter, L., 2008a. Baseline data from a Belgium-wide survey of Campylobacter species contamination in chicken meat preparations, and considerations for a reliable monitoring program. Applied and Environmental Microbiology 74 (17), 5483–5489.
Habib, I., Sampers, I., Uyttendaele, M., Berkvens, D., De Zutter, L., 2008b. Performance characteristics and estimation of measurement uncertainty of three plating procedures for Campylobacter enumeration in chicken meat. Food Microbiology 25 (1), 65–74.
Helsel, D.R., 2005. Nondetects and data analysis: statistics for censored environmental data. Wiley Interscience, USA.
Helsel, D.R., 2006. Fabricating data: how substituting values for nondetects can ruin results, and what can be done about it. Chemosphere 65, 2434–2439.
Iman, R.L., Conover, W.J., 1982. A distribution-free approach to inducing rank correlation among input variables. Communications in Statistical Simulations and Computation 11 (3), 311–334.
ISO, 1998/Amd 1:2004. International Standards Organization 10290-2. Microbiology of food and animal feeding stuffs – horizontal method for the detection and enumeration of Listeria monocytogenes – part 2: Enumeration method.
ISO, 2006. International Standards Organization 10272-1. Microbiology of food and animal feeding stuffs – horizontal method for detection and enumeration of Campylobacter spp. – part 2: Enumeration method.
Jordan, D., 2005. Simulating the sensitivity of pooled-sample herd tests for fecal Salmonella in cattle. Preventive Veterinary Medicine 70, 59–73.
Kilsby, D.C., Pugh, M.E., 1981. The relevance of the distribution of micro-organisms within batches of food to the control of microbiological hazards from foods. Journal of Applied Bacteriology 51, 345–354.
Legan, J.D., Vandeven, M.H., Dahms, S., Cole, M.B., 2001. Determining the concentration of microorganisms controlled by attributes sampling plans. Food Control 12 (3), 137–147.
Lorimer, M.F., Kiermeier, A., 2007. Analysing microbiological data: Tobit or not Tobit? International Journal of Food Microbiology 116, 313–318.
Nauta, M.J., van der Wal, F.J., Putirulan, F.F., Post, J., van de Kassteele, J., Bolder, N.M., 2009. Evaluation of the "testing and scheduling" strategy for control of Campylobacter in broiler meat in The Netherlands. International Journal of Food Microbiology 134, 216–222.
Oscar, T.P., 2004. A quantitative risk assessment model for Salmonella and whole chickens. International Journal of Food Microbiology 93, 231–247.
Pérez-Rodríguez, F., van Asselt, E.D., García-Gimeno, R.M., Zurera, G., Zwietering, M.H., 2007. Extracting additional risk managers information from a risk assessment of Listeria monocytogenes in deli meats. Journal of Food Protection 70 (5), 1137–1152.
Pouillot, R., Miconnet, N., Afchain, A.-L., Delignette-Muller, M.L., Beaufort, A., Rosso, L., Denis, J.-B., Cornu, M., 2007. Quantitative risk assessment of Listeria monocytogenes in French cold-smoked salmon: I. Quantitative exposure assessment. Risk Analysis 27 (3), 683–700.
R Development Core Team, 2009. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. URL http://www.R-project.org.
Reinders, R.D., De Jonge, R., Evers, E.G., 2003. A statistical method to determine whether micro-organisms are randomly distributed in a food matrix, applied to coliforms and Escherichia coli O157 in minced beef. Food Microbiology 20, 297–303.
Ridout, M., Demétrio, C.G.B., Hinde, J., 1998. Models for count data with many zeroes. Proceedings of the XIXth International Biometric Conference, pp. 179–192.
Shorten, P.R., Pleasants, A.B., Soboleva, T.K., 2006. Estimation of microbial growth using population measurements subject to a detection limit. International Journal of Food Microbiology 108, 369–375.
Straver, J.M., Janssen, A.F.W., Linnemann, A.R., van Boekel, A.J.S., Beumer, R.R., Zwietering, M.H., 2007. Number of Salmonella on chicken breast filet at retail level and its implications for public health risk. Journal of Food Protection 70 (9), 2045–2055.
Uyttendaele, M., Busschaert, P., Valero, A., Geeraerd, A.H., Vermeulen, A., Jacxsens, L., Goh, K.K., De Loy, A., Van Impe, J.F., Devlieghere, F., 2009. Prevalence and challenge tests of Listeria monocytogenes in Belgian produced and retailed mayonnaise-based deli-salads, cooked meat products and smoked fish between 2005 and 2007. International Journal of Food Microbiology 133, 94–104.
Zhao, Y., Frey, H.C., 2004. Quantification of variability and uncertainty for censored data sets and application to air toxic emission factors. Risk Analysis 24 (4), 1019–1034.

Fig. 7. Case study 2: (a) scatter plot of the fitted means versus standard deviations of the bootstrap samples and (b) plot of the 95% confidence interval of the normal distribution fitted to the logarithmic Listeria monocytogenes contamination data.
assessment, because this would incorrectly influence the representation of uncertainty intervals for the lower percentiles of the final distribution. A number of solutions exist to this issue. One could, for example, use copulas or apply the Iman-Conover method for correlated sampling from two distributions (Haas et al., 1999; Iman and Conover, 1982). Here, as an alternative solution, the standard deviation is modelled as a linear function of the mean with addition of an error term, similarly to what was done by Calistri and Giovannini (2008). Based on the scatterplot, it is found reasonable to assume that the mean and standard deviation are related approximately linearly and that the error term remains of the same magnitude over the range of the mean. For other case studies, nonlinear regression could be applied as well.
The standard deviation is formulated as a linear function of the mean:
\[ \sigma = \beta_0 + \beta_1 \cdot \mu + \varepsilon \tag{2} \]
with β0 and β1 being, respectively, the intercept and slope of the linear relation and ε an error term. In this case, the following equation is obtained by performing linear regression, followed by assessment of the residual values:
\[ \sigma = 1.91 - 0.888 \cdot \mu + \mathrm{Norm}(\mu = 0,\ \sigma = 0.344) \tag{3} \]
For comparison, the results obtained by bootstrapping and the results obtained by this linear model are plotted on top of each other, see Fig. 5c.
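
Drawing correlated parameter pairs from the linear relation in Eq. (3) can be sketched as follows. The coefficients are those quoted in the text; the function name, the example means and the redraw of non-positive values are assumptions of this sketch and are not taken from the article.

```python
import random

def sample_sigma(mu, rng, intercept=1.91, slope=-0.888, residual_sd=0.344):
    """Draw a standard deviation for a given mean according to Eq. (3).

    Redrawing when the error term yields a non-positive value is an
    assumption of this sketch, added so that sigma stays usable.
    """
    while True:
        sigma = intercept + slope * mu + rng.gauss(0.0, residual_sd)
        if sigma > 0.0:
            return sigma

rng = random.Random(1)
# Hypothetical means; in practice they would be drawn from the
# uncertainty distribution fitted to the bootstrap means.
example_means = [-0.5, 0.0, 0.5]
pairs = [(mu, sample_sigma(mu, rng)) for mu in example_means]
```

Because sigma is generated conditionally on mu, the negative dependence visible in the scatter plot is carried into the simulated pairs.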
4. Results for real food product data
4.1. Case study 1: Campylobacter in chicken meat preparations
In this first case study, the results of Campylobacter analyses in chicken meat preparations are evaluated. The data set comprises quantitative analysis results with an LOQ of 10 CFU/g. In 387 of the 656 measurements (59%), the result is left-censored due to the LOQ. Using MLE, the logarithms of the censored data have been fitted to a normal distribution with mean μ = 0.73 log10 CFU/g and standard deviation σ = 1.03 log10 CFU/g.
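
A minimal sketch of maximum likelihood estimation with censored data is given below. It is not the authors' code: the data are simulated rather than the real Campylobacter results, the function names are hypothetical, and a coarse grid search stands in for a proper numerical optimizer. Quantified values contribute the normal density, censored values contribute the probability mass of their interval.

```python
import math
import random
from statistics import NormalDist

def censored_loglik(mu, sigma, observations):
    """Log-likelihood of Normal(mu, sigma) for log10 concentrations.

    Each observation is a (lower, upper) pair in log10 CFU/g:
      lower == upper  -> quantified value
      lower == -inf   -> left-censored (below LOD/LOQ)
      upper == +inf   -> right-censored
      otherwise       -> interval-censored
    """
    dist = NormalDist(mu, sigma)
    total = 0.0
    for lower, upper in observations:
        if lower == upper:
            likelihood = dist.pdf(lower)
        else:
            p_upper = 1.0 if upper == math.inf else dist.cdf(upper)
            p_lower = 0.0 if lower == -math.inf else dist.cdf(lower)
            likelihood = p_upper - p_lower
        total += math.log(max(likelihood, 1e-300))  # guard against log(0)
    return total

def fit_by_grid(observations, mu_grid, sigma_grid):
    """Return the (mu, sigma) grid point with the highest log-likelihood."""
    return max(
        ((mu, sigma) for mu in mu_grid for sigma in sigma_grid),
        key=lambda p: censored_loglik(p[0], p[1], observations),
    )

# Simulated stand-in data: 656 log10 counts, left-censored at LOQ = 10 CFU/g.
rng = random.Random(42)
log_loq = math.log10(10.0)
simulated = [rng.gauss(0.73, 1.03) for _ in range(656)]
observations = [(x, x) if x >= log_loq else (-math.inf, log_loq) for x in simulated]

mu_grid = [i * 0.05 for i in range(31)]           # 0.00 to 1.50
sigma_grid = [0.5 + i * 0.05 for i in range(23)]  # 0.50 to 1.60
mu_hat, sigma_hat = fit_by_grid(observations, mu_grid, sigma_grid)
```

The same (lower, upper) representation also covers the interval- and right-censored results of a data set with several detection limits.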
For comparison, if the nondetects had been substituted by half of the LOQ, the fitted distribution would have been a normal distribution with mean μ = 1.10 log10 CFU/g and standard deviation σ = 0.64 log10 CFU/g (see Table 2).
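
The direction of the substitution bias can be reproduced on simulated data. The sketch below is illustrative only: the function name and the simulated sample are assumptions, and the true parameters are set to the MLE estimates quoted above.

```python
import math
import random
import statistics

def substitute_nondetects(log_values, log_loq, fraction=0.5):
    """Replace every value below the LOQ by log10(fraction * LOQ)."""
    substitute = log_loq + math.log10(fraction)
    return [x if x >= log_loq else substitute for x in log_values]

rng = random.Random(7)
true_mu, true_sigma = 0.73, 1.03   # log10 CFU/g
log_loq = 1.0                      # LOQ of 10 CFU/g
simulated = [rng.gauss(true_mu, true_sigma) for _ in range(5000)]

substituted = substitute_nondetects(simulated, log_loq)
naive_mean = statistics.mean(substituted)   # pulled upward
naive_sd = statistics.stdev(substituted)    # shrunk
```

Substitution piles all nondetects onto one value just below the LOQ, so the naive mean overestimates and the naive standard deviation underestimates the parameters of the underlying distribution.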
After bootstrapping, the mean and standard deviation are represented by a normal distribution and a gamma distribution, respectively. The means of the bootstrap samples are fitted to a normal distribution with hyperparameters mean μ_μ = 0.73 log10 CFU/g and standard deviation σ_μ = 0.06 log10 CFU/g. The standard deviation is fitted by a gamma distribution with shape parameter α_σ = 304 and scale parameter β_σ = 3.39·10⁻³.
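
As a quick consistency check, using only the standard moment formulas of a gamma distribution in the shape and scale parameterization, the fitted gamma distribution has a mean that agrees with the MLE estimate of the standard deviation and a small spread of its own:

```latex
\begin{align*}
\mathrm{E}[\sigma]  &= \alpha_\sigma \beta_\sigma
                     = 304 \times 3.39 \cdot 10^{-3} \approx 1.03 \ \log_{10}\mathrm{CFU/g} \\
\mathrm{SD}[\sigma] &= \sqrt{\alpha_\sigma}\,\beta_\sigma
                     = \sqrt{304} \times 3.39 \cdot 10^{-3} \approx 0.059 \ \log_{10}\mathrm{CFU/g}
\end{align*}
```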
The resulting distribution with its 95% confidence interval is shown in Fig. 6a.
As can be seen, uncertainty about the distribution parameters is rather small compared to variability, as could be expected due to the large data set and the many remaining quantitative, non-censored data points.
The Campylobacter data set is also used to test the influence of a number of factors. Firstly, to check the influence of the limit of quantification, all data points of the data set are censored to an increased LOQ of 100 CFU/g (standard LOQ of the Campylobacter enumeration method) instead of 10 CFU/g (reduced limit of quantification obtained by plating one milliliter over three mCCDA plates). In this new data set, 589 out of 656 values, i.e., 90% (as opposed to 59% in the original data set), are censored. Using maximum likelihood, the new estimated mean and standard deviation are 0.46 and 1.22 log10 CFU/g. The resulting distribution after bootstrapping is shown in Fig. 6b. As can be seen, this increased LOQ has a high influence on parameter estimates as well as on uncertainty. Despite the specificity of this particular case study, it illustrates (in an opposite way) the important impact that a reduction of the limit of quantification of current detection methods (and thus an increase of non-censored values) (e.g., Gnanou Besse et al., 2004) might have on the obtained results when data sets include a significant amount of nondetects.
The effect of including measurement error at a realistic level, corresponding to routine laboratory measurements, is also tested. A measurement error of 0.5 log10 CFU/g is superimposed on all original quantitative measurements, thus replacing all quantitative data points x_i with an interval [x_i − 0.5, x_i + 0.5] log10 CFU/g. The newly obtained estimates of mean and standard deviation are, respectively, 0.72 and 0.99 log10 CFU/g. Implementing measurement error appears to have very little impact on the obtained result for this data set, as can be seen in Fig. 6c.
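
Turning quantified values into measurement error intervals amounts to a small transformation of the data before fitting. In the sketch below, the (lower, upper) representation of observations and the function name are assumptions made for illustration.

```python
import math

def add_measurement_error(observations, half_width=0.5):
    """Widen each quantified value x into [x - half_width, x + half_width].

    Observations are (lower, upper) pairs in log10 CFU/g; a pair with
    lower == upper is a quantified value. Censored results are kept as is.
    """
    widened = []
    for lower, upper in observations:
        if lower == upper:
            widened.append((lower - half_width, upper + half_width))
        else:
            widened.append((lower, upper))
    return widened

# Hypothetical example: one quantified value and one nondetect (LOQ = 10 CFU/g).
example = [(1.5, 1.5), (-math.inf, 1.0)]
widened = add_measurement_error(example)
```

After this step every observation is censored, so the fit relies on interval probabilities only.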
To illustrate the impact of the number of data points on the obtained distribution, a distribution is fitted to half the number of data points: 328 data points are randomly sampled from the original data set of 656 data points and subjected to MLE and bootstrapping. The estimated mean and standard deviation are, respectively, 0.63 and 1.07 log10 CFU/g. The resulting distribution is depicted in Fig. 6d. Although uncertainty intervals do increase somewhat, the deviation of the new distribution compared to the originally obtained distribution (Fig. 6d) remains rather limited, despite the fact that the number of data points has been reduced drastically. This indicates that the investment of labor and costs in a large number of additional measurements might not always have the expected impact on the resulting output distribution.
The results of all of these test cases are summarized in Table 2. As in the previous illustrations, the bias that arises when substitution methods are used or when nondetects are ignored can be seen.
42 Case study 2 Listeria monocytogenes in smoked 1047297sh samples
The second case study consists of 103 measurements of Listeria
monocytogenes in smoked1047297sh samples As opposed to the Campylobacter
case study this data set contains merely 1 quantitative measurement
(1 laboratory sample enumerated L monocytogenes gt 10 CFUg ie
15 CFUg) All other measurements are either interval- left- or right-
censored Moreover the data set contains several different LODs
depending on the demands of the food business operator the particular
food samples were supplied by
Using MLE the logarithmic values of the analysis results are1047297tted to
a normal distribution with mean μ =minus158 log10 CFUg and standard
deviation σ =154 log10 CFUg Based on the empirical distributions
Fig 4 Histogramof the pseudorandomdata points usedin Illustration2 withthe vertical
lines indicating the limits of detection of the 1047297rst and second presenceabsence test
265P Busschaert et al International Journal of Food Microbiology 138 (2010) 260 ndash 269
8142019 jijfoodmicro201001025pdf
httpslidepdfcomreaderfulljijfoodmicro201001025pdf 710
of the bootstrap estimates of the distribution parameters the normal
distribution is chosen to 1047297t both the mean and the standard deviation
The mean is 1047297tted to a normal distribution with hyperparameters
mean μ μ =minus158 log10 CFUg and standard deviation σ u=020 log10CFUg The standard deviation is 1047297tted to a normal distribution with
hyperparameters mean μ σ =151 log10 CFUg and standard deviation
σ σ =028 log10 CFUg To avoid sampling of negative values for the
standard deviation this distribution is truncated at zero
As can be seen in the scatterplot of the bootstrap means versus
the bootstrap standard deviations (see Fig 7a) a small number of
bootstrap samples at the lower part of the 1047297gure deviate from the
majority of the samples This deviation is caused by the absence in therespective bootstrap samples of a number of rare intervals from the
original data set These intervals all have concentration values higher
than the general mean value and their inclusionleads to an increase of
the standard deviation This separation between a small cloud of
points with low standard deviation and a big cloud with the majority
of the points is hence purely a consequence of the complexity of
this particular data set but illustrates the limitations of the non-
parametric bootstrap method When a data set has relatively few
distinct values (in this particular case 10 distinct values are present)
differences can be great between bootstrap samples This should
always be checked for when applying non-parametric bootstrap This
problem does not occur when the parametric bootstrap method is
applied however applying the parametric bootstrap method to
censored data would incorrectly result in different uncertainty intervals
if compared to non-parametric bootstrappingwhichcould lead to a fail-
dangerous underestimation a number of test simulations have
con1047297rmed this (results not shown) The parametric bootstrap however
could be applied by generating bootstrap samples from a parametric
distribution and censoring them manually in the speci1047297c case that the
complete data set has to be compared to one LOQ only (Zhao and Frey
2004)
Theresulting distributionwith its95 con1047297dence interval is shown
in Fig 7b
5 Discussion for real food product data
The examples presented in this article illustrate how complex data
sets including nondetects semiquantitative and qualitative measure-
ments can be interpreted in an appropriate way for use in microbio-logical risk assessment Ignoring nondetects or substituting them with
the LODLOQ or half of it is a classical source of bias (cf Lorimer and
Kiermeier 2007) that canand should be avoidedusingthesemethods It
has been demonstrated in this paper that even complex data sets
including either very diverse analyses or large amounts of censored
values can lead to very satisfying outcomes Nevertheless attention
must be paid to the possibilities and limitations of these methods
Blindly1047297tting a dataset with limited information (for example a data set
consisting of purely presenceabsence tests as obtained if analyses are
performed for compliance testing to a set legal criterion) to a speci1047297c
distribution might result in unrealistic outcomes Moreover the limited
small sample properties of the non-parametric bootstrap method must
be taken into account as is illustrated in the second case study This
supports the recommendation to set up dedicated baseline surveys fordata gathering to be used for risk assessment
The illustration of the various case studies (either with hypothet-
ical data sets and with real-life microbiological data sets from
dedicated or ad hoc combined surveys) shows that it is important in
establishment of microbiological baseline surveys to apply some
semi-quantitative methodology and by preference a methodology
enabling an estimation of numbers present in the positive laboratory
samples For example Straver et al (2007) estimated contamination
of chicken breast 1047297let with Salmonella using a combination of prior
enrichment of pooled laboratory samples and subsequent enumeration
of Salmonella in positive laboratory samples using a Most Probable
Number (MPN) assay The use of MPN methods or enumeration
methods (as in the Camplyobacter case study) with reduced limit of
quanti1047297cation which overall providerather an estimate of thenumber of
Fig 5 Illustration 2 (a) plot of the 95 con1047297dence interval of the 1047297tted log-normal
distribution for the set of semiquantitative measurements (b) scatter plot of the means
versus the standard deviations of the bootstrap samples and (c) comparison of the
parametersobtained by bootstrapping () andrandomly generatedparameters using a
linear relationship ()
266 P Busschaert et al International Journal of Food Microbiology 138 (2010) 260ndash 269
8142019 jijfoodmicro201001025pdf
httpslidepdfcomreaderfulljijfoodmicro201001025pdf 810
pathogens present instead of an exact value is in the frame of the
estimation of distributions of the contamination level not a crucial
factor In the present Campylobacter case study it was shown that
inclusion of the measurement error interval for quantitative analyses
hardly affects the estimated distributions
The proportion ofnondetects onthe otherhandmay have a signi1047297cant
impact on theresult as has been shown in the present Campylobacter case
study This illustrates the positive effect that lowering the limit of
quanti1047297cation of a certain analysis method might have on lowering theuncertainty of thedistributionwhen themicrobiologist is confrontedwith
a substantial amount of nondetects It was noticed for the Listeria case
study thatthe uncertaintyof the distribution is especiallyincreased at the
lower levels (lt001 CFUg) (Fig 7b) as in this concentration range only
left-censored data are available More information on the estimated level
of contamination would enable to decrease uncertainty However overall
the estimated mean level of contamination for Listeria (micro =ndash158 log10CFUg) is much lower than for Campylobacter (micro =073 log10 CFUg)
which means that having access to enumeration data at these very low
levels of contamination will take considerable laboratory effort require
adapted methodological procedures and thus related costs for obtaining
this type of data set
On the other hand it was shown that to obtain a good data set for
estimation of distributions of contamination levels it does not
necessarily demand a large study In the present Campylobacter case
study it was illustrated that increasing the number of analyses to a
large extent might lead to only a limited additional reduction of
uncertainty in the case of an already suf 1047297cient data set with rep-
resentative outcomes The distribution of the Campylobacter contam-
ination level shown in Fig 6d is based upon 122 enumeration results
(obtained from in total 328 laboratory samples analyzed) whereas in
Fig 6a 269 enumeration values were available (obtained from 656
laboratory samples analyzed) for the estimation of the distribution of contamination level
Setting up a baseline survey to acquire a data set that serves as the basis for
estimating an input distribution for risk assessment thus has to take into account
appropriate methodology to provide a sufficient number of detects and estimates of
numbers, but it also has to provide results that are representative for the objective
of the risk assessment (e.g. the food product under consideration, the stage in the
production chain, variability between producers, seasonality, etc.) in order not to
introduce bias in the distribution obtained. As such, setting up a baseline survey is
a complex exercise. Nevertheless, once the data set is available, appropriate
techniques also need to be used to translate the information from the data set into a
distribution.
In the present study, an approach based upon maximum likelihood estimation was shown
to provide good results for representing the variation of contamination data. Additional
information with regard to variability and uncertainty could be extracted as well, using
the bootstrap method. The same methodology can equally be applied with more complex
models, such as mixture models or Poisson-like models. Alternative methods such as
Bayesian analysis can also be applied and lead to similar outcomes (results not shown).
Examples of a Bayesian analysis can be found in Nauta et al. (2009), Clough et al.
(2005) and Crépet et al. (2007).
Application of these techniques offers a way for meta-analysis of the many relevant yet
diverse data sets that are available in literature and in (inter)national reports of
surveillance or baseline surveys. It therefore increases the information input of a
risk assessment and, by consequence, the correctness of the outcome of the risk assessment.
Fig. 6. Case study 1: (a) plot of the 95% confidence interval of the normal distribution fitted to the Campylobacter contamination data; (b) influence of an increased LOD on the resulting distribution. The dotted lines represent the original data set, the full lines represent the data set on which an increased LOD has been imposed; (c) influence of inclusion of a measurement error interval on the estimated distribution (full lines) compared to the original data set (dotted lines); (d) influence of the number of data points: comparison of the results of a random subset with N/2 data points (full lines) with the original data set (dotted lines).
P. Busschaert et al. / International Journal of Food Microbiology 138 (2010) 260–269
Acknowledgements
This research is supported in part by the Research Council of the Katholieke
Universiteit Leuven (projects OT0925TBA and EF05006 Center-of-Excellence Optimization
in Engineering), knowledge platform KP09005 (SCORES4CHEM) of the Industrial Research
Fund, the Belgian Program on Interuniversity Poles of Attraction, initiated by the
Belgian Federal Science Policy Office, and the Fund for Scientific Research-Flanders
(FWO-Vlaanderen, project G042409 N). J. Van Impe holds the chair Safety Engineering
sponsored by the Belgian chemistry and life sciences federation essencia. Research is
conducted utilizing high performance computational resources provided by the University
of Leuven, http://ludit.kuleuven.be.
We would like to thank the Ghent University cluster of the Department of Veterinary
Public Health and Food Safety, Faculty of Veterinary Medicine, and the Department of
Food Safety and Food Quality, Faculty of Bio-Science Engineering, who kindly provided
the Campylobacter data derived from a Federal Public Health Service funded project. The
staff of the accredited laboratory section of the Laboratory of Food Microbiology and
Food Preservation at the Department of Food Safety and Food Quality, Faculty of
Bio-Science Engineering, Ghent University, is acknowledged for providing the data on the
microbiological analysis and challenge testing for L. monocytogenes.
References
Calistri, P., Giovannini, A., 2008. Quantitative risk assessment of human campylobacteriosis related to the consumption of chicken meat in two Italian regions. International Journal of Food Microbiology 128, 274–287.
Clough, H.E., Clancy, D., O'Neill, P.D., Robinson, S.E., French, N.P., 2005. Quantifying uncertainty associated with microbial count data: a Bayesian approach. Biometrics 61, 610–616.
Cox, D.R., Oakes, D., 1984. Analysis of Survival Data. Monographs on Statistics and Applied Probability. Chapman and Hall.
Crépet, A., Albert, I., Dervin, C., Carlin, F., 2007. Estimation of microbial contamination of food from prevalence and concentration data: application to Listeria monocytogenes in fresh vegetables. Applied and Environmental Microbiology 73 (1), 250–258.
Delignette-Muller, M.L., Pouillot, R., Denis, J.-B., 2008. fitdistrplus: Help to fit of a parametric distribution to non-censored or censored data. R package version 0.1-0. URL http://riskassessment.r-forge.r-project.org
Efron, B., 1982. The jackknife, the bootstrap and other resampling plans. CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 38.
FAO/WHO, 2004. Risk assessment of Listeria monocytogenes in ready-to-eat foods. Accessed at June 5, 2009. URL http://www.who.int/foodsafety/publications/micro/mra_listeria/en/index.html
FDA/USDA/CDC, 2003. Quantitative assessment of relative risk to public health from foodborne Listeria monocytogenes among selected categories of ready-to-eat foods. Accessed at June 5, 2009. URL http://www.foodsafety.gov/~dms/lmr2-toc.html
Gnanou Besse, N., Audinet, N., Beaufort, A., Colin, P., Cornu, M., Lombard, B., 2004. A contribution to the improvement of Listeria monocytogenes enumeration in cold-smoked salmon. International Journal of Food Microbiology 91 (2), 119–127.
Gonzales-Barron, U., Kerr, M., Sheridan, J.J., Butler, F., 2010. Count data distributions and their zero-modified equivalents as a framework for modelling microbial data with a relatively high occurrence of zero counts. International Journal of Food Microbiology 136 (3), 268–277.
Haas, C.N., Thayyar-Madabusi, A., Rose, J.B., Gerba, C.P., 1999. Development and validation of dose-response relationship for Listeria monocytogenes. Quantitative Microbiology 1, 89–102.
Habib, I., Sampers, I., Uyttendaele, M., Berkvens, D., De Zutter, L., 2008a. Baseline data from a Belgium-wide survey of Campylobacter species contamination in chicken meat preparations and considerations for a reliable monitoring program. Applied and Environmental Microbiology 74 (17), 5483–5489.
Habib, I., Sampers, I., Uyttendaele, M., Berkvens, D., De Zutter, L., 2008b. Performance characteristics and estimation of measurement uncertainty of three plating procedures for Campylobacter enumeration in chicken meat. Food Microbiology 25 (1), 65–74.
Helsel, D.R., 2005. Nondetects and data analysis: statistics for censored environmental data. Wiley Interscience, USA.
Helsel, D.R., 2006. Fabricating data: how substituting values for nondetects can ruin results, and what can be done about it. Chemosphere 65, 2434–2439.
Iman, R.L., Conover, W.J., 1982. A distribution-free approach to inducing rank correlation among input variables. Communications in Statistical Simulations and Computation 11 (3), 311–334.
ISO, 1998/Amd 1:2004. International Standards Organization 10290-2. Microbiology of food and animal feeding stuffs – horizontal method for the detection and enumeration of Listeria monocytogenes – part 2: Enumeration method.
ISO, 2006. International Standards Organization 10272-1. Microbiology of food and animal feeding stuffs – horizontal method for detection and enumeration of Campylobacter spp. – part 2: Enumeration method.
Jordan, D., 2005. Simulating the sensitivity of pooled-sample herd tests for fecal Salmonella in cattle. Preventive Veterinary Medicine 70, 59–73.
Kilsby, D.C., Pugh, M.E., 1981. The relevance of the distribution of micro-organisms within batches of food to the control of microbiological hazards from foods. Journal of Applied Bacteriology 51, 345–354.
Legan, J.D., Vandeven, M.H., Dahms, S., Cole, M.B., 2001. Determining the concentration of microorganisms controlled by attributes sampling plans. Food Control 12 (3), 137–147.
Lorimer, M.F., Kiermeier, A., 2007. Analysing microbiological data: Tobit or not Tobit? International Journal of Food Microbiology 116, 313–318.
Nauta, M.J., van der Wal, F.J., Putirulan, F.F., Post, J., van de Kassteele, J., Bolder, N.M., 2009. Evaluation of the "testing and scheduling" strategy for control of Campylobacter in broiler meat in The Netherlands. International Journal of Food Microbiology 134, 216–222.
Oscar, T.P., 2004. A quantitative risk assessment model for Salmonella and whole chickens. International Journal of Food Microbiology 93, 231–247.
Pérez-Rodríguez, F., van Asselt, E.D., García-Gimeno, R.M., Zurera, G., Zwietering, M.H., 2007. Extracting additional risk managers information from a risk assessment of Listeria monocytogenes in deli meats. Journal of Food Protection 70 (5), 1137–1152.
Pouillot, R., Miconnet, N., Afchain, A.-L., Delignette-Muller, M.L., Beaufort, A., Rosso, L., Denis, J.-B., Cornu, M., 2007. Quantitative risk assessment of Listeria monocytogenes in French cold-smoked salmon: I. Quantitative exposure assessment. Risk Analysis 27 (3), 683–700.
R Development Core Team, 2009. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. URL http://www.R-project.org
Reinders, R.D., De Jonge, R., Evers, E.G., 2003. A statistical method to determine whether micro-organisms are randomly distributed in a food matrix, applied to coliforms and Escherichia coli O157 in minced beef. Food Microbiology 20, 297–303.
Ridout, M., Demétrio, C.G.B., Hinde, J., 1998. Models for count data with many zeroes. Proceedings of the XIXth International Biometric Conference, pp. 179–192.
Shorten, P.R., Pleasants, A.B., Soboleva, T.K., 2006. Estimation of microbial growth using population measurements subject to a detection limit. International Journal of Food Microbiology 108, 369–375.
Straver, J.M., Janssen, A.F.W., Linnemann, A.R., van Boekel, A.J.S., Beumer, R.R., Zwietering, M.H., 2007. Number of Salmonella on chicken breast filet at retail level and its implications for public health risk. Journal of Food Protection 70 (9), 2045–2055.
Uyttendaele, M., Busschaert, P., Valero, A., Geeraerd, A.H., Vermeulen, A., Jacxsens, L., Goh, K.K., De Loy, A., Van Impe, J.F., Devlieghere, F., 2009. Prevalence and challenge tests of Listeria monocytogenes in Belgian produced and retailed mayonnaise-based deli-salads, cooked meat products and smoked fish between 2005 and 2007. International Journal of Food Microbiology 133, 94–104.
Zhao, Y., Frey, H.C., 2004. Quantification of variability and uncertainty for censored data sets and application to air toxic emission factors. Risk Analysis 24 (4), 1019–1034.
Fig. 7. Case study 2: (a) scatter plot of the fitted means versus standard deviations of the bootstrap samples and (b) plot of the 95% confidence interval of the normal distribution fitted to the logarithmic Listeria monocytogenes contamination data.
of the bootstrap estimates of the distribution parameters, the normal distribution is
chosen to fit both the mean and the standard deviation. The mean is fitted to a normal
distribution with hyperparameters mean μ_μ = −1.58 log10 CFU/g and standard deviation
σ_μ = 0.20 log10 CFU/g. The standard deviation is fitted to a normal distribution with
hyperparameters mean μ_σ = 1.51 log10 CFU/g and standard deviation σ_σ = 0.28 log10
CFU/g. To avoid sampling negative values for the standard deviation, this distribution
is truncated at zero.
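The following is a minimal sketch of how these hyperparameters could feed a second order Monte Carlo simulation: an outer loop draws an uncertain (mean, standard deviation) pair, and an inner loop draws log10 concentrations from the corresponding variability distribution. The function names are hypothetical, rejection sampling is just one simple way to truncate at zero, and drawing the mean and the standard deviation independently is an assumption of this sketch rather than something stated in the text.

```python
import random

# Hyperparameters quoted in the text (log10 CFU/g).
MU_MEAN, MU_SD = -1.58, 0.20        # uncertainty about the mean
SIGMA_MEAN, SIGMA_SD = 1.51, 0.28   # uncertainty about the standard deviation


def sample_sigma(rng: random.Random) -> float:
    """Draw sigma from a normal truncated at zero, by rejection sampling."""
    while True:
        sigma = rng.gauss(SIGMA_MEAN, SIGMA_SD)
        if sigma > 0.0:
            return sigma


def second_order_draws(n_outer: int, n_inner: int, seed: int = 1):
    """Return a list of (mu, sigma, log10_concentrations) tuples.

    Outer loop: uncertainty about the parameters.
    Inner loop: variability of the contamination level given the parameters.
    Assumption: mu and sigma are sampled independently.
    """
    rng = random.Random(seed)
    runs = []
    for _ in range(n_outer):
        mu = rng.gauss(MU_MEAN, MU_SD)
        sigma = sample_sigma(rng)
        runs.append((mu, sigma, [rng.gauss(mu, sigma) for _ in range(n_inner)]))
    return runs


if __name__ == "__main__":
    for mu, sigma, values in second_order_draws(n_outer=3, n_inner=5):
        print(round(mu, 2), round(sigma, 2), [round(v, 2) for v in values])
```

With a truncation point more than five standard deviations below the mean of σ, the rejection step is rarely triggered, but it guarantees that no negative standard deviation reaches the inner loop.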
As can be seen in the scatter plot of the bootstrap means versus the bootstrap standard
deviations (see Fig. 7a), a small number of bootstrap samples in the lower part of the
figure deviate from the majority of the samples. This deviation is caused by the absence,
in the respective bootstrap samples, of a number of rare intervals from the original data
set. These intervals all have concentration values higher than the general mean value, and
their inclusion leads to an increase of the standard deviation. The separation between a
small cloud of points with low standard deviation and a big cloud with the majority of the
points is hence purely a consequence of the complexity of this particular data set, but it
illustrates the limitations of the non-parametric bootstrap method. When a data set has
relatively few distinct values (in this particular case, 10 distinct values are present),
differences between bootstrap samples can be great. This should always be checked for when
applying the non-parametric bootstrap. This problem does not occur when the parametric
bootstrap method is applied; however, applying the parametric bootstrap method to censored
data would incorrectly result in different uncertainty intervals compared to non-parametric
bootstrapping, which could lead to a fail-dangerous underestimation. A number of test
simulations have confirmed this (results not shown). The parametric bootstrap could,
however, be applied by generating bootstrap samples from a parametric distribution and
censoring them manually in the specific case that the complete data set has to be compared
to one LOQ only (Zhao and Frey, 2004).
The resulting distribution with its 95% confidence interval is shown in Fig. 7b.
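The separate small cloud in the bootstrap scatter plot can be made plausible with a short calculation. If k of the n original observations are rare high-concentration intervals, the probability that a resample of size n drawn with replacement contains none of them is (1 − k/n)^n, which is close to exp(−k) for large n. The sketch below evaluates this and checks it by simulation; the data set size and the values of k are hypothetical and are not taken from the paper.

```python
import random


def prob_absent(n: int, k: int) -> float:
    """Probability that none of k rare observations appears in a bootstrap
    resample of size n drawn with replacement from n observations."""
    return (1.0 - k / n) ** n


def simulate_absent(n: int, k: int, n_boot: int = 5000, seed: int = 1) -> float:
    """Monte Carlo check; indices 0..k-1 stand for the rare observations."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(n_boot):
        # A resample "misses" when every drawn index is a non-rare observation.
        if all(rng.randrange(n) >= k for _ in range(n)):
            misses += 1
    return misses / n_boot


if __name__ == "__main__":
    n = 100  # hypothetical data set size, not taken from the paper
    for k in (1, 2, 3, 5):
        print(k, round(prob_absent(n, k), 3), round(simulate_absent(n, k), 3))
```

Even a handful of rare observations is therefore missing from a noticeable fraction of bootstrap samples, and each such sample yields a clearly lower fitted standard deviation.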
5 Discussion for real food product data
The examples presented in this article illustrate how complex data sets, including
nondetects and semi-quantitative and qualitative measurements, can be interpreted in an
appropriate way for use in microbiological risk assessment. Ignoring nondetects, or
substituting them with the LOD/LOQ or half of it, is a classical source of bias (cf.
Lorimer and Kiermeier, 2007) that can and should be avoided using these methods. It has
been demonstrated in this paper that even complex data sets, including either very diverse
analyses or large amounts of censored values, can lead to very satisfying outcomes.
Nevertheless, attention must be paid to the possibilities and limitations of these methods.
Blindly fitting a data set with limited information (for example, a data set consisting
purely of presence/absence tests, as obtained if analyses are performed for compliance
testing against a set legal criterion) to a specific distribution might result in
unrealistic outcomes. Moreover, the limited small-sample properties of the non-parametric
bootstrap method must be taken into account, as is illustrated in the second case study.
This supports the recommendation to set up dedicated baseline surveys for gathering data
to be used for risk assessment.
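To make the contrast between substitution and a censored likelihood concrete, the sketch below fits a normal distribution to log10 concentrations in which every result is stored as an interval. It is an illustration under stated assumptions and not the authors' implementation: the data are simulated, the LOQ is hypothetical, the function names are invented, and a crude grid refinement stands in for a proper numerical optimizer.

```python
import math
import random
from statistics import NormalDist


def log_likelihood(obs, mu, sigma):
    """Log-likelihood of normal(mu, sigma) for (lower, upper) pairs in log10 CFU/g.

    (x, x)     exact enumeration result
    (None, u)  left-censored, e.g. a nondetect below a limit u
    (l, None)  right-censored
    (l, u)     interval-censored, e.g. presence in 25 g and absence in 1 g
    """
    dist = NormalDist(mu, sigma)
    total = 0.0
    for lower, upper in obs:
        if lower is not None and lower == upper:
            p = dist.pdf(lower)
        else:
            hi = 1.0 if upper is None else dist.cdf(upper)
            lo = 0.0 if lower is None else dist.cdf(lower)
            p = hi - lo
        total += math.log(max(p, 1e-300))  # guard against log(0)
    return total


def fit_normal(obs, mu_range=(-6.0, 6.0), sigma_range=(0.05, 5.0), steps=21, rounds=5):
    """Approximate maximum likelihood fit by repeatedly refining a grid."""
    (m_lo, m_hi), (s_lo, s_hi) = mu_range, sigma_range
    mu_hat = sigma_hat = None
    for _ in range(rounds):
        dm = (m_hi - m_lo) / (steps - 1)
        ds = (s_hi - s_lo) / (steps - 1)
        grid = [(m_lo + i * dm, s_lo + j * ds) for i in range(steps) for j in range(steps)]
        mu_hat, sigma_hat = max(grid, key=lambda c: log_likelihood(obs, c[0], c[1]))
        # Zoom in on a window of two grid steps around the current best point.
        m_lo, m_hi = mu_hat - 2 * dm, mu_hat + 2 * dm
        s_lo, s_hi = max(1e-3, sigma_hat - 2 * ds), sigma_hat + 2 * ds
    return mu_hat, sigma_hat


def demo(n=400, loq=-0.5, seed=7):
    """Simulate N(0, 1) log10 counts, censor below a hypothetical LOQ, compare."""
    rng = random.Random(seed)
    values = [rng.gauss(0.0, 1.0) for _ in range(n)]
    obs = [(v, v) if v >= loq else (None, loq) for v in values]
    mu_hat, sigma_hat = fit_normal(obs)
    substituted_mean = sum(max(v, loq) for v in values) / n  # nondetects set to LOQ
    return mu_hat, sigma_hat, substituted_mean


if __name__ == "__main__":
    print(demo())
```

In this simulated setting the true mean is 0, so replacing each nondetect by the LOQ pushes the sample mean upward, whereas the censored likelihood only uses the information that the value lies below the LOQ.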
The various case studies (with hypothetical data sets as well as with real-life
microbiological data sets from dedicated or ad hoc combined surveys) show that, when
establishing microbiological baseline surveys, it is important to apply some
semi-quantitative methodology and, by preference, a methodology enabling an estimation of
the numbers present in the positive laboratory samples. For example, Straver et al. (2007)
estimated contamination of chicken breast filet with Salmonella using a combination of
prior enrichment of pooled laboratory samples and subsequent enumeration of Salmonella in
positive laboratory samples using a Most Probable Number (MPN) assay. The use of MPN
methods or enumeration methods (as in the Campylobacter case study) with a reduced limit of
quantification, which overall provide an estimate of the number of pathogens present rather
than an exact value, is not a crucial factor for the estimation of distributions of the
contamination level. In the present Campylobacter case study, it was shown that inclusion
of the measurement error interval for quantitative analyses hardly affects the estimated
distributions.
The proportion of nondetects, on the other hand, may have a significant impact on the
result, as has been shown in the present Campylobacter case study. This illustrates the
positive effect that lowering the limit of quantification of a certain analysis method
might have on lowering the uncertainty of the distribution when the microbiologist is
confronted with a substantial amount of nondetects. It was noticed for the Listeria case
study that the uncertainty of the distribution is especially increased at the lower levels
(<0.01 CFU/g) (Fig. 7b), as only left-censored data are available in this concentration
range. More information on the level of contamination in this range would reduce the
uncertainty. However, the estimated mean level of contamination for Listeria
(μ = −1.58 log10 CFU/g) is much lower than for Campylobacter (μ = 0.73 log10 CFU/g),
which means that obtaining enumeration data at these very low levels of contamination
will take considerable laboratory effort and require adapted methodological procedures,
with the associated costs.
Fig. 5. Illustration 2: (a) plot of the 95% confidence interval of the fitted log-normal distribution for the set of semi-quantitative measurements; (b) scatter plot of the means versus the standard deviations of the bootstrap samples; and (c) comparison of the parameters obtained by bootstrapping and randomly generated parameters using a linear relationship.
pathogens present instead of an exact value is in the frame of the
estimation of distributions of the contamination level not a crucial
factor In the present Campylobacter case study it was shown that
inclusion of the measurement error interval for quantitative analyses
hardly affects the estimated distributions
The proportion ofnondetects onthe otherhandmay have a signi1047297cant
impact on theresult as has been shown in the present Campylobacter case
study This illustrates the positive effect that lowering the limit of
quanti1047297cation of a certain analysis method might have on lowering theuncertainty of thedistributionwhen themicrobiologist is confrontedwith
a substantial amount of nondetects It was noticed for the Listeria case
study thatthe uncertaintyof the distribution is especiallyincreased at the
lower levels (lt001 CFUg) (Fig 7b) as in this concentration range only
left-censored data are available More information on the estimated level
of contamination would enable to decrease uncertainty However overall
the estimated mean level of contamination for Listeria (micro =ndash158 log10CFUg) is much lower than for Campylobacter (micro =073 log10 CFUg)
which means that having access to enumeration data at these very low
levels of contamination will take considerable laboratory effort require
adapted methodological procedures and thus related costs for obtaining
this type of data set
On the other hand it was shown that to obtain a good data set for
estimation of distributions of contamination levels it does not
necessarily demand a large study In the present Campylobacter case
study it was illustrated that increasing the number of analyses to a
large extent might lead to only a limited additional reduction of
uncertainty in the case of an already suf 1047297cient data set with rep-
resentative outcomes The distribution of the Campylobacter contam-
ination level shown in Fig 6d is based upon 122 enumeration results
(obtained from in total 328 laboratory samples analyzed) whereas in
Fig 6a 269 enumeration values were available (obtained from 656
laboratory samples analyzed) for the estimation of the distribution of contamination level
Setting up a baseline survey to acquire a data set to serve as the
basis for estimation of an input distribution for risk assessment thus
has to take into account appropriate methodology to provide a suf-1047297cient number of detects and estimates of numbers but also has
to provide results which are representative for the objective of the
risk assessment eg food product under consideration stage in the
production chain variability between producers seasonality etc in
order not to introduce bias in the distribution obtained As such
setting up a baseline survey is a complex exercise Nevertheless if
the data set is available appropriate techniques also need to be used
to translate the information from the data set into a distribution
Fig. 6. Case study 1: (a) plot of the 95% confidence interval of the normal distribution fitted to the Campylobacter contamination data; (b) influence of an increased LOD on the resulting distribution. The dotted lines represent the original data set; the full lines represent the data set on which an increased LOD has been imposed; (c) influence of inclusion of a measurement error interval on the estimated distribution (full lines) compared to the original data set (dotted lines); (d) influence of the number of data points: comparison of the results of a random subset with N/2 data points (full lines) with the original data set (dotted lines).
P. Busschaert et al. / International Journal of Food Microbiology 138 (2010) 260–269
In the present study, an approach based upon maximum likelihood
estimation was shown to provide good results to represent the variation of
contamination data. Additional information with regard to variability
and uncertainty could be extracted as well using the bootstrap method.
The same methodology can equally be applied with more complex
models such as mixture models or Poisson-like models. Alternative
methods such as Bayesian analysis can also be applied and lead to
similar outcomes (results not shown). Examples of a Bayesian analysis
can be found in Nauta et al. (2009), Clough et al. (2005) and Crépet et al.
(2007).
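The combination of maximum likelihood estimation and bootstrapping summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: a normal distribution is assumed for log10 CFU/g; the enumeration limit (10 CFU/g) and detection limit (1 CFU per 25 g) are example values; detection is treated as a simple threshold, ignoring sampling effects at low counts; and a basic compass search stands in for whatever optimizer was actually used. All function names are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist


def log_likelihood(mu, sigma, data):
    """Log-likelihood of a normal model for (low, high) bounds in log10 CFU/g.

    low == high: enumerated value (density); low is None: left-censored;
    high is None: right-censored; otherwise interval-censored.
    """
    dist = NormalDist(mu, sigma)
    total = 0.0
    for low, high in data:
        if low is not None and low == high:
            p = dist.pdf(low)
        else:
            upper = 1.0 if high is None else dist.cdf(high)
            lower = 0.0 if low is None else dist.cdf(low)
            p = upper - lower
        total += math.log(max(p, 1e-300))  # guard against log(0)
    return total


def fit_normal_censored(data, tol=1e-4, max_iter=10_000):
    """Maximize the likelihood by a simple compass search over (mu, log sigma)."""
    finite = [b for pair in data for b in pair if b is not None]
    mu = statistics.fmean(finite)
    log_s = math.log(statistics.pstdev(finite) or 1.0)
    best = log_likelihood(mu, math.exp(log_s), data)
    step = 1.0
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for d_mu, d_ls in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand = log_likelihood(mu + d_mu, math.exp(log_s + d_ls), data)
            if cand > best:
                mu, log_s, best = mu + d_mu, log_s + d_ls, cand
                improved = True
        if not improved:
            step /= 2.0  # shrink the search pattern when no direction helps
    return mu, math.exp(log_s)


def simulate(n, mu, sigma, rng):
    """Simulated test results; the two limits below are assumed example values."""
    lod_enum = 1.0                   # log10(10 CFU/g), enumeration limit
    lod_detect = math.log10(1 / 25)  # log10(1 CFU per 25 g), detection limit
    data = []
    for _ in range(n):
        x = rng.gauss(mu, sigma)
        if x >= lod_enum:
            data.append((x, x))                  # quantitative result
        elif x >= lod_detect:
            data.append((lod_detect, lod_enum))  # present in 25 g, below 10 CFU/g
        else:
            data.append((None, lod_detect))      # absent in 25 g
    return data


rng = random.Random(1)
data = simulate(200, 1.0, 1.0, rng)
mu_hat, sigma_hat = fit_normal_censored(data)

# Non-parametric bootstrap: resample the test results and refit each time.
boot = [fit_normal_censored(rng.choices(data, k=len(data))) for _ in range(50)]
mu_q = statistics.quantiles([m for m, _ in boot], n=40)
mu_lo, mu_hi = mu_q[0], mu_q[-1]  # approximate 2.5% and 97.5% percentiles

print(f"mu = {mu_hat:.2f}, sigma = {sigma_hat:.2f}")
print(f"approximate 95% bootstrap interval for mu: [{mu_lo:.2f}, {mu_hi:.2f}]")
```

Each observation contributes either a density term (enumerated value) or a probability mass between its bounds (censored result), so qualitative, semi-quantitative and quantitative results enter a single likelihood. The spread of the bootstrap refits gives an indication of the uncertainty about the fitted parameters.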
Application of these techniques offers a way for meta-analysis of the
many relevant yet diverse data sets that are available in literature and (inter)national reports of surveillance or baseline surveys, and therefore
increases the information input of a risk assessment and, by consequence,
the correctness of the outcome of the risk assessment.
Acknowledgements
This research is supported in part by the Research Council of the
Katholieke Universiteit Leuven (projects OT0925TBA and EF05006
Center-of-Excellence Optimization in Engineering), knowledge platform
KP09005 (SCORES4CHEM) of the Industrial Research Fund, the
Belgian Program on Interuniversity Poles of Attraction, initiated by the
Belgian Federal Science Policy Office, and the Fund for Scientific
Research-Flanders (FWO-Vlaanderen, project G042409 N). J. Van
Impe holds the chair Safety Engineering sponsored by the Belgian
chemistry and life sciences federation essencia. Research is conducted
utilizing high performance computational resources provided by the
University of Leuven, http://ludit.kuleuven.be.
We would like to thank the Ghent University cluster of the
Department of Veterinary Public Health and Food Safety, Faculty
of Veterinary Medicine, and Department of Food Safety and Food
Quality, Faculty of Bio-Science Engineering, who kindly provided the
Campylobacter data derived from a Federal Public Health Service funded
project. The staff of the accredited laboratory section of the Laboratory of Food Microbiology and Food Preservation at the Department Food
Safety and Food Quality, Faculty of Bio-Science Engineering, Ghent
University, is acknowledged for providing the data on the microbiological
analysis and challenge testing for L. monocytogenes.
References
Calistri, P., Giovannini, A., 2008. Quantitative risk assessment of human campylobacteriosis related to the consumption of chicken meat in two Italian regions. International Journal of Food Microbiology 128, 274–287.
Clough, H.E., Clancy, D., O'Neill, P.D., Robinson, S.E., French, N.P., 2005. Quantifying uncertainty associated with microbial count data: a Bayesian approach. Biometrics 61, 610–616.
Cox, D.R., Oakes, D., 1984. Analysis of Survival Data. Monographs on Statistics and Applied Probability. Chapman and Hall.
Crépet, A., Albert, I., Dervin, C., Carlin, F., 2007. Estimation of microbial contamination of food from prevalence and concentration data: application to Listeria monocytogenes in fresh vegetables. Applied and Environmental Microbiology 73 (1), 250–258.
Delignette-Muller, M.L., Pouillot, R., Denis, J.-B., 2008. fitdistrplus: Help to fit of a parametric distribution to non-censored or censored data. R package version 0.1-0. URL http://riskassessment.r-forge.r-project.org.
Efron, B., 1982. The jackknife, the bootstrap and other resampling plans. CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 38.
FAO/WHO, 2004. Risk assessment of Listeria monocytogenes in ready-to-eat foods. Accessed at June 5, 2009. URL http://www.who.int/foodsafety/publications/micro/mra_listeria/en/index.html.
FDA/USDA/CDC, 2003. Quantitative assessment of relative risk to public health from foodborne Listeria monocytogenes among selected categories of ready-to-eat foods. Accessed at June 5, 2009. URL http://www.foodsafety.gov/~dms/lmr2-toc.html.
Gnanou Besse, N., Audinet, N., Beaufort, A., Colin, P., Cornu, M., Lombard, B., 2004. A contribution to the improvement of Listeria monocytogenes enumeration in cold-smoked salmon. International Journal of Food Microbiology 91 (2), 119–127.
Gonzales-Barron, U., Kerr, M., Sheridan, J.J., Butler, F., 2010. Count data distributions and their zero-modified equivalents as a framework for modelling microbial data with a relatively high occurrence of zero counts. International Journal of Food Microbiology 136 (3), 268–277.
Haas, C.N., Thayyar-Madabusi, A., Rose, J.B., Gerba, C.P., 1999. Development and validation of dose-response relationship for Listeria monocytogenes. Quantitative Microbiology 1, 89–102.
Habib, I., Sampers, I., Uyttendaele, M., Berkvens, D., De Zutter, L., 2008a. Baseline data from a Belgium-wide survey of Campylobacter species contamination in chicken meat preparations and considerations for a reliable monitoring program. Applied and Environmental Microbiology 74 (17), 5483–5489.
Habib, I., Sampers, I., Uyttendaele, M., Berkvens, D., De Zutter, L., 2008b. Performance characteristics and estimation of measurement uncertainty of three plating procedures for Campylobacter enumeration in chicken meat. Food Microbiology 25 (1), 65–74.
Helsel, D.R., 2005. Nondetects and data analysis: statistics for censored environmental data. Wiley Interscience, USA.
Helsel, D.R., 2006. Fabricating data: how substituting values for nondetects can ruin results, and what can be done about it. Chemosphere 65, 2434–2439.
Iman, R.L., Conover, W.J., 1982. A distribution-free approach to inducing rank correlation among input variables. Communications in Statistical Simulations and Computation 11 (3), 311–334.
ISO, 1998/Amd 1:2004. International Standards Organization 10290-2. Microbiology of food and animal feeding stuffs – horizontal method for the detection and enumeration of Listeria monocytogenes – part 2: Enumeration method.
ISO, 2006. International Standards Organization 10272-1. Microbiology of food and animal feeding stuffs – horizontal method for detection and enumeration of Campylobacter spp. – part 2: Enumeration method.
Jordan, D., 2005. Simulating the sensitivity of pooled-sample herd tests for fecal Salmonella in cattle. Preventive Veterinary Medicine 70, 59–73.
Kilsby, D.C., Pugh, M.E., 1981. The relevance of the distribution of micro-organisms within batches of food to the control of microbiological hazards from foods. Journal of Applied Bacteriology 51, 345–354.
Legan, J.D., Vandeven, M.H., Dahms, S., Cole, M.B., 2001. Determining the concentration of microorganisms controlled by attributes sampling plans. Food Control 12 (3), 137–147.
Lorimer, M.F., Kiermeier, A., 2007. Analysing microbiological data: Tobit or not Tobit? International Journal of Food Microbiology 116, 313–318.
Nauta, M.J., van der Wal, F.J., Putirulan, F.F., Post, J., van de Kassteele, J., Bolder, N.M., 2009. Evaluation of the "testing and scheduling" strategy for control of Campylobacter in broiler meat in The Netherlands. International Journal of Food Microbiology 134, 216–222.
Oscar, T.P., 2004. A quantitative risk assessment model for Salmonella and whole chickens. International Journal of Food Microbiology 93, 231–247.
Pérez-Rodríguez, F., van Asselt, E.D., García-Gimeno, R.M., Zurera, G., Zwietering, M.H., 2007. Extracting additional risk managers information from a risk assessment of Listeria monocytogenes in deli meats. Journal of Food Protection 70 (5), 1137–1152.
Pouillot, R., Miconnet, N., Afchain, A.-L., Delignette-Muller, M.L., Beaufort, A., Rosso, L., Denis, J.-B., Cornu, M., 2007. Quantitative risk assessment of Listeria monocytogenes in French cold-smoked salmon: I. quantitative exposure assessment. Risk Analysis 27 (3), 683–700.
R Development Core Team, 2009. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. URL http://www.R-project.org.
Reinders, R.D., De Jonge, R., Evers, E.G., 2003. A statistical method to determine whether micro-organisms are randomly distributed in a food matrix, applied to coliforms and Escherichia coli O157 in minced beef. Food Microbiology 20, 297–303.
Ridout, M., Demétrio, C.G.B., Hinde, J., 1998. Models for count data with many zeroes. Proceedings of the XIXth International Biometric Conference, pp. 179–192.
Shorten, P.R., Pleasants, A.B., Soboleva, T.K., 2006. Estimation of microbial growth using population measurements subject to a detection limit. International Journal of Food Microbiology 108, 369–375.
Straver, J.M., Janssen, A.F.W., Linnemann, A.R., van Boekel, A.J.S., Beumer, R.R., Zwietering, M.H., 2007. Number of Salmonella on chicken breast filet at retail level and its implications for public health risk. Journal of Food Protection 70 (9), 2045–2055.
Uyttendaele, M., Busschaert, P., Valero, A., Geeraerd, A.H., Vermeulen, A., Jacxsens, L., Goh, K.K., De Loy, A., Van Impe, J.F., Devlieghere, F., 2009. Prevalence and challenge tests of Listeria monocytogenes in Belgian produced and retailed mayonnaise-based deli-salads, cooked meat products and smoked fish between 2005 and 2007. International Journal of Food Microbiology 133, 94–104.
Zhao, Y., Frey, H.C., 2004. Quantification of variability and uncertainty for censored data sets and application to air toxic emission factors. Risk Analysis 24 (4), 1019–1034.
Fig. 7. Case study 2: (a) scatter plot of the fitted means versus standard deviations of the bootstrap samples and (b) plot of the 95% confidence interval of the normal distribution fitted to the logarithmic Listeria monocytogenes contamination data.