
Estimating distributions out of qualitative and (semi)quantitative microbiological contamination data for use in risk assessment

P. Busschaert a,b, A.H. Geeraerd a,c, M. Uyttendaele d, J.F. Van Impe a,b

a CPMF2, Flemish Cluster Predictive Microbiology in Foods
b Division of Chemical and Biochemical Process Technology and Control, Department of Chemical Engineering, Katholieke Universiteit Leuven, W. de Croylaan 46, B-3001 Leuven, Belgium
c Division of Mechatronics, Biostatistics and Sensors (MeBioS), Department of Biosystems (BIOSYST), Katholieke Universiteit Leuven, W. de Croylaan 42, B-3001 Leuven, Belgium
d Laboratory of Food Microbiology and Food Preservation, Department of Food Safety and Food Quality, Ghent University, Coupure Links 653, B-9000 Ghent, Belgium

Article info

Article history

Received 10 July 2009

Received in revised form 7 January 2010

Accepted 17 January 2010

Keywords

Quantitative microbiological risk assessment

Second order Monte Carlo simulation

Survival analysis

Maximum likelihood estimation (MLE)

Bootstrapping method

Abstract

A framework using maximum likelihood estimation (MLE) is used to fit a probability distribution to a set of qualitative (e.g., absence in 25 g), semi-quantitative (e.g., presence in 25 g and absence in 1 g) and/or quantitative test results (e.g., 10 CFU/g). Uncertainty about the parameters of the variability distribution is characterized through a non-parametric bootstrapping method. The resulting distribution function can be used as an input for a second order Monte Carlo simulation in quantitative risk assessment. As an illustration, the method is applied to two sets of in silico generated data. It is demonstrated that correct interpretation of data results in an accurate representation of the contamination level distribution. Subsequently, two case studies are analyzed, namely (i) quantitative analyses of Campylobacter spp. in food samples with nondetects, and (ii) combined quantitative, qualitative and semiquantitative analyses and nondetects of Listeria monocytogenes in smoked fish samples. The first of these case studies is also used to illustrate the influence of the limit of quantification, measurement error, and the number of samples included in the data set. Application of these techniques offers a way for meta-analysis of the many relevant yet diverse data sets that are available in literature and (inter)national reports of surveillance or baseline surveys, and therefore increases the information input of a risk assessment and, by consequence, the correctness of its outcome.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

As was stated by Oscar (2004), predictions of quantitative risk assessment are only as good as the data used to develop and define them. Microbiological contamination levels often are strongly associated with the predicted risk (e.g., Pérez-Rodríguez et al., 2007). However, since the collection of data generally is expensive and labor intensive, only a limited amount of data is available in most cases. Moreover, the character of this kind of data is frequently affected by other difficulties, such as the low number of bacteria present in a food sample compared to the limit of detection (LOD) or limit of quantification (LOQ) of the method of analysis. Due to the additional laboratory effort necessary to enumerate samples with generally low numbers of bacteria present, the typically low prevalence of positive samples, and regulatory requirements that do not demand enumerative data, laboratories often limit their efforts to qualitative or semiquantitative analyses. Because microbiological data can consist of either qualitative (e.g., absence in 25 g), quantitative (e.g., 10 CFU/g) or semiquantitative results (e.g., presence in 25 g and absence in 1 g), or any combinations of these, they can be hard to summarize, especially in the case where multiple combinations of test portion weights are used among different laboratory samples. The latter is often the case in risk assessment, which uses as an input a compilation of surveillance and research data from various sources spread over multiple years (e.g., FAO/WHO, 2004; FDA/USDA/CDC, 2003). In FDA/USDA/CDC (2003), for example, positive presence/absence tests were substituted by the limit of detection, and the frequency of positive presence/absence tests together with this substitution value was used as a quantile of the contamination distribution. The variance, on the other hand, was determined by enumeration studies, so that (based on a quantile from presence/absence tests substituted by the LOD and a variance based on enumeration studies) a concentration distribution could be estimated.

A maximum likelihood approach can be applied to deal with these kinds of complex data sets. Helsel (2005, 2006) previously implemented the maximum likelihood method to deal with measurement values below a certain LOQ in environmental chemistry analyses. Also Shorten et al. (2006) and Lorimer and Kiermeier (2007) suggested this method to deal with nondetects in microbiological test results, as opposed to biased approaches such as substitution of nondetects by arbitrary values, e.g., half of the LOQ. Application of this method can be found in, e.g., Jordan (2005) and Pouillot et al. (2007), where laboratory samples were counted that were previously shown to be positive using a qualitative measurement method with higher sensitivity.

International Journal of Food Microbiology 138 (2010) 260–269
Corresponding author: Division of Chemical and Biochemical Process Technology and Control, Department of Chemical Engineering, Katholieke Universiteit, W. de Croylaan 46, B-3001 Leuven, Belgium.
E-mail address: jan.vanimpe@cit.kuleuven.be (J.F. Van Impe).
URL: http://www.cpmf2.be
0168-1605/$ – see front matter © 2010 Elsevier B.V. All rights reserved.
doi:10.1016/j.ijfoodmicro.2010.01.025

While these authors focussed primarily on applying maximum likelihood estimation (MLE) to deal with quantitative data that are censored at one side due to an LOQ, the same techniques could be generalized to combine quantitative, semi-quantitative and qualitative test results. The first objective of this research is therefore to illustrate how MLE can be applied to represent complex data sets of censored microbiological data with parametric distributions.

Moreover, although MLE allows the variation of contamination data to be represented by means of a parametric distribution, additional information with regard to variability and uncertainty can be extracted from the available data as well. This is the second objective of this research. Because uncertainty can be reduced by collection of additional information, while variability is inherent to any biological system, it is useful to know what proportion of the variation in general is caused by variability and what proportion is caused by uncertainty. In this case, uncertainty is represented by additional statistical distributions, defined by hyperparameters, that describe the parameters of the variability distribution of contamination. Once these distributions have been constructed, the separation between variability and uncertainty can be propagated in the course of the risk assessment by the use of a second order Monte Carlo simulation.
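The separation of variability and uncertainty in a second order Monte Carlo simulation can be pictured as two nested loops. The following is only a minimal sketch, not the authors' R implementation: the function name `second_order_mc`, the hyperparameter values and the threshold are hypothetical, and the mean and standard deviation are drawn independently here purely for simplicity.

```python
import random


def second_order_mc(n_outer, n_inner, mu_mean, mu_sd, sd_shape, sd_scale,
                    threshold, seed=1):
    """For each uncertainty draw, return the fraction of simulated samples
    whose log10 concentration exceeds `threshold` (sorted ascending)."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_outer):                          # outer loop: uncertainty
        mu = rng.gauss(mu_mean, mu_sd)                # uncertain mean
        sigma = rng.gammavariate(sd_shape, sd_scale)  # uncertain sd (always > 0)
        exceed = 0
        for _ in range(n_inner):                      # inner loop: variability
            if rng.gauss(mu, sigma) > threshold:
                exceed += 1
        fractions.append(exceed / n_inner)
    return sorted(fractions)


# Hypothetical hyperparameters, for illustration only.
fractions = second_order_mc(200, 500, mu_mean=0.0, mu_sd=0.2,
                            sd_shape=100.0, sd_scale=0.02, threshold=2.0)
low = fractions[int(0.025 * len(fractions))]
median = fractions[len(fractions) // 2]
high = fractions[int(0.975 * len(fractions)) - 1]
print(low, median, high)
```

The spread between `low` and `high` reflects uncertainty about the exceedance fraction, whereas each individual fraction summarizes variability among samples for one fixed parameter draw.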

In this research, the application of these techniques to microbiological contamination data is explored. Also, the bias that originates when alternative procedures are applied is examined. The methods are illustrated with in silico (i.e., computer simulated) data in order to investigate the performance of these techniques. Subsequently, two case studies based on laboratory measurements are explored, namely a data set of 656 quantitative measurements of Campylobacter in chicken meat preparations (of which 59% are below the limit of quantification) and a data set of 103 measurements of Listeria monocytogenes in smoked fish products, consisting of quantitative, semiquantitative and qualitative measurements as well as nondetects.

2. Material and methods

2.1. Maximum likelihood estimation and bootstrap

In case of a negative presence/absence test, the concentration of the pathogen tested for in the food sample is known to be less than the limit of detection (LOD) of the analysis, although no exact value is known. Also, when an enumeration method is applied on a food sample and no colonies are detected, the concentration is known to be less than the limit of quantification (LOQ). These values are said to be left-censored. Analogously, a positive presence/absence test results in a right-censored outcome.

Maximum likelihood estimation is used to fit a distribution to a set of censored data (Cox and Oakes, 1984; Helsel, 2005). A parametric candidate distribution is assumed to represent the observed data, and the MLE method estimates values for the parameters that are most likely to have generated the observed measurements. Parametric distributions are chosen because they are assumed to correctly represent the data sets considered here; however, in the case of more complex data sets, the risk assessor might also consider alternative, more complex models. Zero-inflated Poisson models or other variants might be relevant in the case of count data with many nondetects (e.g., Ridout et al., 1998; Gonzales-Barron et al., 2010). In case of more dispersed data, mixture models could be fitted as well; see, for example, Crépet et al. (2007). However, it should be taken into consideration that a mixture model should not be used to model the heterogeneity of the specific data set (for example, when food samples were taken from 2 food business operators, each with a different contamination profile) when it is intended to be representative for a more general situation (for example, representing food samples of all food business operators). Other studies have implemented hurdle models, with separate distributions for prevalence (presence/absence) and concentration (CFU/g) of pathogens in food samples (see, e.g., Pouillot et al., 2007; Pérez-Rodríguez et al., 2007). In this research, concentration (CFU/g) is modeled by a parametric distribution as a hypothetical property of food samples; for example, a Poisson distribution with the concentration times serving size as a rate parameter could be implemented afterwards to model the contamination in a specific serving as a non-negative integer.

In the case of quantitative results, the likelihood of obtaining these results, given a set of parameters θ, is obtained by multiplication of all values obtained by the probability density function p(·) corresponding to each data point x_i, given those parameters:

$$\ell(X = \{x_1, \dots, x_n\} \mid \theta) = \prod_{i=1}^{n} p(x_i \mid \theta) \qquad (1)$$

In the case of censored data, the likelihood is given by multiplication of areas under the probability density function instead of single values; hence the cumulative distribution function is used (Cox and Oakes, 1984), as is depicted in Fig. 1. For computational convenience, logarithmic values of the likelihood are considered.
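The censored log-likelihood described above can be sketched in a few lines. This is a hedged illustration rather than the authors' code (which used the R survival package): the `(lower, upper)` data encoding, the function names and the simple coordinate search used as optimizer are all assumptions made for this example, and the test data are deterministic normal quantiles rather than measurements.

```python
import math
from statistics import NormalDist


def norm_cdf(x, mu, sigma):
    """Normal CDF; also valid for x = -inf or +inf (erf saturates at -1/+1)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def norm_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def log_likelihood(mu, sigma, data):
    """data: list of (lower, upper) bounds in log10 CFU/g.
    lower == upper: exact value; lower = -inf: left-censored;
    upper = +inf: right-censored; otherwise interval-censored."""
    total = 0.0
    for lo, hi in data:
        if lo == hi:
            p = norm_pdf(lo, mu, sigma)                          # density
        else:
            p = norm_cdf(hi, mu, sigma) - norm_cdf(lo, mu, sigma)  # area
        total += math.log(max(p, 1e-300))  # floor avoids log(0)
    return total


def fit_normal(data, mu=0.0, sigma=1.0, step=1.0, tol=1e-6):
    """Maximise the log-likelihood with a simple coordinate search."""
    best = log_likelihood(mu, sigma, data)
    while step > tol:
        improved = False
        for d_mu, d_sigma in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand_mu, cand_sigma = mu + d_mu, sigma + d_sigma
            if cand_sigma <= 0.0:
                continue
            value = log_likelihood(cand_mu, cand_sigma, data)
            if value > best:
                mu, sigma, best, improved = cand_mu, cand_sigma, value, True
        if not improved:
            step /= 2.0
    return mu, sigma


# Deterministic stand-in data: 100 quantiles of N(0, 2), left-censored at
# the 40th percentile (an assumption for this sketch, not the paper's data).
true_dist = NormalDist(0.0, 2.0)
loq = true_dist.inv_cdf(0.40)
values = [true_dist.inv_cdf((i + 0.5) / 100) for i in range(100)]
data = [(x, x) if x >= loq else (-math.inf, loq) for x in values]
mu_hat, sigma_hat = fit_normal(data)
print(mu_hat, sigma_hat)
```

A general-purpose optimizer would normally replace the coordinate search; it is used here only to keep the sketch free of external dependencies.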

The normal distribution, applied to log10-transformed concentrations, has been chosen for the case studies because it is generally assumed in food microbiology that contamination is approximately log-normally distributed (Crépet et al., 2007; Kilsby and Pugh, 1981; Legan et al., 2001). Ridout et al. (1998) also proposed the Poisson distribution and a Poisson distribution with a Gamma distributed rate parameter λ; however, it was shown that the lognormal distribution fitted equally well to the data.

Because these parameters are based on inference from measurements of food samples, which represent only a subset of the whole population, uncertainty about these parameters due to the limited data set is considered by using the bootstrapping technique (Efron, 1982). In the parametric bootstrap method, a distribution is fitted to a data set. Based on this distribution, B new samples (of the same sample size as the original data set) are sampled from this single distribution. For each of the B new samples, the parameter of interest is estimated, and the distribution of all B estimated parameters represents uncertainty about the estimate. The empirical bootstrap method is based on a similar procedure, but instead of drawing B samples from one fitted distribution, the original data set is resampled with replacement B times, and a distribution is fitted to each of the B new samples.

To include censored data points, the empirical bootstrap approach is chosen (as was done, for example, by Zhao and Frey, 2004). For a number of B = 1000 iterations, a bootstrap sample (i.e., sampling with replacement) is taken from the original censored data set of measurements. Parameters for that bootstrap sample are determined using MLE, and quantiles of the obtained distribution are stored in order to express a confidence interval later on.

Fig. 1. Illustration of the likelihood of quantitative data (left) versus interval-censored data (right).
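The empirical bootstrap of a censored data set can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: observations are encoded as `(lower, upper)` bounds in log10 CFU/g, the normal model is fitted with a simple coordinate search, the stand-in data are deterministic normal quantiles, and only 100 bootstrap iterations are run (instead of B = 1000) to keep the example fast.

```python
import math
import random
from statistics import NormalDist


def log_likelihood(mu, sigma, data):
    """Normal log-likelihood for exact (lo == hi) and censored observations."""
    root2 = sigma * math.sqrt(2.0)
    total = 0.0
    for lo, hi in data:
        if lo == hi:   # exact value: probability density
            z = (lo - mu) / sigma
            p = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
        else:          # censored: probability mass between the bounds
            p = 0.5 * (math.erf((hi - mu) / root2) - math.erf((lo - mu) / root2))
        total += math.log(max(p, 1e-300))
    return total


def fit_normal(data, mu, sigma, step=0.5, tol=1e-4):
    """Coordinate-search MLE, started from the supplied (mu, sigma)."""
    best = log_likelihood(mu, sigma, data)
    while step > tol:
        improved = False
        for d_mu, d_sigma in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand_mu, cand_sigma = mu + d_mu, sigma + d_sigma
            if cand_sigma <= 0.0:
                continue
            value = log_likelihood(cand_mu, cand_sigma, data)
            if value > best:
                mu, sigma, best, improved = cand_mu, cand_sigma, value, True
        if not improved:
            step /= 2.0
    return mu, sigma


def empirical_bootstrap(data, n_boot, seed=1):
    """Resample the censored observations with replacement and refit each time."""
    rng = random.Random(seed)
    mu_hat, sigma_hat = fit_normal(data, 0.0, 1.0)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=len(data))   # censoring status travels along
        estimates.append(fit_normal(resample, mu_hat, sigma_hat))
    return (mu_hat, sigma_hat), estimates


# Stand-in data: 60 quantiles of N(0, 2), left-censored at the 40th percentile.
dist = NormalDist(0.0, 2.0)
loq = dist.inv_cdf(0.40)
data = [(x, x) if x >= loq else (-math.inf, loq)
        for x in (dist.inv_cdf((i + 0.5) / 60) for i in range(60))]

(mu_hat, sigma_hat), estimates = empirical_bootstrap(data, n_boot=100)
mus = sorted(m for m, _ in estimates)
ci_mu = (mus[2], mus[97])   # approximate 2.5 and 97.5 percentiles of the mean
print(mu_hat, sigma_hat, ci_mu)
```

Because whole `(lower, upper)` pairs are resampled, the bootstrap samples keep the same mix of exact and censored results as the original data on average, which is why the empirical variant suits censored data.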

The joint probability distribution obtained by the bootstrap samples is used to express uncertainty about the parameters due to the limited number of samples. Whenever the original data set is available to the risk assessor, it is possible for the risk assessor to generate bootstrap samples, so that obtained values can be used directly as input values of a two-dimensional Monte Carlo simulation. However, this is not always the case, for example in the case of large data sets. Therefore, to enable communication of parameter uncertainty, a parametric distribution that fits best to the bootstrap estimates is chosen for each parameter, based on visual comparison with the kernel density plots, quantile-quantile plots, and the χ² and Anderson-Darling goodness-of-fit tests. Distributions that could generate implausible samples (such as negative values for a standard deviation) should be avoided or truncated. In order to examine the correlation between parameters among bootstrap samples, scatter plots, the Pearson correlation coefficient and the Spearman rank order correlation coefficient are evaluated.

To be able to visually represent the combined variability and uncertainty, the 95% confidence intervals of all individual quantiles of the variability distribution are expressed. Among different bootstrap iterations, the value of each quantile of the variability distribution will vary. For all quantiles of the variability distribution, an uncertainty distribution is obtained, of which the 2.5, 50 and 97.5 percentiles (over all bootstrap iterations) are calculated. For all cumulative density plots that will follow, the reading key is as follows. The median value of all quantiles of the variability distribution is indicated with a black line; the interval between the 2.5 and 97.5 percentile is indicated with a grey area. A large grey area in horizontal direction indicates large uncertainty about the value of the respective quantiles of the variability distribution, i.e., the values of the quantiles differ considerably among different bootstrap iterations. A small grey area indicates that the value of each quantile does not alter much in between different bootstrap iterations; hence, uncertainty is small. The degree of variability is indicated by the interval the median line spans from the lower quantiles to the higher quantiles.
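The construction of the grey band and the median line can be sketched as below. This is a hedged example: the function names are hypothetical, and the `(mu, sigma)` pairs are not real bootstrap output but stand-ins drawn independently from the normal and gamma uncertainty distributions reported for Illustration 1 in Section 3.1.

```python
import random
from statistics import NormalDist


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    return sorted_values[lower] + (pos - lower) * (
        sorted_values[upper] - sorted_values[lower])


def quantile_bands(params, probs):
    """For each cumulative probability p, evaluate the p-quantile of every
    fitted normal distribution and summarise it over all parameter pairs."""
    standard = NormalDist()
    bands = []
    for p in probs:
        z_p = standard.inv_cdf(p)
        values = sorted(mu + sigma * z_p for mu, sigma in params)
        bands.append((p, percentile(values, 2.5),
                      percentile(values, 50.0), percentile(values, 97.5)))
    return bands


# Stand-in for 1000 bootstrap estimates (assumed, see lead-in).
rng = random.Random(1)
params = [(rng.gauss(-0.05, 0.21), rng.gammavariate(99.4, 0.0187))
          for _ in range(1000)]
bands = quantile_bands(params, [0.05, 0.25, 0.50, 0.75, 0.95])
for p, lo, med, hi in bands:
    print(f"p={p:.2f}  2.5%={lo:.2f}  median={med:.2f}  97.5%={hi:.2f}")
```

Plotting the median column against p gives the black line, and the 2.5% and 97.5% columns bound the grey area read in the horizontal direction.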

The code for these simulations is written in R (R Development Core Team, 2009), using functions from the survival package for MLE procedures, and can be obtained from the authors upon request. Although not used in the present study, at the time of writing a package fitdistrplus appeared, which can be used for the same purpose (Delignette-Muller et al., 2008).

2.2. Data sets

For illustrative purposes, a set of left-censored data is pseudorandomly generated in silico in order to simulate quantitative measurements with a single LOQ below which no values can be measured. To begin with, one random sample of 100 data points is pseudorandomly generated from a normal distribution with mean μ = 0 log10 CFU/g and standard deviation σ = 2 log10 CFU/g. An LOQ is chosen at the 40th percentile of the normal distribution, and the data set is censored, so that approximately 40% of the data set will fall below that LOQ and hence will be regarded as nondetects. For this specific data set, 40 out of 100 data points are regarded as below the LOQ after censoring.
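The data generation step can be reproduced in outline as follows. The seed and variable names are arbitrary choices for this sketch, so the number of nondetects will generally differ from the 40 obtained for the authors' specific sample.

```python
import math
import random
from statistics import NormalDist

random.seed(2010)  # arbitrary seed, chosen only for reproducibility

true_dist = NormalDist(mu=0.0, sigma=2.0)   # log10 CFU/g
loq = true_dist.inv_cdf(0.40)               # 40th percentile, about -0.507

raw = [random.gauss(0.0, 2.0) for _ in range(100)]

# Values below the LOQ lose their exact value and become (-inf, LOQ).
censored = [(x, x) if x >= loq else (-math.inf, loq) for x in raw]
n_nondetects = sum(1 for lo, hi in censored if lo != hi)
print(round(loq, 3), n_nondetects)
```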

Instead of determining the number of CFU of a pathogen per gram of a food sample, a portion of 25 g, for example, could be examined for the mere presence of a micro-organism. A negative test result indicates a concentration of less than (a hypothetical) 0.04 cells per gram of that food sample. If the test result is positive, a smaller portion of that same laboratory sample, stored at a temperature not allowing for growth, e.g., a portion of 1 g, could be examined again for presence/absence. If a homogeneous distribution of cells among test portions is assumed, semi-quantitative results are obtained this way. A positive 25 g portion and a negative 1 g portion would indicate, for example, a bacterial concentration of between 0.04 and 1 CFU/g. This outcome is said to be interval-censored.
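The translation of presence/absence results into censoring bounds can be written down directly. The helper below is a hypothetical sketch: it assumes, as the text does, that a single cell in the test portion yields a positive result and that cells are homogeneously distributed, so a portion of w grams corresponds to a threshold of 1/w CFU/g.

```python
import math


def concentration_interval(tests):
    """tests: iterable of (portion_g, positive) presence/absence results for
    one laboratory sample. Returns (lower, upper) bounds in log10 CFU/g."""
    lower, upper = 0.0, math.inf            # bounds in CFU/g
    for portion_g, positive in tests:
        limit = 1.0 / portion_g             # e.g. 25 g -> 0.04 CFU/g
        if positive:
            lower = max(lower, limit)       # at least one cell in the portion
        else:
            upper = min(upper, limit)       # no cell in the portion
    if lower >= upper:
        raise ValueError("contradictory test results")
    log_lower = math.log10(lower) if lower > 0.0 else -math.inf
    log_upper = math.log10(upper) if upper < math.inf else math.inf
    return log_lower, log_upper


print(concentration_interval([(25, False)]))              # left-censored
print(concentration_interval([(25, True), (1, False)]))   # interval-censored
print(concentration_interval([(25, True)]))               # right-censored
```

The returned pairs use the same `(lower, upper)` convention as a censored likelihood, so qualitative and semi-quantitative results can be pooled with enumeration data.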

A similar situation is simulated in the second illustration. The same original (uncensored) data set of 100 pseudorandom values, generated from a normal distribution with mean μ = 0 log10 CFU/g and standard deviation σ = 2 log10 CFU/g, is used for this second illustration. Two detection limits are chosen so that the first one (LOD1) is situated at the 60th percentile of the distribution and the second one (LOD2) at the 80th percentile. Subsequently, the data are transformed into purely semiquantitative data, i.e., each data point is reduced to either smaller than the first LOD, between the first and second LOD, or greater than the second LOD.

As a first case study based on real microbiological measurements, laboratory analyses of Campylobacter in chicken meat preparations at the Belgian retail market are analyzed (Habib et al., 2008a). The data set consists of direct plating results using the ISO (2006) standard method with a reduced limit of quantification. By plating one milliliter of the primary 10-fold diluted suspension of the chicken meat sample on three modified cefoperazone charcoal deoxycholate agar (mCCDA; Oxoid, Basingstoke, England, UK) spread plates of 90 mm diameter, a limit of quantification of 10 CFU/g is obtained instead of the usual LOQ of 100 CFU/g. In 387 out of 656 measurements (59%), the result is left-censored due to the LOQ.

This first case study is also used to examine the influence of a number of conditions. In order to check the effect of using a reduced LOQ, all values below 100 CFU/g (the standard LOQ of the Campylobacter enumeration method) were assumed to be censored and hence regarded as nondetects. To determine if measurement error (assumed to be ±0.5 log10 units) (Habib et al., 2008b) would have a significant impact on the resulting distributions, another simulation is run with all quantitative data points x_i replaced by the interval [x_i − 0.5, x_i + 0.5] log10 CFU/g; in other words, x_i no longer has its quantitative value but is assumed to be interval-censored. By doing this, it is stated that, due to measurement error, the real value of measurement x_i is known only to be within the interval [x_i − 0.5, x_i + 0.5]. Finally, to illustrate the impact of the size of the data set, the simulation is also conducted with only half the number of data points. For that purpose, 328 data points are pseudorandomly sampled with replacement from the original data set.
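The three modifications of the data set amount to simple transformations of the censoring bounds. The helpers below are hypothetical names introduced for this sketch, operating on `(lower, upper)` pairs in log10 CFU/g; the three-point data set is made up for demonstration.

```python
import math
import random


def raise_loq(data, new_loq):
    """Censor every exact value below new_loq and widen existing nondetects."""
    result = []
    for lo, hi in data:
        if lo == hi:
            result.append((lo, hi) if lo >= new_loq else (-math.inf, new_loq))
        elif hi <= new_loq:
            result.append((-math.inf, new_loq))
        else:
            result.append((lo, hi))
    return result


def add_measurement_error(data, half_width=0.5):
    """Replace each exact value x by the interval [x - hw, x + hw]."""
    return [(lo - half_width, hi + half_width) if lo == hi else (lo, hi)
            for lo, hi in data]


def subsample(data, k, seed=1):
    """Draw k observations with replacement."""
    return random.Random(seed).choices(data, k=k)


# Made-up example: one nondetect (LOQ 10 CFU/g = 1 log10) and two counts.
example = [(-math.inf, 1.0), (1.3, 1.3), (2.5, 2.5)]
print(raise_loq(example, 2.0))            # LOQ raised to 100 CFU/g
print(add_measurement_error(example))     # +/- 0.5 log10 units
print(subsample(example, 2))
```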

A set of 103 laboratory samples of smoked fish on the Belgian retail market is used as a second case study (Table 1). The laboratory samples were analyzed in the period 2005–2007 for a number of food business operators (Uyttendaele et al., 2009). In most cases, a test portion of 25 g is analyzed qualitatively for the presence of L. monocytogenes according to the AFNOR validated VIDAS LMO method (Bio-129-07 02). In case of a positive test result, either a smaller test portion of the same laboratory sample is analyzed (e.g., presence/absence testing per 0.01 g), or a test portion of the same laboratory sample is enumerated by plating on ALOA (Fiers, Belgium) according to the ISO (1998/Amd 1: 2004) standard method using a reduced limit of quantification (thus LOQ 10 CFU/g). In case of execution of quantitative tests, most of the outcomes are below this LOQ, which again results in semiquantitative data. For a number of samples, other sample weights were tested according to the demands of the food business operators, hence resulting in a rather complex data set.

Table 1
Overview of contamination data of L. monocytogenes in smoked fish on the Belgian retail market in the period 2005–2007.

Number of samples    Concentration (CFU/g)
54                   <0.04
2                    <100
26                   0.04–10
1                    15
8                    0.04–100
2                    >100
1                    <1
1                    >1
7                    0.04–1
1                    1–100

3. Results and discussion of the in silico generated data

3.1. Illustration 1: left-censored quantitative data

A data set is generated in silico and left-censored, representing quantitative measurements in log10 CFU/g. A total of 100 data points are pseudorandomly drawn from a normal distribution with mean μ equal to 0 and standard deviation σ equal to 2 (see Fig. 2). The (lower) LOQ is set to −0.507 log10 CFU/g, which is the 40th percentile of a cumulative normal distribution function with μ = 0 and σ = 2. After censoring, 40 out of 100 data points are regarded as below the LOQ.

Using MLE, a normal distribution is fitted to these censored data, with fitted mean μ = −0.05 log10 CFU/g and fitted standard deviation σ = 1.88 log10 CFU/g. The sample mean of the data set when all nondetects would have been substituted with the LOQ is 0.49 log10 CFU/g, or 0.29 log10 CFU/g when nondetects would have been substituted by half of the (logarithmic) LOQ (see Table 2). For comparison, the mean of the original sampled data (before any censoring algorithm is applied, and hence the true sample mean) is 0.05 log10 CFU/g. This illustrates the major bias that originates when alternative practices, such as substitution of nondetects by the LOQ, are applied to data sets. As is illustrated by Lorimer and Kiermeier (2007), this has been the case frequently in past research. If nondetects would have been ignored, the estimated mean is 1.15 log10 CFU/g. This approach is often intended to model only the positive subpopulation, along with an additional variable for prevalence; however, this implies that the sensitivity of the detection method is ignored (e.g., when a counting method has an LOQ of 10 CFU/g) and that these results cannot be transferred to other portion sizes than the measured one. The true sample standard deviation is 1.77 log10 CFU/g. The values of the standard deviations are estimated to be lower in the case of substitution methods, because these methods push low data points (nondetects) towards the center of the sample distribution.
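The reported substitution mean can be checked with simple arithmetic from the other figures in this paragraph: 60 detects with a mean of 1.15 log10 CFU/g and 40 nondetects replaced by the LOQ of −0.507 log10 CFU/g. The helper name below is hypothetical; the numbers are those quoted in the text.

```python
def mean_after_substitution(mean_detects, n_detects, n_nondetects, substitute):
    """Sample mean when every nondetect is replaced by `substitute`."""
    total = mean_detects * n_detects + substitute * n_nondetects
    return total / (n_detects + n_nondetects)


# 60 detects averaging 1.15, 40 nondetects set to the LOQ of -0.507.
mean_loq = mean_after_substitution(1.15, 60, 40, -0.507)
print(round(mean_loq, 2))  # about 0.49, the LOQ-substitution mean in Table 2
```

The substituted mean is pulled upwards because all 40 nondetects are placed at the upper end of the range in which they could actually lie.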

Subsequently, non-parametric bootstrapping is applied to estimate the uncertainty distributions for the mean and standard deviation of the estimated concentration. The original censored data set is resampled with replacement for B = 1000 bootstrap iterations, and each time a normal distribution is fitted to the bootstrap sample using MLE. From these fitted distributions, quantiles are calculated and stored, and the 95% confidence interval for each quantile is subsequently determined from the variation within bootstraps. The result is plotted in Fig. 3a.

Distributions are fitted to the bootstrap statistics to estimate hyperparameters. The bootstrap means μ are described by a normal distribution with mean μ_μ = −0.05 log10 CFU/g and standard deviation σ_μ = 0.21 log10 CFU/g. The standard deviation of the data sample is represented by a gamma distribution with shape parameter α_σ = 99.4 and scale parameter β_σ = 1.87·10⁻² (see Fig. 3c and d).

As opposed to the next illustration, the bootstrap means and standard deviations, belonging to a two-dimensional space, show no obvious correlation (see Fig. 3b). The Pearson correlation coefficient between both equals −0.319 and the Spearman rank order correlation coefficient −0.302.

3.2. Illustration 2: semiquantitative data

The same original data set of Illustration 1, generated from a normal distribution with mean μ equal to 0 and standard deviation σ equal to 2, is transformed to represent semiquantitative measurements in log10 CFU/g. The first limit of detection, LOD1, is set to 0.507 log10 CFU/g and the second limit of detection, LOD2, to 1.68 log10 CFU/g. In this data set, 64 data points fall below the first LOD and hence would be noted as negative test results. Eighteen data points fall above the second LOD, i.e., 18 laboratory samples would be positive for both the first and the second presence/absence test. All other data points fall between the two limits of detection and hence represent laboratory samples of which the first presence/absence test is positive and the second one negative. This is visualized in Fig. 4.

Using MLE, a normal distribution was fitted to the censored data set, with mean μ = −0.25 log10 CFU/g and standard deviation σ = 2.11 log10 CFU/g. For comparison, the mean and standard deviation of the original data set, that is, before the censoring algorithm is applied to it, are respectively 0.05 log10 CFU/g and 1.77 log10 CFU/g. The fitted distribution (after censoring) resembles the original sample distribution (before censoring) remarkably well, especially considering the fact that the information is reduced rather drastically as opposed to purely quantitative measurements. The fact that random data were initially generated from a normal distribution, which corresponds to the distribution assumed for maximum likelihood estimation, naturally contributes to the good results.

Bootstrap samples are subsequently generated to determine the uncertainty of the parameters of the distribution. The 95% confidence interval is plotted in Fig. 5a. As can be seen, uncertainty increases for the lower concentrations, which can be explained by the fact that approximately 60% of measurements are treated as undetected without further information. The means of the bootstrap samples are fitted to a normal distribution with mean μ_μ = −0.30 log10 CFU/g and standard deviation σ_μ = 0.42 log10 CFU/g. For comparison, in the uncensored case the standard error about the mean would be estimated to be 1.77/√100 = 0.177, according to the central limit theorem. This illustrates the increase of uncertainty due to censoring. The standard deviation is fitted to a gamma distribution with shape parameter α_σ = 18.6 and scale parameter β_σ = 1.17·10⁻¹.

The mean and standard deviation belong to a two-dimensional parameter space and are generally not to be considered as independent. If a scatterplot of the fitted mean versus the fitted standard deviation of each bootstrap sample is examined, it is clear that a correlation has arisen between both due to the large number of values censored below the first LOD. For this particular illustration, this situation can be explained as follows. When many data points below the lower limit of detection are selected in a bootstrap sample, the fitted bootstrap mean is lower and the bootstrap standard deviation, on the contrary, is estimated higher, which induces a (negative) correlation between both parameters. This is depicted in Fig. 5b. The Pearson correlation coefficient between them equals −0.733 and the Spearman rank order correlation coefficient −0.677. In case the original data set is available to the risk assessor, the results of the bootstrap method can be used directly as input values of a two-dimensional Monte Carlo simulation. However, when only distributions representing parameter uncertainty are communicated (for example, in the case of data sets too large to include in a report), it is better not to draw random samples for the mean and standard deviation independently from one another in a 2D Monte Carlo simulation for risk assessment, because this would incorrectly influence the representation of uncertainty intervals for the lower percentiles of the final distribution.

A number of solutions exist to this issue. One could, for example, use copulas or apply the Iman-Conover method for correlated sampling from two distributions (Haas et al., 1999; Iman and Conover, 1982). Here, as an alternative solution, the standard deviation is modelled as a linear function of the mean with addition of an error term, similarly as was done by Calistri and Giovannini (2008). Based on the scatterplot, it is found reasonable to assume that the mean and standard deviation are related approximately linearly and that the error term remains of the same magnitude over the range of the mean. For other case studies, nonlinear regression could be applied as well.

Fig. 2. Histogram of the in silico generated sample data points of the first illustration, with the vertical line indicating the lower LOD beneath which all data points are to be censored.

Table 2
Overview for all illustrations and case studies of the results of the maximum likelihood estimation and of the results of mean and standard deviation calculated with substitution methods (all units in log10 CFU/g).

Data set                                  True           Sample        MLE fitted     Substitution of nondetects
                                          distribution   parameters    distribution   by 1/2 LOD   by LOD      ignoring nondetects
Illustration 1 (quantitative data)        μ = 0          x̄ = 0.05      μ = −0.05      μ = 0.29     μ = 0.49    μ = 1.15
                                          σ = 2          s = 1.77      σ = 1.88       σ = 1.44     σ = 1.27    σ = 1.25
Illustration 2 (semi-quantitative data)   μ = 0          x̄ = 0.05      μ = −0.25      -            -           -
                                          σ = 2          s = 1.77      σ = 2.11
Case study 1 (Campylobacter data)         unknown        unknown       μ = 0.73       μ = 1.10     μ = 1.28    μ = 1.68
                                                                       σ = 1.03       σ = 0.64     σ = 0.53    σ = 0.65
Case study 1a (increased LOD)             unknown        unknown       μ = 0.46       μ = 1.16     μ = 2.06    μ = 2.58
                                                                       σ = 1.22       σ = 0.51     σ = 0.24    σ = 0.50
Case study 1b (measurement error)         unknown        unknown       μ = 0.72       -            -           -
                                                                       σ = 0.99
Case study 1c (reduced data set)          unknown        unknown       μ = 0.63       μ = 1.07     μ = 1.26    μ = 1.69
                                                                       σ = 1.07       σ = 0.62     σ = 0.51    σ = 0.64
Case study 2 (L. monocytogenes data)      unknown        unknown       μ = −1.58      -            -           -
                                                                       σ = 1.54

Fig. 3. Illustration 1: (a) plot of the 95% confidence interval of the log-normal distribution fitted to the set of left-censored quantitative measurements; (b) scatterplot of the bootstrap sample means versus standard deviations; and kernel density plots of, respectively, the means (c) and standard deviations (d) of the bootstrap samples (grey), with the fitted normal and gamma distributions plotted on top (black).

The standard deviation is formulated as a linear function of the mean:

$$\sigma = \beta_0 + \beta_1 \cdot \mu + \varepsilon \qquad (2)$$

with β_0 and β_1 being, respectively, the intercept and slope of the linear relation, and ε an error term. In this case, the following equation is obtained by performing linear regression, followed by assessment of the residual values:

$$\sigma = 1.91 - 0.888 \cdot \mu + \mathrm{Norm}(\mu = 0, \sigma = 0.344) \qquad (3)$$

For comparison, the results obtained by bootstrapping and the results obtained by this linear model are plotted on top of each other; see Fig. 5c.
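Correlated sampling with the linear model of Eq. (3) can be sketched as follows. The function names and the rejection of non-positive standard deviations are assumptions made for this example; the coefficients and the normal distribution of the mean (mean −0.30, standard deviation 0.42 log10 CFU/g) are the values reported for Illustration 2.

```python
import math
import random


def sample_parameters(n, seed=1):
    """Draw (mu, sigma) pairs: mu from its normal uncertainty distribution,
    sigma from the linear model of Eq. (3) plus a normal error term."""
    rng = random.Random(seed)
    pairs = []
    while len(pairs) < n:
        mu = rng.gauss(-0.30, 0.42)
        sigma = 1.91 - 0.888 * mu + rng.gauss(0.0, 0.344)
        if sigma > 0.0:          # assumption: discard implausible values
            pairs.append((mu, sigma))
    return pairs


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


pairs = sample_parameters(5000)
r = pearson([m for m, _ in pairs], [s for _, s in pairs])
print(round(r, 2))
```

With these coefficients the implied correlation is −0.888·0.42/√((0.888·0.42)² + 0.344²) ≈ −0.73, in line with the Pearson coefficient of −0.733 observed among the bootstrap samples.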

4. Results for real food product data

4.1. Case study 1: Campylobacter in chicken meat preparations

In this first case study, the results of Campylobacter analyses in chicken meat preparations are evaluated. The data set comprises quantitative analysis results with an LOQ of 10 CFU/g. In 387 of the 656 measurements (59%), the result is left-censored due to the LOQ. Using MLE, the logarithms of the censored data have been fitted to a normal distribution with mean μ = 0.73 log10 CFU/g and standard deviation σ = 1.03 log10 CFU/g.
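A minimal sketch of such a fit is given below, assuming a single LOQ and a normal distribution on the log10 scale. It uses synthetic data rather than the Campylobacter measurements, and the function name and optimizer choice are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def fit_left_censored_normal(log_values, log_loq, n_censored):
    """MLE of Normal(mu, sigma) on log10 values with n_censored results below log_loq."""
    x = np.asarray(log_values, dtype=float)

    def neg_log_lik(params):
        mu, log_sigma = params
        sigma = np.exp(log_sigma)  # optimizing log(sigma) keeps sigma positive
        ll = norm.logpdf(x, mu, sigma).sum()                # quantified results: density
        ll += n_censored * norm.logcdf(log_loq, mu, sigma)  # nondetects: area below LOQ
        return -ll

    start = [x.mean(), np.log(x.std() + 1e-6)]
    res = minimize(neg_log_lik, start, method="Nelder-Mead")
    return res.x[0], float(np.exp(res.x[1]))

# Synthetic example: 656 samples, LOQ of 10 CFU/g, i.e. 1.0 on the log10 scale.
rng = np.random.default_rng(1)
log_loq = 1.0
sample = rng.normal(0.7, 1.0, size=656)
detected = sample[sample >= log_loq]
n_cens = int((sample < log_loq).sum())
mu_hat, sigma_hat = fit_left_censored_normal(detected, log_loq, n_cens)
```

The nondetects contribute only through the cumulative probability below the LOQ, so no value has to be substituted for them.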

For comparison, if the nondetects had been substituted by half of the LOQ, the fitted distribution would have been a normal distribution with mean μ = 1.10 log10 CFU/g and standard deviation σ = 0.64 log10 CFU/g (see Table 2).

After bootstrapping, the mean and standard deviation are represented by a normal distribution and a gamma distribution, respectively. The means of the bootstrap samples are fitted to a normal distribution with hyperparameters mean μ_μ = 0.73 log10 CFU/g and standard deviation σ_μ = 0.06 log10 CFU/g. The standard deviation is fitted by a gamma distribution with shape parameter α_σ = 304 and scale parameter β_σ = 3.39·10^−3.

The resulting distribution, with its 95% confidence interval, is shown in Fig. 6a.

As can be seen, uncertainty about the distribution parameters is rather small compared to variability, as could be expected given the large data set and the many remaining quantitative, non-censored data points.
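To show how these hyperparameters can feed a second order Monte Carlo simulation, the sketch below samples the parameters in an uncertainty dimension and evaluates the variability distribution for each draw. The mean and standard deviation are sampled independently here, which is an assumption of this example; the grid and the number of iterations are also illustrative choices.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n_uncertainty = 1000  # outer dimension: uncertainty about the parameters

# Hyperparameters as reported for the Campylobacter case study (log10 CFU/g).
mu = rng.normal(0.73, 0.06, size=n_uncertainty)
sigma = rng.gamma(shape=304, scale=3.39e-3, size=n_uncertainty)

# Inner dimension: variability, evaluated as a cumulative distribution on a grid.
grid = np.linspace(-3.0, 5.0, 81)  # log10 CFU/g
cdfs = norm.cdf(grid[None, :], loc=mu[:, None], scale=sigma[:, None])

# Pointwise median and 95% band of the cumulative distribution.
lower, median, upper = np.percentile(cdfs, [2.5, 50, 97.5], axis=0)
```

The width of the band between `lower` and `upper` reflects uncertainty, while the shape of each individual curve reflects variability.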

The Campylobacter data set is also used to test the influence of a number of factors. Firstly, to check the influence of the limit of quantification, all data points of the data set are censored to an increased LOQ of 100 CFU/g (the standard LOQ of the Campylobacter enumeration method) instead of 10 CFU/g (the reduced limit of quantification obtained by plating one milliliter over three mCCDA plates). In this new data set, 589 out of 656 values, i.e. 90% (as opposed to 59% in the original data set), are censored. Using maximum likelihood, the new estimated mean and standard deviation are 0.46 and 1.22 log10 CFU/g. The resulting distribution after bootstrapping is shown in Fig. 6b. As can be seen, this increased LOQ has a strong influence on the parameter estimates as well as on uncertainty. Despite the specificity of this particular case study, it illustrates (in an opposite way) the important impact that a reduction of the limit of quantification of current detection methods, and thus an increase of non-censored values (e.g. Gnanou Besse et al., 2004), might have on the obtained results when data sets include a significant amount of nondetects.

The effect of including measurement error at a realistic level, corresponding to routine laboratory measurements, is also tested. A measurement error of 0.5 log10 CFU/g is superimposed on all original quantitative measurements, thus replacing all quantitative data points xi with an interval [xi − 0.5, xi + 0.5] log10 CFU/g. The newly obtained estimates of mean and standard deviation are 0.72 and 0.99 log10 CFU/g, respectively. Implementing measurement error appears to have very little impact on the obtained result for this data set, as can be seen in Fig. 6c.

To illustrate the impact of the number of data points on the obtained distribution, a distribution is fitted to half the number of data points: 328 data points are randomly sampled from the original data set of 656 data points and subjected to MLE and bootstrapping. The estimated mean and standard deviation are 0.63 and 1.07 log10 CFU/g, respectively. The resulting distribution is depicted in Fig. 6d. Although uncertainty intervals do increase somewhat, the deviation of the new distribution from the originally obtained distribution (Fig. 6d) remains rather limited, despite the fact that the number of data points has been reduced drastically. This indicates that the investment of labor and costs in a large number of additional measurements might not always have the expected impact on the resulting output distribution.

The results of all of these test cases are summarized in Table 2. As in the previous illustrations, the bias that arises when substitution methods are used or when nondetects are ignored can be seen.

4.2. Case study 2: Listeria monocytogenes in smoked fish samples

The second case study consists of 103 measurements of Listeria monocytogenes in smoked fish samples. As opposed to the Campylobacter case study, this data set contains merely 1 quantitative measurement (1 laboratory sample with enumerated L. monocytogenes > 10 CFU/g, i.e. 15 CFU/g). All other measurements are either interval-, left- or right-censored. Moreover, the data set contains several different LODs, depending on the demands of the food business operator that supplied the particular food samples.

Fig. 4. Histogram of the pseudorandom data points used in Illustration 2, with the vertical lines indicating the limits of detection of the first and second presence/absence test.

Using MLE, the logarithmic values of the analysis results are fitted to a normal distribution with mean μ = −1.58 log10 CFU/g and standard deviation σ = 1.54 log10 CFU/g. Based on the empirical distributions of the bootstrap estimates of the distribution parameters, the normal distribution is chosen to fit both the mean and the standard deviation. The mean is fitted to a normal distribution with hyperparameters mean μ_μ = −1.58 log10 CFU/g and standard deviation σ_μ = 0.20 log10 CFU/g. The standard deviation is fitted to a normal distribution with hyperparameters mean μ_σ = 1.51 log10 CFU/g and standard deviation σ_σ = 0.28 log10 CFU/g. To avoid sampling of negative values for the standard deviation, this distribution is truncated at zero.
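Truncation at zero can be implemented in several ways; the sketch below uses a truncated normal distribution from SciPy with the hyperparameters reported above for the standard deviation. The helper name is hypothetical and the approach is one possible choice, not necessarily the one used by the authors.

```python
import numpy as np
from scipy.stats import truncnorm

def sample_truncated_sigma(mean, sd, size, rng):
    """Sample from Normal(mean, sd) restricted to values above zero."""
    a = (0.0 - mean) / sd  # lower bound expressed in standard deviations from the mean
    b = np.inf             # no upper bound
    return truncnorm.rvs(a, b, loc=mean, scale=sd, size=size, random_state=rng)

rng = np.random.default_rng(3)
# Hyperparameters of the standard deviation for the Listeria case study (log10 CFU/g).
sigma_draws = sample_truncated_sigma(1.51, 0.28, 10000, rng)
```

With these hyperparameters the truncation point lies more than five standard deviations below the mean, so it acts mainly as a safeguard and barely changes the distribution.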

As can be seen in the scatterplot of the bootstrap means versus the bootstrap standard deviations (see Fig. 7a), a small number of bootstrap samples in the lower part of the figure deviate from the majority of the samples. This deviation is caused by the absence, in the respective bootstrap samples, of a number of rare intervals from the original data set. These intervals all have concentration values higher than the general mean value, and their inclusion leads to an increase of the standard deviation. This separation between a small cloud of points with low standard deviation and a big cloud with the majority of the points is hence purely a consequence of the complexity of this particular data set, but it illustrates the limitations of the non-parametric bootstrap method. When a data set has relatively few distinct values (in this particular case, 10 distinct values are present), differences between bootstrap samples can be large. This should always be checked when applying the non-parametric bootstrap. The problem does not occur when the parametric bootstrap method is applied; however, applying the parametric bootstrap method to censored data would incorrectly result in different uncertainty intervals compared to non-parametric bootstrapping, which could lead to a fail-dangerous underestimation. A number of test simulations have confirmed this (results not shown). The parametric bootstrap could, however, be applied by generating bootstrap samples from a parametric distribution and censoring them manually in the specific case that the complete data set has to be compared to one LOQ only (Zhao and Frey, 2004).
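The single-LOQ variant of the parametric bootstrap mentioned above could look roughly as follows. This is a sketch under stated assumptions: a normal distribution on the log10 scale, one LOQ for the whole data set, and starting estimates taken from the Campylobacter case study purely as example inputs; the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def fit_left_censored(detects, log_loq, n_cens):
    """MLE of Normal(mu, sigma) with n_cens observations known only to be below log_loq."""
    def nll(p):
        mu, s = p[0], np.exp(p[1])
        return -(norm.logpdf(detects, mu, s).sum() + n_cens * norm.logcdf(log_loq, mu, s))
    res = minimize(nll, [log_loq, 0.0], method="Nelder-Mead")
    return res.x[0], float(np.exp(res.x[1]))

def parametric_bootstrap(mu_hat, sigma_hat, n, log_loq, n_boot, rng):
    """Simulate from the fitted distribution, censor at the LOQ, and refit each sample."""
    estimates = []
    for _ in range(n_boot):
        sim = rng.normal(mu_hat, sigma_hat, size=n)
        detects = sim[sim >= log_loq]  # values below the LOQ become nondetects
        if detects.size < 2:           # skip samples with too few quantified values
            continue
        estimates.append(fit_left_censored(detects, log_loq, n - detects.size))
    return np.array(estimates)

rng = np.random.default_rng(4)
boot = parametric_bootstrap(0.73, 1.03, n=656, log_loq=1.0, n_boot=200, rng=rng)
```

Each row of `boot` holds one refitted mean and standard deviation; their spread describes the uncertainty about the parameters under this scheme.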

The resulting distribution, with its 95% confidence interval, is shown in Fig. 7b.

5. Discussion for real food product data

The examples presented in this article illustrate how complex data sets, including nondetects, semi-quantitative and qualitative measurements, can be interpreted in an appropriate way for use in microbiological risk assessment. Ignoring nondetects, or substituting them with the LOD/LOQ or half of it, is a classical source of bias (cf. Lorimer and Kiermeier, 2007) that can and should be avoided using these methods. It has been demonstrated in this paper that even complex data sets, including either very diverse analyses or large amounts of censored values, can lead to very satisfying outcomes. Nevertheless, attention must be paid to the possibilities and limitations of these methods. Blindly fitting a data set with limited information (for example, a data set consisting purely of presence/absence tests, as obtained if analyses are performed for compliance testing against a set legal criterion) to a specific distribution might result in unrealistic outcomes. Moreover, the limited small-sample properties of the non-parametric bootstrap method must be taken into account, as is illustrated in the second case study. This supports the recommendation to set up dedicated baseline surveys for gathering data to be used for risk assessment.

The illustration of the various case studies (both with hypothetical data sets and with real-life microbiological data sets from dedicated or ad hoc combined surveys) shows that it is important, in the establishment of microbiological baseline surveys, to apply some semi-quantitative methodology, and by preference a methodology enabling an estimation of the numbers present in the positive laboratory samples. For example, Straver et al. (2007) estimated contamination of chicken breast filet with Salmonella using a combination of prior enrichment of pooled laboratory samples and subsequent enumeration of Salmonella in positive laboratory samples using a Most Probable Number (MPN) assay. The fact that MPN methods or enumeration methods with a reduced limit of quantification (as in the Campylobacter case study) provide an estimate of the number of pathogens present rather than an exact value is not a crucial factor in the estimation of distributions of the contamination level. In the present Campylobacter case study, it was shown that inclusion of the measurement error interval for quantitative analyses hardly affects the estimated distributions.

Fig. 5. Illustration 2: (a) plot of the 95% confidence interval of the fitted log-normal distribution for the set of semi-quantitative measurements; (b) scatter plot of the means versus the standard deviations of the bootstrap samples; and (c) comparison of the parameters obtained by bootstrapping and randomly generated parameters using a linear relationship.

The proportion of nondetects, on the other hand, may have a significant impact on the result, as has been shown in the present Campylobacter case study. This illustrates the positive effect that lowering the limit of quantification of a certain analysis method might have on lowering the uncertainty of the distribution when the microbiologist is confronted with a substantial amount of nondetects. It was noticed for the Listeria case study that the uncertainty of the distribution is especially increased at the lower levels (<0.01 CFU/g) (Fig. 7b), as only left-censored data are available in this concentration range. More information on the estimated level of contamination would make it possible to decrease uncertainty. However, the overall estimated mean level of contamination for Listeria (μ = −1.58 log10 CFU/g) is much lower than for Campylobacter (μ = 0.73 log10 CFU/g), which means that obtaining enumeration data at these very low levels of contamination will take considerable laboratory effort and require adapted methodological procedures, with the related costs.

On the other hand, it was shown that obtaining a good data set for the estimation of distributions of contamination levels does not necessarily demand a large study. In the present Campylobacter case study, it was illustrated that increasing the number of analyses to a large extent might lead to only a limited additional reduction of uncertainty when the data set is already sufficient and has representative outcomes. The distribution of the Campylobacter contamination level shown in Fig. 6d is based upon 122 enumeration results (obtained from a total of 328 laboratory samples analyzed), whereas in Fig. 6a, 269 enumeration values were available (obtained from 656 laboratory samples analyzed) for the estimation of the distribution of the contamination level.

Setting up a baseline survey to acquire a data set that serves as the basis for the estimation of an input distribution for risk assessment thus has to take into account appropriate methodology to provide a sufficient number of detects and estimates of numbers. It also has to provide results that are representative for the objective of the risk assessment (e.g. food product under consideration, stage in the production chain, variability between producers, seasonality, etc.) in order not to introduce bias in the distribution obtained. As such, setting up a baseline survey is a complex exercise. Moreover, once the data set is available, appropriate techniques also need to be used to translate the information from the data set into a distribution.

In the present study, an approach based upon maximum likelihood estimation was shown to provide good results for representing the variation of contamination data. Additional information with regard to variability and uncertainty could be extracted as well, using the bootstrap method. The same methodology can equally be applied with more complex models, such as mixture models or Poisson-like models. Alternative methods such as Bayesian analysis can also be applied and lead to similar outcomes (results not shown). Examples of a Bayesian analysis can be found in Nauta et al. (2009), Clough et al. (2005) and Crépet et al. (2007).

Fig. 6. Case study 1: (a) plot of the 95% confidence interval of the normal distribution fitted to the Campylobacter contamination data; (b) influence of an increased LOD on the resulting distribution. The dotted lines represent the original data set; the full lines represent the data set on which an increased LOD has been imposed; (c) influence of inclusion of a measurement error interval on the estimated distribution (full lines) compared to the original data set (dotted lines); (d) influence of the number of data points: comparison of the results of a random subset with N/2 data points (full lines) with the original data set (dotted lines).

Application of these techniques offers a way to perform meta-analysis of the many relevant yet diverse data sets that are available in literature and in (inter)national reports of surveillance or baseline surveys. It therefore increases the information input of a risk assessment and, as a consequence, the correctness of its outcome.

Acknowledgements

This research is supported in part by the Research Council of the Katholieke Universiteit Leuven (projects OT0925TBA and EF05006, Center-of-Excellence Optimization in Engineering), knowledge platform KP09005 (SCORES4CHEM) of the Industrial Research Fund, the Belgian Program on Interuniversity Poles of Attraction, initiated by the Belgian Federal Science Policy Office, and the Fund for Scientific Research-Flanders (FWO-Vlaanderen, project G042409 N). J. Van Impe holds the chair Safety Engineering sponsored by the Belgian chemistry and life sciences federation essencia. Research is conducted utilizing high performance computational resources provided by the University of Leuven, http://ludit.kuleuven.be.

We would like to thank the Ghent University cluster of the Department of Veterinary Public Health and Food Safety, Faculty of Veterinary Medicine, and the Department of Food Safety and Food Quality, Faculty of Bio-Science Engineering, who kindly provided the Campylobacter data derived from a Federal Public Health Service funded project. The staff of the accredited laboratory section of the Laboratory of Food Microbiology and Food Preservation at the Department of Food Safety and Food Quality, Faculty of Bio-Science Engineering, Ghent University, is acknowledged for providing the data on the microbiological analysis and challenge testing for L. monocytogenes.

References

Calistri, P., Giovannini, A., 2008. Quantitative risk assessment of human campylobacteriosis related to the consumption of chicken meat in two Italian regions. International Journal of Food Microbiology 128, 274–287.

Clough, H.E., Clancy, D., O'Neill, P.D., Robinson, S.E., French, N.P., 2005. Quantifying uncertainty associated with microbial count data: a Bayesian approach. Biometrics 61, 610–616.

Cox, D.R., Oakes, D., 1984. Analysis of Survival Data. Monographs on Statistics and Applied Probability. Chapman and Hall.

Crépet, A., Albert, I., Dervin, C., Carlin, F., 2007. Estimation of microbial contamination of food from prevalence and concentration data: application to Listeria monocytogenes in fresh vegetables. Applied and Environmental Microbiology 73 (1), 250–258.

Delignette-Muller, M.L., Pouillot, R., Denis, J.-B., 2008. fitdistrplus: Help to fit of a parametric distribution to non-censored or censored data. R package version 0.1-0. URL http://riskassessment.r-forge.r-project.org.

Efron, B., 1982. The jackknife, the bootstrap and other resampling plans. CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 38.

FAO/WHO, 2004. Risk assessment of Listeria monocytogenes in ready-to-eat foods. Accessed June 5, 2009. URL http://www.who.int/foodsafety/publications/micro/mra_listeria/en/index.html.

FDA/USDA/CDC, 2003. Quantitative assessment of relative risk to public health from foodborne Listeria monocytogenes among selected categories of ready-to-eat foods. Accessed June 5, 2009. URL http://www.foodsafety.gov/~dms/lmr2-toc.html.

Gnanou Besse, N., Audinet, N., Beaufort, A., Colin, P., Cornu, M., Lombard, B., 2004. A contribution to the improvement of Listeria monocytogenes enumeration in cold-smoked salmon. International Journal of Food Microbiology 91 (2), 119–127.

Gonzales-Barron, U., Kerr, M., Sheridan, J.J., Butler, F., 2010. Count data distributions and their zero-modified equivalents as a framework for modelling microbial data with a relatively high occurrence of zero counts. International Journal of Food Microbiology 136 (3), 268–277.

Haas, C.N., Thayyar-Madabusi, A., Rose, J.B., Gerba, C.P., 1999. Development and validation of dose-response relationship for Listeria monocytogenes. Quantitative Microbiology 1, 89–102.

Habib, I., Sampers, I., Uyttendaele, M., Berkvens, D., De Zutter, L., 2008a. Baseline data from a Belgium-wide survey of Campylobacter species contamination in chicken meat preparations and considerations for a reliable monitoring program. Applied and Environmental Microbiology 74 (17), 5483–5489.

Habib, I., Sampers, I., Uyttendaele, M., Berkvens, D., De Zutter, L., 2008b. Performance characteristics and estimation of measurement uncertainty of three plating procedures for Campylobacter enumeration in chicken meat. Food Microbiology 25 (1), 65–74.

Helsel, D.R., 2005. Nondetects and data analysis: statistics for censored environmental data. Wiley Interscience, USA.

Helsel, D.R., 2006. Fabricating data: how substituting values for nondetects can ruin results, and what can be done about it. Chemosphere 65, 2434–2439.

Iman, R.L., Conover, W.J., 1982. A distribution-free approach to inducing rank correlation among input variables. Communications in Statistical Simulations and Computation 11 (3), 311–334.

ISO, 1998/Amd. 1:2004. International Standards Organization 10290-2. Microbiology of food and animal feeding stuffs – horizontal method for the detection and enumeration of Listeria monocytogenes – part 2: Enumeration method.

ISO, 2006. International Standards Organization 10272-1. Microbiology of food and animal feeding stuffs – horizontal method for detection and enumeration of Campylobacter spp. – part 2: Enumeration method.

Jordan, D., 2005. Simulating the sensitivity of pooled-sample herd tests for fecal Salmonella in cattle. Preventive Veterinary Medicine 70, 59–73.

Kilsby, D.C., Pugh, M.E., 1981. The relevance of the distribution of micro-organisms within batches of food to the control of microbiological hazards from foods. Journal of Applied Bacteriology 51, 345–354.

Legan, J.D., Vandeven, M.H., Dahms, S., Cole, M.B., 2001. Determining the concentration of microorganisms controlled by attributes sampling plans. Food Control 12 (3), 137–147.

Lorimer, M.F., Kiermeier, A., 2007. Analysing microbiological data: Tobit or not Tobit? International Journal of Food Microbiology 116, 313–318.

Nauta, M.J., van der Wal, F.J., Putirulan, F.F., Post, J., van de Kassteele, J., Bolder, N.M., 2009. Evaluation of the "testing and scheduling" strategy for control of Campylobacter in broiler meat in The Netherlands. International Journal of Food Microbiology 134, 216–222.

Fig. 7. Case study 2: (a) scatter plot of the fitted means versus standard deviations of the bootstrap samples and (b) plot of the 95% confidence interval of the normal distribution fitted to the logarithmic Listeria monocytogenes contamination data.

Oscar, T.P., 2004. A quantitative risk assessment model for Salmonella and whole chickens. International Journal of Food Microbiology 93, 231–247.

Pérez-Rodríguez, F., van Asselt, E.D., García-Gimeno, R.M., Zurera, G., Zwietering, M.H., 2007. Extracting additional risk managers information from a risk assessment of Listeria monocytogenes in deli meats. Journal of Food Protection 70 (5), 1137–1152.

Pouillot, R., Miconnet, N., Afchain, A.-L., Delignette-Muller, M.L., Beaufort, A., Rosso, L., Denis, J.-B., Cornu, M., 2007. Quantitative risk assessment of Listeria monocytogenes in French cold-smoked salmon: I. Quantitative exposure assessment. Risk Analysis 27 (3), 683–700.

R Development Core Team, 2009. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. URL http://www.R-project.org.

Reinders, R.D., De Jonge, R., Evers, E.G., 2003. A statistical method to determine whether micro-organisms are randomly distributed in a food matrix, applied to coliforms and Escherichia coli O157 in minced beef. Food Microbiology 20, 297–303.

Ridout, M., Demétrio, C.G.B., Hinde, J., 1998. Models for count data with many zeroes. Proceedings of the XIXth International Biometric Conference, pp. 179–192.

Shorten, P.R., Pleasants, A.B., Soboleva, T.K., 2006. Estimation of microbial growth using population measurements subject to a detection limit. International Journal of Food Microbiology 108, 369–375.

Straver, J.M., Janssen, A.F.W., Linnemann, A.R., van Boekel, A.J.S., Beumer, R.R., Zwietering, M.H., 2007. Number of Salmonella on chicken breast filet at retail level and its implications for public health risk. Journal of Food Protection 70 (9), 2045–2055.

Uyttendaele, M., Busschaert, P., Valero, A., Geeraerd, A.H., Vermeulen, A., Jacxsens, L., Goh, K.K., De Loy, A., Van Impe, J.F., Devlieghere, F., 2009. Prevalence and challenge tests of Listeria monocytogenes in Belgian produced and retailed mayonnaise-based deli-salads, cooked meat products and smoked fish between 2005 and 2007. International Journal of Food Microbiology 133, 94–104.

Zhao, Y., Frey, H.C., 2004. Quantification of variability and uncertainty for censored data sets and application to air toxic emission factors. Risk Analysis 24 (4), 1019–1034.




Shorten et al. (2006) and Lorimer and Kiermeier (2007) also suggested this method to deal with nondetects in microbiological test results, as opposed to biased approaches such as substitution of nondetects by arbitrary values, e.g. half of the LOQ. Applications of this method can be found in, e.g., Jordan (2005) and Pouillot et al. (2007), where laboratory samples were counted that were previously shown to be positive using a qualitative measurement method with higher sensitivity.

While these authors focused primarily on applying maximum likelihood estimation (MLE) to deal with quantitative data that are censored at one side due to an LOQ, the same techniques can be generalized to combine quantitative, semi-quantitative and qualitative test results. The first objective of this research is therefore to illustrate how MLE can be applied to represent complex data sets of censored microbiological data with parametric distributions.

Moreover, although MLE makes it possible to represent the variation of contamination data by means of a parametric distribution, additional information with regard to variability and uncertainty can be extracted from the available data as well. This is the second objective of this research. Because uncertainty can be reduced by collection of additional information, while variability is inherent to any biological system, it is useful to know what proportion of the variation in general is caused by variability and what proportion is caused by uncertainty. In this case, uncertainty is represented by additional statistical distributions (defined by hyperparameters) that describe the parameters of the variability distribution of contamination. Once these distributions have been constructed, the separation between variability and uncertainty can be propagated in the course of the risk assessment by the use of a second order Monte Carlo simulation.

In this research, the application of these techniques to microbiological contamination data is explored. The bias that originates when alternative procedures are applied is also examined. The methods are illustrated with in silico (i.e. computer-simulated) data in order to investigate the performance of these techniques. Subsequently, two case studies based on laboratory measurements are explored, namely a data set of 656 quantitative measurements of Campylobacter in chicken meat preparations (of which 59% are below the limit of quantification) and a data set of 103 measurements of Listeria monocytogenes in smoked fish products, consisting of quantitative, semi-quantitative and qualitative measurements as well as nondetects.

2. Material and methods

2.1. Maximum likelihood estimation and bootstrap

In case of a negative presence/absence test, the concentration of the pathogen tested for in the food sample is known to be less than the limit of detection (LOD) of the analysis, although no exact value is known. Also, when an enumeration method is applied to a food sample and no colonies are detected, the concentration is known to be less than the limit of quantification (LOQ). These values are said to be left-censored. Analogously, a positive presence/absence test results in a right-censored outcome.

Maximum likelihood estimation is used to fit a distribution to a set of censored data (Cox and Oakes, 1984; Helsel, 2005). A parametric candidate distribution is assumed to represent the observed data, and the MLE method estimates the parameter values that are most likely to have generated the observed measurements. Parametric distributions are chosen because they are assumed to correctly represent the data sets considered here; however, in the case of more complex data sets, the risk assessor might also consider alternative, more complex models. Zero-inflated Poisson models or other variants might be relevant in the case of count data with many nondetects (e.g. Ridout et al., 1998; Gonzales-Barron et al., 2010). In case of more dispersed data, mixture models could be fitted as well; see, for example, Crépet et al. (2007). However, it should be taken into consideration that a mixture model should not be used to model the heterogeneity of the specific data set (for example, when food samples were taken from 2 food business operators, each with a different contamination profile) when it is intended to be representative for a more general situation (for example, representing food samples of all food business operators). Other studies have implemented hurdle models with separate distributions for prevalence (presence/absence) and concentration (CFU/g) of pathogens in food samples (see e.g. Pouillot et al., 2007; Pérez-Rodríguez et al., 2007). In this research, concentration (CFU/g) is modeled by a parametric distribution as a hypothetical property of food samples; for example, a Poisson distribution with the concentration times the serving size as a rate parameter could be implemented afterwards to model the contamination in a specific serving as a non-negative integer.
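The last step can be sketched as follows: a concentration is drawn from a log-normal variability distribution and the number of cells in a serving is then drawn from a Poisson distribution whose rate is the concentration times the serving size. The parameter values and the 50 g serving size are assumptions chosen only to make the example concrete.

```python
import numpy as np

def simulate_cells_per_serving(mu_log10, sigma_log10, serving_g, n, rng):
    """Number of cells per serving: Poisson with rate = concentration * serving size."""
    log_conc = rng.normal(mu_log10, sigma_log10, size=n)  # log10 CFU/g
    conc = 10.0 ** log_conc                               # CFU/g
    return rng.poisson(conc * serving_g)                  # non-negative integer counts

rng = np.random.default_rng(5)
# Illustrative inputs: mean -1.58 and sd 1.54 log10 CFU/g, hypothetical 50 g serving.
doses = simulate_cells_per_serving(-1.58, 1.54, serving_g=50.0, n=10000, rng=rng)
fraction_contaminated = (doses > 0).mean()
```

In this formulation, servings with zero cells arise naturally from the Poisson step, without a separate prevalence parameter.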

In the case of quantitative results, the likelihood of obtaining these results given a set of parameters θ is obtained by multiplication of all values of the probability density function p(·) corresponding to each data point xi given those parameters:

$$l(X = \{x_1, \dots, x_n\} \mid \theta) = \prod_{i=1}^{n} p(x_i \mid \theta) \qquad (1)$$

In the case of censored data, the likelihood is given by multiplication of areas under the probability density function instead of single values; hence, the cumulative distribution function is used (Cox and Oakes, 1984), as is depicted in Fig. 1. For computational convenience, logarithmic values of the likelihood are considered.
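A compact way to express this is to give every censored observation a lower and an upper bound and to use the difference of the cumulative distribution function between them. The sketch below does this for a normal distribution on the log10 scale; the data values are hypothetical and the function name and optimizer are assumptions of this example.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_log_lik(params, exact, intervals):
    """Negative log-likelihood for exact values plus [lower, upper] censored intervals."""
    mu, sigma = params[0], np.exp(params[1])  # log-parameterization keeps sigma positive
    ll = norm.logpdf(exact, mu, sigma).sum()  # quantitative results: density values
    lo, hi = intervals[:, 0], intervals[:, 1]
    area = norm.cdf(hi, mu, sigma) - norm.cdf(lo, mu, sigma)  # censored results: areas
    ll += np.log(np.clip(area, 1e-300, None)).sum()
    return -ll

# Hypothetical log10 CFU/g data.
exact = np.array([1.2, 1.5, 2.1])
intervals = np.array([
    [-np.inf, -1.4],  # left-censored, e.g. absence in 25 g (below about log10(0.04))
    [-np.inf, -1.4],
    [-1.4, 0.0],      # interval-censored, e.g. presence in 25 g and absence in 1 g
    [-1.4, 1.0],
    [2.0, np.inf],    # right-censored, e.g. above an upper counting limit
])

start = np.array([0.0, 0.0])
res = minimize(neg_log_lik, start, args=(exact, intervals), method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], float(np.exp(res.x[1]))
```

Left- and right-censored results are simply intervals with an infinite bound, so one formula covers all three kinds of censoring.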

The normal distribution has been chosen for the case studies because it is generally assumed in food microbiology that contamination is approximately log-normally distributed (Crépet et al., 2007; Kilsby and Pugh, 1981; Legan et al., 2001). Ridout et al. (1998) also proposed the Poisson distribution and a Poisson distribution with a Gamma-distributed rate parameter λ; however, it was shown that the log-normal distribution fitted the data equally well.

Because these parameters are based on inference from measurements of food samples, which represent only a subset of the whole population, uncertainty about these parameters due to the limited data set is considered by using the bootstrapping technique (Efron, 1982). In the parametric bootstrap method, a distribution is fitted to a data set. Based on this distribution, B new samples (of the same sample size as the original data set) are drawn from this single distribution. For each of the B new samples, the parameter of interest is estimated, and the distribution of all B estimated parameters represents uncertainty about the estimate. The empirical bootstrap method is based on a similar procedure, but instead of drawing B samples from one fitted distribution, the original data set is resampled with replacement B times and a distribution is fitted to each of the B new samples.
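The empirical bootstrap for censored data can be sketched as below, resampling whole observations (each stored as a lower and upper bound) and refitting every resample by maximum likelihood. The data are synthetic, the helper names are hypothetical, and 200 iterations are used only to keep the example quick.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def fit_normal(obs):
    """MLE for rows of [lower, upper] log10 bounds; lower == upper marks an exact value."""
    lo, hi = obs[:, 0], obs[:, 1]
    exact = lo == hi

    def nll(p):
        mu, s = p[0], np.exp(p[1])
        ll = norm.logpdf(lo[exact], mu, s).sum()
        area = norm.cdf(hi[~exact], mu, s) - norm.cdf(lo[~exact], mu, s)
        return -(ll + np.log(np.clip(area, 1e-300, None)).sum())

    res = minimize(nll, [0.0, 0.0], method="Nelder-Mead")
    return res.x[0], float(np.exp(res.x[1]))

def empirical_bootstrap(obs, n_boot, rng):
    """Resample observations with replacement and refit each resample."""
    n = len(obs)
    out = np.empty((n_boot, 2))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # row indices drawn with replacement
        out[b] = fit_normal(obs[idx])
    return out

rng = np.random.default_rng(6)
# Synthetic data: 60 values from Normal(1, 1), left-censored at 0.5 log10 CFU/g.
raw = rng.normal(1.0, 1.0, size=60)
censored = raw < 0.5
obs = np.column_stack([np.where(censored, -np.inf, raw), np.where(censored, 0.5, raw)])
boot = empirical_bootstrap(obs, 200, rng)  # columns: estimated mean, estimated sd
```

Because complete observations are resampled, censored results keep their bounds in every resample, and no assumption about the fitted distribution enters the resampling step itself.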

To include censored data points, the empirical bootstrap approach is chosen (as was done, for example, by Zhao and Frey, 2004). For a number of B = 1000 iterations, a bootstrap sample (i.e. sampling with replacement)

Fig. 1. Illustration of the likelihood of quantitative data (left) versus interval-censored data (right).







Table 1

Number of samples    Result (CFU/g)
54                   <0.04
2                    <100
26                   0.04−10
1                    15
8                    0.04−100
2                    >100
1                    <1
1                    >1
7                    0.04−1
1                    1−100





















References








































































































(2007)





Acknowledgements





















References























































(2007)





Acknowledgements





















References





























































Documents

j.ijfoodmicro.2010.01.025.pdf