
Application of the Central Limit Theorem in microbial risk assessment: High number of servings reduces the Coefficient of Variation of food-borne burden-of-illness


International Journal of Food Microbiology 153 (2012) 413–419


Journal homepage: www.elsevier.com/locate/ijfoodmicro


Fernando Pérez-Rodríguez a,⁎, Marcel H. Zwietering b

a Department of Food Science and Technology, International Campus of Excellence in the AgriFood Sector ceiA3, University of Córdoba, Córdoba, Spain
b Laboratory of Food Microbiology, Wageningen University, Wageningen, The Netherlands

⁎ Corresponding author. Tel.: +34 957212057; fax: +34 957 212000. E-mail address: [email protected] (F. Pérez-Rodríguez).

0168-1605/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.ijfoodmicro.2011.12.005

Article info

Article history: Received 16 June 2011; Received in revised form 27 November 2011; Accepted 4 December 2011; Available online 13 December 2011

Keywords: Quantitative risk assessment; Central Limit Theorem; Public Health; Monte-Carlo analysis; Listeriosis; Salmonellosis

Abstract

The Central Limit Theorem (CLT) is proposed as a means of understanding microbial risk in foods from a Public Health perspective. One variant of the CLT states that as the number of random variables, each with a finite mean and variance, increases (→∞), the distribution of the sum (or mean) of those variables approximates a normal distribution. On the basis of the CLT, the hypothesis introduced by this paper states that the Coefficient of Variation (CV) of the annual number of food-borne illness cases decreases as a result of a larger number of exposures (or servings) (n). Second-order Monte-Carlo analysis and classical statistics were used to support the hypothesis, based on existing risk models on Listeria monocytogenes in deli meat products focused on elderly people in the United States. Likewise, the hypothesis was tested on epidemiological data of annual incidence of salmonellosis and listeriosis in different countries (i.e. different n). Although different sources of error affected the accuracy of the results, both the Monte-Carlo analysis (in silico) and epidemiological data (in vivo), especially for salmonellosis, demonstrated that the CV of the annual number of cases decreased as n increased as stated by the CLT. Furthermore, results from this work showed that classical statistical methods can be helpful to provide reliable risk estimates based on simple and well-established statistical principles.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

Public Health Surveillance systems are intended to record the occurrence of diseases or intoxications caused by pathogens and toxicants present in foods, to analyze epidemiological data and to disseminate information. The information provided by surveillance systems is crucial for designing and implementing strategies and/or interventions to minimize food-borne illness. However, burden-of-illness estimates are often uncertain owing to several important limitations: test sensitivity, underreported cases, deficiencies in reporting systems, scarce human resources, etc. Despite these important sources of uncertainty, variations in the number of annual cases are small when compared to the variation in pathogen doses, which in most cases can span several orders of magnitude. In this line of reasoning, Pérez-Rodríguez et al. (2007) reported that Listeria monocytogenes concentrations in deli-meat products at consumption (doses) could range between −2 log CFU/g and 8 log CFU/g (10 orders of magnitude). However, looking at the number of annual listeriosis cases in the US during the period 1996–2003, the average percentage change was only 21% (less than 1 logarithm) (Centers for Disease Control and Prevention, CDC, 2004). Also, Powell et al. (2001) estimated, by Monte-Carlo analysis, that the uncertainty interval (95%) for the burden-of-illness by Escherichia coli O157:H7 in the US was 50,000–120,000 cases (mean=75,000). Uncertainty and variability of pathogenic microorganism concentration are considered an important factor in risk assessment, and several studies have investigated this issue (Gale, 2005; Nauta, 2000). Nevertheless, in some cases, although risk factors can present a wide variation range (e.g., initial hazard concentration), only specific regions or intervals within the factors are responsible for illness (Pérez-Rodríguez et al. 2007), especially if the infection probability is proportional to the pathogenic microorganism dose (Zwietering, 2009). For example, Pérez-Rodríguez et al. (2007) observed that the number of cases by L. monocytogenes in RTE meat varied between simulations (with different seeds), and that certain specific zones in the distributions of initial concentration and temperature at home were responsible for most of the variation in cases among simulations. This result leads us to consider that, although concentration and other risk factors can be important sources of uncertainty and variability in risk assessment, not all of this variation is expected to affect, or propagate into, the overall risk equally.

In this work, we aimed to introduce the idea that although there may be considerable variation between individual risks, the annual variation of the total risk (number of cases) will be small as a result of the Central Limit Theorem (CLT) of probability theory. This theorem can be considered the cornerstone for understanding collective phenomena (Sornette, 2006). The CLT states that as the number of random variables increases towards infinity, the distribution of the sum (or mean) of those variables approximates the normal distribution asymptotically. Consequently, the higher the number of exposures (doses) to the pathogenic microorganism, the lower the relative uncertainty in the number of attendant annual cases. From that, it could be expected that pathogenic microorganisms with high prevalence in foods, such as L. monocytogenes or Salmonella spp., show a decreasing trend in the illness incidence variability as population size (which is related to the number of exposures) increases. In the present work, this hypothesis was validated via epidemiological data, analytical calculation and Monte-Carlo analysis.

2. Material and methods

2.1. Key elements in risk assessment

The exposure to a pathogenic microorganism (i.e. microorganism concentration distribution in food) is generally well described by the lognormal distribution, which is typically defined by the parameters mean (log μ) and standard deviation (σ). When concentration values are log-transformed, they follow a normal distribution characterized by the parameters geometric mean (μlog) and geometric standard deviation (σlog) of the log-transformed values. Parameters of both distributions are mathematically related, and formulae are available to derive the parameters of one distribution from those of the other (Jawitz, 2004). Depending mainly on the type of microorganism, environmental factors, and food matrix, μ and σ can vary, influencing the position and shape (skew and kurtosis) of the curve, respectively. Pathogenic microorganism concentrations often have skewed distributions (σ≫μ), with very high concentrations occurring at very low probability.

To characterize risk, the pathogenic microorganism concentration at the moment of consumption should be combined with a dose–response model. This model describes the probability of illness (or infection) as a function of the ingested dose (Haas et al. 1999). The dose is the amount of a pathogenic microorganism ingested through the food. If the dose is log-transformed, the dose–response curve is typically sigmoidal. In many cases, dose–response models show a linear region (Zwietering, 2009):

$$P_{ill} = D \cdot r \tag{1}$$

In the equation, Pill represents the probability of getting ill, r is the slope of the dose–response model (probability of becoming ill by ingestion of one cell) and D denotes the pathogen dose at consumption (CFU).

Doses are located on the linear region of the dose–response model when the microorganism has a very low infectivity and/or when the dose is sufficiently low to be in the linear part. As this type of model is affected by important sources of uncertainty as a consequence of experimental limitations and biological variability, the error derived from using the linear equation (i.e. D·r) to estimate Pill can reasonably be neglected in many cases as long as D·r ≪ 0.1. In the Listeria case, it is known that with maximal doses of 10⁹ CFU, D·r ≪ 0.1 (with r=1.85·10⁻¹⁴, [Pérez-Rodríguez et al. 2007]). Furthermore, for Listeria doses up to 1·10¹² CFU, this linear approximation and the exponential model developed in Food Drug Administration (FDA) (2003) differed by less than 1% (Pérez-Rodríguez et al. 2007). Applying the dose–response model yields the individual risk, which is defined as the probability of getting ill after consuming a serving (per person). However, in Public Health, decisions are often made based on the overall risk (i.e. illness incidence in a given population). Likewise, data on overall risk, expressed as total illness cases per annum, are often reported by national surveillance systems. Nevertheless, from a probabilistic standpoint, the overall risk can be understood as the result of the sum of individual risks, i.e. the individual probabilities of getting ill.
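The size of the linearization error can be checked numerically. The sketch below assumes that the exponential model has the common form 1 − exp(−r·D); that form and the function names are illustrative assumptions, not taken from the FDA (2003) model itself. Only the r-value and the doses come from the text above.

```python
import math

# r-value for the elderly population quoted in the text (Pérez-Rodríguez et al. 2007).
R_ELDERLY = 1.85e-14


def p_ill_linear(dose_cfu, r=R_ELDERLY):
    """Linear approximation of Eq. (1): P_ill = D * r."""
    return dose_cfu * r


def p_ill_exponential(dose_cfu, r=R_ELDERLY):
    """Assumed exponential dose-response form: P_ill = 1 - exp(-r * D)."""
    # expm1 keeps precision when r * D is tiny.
    return -math.expm1(-r * dose_cfu)


def relative_deviation(dose_cfu, r=R_ELDERLY):
    """Relative overestimate of the linear form with respect to the exponential form."""
    exact = p_ill_exponential(dose_cfu, r)
    return (p_ill_linear(dose_cfu, r) - exact) / exact


for dose in (1e9, 1e12):
    print(f"D = {dose:.0e} CFU: D*r = {p_ill_linear(dose):.3e}, "
          f"relative deviation = {relative_deviation(dose):.3%}")
```

Under this assumed form, the deviation at 10¹² CFU stays below 1%, in line with the figure quoted above.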

2.2. Burden-of-illness explained by the Central Limit Theorem (CLT)

As stated above, from a probabilistic view, the overall risk distribution can be seen as the sum of n individual risk distributions, with n representing the number of doses or exposures. If n is sufficiently large and none of those distributions (e.g., individual risk distributions) dominates the resultant distribution (e.g. the overall risk distribution), the CLT can be applied. Consequently, the overall risk distribution approximates (asymptotically) a normal distribution with parameters n·μ and √n·σ (see Appendix A). It is important to note that although the CLT conditions cannot be met exactly, a reasonably good approximation is expected in a certain region around the mean, whose accuracy will depend on the extent of the deviation from the CLT conditions.

2.3. Reduction of Coefficient of Variation (CV) of burden-of-illness based on the Central Limit Theorem

The annual variation in burden-of-illness, based on the Coefficient of Variation (CV), can be calculated using Eq. (2).

$$CV = \frac{\sigma}{\mu} \tag{2}$$

As can be noted in Eq. (2), the CV is a normalized measure of dispersion of a probability distribution or data set.

As stated in the previous section, the burden-of-illness can be estimated as the sum of n individual risk distributions (P). Therefore, as V(P)=σP², by applying the properties of variance and mean (see Appendix A), it can be obtained that the CV of the sum of variables with equal arithmetic μP and σP is given by Eq. (3).

$$CV = n^{-1/2}\,\frac{\sigma_P}{\mu_P} \tag{3}$$
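The step leading to Eq. (3) can be sketched as follows, assuming that the n individual risks P_i are independent and share the same μP and σP (an assumption made explicit here; the symbol S_n for the sum is introduced only for illustration):

$$
\begin{aligned}
S_n &= \sum_{i=1}^{n} P_i, \\
E[S_n] &= n\,\mu_P, \qquad V(S_n) = n\,\sigma_P^2, \\
CV(S_n) &= \frac{\sqrt{V(S_n)}}{E[S_n]} = \frac{\sqrt{n}\,\sigma_P}{n\,\mu_P} = n^{-1/2}\,\frac{\sigma_P}{\mu_P}.
\end{aligned}
$$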

If the decimal logarithm is applied to Eq. (3), it can be rewritten as a linear function as follows:

$$\log_{10}(CV) = -0.5\,\log_{10}(n) + \log_{10}\left(\frac{\sigma_P}{\mu_P}\right) \tag{4}$$

where the value −0.5 corresponds to the slope (m) and log10(σP/μP) to the intercept (y) of the log-linear model:

$$\log_{10}(CV) = m\,\log_{10}(n) + y \tag{5}$$
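As a quick numerical illustration of Eqs. (3)–(5), the sketch below evaluates the CV of the sum at two values of n and recovers the slope on the log–log scale. The single-serving CV of 100 is a hypothetical value chosen only for illustration, and the function names are not from the paper.

```python
import math


def cv_of_sum(n, cv_single):
    """Eq. (3): CV of the sum of n independent risks sharing the same sigma_P / mu_P."""
    return cv_single / math.sqrt(n)


def log_log_slope(n1, cv1, n2, cv2):
    """Slope m of Eq. (5) estimated from two (n, CV) points."""
    return (math.log10(cv2) - math.log10(cv1)) / (math.log10(n2) - math.log10(n1))


CV_SINGLE = 100.0  # hypothetical sigma_P / mu_P of one serving (illustrative only)

n_low, n_high = 10, 10**7
slope = log_log_slope(n_low, cv_of_sum(n_low, CV_SINGLE),
                      n_high, cv_of_sum(n_high, CV_SINGLE))
intercept = math.log10(cv_of_sum(n_low, CV_SINGLE)) - slope * math.log10(n_low)

print(f"slope m = {slope:.3f}")
print(f"intercept y = {intercept:.3f} (log10 of the single-serving CV)")
```

Whatever single-serving CV is chosen, the slope stays at −0.5 and only the intercept moves, which is why the slope is the quantity compared against simulated and epidemiological data.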

2.4. Strategies to study burden-of-illness as sum of risk distributions

With the aim of gaining insight into the idea introduced in this work, an extensive probabilistic exposure assessment model on L. monocytogenes in deli meat products focused on elderly people in the United States (Centers for Disease Control and Prevention, CDC, 2004) was used to estimate the number of annual cases by Monte-Carlo analysis. In addition, epidemiological data on the annual incidence of salmonellosis and listeriosis were collected from different surveillance systems. Both epidemiological data and Monte-Carlo analysis were employed to test whether the CV of burden-of-illness decreases as the number of exposures (n) increases, as stated by the CLT.

2.4.1. Monte-Carlo analysis

Concentration at retail taken from Chen et al. (2003) was the initial input in the exposure assessment model previously developed by Pérez-Rodríguez et al. (2007) (see Table 1). After simulation, concentration at consumption in contaminated servings, i.e. doses, was obtained. The dose–response model modified by Pérez-Rodríguez et al. (2007) was applied in the linear region (r=1.85·10⁻¹⁴) and the probability of getting ill was estimated by using the concentration at consumption and a point-estimate of serving size which corresponded to the mean value (64 g). The study by Pérez-Rodríguez et al. (2007) indicated that using a point-estimate for serving size does not significantly affect risk estimation. To estimate the number of annual cases (i.e., overall risk), the resultant distribution of individual probabilities of getting ill (individual risk) was summed n times by an iterative process (using Monte-Carlo analysis). The value of n corresponded to the number of contaminated servings consumed by the elderly population in the US, which was equal to 5.11·10⁷ servings. This value was obtained from the total number of servings consumed by the elderly population (2.84·10⁹) and the prevalence of L. monocytogenes in RTE cooked meat products (1.8%), whose values were taken from the studies by Food Drug Administration (FDA) (2003) and Chen et al. (2003), respectively. As the computing resources (i.e. RAM memory) were insufficient to perform the simulation with the highest number of iterations, i.e. n=5.11·10⁷, the total number of cases was extrapolated from the number of cases simulated at n=10⁷ based on the trend shown by each statistical parameter (see Fig. 2). For instance, the mean number of cases was estimated by multiplying by 5.11.

Table 1. Inputs used to estimate the annual burden-of-illness of listeriosis in elderly population.

| Variables | Type | Distribution/model | Source |
|---|---|---|---|
| Concentration at retail (log CFU/g) | Input | Normal (−2.27; 2.24)ᵃ | Chen et al. (2003) |
| Growth at home | Input | FDA/USDA 2003 model | Pérez-Rodríguez et al. (2007) |
| Serving size (g) | Input | 64 (mean) | Pérez-Rodríguez et al. (2007) |
| Doses distribution (log CFU/serving)ᵇ | Output | Normal (2.58; 2.07) | Output from simulated Food Drug Administration (FDA) (2003) model |
| Number of contaminated deli-meat servings | Input | 5.11·10⁷ ᶜ | Food Drug Administration (FDA) (2003) |
| r-value (elderly population) | Input | 1.85·10⁻¹⁴ (mean) | Pérez-Rodríguez et al. (2007) |

ᵃ Normal (μ; σ).
ᵇ Doses distribution for contaminated servings obtained by simulating the initial concentration at retail and growth at home.
ᶜ Estimation based on 20% total servings consumed by elderly and 1.8% being contaminated.
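The arithmetic behind n and the extrapolated mean can be reproduced in a few lines. This is only a sketch of the two calculations stated in the text: the inputs are the values quoted above plus the Monte-Carlo mean at n=10⁷ reported in Table 2, and the variable names are illustrative. How the other statistics were extrapolated is not specified beyond "the trend shown by each statistical parameter", so only the mean is covered.

```python
# Inputs quoted in the text and in Table 1.
TOTAL_SERVINGS_ELDERLY = 2.84e9  # servings per year consumed by the elderly (FDA, 2003)
PREVALENCE = 0.018               # fraction of contaminated servings (Chen et al., 2003)

# Number of contaminated servings, i.e. the number of exposures n.
n_contaminated = TOTAL_SERVINGS_ELDERLY * PREVALENCE

# Extrapolating the simulated mean from n = 1e7 to the full n.
N_SIMULATED = 1e7
MEAN_CASES_AT_N_SIMULATED = 5.66  # Monte-Carlo mean at n = 1e7 reported in Table 2

scale = n_contaminated / N_SIMULATED
mean_cases_extrapolated = MEAN_CASES_AT_N_SIMULATED * scale

print(f"n = {n_contaminated:.3e} contaminated servings")
print(f"scale factor = {scale:.2f}")
print(f"extrapolated mean = {mean_cases_extrapolated:.1f} cases/year")
```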

2.4.2. Analytical method: application of the Central Limit Theorem

The initial input was the doses distribution taken from the Monte-Carlo analysis. The doses distribution (log10 CFU/serving) was described by a normal distribution with parameters μlogD and σlogD as shown in Table 1. The dose–response model was defined by a straight line (r=1.85·10⁻¹⁴); therefore, calculations of the probability of getting ill could be performed by applying the properties of mean and variance to the normal distribution of doses (see Appendix A). If the logarithm is applied to the dose–response model in Eq. (1), then:

$$\log_{10}(P_{ill}) = \log_{10}(r) + \log_{10}(D) \tag{6}$$

Since the variance remains unchanged when a scalar value is added to a random variable, the distribution of log10(Pill), denoted by F(log10(Pill)), can be estimated according to the following expression:

$$F(\log_{10}(P_{ill})) = N\left(\mu_{\log D} + \log_{10}(r);\ \sigma_{\log D}\right) \tag{7}$$

For the sake of clarity, the distribution of log10(Pill) was rewritten as N(μlogP, σlogP), where μlogP=μlogD+log10(r) and σlogP=σlogD.

Arithmetic μP and σP from the distribution of log10(Pill) were estimated by using Eqs. (8) and (9), respectively, which were derived from the moment generating function of the lognormal distribution (Jawitz, 2004).

$$\log_{10}(\mu_P) = \mu_{\log P} + \frac{1}{2}\,\sigma_{\log P}^{2} \cdot \ln(10) \tag{8}$$

$$\log_{10}(\sigma_P^{2}) = 2 \cdot \log_{10}(\mu_P) + \log_{10}\left(e^{\sigma_{\log P}^{2} \cdot (\ln 10)^{2}} - 1\right) \tag{9}$$

Finally, based on the CLT, the distribution of the number of cases of listeriosis can be approximated as the sum of n N(μP, σP):

$$F(\text{cases/year}) = N\left(n \cdot \mu_P;\ \sqrt{n} \cdot \sigma_P\right) \tag{10}$$

n being the number of exposures (i.e. contaminated servings). The value for n was the same as that used by the Monte-Carlo analysis (n=5.11·10⁷).

2.4.3. Epidemiological data analysis

Incidence data of the food-borne diseases by Salmonella spp. and L. monocytogenes in different countries around the world were collected from international and national surveillance system databases. As it was not possible to obtain data on the number of exposures to the pathogens (n), population size was used instead, which is probably associated with the number of exposures. The selection of countries was based on criteria of population size (~10⁷–10⁸) and reliability of the surveillance system. When possible, data were taken from the same source in order to avoid additional sources of variation. Incidence data were expressed as confirmed number of cases per year. The CV was calculated for each country in the period 2002–2007 based on the mean population in that same period.

Table 2. Comparison between the Monte-Carlo (MC) analysis and Central Limit Theorem (CLT) for the number of cases of listeriosis at different numbers of contaminated servings or exposures (n).

| n | Mean (MC) | Mean (CLT) | Standard deviation (MC) | Standard deviation (CLT) | 95th percentile (MC) | 95th percentile (CLT) | Coefficient of Variation (MC) | Coefficient of Variation (CLT) |
|---|---|---|---|---|---|---|---|---|
| 10 | 4.59·10⁻⁶ | 1.47·10⁻⁶ | 2.39·10⁻⁴ | 4.13·10⁻³ | 1.68·10⁻³ | 6.80·10⁻³ | 5.21·10 | 2.80·10³ |
| 10² | 6.98·10⁻⁵ | 1.47·10⁻⁵ | 3.94·10⁻³ | 1.31·10⁻² | 2.68·10⁻⁴ | 2.15·10⁻² | 5.65·10 | 8.87·10² |
| 10³ | 4.72·10⁻⁴ | 1.47·10⁻⁴ | 8.71·10⁻³ | 4.13·10⁻² | 2.23·10⁻³ | 6.81·10⁻² | 1.85·10 | 2.80·10² |
| 10⁴ | 4.91·10⁻³ | 1.47·10⁻³ | 4.22·10⁻² | 1.31·10⁻¹ | 1.71·10⁻¹ | 2.16·10⁻¹ | 8.58 | 8.87·10 |
| 10⁵ | 5.34·10⁻² | 1.47·10⁻² | 4.20·10⁻¹ | 4.13·10⁻¹ | 1.19·10⁻¹ | 6.94·10⁻¹ | 7.86 | 2.80·10 |
| 10⁶ | 5.50·10⁻¹ | 1.47·10⁻¹ | 1.66 | 1.31 | 1.11 | 2.30 | 3.02 | 8.87 |
| 10⁷ | 5.66 | 1.47 | 8.27 | 4.13 | 9.64 | 8.27 | 1.46 | 2.80 |
| 5.11·10⁷ ᵃ | 2.83·10 | 7.53 | 4.22·10 | 9.34 | 4.92·10 | 2.29·10 | 1.66 | 1.24 |

ᵃ Due to computational restrictions, total number of cases for n=5.11·10⁷ were estimated by extrapolation based on the trend shown by each statistical parameter (e.g. the mean number of cases was estimated by multiplying by 5.11).

Fig. 1. Representation on logarithmic scale (log10) of the Coefficient of Variation (CV) of annual burden-of-illness obtained by the Monte-Carlo analysis at different numbers of exposures (n). Solid line corresponds to the CV trend based on the Central Limit Theorem (CLT) and the dashed line represents the fitted Eq. (3) to Monte-Carlo analysis data in range n=10⁵–10⁷. Axes: log (CV) against log (n).

Fig. 2. Representation on logarithmic scale (log10) of the standard deviation (A), mean (B) and 95th percentile (C) of annual burden-of-illness obtained by Monte-Carlo analysis at different numbers of exposures (n). Solid line corresponds to the trend of each parameter based on the Central Limit Theorem (CLT). Axes: log of each statistic against log (n).
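The analytical route of Section 2.4.2 (Eqs. (7)–(9)) can be sketched in code as follows. This is a minimal illustration that uses the untruncated dose distribution of Table 1, whereas the reported CLT results rely on truncated moments (see Section 2.6), so its output is not expected to reproduce Table 2. Function and variable names are illustrative.

```python
import math

LN10 = math.log(10.0)


def log10_risk_parameters(mu_log_d, sigma_log_d, r):
    """Eq. (7): adding log10(r) shifts the mean and leaves the spread unchanged."""
    return mu_log_d + math.log10(r), sigma_log_d


def arithmetic_moments(mu_log_p, sigma_log_p):
    """Eqs. (8)-(9): arithmetic mean and SD of P when log10(P) ~ N(mu_log_p, sigma_log_p)."""
    log10_mean = mu_log_p + 0.5 * sigma_log_p**2 * LN10            # Eq. (8)
    mean = 10.0 ** log10_mean
    variance = mean**2 * math.expm1((sigma_log_p * LN10) ** 2)     # Eq. (9), antilog form
    return mean, math.sqrt(variance)


# Table 1 values (untruncated doses distribution, log10 CFU/serving) and r-value.
MU_LOG_D, SIGMA_LOG_D = 2.58, 2.07
R_VALUE = 1.85e-14

mu_log_p, sigma_log_p = log10_risk_parameters(MU_LOG_D, SIGMA_LOG_D, R_VALUE)
mu_p, sigma_p = arithmetic_moments(mu_log_p, sigma_log_p)

print(f"mu_logP = {mu_log_p:.3f}, sigma_logP = {sigma_log_p:.2f}")
print(f"per-serving mean risk mu_P = {mu_p:.3e}")
print(f"per-serving CV sigma_P / mu_P = {sigma_p / mu_p:.3e}")
```

The very large single-serving CV produced by such a wide lognormal is what makes the number of exposures n so influential in Eq. (3).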

2.5. Comparing Monte-Carlo and epidemiological data to the Central Limit Theorem

In order to study the effect of n on the CV of the annual burden-of-illness, a range of different n values was used (10²–10⁸). The CV was, therefore, calculated by using the μ and σ of the distribution of annual cases obtained at each n and plotted for each type of analysis. The relationship between CV and n obtained by Monte-Carlo analysis and from epidemiological data was compared to the analytical method, i.e. the CLT represented mathematically by Eq. (5), which was fitted by regression analysis to data derived from both types of analysis.

2.6. Simulation method

To derive the distribution of number of cases (overall risk) in the Monte-Carlo analysis approach, the distribution of individual probabilities of getting ill expressed per serving (i.e. individual risk) was summed n times (i.e. total number of exposures) by an iterative process. This process was performed by simulating, by Monte-Carlo analysis, the individual risk distribution n times with 10,000 iterations each time and using different Random Number Generator seeds in each simulation. This procedure led to a very large number of iterations, which reached its maximum for n=10⁷, with 10¹¹ iterations. Simulations were performed on a Pentium IV 1.2 GHz Personal Computer using the Monte-Carlo method implemented in the MATLAB 7.7.0 Software (The MathWorks Inc. 2008).

In the case of the Monte-Carlo analysis, the truncate function used to describe a constrained doses distribution (0–9 log CFU/serving) was provided by @risk software (Palisade ©, Ithaca, USA). With respect to the analytical method, truncated mean and standard deviation for the doses distribution were derived by applying the algorithm written in R (Development Core Team 2006, http://www.R-project.org/) by Nadarajah and Kotz (2006).
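The summation scheme can be illustrated with a small standard-library sketch. It is not the authors' MATLAB/@risk implementation: truncation is done here by simple rejection sampling, the number of replicates is far smaller than 10,000, and all names and default settings are assumptions chosen so that the example runs in seconds.

```python
import random
import statistics

# Values from Table 1 and the truncation range quoted in the text.
MU_LOG_D, SIGMA_LOG_D = 2.58, 2.07   # doses distribution, log10 CFU/serving
LOW_LOG_D, HIGH_LOG_D = 0.0, 9.0     # constrained range, log10 CFU/serving
R_VALUE = 1.85e-14                   # linear dose-response slope


def sample_log_dose(rng):
    """Draw one log10 dose from the truncated normal by rejection sampling."""
    while True:
        x = rng.gauss(MU_LOG_D, SIGMA_LOG_D)
        if LOW_LOG_D <= x <= HIGH_LOG_D:
            return x


def simulate_cases(n, rng):
    """Sum the individual risks D * r (Eq. (1)) over n contaminated servings."""
    return sum(R_VALUE * 10.0 ** sample_log_dose(rng) for _ in range(n))


def cv_of_cases(n, replicates, seed=0):
    """CV of the simulated number of cases across independent replicates."""
    rng = random.Random(seed)
    totals = [simulate_cases(n, rng) for _ in range(replicates)]
    return statistics.stdev(totals) / statistics.mean(totals)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"n = {n:>5}: CV = {cv_of_cases(n, replicates=300):.2f}")
```

With so few replicates the CV estimates at small n are noisy, which mirrors the instability at low n discussed in the results.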

The regression analysis applied to epidemiological and simulation data was carried out using Excel 2003 software (Microsoft ®, Redmond, Wash.).

3. Results

3.1. Risk estimation: listeriosis cases per year

Monte-Carlo analysis resulted in a mean estimate of 29 annual cases of listeriosis, with 49 cases as the 95th percentile. On the other hand, the analytical method based on the CLT yielded a lower number of cases, with a mean and 95th percentile of 7 and 23 cases/year, respectively. In both approaches, the doses distribution was constrained to the range 1–10⁹ CFU/serving in order to prevent unrealistic doses (i.e. <1 CFU/serving). Table 2 shows the main statistics for the estimated number of listeriosis cases at different numbers of exposures (n) for both approaches. Data revealed that both approaches converged to similar values as n increased. Mean values were very similar for both approaches at lower n. However, standard deviation and 95th percentile values required a higher number of exposures (n) to converge. Results indicated that both approaches could be equivalent for estimating risk provided that some requirements are met, such as linearity in the dose–response model and normality in the microbial concentration distribution, as in this example.
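The CLT figures quoted above can be recovered from Eq. (10) with the standard library. In this sketch the per-serving μP and σP are back-calculated from the n=10 CLT row of Table 2 (mean divided by 10, standard deviation divided by √10), which is an assumption about how those entries scale; the names are illustrative.

```python
import math
from statistics import NormalDist

N_SERVINGS = 5.11e7                   # contaminated servings per year (Table 1)
MU_P = 1.47e-6 / 10                   # per-serving mean, from the n=10 CLT mean in Table 2
SIGMA_P = 4.13e-3 / math.sqrt(10)     # per-serving SD, from the n=10 CLT SD in Table 2


def clt_annual_cases(n, mu_p, sigma_p):
    """Eq. (10): normal approximation N(n * mu_P, sqrt(n) * sigma_P) for cases/year."""
    return NormalDist(mu=n * mu_p, sigma=math.sqrt(n) * sigma_p)


cases = clt_annual_cases(N_SERVINGS, MU_P, SIGMA_P)
p95 = cases.inv_cdf(0.95)

print(f"mean = {cases.mean:.2f} cases/year")
print(f"SD   = {cases.stdev:.2f} cases/year")
print(f"95th percentile = {p95:.1f} cases/year")
print(f"CV   = {cases.stdev / cases.mean:.2f}")
```

The mean, standard deviation and 95th percentile agree with the last CLT row of Table 2 to within rounding. Note that a normal approximation with a CV above 1 places some probability on negative case counts, one sign that convergence is still incomplete at this n.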

Table 3. Regression parameters and statistics describing log-linear decrease of the Coefficient of Variation (Eq. (3)ᵃ) fitted to data obtained from Monte-Carlo analysis and epidemiological data at different numbers of exposures (n).

| Illness | Data | m | Standard Error | p-value | Lower 95% | Upper 95% | R² | R² (m=−0.5)ᵇ |
|---|---|---|---|---|---|---|---|---|
| Listeriosis | Monte-Carlo | −0.37 | 0.06 | <0.01 | −0.51 | −0.22 | 0.81 | 0.61 |
| Listeriosis | Epidemiological | −0.20 | 0.07 | 0.11 | −0.56 | 0.15 | 0.38 | 0.00 |
| Salmonellosis | Epidemiological | −0.65 | 0.12 | 0.01 | −1.08 | −0.21 | 0.81 | 0.76 |

ᵃ log10(CV) = −0.5 log10(n) + log10(σP/μP).
ᵇ m parameter was fixed to −0.5.

3.2. High number of exposures reduces the uncertainty on burden-of-illness

As mentioned above, the CV of the burden-of-illness declines as a consequence of an increase in the number of exposures. This phenomenon, derived from applying the CLT, is mathematically described through a log-linear function given by Eq. (5) (with a slope m=−0.5), which is represented graphically in Fig. 1. Table 2 presents the mean and standard deviation of the numbers of cases, together with the calculated CV, derived from the Monte-Carlo analysis and the CLT at different numbers of exposures (n).

3.2.1. Behavior of the CV of the burden-of-illness at increasing n estimated by Monte-Carlo analysis

Overall, when different numbers of exposures (n) were studied by Monte-Carlo analysis, CV decreased as n became higher (Fig. 1), except for very low numbers of exposures (10–100), which did not show a clear decreasing trend. Generally, Monte-Carlo analysis does not yield reliable estimates when low numbers of samples are simulated, since the standard deviation of the simulated distribution has not yet stabilized (Vose, 2000). This known pitfall inherent to Monte-Carlo analysis could explain, in part, why at low numbers of exposures the CV, which is related to the standard deviation (Eq. (2)), did not present the expected pattern. When compared to the CLT (i.e. Eq. (5)), CV values from Monte-Carlo analysis were located far away from those derived by the CLT even though a similar trend was observed. These discrepancies between the two approaches were progressively reduced as the number of exposures increased (n≥10⁶). Convergence between both approaches cannot be observed at relatively low numbers of exposures because, according to the CLT, normality for the sum of variables is met when n approaches infinity, i.e. when n becomes enormously high. A heavy-tailed distribution like the lognormal distribution used in the simulation to describe the doses requires a much larger n to approximate the normal distribution. This phenomenon is evident in Fig. 2A, in which standard deviations from Monte-Carlo analysis and the CLT are represented together. Standard deviations are quite different at the beginning, when n is low; however, as n increases, values for both approaches become similar, which is consistent with that observed for the CV in Fig. 1, following the existing mathematical relationship between both statistical parameters (see Eq. (2)). In turn, for the mean, as expected, both approaches converged rapidly, since the CLT is especially effective for the central zone of the distributions (Fig. 2B). In the case of the 95th percentile, the behavior was similar to that shown by the standard deviation, which means that the 95th percentile only approximated the CLT at high numbers of exposures, as shown in Fig. 2C. Moreover, this statistic presented major variation, probably due to its great dependence on the sampling in the distribution tails.

Table 4. Epidemiological data used to estimate the Coefficient of Variation (CV) of the number of ca…

| Year | France | USA | The Netherlands |
|---|---|---|---|
| 2002 | 220 | 665 | 32 |
| 2003 | 209 | 696 | 52 |
| 2004 | 236 | 753 | 55 |
| 2005 | 221 | 896 | 91 |
| 2006 | 290 | 884 | 64 |
| 2007 | 319 | 808 | 68 |
| Mean | 249 | 784 | 60 |
| SDᵇ | 45 | 96 | 20 |
| CV | 0.18 | 0.12 | 0.32 |
| Population | 6.22·10⁷ | 2.90·10⁸ | 1.62·10⁷ |

Center for Disease Control and Prevention. Morbidity and Mortality Weekly Report Availab…
ᵃ European Centre for Disease Prevention and Control (ECDC) and the European Food Sa… food-borne outbreaks in the European Union (2002–2007). Available in http://www.efsa.eu…
ᵇ Standard deviation.

The trend for CV estimated byMonte-Carlo analysis was comparedto Eq. (3) through regression analysis. The estimated slope was m=−0.25 which was much higher than the theoretical value m=−0.5,derived from the CLT. As convergence is expected at high n, regres-sion was again performed but exclusively at high n levels (105–

107). In this case, although the regression line did not show a perfectconcordance with the CLT trend (Fig. 1), the estimated slope wasmuch closer to the value predicted by the CLT (i.e., m=−0.37). Like-wise, the confidence interval for slopem included−0.5 (see Table 3),thereby suggesting the possibility that the results of the Monte-Carloanalysis are in agreement with the CLT assumption.

Based on these results, the Monte-Carlo analysis could confirmthat the CV of burden-of-illness decreased, according to CLT, at in-creasing n values (number of exposures), although the range of valid-ity for the hypothesis based on the CLT should be set for high n(≥106).

3.2.2. Behavior of the CV of food-borne burden-of-illness reported bydifferent countries

Although the hypothesis might be tested using Monte-Carlo anal-ysis, it does not imply by itself that a hypothesis supported by theCLT reflects a phenomenon occurring in the real world. Hence, anattempt was made to test the hypothesis by analyzing epidemiolog-ical data considering the illness cases from both sporadic events andoutbreaks. For that, CVs calculated from the real number of annualcases of listeriosis and salmonellosis were represented against pop-ulation size of different selected countries (see Tables 4 and 5, re-spectively). Fig. 3 shows clearly a log-linear decrease for the calculatedCVs for both food-borne diseases which was in agreement with the

ses of listeriosis reported in period 2002–2007a.

rlands Belgium UK Germany

44 150 24076 243 25689 232 29662 223 51067 209 50857 261 35666 220 36116 38 1210.24 0.17 0.341.03 ∙107 6.00 ∙107 8.23 ∙107

le at www.cdc.gov/mmw.fety Authority (EFSA). Reports on trends and sources of zoonoses, zoonotic agents andropa.eu/en/scdocs/scpublications.htm.


Table 5
Epidemiological data used to estimate the Coefficient of Variation (CV) of the number of cases of salmonellosis reported in period 2002–2007 (a).

Year         France        USA           The Netherlands   Belgium       UK            Germany
2002         6575          44,264        1588              3630          16,547        72,377
2003         6199          43,657        2142              4916          18,069        63,044
2004         6352          42,197        1520              9545          14,809        59,947
2005         5877          45,322        1388              12,894        12,784        52,245
2006         6339          45,808        1667              9753          14,055        52,575
2007         5510          47,995        1245              3973          13,104        55,400
Mean         6142          44,874        1592              7452          14,895        59,265
SD (b)       386           1992          308               3806          2058          7690
CV           0.06          0.04          0.19              0.51          0.14          0.13
Population   6.22 × 10^7   2.90 × 10^8   1.62 × 10^7       1.03 × 10^7   6.00 × 10^7   8.23 × 10^7

(a) Centers for Disease Control and Prevention. Morbidity and Mortality Weekly Report. Available at www.cdc.gov/mmw. European Centre for Disease Prevention and Control (ECDC) and the European Food Safety Authority (EFSA). Reports on trends and sources of zoonoses, zoonotic agents and food-borne outbreaks in the European Union (2002–2007). Available at http://www.efsa.europa.eu/en/scdocs/scpublications.htm.
(b) Standard deviation.

418 F. Pérez-Rodríguez, M.H. Zwietering / International Journal of Food Microbiology 153 (2012) 413–419

CLT. Furthermore, data-points in Fig. 3 had a high variance. This was expected, since surveillance systems are recognized to introduce important sources of variation. Apart from this, the different consumption patterns (i.e., number of servings and sizes) between countries could be an additional source of variability affecting the data-points. Comparison with Eq. (3) by regression analysis showed a reasonable approximation to the CLT, which could be confirmed by looking at the confidence intervals for m and the coefficient of determination (R²) (see Table 3). In comparison to listeriosis, salmonellosis (Fig. 3) was fitted better by Eq. (3), thereby showing a greater similarity to the CLT. Values of the coefficient of determination (R² = 0.81) indicated a significant log-linear decrease, together with m very near

Fig. 3. Regression lines for the decimal logarithm of the Coefficient of Variation (CV) versus the decimal logarithm of the population size for salmonellosis (A) and listeriosis (B), based on Eq. (3), for the period 2002–2007 in selected countries with different population sizes. Axes: log (Population size), from 6.00 to 9.00 (x); log (CV), from −1.60 to 0.00 (y). Fitted lines: (A) y = −0.6471x + 4.0953, R² = 0.8085; (B) y = −0.203x + 0.895, R² = 0.3839.

to −0.5 (−0.65). Besides, it should be noted that the confidence interval (C.I.) includes −0.5 (95% C.I.: −1.08 to −0.21) and, in addition, when m was fixed to −0.5, the regression was still acceptable (R² = 0.76). With regard to the annual cases of listeriosis, the log-linear decrease was not significant, as shown by the t-test reported in Table 3 (P = 0.1). When m was fixed to −0.5, a significant R² could not be obtained, though m = −0.5 was within the C.I. obtained by the regression analysis (95% C.I.: −0.56 to 0.15). This poorer fit by Eq. (3) could be due to the higher dispersion shown by the data. Apart from the probable sources of variation affecting the reported listeriosis cases, a possible explanation is that the real number of exposures was not large enough to meet the CLT. However, as the real number of exposures to L. monocytogenes (contaminated servings) is unknown, such a hypothesis is hard to test. Accordingly, these results should not be considered conclusive, as they are strongly dominated by sources of variance and uncertainty that are difficult to isolate or reduce. Nevertheless, the results did shed light on the decreasing pattern of the CV of the burden-of-illness when high numbers of exposures take place.
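The regression of log (CV) on log (population size) can be retraced from the annual counts in Tables 4 and 5. The following is a minimal sketch under stated assumptions: the CV is taken as the sample standard deviation (n − 1 denominator) divided by the mean, an ordinary least-squares line is fitted with NumPy, and the function name `loglog_fit` is hypothetical. It is not the authors' own procedure, but the fitted slopes should land close to the lines reported in Fig. 3.

```python
import numpy as np

# Population sizes and annual cases (2002-2007) as listed in Tables 4 and 5
population = {"France": 6.22e7, "USA": 2.90e8, "The Netherlands": 1.62e7,
              "Belgium": 1.03e7, "UK": 6.00e7, "Germany": 8.23e7}

salmonellosis = {
    "France": [6575, 6199, 6352, 5877, 6339, 5510],
    "USA": [44264, 43657, 42197, 45322, 45808, 47995],
    "The Netherlands": [1588, 2142, 1520, 1388, 1667, 1245],
    "Belgium": [3630, 4916, 9545, 12894, 9753, 3973],
    "UK": [16547, 18069, 14809, 12784, 14055, 13104],
    "Germany": [72377, 63044, 59947, 52245, 52575, 55400],
}

listeriosis = {
    "France": [220, 209, 236, 221, 290, 319],
    "USA": [665, 696, 753, 896, 884, 808],
    "The Netherlands": [32, 52, 55, 91, 64, 68],
    "Belgium": [44, 76, 89, 62, 67, 57],
    "UK": [150, 243, 232, 223, 209, 261],
    "Germany": [240, 256, 296, 510, 508, 356],
}

def loglog_fit(cases, population):
    """Fit log10(CV) = m * log10(population) + b; return (m, b, R^2)."""
    countries = list(cases)
    # CV per country: sample standard deviation over the mean of annual cases
    cv = np.array([np.std(cases[c], ddof=1) / np.mean(cases[c]) for c in countries])
    x = np.log10([population[c] for c in countries])
    y = np.log10(cv)
    slope, intercept = np.polyfit(x, y, 1)
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return slope, intercept, r2

slope_s, intercept_s, r2_s = loglog_fit(salmonellosis, population)
slope_l, intercept_l, r2_l = loglog_fit(listeriosis, population)

print(f"salmonellosis: m={slope_s:.3f}, b={intercept_s:.3f}, R2={r2_s:.3f}")
print(f"listeriosis:   m={slope_l:.3f}, b={intercept_l:.3f}, R2={r2_l:.3f}")
```

With only six countries per disease, each fitted slope depends heavily on individual data-points (for example, Belgium for salmonellosis), which is in line with the wide confidence intervals quoted above.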

4. Conclusions

In many areas, the Central Limit Theorem is used as a first approach to understanding phenomena from a global perspective (e.g. economic sciences). Interpreting reality is always complex, and general rules can be helpful to extract basic and useful information. This was the main purpose of this work, in which an attempt was made to study microbiological risk assessment from a Public Health angle. The hypothesis suggested in this work was that “annual variation in the number of food-borne illness cases is reduced as the result of major exposure intensity (n)”. Classical statistics, Monte-Carlo analysis, and epidemiological data were successfully combined to demonstrate that the Central Limit Theorem can be a plausible theoretical basis for such a phenomenon. In general, the results did show a clear decreasing trend in the Coefficient of Variation of the number of annual cases as n increases. In contrast, the regression analysis applied to the Monte-Carlo analysis and epidemiological data did not reproduce the exact mathematical equation given by the Central Limit Theorem, which was especially evident for the listeriosis epidemiological data. Nevertheless, the regression analysis confidence intervals indicated a reasonable convergence to the Central Limit Theorem. Additional sources of uncertainty coming from both the simulation methods and the food-borne outbreak reporting systems are probably responsible for the lack of accuracy and precision in the data. On the other hand, the present study shows that classical statistical methods can be helpful to provide sound probabilistic risk estimation based on simple and well-established statistical principles. The methods applied here gave similar results to the Monte-Carlo analysis when linear models are used. Nevertheless, further analysis should be carried out to evaluate more exactly how classical statistical concepts can be applied to food microbiological risk assessment, and which mathematical requirements or conditions should be met to provide valid risk estimations or results equivalent to the Monte-Carlo analysis approach.

Acknowledgments

This work was partly financed by a FEMS Research Fellowship (2009-2) awarded by the Federation of European Microbiology Societies (FEMS); the MICINN AGL2008-03298/ALI; the Excellence Project AGR-01879 (Junta de Andalucía) and by the Research Group AGR-170 HIBRO of the “Plan Andaluz de Investigación, Desarrollo e Innovación” (PAIDI).

Appendix A. Properties of the mean and variance

According to the properties of variance, when two or more uncorrelated variables are summed, the variance of the resultant variable can be calculated as the sum of the variances of each variable (see Eq. (A1)). If the variables have the same variance, the resultant variance can be calculated by multiplying the variance value by a scalar corresponding to the number of variables in the sum (see Eqs. (A1) and (A2)).

$$ V\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} V(X_i) \tag{A1} $$

So, if the variables (Xi) have equal variance, then

$$ V\left(\sum_{i=1}^{n} X_i\right) = n \cdot V(X_i) \tag{A2} $$

Besides, the mean of a sum of variables can be estimated by using Eq. (A3) or (A4) in a similar fashion to the variance. It is also important to mention that the variance and mean properties (also called the method of moments) hold for any distribution, as long as the moments are finite and the variables are independent or uncorrelated.

$$ \mu\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \mu(X_i) \tag{A3} $$

$$ \mu(aX) = a \cdot \mu(X) \tag{A4} $$
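As a worked step, the properties above can be combined to give the CV of a sum of n uncorrelated variables sharing the same mean μ(X) and variance V(X). The following is a sketch of that step, assuming identical moments for all variables; the symbol S_n is introduced here only for illustration, and the result is presumably the relationship behind the theoretical slope of −0.5 discussed in the main text.

```latex
\begin{align*}
S_n &= \sum_{i=1}^{n} X_i \\
\mathrm{CV}(S_n) &= \frac{\sqrt{V(S_n)}}{\mu(S_n)}
                  = \frac{\sqrt{n \cdot V(X)}}{n \cdot \mu(X)}
                  = \frac{\mathrm{CV}(X)}{\sqrt{n}} \\
\log \mathrm{CV}(S_n) &= \log \mathrm{CV}(X) - 0.5 \, \log n
\end{align*}
```

The numerator follows from Eq. (A2) and the denominator from Eq. (A3) with equal means, so a plot of log CV against log n is a straight line with slope −0.5 and an intercept set by the CV of a single variable.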

References

Centers for Disease Control and Prevention (CDC), 2004. 2003 Surveillance Report. http://www.cdc.gov/foodnet/annual/2003/2003_report.pdf. Accessed February 10, 2010.

Chen, Y., Ross, W.H., Scott, V.N., Gombas, D.E., 2003. Listeria monocytogenes: low levels equal low risk. Journal of Food Protection 66, 570–577.

Food and Drug Administration (FDA), 2003. Quantitative Assessment of Relative Risk to Public Health from Foodborne Listeria monocytogenes among Selected Categories of Ready-to-Eat Foods. http://www.foodsafety.gov/~dms/lmr2-toc.html. Accessed August 15, 2009.

Gale, P., 2005. Matrix effects, nonuniform reduction and dispersion in risk assessment for Escherichia coli O157. Journal of Applied Microbiology 99, 259–270.

Haas, C.N., Rose, J.B., Gerba, C.P., 1999. Quantitative Microbial Risk Assessment. John Wiley & Sons, Inc., New York, NY.

Jawitz, J.W., 2004. Moments of truncated continuous univariate distributions. Advances in Water Resources 27, 269–281.

Nadarajah, S., Kotz, S., 2006. R Programs for computing truncated distributions. Journal of Statistical Software 16, 1–8.

Nauta, M.J., 2000. Separation of uncertainty and variability in quantitative microbial risk assessment models. International Journal of Food Microbiology 57, 9–18.

Pérez-Rodríguez, F., van Asselt, E.D., García-Gimeno, R.M., Zurera, G., Zwietering, M.H., 2007. Extracting additional risk managers information from a risk assessment of Listeria monocytogenes in deli meats. Journal of Food Protection 70, 1137–1152.

Powell, M., Ebel, E., Schlosser, W., 2001. Considering uncertainty in comparing the burden of illness due to foodborne microbial pathogens. International Journal of Food Microbiology 69, 209–215.

Sornette, D., 2006. Critical Phenomena in Natural Sciences: Chaos, Fractals, Selforganization and Disorder: Concepts and Tools. Springer-Verlag Heidelberg, New York.

Vose, D., 2000. Risk Analysis: A Quantitative Guide. John Wiley & Sons, Inc., New York, NY.

Zwietering, M.H., 2009. Quantitative risk assessment: is more complex always better? Simple is not stupid and complex is not always more correct. International Journal of Food Microbiology 134, 57–62.