Reply to comments on “Derivation of numerical values for the World Health Organization guidelines for recreational waters”

ARTICLE IN PRESS

Available at www.sciencedirect.com

WATER RESEARCH 40 (2006) 1921–1925

0043-1354/$ - see front matter © 2006 Elsevier Ltd. All rights reserved. doi:10.1016/j.watres.2006.02.009

journal homepage: www.elsevier.com/locate/watres

Discussion

Reply to comments on “Derivation of numerical values for the World Health Organization guidelines for recreational waters”

David Kay^a,*, Nick Ashbolt^b, Mark D. Wyer^a, Jay M. Fleisher^c, Lorna Fewtrell^a, Alan Rogers^c, Gareth Rees^d

^a River Basin Dynamics and Hydrology Research Group, IGES, University of Wales, Aberystwyth, SY23 3DB, UK
^b School of Civil and Environmental Engineering, University of New South Wales, Sydney, Australia
^c Centre for Research into Environment and Health, University of Wales, Lampeter, SA48 8HU, UK
^d Askham Bryan College, York, UK

Article info

Article history: Received 31 August 2005; received in revised form 6 February 2006; accepted 15 February 2006; available online 4 April 2006

Keywords: Bathing water; Recreational water; Water quality criteria; Standards; Faecal indicators; Enterococci; Coliform; Microbiology; Epidemiology; Evidence based; Randomised protocol; Ecological design


*Corresponding author. E-mail address: [email protected] (D. Kay).

Abstract

The contribution addressed reveals an optimistic design philosophy likely to systematically

underestimate risk in epidemiologic studies into the health effects of bathing water exposures.

The authors seem to recommend that data on the ‘exposure’ measure (i.e. water quality) in

such studies should be acquired in a similar manner to that used for regulatory sampling. This

approach may compromise the quality of the epidemiologic investigations undertaken. It may

result in imprecise estimates of exposure because it ignores the fact that regulatory timescales

and spatial resolution (even if artificially compressed to a bathing day) can mask large spatial

and temporal variability in water quality. If this variability is ignored by taking some mean value

and attributing that to all of those exposed in a period at a study location, many bathers may be

misclassified and the studies may be biased to a ‘no-effect’ conclusion. A more appropriate

approach is to maximise the precision of the epidemiologic investigations by measurement of

individual exposure (or water quality) at the place and time of the exposure, as has been done

in randomised volunteer studies in the UK and Germany. The precise epidemiologic

relationships linking ‘exposure’ with ‘illness’ can then be related to the probability of exposure

to particular water quality by a ‘normal bather’ using the known probability distribution of the

exposure variable (i.e. faecal indicator concentration) in the regulated bathing waters. We

suggest that any research protocol where poor sampling design for water quality assessment is

justified because regulatory monitoring is equally imprecise may be fundamentally flawed. The

rationale for this assessment is that the epidemiology is the starting point and evidence-base

for ‘standards’. If precision is not maximised at this stage in the process it compromises the

credibility of the standards design process. The negative effects of the approach advocated in

this ‘comment’ are illustrated using published research findings used to derive the figures

illustrated in Wymer et al. [2005. Comment on derivation of numerical values for the World

Health Organization guidelines for recreational waters. Water Research 39, 2774–2777].

© 2006 Elsevier Ltd. All rights reserved.


Figure 2 – Recalculated dose-response relationships for the three study sites (Boston, New York City and Lake Pontchartrain) used in the original USEPA investigations reported in Cabelli et al. (1982) (from Fleisher 1992, p. 123, Fig. 9.3). Axes: enterococci density (cfu per 100 ml, 0–500) against probability of gastroenteritis (0.0–0.5).


1. Introduction

Wymer et al. (2005) present an analysis and comment on the risk models used to underpin the numerical water quality criteria published in WHO (2003), which also form the basis of the ‘good’ standards for intestinal enterococci outlined in the draft revisions of the European Union (EU) Bathing Water Directive (CEC, 2000, 2002, 2004). They calculate what they term ‘ecological risk’ using the ‘personal exposure’ risk equation published in Kay et al. (1994). By ‘ecological risk’ they imply some longer-term measure of water quality, for example a compliance measure which, in the EU, might be 20 samples taken over a bathing season. By ‘personal risk’ they mean the water quality measured at the time and place of exposure, as measured in the UK epidemiologic studies which employed a randomised trial protocol (Fleisher et al., 1996; Kay et al., 1994) advocated previously by WHO (1972).

This allows them to construct Fig. 1 (reproduced below), which relates the geometric mean enterococci level at a beach (measured over a period of time) to the excess risk of gastroenteritis. They make the qualitative observation that the slopes of the two curves derived from the UK and US epidemiologic studies appear similar. This claimed ‘similarity’ is further reinforced by the apparently similar relative risks of the two investigations outlined in Wymer et al.’s (2005) Fig. 2.

In constructing Fig. 1, Wymer et al. (2005) imply that the US epidemiologic studies (Cabelli et al., 1982) used an ‘ecological’ measure of exposure rather than a ‘personal’ level of exposure.

They go on to state:

1. Although the personal exposure assessment of the

original UK model has theoretical interest, it has little

regulatory or advisory value in its raw form given that

knowledge of a bathers specific exposure level is virtually

unobtainable

and

2. Simply inserting a mean exposure value into the UK personal exposure model is likely to result in bias in the opposite direction, overestimating the increase in overall risk

Figure 1 – Predicted excess risk of gastroenteritis (from Wymer et al., 2005, p. 2775, Fig. 1). Curves: EPA HCGI, EPA GI, UK personal exposure and UK ecologic exposure. Axes: enterococci per 100 mL (1–1000, logarithmic) against excess risk of gastroenteritis (0.001–1, logarithmic), with 0.05 marked.

They then conclude:

3. Marine and freshwater studies that have been conducted by the USEPA were designed to predict expected incidence of illness given monitoring results that are available in practice, i.e. mean indicator levels based on sampling. When a research design utilises these same water sampling techniques and involves health surveys on the target population … modelling is simplified.

Wymer et al. (2005) also make minor critical points, such as the assumption of a uniform standard deviation (SD) for bathing water log10 enterococci concentration by WHO (2003) and the lack of confidence intervals on the original risk model published in Kay et al. (1994).

2. Responses

2.1. The SD assumption

The utilisation of a uniform SD is required if a consistent ‘Guideline’ value is to be published (in terms of geometric mean (GM) or some percentile value). The alternative approach, which was explored in Wyer et al. (1999), is to set an ‘acceptable’ risk level of, say, 5% additional illness. In this pure ‘risk’ approach, the regulator would set the risk level, and the risk at each beach would be calculated from the standard deviation and mean log10 faecal indicator value for that beach. Following a series of consultations and meetings of WHO international technical advisers between 1996 and 2002, it was decided that a pure ‘risk’ approach utilising both the GM and SD would cause confusion and that a single parametric value was needed if an international ‘Guideline’ was to be published, i.e. the 95th percentiles (95%ile) for intestinal enterococci outlined in Chapter 4 of WHO (2003). The 95%ile of 200 intestinal enterococci cfu 100 ml⁻¹ approximates to a 5% excess illness rate (which in fact is associated with a 95th percentile of 184 intestinal enterococci cfu 100 ml⁻¹) assuming a SD in log10 intestinal enterococci of 0.8103. This value was derived from an earlier study of over 11,000 European bathing waters for which >121,000 enterococci enumerations were available in 1993 and 1994. It is the SD of the log10 values of each enumeration and is therefore wider than might be expected as a value for an individual beach. The WHO meeting of experts in Jersey in 1997 received a sensitivity analysis of the constant SD assumption; for example, a SD of 0.7 and a 95%ile of 200 implies an excess GI risk of 6.2%; a SD of 0.6 and a 95%ile of 200 implies a risk of 7.2%.
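The direction of this sensitivity follows from the log10-normal assumption: for a fixed 95%ile, a narrower SD implies a higher geometric mean, so more of the distribution sits at concentrations associated with illness. The following is a minimal sketch in Python, assuming a log10-normal distribution of enterococci. The function name is illustrative and is not taken from WHO (2003). It reproduces only the GM implied by each SD, not the risk percentages, which also require the dose-response function.

```python
import math
from statistics import NormalDist


def geometric_mean_from_p95(p95: float, sd_log10: float) -> float:
    """Geometric mean (cfu per 100 ml) of a log10-normal distribution
    that has the given 95th percentile and SD of log10 concentration."""
    z95 = NormalDist().inv_cdf(0.95)  # standard normal 95% point, about 1.645
    return 10 ** (math.log10(p95) - z95 * sd_log10)


if __name__ == "__main__":
    # SD values quoted in the text; the 95%ile is held at 200 cfu per 100 ml.
    for sd in (0.8103, 0.7, 0.6):
        print(sd, round(geometric_mean_from_p95(200.0, sd), 1))
```

With a 95%ile of 200, the implied GM is roughly 9 cfu per 100 ml at a SD of 0.8103, 14 at a SD of 0.7 and 21 at a SD of 0.6, which is consistent in direction with the rising risk estimates quoted above.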

2.2. Confidence intervals on the logistic regression function

Confidence intervals were not commonly reported in logistic regression analyses in the medical literature at the time of the Kay et al. (1994) paper in The Lancet, but analyses including such confidence intervals have been reported subsequently (Kay et al., 2001). This point does, however, raise the question of how the regulatory community should deal with such confidence intervals: should it adopt a precautionary principle and utilise the upper 95% confidence limit in deriving ‘standards’, or utilise the logistic function itself? Most authorities worldwide have adopted the latter approach, as did WHO; again, this was debated during the WHO expert consultations.

2.3. Protocol design philosophy

The points 1–3 listed in Section 1 above appear to make two

criticisms (1 and 2) and suggest a solution (point 3). In fact the

model outlined in Kay et al. (2004) has never been used in this

manner in the standards design process and we would agree

with Wymer et al. (2005) that it would be inappropriate for it

to be used in this manner. To clarify this point, the function

published in the Lancet in 1994 has not been used in its ‘raw’

state in the derivation of Guidelines as implied in point 1 and

it has not been used in the context of a ‘mean’ exposure as

implied in point 2. It predicts the probability of illness from a

single exposure. However, exposure is a ‘probabilistic’ event

that depends on the distribution (i.e. mean and SD of

enterococci in the bathing water) or, in other words, the

probability density function describing water quality. This is

explained in detail in Wyer et al. (1999), Kay et al. (2004) and

WHO (2003). This approach also facilitates calculation of a

disease burden as explained in Kay et al. (2004) and Wyer et al.

(1999).
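The calculation described above, in which the single-exposure function is weighted by the probability density of water quality, can be sketched as follows. This is a minimal illustration and not the WHO (2003) implementation: the dose-response function `hypothetical_excess_risk` and all of its parameters are placeholders invented for the example (the fitted Kay et al. (1994) coefficients are not reproduced here), and a log10-normal distribution of enterococci is assumed.

```python
import math
from statistics import NormalDist
from typing import Callable


def hypothetical_excess_risk(conc: float) -> float:
    """Placeholder single-exposure excess risk at `conc` cfu per 100 ml.
    NOT the Kay et al. (1994) model: threshold, slope and cap are invented."""
    threshold = 30.0
    if conc <= threshold:
        return 0.0
    return min(0.4, 0.02 * math.sqrt(conc - threshold))


def expected_excess_risk(
    mean_log10: float,
    sd_log10: float,
    dose_response: Callable[[float], float] = hypothetical_excess_risk,
    n_steps: int = 4000,
    z_max: float = 6.0,
) -> float:
    """Average the single-exposure risk over a log10-normal water quality
    distribution, using the trapezoidal rule on the standardised variable z."""
    std_normal = NormalDist()
    step = 2.0 * z_max / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        z = -z_max + i * step
        conc = 10 ** (mean_log10 + sd_log10 * z)
        weight = 0.5 if i in (0, n_steps) else 1.0  # trapezoid end points
        total += weight * dose_response(conc) * std_normal.pdf(z)
    return total * step
```

Given the mean and SD of log10 enterococci at a beach, `expected_excess_risk` returns the average excess risk for a bather whose exposure is drawn at random from that distribution, which is how a single-exposure function can be related to a ‘normal bather’ without inserting a mean concentration into it.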

The third point makes the potentially dangerous suggestion that epidemiologic studies should apply water quality sampling as required by regulatory agencies to assess exposure. This is ‘dangerous’ because such studies will underestimate risk due to systematic misclassification bias. The studies which underpin the WHO Guidelines sought to maximise precision of the epidemiologic data by (i) measuring ‘exposure’ (i.e. water quality) as close to the actual individual bather as possible, and (ii) using a ‘randomised healthy volunteer’ protocol which facilitated extensive data acquisition on potential confounding factors. Clearly, the pattern of data acquisition in such studies is not the same as would be utilised in a regulatory sampling regime. The spatial and temporal data must be much more detailed if exposure is to be measured with sufficient accuracy to facilitate the credible logistic regression modelling required to produce the type of illness-exposure relationship reported in Kay et al. (1994) and Fleisher et al. (1996).

If the acquisition of ‘exposure’ (water quality) data in epidemiologic studies were to mirror the regime for regulatory samples (e.g. possibly five samples taken over a month for regulatory compliance assessment), it would have significant negative implications for the scientific quality of the evidence-based approach. The reason for this is that the ‘unit of exposure’ becomes the mean value of water quality over a relatively long period when, in fact, faecal indicator concentrations at bathing beaches vary rapidly, commonly by log10 orders over short distances and durations such as a bathing day (Crowther et al., 2001; Noble et al., 2003; Whitman and Nevers, 2004). This is clearly recognised by recent US studies which have sought to use intensive spatial sampling (at 20 m intervals), GIS techniques and GPS location of bathers to better define ‘exposure’ location (Sams et al., 2004). If a daily mean were sufficient to characterise exposure, such spatial sampling precision would not be needed. Thus, if ‘ecological data’ are used to define the measure of exposure, as suggested by Wymer et al. (2005), individual bathers with a single exposure can be seriously ‘misclassified’ as to their exposure status, reducing the precision of the exposure-response models derived from such a flawed experimental design. The effect of misclassification bias is to increase the probability of producing a ‘no significant relationship’ conclusion.
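The direction of this bias can be illustrated with a small simulation. The sketch below is not an analysis of any study cited here: all parameters (the between-day and within-day SD of log10 concentration, and a linear outcome with a slope of 1) are invented, and an ordinary least-squares fit to a continuous outcome stands in for the logistic models used in practice. Each simulated bather's outcome depends on the water quality at their own place and time. Regressing that outcome on a single sample taken elsewhere on the same day recovers only a fraction of the true slope.

```python
import random
from typing import List, Tuple


def ols_slope(x: List[float], y: List[float]) -> float:
    """Slope of the ordinary least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_slopes(
    n_days: int = 2000,
    bathers_per_day: int = 10,
    sd_between: float = 0.5,   # invented day-to-day SD of log10 concentration
    sd_within: float = 0.5,    # invented within-day (space and time) SD
    true_slope: float = 1.0,   # invented effect per log10 unit
    seed: int = 1,
) -> Tuple[float, float]:
    """Return (slope using personal exposure, slope using one daily sample)."""
    rng = random.Random(seed)
    personal, assigned, outcome = [], [], []
    for _ in range(n_days):
        day_level = rng.gauss(1.5, sd_between)
        # One sample per day, taken at some other place and time on the beach.
        day_sample = day_level + rng.gauss(0.0, sd_within)
        for _ in range(bathers_per_day):
            exposure = day_level + rng.gauss(0.0, sd_within)
            personal.append(exposure)
            assigned.append(day_sample)
            outcome.append(true_slope * exposure + rng.gauss(0.0, 1.0))
    return ols_slope(personal, outcome), ols_slope(assigned, outcome)
```

Under these assumptions the slope fitted to the assigned exposure is expected to shrink by the factor sd_between² / (sd_between² + sd_within²), i.e. to about half of the true slope with the default values, while the slope fitted to personal exposure stays close to the true value.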

In reality, previous US investigations did not use ‘regulatory’ periods and sampling protocols; rather, they used the ‘bathing day’ as the unit of exposure, generally calculating a geometric mean faecal indicator concentration for the day and along a stretch of beach (Cabelli et al., 1982). The large spatial and temporal variation in faecal indicators during any bathing day makes this a very imprecise measure of bather exposure with which to calibrate either a logistic or least-squares regression model.

2.4. Implications of the Wymer et al. (2005) design approach; lessons from history

The UK studies reported in Kay et al. (1994) and Fleisher et al. (1996) adopted a ‘randomised volunteer’ protocol in preference to the US ‘prospective cohort’ design of Cabelli et al. (1982), in which bathers and non-bathers are self-selecting and are recruited after the exposure. The latter approach has been shown to have serious protocol weaknesses, which were outlined in Fleisher (1992). There is insufficient space for a full critique, but we illustrate this with two examples.

First, the three study locations used in the studies reported in Cabelli et al. (1982) exhibited very different dose-response relationships. Fleisher (1992) calculated an exposure-response relationship for each of the study locations; these are reproduced here as Fig. 2. It is impossible to assess whether this pattern was produced by extensive misclassification bias but, in any event, combination of these three very different relationships to form the scientific basis for standards may not be appropriate.

Second, the grouping method used in the analysis of the original data published in Cabelli et al. (1982) may have produced or affected the characteristics of the reported relationships. The original data covered 118 trial days split into 81 at New York City beaches, 31 at Lake Pontchartrain, LA and 6 at Boston, MA. These were grouped on the basis of subjectively identified ‘natural breaks’ into 18 data points which formed the basis for Cabelli et al.’s analysis. Fleisher (1992) notes:

It is of considerable interest that three data points were arbitrarily dropped by the authors of the EPA study in the regression analysis for Highly Credible GI symptoms when using the rate ratio as the dependent variable but were included in the regression analysis that used the rate difference as the dependent variable. Two of the three data points that were omitted corresponded to trial clusters that had no reported GI symptoms among non-swimmers. (The third was omitted due to an unusually low non-swimmer rate). An alternative to exclusion would be to use the average rate of GI symptoms reported among non-swimmers for the year and location of these two missing data points. This method would yield expected non-swimmer rates of 4.5 and 13.3 per thousand for these two omitted data points. … Table 9.3 shows that, when the data for Highly Credible symptoms are re-analyzed in this manner, the regression coefficient changed considerably and the equation is no longer significant (p > 0.05). Although it can be argued that the methods used to derive the analyses … are also arbitrary, the striking differences between this analysis and that reported by the EPA study highlight the enormous effect that can be caused by minor manipulation of the data. (Fleisher, 1992, pp. 119–120)

This analysis and those reported in Fleisher (1990) and Fleisher et al. (1993) cast some doubt on Wymer et al.’s (2005) Fig. 2 but, more importantly, illustrate the impact of excluding a few data items from a subjectively grouped data set. The potential impacts of alternative, and unbiased, grouping procedures remain unquantified but may well be even more significant.
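The passage quoted from Fleisher (1992) notes that two of the omitted data points had no reported GI symptoms among non-swimmers. One reading of why such clusters could enter the rate-difference regression but not the rate-ratio regression is purely arithmetic: a rate ratio is undefined when the non-swimmer rate is zero, whereas a rate difference is not. The following is a minimal sketch of that point. The swimmer rate of 20 per thousand is invented for illustration, while 4.5 per thousand is one of the substituted non-swimmer rates quoted above.

```python
from typing import Optional


def rate_difference(swimmer_rate: float, non_swimmer_rate: float) -> float:
    """Excess rate among swimmers (same units as the inputs)."""
    return swimmer_rate - non_swimmer_rate


def rate_ratio(swimmer_rate: float, non_swimmer_rate: float) -> Optional[float]:
    """Swimmer rate divided by non-swimmer rate, or None when the
    non-swimmer rate is zero and the ratio is undefined."""
    if non_swimmer_rate == 0:
        return None
    return swimmer_rate / non_swimmer_rate


if __name__ == "__main__":
    swimmer_rate = 20.0  # per thousand; hypothetical value for illustration
    print(rate_difference(swimmer_rate, 0.0), rate_ratio(swimmer_rate, 0.0))
    print(rate_difference(swimmer_rate, 4.5), rate_ratio(swimmer_rate, 4.5))
```

Substituting an average non-swimmer rate, as Fleisher (1992) describes, makes the ratio computable for those clusters, so they can be retained in both regressions.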

3. Conclusions

There are circumstances where the exposure status (i.e. bather or non-bather) has to be self-selecting for obvious reasons, such as the skill level required to participate in certain water sports such as slalom canoeing. In such circumstances the prospective cohort protocol suggested by Wymer et al. (2005) is probably the most appropriate approach for epidemiologic studies, and it has been applied in such circumstances in the UK. To that extent, the two approaches discussed here can be considered complementary.

However, to base the sampling regime for any epidemiologic study in this area on the approach used for regulatory data acquisition will always produce very imprecise ‘exposure’ measures in any environment where spatial and temporal variability in faecal indicator concentrations is the norm. This makes model calibration difficult and produces bias towards the ‘no effect’ conclusion. This presents a threat to the success of recreational water epidemiologic investigations and, more importantly, it will reduce the potential for credible health evidence-based standards to result from public investments in this area.

Attempts to derive such standards from the early US

studies, which applied this protocol and philosophy, have

been flawed for the reasons outlined above and this has been

known since the early 1990s. The WHO expert advisers were

fully aware of these analyses in their deliberations between

1996 and 2002 which led to the Guidelines published in 2003.

The WHO approach employed in the derivation of the 2003

Guidelines can be summarised as follows:

1. Review the international literature and subject the review to expert consultation and international peer review; this was done in 1998 in the International Journal of Epidemiology.

2. Choose the most precise studies and facilitate a series of expert meetings to design evidence-based Guidelines, whenever possible submitting the process to further international peer review.

3. Circulate a draft for further discussion and peer review.

4. Consult extensively worldwide.

5. Publish the Guidelines.

We feel this process has been undertaken meticulously and welcome the opportunity to address the comment by Wymer et al. (2005).

References

Cabelli, V.J., Dufour, A.P., McCabe, L.J., Levin, M.A., 1982. Swimming-associated gastroenteritis and water-quality. Am. J. Epidemiol. 115, 606–616.

CEC, 2000. Council of the European Communities. Directive 2000/60/EC of the European Parliament and of the Council of 23 October 2000 establishing a framework for Community action in the field of water policy. Official J. Eur. Communit. L327, 1–72.

CEC, 2002. Council of the European Communities. Proposal of the European Parliament and of the Council concerning the quality of bathing waters COM (2002) 581 final, Brussels 24.10.02.

CEC, 2004. Council of the European Communities. Amended proposal for a Directive of the European Parliament and of the Council concerning the management of bathing water quality. Brussels, 23rd June 2004.

Crowther, J., Kay, D., Wyer, M.D., 2001. Relationships between microbial water quality and environmental conditions in coastal recreational waters: The Fylde coast, UK. Water Res. 35, 4029–4038.

Fleisher, J., 1990. The effects of measurement error on previously reported mathematical relationships between indicator organism density and swimming associated illness, a quantitative estimate of the resulting bias. Int. J. Epidemiol. 19, 1100–1106.

Fleisher, J., 1992. US Standards: a re-analysis. In: Kay, D. (Ed.), Recreational Water Quality Management. Ellis Horwood, Chichester, UK, pp. 113–127.

Fleisher, J.M., Jones, F., Kay, D., Morano, R., 1993. Setting recreational water quality criteria. In: Kay, D., Hanbury, R. (Eds.), Recreational Water Quality Management. Volume 2: Fresh Waters. Ellis Horwood, Chichester.

Fleisher, J.M., Kay, D., Wyer, M.D., Salmon, R.L., Jones, F., 1996. Non-enteric illnesses associated with bather exposure to marine waters contaminated with domestic sewage: the results of a series of four intervention follow-up studies. Am. J. Public Health 86, 1228–1234.

Kay, D., Fleisher, J.M., Salmon, R.L., Jones, F., Wyer, M.D., Godfree, A.F., Zelenauchjacquotte, Z., Shore, R., 1994. Predicting likelihood of gastroenteritis from sea bathing: results from randomized exposure. Lancet 344, 905–909.

Kay, D., Fleisher, J., Wyer, M.D., Salmon, R.L., 2001. Reanalysis of the Sea Bathing Data from the UK Randomised Trials. Report to the Expert Advisory Committee Comprising, DEFRA, DoH, Environment Agency and PHLS. CREH Aberystwyth, Wales, UK.

Kay, D., Bartram, J., Pruss, A., Ashbolt, N., Wyer, M.D., Fleisher, J.M., Fewtrell, L., Rogers, A., Rees, G., 2004. Derivation of numerical values for the World Health Organization guidelines for recreational waters. Water Res. 38, 1296–1304.

Noble, R.T., Moore, D.F., Leecaster, M.K., McGee, C.D., Weisberg, S.B., 2003. Comparison of total coliform, fecal coliform, and enterococcus bacterial indicator response for ocean recreational water quality testing. Water Res. 37, 1637–1643.

Sams, E., Calderon, R., Wade, T., Beach, M., Brenner, K., Williams, A., Dufour, A., 2004. GIS analysis for epidemiologic recreational water studies. Epidemiology 15, S215–S216.

Whitman, R.L., Nevers, M.B., 2004. Escherichia coli sampling reliability at a frequently closed Chicago beach: monitoring and management implications. Environ. Sci. Technol. 38, 4241–4246.

WHO, 1972. Health Criteria for the Quality of Recreational Waters with Special Reference to Coastal Waters and Beaches, 13–17th March. WHO Regional Office for Europe, Copenhagen Ostend, Belgium.

WHO, 2003. Guidelines for Safe Recreational Water Environments Volume 1: Coastal and Freshwaters. World Health Organisation, Geneva, Switzerland.

Wyer, M.D., Kay, D., Fleisher, J., Jackson, G., Fewtrell, L., 1999. An experimental health-based standard system for marine waters. Water Res. 33, 715–722.

Wymer, L.J., Dufour, A.P., Calderon, R.L., Wade, T.J., Beach, M., 2005. Comment on Derivation of numerical values for the World Health Organization guidelines for recreational waters. Water Res. 39, 2774–2777.
