Statistics in Risk Assessment - RWTH Aachen University

Statistics in Risk Assessment: What about Truth?

Hans Toni Ratte

Institute of Environmental Research (Biology V)
Chair of Ecology, Ecotoxicology, Ecochemistry
Workgroup of Aquatic Ecology and Ecotoxicology
RWTH Aachen University

[Figure: concentration/response curve; x-axis: Log (Dose/concentration), 0 to 3; y-axis: Effect, 0% to 100%]

AQUABASE Workshop, 29.11.2006

Content

• Introduction
• Statistical toxicity parameters
• Small Statistical Crash Course
  • ECx and LOEC/NOEC concept
  • Minimal detectable difference
  • β-error and statistical power
• Test Results from Examples
• NOEC versus ECx
• Lessons learned?
• Conclusions


Background

Conduct of Biotesting
• Prospective assessment of chemicals prior to marketing
• Retrospective effects assessment of environmental samples (field monitoring)

Legal requirements
• National acts
  • Germany: pesticides act, chemicals act, waste water act
• EU member states
  • Council Directive 91/414/EEC
  • REACh (new in 2007(?))

Competent authorities
• Responsible for execution of these laws
• Decide on authorization of substances based on biotest results (determination of the PNEC)

Hence biotest results must even stand up in court


Biotesting – Tiered Approach

• Basic Level
  • Acute toxicity in Daphnia magna (24 – 48 h)
  • Acute toxicity in fish (48 – 96 h)
• Tier I
  • Growth inhibition test with green algae (72 h)
  • Chronic toxicity in Daphnia magna (21 d)
  • Chronic toxicity in fish (Danio rerio) (14 - 21 d)
  • Terrestrial Plants, Growth test and Vegetative Vigour test (14 d)
  • Earthworm toxicity test (Eisenia fetida): lethal effects, 14 d
• Tier II
  • …
• Higher-Tier
  • …


Requirements of Regulatory Authorities

Test conduct
• OECD guidelines or ISO standards
• Good laboratory practice (GLP)

Statistics?
• Some recommendations within guidelines
• ISO/TS 20281:2004 Water quality — Guidance on statistical interpretation of ecotoxicity data (also corresponding OECD text)
• However: recommendations are often weak and the recommended methods are not obligatory


Aim of Presentation

Explaining to you
• the concepts of hypothesis testing (NOEC determination) and concentration/response modeling (ECx estimation from curve fitting);
• the concept of the minimal detectable difference (MDD) between two samples as a simple indicator of test power;
• the weakness of the NOEC concept (too much freedom for manipulation).

Making you aware of the final end of “intelligent testing”, i.e. the consequences of weak recommendations

Motivating you to advocate for science-based regulatory actions


Example from Guideline OECD 202:2004

Daphnia sp., Acute Immobilization Test

…“The percentages immobilized at 24 hours and 48 hours are plotted against test concentrations. Data are analysed by appropriate statistical methods (e.g. probit analysis, etc.) to calculate the slopes of the curves and the EC50 with 95% confidence limits (p = 0.95)”…

This description is adequate, and the probit analysis it mentions is frequently performed for this test (sometimes replaced by logit or Weibull analysis).



Determination of the EC50

[Figure: Immobility in an acute Daphnia test, OECD 202; x-axis: Concentration [mg/L]; y-axis: % Mortality, 0 to 100; legend: Data, Function, 95%-CL]

EC50: 2.0 mg/L
95%-confidence limits: 1.5 – 2.5 mg/L

• Concentration/response curve (function) obtained by fitting
• Function used to compute the EC50 and its 95%-confidence limits
• Method: Probit analysis (= regression using the linearized normal sigmoid function)
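The probit linearization named on this slide can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the calculation behind the figure: the concentrations and immobilization fractions are hypothetical, the fit is an unweighted least-squares regression (a full probit analysis uses an iteratively weighted fit and also provides the confidence limits), and the function name `probit_ec50` is made up for this sketch.

```python
import math
from statistics import NormalDist


def probit_ec50(concentrations, fractions_affected):
    """Estimate the EC50 by unweighted regression of probits on log10(concentration).

    Simplification (assumption): responses of exactly 0 or 1 have no finite
    probit and must be excluded or adjusted by the caller.
    """
    std_normal = NormalDist()
    x = [math.log10(c) for c in concentrations]
    # Probit transform: inverse of the standard normal CDF.
    y = [std_normal.inv_cdf(p) for p in fractions_affected]

    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sum(
        (xi - x_mean) ** 2 for xi in x
    )
    intercept = y_mean - slope * x_mean

    # The EC50 is the concentration at which the fitted probit equals 0 (50 % response).
    log_ec50 = -intercept / slope
    return 10 ** log_ec50, slope


# Hypothetical data (not taken from the slides).
conc_mg_per_l = [0.5, 1.0, 2.0, 4.0, 8.0]
immobilized = [0.05, 0.20, 0.50, 0.80, 0.95]

ec50, slope = probit_ec50(conc_mg_per_l, immobilized)
print(f"EC50 = {ec50:.2f} mg/L, slope = {slope:.2f} probits per log10 unit")
```

Because the hypothetical responses are symmetric around 2 mg/L on the log scale, the fitted line crosses the 50 % level at that concentration.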


Example from Guideline OECD 211:1998

Daphnia magna Reproduction Test

…“the number of deaths among the parent animals and the day on which they occurred (see …);”…

…“the Lowest Observed Effect Concentration (LOEC) for reproduction, including a description of the statistical procedures used and an indication of what size of effect could be detected and the No Observed Effect Concentration (NOEC) for reproduction; where appropriate, the LOEC/NOEC for mortality of the parent animals should also be reported;
where appropriate, the ECx for reproduction and confidence intervals and a graph of the fitted model used for its calculation, the slope of the dose-response curve and its standard error;”


NOEC, LOEC and ECx

[Figure: concentration/response curve with NOEC, LOEC, EC20 and EC50 marked; x-axis: Log (Dose/concentration), 0 to 3; y-axis: Effect, 0% to 100%]


LOEC and NOEC (from OECD 211)

“Lowest Observed Effect Concentration (LOEC) is the lowest tested concentration at which the substance is observed to have a statistically significant effect on reproduction and parent mortality (at p < 0.05) when compared with the control, within a stated exposure period.”…

“No Observed Effect Concentration (NOEC) is the test concentration immediately below the LOEC, which when compared with the control, has no statistically significant effect (p < 0.05), within a stated exposure period.”


Statistical Procedures (OECD 211)

“The mean for each concentration must then be compared with the control mean using an appropriate multiple comparison method. Dunnett’s or Williams’ tests may be useful (…). It is necessary to check whether the ANOVA assumption of homogeneity of variance holds.”…

Relatively weak conditions
- Selection of tests?
- Statistical test direction?
- ECx: value of x?

What are the consequences?


Toxicity Parameters and Data Scale

Quantal/qualitative responses
• Biological variable with nominal scale
• Example: Mortality (number of dead animals out of the number introduced, after a certain interval)
• Point estimate from response curve: LC50 or EC50
• Immobilization

Metric/quantitative responses
• Biological variable with metric scale
• Example: Biomass yield, growth rate, offspring
• Point estimate from response curve: ECx, where x = 10, 20, 50%, … (x is not fixed)
• Toxic threshold: LOEC/NOEC (nearly always required)

Statistical test methods and curve-fitting procedures are different in these two scales!


LOEC/NOEC Concept

Determined by hypothesis testing (statistical test)

The difference between a treatment and the control that a statistical test is able “to see” can be smaller or greater, depending on the variable’s variance and the replication of test units

What is the minimum difference that a statistical test can detect?


Minimal Detectable Difference, MDD

Example: t-test

Starting point is the t-formula, where t is the standardized difference between control (c) and treatment (t):

$$t = \frac{\bar{x}_c - \bar{x}_t}{\sqrt{\frac{s_c^2}{n_c} + \frac{s_t^2}{n_t}}}$$

With the tabulated t* being the critical margin (e.g., at α = 0.05) and inserted into the formula,

$$t^* = \frac{(\bar{x}_c - \bar{x}_t)^*}{\sqrt{\frac{s_c^2}{n_c} + \frac{s_t^2}{n_t}}}$$

the MDD is easily obtained after rearranging:

$$MDD = (\bar{x}_c - \bar{x}_t)^* = t^* \cdot \sqrt{\frac{s_c^2}{n_c} + \frac{s_t^2}{n_t}}$$

and expressed as relative difference to the control:

$$\%MDD = \frac{MDD}{\bar{x}_c} \cdot 100$$
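The rearranged t-formula translates directly into code. The following is a small sketch with hypothetical input values; the function names `mdd` and `percent_mdd` are made up for illustration, and the critical value t* is passed in as a tabulated number rather than computed.

```python
import math


def mdd(t_crit, s_control, n_control, s_treatment, n_treatment):
    """Minimal detectable difference: t* times the standard error of the difference."""
    standard_error = math.sqrt(
        s_control**2 / n_control + s_treatment**2 / n_treatment
    )
    return t_crit * standard_error


def percent_mdd(mdd_value, control_mean):
    """MDD expressed as a percentage of the control mean."""
    return 100.0 * mdd_value / control_mean


# Hypothetical example: control mean 100, standard deviation 5 in both groups
# (CV = 5 %), 5 replicates each. t* = 2.306 is a tabulated t value for
# df = n_c + n_t - 2 = 8; which t* applies depends on alpha and test direction.
example_mdd = mdd(t_crit=2.306, s_control=5.0, n_control=5, s_treatment=5.0, n_treatment=5)
example_pct = percent_mdd(example_mdd, control_mean=100.0)
print(f"MDD = {example_mdd:.2f}, %MDD = {example_pct:.1f} %")
```

Doubling the standard deviation doubles the MDD, while additional replicates reduce it twice over: through the square-root term and through a smaller tabulated t*.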


Influence of Variance and Replication on %MDD

%MDD by coefficient of variation (rows) and number of replicates (columns):

Coefficient of      Number of Replicates
Variation [%]       3        5        7        10
 5                  11.3     7.3      5.8      4.7
10                  22.7     14.6     11.6     9.4
15                  34.0     21.9     17.5     14.1
20                  45.3     29.2     23.3     18.8
30                  68.0     43.8     34.9     28.2
40                  90.7     58.3     46.6     37.6
50                  n. d.    72.9     58.2     47.0


Influence of Test Direction on the MDD

%MDD by coefficient of variation and test direction (n = 5):

Coefficient of      Test direction
Variation [%]       One-sided    Two-sided
 5                  7.3          15.8
10                  14.6         31.6
15                  21.9         47.4
20                  29.2         63.2
30                  43.8         94.4
40                  58.3         n. d.
50                  72.9         n. d.

Statements on test direction are very rare

CV and MDD in Selected Biotests

Biotest                              Variable          CV%   n   %MDD  NOEC    EC10   EC20   EC50
Algae Growth Inhibition, OECD 201    Growth rate       7.2   6   9.2   31.6    33.1   36.2   42.9
Terrestrial Plant, OECD 208          Shoot Dry Weight  36.2  8   39.0  8.0     3.4    6.0    17.7
                                     Emergence Rate    10.8  8   11.5  2.0     2.8    4.8    12.8
Daphnia reproduction, OECD 211       Offspring         22.5  10  23.7  2025.0  246.9  542.3  2422.8
Fish, Juvenile Growth, OECD 215      Weight            4.7   16  5.7   0.0     0.0    0.1    0.7
Chironomid, OECD 218                 Emergence Rate    17.7  4   30.5  12.5    3.4    6.7    25.0
Lemna Growth Inhibition, OECD 221    Yield             8.9   4   8.8   32.0    27.6   152.9  3985.8
Earthworm Reprod., OECD 222          Offspring         19.7  6   14.5  <0.5    0.6    0.7    1.0

Conclusion:
• Laboratory biotests show CVs and MDDs between 5 and 40%
• The NOEC can be smaller than the EC10 or as high as the EC50


High MDDs are Dangerous

The MDD grows with increasing variance and decreasing replication

Differences smaller than the MDD favor the null hypothesis H0 (µcontrol = µtreatment)

But there is a high risk of a type-II error (β-error): a wrong H0 is accepted

Not favorable for the environment and not in line with the precautionary principle


Reality and Theory - Statistical Errors

Theory                   Reality
(statistical decision)   H0 true                   H0 wrong
H0 accepted              Correct decision          Type-II error (β-error)
H0 rejected              Type-I error (α-error)    Correct decision

Test power is judged on the basis of the β-error


[Figure: α-error and β-error illustrated for µ0 = 20.08 and µ1 = 21.08 (α = 0.05, β = 0.07), with the MDD marked; panel labels: “H0: µ0 = µ1 true?” and “H0: µ0 < µ1 true?”]


Power function (Power = 1 - β)

[Figure: β-error and power (both 0.00 to 1.00) plotted against µ1 [mm] from 20.8 to 22.8, with µ0 marked]

To guarantee a power of 80%, the difference between µ0 and µ1 must be at least 0.8 mm in the current example
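The relation Power = 1 - β can be made concrete with a short sketch. It rests on two assumptions that do not come from the slides: a one-sided z test (normal approximation) is used instead of a t-based calculation, and the standard error of the difference (0.32 mm) is a made-up value, since the slides do not state the variance or sample size. The function names are hypothetical as well.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_one_sided(mu0, mu1, standard_error, alpha=0.05):
    """Power of a one-sided z test of H0: mu1 = mu0 against mu1 > mu0."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha)
    return STD_NORMAL.cdf((mu1 - mu0) / standard_error - z_crit)


def min_difference_for_power(target_power, standard_error, alpha=0.05):
    """Smallest true difference mu1 - mu0 that reaches the target power."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha)
    z_power = STD_NORMAL.inv_cdf(target_power)
    return (z_crit + z_power) * standard_error


# ASSUMPTION: 0.32 mm is a made-up standard error of the difference;
# the slides do not state it.
SE_MM = 0.32

beta = 1.0 - power_one_sided(mu0=20.08, mu1=21.08, standard_error=SE_MM)
needed = min_difference_for_power(0.80, SE_MM)
print(f"beta = {beta:.2f}, difference needed for 80 % power = {needed:.2f} mm")
```

With this assumed standard error the sketch gives a β-error near 0.07 for a difference of 1 mm and a required difference of about 0.8 mm for 80% power, which is in the range of the values quoted on the slides. A smaller standard error (less variance or more replicates) shifts the whole power curve towards smaller differences.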


Which Test Gives us Power?

LOEC/NOECs are determined using multiple tests
• Multiple tests ensure that the experiment-wise error probability (type-I error) is equal to or lower than the selected significance level α (e.g. 0.05)
• Current guidelines offer a selection of multiple tests:
  • Dunnett’s test (multiple t test; most widely recommended)
  • Williams’ test (multiple sequential t test)
  • Pair-wise Mann-Whitney U test with Bonferroni adjustment of the significance level
  • …
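Why multiple tests are needed can be shown with a small calculation. The sketch below assumes independent comparisons, which is a simplification (comparisons that share one control are correlated), and the function names are made up for illustration. It shows how the experiment-wise type-I error grows when each comparison is tested at α = 0.05, and how a Bonferroni adjustment brings it back below α.

```python
def experimentwise_error(alpha_per_test, n_comparisons):
    """Probability of at least one false positive among independent comparisons."""
    return 1.0 - (1.0 - alpha_per_test) ** n_comparisons


def bonferroni_alpha(alpha_experimentwise, n_comparisons):
    """Per-comparison significance level after Bonferroni adjustment."""
    return alpha_experimentwise / n_comparisons


k = 5  # e.g. five treatment concentrations, each compared with the control
unadjusted = experimentwise_error(0.05, k)
adjusted_alpha = bonferroni_alpha(0.05, k)
adjusted = experimentwise_error(adjusted_alpha, k)
print(
    f"unadjusted: {unadjusted:.3f}, "
    f"Bonferroni alpha: {adjusted_alpha:.3f}, adjusted: {adjusted:.3f}"
)
```

The price of the adjustment is a stricter criterion for every single comparison, which is one reason why a Bonferroni-adjusted test tends to be less powerful.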


Example Data Set and Dunnett’s test

Dunnett’s Multiple t-test Procedure

Tab. 4: Comparison of treatments with "Control" by the t test procedure after Dunnett. Significance was Alpha = 0.05, one-sided smaller (multiple level); Mean: arithmetic mean; n: sample size; s: standard deviation; %MDD: minimum detectable difference to Control (in percent of Control); t: sample t; t*: critical t for Ho: µ1 = µ2 = ... = µk; the differences are significant in case |t| > |t*| (the residual variance of an ANOVA was applied; df = N - k; N: sum of treatment replicates n(i); k: number of treatments).

Treatm. [µg/L]   Mean   s      df   %MDD   t       t*      Sign.
Control          50.1   4.72
1.20             52.7   4.72   59   -8.8   1.40    -2.38   -
2.40             45.2   4.72   59   -8.8   -2.54   -2.38   +
4.80             47.1   4.72   59   -8.8   -1.61   -2.38   -
9.60             44.8   4.72   59   -8.8   -2.85   -2.38   +
19.20            46.6   4.72   59   -8.8   -1.74   -2.38   -

+: significant; -: non-significant

The NOEC appears to be higher than 19.20 µg/L.
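The "Sign." column of Tab. 4 follows from comparing each sample t with the critical t*. The sketch below copies the t values and t* from the table and applies the one-sided "smaller" rule; the function name is made up for illustration, and computing t and t* themselves (which needs the raw data and Dunnett's distribution) is not attempted.

```python
# t values and critical t* copied from Tab. 4 (one-sided "smaller" test).
T_CRIT = -2.38
t_by_concentration = {1.20: 1.40, 2.40: -2.54, 4.80: -1.61, 9.60: -2.85, 19.20: -1.74}


def is_significant_smaller(t_value, t_crit):
    """One-sided 'smaller' test: significant if t lies below the negative critical value."""
    return t_value < t_crit


flags = {
    conc: is_significant_smaller(t, T_CRIT) for conc, t in t_by_concentration.items()
}
for conc, significant in flags.items():
    print(f"{conc:5.2f} µg/L: {'+' if significant else '-'}")
```

The result is the non-monotone pattern shown in the table: 2.40 and 9.60 µg/L are flagged, while 4.80 and 19.20 µg/L are not.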


Where is the NOEC?

                                 Concentration [µg/L]
Statist. Test                    1.2   2.4   4.8   9.6   19.2   LOEC    NOEC
Dunnett; one-sided               -     +     -     +     -      >19.2   ≥19.2
Dunnett; two-sided               -     -     -     +     -      >19.2   ≥19.2
Williams; one-sided              -     +     +     +     +      2.4     1.2
Williams; two-sided              -     -     +     +     +      4.8     2.4
Bonferroni-U-test; one-sided     -     -     -     +     -      >19.2   ≥19.2
Bonferroni-U-test; two-sided     -     -     -     +     -      >19.2   ≥19.2

Conclusion:
• Williams test most powerful (lower NOECs)
• Bonferroni-U test least powerful (NOEC higher, but not possible to determine here)
• Dunnett’s test leads to ambiguous results (not able to determine unequivocal NOECs here)
• Two-sided testing sometimes results in higher NOECs
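The LOEC and NOEC columns of the table can be derived mechanically from the +/- patterns. The sketch below assumes a convention inferred from the table rather than stated on the slide: the LOEC is the lowest concentration from which this and all higher concentrations are significant, and the NOEC is the tested concentration immediately below it. The function names are made up for illustration.

```python
def loec_noec(concentrations, significant):
    """Return (LOEC, NOEC) from per-concentration significance flags.

    Assumed convention: the LOEC is the lowest concentration from which this
    and all higher concentrations are significant; the NOEC is the tested
    concentration immediately below it. None means "outside the tested range".
    """
    pairs = sorted(zip(concentrations, significant))
    loec_index = len(pairs)
    # Walk down from the highest concentration while the effect stays significant.
    while loec_index > 0 and pairs[loec_index - 1][1]:
        loec_index -= 1
    loec = pairs[loec_index][0] if loec_index < len(pairs) else None
    noec = pairs[loec_index - 1][0] if loec_index > 0 else None
    return loec, noec


CONC = [1.2, 2.4, 4.8, 9.6, 19.2]  # µg/L, from the table


def parse_flags(pattern):
    """Turn a '+'/'-' string such as '-+-+-' into booleans."""
    return [symbol == "+" for symbol in pattern]


print(loec_noec(CONC, parse_flags("-++++")))  # Williams; one-sided
print(loec_noec(CONC, parse_flags("--+++")))  # Williams; two-sided
print(loec_noec(CONC, parse_flags("-+-+-")))  # Dunnett; one-sided
```

For the non-monotone Dunnett pattern the function returns no LOEC within the tested range and 19.2 µg/L as the highest concentration without a consistent effect, which corresponds to the ">19.2" and "≥19.2" entries in the table.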


New Findings?

No

OECD (1998) - Report on the OECD workshop on statistical analysis of aquatic toxicity data.

“It was concluded that the NOEC, as the main summary parameter of aquatic ecotoxicity tests, is inappropriate for a number of reasons (…) and should therefore be phased out. It was recommended that the OECD should move towards a regression-based estimation procedure.”…“A steering group should be set up to direct the mathematical, statistical and biological work required to take the workshop recommendations forward. This group should include representatives from the appropriate scientific and regulatory communities.”


OECD (1998) Against NOEC

• The NOEC must be one of the test concentrations.
• No precision statements are possible for the NOEC.
• NOECs may correspond to large effects on test organisms.
• The NOEC will not be obtainable in all cases.

“The above points indicate that the NOEC is far from ideal as a summary measure of toxic effect. It is too heavily dependent on the experimental design and the variability in the data. Consequently the NOEC may correspond to large effects, possibly of biological significance. Its value in hazard assessment is questionable.”

Pro NOEC: Simple calculation and use


OECD (1998) pro ECx

• The ECx is not restricted to be one of the test concentrations.
• The precision of the ECx can be quantified.
• ECx values are comparable.
• The whole of the toxic response of the organism may be characterized.
• Regression modeling is flexible.
• Replication is not a crucial issue.
• A greater concentration range can be studied.


OECD (1998) on ECx Problems

• The difficulty in choosing a model.
• For extreme percentiles confidence intervals may be very wide.
• ECx estimation is generally computationally more difficult than NOEC estimation.
• ECx estimates may be difficult to obtain in some cases.
  • E.g. when low concentrations give 0% response and high concentrations give 100% response with no intermediate responses at any concentration.
• Using ECxs in place of NOECs requires the value of x to be specified.
• Use and understanding of precision and confidence intervals must be increased.


New Update OECD 201:2006 (Algae)

“For estimation of the LOEC and hence the NOEC,”…“it is necessary to compare treatment means using analysis of variance (ANOVA) techniques.”…“The mean for each concentration must then be compared with the control mean using an appropriate multiple comparison or trend test method. Dunnett’s or Williams’ test may be useful (…). It is necessary to assess whether the ANOVA assumption of homogeneity of variance holds.”

“Recent scientific developments have led to a recommendation of abandoning the concept of NOEC and replacing it with regression based point estimates ECx. An appropriate value for x has not been established for this algal test. A range of 10 to 20 % appears to be appropriate (depending on the response variable chosen), and preferably both the EC10 and EC20 should be reported.”

“Competent” authorities don’t like “both the”…


Conclusions

• Clear insights that the NOEC concept is problematic
• Discrepancy between scientific insights and regulatory practices
• Regulatory “needs” ask for simple solutions in spite of their shortcomings and risks
• This appears to contradict the precautionary principle
• Science and the precautionary principle, rather than convenience, need to govern regulatory practice
