Water-Quality in Everglades National Park: Statistical Methods for Censored Data Joffre Castro July, 2008 SFNRC NPS

Page 1

Water-Quality in Everglades National Park:

Statistical Methods for Censored Data

Joffre Castro, July 2008

SFNRC, NPS

Page 2

CENSORED DATA

Data below a limit

Limits: single value, range

Types: Practical Quantitation, Method Reporting, Method Detection, Reliable Detection

Assigned by laboratory

Qualifies the data

"Noise" in analytical process

Page 3

Censored Data

Right-censored data: survival analysis

Left-censored data: water-quality analysis

[Figure: two panels, each showing observations 1 to 15 against an axis from 0 to 100. Panel "Survival Analysis": axis is unit of time, with the end of the experiment marked. Panel "Water Quality Analysis": axis is concentration, with the MDL marked.]
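Survival-analysis tools generally expect right-censored data, so one commonly used device is to "flip" left-censored concentrations by subtracting them from a constant larger than the maximum. The sketch below only illustrates that idea; the function name, the choice of constant, and the sample values are assumptions, not part of the slides.

```python
def flip_left_censored(values, censored, constant=None):
    """Turn left-censored data into right-censored data.

    values   : recorded concentrations (the MDL for nondetects)
    censored : True where the value is a nondetect (< MDL)
    constant : any number larger than max(values); chosen here as
               max(values) + 1 (an arbitrary, illustrative choice)
    """
    if constant is None:
        constant = max(values) + 1
    # A nondetect "< MDL" becomes "> constant - MDL" after flipping,
    # which is the right-censored form used in survival analysis.
    flipped = [constant - v for v in values]
    return flipped, list(censored), constant


# Made-up example: two detects and two nondetects at different MDLs.
concentrations = [5.0, 2.0, 1.0, 0.5]
is_nondetect = [False, True, False, True]
flipped, flags, constant = flip_left_censored(concentrations, is_nondetect)
print(flipped, flags, constant)
```

Results computed on the flipped scale are converted back by subtracting them from the same constant.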

Page 4

MULTIPLE MDLs

[Figure: Everglades National Park. Sulfate, mg/L (log scale, 0.01 to 1000) versus year (1985 to 2010), with MDLs of 5 mg/L, 2 mg/L, 1 mg/L, and 0.1 mg/L marked.]

Page 5

PERCENT OF MDLs

Everglades National Park WQ Data

Parameter       EP   NE1 NP201   P33   P34   P35   P36   P37   TSB Total
Ammonia-N       33    16    22    19    56    40    14    37    48    30
Arsenic         79    40    52    66    57    58    49    77    91    63
Cadmium         89    73    85    87    89    81    80    82    90    84
Copper          64    61    64    53    71    58    59    66    71    63
Lead            89    81    84    79    84    78    83    85    87    83
Nox-N           23    21    17    16    42    10    16    12    26    20
Total-P         37     9    25    17    23    11    11    46    28    22
TSS             75    57    73    65    76    66    60    72    77    68
Zinc            85    88    89    83    88    78    90    87    88    86

Rows with blank cells (column positions not recoverable; values in source order):
Iron: 20 1 5 1 1 4 3
Kjeldahl-N: 7 1 2 1 1 11 9 3
Sulfate: 28 10 7 73 22 18 64 25 25

Black numbers are less than 50 percent nondetects.
Blue numbers are between 50 and 80 percent nondetects.
Red numbers are greater than 80 percent nondetects.

No MDLs: Alkalinity, Calcium, Chloride, Color, DO, Hardness, Magnesium, pH, Potassium, Sodium, SpC, Turbidity

Page 6

METHODS

Substitution (arbitrary)

Robust Methods

Parametric methods (assumes a distribution)

Non-parametric methods (distribution free)

Page 7

METHODS (continued)

Simple Substitutions

Zero
MDL/SQRT(2)
½-MDL
0.75-MDL
1-MDL

Recommended Methods

KM (Kaplan-Meier)¹
ROS (regression on order statistics)
MR (multiple-limit regression)¹
MLE (maximum likelihood estimate)
AMLE (adjusted maximum likelihood estimate)

¹ nonparametric
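To make the Kaplan-Meier entry concrete, here is a minimal sketch of the estimator applied directly to left-censored data. It is an illustration under stated assumptions, not the procedure used for the slides: the function name is hypothetical, nondetects are recorded at their MDL with a flag, and any probability mass left below the smallest detected value is placed at the smallest recorded value (which tends to bias the mean upward).

```python
def km_left_censored(values, censored):
    """Kaplan-Meier estimate for left-censored data.

    values   : recorded concentrations (the MDL for nondetects)
    censored : True where the observation is a nondetect (< MDL)
    Returns (steps, mean), where steps is a list of (x, P(X < x))
    at each distinct detected value, from largest to smallest.
    """
    obs = list(zip(values, censored))
    detected = sorted({v for v, c in obs if not c}, reverse=True)
    prob_below = 1.0  # running estimate of P(X < x)
    mean = 0.0
    steps = []
    for x in detected:
        # "At risk": every observation, detect or nondetect, recorded at or below x.
        at_risk = sum(1 for v, _ in obs if v <= x)
        events = sum(1 for v, c in obs if v == x and not c)
        new_prob = prob_below * (1 - events / at_risk)
        mean += x * (prob_below - new_prob)  # mass assigned to x
        prob_below = new_prob
        steps.append((x, prob_below))
    # Assumption: leftover mass goes to the smallest recorded value.
    mean += prob_below * min(values)
    return steps, mean


# Made-up data with two MDLs (2 and 0.5) to show the redistribution.
steps, km_mean = km_left_censored([3.0, 2.0, 1.0, 0.5],
                                  [False, True, False, True])
print(steps, km_mean)
```

With a single MDL that lies below every detected value, this mean coincides with 1-MDL substitution; the estimator only behaves differently when nondetects are interleaved with detects, as happens with multiple MDLs.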

Page 8

Simple Substitution

STAID   DATES        Arsenic
P34     6/9/1986      1.40
P34     7/25/1986     1.30
P34     8/18/1986    -0.80
P34     9/15/1986     1.40
P34     10/27/1986    1.20
P34     11/19/1986   -0.80
P34     12/30/1986    1.60
P34     1/26/1987     2.03
P34     2/9/1987      1.36
P34     3/9/1987     -0.80

Values below MDL (recorded as -0.80) are replaced by:

ZE: 0

½-MDL: 0.4

1-MDL: 0.8
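The three substitutions can be sketched in a few lines using the ten P34 arsenic values from this slide. The sketch assumes that a negative entry denotes a nondetect whose absolute value is the MDL; the function and variable names are illustrative.

```python
from statistics import mean

# Arsenic at station P34, as listed on the slide.
# Assumption: a negative entry marks a nondetect and |value| is the MDL.
arsenic = [1.40, 1.30, -0.80, 1.40, 1.20, -0.80, 1.60, 2.03, 1.36, -0.80]


def substitute(values, fraction):
    """Replace each nondetect with fraction * MDL; leave detects unchanged."""
    return [abs(v) * fraction if v < 0 else v for v in values]


# ZE, 1/2-MDL, and 1-MDL correspond to fractions 0, 0.5, and 1.
substituted_means = {
    label: mean(substitute(arsenic, fraction))
    for label, fraction in [("ZE", 0.0), ("1/2-MDL", 0.5), ("1-MDL", 1.0)]
}
print(substituted_means)
```

For these data the means come out at about 1.03, 1.15, and 1.27 for ZE, ½-MDL, and 1-MDL respectively, so the summary statistic moves with an arbitrary choice of substitute.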

Page 9

Comparison of Results

Oahu Arsenic Data, from Helsel (2005)

24 values, 54% below MDLs, 3 MDL levels

[Figure: two panels of concentration, ug/L (axis 0.0 to 2.0). The first compares MEAN, STDEV, Q1, MEDIAN, and Q3 across ZE, 1/2-MDL, 1-MDL, MLE, R-ROS, and K-M. The second shows MEAN only for the same six methods.]

Page 10

Comparison of Results (continued)

Synthetic Ammonia Data, 20% Censored

[Figure: two panels of percent deviation (axis -20 to 100) by method: ZE, 1/2-MDL, 1-MDL, K-M, ROS, MLE, MR, AMLE. Series: Mean, STDEV, MEDIAN, Q1, Q3.]

Page 11

Simple Substitution Results

Synthetic Data
- 50 observations
- 33 (66%) below MDLs
- Two MDL levels

From Helsel (2006, Chemosphere 65, 2434–2439)

Page 12

Simple Substitution (continued)

Page 13

Recommendations

RECOMMENDED METHODS FOR SUMMARY STATISTICS

Percent Censored       >50 Observations
<50% nondetects        Kaplan-Meier
50-80% nondetects      Maximum Likelihood
>80% nondetects        May report high sample percentiles (90th and 95th)

Helsel, 2005
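The table above can be read as a simple lookup. The sketch below encodes it as a hypothetical helper; the function name, the handling of the exact 50% and 80% boundaries, and the choice to reject data sets of 50 or fewer observations (which the slide does not cover) are all assumptions.

```python
def recommend_method(n_obs, pct_censored):
    """Suggest a summary-statistics method following the slide's table.

    n_obs        : number of observations (table covers only > 50)
    pct_censored : percent of observations below an MDL (0 to 100)
    """
    if not 0 <= pct_censored <= 100:
        raise ValueError("pct_censored must be between 0 and 100")
    if n_obs <= 50:
        # The slide gives recommendations only for more than 50 observations.
        raise ValueError("no recommendation on the slide for n_obs <= 50")
    if pct_censored < 50:
        return "Kaplan-Meier"
    if pct_censored <= 80:
        return "Maximum Likelihood"
    return "Report high sample percentiles (90th and 95th)"


# Example with a made-up sample size and censoring levels.
for pct in (30, 63, 86):
    print(pct, recommend_method(120, pct))
```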

Page 14

Statistical Tests

STATISTICAL METHODS (from Helsel, 2005)

                                  UNCENSORED               CENSORED
Summary statistics                Descriptive              K-M, MLE, ROS
Comparing two groups              t-test                   Censored regression, 0/1 indicator
                                  Wilcoxon rank-sum        Generalized Wilcoxon test
                                  Paired t-test            Censored CI on differences
                                  sign/signed rank test    PPW or Akritas tests
Comparing three or more groups    ANOVA                    Censored regression, 0/1 indicator
                                  Kruskal-Wallis test      Generalized Wilcoxon test
Correlation                       Pearson's r              Likelihood r
                                  Kendall's tau            Kendall's tau b
Linear regression                 Regression               Censored regression; Logistic regression
                                  Robust regression        Proportional hazards (Cox) regression
                                  Theil-Sen median line    Akritas-Theil-Sen median line

Page 15

Conclusions

Substitution methods: arbitrary and inaccurate

Robust methods: statistically sound

Parametric: MLE, AMLE, ROS
Non-parametric: KM, MR, bootstrapping

Available in most stats. packages
Suitable for most tests (trend analysis)
Easy to use