31
Steps 3 & 4: Analysis & interpretation of evidence

Steps 3 & 4: Analysis & interpretation of evidence

Embed Size (px)

Citation preview

Page 1: Steps 3 & 4: Analysis & interpretation of evidence

Steps 3 & 4: Analysis & interpretation of evidence

Page 2: Steps 3 & 4: Analysis & interpretation of evidence

2

Hypothesis testing: a simple scenario

SCENARIO 1• DO measured upstream &

downstream over 9 months— Upstream = 9.3 mg/L— Downstream = 8.4 mg/L

• Difference significant at P<0.05

SCENARIO 2• DO measured upstream &

downstream over 3 months— Upstream = 7.9 mg/L— Downstream = 4.2 mg/L

• Difference not significant at P<0.05

Which scenario presents a stronger case for DO causing impairment?

Page 3: Steps 3 & 4: Analysis & interpretation of evidence

3

Using statistics responsibly

Use caution in interpreting differences

• Look at magnitude & consistency of differences, rather than statistical significance

• Statistical significance detects differences exceeding natural variance

— Does not detect stressor effects— Does not equal biological significance

• Can use statistics, but also use your head— Consider relationship between minimum detectable

difference (power) & biologically relevant difference

Page 4: Steps 3 & 4: Analysis & interpretation of evidence

4

That said, graphs & statistics can help a lot…

STEP 2List candidate

causes

STEP 3Evaluate data

from case

STEP 4Evaluate data from elsewhere

Scatterplots Boxplots Correlation analysis Conditional probability analysis Regression analysis Predicting environmental condition from biological observations (PECBO) Quantile regression Classification & regression tree Species sensitivity distributions (SSDs)

Page 5: Steps 3 & 4: Analysis & interpretation of evidence

5

Page 6: Steps 3 & 4: Analysis & interpretation of evidence

6

Page 7: Steps 3 & 4: Analysis & interpretation of evidence

7

Page 8: Steps 3 & 4: Analysis & interpretation of evidence

8

Page 9: Steps 3 & 4: Analysis & interpretation of evidence

9

Sampling/data collection issues

• Want to collect candidate cause & biological response variables at same sites, at same times

• Want to time sampling to accurately assess exposures— For continuous exposures, try to catch sensitive life

stages & conditions that enhance exposure or effects— For episodic exposures, try to characterize episodes

through continuous, event, and post-event sampling

• Want to collect good reference comparisons, in space & time

— At unimpaired sites, at same times you sample impaired sites

Page 10: Steps 3 & 4: Analysis & interpretation of evidence

10

Page 11: Steps 3 & 4: Analysis & interpretation of evidence

11

Data analysis tools• Species sensitivity distribution (SSD) generator

— Tool that calculates & plots proportion of species affected at different levels of exposure in laboratory toxicity tests

Page 12: Steps 3 & 4: Analysis & interpretation of evidence

12

Data analysis tools

• Species sensitivity distribution (SSD) generator— Tool that calculates & plots proportion of species affected at

different levels of exposure in laboratory toxicity tests

• Command-line R scripts— Powerful and free statistical package [http://www.r-project.org/]

— R scripts provided for PECBO (predicting environmental conditions from biological observations)

• CADStat— Menu-driven package of data visualization & statistical

techniques, based on JGR (Java GUI for R)

Page 13: Steps 3 & 4: Analysis & interpretation of evidence

13

RR is a powerful and free statistical package…

[http://www.r-project.org/]

…but it’s run at the command line, so there’s a steep learning curve

Page 14: Steps 3 & 4: Analysis & interpretation of evidence

14

JGR/CADStat: a gentler introduction to R

• JGR provides point-and-click tools for basic R functions

— Loading data— Setting up a workspace — Managing data in your workspace— Installing packages

• CADStat adds several tools to JGR— Point-and-click interfaces for tools useful

in causal analysis — Rendering of analysis results in browser

window— Additional help files

– Scatterplots

– Boxplots

– Correlation analysis

– Linear regression

– Quantile regression

– Conditional probability analysis

– PECBO

Page 15: Steps 3 & 4: Analysis & interpretation of evidence

15

Load your data into CADStat, or use example data…

Page 16: Steps 3 & 4: Analysis & interpretation of evidence

16

Page 17: Steps 3 & 4: Analysis & interpretation of evidence

17

Where are we?

• Step 1 — Biological effects of interest defined — Impaired & reference sites identified

• Step 2 — Map completed — Conceptual model developed— Candidate causes identified— Data identified & collected

Now we’re ready to analyze available data & use results as evidence in Steps 3 & 4

Page 18: Steps 3 & 4: Analysis & interpretation of evidence

18

Begin with data exploration

• Want to examine relationships between different variables

— Linear & more complex — Expected & unexpected

• 1st step is visualizing data— Scatterplots & boxplots — No statistics, hypotheses, or p-values

Page 19: Steps 3 & 4: Analysis & interpretation of evidence

19

Looking at your data…examples from exercise

• Which candidate cause do the data relate to?• Do the data support or weaken the case for the candidate cause?• What type of evidence do the data represent?

PC1 – reference PC2 – impaired

Dissolved oxygen (mg/L) 7.3 7.9

Canopy cover moderate low

In Fall 2004, an unpermitted industrial discharge with high metal concentrations was discovered and removed, just upstream of PC2.

EPT taxa richness over time

2004 2005 2006 2007

PC1 18 16 15 16

PC2 8 9 13 15

Page 20: Steps 3 & 4: Analysis & interpretation of evidence

20

Boxplots• Depict distribution of

observations – Center box indicates spread of 50%

of observations– Whiskers indicate either max & min

values, or 1.5X the interquartile range

• Good data exploration & visualization tool

• Can be used to evaluate data from case & data from elsewhere

– Spatial/temporal co-occurrence– Stressor-response relationships

from the field or from other field studies

– Causal pathway

Page 21: Steps 3 & 4: Analysis & interpretation of evidence

21

Scatterplots• Graphical displays of matched data

– Dependent variable (y-axis) vs. independent variable (x-axis)

– Often dependent variable is biological response, independent variable is measure of candidate cause

ABC

DE

F

GH

I

200 400 600 800 1000

68

1012

14

EP

T R

ichn

ess A

B

C

D

E

F

GH

I

200 400 600 800 1000

510

2030

SpecConductivity

% N

on-in

sect

s

ABC

D

EF

G

HI

200 400 600 800 1000

3.5

4.5

5.5

6.5

HB

I

Conductivity (uS/cm)

• Good data exploration & visualization tool

– Can view several at once in scatterplot matrix

• Can be used to evaluate data from case & from elsewhere

– Stressor-response relationships from the field, from other field studies, & from laboratory studies

– Causal pathway

Page 22: Steps 3 & 4: Analysis & interpretation of evidence

22

Examples from exercise: scatterplots

• Do the data support or weaken the case for temperature as a candidate cause?

• What type(s) of evidence do these data provide?

0

5

10

15

20

15 20 25 30

Maximum summer water temperature (˚C)

EPT

taxa

rich

ness

PC1

PC2

Page 23: Steps 3 & 4: Analysis & interpretation of evidence

23

Examples from exercise: scatterplots

• Do the data support or weaken the cases for copper & zinc as candidate causes?

• What type(s) of evidence do these data provide?

0

5

10

15

20

0 1 2 3

0

5

10

15

20

0 1 2 3

Log [Zn], ug/L

EPT

taxa

rich

ness

Log [Cu], ug/L

Page 24: Steps 3 & 4: Analysis & interpretation of evidence

24

• Method for measuring degree of association between 2 variables in a matched data set

Elevation Water Temp

Air Temp

Log Conductivity

Elevation 1 -0.75 -0.25 -0.02

Water Temp -0.75 1 0.64 0.07

Air Temp -0.25 0.64 1 0.39

Log Conductivity -0.02 0.07 0.39 1

Correlation analysis

– Correlation coefficient provides quantitative measure of degree to which 2 variables co-vary

• Good data exploration tool

• Can be used to evaluate data from case– Stressor-response relationships from the field– Causal pathway

Page 25: Steps 3 & 4: Analysis & interpretation of evidence

25

• Method for quantifying relationship between dependent & independent variable

– Can have ≥ 1 independent variables

Regression analysis

• Models can be used to:– Predict value of response

variable for new values of explanatory variable

– Estimate value of explanatory variable needed to account for change in response variable

– Model natural variation in stressors, to define reference expectations

90th percentile prediction limits

Page 26: Steps 3 & 4: Analysis & interpretation of evidence

26

Quantile regression

• Models relationship between specified quantile of response variable and explanatory variable (stressor)

– 50th quantile gives median line, under which 50% of observed responses occur

– 90th quantile gives line under which 90% of observed responses occur

• Provides means of estimating location of upper boundary on scatterplot

• Used to evaluate stressor-response relationships from other field studies (Step 4: Evaluating data from elsewhere)

Page 27: Steps 3 & 4: Analysis & interpretation of evidence

27

Applying quantile regression

SUPPORTS WEAKENS

Page 28: Steps 3 & 4: Analysis & interpretation of evidence

28

• Uses taxa list & niche characteristics to predict environmental conditions at site

Predicting environmental conditions from biological observations (PECBO)

Ameletus

Diphetor

Predicted temperature if both Ameletus & Diphetor are present at site• Maximum likelihood

inference based on taxon-environment relationships

• Useful when environmental data not available

Page 29: Steps 3 & 4: Analysis & interpretation of evidence

29

Using PECBO to verify predictions

R2 = 0.67

• Biologically-based predictions can provide valuable supporting evidence, under “Verified predictions”

— If stressor elevated at site, can test hypothesis that biologically-based prediction should also be elevated

• Tools for calculating taxon-environmental relationships & PECBO available on CADDIS

Page 30: Steps 3 & 4: Analysis & interpretation of evidence

30

Species sensitivity distributions (SSDs)

An SSD is a statistical distribution describing the variation of responses of a set of species to a stressor; it can be used to estimate the proportion of species adversely affected at a given stressor intensity.

SSD plot showing LC50s for arthropods exposed to dissolved cadmium

Page 31: Steps 3 & 4: Analysis & interpretation of evidence

31

Now where are we?

• Step 1 — Biological effects of interest defined — Impaired & reference sites identified

Now we’re ready to evaluate & score the evidence across candidate causes

• Step 2 — Map completed — Conceptual model developed— Candidate causes identified— Data identified & collected

• Steps 3 & 4 — Available data examined &

analyzed— Available data organized into

types of evidence