

Comments of the General Electric Company on EPA's Final Human Health Risk Assessment Work Plan for Lower Housatonic River

Prepared by Ogden Environmental and Energy Services
June 12, 2000

This document presents the comments of the General Electric Company (GE), prepared on GE's behalf by Ogden Environmental and Energy Services, on the U.S. Environmental Protection Agency's (EPA's) final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston, 2000a). That Work Plan presents an iterative two-phase approach to evaluating potential human health risks for the Lower River (designated as the "Rest of the River" in the October 7, 1999 Consent Decree). Phase 1 involves the screening of properties, based on their current uses and the available analytical data for PCBs, to determine whether there is a need for additional sampling and whether properties should be retained for more detailed risk assessment. Phase 2 involves a complex forward risk assessment approach for each property that is retained after completion of the Phase 1 screening.

Based on recent discussions with EPA, GE understands that EPA is currently in the process of conducting the Phase 1 screening portion of its human health risk assessment (HHRA) for the Lower Housatonic River. As a follow-up to those discussions, these comments provide GE's position on a number of assumptions underlying EPA's Phase 1 screening levels, and also address EPA's proposed methodologies for Phase 2, as outlined in the Work Plan. GE previously provided comments on a draft of the Work Plan on June 3, 1999. While some changes have been made to the methodologies between that earlier draft and the final Work Plan, a number of the approaches outlined in the final Work Plan are the same as in the prior version. GE preserves its position on all points set forth in its June 3, 1999 comments. However, to avoid needless repetition, the present comments focus only on a few key issues in the final Work Plan, including certain new issues that are raised by the final Work Plan.



A. Comments on Phase 1 - Site Screening Approach

As discussed in the Work Plan, Phase 1 of the HHRA involves a screening of properties and other exposure points, based on available PCB data, to determine whether they should be retained for a more detailed risk assessment. To accomplish this screening, EPA has developed screening risk-based concentrations (SRBCs) for PCBs in floodplain soil, riverbank soil, and sediments for various exposure scenarios -- i.e., residential, recreational, agricultural, utility worker, and commercial/industrial groundskeeper scenarios. Separate low-contact and high-contact SRBCs have been calculated for the residential and recreational scenarios. The initial screening step involves comparison of the maximum PCB concentrations for each medium and exposure point (classified in the appropriate scenario) to the most conservative SRBC applicable to that medium and exposure point. If the maximum concentration is below that SRBC, the property or other exposure point will be dropped from further evaluation. If the maximum concentration exceeds the applicable SRBC, EPA will, if necessary, conduct additional sampling, and will determine the 95% upper confidence limit (UCL) on the mean of the PCB data for such property or other exposure point. If the 95% UCL does not exceed the applicable SRBC, the property or exposure point will be dropped from further consideration. If the 95% UCL exceeds the SRBC, the property or exposure point will be carried through to Phase 2 of the HHRA.
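For illustration, the two-step screen described above reduces to a simple decision rule. The sketch below is hypothetical; the SRBC and concentration values are placeholders, not numbers from the Work Plan:

```python
def phase1_screen(max_conc, ucl95, srbc):
    """Apply the two-step Phase 1 screen to one exposure point.

    Step 1: if the maximum detected PCB concentration is below the most
    conservative applicable SRBC, the property is dropped.
    Step 2: otherwise, the 95% UCL on the mean is compared to the SRBC;
    only exceedances are carried into the Phase 2 risk assessment.
    """
    if max_conc < srbc:
        return "dropped (max below SRBC)"
    if ucl95 is not None and ucl95 <= srbc:
        return "dropped (95% UCL below SRBC)"
    return "retained for Phase 2"

# Hypothetical exposure points (PCB concentrations in mg/kg):
print(phase1_screen(max_conc=0.8, ucl95=None, srbc=1.0))  # dropped at step 1
print(phase1_screen(max_conc=3.2, ucl95=0.9, srbc=1.0))   # dropped at step 2
print(phase1_screen(max_conc=3.2, ucl95=1.6, srbc=1.0))   # retained
```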

GE supports the concept of a phased screening approach. However, it believes that several of the assumptions and other input parameters used by EPA to develop the SRBCs for PCBs are overly conservative, and that, as a result, the SRBCs are likewise excessively stringent, even for a screening-level analysis. In this connection, GE preserves all prior positions expressed to EPA regarding the toxicity of PCBs and appropriate exposure parameters. However, these comments are focused on a limited number of the parameters used by EPA in deriving the SRBCs. Although EPA is currently in the process of the Phase 1 screening, GE believes that EPA should take account of these comments and revise its SRBCs accordingly for purposes of conducting additional or more detailed screening.


1. Inconsistent target risk levels

According to the information provided in Tables 6.2-1, 6.2-5, and 6.2-6 of the Work Plan, EPA has not used a consistent target risk level for calculating its SRBCs. While a cancer risk of 5 x 10⁻⁶ is identified as the target risk level for the residential and recreational SRBCs, target risks of 1.6 x 10⁻⁶ and 1.1 x 10⁻⁶ are identified for the utility worker and commercial groundskeeper scenarios, respectively. The Work Plan presents no rationale for these differences in target risk levels.

To provide a more meaningful screening step, EPA should pick a reasonable and consistent target risk level and use it as the basis for calculating all scenario-specific SRBCs. This will ensure that all SRBCs are equally protective of public health. The purpose of conducting the Phase 1 screen is to identify those properties that do not need to be evaluated further in the HHRA and to drop them from future consideration, so that the HHRA can focus on key areas rather than all areas. If the screening SRBCs are overly stringent, numerous properties will be retained for the risk assessment despite the fact that they will likely be shown to have no unacceptable risks once the risk assessment is completed.

To eliminate this problem and increase the efficiency of the screening step, GE believes that the target cancer risk level should be established as 1 x 10⁻⁵ for all scenarios. This risk level is the midpoint of EPA's acceptable risk range (1 x 10⁻⁴ to 1 x 10⁻⁶) (EPA, 1990). When coupled with conservative exposure parameters and upper-bound media concentrations, as are outlined in the Work Plan, this approach will successfully screen out those properties that do not pose a risk while retaining those that warrant further evaluation.
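Because a cancer-risk-based screening concentration is linear in the target risk level (risk scales directly with dose, and dose with concentration), moving every scenario to a single target is a simple ratio. The base SRBC value below is a placeholder, not a Work Plan number:

```python
# Rescale a cancer-risk-based SRBC to a different target risk level.
# Risk-based concentrations are linear in the target risk, so the
# adjustment is just the ratio of the two targets.
def rescale_srbc(srbc, old_target, new_target):
    return srbc * (new_target / old_target)

# Moving a residential SRBC (placeholder value 1.0 mg/kg) from the
# Work Plan's 5 x 10^-6 target to a uniform 1 x 10^-5 doubles it:
print(rescale_srbc(1.0, 5e-6, 1e-5))
# while the utility worker SRBC (1.6 x 10^-6 target) would rise ~6.3x:
print(round(rescale_srbc(1.0, 1.6e-6, 1e-5), 2))
```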

2. Overly conservative exposure assumptions

Several of the exposure parameter values that are used to derive the SRBCs are extremely conservative and cannot be considered representative of likely activities, even for a screening-level assessment. These parameters include some of the exposure frequencies and exposed skin surface areas used for the residential and recreational scenarios, and the soil ingestion rates used for the recreational and utility worker scenarios.


a. Exposure frequencies for residential and recreational scenarios

Some of the exposure frequencies used by EPA to derive the SRBCs for the residential and recreational scenarios are unrealistic and implausible. These include the assumed frequency of two days per week for seven months per year for low-contact residential exposure to riverbank soils and sediments and low-contact recreational exposure to floodplain and riverbank soils and sediments, as well as the assumed frequency of three days per week for seven months/year for high-contact residential and recreational exposure to sediments.

The Work Plan indicates that the determination of high-contact versus low-contact areas depends on the accessibility of the area -- e.g., judgments about the likelihood that the area has physical features consistent with a child (or adult) contacting riverbank soil on a regular basis (p. 6-26). Physical features that would lead to a low-contact determination would presumably include, at a minimum, steep slopes, undercut river banks, rocky areas with limited or no access to soil or sediment, and marshy areas. Such conditions are not consistent with contact with the contaminated media on a regular basis. In these circumstances, it is unreasonable to assume an exposure frequency of two days per week for each week in the seven-month exposure period. Such a frequency clearly represents regular contact, which is inconsistent with the limited accessibility of these areas. A more appropriate assumption for low-contact areas would be to assume occasional contact (e.g., one day per week or two days per month) during the five warmer months of the year.

For high-contact areas, EPA has assumed an exposure frequency of three days per week for seven months. While this is a highly conservative estimate even for contact with floodplain and riverbank soils, it is totally unreasonable for contact with sediments, particularly given the assumed exposed skin surface areas. Even for high-contact sediment areas, it is extremely unlikely that children and adults would venture into the river and be exposed to sediments as often as three days per week during every week from April through October. In fact, children would probably not play in the river at all in the cooler portions of this period. Thus, it would be more reasonable to assume an average exposure frequency of around one day per week during this seven-month period.


b. Exposed skin surface areas for residential and recreational scenarios

The excessive conservatism in the exposure frequencies is exacerbated by the exposed skin surface areas assumed by EPA in deriving the SRBCs for the residential and recreational scenarios. For these scenarios, EPA has assumed that the entire head, hands, forearms, lower legs, and feet will be in contact with soil or sediment every time that the individuals are present in the affected areas (Tables 6.2-3 and 6.2-4). This assumption does not take account of the fact that the exposed skin surface areas during the spring and fall months are likely to be less, because individuals will be wearing heavier clothing -- i.e., shoes, socks, long pants, and long-sleeved shirts. While the level of exposure assumed by EPA may occur during the summer months, it would not occur every time the individual visits the affected area. Hence, EPA should reduce the exposed skin surface areas to take account of the lesser areas exposed during the spring and fall, and thus develop time-weighted averages for the exposed skin surface areas.
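A time-weighted average of the kind GE suggests could be computed as follows. The surface areas and the three-month/four-month split are illustrative assumptions for the sketch, not values taken from the Work Plan or from GE's comments:

```python
# Time-weighted average exposed skin area over a seven-month season.
# Illustrative placeholder areas (cm^2): full warm-weather exposure
# (head, hands, forearms, lower legs, feet) versus reduced spring/fall
# exposure (head and hands only, with long pants and sleeves).
summer_area_cm2 = 5700
spring_fall_area_cm2 = 2000

summer_months = 3        # assumed months with minimal clothing
spring_fall_months = 4   # assumed cooler months with heavier clothing
total_months = summer_months + spring_fall_months

twa_area = (summer_area_cm2 * summer_months
            + spring_fall_area_cm2 * spring_fall_months) / total_months
print(round(twa_area))  # 3586 cm^2, well below the constant 5700 cm^2
```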

c. Soil ingestion rates for recreational scenario

For recreational exposures (both high-contact and low-contact), EPA has used daily soil ingestion rates of 200 mg/day for small children and 100 mg/day for adults (Table 6.2-2). GE does not agree with these ingestion rates for the reasons given in its June 3, 1999 comments (pp. 33-34). However, if these daily soil ingestion rates are to be used, EPA should at least apply some factor to account for the fraction of the daily soil ingestion that is derived from the affected recreational area, as provided in the Agency's Risk Assessment Guidance for Superfund (EPA, 1989). Under real-life conditions, total daily soil ingestion will occur over the entire day and will include soil and/or sediments from a variety of different locations, some of which will not be contaminated. However, EPA has assumed that total daily soil ingestion will come from a single contaminated area, and has not included a factor for the fraction of soil ingested from the site. While it is presumed that this omission is due to the fact that EPA is concerned that some individuals may occasionally obtain all of their ingested soil from a single contaminated location, such an assumption is not reasonable when coupled with the exposure frequencies that are assumed. In addition, some highly accessible areas may be very limited in size, as is the case on some floodplain properties. Because of this, even if all of the soil is obtained from an exposure area, only a portion of that soil will be contaminated.


Moreover, EPA's assumption is excessively conservative when one considers that the SRBCs for soil and sediment are separately derived, and that each SRBC assumes that 100 percent of the soil or sediment ingested daily comes from the site. In reality, individuals who are engaged in river-based activities will ingest a combination of soil and sediment during the course of their activity. Given these considerations, GE believes that a factor of 50 percent should be used to represent the fraction of ingested soil or sediment that is derived from contaminated recreational areas during recreational activities. This will correct for the fact that daily ingestion exposure occurs with a combination of soil and sediment. It will also correct for the fact that only limited portions of the day will be spent in recreational areas. In addition, such a factor will more accurately reflect exposures at those areas where the area of contamination is very limited. This approach is consistent with EPA's approach in its development of the cleanup goals under the October 7, 1999 Consent Decree for recreational areas outside the River (EPA, 1999a).
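In the standard RAGS-style ingestion intake equation, the correction GE proposes enters as a fraction-ingested (FI) term. The sketch below uses placeholder exposure values (concentration, body weight, frequency) chosen only to show the mechanics, not inputs from the Work Plan:

```python
# Chronic daily soil/sediment ingestion intake (mg/kg-day), in the
# general form used in EPA's Risk Assessment Guidance for Superfund.
# All numeric inputs below are illustrative placeholders.
def ingestion_intake(conc_mg_kg, ir_mg_day, fi, ef_days_yr, ed_yr,
                     bw_kg, at_days):
    kg_per_mg = 1e-6  # convert ingested soil mass from mg to kg
    return (conc_mg_kg * kg_per_mg * ir_mg_day * fi * ef_days_yr * ed_yr
            / (bw_kg * at_days))

# Child recreational ingestion with and without a 50 percent
# fraction-ingested factor (all other inputs held equal):
full = ingestion_intake(10.0, 200, 1.0, 84, 6, 15, 6 * 365)
half = ingestion_intake(10.0, 200, 0.5, 84, 6, 15, 6 * 365)
print(half / full)  # 0.5 -- the FI term scales the intake linearly
```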

d. Soil ingestion rate for utility worker scenario

EPA has used a soil ingestion rate of 200 mg/day to develop the SRBC for the utility worker scenario because of likely heavy exposure to soils (p. 6-17). While an enhanced soil ingestion rate may be appropriate for utility workers, who will have direct contact with soil, the ingestion rate of 200 mg/day is too high. GE previously proposed use of an enhanced soil ingestion rate of 137 mg/day for farmers at the Housatonic River site, who are likewise expected to have heavy contact with soil (GE, 1997), and EPA approved that proposal (EPA and MDEP, 1998). EPA itself used that soil ingestion rate in deriving a cleanup level for the utility worker scenario for Areas Outside the River under the Consent Decree (EPA, 1999a). This ingestion rate was based on an assumed upper-bound soil adherence rate of 1 mg/cm² for the hands. In recent draft dermal risk assessment guidance, however, EPA (1999b) recommends soil adherence factors for utility workers ranging from a central-tendency estimate of 0.2 mg/cm² to an upper-bound estimate of 0.8 mg/cm². Use of that upper-bound adherence factor of 0.8 mg/cm² would result in an enhanced soil ingestion rate for utility workers of 110 mg/day. GE submits that this enhanced soil ingestion rate should be used in developing a revised SRBC for the utility worker scenario.
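The arithmetic behind the proposed 110 mg/day figure follows directly from scaling the previously approved enhanced ingestion rate by the ratio of the two adherence factors:

```python
# The approved enhanced ingestion rate of 137 mg/day was premised on an
# upper-bound hand adherence factor of 1.0 mg/cm^2. Scaling it to the
# 0.8 mg/cm^2 upper bound in EPA's 1999 draft dermal guidance:
approved_rate_mg_day = 137.0
old_adherence_mg_cm2 = 1.0
new_adherence_mg_cm2 = 0.8

revised_rate = approved_rate_mg_day * new_adherence_mg_cm2 / old_adherence_mg_cm2
print(round(revised_rate))  # 110 mg/day, the rate GE proposes
```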


B. Comments on Phase 2 - Baseline HHRA Approach

For those properties or other exposure points that remain after the Phase 1 screening, the Work Plan outlines the approach that EPA plans to use in conducting a baseline HHRA. GE has a number of comments on that proposed approach.

1. Selection of non-PCB contaminants of potential concern

The Work Plan outlines a two-step approach for selecting non-PCB contaminants of potential concern (COPCs) to be included in the baseline HHRA (Section 6.3.2.3). In the first step, for each area that has PCB concentrations above the applicable SRBC, the available sampling data on other constituents will be compared with several criteria to determine the need for additional sampling in that area. These criteria include: (a) background concentrations established for the area; (b) EPA's Preliminary Remediation Goal (PRG) for dioxins/furans in residential areas, which is a 1 ppb Toxicity Equivalent (TEQ) concentration; and (c) the Massachusetts Department of Environmental Protection's (MDEP's) Method 1 soil standards set out in the Massachusetts Contingency Plan (MCP) (see Work Plan, p. 6-30). If these or other (unspecified) criteria are exceeded, EPA will conduct additional characterization sampling (or other evaluation) in the area. If they are not exceeded, the Work Plan states that the baseline HHRA will be performed using the available analytical data (p. 6-31). The latter statement appears to mean that, even if the concentrations of non-PCB constituents do not exceed the criteria for further sampling (i.e., are below background concentrations, the dioxin/furan PRG, or the Method 1 soil standards), those constituents will nevertheless be carried through to the second step in the selection of COPCs.

In the second step, all constituents detected will be screened for the selection of COPCs by comparing the maximum detected concentrations with contaminant-specific and medium-specific risk-based concentrations. The Work Plan states that the principal criterion for selection of a COPC will be an exceedance of the medium-specific PRG by the maximum concentration of the constituent (p. 6-31). Although the Work Plan does not give the source of the PRGs to be used, it states that a target cancer risk of 1 x 10⁻⁶ and a target non-cancer hazard quotient (HQ) of 0.1 will be used. Any constituent for which the maximum concentration exceeds the PRG will be carried forward for the risk assessment. In addition, the Work Plan states that constituents which are known to bioaccumulate in biological tissues, but do not exceed the PRGs, will be further evaluated for possible inclusion in the risk assessment.

GE has several concerns about this proposed two-step approach.

First, GE believes that if the concentration of non-PCB constituents in a given area does not exceed the criteria for further sampling (as listed above), those constituents should be dropped from further consideration and not carried through to the second step in selecting COPCs. As EPA has recognized in the Statement of Work for Removal Actions Outside the River (SOW), which is Appendix E to the Consent Decree, concentrations of non-PCB constituents that are consistent with local background levels, or are less than the 1 ppb TEQ concentration for dioxins/furans, or are below the applicable Method 1 soil standards, are considered to pose no significant risk to human health. Hence, such constituents need not and should not be considered further in the selection of COPCs.

Second, for those constituents that are carried forward to the second step in the COPC selection process, EPA's proposed primary approach for selecting COPCs -- i.e., comparison of maximum concentrations to PRGs -- is overly restrictive. In recent discussions with GE, EPA explained that it is currently planning to use the EPA Region 9 PRGs for this purpose, but to reduce the non-cancer-based PRGs by 10 times, due to the fact that those PRGs were based on a non-cancer HQ of 1.0, not 0.1. This approach will likely lead to the selection of a long list of COPCs, many or most of which in fact make no significant contribution to actual risks. This would undercut the purpose of the screening approach and will greatly and needlessly complicate the risk assessment, without any corresponding benefit in terms of achieving additional health protection. Instead, GE recommends that, in selecting the COPCs for the Rest of the River, EPA should use an approach similar to the approach approved by EPA and specified in the SOW for Areas Outside the River, including the non-bank portions of certain floodplain properties. This approach utilizes not only the Region 9 PRGs but also the other criteria listed on page 6-30 of the Work Plan, as discussed below.

[Footnote 1: The Work Plan also states that, in addition to these specified criteria, EPA will also use other applicable criteria in the first step of this process, but it does not say what those additional criteria are. EPA should specify those criteria.]

For dioxins and furans, EPA should use the PRGs for dioxin/furan TEQs that are prescribed in recent guidance from EPA headquarters for use in addressing these constituents in soil at CERCLA and RCRA sites (EPA, 1998). Those PRGs are 1 ppb for dioxin/furan TEQs in residential areas (also applicable to recreational areas) and 5-20 ppb for dioxin/furan TEQs in commercial/industrial areas. The SOW provides specifically for use of these PRGs in screening out dioxin/furan compounds in soil in Areas Outside the River. EPA should likewise use these PRGs in selecting COPCs for the Rest of the River. (Indeed, as discussed below, GE believes that these PRGs should be used not only as screening criteria but also as remedial goals, which is consistent with the approach embodied in the SOW for Areas Outside the River. See Section B.2 below.)

For other constituents, EPA should follow a step-wise approach that begins with comparisons to the Region 9 PRGs. Since these PRGs are highly conservative, and are specified in the SOW for use in screening such constituents without further reduction, they should not be reduced by an additional 10 times. For constituents for which such PRGs do not exist, a similar screening step should be employed using PRGs for surrogate compounds.

[Footnote 2: Generally, when toxicity criteria have not been derived for particular constituents, it is because there are no reliable toxicity data on which to base them. If such constituents are not screened, but are simply carried through to the risk assessment as COPCs, that would unnecessarily complicate the risk assessment and would result in largely meaningless or highly uncertain estimates, given the lack of reliable toxicity data and criteria. Hence, EPA should screen these compounds using PRGs for surrogate compounds, as described in GE's June 3, 1999 comments (p. 6).]

EPA should then apply the other criteria listed on page 6-30 of the Work Plan -- i.e., comparison to background levels and MCP Method 1 standards. Comparison of concentrations of non-PCB constituents to site-specific background levels (e.g., concentrations upstream of the GE facility) is appropriate to ensure that only those constituents attributable to releases from the GE facility are included in the risk assessment. This step is allowed in EPA's Risk Assessment Guidance for Superfund (EPA, 1989) and has been accepted by EPA as a basis for selecting non-PCB COPCs

for both soils and sediments in the Upper Vi Mile Reach of the Housatonic River (as stated in the

Upper 2 Mile Reach Work Plan) and for Areas Outside the River (as provided in the SOW) It

is also consistent with the MCP which provides that if the concentration of a constituent is at or

below background levels then that constituent will be considered to pose No Significant Risk (310

CMR 400902(3)) Similarly comparison to the MCP Method 1 soil standards is appropriate

because those standards have been developed as conservative risk-based standards (using a target

cancer risk of 1 x 106) that will ensure protection of human health and the environment Again

EPA has already accepted comparisons to those standards as a basis for selecting non-PCB COPCs

in soil for the Upper a Mile Reach of the River and Areas Outside the River Moreover in its

Engineering EvaluationCost Analysis (EECA) for the l2 Mile Reach of the River EPA itself

has used the MCP Method 1 soil standards as well as background levels as a basis for evaluating

non-PCB constituents in bank soils (Weston 2000b) Thus use of comparisons to background

levels and MCP Method 1 standards in selecting COPCs for the Rest of the River is justified by

considerable precedent and will ensure that a consistent approach is employed to address such

constituents throughout the GE-PittsfieldHousatonic River Site EPA has presented no sound

basis for departing from that approach for the Rest of the River portion of the Site

Moreover, use solely of the maximum detected concentrations in selecting COPCs from these constituents is overly conservative and inappropriate. For example, comparisons to site-specific background levels should be made using an appropriate statistical technique or summary statistics, as described in MDEP (1995) guidance, while comparisons to the MCP Method 1 standards are generally to be made based on average exposure point concentrations (310 CMR 40.0926(3)). Again, these approaches have been agreed upon by EPA as fully protective for Areas Outside the River, as set forth in the SOW, and should likewise be used for the Rest of the River to eliminate constituents that are not likely to contribute to Site-related risks.

Finally, the Work Plan does not explain how the potential for bioaccumulation (discussed on p. 6-31) will be considered, or what criteria will be used to examine this. Unless the approach used to identify and select such COPCs is clearly defined, it could result in the unnecessary inclusion of certain constituents. EPA should provide supplemental information that outlines the specific approach and criteria that will be used to identify and select these constituents for inclusion.

2. Risk assessment of dioxins/furans

The Work Plan states that EPA intends to evaluate the potential risks associated with dioxins and furans by converting them to TEQs (toxic equivalents of 2,3,7,8-TCDD) using Toxicity Equivalency Factors (TEFs), and then applying the provisional cancer slope factor (CSF) of 150,000 (mg/kg-day)⁻¹ listed on HEAST for 2,3,7,8-TCDD (Section 6.3.3.2.4). As discussed in detail in GE's June 3, 1999 comments (pp. 19-22), the TEF approach has substantial limitations, and the provisional CSF of 150,000 (mg/kg-day)⁻¹ is uncertain and unsupportable. A wide range of CSFs, spanning from 9,000 to 156,000 (mg/kg-day)⁻¹, have been proposed for TCDD under the linear non-threshold cancer model, with differences resulting from the selection of tumor classification scheme, the interspecies scaling factor used, and the model used for low-dose extrapolation. Thus, there is enormous uncertainty associated with any selected CSF.

In these circumstances, GE believes that, to the extent possible in this risk assessment, EPA should evaluate dioxins and furans through application of the dioxin/furan TEQ PRGs set forth in EPA (1998) guidance and discussed above, using those PRGs as remedial goals as well as screening levels, rather than conducting forward risk calculations for these compounds. This approach would avoid the uncertainties associated with selecting a particular CSF for TCDD, and is consistent with the approach adopted by EPA for Areas Outside the River, as described in the SOW.

However, to the extent that EPA includes dioxins and furans in forward risk calculations, GE believes that EPA should use a CSF of 30,000 (mg/kg-day)⁻¹. As shown in our June 3, 1999 comments, based on current information, that CSF is more supportable than the CSF of 150,000 (mg/kg-day)⁻¹ specified in the Work Plan.


3. Use of TEF/TEQ approach for dioxin-like PCBs

In the Work Plan, EPA also proposes to use the TEF/TEQ approach for estimating risks of dioxin-like PCB congeners (Section 6.3.3.2.5). Under this approach, the concentrations of these PCB congeners are converted to toxic equivalents (TEQs) of 2,3,7,8-TCDD using TEFs, and the provisional CSF of 150,000 (mg/kg-day)⁻¹ for 2,3,7,8-TCDD is then applied to estimate the risks. In its June 3, 1999 comments, GE presented several reasons why this approach is not appropriate or scientifically defensible. In discussions with GE in November 1999, EPA stated that, since it will make risk management decisions on the basis of total PCB concentrations rather than congener-specific data, it intends to use this TEF/TEQ approach for PCBs only if it finds a good correlation between the concentrations of the dioxin-like congeners and concentrations of total PCBs. There is no indication in the Work Plan, however, that EPA has established a good correlation between the congener-specific data and the total PCB data. Indeed, in more recent discussions, EPA has indicated that it has not to date evaluated this issue. If and when EPA makes such an analysis, GE requests the opportunity to review the analysis and discuss it with EPA. The absence of such a correlation would itself make the TEQ data irrelevant to risk management decisions.
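Mechanically, the TEF/TEQ conversion at issue is a TEF-weighted sum over the dioxin-like congeners. The congener concentrations and TEF values in the sketch below are illustrative placeholders, not site data or the specific TEFs EPA would apply:

```python
# TEQ = sum over dioxin-like congeners of (concentration x TEF).
# Concentrations (mg/kg) and TEF values here are illustrative only.
congener_conc = {"PCB-126": 0.002, "PCB-169": 0.010, "PCB-105": 0.50}
tef = {"PCB-126": 0.1, "PCB-169": 0.01, "PCB-105": 0.0001}

teq = sum(congener_conc[c] * tef[c] for c in congener_conc)
print(teq)  # the 2,3,7,8-TCDD toxic-equivalent concentration (mg/kg)

# The provisional TCDD CSF would then be applied to the TEQ-based dose --
# the step GE argues double-counts risk already captured when the
# total-PCB CSF is applied in the same assessment.
```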

In any event, GE continues to believe, for the same reasons given in its June 3, 1999 comments (pp. 23-25), that application of the TEF/TEQ approach to PCBs is not scientifically supportable. These reasons include: (a) the lack of sufficient congener-specific data to provide reliable estimates of true congener concentrations; (b) the double counting of carcinogenic risks that would result from use of both the TCDD CSF and the PCB CSF in the same assessment; and (c) the uncertainties and unsupportability of the TEFs for PCB congeners and of the CSF (for 2,3,7,8-TCDD) to be used to evaluate the resulting TEQs.

4. Failure to consider alternative CSFs and RfDs for PCBs

The Work Plan states that EPA intends to use the upper-bound CSF for PCBs of 2 (mg/kg-day)⁻¹ and the Reference Dose (RfD) for Aroclor 1254 of 2 x 10⁻⁵ mg/kg-day, as published in its Integrated Risk Information System (IRIS) (Sections 6.3.3.2.3 and 6.3.3.3.2). In its June 3, 1999 comments (pp. 16-19 and Appendix A), GE presented substantial reasons why these toxicity values overestimate the carcinogenic potential and non-carcinogenic impacts of PCBs, and it recommended the development of alternative toxicity values for PCBs. EPA has made no reference to this presentation in its Work Plan.

This failure to consider the alternative toxicological information presented by GE is unlawful. Under a settlement agreement with GE (Settlement Agreement in General Electric Company v. Browner, No. 93-1251, D.C. Cir., October 25, 1993) and a subsequent EPA guidance memorandum (EPA, 1993), if an outside party questions the use of IRIS values during the course of an EPA risk assessment and presents alternative toxicological information that may be used in place of the IRIS values, EPA has an obligation to consider all credible and relevant evidence before it. There is no indication that EPA has considered the alternative toxicological information presented by GE. Rather, it has ignored GE's showing and chosen simply to defer to the current IRIS values (which is precisely the course of conduct which the settlement and guidance prohibit). EPA should rectify this deficiency by now giving careful consideration to the alternative toxicological information presented by GE.

5. Use of subchronic RfD for PCBs

The Work Plan states that EPA will consider using the subchronic RfD for PCBs only if it determines that exposure is intermittent enough to result in an exposure period that is substantially reduced (p. 6-42). Later, on page 6-43, the Work Plan states that this will occur when the total exposure period is substantially below the 7-year period that is considered by EPA to represent chronic exposure. There is no further discussion of this issue, so it is not clear what the Agency will use as a benchmark to identify intermittent exposure.

Many of the proposed exposure scenarios are likely to be extremely intermittent in nature. For example, in developing the SRBC for utility workers, EPA assumes that exposure occurs 5 days/year for a period of 25 years. Clearly, 5 days/year is intermittent exposure. In fact, even when considered over the entire 25-year exposure period, the total exposure consists of 125 days, which amounts to 1/3 of a year, or less than 5 percent of the 7-year chronic exposure period. This exposure would certainly be considered intermittent despite the fact that the total exposure period lasts for 25 years.


As a second example, in developing its SRBCs for recreational exposures, EPA has assumed that children aged 1-6 are exposed for six years during recreational activities. Even using the highly conservative exposure frequency for high-contact areas of 3 days/week for a period of 7 months per year (a total of 84 days/year, or 504 days for the entire exposure period), the number of days exposed is equivalent to 1.4 years, or less than 20 percent of the available days during the 7-year chronic exposure period. Nevertheless, as stated on page 6-42, EPA does not intend to use the subchronic RfD to evaluate noncancer hazards to young children.
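The day-count arithmetic in the two examples above can be verified directly; the sketch below simply restates the Work Plan assumptions cited in the text:

```python
# Verifying the exposure-day arithmetic for the two scenarios discussed
# above (utility worker and child recreational user).

CHRONIC_YEARS = 7.0  # EPA's chronic exposure period, per the Work Plan


def total_exposure_days(days_per_year: float, years: float) -> float:
    """Total days of exposure over the full exposure period."""
    return days_per_year * years


def fraction_of_chronic_period(total_days: float) -> float:
    """Exposed days as a fraction of the days in the chronic period."""
    return total_days / (CHRONIC_YEARS * 365.0)


# Utility worker: 5 days/year for 25 years -> 125 days, under 5 percent
# of the 7-year chronic period
utility_days = total_exposure_days(5, 25)
utility_fraction = fraction_of_chronic_period(utility_days)

# Child recreational user: ~84 days/year for 6 years -> 504 days, about
# 1.4 years of days, under 20 percent of the 7-year chronic period
child_days = total_exposure_days(84, 6)
child_fraction = fraction_of_chronic_period(child_days)
```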

In short, the Work Plan states that EPA will use the subchronic RfD when exposures during a given scenario are sufficiently intermittent to result in a total exposure that is far less than the 7-year chronic exposure period identified by EPA. It is clear, however, that even for scenarios where this intermittent exposure is likely, EPA does not intend to incorporate the subchronic RfD. EPA should reconsider its position on the use of the subchronic RfD and incorporate it into those scenarios where it is relevant.

6. Inappropriate use of a single exposure scenario for all direct-contact recreational activities

The Work Plan provides that, while consumption of fish and game will be evaluated separately, all direct-contact recreational activities will be evaluated using a single exposure scenario and a single set of exposure assumptions (pp. 6-52 to 6-53). These activities include hiking, walking, canoeing, picnicking, swimming, wading, and dirt biking, as well as fishing and hunting (but not consumption of fish or game). In its June 3, 1999 comments (pp. 11-12, 29-30), GE explained that use of a single exposure scenario and set of assumptions for such diverse activities is inappropriate because it will force EPA either to use the most conservative assumptions for any of these activities (which would clearly result in grossly overestimated exposure estimates) or to pick compromise parameters that will not be truly representative of any of these activities (which would be subject to considerable uncertainty). Based on discussions with EPA in the fall of 1999, GE had understood that the Agency recognized this problem and intended to consider further whether and how to divide this single direct-contact recreational scenario into a number of more specific scenarios (e.g., hiking/walking, canoeing, picnicking, swimming/wading). However, the Work Plan does not do that. GE again urges EPA to develop such specific scenarios so as to take


account of the significant differences in exposure parameters among these types of activities. Doing so will more closely approximate actual recreational exposures and allow more realistic and appropriate cleanup goals to be developed.

7. Lack of information about specific exposure assumptions to be used

As in the prior draft, the Work Plan does not present the specific exposure parameter values and equations to be used in EPA's risk assessment, but simply states that they will be developed based on EPA and MDEP guidance along with site-specific data (Section 6.3.4.5). GE requested such information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has advised GE that the Agency's risk assessment staff will hold a series of technical meetings with GE representatives and consultants, and at such meetings will be prepared to discuss the most appropriate exposure assumptions and parameter values to be used. GE reiterates its request for such meetings and looks forward to discussing these issues with EPA.

8. Use of probabilistic risk analysis

In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that probabilistic approaches, such as Monte Carlo analysis, will be considered for those COPCs and exposure pathways that result in significant risk estimates. However, it does not mention probabilistic risk assessment techniques in connection with any of the main risk assessments. In prior discussions with GE, EPA indicated that one of the principal reasons why it collected a large amount of fish tissue PCB data is for use in a probabilistic risk assessment of fish consumption. As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic exposure model in the main risk analysis of the fish consumption pathway. However, the Work Plan presents no discussion of, or plans for, conducting such an assessment. In fact, EPA does not even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk assessment guidance documents and information sources that EPA intends to use in conducting the HHRA.

During the last few years, EPA's policies and guidelines have focused on improving risk management by presenting decision-makers with the entire range of possible risks, rather than a single point estimate (EPA, 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that numerical risk assessments should be accompanied by a full characterization of the uncertainties, limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D) probabilistic analysis, such as Microexposure analysis, can more effectively characterize the impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed population by considering time-dependent changes.

Microexposure analysis has been used by EPA and by independent researchers to simulate duration of residential exposure (EPA, 1992c; Johnson and Capel, 1992; Sielken, 1994). It also has been used to evaluate childhood exposures to lead (Goodrum et al., 1994), exposure to contaminants in tap water (Harrington et al., 1995), and exposure to dioxins from the consumption of freshwater fish (Keenan et al., 1993a,b; 1995; 1996a; 1997a,b). Microexposure analysis was employed in the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon Group, 1994; Keenan et al., 1996b). In addition, Microexposure analysis has been described in EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A - External Review Draft as a viable alternative for modeling time-dependent variability in concentrations, daily activity patterns, and other behavioral exposure factors (EPA, 1999c).

Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of probabilistic risk analysis through application to the fish ingestion pathway at the Sangamo Weston/Lake Hartwell Superfund site (Simon, 1999). Simon (1999) found that the Microexposure event model gave risk estimates that were approximately an order of magnitude lower than those derived using deterministic and other probabilistic approaches. He attributed this decrease to the power of the method to incorporate changes over time.

GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model, to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in Appendix E of its draft guidance for conducting probabilistic risk assessments (EPA, 1999c), some types of exposures, including fish consumption, can be expected to vary over time. Anglers may change their activity levels and their fish consumption behaviors as they age. In addition, different


species and sizes of fish may be consumed from a variety of locations on the river over time, and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates, and will provide information that will allow better-informed risk management decisions to be made (EPA, 1999c).
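As an illustration of the structure of such a two-dimensional analysis, the sketch below separates an outer loop over an uncertain parameter from an inner loop over inter-individual variability. All distributions and parameter values here are hypothetical placeholders, not proposed inputs:

```python
import random

random.seed(1)


def simulate_dose_distribution(n_outer=50, n_inner=200):
    """Two-dimensional Monte Carlo sketch.

    The outer loop samples an uncertain parameter (the true mean fish
    tissue concentration, mg/kg); the inner loop samples inter-individual
    variability (ingestion rate and exposure duration).  All
    distributions are hypothetical and purely illustrative.
    """
    percentile_95s = []
    for _ in range(n_outer):
        # Uncertainty: true mean tissue concentration (hypothetical lognormal)
        mean_conc = random.lognormvariate(0.0, 0.5)
        doses = []
        for _ in range(n_inner):
            # Variability: angler-specific intake and duration (hypothetical)
            ingestion_g_day = random.lognormvariate(2.0, 0.8)  # g fish/day
            duration_yr = random.uniform(1, 30)                # years fished
            body_weight = 70.0                                 # kg
            averaging_time_yr = 70.0
            dose = (mean_conc * ingestion_g_day / 1000.0 / body_weight
                    * duration_yr / averaging_time_yr)         # mg/kg-day
            doses.append(dose)
        doses.sort()
        # Record the 95th percentile of variability for this outer draw
        percentile_95s.append(doses[int(0.95 * n_inner)])
    return percentile_95s


dist = simulate_dose_distribution()
```

The output is a distribution of high-end doses across the uncertainty loop, rather than a single point estimate, which is the kind of characterization the 2-D approach provides.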

9. Use of 95% UCLs or maximum values as exposure point concentrations

The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% UCL of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).

GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment, and as demonstrated by EPA contractors and consultants (EPA, 1997c; Schulz and Griffin, 1999), use of the H-statistic to derive a 95% UCL can, and frequently does, produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.

Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½ Mile Reach and for Areas Outside the River - a spatial averaging approach. That approach will (a) allow calculation of EPCs that are much more truly representative of actual exposures than use of EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper ½ Mile Reach and Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling, if necessary, to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.


However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A, and as demonstrated by Singh et al. (EPA, 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.
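To illustrate the basic (non-spatial) form of the technique, a bootstrap 95% UCL of the arithmetic mean can be computed by repeatedly resampling the data with replacement; the concentrations below are hypothetical:

```python
import random
import statistics


def bootstrap_ucl95(data, n_boot=2000, seed=0):
    """95% UCL of the arithmetic mean by nonparametric bootstrap.

    Resample the data with replacement, compute each resample's mean,
    and take the 95th percentile of the resulting means.  (A spatial
    bootstrap would additionally weight each sample by the area it
    represents; this sketch shows only the unweighted version.)
    """
    rng = random.Random(seed)
    n = len(data)
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)
    )
    return means[int(0.95 * n_boot)]


# Hypothetical positively skewed PCB concentrations (mg/kg)
concentrations = [0.1, 0.2, 0.2, 0.4, 0.5, 0.8, 1.1, 2.0, 3.5, 12.0]
ucl = bootstrap_ucl95(concentrations)
```

Because the method resamples the observed data directly, it requires no distributional assumption, and the resulting UCL cannot explode beyond the range of the observations in the way the H-statistic can.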

References

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1990. National Contingency Plan. U.S. Environmental Protection Agency, Washington, DC.

EPA. 1992a. Final Guidelines for Exposure Assessment; Notice. U.S. Environmental Protection Agency, Washington, DC. 57 Federal Register 104: 22888-22938. May 29.

EPA. 1992b. Guidance on Risk Characterization for Risk Managers and Risk Assessors. Memorandum from F. Henry Habicht III, Deputy Administrator, to Assistant Administrators and Regional Administrators. Office of the Administrator, Washington, DC.


EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

EPA. 1993. Memo from W.H. Farland and H.L. Longest to Regional Directors. Re: Use of IRIS values in Superfund risk assessments. U.S. Environmental Protection Agency, Washington, DC. OSWER Directive 9285.7-16. December 21.

EPA. 1995a. Policy for Risk Characterization at the U.S. Environmental Protection Agency. U.S. Environmental Protection Agency, Office of the Administrator, Washington, DC. March.

EPA. 1995b. Guidance for Risk Characterization. U.S. Environmental Protection Agency, Science Policy Council, Washington, DC. February.

EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the U.S. Environmental Protection Agency. U.S. Environmental Protection Agency, Office of Research and Development, Washington, DC. May 15.

EPA. 1997b. Guiding Principles for Monte Carlo Analysis. U.S. Environmental Protection Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 pp.

EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1998. Memo from Timothy Fields, Acting Administrator, to Regional Directors. Re: Approach for addressing dioxin in soil at CERCLA and RCRA sites. U.S. Environmental Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.


EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, U.S. EPA, to Jane Magee, General Electric Company. Re: Pittsfield 1-0147, EPA Area 6, Housatonic River Human Health Risk Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader. Re: Protectiveness of Cleanup Levels for Removal Actions Outside the River - Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, U.S. Dist. Court for Dist. of Mass.). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual, Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2. Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3, Part A: Process for Conducting Probabilistic Risk Assessment. Draft, Revision 5. U.S. Environmental Protection Agency, Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the U.S. Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition. December 4-7. Baltimore, MD.


Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA4203. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.

Johnson, T. and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA Annual Meeting and Exposition, Waikiki, HI. (Abstract.)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds, Proceedings Dioxin '96 - 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen 30: 61-65.


Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC. October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meeting Proceedings. Abstract 1.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition. Abstract.

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan - Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19. November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD. December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4): 823-843.


Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 11.

ATTACHMENT A

A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS

Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co., Inc.

Introduction

The U.S. Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston, 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets where no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and their representation of the actual levels of exposures that are likely to occur under various scenarios.

There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of actual levels of exposure that will occur within a given exposure area.

GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will result in more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic - namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.

Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL

In 1992, EPA issued a guidance document (EPA, 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.


The guidance document goes on to note that because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA, 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA, 1992):

    UCL = exp(x̄ + 0.5s² + sH/√(n-1))

where:

    UCL = a specified limit (i.e., the 95% UCL) on the estimate of the arithmetic mean
    x̄ = the mean of the log-transformed data
    s² = the variance of the log-transformed sample distribution
    s = the standard deviation of the log-transformed sample distribution
    H = a statistic, developed by Land (1975), accounting for the shape of the distribution
    n = the number of analytical samples
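To make the behavior of this equation concrete, a minimal implementation is sketched below. The H value is an input that must be taken from Land's (1975) tables for the sample size and log-scale standard deviation; the value and data used in the example are illustrative only:

```python
import math


def h_ucl(log_data, H):
    """Land's H-UCL for lognormally distributed data.

    `log_data` are the log-transformed sample values; `H` must be
    looked up in Land's (1975) tables for the given sample size and
    log-scale standard deviation -- it is not computed here.
    """
    n = len(log_data)
    xbar = sum(log_data) / n
    s2 = sum((x - xbar) ** 2 for x in log_data) / (n - 1)  # sample variance
    s = math.sqrt(s2)
    return math.exp(xbar + 0.5 * s2 + s * H / math.sqrt(n - 1))


# Hypothetical log-transformed data with log-scale s = 1.0 and an
# illustrative H value; the result far exceeds the largest
# untransformed observation, exp(2.0) ~= 7.4
ucl = h_ucl([0.0, 1.0, 2.0], H=3.0)
```

Note how, with only three samples and a log-scale standard deviation of 1.0, the computed UCL is several times larger than the highest observed concentration, which is exactly the inflation discussed below.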

Frequently, the H-statistic calculates extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.


Use of the H-statistic also does not calculate a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.

Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA, 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.

Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when an exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.

Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA, 1999).

In view of these considerations, EPA contractors and consultants have recommended that the H-statistic should not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA, 1997a) recommended that in environmental applications the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided. These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less... [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).


Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the lower Housatonic River; consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.

This problem can be substantially mitigated by area-weighting the sample results, so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, unbiased estimates of how concentration and variance change with distance are developed and used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava, 1989).

Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest of the River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular bisecting lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so practical is that the construction of Thiessen polygons is a built-in operation in most commercially available Geographic Information System (GIS) applications.

The area inside each polygon is then used to weight the samples in statistical calculations, which is, in essence, saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

    X̄sa = Σ (pi × Ci), summed over polygons i = 1 through n

where:

    X̄sa = area-weighted mean concentration (mg/kg);
    Ci = the concentration representing the condition within polygon i, where there are i = 1 through n polygons; and
    pi = the proportion of the total exposure area that is incorporated in polygon i (unitless).
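As an illustration (not part of the Work Plan), this weighted-average calculation reduces to a few lines of code; the polygon areas and concentrations below are a hypothetical three-sample subset of the Table 1 values:

```python
def area_weighted_mean(areas, concs):
    """X̄sa = sum of p_i * C_i, where p_i = polygon area_i / total area."""
    total = sum(areas)
    return sum((a / total) * c for a, c in zip(areas, concs))

areas = [138219, 2123, 44231]    # Thiessen polygon areas (ft^2)
concs = [0.022, 9.69, 14.3]      # PCB results (ppm)
epc = area_weighted_mean(areas, concs)   # about 3.55 ppm
```

The simple arithmetic mean of these three results is about 8.0 ppm; the area-weighted mean is pulled toward 0.022 ppm because that sample's polygon covers roughly three-quarters of the total area.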

This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½-Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then the average concentrations for each grid were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the draft Risk Assessment Guidance for Superfund: Human Health Evaluation Manual. In that review, the SAB stated that the "[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods (such as kriging or triangulation) that take into account the spatial distribution of contamination to characterize exposure" (p. 15).

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½-Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
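A minimal sketch of this procedure follows; the dataset and iteration count are illustrative assumptions, not values from the Work Plan:

```python
import random

def bootstrap_ucl_95(data, n_iter=5000, seed=1):
    """95% UCL of the mean: resample n values with replacement, archive each
    resampled mean, and take the 95th percentile of the archived means."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(sum(rng.choices(data, k=n)) / n for _ in range(n_iter))
    return means[int(0.95 * n_iter) - 1]

data = [0.1, 0.4, 0.9, 1.6, 2.2, 4.8, 9.5, 38.0]   # hypothetical skewed ppm data
ucl = bootstrap_ucl_95(data)
```

For a right-skewed sample like this one, the bootstrap UCL falls above the arithmetic mean but, by construction, can never exceed the maximum observed value.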

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance of the estimated mean becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.


Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 x 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
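The three steps above can be sketched as follows, using the same hypothetical polygon areas and concentrations as in the earlier example; the expansion by rounded area ratios and the draw count of 250 follow the description above, but all specific values are illustrative:

```python
import random

def spatial_bootstrap_ucl(areas, concs, n_iter=5000, draws=250, seed=1):
    """Spatial bootstrap: replicate each result in proportion to the ratio of
    its Thiessen polygon area to the smallest polygon area, then take the 95th
    percentile of means of repeated draws (with replacement)."""
    smallest = min(areas)
    expanded = []
    for area, conc in zip(areas, concs):
        expanded.extend([conc] * round(area / smallest))  # area-weighted replicates
    rng = random.Random(seed)
    means = sorted(sum(rng.choices(expanded, k=draws)) / draws
                   for _ in range(n_iter))
    return means[int(0.95 * n_iter) - 1]

areas = [138219, 2123, 44231]   # hypothetical Thiessen polygon areas (ft^2)
concs = [0.022, 9.69, 14.3]     # hypothetical PCB results (ppm)
ucl = spatial_bootstrap_ucl(areas, concs)
```

The resulting UCL sits slightly above the area-weighted mean (about 3.55 ppm here) and well below the maximum observed concentration, in contrast to the H-UCL behavior described in the text.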


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has, in the past, often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area, and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: (1) EPA's approach (arithmetic mean and H-statistic); (2) an area-weighted average; (3) a bootstrap; and (4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping; this is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas; thus, they are weighted more heavily in approaches that include area-weighting.

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatially bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL that exceeds even the highest measured concentration, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above, and as demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993), and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½-Mile Reach of the River and for the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. U.S. Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River - Final. Roy F. Weston, Inc., Manchester, NH. DCN: GEP2-02-000-AAME. February 2000.


Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
 1                138,219               0.022
 2                105,217               0.028
 3                110,214               1.07
 4                 31,021               1.63
 5                 11,399               1.67
 6                  7,004               2.01
 7                 14,268               2.19
 8                 12,121               3.25
 9                  3,697               3.77
10                 13,297               7.58
11                 33,087               7.61
12                 65,541               7.62
13                 57,494               7.71
14                  9,014               7.99
15                  9,994               8.72
16                  2,123               9.69
17                  2,341               9.72
18                  7,446              12.7
19                  2,613              14.3
20                 44,231              14.3
21                  2,362              15.4
22                  8,745              16.3
23                  5,008              18.7
24                 18,762              19.1
25                 24,296              20.2
26                 51,356              30.8
27                 12,996              32.9
28                 18,019              38.9
29                119,814              40.6
30                132,012              42.6

[Table 2. Comparison of Mean and 95% UCL EPC Values Calculated Using Four Approaches (EPA arithmetic mean/H-statistic; area-weighted average; bootstrap; spatial bootstrap). The table's values are summarized in the text: means of approximately 13.3 ppm under all four approaches; 95% UCLs of 106 ppm for the H-statistic (capped at the 42.6 ppm maximum), 17.06 ppm for the bootstrap, and 16.38 ppm for the spatial bootstrap.]


A. Comments on Phase 1 - Site Screening Approach

As discussed in the Work Plan, Phase 1 of the HHRA involves a screening of properties and other exposure points, based on available PCB data, to determine whether they should be retained for a more detailed risk assessment. To accomplish this screening, EPA has developed screening risk-based concentrations (SRBCs) for PCBs in floodplain soil, riverbank soil, and sediments for various exposure scenarios - i.e., residential, recreational, agricultural, utility worker, and commercial/industrial groundskeeper scenarios. Separate low-contact and high-contact SRBCs have been calculated for the residential and recreational scenarios. The initial screening step involves comparison of the maximum PCB concentration for each medium and exposure point (classified in the appropriate scenario) to the most conservative SRBC applicable to that medium and exposure point. If the maximum concentration is below that SRBC, the property or other exposure point will be dropped from further evaluation. If the maximum concentration exceeds the applicable SRBC, EPA will, if necessary, conduct additional sampling, and will determine the 95% upper confidence limit (UCL) on the mean of the PCB data for such property or other exposure point. If the 95% UCL does not exceed the applicable SRBC, the property or exposure point will be dropped from further consideration. If the 95% UCL exceeds the SRBC, the property or exposure point will be carried through to Phase 2 of the HHRA.
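The decision logic of this screening step can be summarized in a short sketch; the SRBC and concentration values are illustrative, and the handling of a UCL exactly equal to the SRBC is our assumption:

```python
def phase1_screen(max_conc, srbc, ucl_95=None):
    """Phase 1 screening decision for one exposure point, as outlined in the
    Work Plan: screen on the maximum concentration first, then on the 95% UCL."""
    if max_conc < srbc:
        return "drop from further evaluation"
    if ucl_95 is None:
        return "sample further and compute 95% UCL"
    if ucl_95 <= srbc:
        return "drop from further evaluation"
    return "retain for Phase 2"

# hypothetical property: maximum PCB of 2.0 ppm against an SRBC of 1.0 ppm,
# with a 95% UCL of 0.8 ppm computed after additional sampling
decision = phase1_screen(2.0, 1.0, ucl_95=0.8)   # -> "drop from further evaluation"
```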

GE supports the concept of a phased screening approach. However, it believes that several of the assumptions and other input parameters used by EPA to develop the SRBCs for PCBs are overly conservative, and that, as a result, the SRBCs are likewise excessively stringent, even for a screening-level analysis. In this connection, GE preserves all prior positions expressed to EPA regarding the toxicity of PCBs and appropriate exposure parameters. However, these comments are focused on a limited number of the parameters used by EPA in deriving the SRBCs. Although EPA is currently in the process of the Phase 1 screening, GE believes that EPA should take account of these comments and revise its SRBCs accordingly for purposes of conducting additional or more detailed screening.


1. Inconsistent target risk levels

According to the information provided in Tables 6.2-1, 6.2-5, and 6.2-6 of the Work Plan, EPA has not used a consistent target risk level for calculating its SRBCs. While a cancer risk of 5 × 10⁻⁶ is identified as the target risk level for the residential and recreational SRBCs, target risks of 1.6 × 10⁻⁶ and 1.1 × 10⁻⁶ are identified for the utility worker and commercial groundskeeper scenarios, respectively. The Work Plan presents no rationale for these differences in target risk levels.

To provide a more meaningful screening step, EPA should pick a reasonable and consistent target risk level and use it as the basis for calculating all scenario-specific SRBCs. This will ensure that all SRBCs are equally protective of public health. The purpose of conducting the Phase 1 screen is to identify those properties that do not need to be evaluated further in the HHRA and to drop them from future consideration, so that the HHRA can focus on key areas rather than all areas. If the screening SRBCs are overly stringent, numerous properties will be retained for the risk assessment despite the fact that they will likely be shown to have no unacceptable risks once the risk assessment is completed.

To eliminate this problem and increase the efficiency of the screening step, GE believes that the target cancer risk level should be established as 1 × 10⁻⁵ for all scenarios. This risk level is the midpoint of EPA's acceptable risk range (1 × 10⁻⁴ to 1 × 10⁻⁶) (EPA, 1990). When coupled with conservative exposure parameters and upper-bound media concentrations, as are outlined in the Work Plan, this approach will successfully screen out those properties that do not pose a risk, while retaining those that warrant further evaluation.
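Because a cancer-risk-based screening concentration scales linearly with the target risk level, existing SRBCs could in principle be rescaled to a common 1 × 10⁻⁵ target without re-deriving the underlying exposure assumptions; the SRBC value below is hypothetical:

```python
def rescale_srbc(srbc_ppm, current_target, new_target=1e-5):
    """For linear cancer risk models, risk is proportional to concentration,
    so an SRBC derived at one target risk rescales by a simple ratio."""
    return srbc_ppm * (new_target / current_target)

# hypothetical utility worker SRBC of 2.0 ppm derived at a 1.6e-6 target risk
common_basis = rescale_srbc(2.0, 1.6e-6)   # -> 12.5 ppm at the 1e-5 target
```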

2. Overly conservative exposure assumptions

Several of the exposure parameter values that are used to derive the SRBCs are extremely conservative and cannot be considered representative of likely activities, even for a screening-level assessment. These parameters include some of the exposure frequencies and exposed skin surface areas used for the residential and recreational scenarios, and the soil ingestion rates used for the recreational and utility worker scenarios.


a. Exposure frequencies for residential and recreational scenarios

Some of the exposure frequencies used by EPA to derive the SRBCs for the residential and recreational scenarios are unrealistic and implausible. These include the assumed frequency of two days per week for seven months per year for low-contact residential exposure to riverbank soils and sediments and low-contact recreational exposure to floodplain and riverbank soils and sediments, as well as the assumed frequency of three days per week for seven months per year for high-contact residential and recreational exposure to sediments.

The Work Plan indicates that the determination of high-contact versus low-contact areas depends on the accessibility of the area -- e.g., judgments about the likelihood that the area has "physical features consistent with a child (or adult) contacting riverbank soil on a regular basis" (p. 6-26). Physical features that would lead to a low-contact determination would presumably include, at a minimum, steep slopes, undercut river banks, rocky areas with limited or no access to soil or sediment, and marshy areas. Such conditions are not consistent with contact with the contaminated media on a regular basis. In these circumstances, it is unreasonable to assume an exposure frequency of two days per week for each week in the seven-month exposure period. Such a frequency clearly represents regular contact, which is inconsistent with the limited accessibility of these areas. A more appropriate assumption for low-contact areas would be to assume occasional contact (e.g., one day per week, or two days per month) during the five warmer months of the year.

For high-contact areas, EPA has assumed an exposure frequency of three days per week for seven months. While this is a highly conservative estimate even for contact with floodplain and riverbank soils, it is totally unreasonable for contact with sediments, particularly given the assumed exposed skin surface areas. Even for high-contact sediment areas, it is extremely unlikely that children and adults would venture into the river and be exposed to sediments as often as three days per week during every week from April through October. In fact, children would probably not play in the river at all in the cooler portions of this period. Thus, it would be more reasonable to assume an average exposure frequency of around one day per week during this seven-month period.


b. Exposed skin surface areas for residential and recreational scenarios

The excessive conservatism in the exposure frequencies is exacerbated by the exposed skin surface areas assumed by EPA in deriving the SRBCs for the residential and recreational scenarios. For these scenarios, EPA has assumed that the entire head, hands, forearms, lower legs, and feet will be in contact with soil or sediment every time that the individuals are present in the affected areas (Tables 6.2-3 and 6.2-4). This assumption does not take account of the fact that the exposed skin surface areas during the spring and fall months are likely to be smaller because individuals will be wearing heavier clothing (i.e., shoes, socks, long pants, and long-sleeved shirts). While the level of exposure assumed by EPA may occur during the summer months, it would not occur every time the individual visits the affected area. Hence, EPA should reduce the exposed skin surface areas to take account of the lesser areas exposed during the spring and fall, and thus develop time-weighted averages for the exposed skin surface areas.
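The time-weighted averaging GE proposes can be sketched as follows. All surface areas and the seasonal split below are illustrative assumptions for the purpose of the sketch, not values taken from the Work Plan:

```python
# Hypothetical sketch: time-weighted average (TWA) exposed skin surface area
# over a seven-month (April-October) exposure season. The surface areas and
# the month split are illustrative assumptions, not Work Plan values.

summer_area_cm2 = 5700  # assumed warm-month area: head, hands, forearms, lower legs, feet
cool_area_cm2 = 2900    # assumed spring/fall area: head and hands only (heavier clothing)

summer_months = 3       # assumed June through August
cool_months = 4         # assumed April, May, September, October

twa_cm2 = (summer_area_cm2 * summer_months + cool_area_cm2 * cool_months) / (
    summer_months + cool_months
)
print(round(twa_cm2))   # 4100
```

Under these assumed inputs, the seasonally weighted surface area is roughly 28 percent lower than the warm-month value EPA applies to every visit.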

c. Soil ingestion rates for recreational scenario

For recreational exposures (both high-contact and low-contact), EPA has used daily soil ingestion rates of 200 mg/day for small children and 100 mg/day for adults (Table 6.2-2). GE does not agree with these ingestion rates, for the reasons given in its June 3, 1999 comments (pp. 33-34). However, if these daily soil ingestion rates are to be used, EPA should at least apply some factor to account for the fraction of the daily soil ingestion that is derived from the affected recreational area, as provided in the Agency's Risk Assessment Guidance for Superfund (EPA, 1989). Under real-life conditions, total daily soil ingestion will occur over the entire day and will include soil and/or sediments from a variety of different locations, some of which will not be contaminated. However, EPA has assumed that total daily soil ingestion will come from a single contaminated area and has not included a factor for the fraction of soil ingested from the site. While it is presumed that this omission reflects EPA's concern that some individuals may occasionally obtain all of their ingested soil from a single contaminated location, such an assumption is not reasonable when coupled with the exposure frequencies that are assumed. In addition, some highly accessible areas may be very limited in size, as is the case on some floodplain properties. Because of this, even if all of the soil is obtained from an exposure area, only a portion of that soil will be contaminated.


Moreover, EPA's assumption is excessively conservative when one considers that the SRBCs for soil and sediment are separately derived and that each SRBC assumes that 100 percent of the soil or sediment ingested daily comes from the site. In reality, individuals who are engaged in river-based activities will ingest a combination of soil and sediment during the course of their activity. Given these considerations, GE believes that a factor of 50 percent should be used to represent the fraction of ingested soil or sediment that is derived from contaminated recreational areas during recreational activities. This will correct for the fact that daily ingestion exposure occurs with a combination of soil and sediment. It will also correct for the fact that only limited portions of the day will be spent in recreational areas. In addition, such a factor will more accurately reflect exposures at those areas where the area of contamination is very limited. This approach is consistent with EPA's approach in its development of the cleanup goals under the October 7, 1999 Consent Decree for recreational areas outside the River (EPA, 1999a).
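The effect of the fraction-ingested factor can be illustrated with a RAGS-style intake equation. Every parameter value below is a hypothetical example chosen for the sketch, not a value from the Work Plan:

```python
# Illustrative sketch of a RAGS-style soil ingestion intake equation with a
# fraction-ingested (FI) term. All parameter values are hypothetical examples.
def soil_intake_mg_per_kg_day(conc_mg_kg, ir_mg_day, fi, ef_days_yr, ed_yr,
                              bw_kg, at_days):
    """Chronic daily intake of a soil contaminant (mg/kg-day)."""
    return (conc_mg_kg * ir_mg_day * 1e-6 * fi * ef_days_yr * ed_yr) / (bw_kg * at_days)

# With FI = 1.0 (the assumption implicit in EPA's approach) versus the
# 50 percent factor GE proposes, the computed intake, and hence the risk
# estimate, differs by exactly a factor of two.
full = soil_intake_mg_per_kg_day(10, 200, 1.0, 84, 6, 15, 6 * 365)
half = soil_intake_mg_per_kg_day(10, 200, 0.5, 84, 6, 15, 6 * 365)
print(full / half)  # 2.0
```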

d. Soil ingestion rate for utility worker scenario

EPA has used a soil ingestion rate of 200 mg/day to develop the SRBC for the utility worker scenario because of likely "heavy exposure to soils" (p. 6-17). While an enhanced soil ingestion rate may be appropriate for utility workers who will have direct contact with soil, the ingestion rate of 200 mg/day is too high. GE previously proposed use of an enhanced soil ingestion rate of 137 mg/day for farmers at the Housatonic River site, who are likewise expected to have heavy contact with soil (GE, 1997), and EPA approved that proposal (EPA and MDEP, 1998). EPA itself used that soil ingestion rate in deriving a cleanup level for the utility worker scenario for Areas Outside the River under the Consent Decree (EPA, 1999a). This ingestion rate was based on an assumed upper-bound soil adherence rate of 1 mg/cm² for the hands. In recent draft dermal risk assessment guidance, however, EPA (1999b) recommends soil adherence factors for utility workers ranging from a central-tendency estimate of 0.2 mg/cm² to an upper-bound estimate of 0.8 mg/cm². Use of that upper-bound adherence factor of 0.8 mg/cm² would result in an enhanced soil ingestion rate for utility workers of 110 mg/day. GE submits that this enhanced soil ingestion rate should be used in developing a revised SRBC for the utility worker scenario.
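The 110 mg/day figure appears to follow from rescaling the approved 137 mg/day rate by the ratio of adherence factors. This reading of the derivation is our own; the arithmetic can be checked as follows:

```python
# Apparent scaling behind the proposed 110 mg/day rate: the approved
# 137 mg/day enhanced ingestion rate, which assumed a hand adherence factor
# of 1.0 mg/cm2, rescaled to the 0.8 mg/cm2 upper-bound adherence factor in
# EPA's 1999 draft dermal guidance. (This derivation is our interpretation
# of the text, not a formula stated in the comments or the Work Plan.)
approved_rate = 137.0   # mg/day, based on 1.0 mg/cm2 adherence
adherence_old = 1.0     # mg/cm2, prior upper-bound assumption
adherence_new = 0.8     # mg/cm2, upper-bound for utility workers (EPA 1999b)

rescaled = approved_rate * adherence_new / adherence_old
print(round(rescaled))  # 110
```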


B. Comments on Phase 2 - Baseline HHRA Approach

For those properties or other exposure points that remain after the Phase 1 screening, the Work Plan outlines the approach that EPA plans to use in conducting a baseline HHRA. GE has a number of comments on that proposed approach.

1. Selection of non-PCB contaminants of potential concern

The Work Plan outlines a two-step approach for selecting non-PCB contaminants of potential concern (COPCs) to be included in the baseline HHRA (Section 6.3.2.3). In the first step, for each area that has PCB concentrations above the applicable SRBC, the available sampling data on other constituents will be compared with several criteria to determine the need for additional sampling in that area. These criteria include: (a) background concentrations established for the area; (b) EPA's Preliminary Remediation Goal (PRG) for dioxins/furans in residential areas, which is a 1 ppb Toxicity Equivalent (TEQ) concentration; and (c) the Massachusetts Department of Environmental Protection's (MDEP's) Method 1 soil standards set out in the Massachusetts Contingency Plan (MCP) (see Work Plan, p. 6-30). If these or other (unspecified) criteria are exceeded, EPA will conduct additional characterization sampling (or other evaluation) in the area. If they are not exceeded, the Work Plan states that the baseline HHRA will be performed using the available analytical data (p. 6-31). The latter statement appears to mean that, even if the concentrations of non-PCB constituents do not exceed the criteria for further sampling (i.e., are below background concentrations, the dioxin/furan PRG, or the Method 1 soil standards), those constituents will nevertheless be carried through to the second step in the selection of COPCs.

In the second step, all constituents detected will be screened for the selection of COPCs by comparing the maximum detected concentrations with contaminant-specific and medium-specific risk-based concentrations. The Work Plan states that the principal criterion for selection of a COPC will be an exceedance of the medium-specific PRG by the maximum concentration of the constituent (p. 6-31). Although the Work Plan does not give the source of the PRGs to be used, it states that a target cancer risk of 1 x 10⁻⁶ and a target non-cancer hazard quotient (HQ) of 0.1 will be used. Any constituent for which the maximum concentration exceeds the PRG will be carried forward for the risk assessment. In addition, the Work Plan states that constituents which are known to bioaccumulate in biological tissues but do not exceed the PRGs will be further evaluated for possible inclusion in the risk assessment.

GE has several concerns about this proposed two-step approach.

First, GE believes that if the concentrations of non-PCB constituents in a given area do not exceed the criteria for further sampling (as listed above), those constituents should be dropped from further consideration and not carried through to the second step in selecting COPCs. As EPA has recognized in the Statement of Work for Removal Actions Outside the River (SOW), which is Appendix E to the Consent Decree, concentrations of non-PCB constituents that are consistent with local background levels, or are less than the 1 ppb TEQ concentration for dioxins/furans, or are below the applicable Method 1 soil standards are considered to pose no significant risk to human health. Hence, such constituents need not and should not be considered further in the selection of COPCs.

Second, for those constituents that are carried forward to the second step in the COPC selection process, EPA's proposed primary approach for selecting COPCs (i.e., comparison of maximum concentrations to PRGs) is overly restrictive. In recent discussions with GE, EPA explained that it is currently planning to use the EPA Region 9 PRGs for this purpose, but to reduce the non-cancer-based PRGs by 10 times because those PRGs were based on a non-cancer HQ of 1.0, not 0.1. This approach will likely lead to the selection of a long list of COPCs, many or most of which in fact make no significant contribution to actual risks. This would undercut the purpose of the screening approach and would greatly and needlessly complicate the risk assessment without any corresponding benefit in terms of achieving additional health protection. Instead, GE recommends that, in selecting the COPCs for the Rest of the River, EPA use an approach similar to the approach approved by EPA and specified in the SOW for Areas Outside the River, including the non-bank portions of certain floodplain properties. This approach utilizes not only

¹ The Work Plan also states that, in addition to these specified criteria, EPA will also use other applicable criteria in the first step of this process, but it does not say what those additional criteria are. EPA should specify those criteria.


the Region 9 PRGs, but also the other criteria listed on page 6-30 of the Work Plan, as discussed below.

For dioxins and furans, EPA should use the PRGs for dioxin/furan TEQs that are prescribed in recent guidance from EPA headquarters for use in addressing these constituents in soil at CERCLA and RCRA sites (EPA, 1998). Those PRGs are 1 ppb for dioxin/furan TEQs in residential areas (also applicable to recreational areas) and 5-20 ppb for dioxin/furan TEQs in commercial/industrial areas. The SOW provides specifically for use of these PRGs in screening out dioxin/furan compounds in soil in Areas Outside the River. EPA should likewise use these PRGs in selecting COPCs for the Rest of the River. (Indeed, as discussed below, GE believes that these PRGs should be used not only as screening criteria but also as remedial goals, which is consistent with the approach embodied in the SOW for Areas Outside the River. See Section B.2 below.)

For other constituents, EPA should follow a step-wise approach that begins with comparisons to the Region 9 PRGs. Since these PRGs are highly conservative and are specified in the SOW for use in screening such constituents without further reduction, they should not be reduced by an additional 10 times. For constituents for which such PRGs do not exist, a similar screening step should be employed using PRGs for surrogate compounds.²

EPA should then apply the other criteria listed on page 6-30 of the Work Plan (i.e., comparison to background levels and MCP Method 1 standards). Comparison of concentrations of non-PCB constituents to site-specific background levels (e.g., concentrations upstream of the GE facility) is appropriate to ensure that only those constituents attributable to releases from the GE facility are included in the risk assessment. This step is allowed in EPA's Risk Assessment Guidance for

² Generally, when toxicity criteria have not been derived for particular constituents, it is because there are no reliable toxicity data on which to base them. If such constituents are not screened, but are simply carried through to the risk assessment as COPCs, that would unnecessarily complicate the risk assessment and would result in largely meaningless or highly uncertain estimates, given the lack of reliable toxicity data and criteria. Hence, EPA should screen these compounds using PRGs for surrogate compounds, as described in GE's June 3, 1999 comments (p. 6).


Superfund (EPA, 1989) and has been accepted by EPA as a basis for selecting non-PCB COPCs for both soils and sediments in the Upper ½ Mile Reach of the Housatonic River (as stated in the Upper ½ Mile Reach Work Plan) and for Areas Outside the River (as provided in the SOW). It is also consistent with the MCP, which provides that if the concentration of a constituent is at or below background levels, then that constituent will be considered to pose No Significant Risk (310 CMR 40.0902(3)). Similarly, comparison to the MCP Method 1 soil standards is appropriate because those standards have been developed as conservative risk-based standards (using a target cancer risk of 1 x 10⁻⁶) that will ensure protection of human health and the environment. Again, EPA has already accepted comparisons to those standards as a basis for selecting non-PCB COPCs in soil for the Upper ½ Mile Reach of the River and Areas Outside the River. Moreover, in its Engineering Evaluation/Cost Analysis (EE/CA) for the 1½ Mile Reach of the River, EPA itself has used the MCP Method 1 soil standards, as well as background levels, as a basis for evaluating non-PCB constituents in bank soils (Weston, 2000b). Thus, use of comparisons to background levels and MCP Method 1 standards in selecting COPCs for the Rest of the River is justified by considerable precedent and will ensure that a consistent approach is employed to address such constituents throughout the GE-Pittsfield/Housatonic River Site. EPA has presented no sound basis for departing from that approach for the Rest of the River portion of the Site.

Moreover, use solely of the maximum detected concentrations in selecting COPCs from these constituents is overly conservative and inappropriate. For example, comparisons to site-specific background levels should be made using an appropriate statistical technique or summary statistics, as described in MDEP (1995) guidance, while comparisons to the MCP Method 1 standards are generally to be made based on average exposure point concentrations (310 CMR 40.0926(3)). Again, these approaches have been agreed upon by EPA as fully protective for Areas Outside the River, as set forth in the SOW, and should likewise be used for the Rest of the River to eliminate constituents that are not likely to contribute to Site-related risks.

Finally, the Work Plan does not explain how the potential for bioaccumulation (discussed on p. 6-31) will be considered or what criteria will be used to examine this. Unless the approach used to identify and select such COPCs is clearly defined, it could result in the unnecessary inclusion of certain constituents. EPA should provide supplemental information that outlines the specific approach and criteria that will be used to identify and select these constituents for inclusion.

2. Risk assessment of dioxins/furans

The Work Plan states that EPA intends to evaluate the potential risks associated with dioxins and furans by converting them to TEQs (toxic equivalents of 2,3,7,8-TCDD) using Toxicity Equivalency Factors (TEFs) and then applying the provisional cancer slope factor (CSF) of 150,000 (mg/kg-day)⁻¹ listed on HEAST for 2,3,7,8-TCDD (Section 6.3.3.2.4). As discussed in detail in GE's June 3, 1999 comments (pp. 19-22), the TEF approach has substantial limitations, and the provisional CSF of 150,000 (mg/kg-day)⁻¹ is uncertain and unsupportable. A wide range of CSFs, spanning from 9,000 to 156,000 (mg/kg-day)⁻¹, have been proposed for TCDD under the linear nonthreshold cancer model, with differences resulting from the tumor classification scheme selected, the interspecies scaling factor used, and the model used for low-dose extrapolation. Thus, there is enormous uncertainty associated with any selected CSF.

In these circumstances, GE believes that, to the extent possible in this risk assessment, EPA should evaluate dioxins and furans through application of the dioxin/furan TEQ PRGs set forth in EPA (1998) guidance and discussed above, using those PRGs as remedial goals as well as screening levels, rather than conducting forward risk calculations for these compounds. This approach would avoid the uncertainties associated with selecting a particular CSF for TCDD and is consistent with the approach adopted by EPA for Areas Outside the River, as described in the SOW.

However, to the extent that EPA includes dioxins and furans in forward risk calculations, GE believes that EPA should use a CSF of 30,000 (mg/kg-day)⁻¹. As shown in our June 3, 1999 comments, based on current information, that CSF is more supportable than the CSF of 150,000 (mg/kg-day)⁻¹ specified in the Work Plan.


3. Use of TEF/TEQ approach for dioxin-like PCBs

In the Work Plan, EPA also proposes to use the TEF/TEQ approach for estimating risks of dioxin-like PCB congeners (Section 6.3.3.2.5). Under this approach, the concentrations of these PCB congeners are converted to toxic equivalents (TEQs) of 2,3,7,8-TCDD using TEFs, and the provisional CSF of 150,000 (mg/kg-day)⁻¹ for 2,3,7,8-TCDD is then applied to estimate the risks. In its June 3, 1999 comments, GE presented several reasons why this approach is not appropriate or scientifically defensible. In discussions with GE in November 1999, EPA stated that, since it will make risk management decisions on the basis of total PCB concentrations rather than congener-specific data, it intends to use this TEF/TEQ approach for PCBs only if it finds a good correlation between the concentrations of the dioxin-like congeners and concentrations of total PCBs. There is no indication in the Work Plan, however, that EPA has established a good correlation between the congener-specific data and the total PCB data. Indeed, in more recent discussions, EPA has indicated that it has not to date evaluated this issue. If and when EPA makes such an analysis, GE requests the opportunity to review the analysis and discuss it with EPA. The absence of such a correlation would itself make the TEQ data irrelevant to risk management decisions.
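The TEF/TEQ conversion EPA proposes can be sketched in a few lines. The congener concentrations below are hypothetical, and the TEF values shown are illustrative of the approach rather than a complete or authoritative list:

```python
# Minimal sketch of the TEF/TEQ conversion for dioxin-like PCB congeners.
# Concentrations are hypothetical; TEFs shown are illustrative examples only.
tefs = {"PCB-126": 0.1, "PCB-169": 0.01, "PCB-77": 0.0001}

def teq(congener_conc_mg_kg):
    """Collapse congener concentrations into a single 2,3,7,8-TCDD TEQ (mg/kg)."""
    return sum(conc * tefs[name] for name, conc in congener_conc_mg_kg.items())

sample = {"PCB-126": 0.002, "PCB-169": 0.010, "PCB-77": 0.500}
teq_val = teq(sample)  # the TCDD CSF would then be applied to this single value
print(teq_val)
```

Note that the output is dominated by whichever congeners carry the largest TEFs, which is why reliable congener-specific data (GE's first objection above) matter so much to this approach.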

In any event, GE continues to believe, for the same reasons given in its June 3, 1999 comments (pp. 23-25), that application of the TEF/TEQ approach to PCBs is not scientifically supportable. These reasons include: (a) the lack of sufficient congener-specific data to provide reliable estimates of true congener concentrations; (b) the double counting of carcinogenic risks that would result from use of both the TCDD CSF and the PCB CSF in the same assessment; and (c) the uncertainties and unsupportability of the TEFs for PCB congeners and of the CSF (for 2,3,7,8-TCDD) to be used to evaluate the resulting TEQs.

4. Failure to consider alternative CSFs and RfDs for PCBs

The Work Plan states that EPA intends to use the upper-bound CSF for PCBs of 2 (mg/kg-day)⁻¹ and the Reference Dose (RfD) for Aroclor 1254 of 2 x 10⁻⁵ mg/kg-day, as published in its Integrated Risk Information System (IRIS) (Sections 6.3.3.2.3 and 6.3.3.3.2). In its June 3, 1999 comments (pp. 16-19 and Appendix A), GE presented substantial reasons why these toxicity values overestimate the carcinogenic potential and non-carcinogenic impacts of PCBs, and it recommended the development of alternative toxicity values for PCBs. EPA has made no reference to this presentation in its Work Plan.

This failure to consider the alternative toxicological information presented by GE is unlawful. Under a settlement agreement with GE (Settlement Agreement in General Electric Company v. Browner, No. 93-1251 (D.C. Cir., October 25, 1993)) and a subsequent EPA guidance memorandum (EPA, 1993), if an outside party questions the use of IRIS values during the course of an EPA risk assessment and presents alternative toxicological information that may be used in place of the IRIS values, EPA has an obligation to consider "all credible and relevant evidence before it." There is no indication that EPA has considered the alternative toxicological information presented by GE. Rather, it has ignored GE's showing and chosen simply to defer to the current IRIS values (which is precisely the course of conduct that the settlement and guidance prohibit). EPA should rectify this deficiency by now giving careful consideration to the alternative toxicological information presented by GE.

5. Use of subchronic RfD for PCBs

The Work Plan states that EPA will consider using the subchronic RfD for PCBs only if it determines that exposure is intermittent enough to result in an exposure period that is substantially reduced (p. 6-42). Later, on page 6-43, the Work Plan states that this will occur when the total exposure period is substantially below the 7-year period that is considered by EPA to represent chronic exposure. There is no further discussion of this issue, so it is not clear what the Agency will use as a benchmark to identify intermittent exposure.

Many of the proposed exposure scenarios are likely to be extremely intermittent in nature. For example, in developing the SRBC for utility workers, EPA assumes that exposure occurs 5 days/year for a period of 25 years. Clearly, 5 days/year is intermittent exposure. In fact, even when considered over the entire 25-year exposure period, the total exposure consists of 125 days, which amounts to 1/3 of a year, or less than 5 percent of the 7-year chronic exposure period. This exposure would certainly be considered intermittent, despite the fact that the total exposure period lasts for 25 years.


As a second example, in developing its SRBCs for recreational exposures, EPA has assumed that children aged 1-6 are exposed for six years during recreational activities. Even using the highly conservative exposure frequency for high-contact areas of 3 days/week for a period of 7 months per year (a total of 84 days/year, or 504 days for the entire exposure period), the number of days exposed is equivalent to 1.4 years, or less than 20 percent of the available days during the 7-year chronic exposure period. Nevertheless, as stated on page 6-42, EPA does not intend to use the subchronic RfD to evaluate noncancer hazards to young children.
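The intermittency arithmetic in the two examples above can be verified directly:

```python
# Check of the intermittency arithmetic for the two scenarios discussed above.
CHRONIC_YEARS = 7  # EPA's benchmark period for chronic exposure

# Utility worker: 5 days/year for 25 years
utility_days = 5 * 25                 # 125 total exposure days
utility_years = utility_days / 365    # roughly 1/3 of a year
print(utility_years / CHRONIC_YEARS)  # under 0.05, i.e., less than 5 percent

# Recreational child (ages 1-6): 3 days/week for 7 months/year (~84 days/year)
child_days = 84 * 6                   # 504 total exposure days
child_fraction = child_days / (CHRONIC_YEARS * 365)
print(child_fraction)                 # under 0.20, i.e., less than 20 percent
```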

In short, the Work Plan states that EPA will use the subchronic RfD when exposures during a given scenario are sufficiently intermittent to result in a total exposure that is far less than the 7-year chronic exposure period identified by EPA. It is clear, however, that even for scenarios where this intermittent exposure is likely, EPA does not intend to incorporate the subchronic RfD. EPA should reconsider its position on the use of the subchronic RfD and incorporate it into those scenarios where it is relevant.

6. Inappropriate use of a single exposure scenario for all direct-contact recreational activities

The Work Plan provides that, while consumption of fish and game will be evaluated separately, all direct-contact recreational activities will be evaluated using a single exposure scenario and a single set of exposure assumptions (pp. 6-52 to 6-53). These activities include hiking, walking, canoeing, picnicking, swimming, wading, and dirt biking, as well as fishing and hunting (but not consumption of fish or game). In its June 3, 1999 comments (pp. 11-12, 29-30), GE explained that use of a single exposure scenario and set of assumptions for such diverse activities is inappropriate because it will force EPA either to use the most conservative assumptions for any of these activities (which would clearly result in grossly overestimated exposure estimates) or to pick compromise parameters that will not be truly representative of any of these activities (which would be subject to considerable uncertainty). Based on discussions with EPA in the fall of 1999, GE had understood that the Agency recognized this problem and intended to consider further whether and how to divide this single direct-contact recreational scenario into a number of more specific scenarios (e.g., hiking/walking, canoeing, picnicking, swimming/wading). However, the Work Plan does not do that. GE again urges EPA to develop such specific scenarios so as to take


account of the significant differences in exposure parameters among these types of activities. Doing so will more closely approximate actual recreational exposures and will allow more realistic and appropriate cleanup goals to be developed.

7. Lack of information about specific exposure assumptions to be used

As in the prior draft, the Work Plan does not present the specific exposure parameter values and equations to be used in EPA's risk assessment, but simply states that they will be developed based on EPA and MDEP guidance, along with site-specific data (Section 6.3.4.5). GE requested such information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has advised GE that the Agency's risk assessment staff will hold a series of technical meetings with GE representatives and consultants and at such meetings will be prepared to discuss the most appropriate exposure assumptions and parameter values to be used. GE reiterates its request for such meetings and looks forward to discussing these issues with EPA.

8. Use of probabilistic risk analysis

In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that probabilistic approaches, such as Monte Carlo analysis, will be considered for those COPCs and exposure pathways that result in significant risk estimates. However, it does not mention probabilistic risk assessment techniques in connection with any of the main risk assessments. In prior discussions with GE, EPA indicated that one of the principal reasons why it collected a large amount of fish tissue PCB data is for use in a probabilistic risk assessment of fish consumption. As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic exposure model in the main risk analysis of the fish consumption pathway. However, the Work Plan presents no discussion of or plans for conducting such an assessment. In fact, EPA does not even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk assessment guidance documents and information sources that EPA intends to use in conducting the HHRA.

During the last few years, EPA's policies and guidelines have focused on improving risk management by presenting decision-makers with the entire range of possible risks, rather than a single point estimate (EPA 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that numerical risk assessments should be accompanied by a full characterization of the uncertainties, limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D) probabilistic analysis, such as Microexposure analysis, can more effectively characterize the impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed population by considering time-dependent changes.

Microexposure analysis has been used by EPA and by independent researchers to simulate duration of residential exposure (EPA 1992c; Johnson and Capel 1992; Sielken 1994). It also has been used to evaluate childhood exposures to lead (Goodrum et al. 1994), exposure to contaminants in tap water (Harrington et al. 1995), and exposure to dioxins from the consumption of freshwater fish (Keenan et al. 1993a,b; 1995; 1996a; 1997a,b). Microexposure analysis was employed in the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon Group 1994; Keenan et al. 1996b). In addition, Microexposure analysis has been described in EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A (External Review Draft) as a viable alternative for modeling time-dependent variability in concentrations, daily activity patterns, and other behavioral exposure factors (EPA, 1999c).

Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of probabilistic risk analysis through application to the fish ingestion pathway at the Sangamo-Weston/Lake Hartwell Superfund site (Simon, 1999). Simon (1999) found that the Microexposure event model gave risk estimates that were approximately an order of magnitude lower than those derived using deterministic and other probabilistic approaches. He attributed this decrease to the power of the method to incorporate changes over time.

GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model, to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in Appendix E of its draft guidance for conducting probabilistic risk assessments (EPA, 1999c), some types of exposures, including fish consumption, can be expected to vary over time. Anglers may change their activity levels and their fish consumption behaviors as they age. In addition, different species and sizes of fish may be consumed from a variety of locations on the river over time and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates and will provide information that will allow better-informed risk management decisions to be made (EPA, 1999c).
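The basic idea of a probabilistic treatment of the fish consumption pathway can be conveyed with a heavily simplified one-dimensional Monte Carlo sketch. A true two-dimensional Microexposure analysis would additionally separate variability from uncertainty and simulate year-by-year changes in angler behavior and tissue concentrations; all distributions below are hypothetical placeholders, not site data:

```python
import random

# Heavily simplified Monte Carlo sketch of the fish-consumption dose pathway.
# All distributions are hypothetical placeholders, not Housatonic site data.
random.seed(1)

def simulate_dose():
    conc = random.lognormvariate(0.0, 0.5)   # fish tissue PCBs, mg/kg (hypothetical)
    ir_g_day = random.triangular(5, 60, 20)  # fish ingestion rate, g/day (hypothetical)
    bw = random.normalvariate(70, 10)        # adult body weight, kg
    return conc * ir_g_day * 1e-3 / bw       # average daily dose, mg/kg-day

doses = sorted(simulate_dose() for _ in range(10000))
median, p95 = doses[5000], doses[9500]
# The product is a distribution of doses rather than a single point estimate,
# so decision-makers see the full range of plausible exposures.
print(median, p95)
```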

9. Use of 95% UCLs or maximum values as exposure point concentrations

The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% upper confidence limit (UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).

GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment, and as demonstrated by EPA contractors and consultants (EPA 1997c; Schulz and Griffin 1999), use of the H-statistic to derive a 95% UCL can and frequently does produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.

Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½ Mile Reach and for Areas Outside the River: a spatial averaging approach. That approach will (a) allow calculation of EPCs that are much more truly representative of actual exposures than use of EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper ½ Mile Reach and Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling, if necessary, to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.


However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A, and as demonstrated by Singh et al. (EPA, 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.

References

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1990. National Contingency Plan. U.S. Environmental Protection Agency, Washington, DC.

EPA. 1992a. Final Guidelines for Exposure Assessment; Notice. U.S. Environmental Protection Agency, Washington, DC. 57 Federal Register 22888-22938. May 29.

EPA. 1992b. Guidance on Risk Characterization for Risk Managers and Risk Assessors. Memorandum from F. Henry Habicht III, Deputy Administrator, to Assistant Administrators and Regional Administrators. Office of the Administrator, Washington, DC.

EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

EPA. 1993. Memo from W.H. Farland and H.L. Longest to Regional Directors re: Use of IRIS values in Superfund risk assessments. U.S. Environmental Protection Agency, Washington, DC. OSWER Directive 9285.7-16. December 21.

EPA. 1995a. Policy for Risk Characterization at the U.S. Environmental Protection Agency. U.S. Environmental Protection Agency, Office of the Administrator, Washington, DC. March.

EPA. 1995b. Guidance for Risk Characterization. U.S. Environmental Protection Agency, Science Policy Council, Washington, DC. February.

EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the U.S. Environmental Protection Agency. U.S. Environmental Protection Agency, Office of Research and Development, Washington, DC. May 15.

EPA. 1997b. Guiding Principles for Monte Carlo Analysis. U.S. Environmental Protection Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 pp.

EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A.K. Singh, and M. Engelhardt. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1998. Memo from Timothy Fields, Acting Administrator, to Regional Directors re: Approach for addressing dioxin in soil at CERCLA and RCRA sites. U.S. Environmental Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.

EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, U.S. EPA, to Jane Magee, General Electric Company, re: Pittsfield 1-0147, EPA Area 6, Housatonic River Human Health Risk Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader, re: Protectiveness of Cleanup Levels for Removal Actions Outside the River - Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, U.S. Dist. Court for Dist. of Mass.). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual. Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2. Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3, Part A: Process for Conducting Probabilistic Risk Assessment. Draft, Revision 5. U.S. Environmental Protection Agency, Office of Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the U.S. Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition, December 4-7, Baltimore, MD.

Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA4203. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, June.

Johnson, T., and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA, Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds, Proceedings, Dioxin '96: 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen 30:61-65.

Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC, October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meeting, Proceedings. (Abstract)

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis, Annual Meeting and Exposition, Proceedings. (Abstract)

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan - Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19, November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD, December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4):823-843.

Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 11.

ATTACHMENT A

A COMPARISON OF ALTERNATIVE METHODS

FOR CALCULATING EXPOSURE POINT CONCENTRATIONS

Prepared on Behalf of the General Electric Company

by Ogden Environmental and Energy Services Co Inc


Introduction

The U.S. Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston, 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets where no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and on how well those values represent the actual levels of exposure that are likely to occur under various scenarios.

There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that, when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.

GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating EPCs that will produce more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic: namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.

Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL

In 1992, EPA issued a guidance document (EPA, 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of the observed sample concentrations.

The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region 1 (EPA, 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA, 1992):

UCL = exp( x̄ + 0.5s² + sH/√(n - 1) )

Where:

UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic accounting for the interaction of the distribution, developed by Land (1975)
n = the number of analytical samples
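As a sketch of how the formula above operates, the calculation below applies it to a small hypothetical dataset. Note that in practice the H value must be read from Land's (1975) tables for the given sample size and standard deviation; the H value used here is an assumed placeholder, not a tabulated one.

```python
import math

def h_ucl(log_data, H):
    """Land's H-UCL: exp(xbar + 0.5*s^2 + s*H/sqrt(n - 1)), where xbar, s^2,
    and s are the mean, variance, and std. dev. of the log-transformed data.
    H must come from Land's (1975) tables; it is a caller-supplied assumption here."""
    n = len(log_data)
    xbar = sum(log_data) / n
    s2 = sum((x - xbar) ** 2 for x in log_data) / (n - 1)  # sample variance
    s = math.sqrt(s2)
    return math.exp(xbar + 0.5 * s2 + s * H / math.sqrt(n - 1))

# Hypothetical skewed PCB concentrations (mg/kg) and an assumed H value:
concs = [0.5, 1.2, 2.0, 4.5, 30.0]
logs = [math.log(c) for c in concs]
print(h_ucl(logs, H=4.0))
```

Because s appears both in the 0.5s² term and in the sH/√(n - 1) term, a large log-scale standard deviation inflates the result multiplicatively, which is the behavior criticized in the discussion that follows.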

Frequently, the H-statistic produces extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.

Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and its associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.

Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA, 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order[s] of magnitude higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed" when the standard deviation of the log-transformed data starts exceeding 1.0. These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.

Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when the exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.

Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and it does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA, 1999).

In view of these considerations, EPA contractors and consultants have recommended that the H-statistic not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA, 1997a) recommended that "in environmental applications, the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. . . . [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).

Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the lower Housatonic River. Consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL, because it is unlikely to be representative of the overall exposure area.

This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, unbiased estimates of concentration and variance change with distance, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava, 1989).

Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.

The area inside each polygon is then used to weight the samples in statistical calculations, which is in essence saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

X̄sc = Σ (ci × pi)

Where:

X̄sc = the area-weighted mean concentration (mg/kg)
ci = the concentration representing the condition within polygon i, where there are i = 1 through n polygons, and
pi = the proportion of the total area that is incorporated in polygon i (unitless)
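The formula above reduces to a few lines of code once the polygon areas are known. In the sketch below, the polygon areas and concentrations are hypothetical values chosen to show how the weighting de-emphasizes a small, highly contaminated polygon:

```python
def area_weighted_mean(concs, areas):
    """Area-weighted (spatial) mean: sum of c_i * p_i, where p_i is each
    Thiessen polygon's share of the total exposure area."""
    total = sum(areas)
    return sum(c * (a / total) for c, a in zip(concs, areas))

# Hypothetical polygons for one exposure area:
concs = [12.0, 0.8, 0.3, 45.0]      # PCB concentration per polygon (mg/kg)
areas = [50.0, 400.0, 500.0, 50.0]  # polygon areas (m^2)
print(area_weighted_mean(concs, areas))  # -> 3.32
```

The unweighted arithmetic mean of these four samples is 14.5 mg/kg, whereas the area-weighted mean is 3.32 mg/kg, because the two high-concentration samples together represent only 10% of the exposure area.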

This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then the average concentrations for each grid were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the:

[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure. (p. 15)

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and it is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL that would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and as demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random, with replacement, and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that, if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population regardless of its statistical distribution. Bootstrapping relies on the same theory, except that, instead of repeatedly taking new samples, bootstrapping models sample repeatedly from a single dataset.
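The procedure just described can be sketched in a few lines. The dataset below is hypothetical, and the percentile-of-means estimator shown is the simple form described in the text (refinements such as the bias-corrected bootstrap exist but are not shown):

```python
import random
import statistics

def bootstrap_ucl95(data, n_boot=10000, seed=1):
    """Bootstrap 95% UCL of the arithmetic mean: resample the dataset with
    replacement, record each resample's mean, and take the 95th percentile
    of the resulting distribution of means."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)
    )
    return means[int(0.95 * n_boot) - 1]  # 95th percentile of bootstrapped means

# Hypothetical positively skewed PCB dataset (mg/kg):
concs = [0.2, 0.4, 0.5, 0.9, 1.1, 2.3, 4.0, 18.0]
print(statistics.fmean(concs), bootstrap_ucl95(concs))
```

Note that a bootstrapped mean can never exceed the maximum observed value, so this estimator cannot produce the above-maximum results that the H-statistic sometimes does.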

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and it is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that accounts for the locations and densities of samples collected on each property. This modified bootstrapping approach (spatial bootstrapping), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.


Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 times 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has, in the past, often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic, to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) the EPA approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
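As a check on the summary statistics quoted above, the Table 1 concentrations (as transcribed here) reproduce the stated mean and maximum directly:

```python
import statistics

# Total PCB concentrations (ppm) for the 30 hypothetical surface soil
# samples in Table 1, as transcribed from the source
concentrations = [
    0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
    7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
    15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6,
]

mean = statistics.fmean(concentrations)   # arithmetic mean, about 13.3 ppm
peak = max(concentrations)                # maximum detect, 42.6 ppm
```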

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration by a factor of approximately 2.5, with the result that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above, and as demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993), and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant), Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. U.S. Environmental Protection Agency, Region I. October.

Isaaks, E.H. and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River, Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.


Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
1                138,219               0.022
2                105,217               0.028
3                110,214               1.07
4                31,021                1.63
5                11,399                1.67
6                7,004                 2.01
7                14,268                2.19
8                12,121                3.25
9                3,697                 3.77
10               13,297                7.58
11               33,087                7.61
12               65,541                7.62
13               57,494                7.71
14               9,014                 7.99
15               9,994                 8.72
16               2,123                 9.69
17               2,341                 9.72
18               7,446                 12.7
19               2,613                 14.3
20               44,231                14.3
21               2,362                 15.4
22               8,745                 16.3
23               5,008                 18.7
24               18,762                19.1
25               24,296                20.2
26               51,356                30.8
27               12,996                32.9
28               18,019                38.9
29               119,814               40.6
30               132,012               42.6

Table 2. Mean and 95% UCL EPC Estimates for the Hypothetical Property Calculated Using Four Approaches (EPA Approach, Area-Weighted Average, Bootstrap, and Spatial Bootstrap) [table values not legible in this copy; key results are quoted in the text above]


1. Inconsistent target risk levels

According to the information provided in Tables 6.2-1, 6.2-5, and 6.2-6 of the Work Plan, EPA has not used a consistent target risk level for calculating its SRBCs. While a cancer risk of 5 x 10^-6 is identified as the target risk level for the residential and recreational SRBCs, target risks of 1.6 x 10^-6 and 1.1 x 10^-6 are identified for the utility worker and commercial groundskeeper scenarios, respectively. The Work Plan presents no rationale for these differences in target risk levels.

To provide a more meaningful screening step, EPA should pick a reasonable and consistent target risk level and use it as the basis for calculating all scenario-specific SRBCs. This will ensure that all SRBCs are equally protective of public health. The purpose of conducting the Phase 1 screen is to identify those properties that do not need to be evaluated further in the HHRA and to drop them from future consideration, so that the HHRA can focus on key areas rather than all areas. If the screening SRBCs are overly stringent, numerous properties will be retained for the risk assessment despite the fact that they will likely be shown to have no unacceptable risks once the risk assessment is completed.

To eliminate this problem and increase the efficiency of the screening step, GE believes that the target cancer risk level should be established as 1 x 10^-5 for all scenarios. This risk level is the midpoint of EPA's acceptable risk range (1 x 10^-4 to 1 x 10^-6) (EPA, 1990). When coupled with conservative exposure parameters and upper-bound media concentrations, as are outlined in the Work Plan, this approach will successfully screen out those properties that do not pose a risk, while retaining those that warrant further evaluation.
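Because estimated cancer risk is linear in the media concentration, an SRBC derived at one target risk level can be rescaled to another by simple ratio. The sketch below uses a hypothetical SRBC value to show the effect of moving from a 5 x 10^-6 target to the recommended 1 x 10^-5:

```python
def rescale_srbc(srbc_ppm, old_target_risk, new_target_risk):
    """Rescale a cancer-based screening concentration to a new target risk.

    Risk = intake x cancer slope factor, and intake is linear in the media
    concentration, so the risk-based concentration is directly proportional
    to the target risk level.
    """
    return srbc_ppm * (new_target_risk / old_target_risk)

# Hypothetical residential SRBC of 2.0 ppm derived at a 5 x 10^-6 target
srbc_at_1e5 = rescale_srbc(2.0, 5e-6, 1e-5)   # doubles to 4.0 ppm
```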

2. Overly conservative exposure assumptions

Several of the exposure parameter values that are used to derive the SRBCs are extremely conservative and cannot be considered representative of likely activities, even for a screening-level assessment. These parameters include some of the exposure frequencies and exposed skin surface areas used for the residential and recreational scenarios, and the soil ingestion rates used for the recreational and utility worker scenarios.


a. Exposure frequencies for residential and recreational scenarios

Some of the exposure frequencies used by EPA to derive the SRBCs for the residential and recreational scenarios are unrealistic and implausible. These include the assumed frequency of two days per week for seven months per year for low-contact residential exposure to riverbank soils and sediments and low-contact recreational exposure to floodplain and riverbank soils and sediments, as well as the assumed frequency of three days per week for seven months per year for high-contact residential and recreational exposure to sediments.

The Work Plan indicates that the determination of high-contact versus low-contact areas depends on the accessibility of the area -- e.g., judgments about the likelihood that the area has physical features consistent with a child (or adult) contacting riverbank soil on a regular basis (p. 6-26). Physical features that would lead to a low-contact determination would presumably include, at a minimum, steep slopes, undercut river banks, rocky areas with limited or no access to soil or sediment, and marshy areas. Such conditions are not consistent with contact with the contaminated media on a regular basis. In these circumstances, it is unreasonable to assume an exposure frequency of two days per week for each week in the seven-month exposure period. Such a frequency clearly represents regular contact, which is inconsistent with the limited accessibility of these areas. A more appropriate assumption for low-contact areas would be to assume occasional contact (e.g., one day per week or two days per month) during the five warmer months of the year.

For high-contact areas, EPA has assumed an exposure frequency of three days per week for seven months. While this is a highly conservative estimate even for contact with floodplain and riverbank soils, it is totally unreasonable for contact with sediments, particularly given the assumed exposed skin surface areas. Even for high-contact sediment areas, it is extremely unlikely that children and adults would venture into the river and be exposed to sediments as often as three days per week during every week from April through October. In fact, children would probably not play in the river at all in the cooler portions of this period. Thus, it would be more reasonable to assume an average exposure frequency of around one day per week during this seven-month period.


b. Exposed skin surface areas for residential and recreational scenarios

The excessive conservatism in the exposure frequencies is exacerbated by the exposed skin surface areas assumed by EPA in deriving the SRBCs for the residential and recreational scenarios. For these scenarios, EPA has assumed that the entire head, hands, forearms, lower legs, and feet will be in contact with soil or sediment every time that the individuals are present in the affected areas (Tables 6.2-3 and 6.2-4). This assumption does not take account of the fact that the exposed skin surface areas during the spring and fall months are likely to be less, because individuals will be wearing heavier clothing -- i.e., shoes, socks, long pants, and long-sleeved shirts. While the level of exposure assumed by EPA may occur during the summer months, it would not occur every time the individual visits the affected area. Hence, EPA should reduce the exposed skin surface areas to take account of the lesser areas exposed during the spring and fall, and thus develop time-weighted averages for the exposed skin surface areas.
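A time-weighted average of this kind is straightforward to compute. The surface-area values below are illustrative assumptions (not EPA's figures), with full exposure in the three warm months and heavier clothing in the cooler shoulder months:

```python
def time_weighted_surface_area(monthly_sa_cm2):
    """Average exposed skin surface area over the exposure season.

    `monthly_sa_cm2` maps each month in the exposure period to the skin
    surface area (cm2) assumed exposed in that month.
    """
    return sum(monthly_sa_cm2.values()) / len(monthly_sa_cm2)

# Hypothetical adult: head, hands, forearms, lower legs, and feet exposed
# (~5,700 cm2) in the 3 summer months; heavier clothing limits exposure
# to head and hands (~1,800 cm2) in the 4 cooler months of the season
season = {"Apr": 1800, "May": 1800, "Jun": 5700, "Jul": 5700,
          "Aug": 5700, "Sep": 1800, "Oct": 1800}
sa_twa = time_weighted_surface_area(season)   # roughly 3,470 cm2
```

Under these assumed values, the time-weighted average is roughly 60 percent of the full-summer surface area, illustrating the magnitude of the correction at issue.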

c. Soil ingestion rates for recreational scenario

For recreational exposures (both high-contact and low-contact), EPA has used daily soil ingestion rates of 200 mg/day for small children and 100 mg/day for adults (Table 6.2-2). GE does not agree with these ingestion rates, for the reasons given in its June 3, 1999 comments (pp. 33-34). However, if these daily soil ingestion rates are to be used, EPA should at least apply some factor to account for the fraction of the daily soil ingestion that is derived from the affected recreational area, as provided in the Agency's Risk Assessment Guidance for Superfund (EPA, 1989). Under real-life conditions, total daily soil ingestion will occur over the entire day and will include soil and/or sediments from a variety of different locations, some of which will not be contaminated. However, EPA has assumed that total daily soil ingestion will come from a single contaminated area, and has not included a factor for the fraction of soil ingested from the site. While it is presumed that this omission is due to the fact that EPA is concerned that some individuals may occasionally obtain all of their ingested soil from a single contaminated location, such an assumption is not reasonable when coupled with the exposure frequencies that are assumed. In addition, some highly accessible areas may be very limited in size, as is the case on some floodplain properties. Because of this, even if all of the soil is obtained from an exposure area, only a portion of that soil will be contaminated.


Moreover, EPA's assumption is excessively conservative when one considers that the SRBCs for soil and sediment are separately derived, and that each SRBC assumes that 100 percent of the soil or sediment ingested daily comes from the site. In reality, individuals who are engaged in river-based activities will ingest a combination of soil and sediment during the course of their activity. Given these considerations, GE believes that a factor of 50 percent should be used to represent the fraction of ingested soil or sediment that is derived from contaminated recreational areas during recreational activities. This will correct for the fact that daily ingestion exposure occurs with a combination of soil and sediment. It will also correct for the fact that only limited portions of the day will be spent in recreational areas. In addition, such a factor will more accurately reflect exposures at those areas where the area of contamination is very limited. This approach is consistent with EPA's approach in its development of the cleanup goals under the October 7, 1999 Consent Decree for recreational areas outside the River (EPA, 1999a).

d. Soil ingestion rate for utility worker scenario

EPA has used a soil ingestion rate of 200 mg/day to develop the SRBC for the utility worker scenario because of likely heavy exposure to soils (p. 6-17). While an enhanced soil ingestion rate may be appropriate for utility workers, who will have direct contact with soil, the ingestion rate of 200 mg/day is too high. GE previously proposed use of an enhanced soil ingestion rate of 137 mg/day for farmers at the Housatonic River site, who are likewise expected to have heavy contact with soil (GE, 1997), and EPA approved that proposal (EPA and MDEP, 1998). EPA itself used that soil ingestion rate in deriving a cleanup level for the utility worker scenario for Areas Outside the River under the Consent Decree (EPA, 1999a). This ingestion rate was based on an assumed upper-bound soil adherence rate of 1 mg/cm2 for the hands. In recent draft dermal risk assessment guidance, however, EPA (1999b) recommends soil adherence factors for utility workers ranging from a central-tendency estimate of 0.2 mg/cm2 to an upper-bound estimate of 0.8 mg/cm2. Use of that upper-bound adherence factor of 0.8 mg/cm2 would result in an enhanced soil ingestion rate for utility workers of 110 mg/day. GE submits that this enhanced soil ingestion rate should be used in developing a revised SRBC for the utility worker scenario.
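The 110 mg/day figure follows from scaling the approved 137 mg/day rate by the ratio of the adherence factors (0.8/1.0); a minimal sketch of that arithmetic:

```python
def scaled_ingestion_rate(base_rate_mg_day, base_adherence, new_adherence):
    """Scale an enhanced soil ingestion rate in proportion to the assumed
    soil adherence factor (mg soil per cm2 of skin), mirroring the ratio
    used in the text: 137 mg/day at 1.0 mg/cm2 scales to about 110 mg/day
    at 0.8 mg/cm2.
    """
    return base_rate_mg_day * (new_adherence / base_adherence)

rate = scaled_ingestion_rate(137, 1.0, 0.8)   # about 110 mg/day
```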


B. Comments on Phase 2 - Baseline HHRA Approach

For those properties or other exposure points that remain after the Phase 1 screening, the Work Plan outlines the approach that EPA plans to use in conducting a baseline HHRA. GE has a number of comments on that proposed approach.

1. Selection of non-PCB contaminants of potential concern

The Work Plan outlines a two-step approach for selecting non-PCB contaminants of potential concern (COPCs) to be included in the baseline HHRA (Section 6.3.2.3). In the first step, for each area that has PCB concentrations above the applicable SRBC, the available sampling data on other constituents will be compared with several criteria to determine the need for additional sampling in that area. These criteria include: (a) background concentrations established for the area; (b) EPA's Preliminary Remediation Goal (PRG) for dioxins/furans in residential areas, which is a 1 ppb Toxicity Equivalent (TEQ) concentration; and (c) the Massachusetts Department of Environmental Protection's (MDEP's) Method 1 soil standards set out in the Massachusetts Contingency Plan (MCP) (see Work Plan, p. 6-30). If these or other (unspecified) criteria are exceeded, EPA will conduct additional characterization sampling (or other evaluation) in the area. If they are not exceeded, the Work Plan states that the baseline HHRA will be performed using the available analytical data (p. 6-31). The latter statement appears to mean that, even if the concentrations of non-PCB constituents do not exceed the criteria for further sampling (i.e., are below background concentrations, the dioxin/furan PRG, or the Method 1 soil standards), those constituents will nevertheless be carried through to the second step in the selection of COPCs.

In the second step, all constituents detected will be screened for the selection of COPCs by comparing the maximum detected concentrations with contaminant-specific and medium-specific risk-based concentrations. The Work Plan states that the principal criterion for selection of a COPC will be an exceedance of the medium-specific PRG by the maximum concentration of the constituent (p. 6-31). Although the Work Plan does not give the source of the PRGs to be used, it states that a target cancer risk of 1 x 10^-6 and a target non-cancer hazard quotient (HQ) of 0.1 will be used. Any constituent for which the maximum concentration exceeds the PRG will be carried forward for the risk assessment. In addition, the Work Plan states that constituents which are known to bioaccumulate in biological tissues, but do not exceed the PRGs, will be further evaluated for possible inclusion in the risk assessment.
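The second-step screen described above can be sketched as follows; the analyte names, concentrations, and PRG values are hypothetical, and the HQ scaling reflects the Work Plan's stated target HQ of 0.1:

```python
def screen_copcs(max_detects_ppm, prgs_ppm, noncancer, hq_target=0.1):
    """First-cut COPC screen: retain any constituent whose maximum detected
    concentration exceeds its PRG.  Non-cancer PRGs (typically derived at
    HQ = 1) are scaled down to the target hazard quotient before comparison.
    """
    retained = []
    for analyte, cmax in max_detects_ppm.items():
        prg = prgs_ppm[analyte]
        if analyte in noncancer:
            prg *= hq_target    # scale an HQ=1 PRG down to HQ=0.1
        if cmax > prg:
            retained.append(analyte)
    return retained

# Hypothetical maximum detects and PRGs (ppm)
max_detects = {"analyte_A": 0.8, "analyte_B": 12.0}
prgs = {"analyte_A": 5.0, "analyte_B": 40.0}
copcs = screen_copcs(max_detects, prgs, noncancer={"analyte_B"})
```

In this sketch, analyte_B is retained only because its HQ=1 PRG of 40 ppm is reduced tenfold to 4 ppm, illustrating GE's point that the extra factor of 10 lengthens the COPC list.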

GE has several concerns about this proposed two-step approach.

First, GE believes that if the concentrations of non-PCB constituents in a given area do not exceed the criteria for further sampling (as listed above), those constituents should be dropped from further consideration and not carried through to the second step in selecting COPCs. As EPA has recognized in the Statement of Work for Removal Actions Outside the River (SOW), which is Appendix E to the Consent Decree, concentrations of non-PCB constituents that are consistent with local background levels, or are less than the 1 ppb TEQ concentration for dioxins/furans, or are below the applicable Method 1 soil standards, are considered to pose no significant risk to human health. Hence, such constituents need not and should not be considered further in the selection of COPCs.

Second, for those constituents that are carried forward to the second step in the COPC selection process, EPA's proposed primary approach for selecting COPCs -- i.e., comparison of maximum concentrations to PRGs -- is overly restrictive. In recent discussions with GE, EPA explained that it is currently planning to use the EPA Region 9 PRGs for this purpose, but to reduce the non-cancer-based PRGs by 10 times, due to the fact that those PRGs were based on a non-cancer HQ of 1.0, not 0.1. This approach will likely lead to the selection of a long list of COPCs, many or most of which in fact make no significant contribution to actual risks. This would undercut the purpose of the screening approach and will greatly and needlessly complicate the risk assessment, without any corresponding benefit in terms of achieving additional health protection. Instead, GE recommends that, in selecting the COPCs for the Rest of the River, EPA should use an approach similar to the approach approved by EPA and specified in the SOW for Areas Outside the River, including the non-bank portions of certain floodplain properties. This approach utilizes not only the Region 9 PRGs, but also the other criteria listed on page 6-30 of the Work Plan, as discussed below.

1 The Work Plan also states that, in addition to these specified criteria, EPA will also use other applicable criteria in the first step of this process, but it does not say what those additional criteria are. EPA should specify those criteria.

For dioxins and furans, EPA should use the PRGs for dioxin/furan TEQs that are prescribed in recent guidance from EPA headquarters for use in addressing these constituents in soil at CERCLA and RCRA sites (EPA, 1998). Those PRGs are 1 ppb for dioxin/furan TEQs in residential areas (also applicable to recreational areas) and 5-20 ppb for dioxin/furan TEQs in commercial/industrial areas. The SOW provides specifically for use of these PRGs in screening out dioxin/furan compounds in soil in Areas Outside the River. EPA should likewise use these PRGs in selecting COPCs for the Rest of the River. (Indeed, as discussed below, GE believes that these PRGs should be used not only as screening criteria but also as remedial goals, which is consistent with the approach embodied in the SOW for Areas Outside the River. See Section B.2 below.)

For other constituents, EPA should follow a step-wise approach that begins with comparisons to the Region 9 PRGs. Since these PRGs are highly conservative, and are specified in the SOW for use in screening such constituents without further reduction, they should not be reduced by an additional 10 times. For constituents for which such PRGs do not exist, a similar screening step should be employed using PRGs for surrogate compounds.2

EPA should then apply the other criteria listed on page 6-30 of the Work Plan -- i.e., comparison to background levels and MCP Method 1 standards. Comparison of concentrations of non-PCB constituents to site-specific background levels (e.g., concentrations upstream of the GE facility) is appropriate to ensure that only those constituents attributable to releases from the GE facility are included in the risk assessment. This step is allowed in EPA's Risk Assessment Guidance for Superfund (EPA, 1989) and has been accepted by EPA as a basis for selecting non-PCB COPCs for both soils and sediments in the Upper ½ Mile Reach of the Housatonic River (as stated in the Upper ½ Mile Reach Work Plan) and for Areas Outside the River (as provided in the SOW). It is also consistent with the MCP, which provides that if the concentration of a constituent is at or below background levels, then that constituent will be considered to pose No Significant Risk (310 CMR 40.0902(3)). Similarly, comparison to the MCP Method 1 soil standards is appropriate because those standards have been developed as conservative risk-based standards (using a target cancer risk of 1 x 10^-6) that will ensure protection of human health and the environment. Again, EPA has already accepted comparisons to those standards as a basis for selecting non-PCB COPCs in soil for the Upper ½ Mile Reach of the River and Areas Outside the River. Moreover, in its Engineering Evaluation/Cost Analysis (EE/CA) for the 1½ Mile Reach of the River, EPA itself has used the MCP Method 1 soil standards, as well as background levels, as a basis for evaluating non-PCB constituents in bank soils (Weston, 2000b). Thus, use of comparisons to background levels and MCP Method 1 standards in selecting COPCs for the Rest of the River is justified by considerable precedent and will ensure that a consistent approach is employed to address such constituents throughout the GE-Pittsfield/Housatonic River Site. EPA has presented no sound basis for departing from that approach for the Rest of the River portion of the Site.

2 Generally, when toxicity criteria have not been derived for particular constituents, it is because there are no reliable toxicity data on which to base them. If such constituents are not screened, but are simply carried through to the risk assessment as COPCs, that would unnecessarily complicate the risk assessment and would result in largely meaningless or highly uncertain estimates, given the lack of reliable toxicity data and criteria. Hence, EPA should screen these compounds using PRGs for surrogate compounds, as described in GE's June 3, 1999 comments (p. 6).

Moreover, use solely of the maximum detected concentrations in selecting COPCs from these constituents is overly conservative and inappropriate. For example, comparisons to site-specific background levels should be made using an appropriate statistical technique or summary statistics, as described in MDEP (1995) guidance, while comparisons to the MCP Method 1 standards are generally to be made based on average exposure point concentrations (310 CMR 40.0926(3)). Again, these approaches have been agreed upon by EPA as fully protective for Areas Outside the River, as set forth in the SOW, and should likewise be used for the Rest of the River to eliminate constituents that are not likely to contribute to Site-related risks.
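The two comparisons discussed above can be sketched as follows, using average rather than maximum concentrations; the concentrations and Method 1 standard are hypothetical, and MDEP (1995) describes the actual statistical techniques to be used:

```python
# Illustrative two-part screen: site average vs. background average and
# vs. the MCP Method 1 standard. All values are hypothetical.
from statistics import mean

def retain_constituent(site_conc, background_conc, method1_standard):
    """Retain as a COPC only if the site average exceeds both benchmarks."""
    site_avg = mean(site_conc)
    return site_avg > mean(background_conc) and site_avg > method1_standard

site = [1.2, 0.8, 2.1, 1.5]        # mg/kg, downstream exposure area
background = [1.0, 1.4, 0.9, 1.6]  # mg/kg, upstream of the facility
print(retain_constituent(site, background, method1_standard=20.0))  # False
```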

Finally, the Work Plan does not explain how the potential for bioaccumulation (discussed on p. 6-31) will be considered or what criteria will be used to examine this. Unless the approach used to identify and select such COPCs is clearly defined, it could result in the unnecessary inclusion of certain constituents. EPA should provide supplemental information that outlines the specific approach and criteria that will be used to identify and select these constituents for inclusion.

2. Risk assessment of dioxins/furans

The Work Plan states that EPA intends to evaluate the potential risks associated with dioxins and furans by converting them to TEQs (toxic equivalents of 2,3,7,8-TCDD) using Toxicity Equivalency Factors (TEFs) and then applying the provisional cancer slope factor (CSF) of 150,000 (mg/kg-day)⁻¹ listed on HEAST for 2,3,7,8-TCDD (Section 6.3.3.2.4). As discussed in detail in GE's June 3, 1999 comments (pp. 19-22), the TEF approach has substantial limitations, and the provisional CSF of 150,000 (mg/kg-day)⁻¹ is uncertain and unsupportable. A wide range of CSFs, spanning from 9,000 to 156,000 (mg/kg-day)⁻¹, has been proposed for TCDD under the linear nonthreshold cancer model, with differences resulting from the selection of the tumor classification scheme, the interspecies scaling factor used, and the model used for low-dose extrapolation. Thus, there is enormous uncertainty associated with any selected CSF.

In these circumstances, GE believes that, to the extent possible in this risk assessment, EPA should evaluate dioxins and furans through application of the dioxin/furan TEQ PRGs set forth in EPA (1998) guidance and discussed above, using those PRGs as remedial goals as well as screening levels, rather than conducting forward risk calculations for these compounds. This approach would avoid the uncertainties associated with selecting a particular CSF for TCDD and is consistent with the approach adopted by EPA for Areas Outside the River, as described in the SOW.

However, to the extent that EPA includes dioxins and furans in forward risk calculations, GE believes that EPA should use a CSF of 30,000 (mg/kg-day)⁻¹. As shown in our June 3, 1999 comments, based on current information, that CSF is more supportable than the CSF of 150,000 (mg/kg-day)⁻¹ specified in the Work Plan.


3. Use of TEF/TEQ approach for dioxin-like PCBs

In the Work Plan, EPA also proposes to use the TEF/TEQ approach for estimating risks of dioxin-like PCB congeners (Section 6.3.3.2.5). Under this approach, the concentrations of these PCB congeners are converted to toxic equivalents (TEQs) of 2,3,7,8-TCDD using TEFs, and the provisional CSF of 150,000 (mg/kg-day)⁻¹ for 2,3,7,8-TCDD is then applied to estimate the risks.
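The mechanics of this conversion can be sketched as follows; the TEF values, congener concentrations, and intake factor are illustrative placeholders only, not values EPA has proposed for this Site:

```python
# Sketch of the TEF/TEQ conversion. TEF values, congener concentrations,
# and the intake factor are illustrative placeholders only.
TEFS = {"PCB-126": 0.1, "PCB-169": 0.01, "PCB-118": 0.0001}

def teq(congener_concs):
    """Convert congener concentrations (mg/kg) to a 2,3,7,8-TCDD TEQ (mg/kg)."""
    return sum(conc * TEFS[c] for c, conc in congener_concs.items())

congeners = {"PCB-126": 2e-5, "PCB-169": 1e-4, "PCB-118": 0.05}  # mg/kg
teq_conc = teq(congeners)

CSF = 150_000          # (mg/kg-day)^-1, the provisional HEAST value at issue
intake_factor = 1e-6   # (mg/kg-day per mg/kg soil); hypothetical
risk = teq_conc * intake_factor * CSF
print(f"TEQ = {teq_conc:.1e} mg/kg; incremental cancer risk = {risk:.1e}")
```

Note that any risk computed this way would be additional to the risk computed for total PCBs with the PCB CSF, which is the double-counting concern raised below.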

In its June 3, 1999 comments, GE presented several reasons why this approach is not appropriate or scientifically defensible. In discussions with GE in November 1999, EPA stated that, since it will make risk management decisions on the basis of total PCB concentrations rather than congener-specific data, it intends to use this TEF/TEQ approach for PCBs only if it finds a good correlation between the concentrations of the dioxin-like congeners and concentrations of total PCBs. There is no indication in the Work Plan, however, that EPA has established a good correlation between the congener-specific data and the total PCB data. Indeed, in more recent discussions, EPA has indicated that it has not to date evaluated this issue. If and when EPA makes such an analysis, GE requests the opportunity to review the analysis and discuss it with EPA. The absence of such a correlation would itself make the TEQ data irrelevant to risk management decisions.

In any event, GE continues to believe, for the same reasons given in its June 3, 1999 comments (pp. 23-25), that application of the TEF/TEQ approach to PCBs is not scientifically supportable. These reasons include (a) the lack of sufficient congener-specific data to provide reliable estimates of true congener concentrations; (b) the double counting of carcinogenic risks that would result from use of both the TCDD CSF and the PCB CSF in the same assessment; and (c) the uncertainties and unsupportability of the TEFs for PCB congeners and of the CSF (for 2,3,7,8-TCDD) to be used to evaluate the resulting TEQs.

4. Failure to consider alternative CSFs and RfDs for PCBs

The Work Plan states that EPA intends to use the upper-bound CSF for PCBs of 2 (mg/kg-day)⁻¹ and the Reference Dose (RfD) for Aroclor 1254 of 2 x 10⁻⁵ mg/kg-day, as published in its Integrated Risk Information System (IRIS) (Sections 6.3.3.2.3 and 6.3.3.3.2). In its June 3, 1999 comments (pp. 16-19 and Appendix A), GE presented substantial reasons why these toxicity values overestimate the carcinogenic potential and non-carcinogenic impacts of PCBs, and it recommended the development of alternative toxicity values for PCBs. EPA has made no reference to this presentation in its Work Plan.

This failure to consider the alternative toxicological information presented by GE is unlawful. Under a settlement agreement with GE (Settlement Agreement in General Electric Company v. Browner, No. 93-1251, D.C. Cir., October 25, 1993) and a subsequent EPA guidance memorandum (EPA, 1993), if an outside party questions the use of IRIS values during the course of an EPA risk assessment and presents alternative toxicological information that may be used in place of the IRIS values, EPA has an obligation to consider all credible and relevant evidence before it. There is no indication that EPA has considered the alternative toxicological information presented by GE. Rather, it has ignored GE's showing and chosen simply to defer to the current IRIS values (which is precisely the course of conduct that the settlement and guidance prohibit). EPA should rectify this deficiency by now giving careful consideration to the alternative toxicological information presented by GE.

5. Use of subchronic RfD for PCBs

The Work Plan states that EPA will consider using the subchronic RfD for PCBs only if it determines that exposure is intermittent enough to result in an exposure period that is substantially reduced (p. 6-42). Later, on page 6-43, the Work Plan states that this will occur when the total exposure period is substantially below the 7-year period that is considered by EPA to represent chronic exposure. There is no further discussion of this issue, so it is not clear what the Agency will use as a benchmark to identify intermittent exposure.

Many of the proposed exposure scenarios are likely to be extremely intermittent in nature. For example, in developing the SRBC for utility workers, EPA assumes that exposure occurs 5 days/year for a period of 25 years. Clearly, 5 days/year is intermittent exposure. In fact, even when considered over the entire 25-year exposure period, the total exposure consists of 125 days, which amounts to 1/3 of a year, or less than 5 percent of the 7-year chronic exposure period. This exposure would certainly be considered intermittent despite the fact that the total exposure period lasts for 25 years.


As a second example, in developing its SRBCs for recreational exposures, EPA has assumed that children aged 1-6 are exposed for six years during recreational activities. Even using the highly conservative exposure frequency for high contact areas of 3 days/week for a period of 7 months per year (a total of 84 days/year, or 504 days for the entire exposure period), the number of days exposed is equivalent to 1.4 years, or less than 20 percent of the available days during the 7-year chronic exposure period. Nevertheless, as stated on page 6-42, EPA does not intend to use the subchronic RfD to evaluate noncancer hazards to young children.
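The arithmetic behind these two examples is straightforward, assuming EPA's 7-year chronic benchmark:

```python
# Arithmetic behind the two examples above, using EPA's 7-year benchmark.
CHRONIC_DAYS = 7 * 365                 # 2,555 days

utility_days = 5 * 25                  # utility worker: 5 days/yr x 25 yr = 125
print(utility_days / 365)              # about 1/3 of a year
print(utility_days / CHRONIC_DAYS)     # under 5% of the chronic period

child_days = 84 * 6                    # child recreator: 84 days/yr x 6 yr = 504
print(child_days / 365)                # about 1.4 years
print(child_days / CHRONIC_DAYS)       # under 20% of the chronic period
```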

In short, the Work Plan states that EPA will use the subchronic RfD when exposures during a given scenario are sufficiently intermittent to result in a total exposure that is far less than the 7-year chronic exposure period identified by EPA. It is clear, however, that even for scenarios where this intermittent exposure is likely, EPA does not intend to incorporate the subchronic RfD. EPA should reconsider its position on the use of the subchronic RfD and incorporate it into those scenarios where it is relevant.

6. Inappropriate use of a single exposure scenario for all direct-contact recreational activities

The Work Plan provides that, while consumption of fish and game will be evaluated separately, all direct-contact recreational activities will be evaluated using a single exposure scenario and a single set of exposure assumptions (pp. 6-52 to 6-53). These activities include hiking, walking, canoeing, picnicking, swimming, wading, and dirt biking, as well as fishing and hunting (but not consumption of fish or game). In its June 3, 1999 comments (pp. 11-12, 29-30), GE explained that use of a single exposure scenario and set of assumptions for such diverse activities is inappropriate because it will force EPA either to use the most conservative assumptions for any of these activities (which would clearly result in grossly overestimated exposure estimates) or to pick compromise parameters that will not be truly representative of any of these activities (which would be subject to considerable uncertainty). Based on discussions with EPA in the fall of 1999, GE had understood that the Agency recognized this problem and intended to consider further whether and how to divide this single direct-contact recreational scenario into a number of more specific scenarios (e.g., hiking/walking, canoeing, picnicking, swimming/wading). However, the Work Plan does not do that. GE again urges EPA to develop such specific scenarios so as to take account of the significant differences in exposure parameters among these types of activities. In doing so, actual recreational exposures will be more closely approximated, and more realistic and appropriate cleanup goals can be developed.

7. Lack of information about specific exposure assumptions to be used

As in the prior draft, the Work Plan does not present the specific exposure parameter values and equations to be used in EPA's risk assessment, but simply states that they will be developed based on EPA and MDEP guidance along with site-specific data (Section 6.3.4.5). GE requested such information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has advised GE that the Agency's risk assessment staff will hold a series of technical meetings with GE representatives and consultants, and at such meetings will be prepared to discuss the most appropriate exposure assumptions and parameter values to be used. GE reiterates its request for such meetings and looks forward to discussing these issues with EPA.

8. Use of probabilistic risk analysis

In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that probabilistic approaches, such as Monte Carlo analysis, will be considered for those COPCs and exposure pathways that result in significant risk estimates. However, it does not mention probabilistic risk assessment techniques in connection with any of the main risk assessments. In prior discussions with GE, EPA indicated that one of the principal reasons why it collected a large amount of fish tissue PCB data is for use in a probabilistic risk assessment of fish consumption. As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic exposure model in the main risk analysis of the fish consumption pathway. However, the Work Plan presents no discussion of, or plans for conducting, such an assessment. In fact, EPA does not even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk assessment guidance documents and information sources that EPA intends to use in conducting the HHRA.

During the last few years, EPA's policies and guidelines have focused on improving risk management by presenting decision-makers with the entire range of possible risks rather than a single point estimate (EPA, 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that numerical risk assessments should be accompanied by a full characterization of the uncertainties, limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D) probabilistic analysis, such as Microexposure analysis, can more effectively characterize the impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed population by considering time-dependent changes.

Microexposure analysis has been used by EPA and by independent researchers to simulate duration of residential exposure (EPA, 1992c; Johnson and Capel, 1992; Sielken, 1994). It also has been used to evaluate childhood exposures to lead (Goodrum et al., 1994), exposure to contaminants in tap water (Harrington et al., 1995), and exposure to dioxins from the consumption of freshwater fish (Keenan et al., 1993a,b, 1995, 1996a, 1997a,b). Microexposure analysis was employed in the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon Group, 1994; Keenan et al., 1996b). In addition, Microexposure analysis has been described in EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A (External Review Draft) as a viable alternative for modeling time-dependent variability in concentrations, daily activity patterns, and other behavioral exposure factors (EPA, 1999c).

Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of probabilistic risk analysis through application to the fish ingestion pathway at the Sangamo Weston/Lake Hartwell Superfund site (Simon, 1999). Simon (1999) found that the Microexposure event model gave risk estimates that were approximately an order of magnitude lower than those derived using deterministic and other probabilistic approaches. He attributed this decrease to the power of the method to incorporate changes over time.

GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model, to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in Appendix E of its draft guidance for conducting probabilistic risk assessments (EPA, 1999c), some types of exposures, including fish consumption, can be expected to vary over time. Anglers may change their activity levels and their fish consumption behaviors as they age. In addition, different species and sizes of fish may be consumed from a variety of locations on the river over time and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates and will provide information that will allow better informed risk management decisions to be made (EPA, 1999c).
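To illustrate the event-by-event structure that distinguishes a microexposure simulation from a single point estimate, a minimal sketch follows; every distribution and parameter value is a hypothetical placeholder, and a full 2-D analysis would additionally separate variability from uncertainty and use site-specific fish tissue data:

```python
# Minimal microexposure-style simulation of the fish consumption pathway.
# Every distribution and parameter below is a hypothetical placeholder.
import random

random.seed(1)

def simulate_angler(years=30):
    """Accumulate dose event by event, so consumption behavior and fish
    concentrations can vary from year to year and meal to meal."""
    body_weight = max(random.gauss(70.0, 10.0), 40.0)   # kg
    total_intake = 0.0                                  # mg PCB
    for _ in range(years):
        meals = random.randint(0, 24)                   # meals this year
        for _ in range(meals):
            conc = random.lognormvariate(0.0, 1.0)      # mg/kg in fillet
            meal_size = max(random.gauss(0.227, 0.05), 0.0)  # kg/meal
            total_intake += conc * meal_size
    return total_intake / (body_weight * 70 * 365)      # LADD, mg/kg-day

doses = sorted(simulate_angler() for _ in range(2000))
print("median LADD:", doses[1000])
print("95th percentile LADD:", doses[1900])
```

The output is a distribution of lifetime average daily doses across simulated anglers, from which any percentile of interest can be read, rather than a single upper-bound point estimate.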

9. Use of 95% UCLs or maximum values as exposure point concentrations

The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% UCL of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).

GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment, and as demonstrated by EPA contractors and consultants (EPA, 1997c; Schulz and Griffin, 1999), use of the H-statistic to derive a 95% UCL can, and frequently does, produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.

Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper 1½ Mile Reach and for Areas Outside the River: a spatial averaging approach. That approach will (a) allow calculation of EPCs that are much more truly representative of actual exposures than use of EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper 1½ Mile Reach and Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling if necessary to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.
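The area-weighting concept behind spatial averaging can be sketched as follows; the sample concentrations and represented areas are hypothetical:

```python
# Area-weighted (spatial) average EPC: each sample is weighted by the area
# it represents (e.g., its Thiessen polygon). Values are hypothetical.
samples = [   # (concentration mg/kg, represented area m^2)
    (1.0, 500.0),
    (12.0, 50.0),    # hot spot representing only a small area
    (2.5, 450.0),
]

def spatial_average(samples):
    total_area = sum(area for _, area in samples)
    return sum(conc * area for conc, area in samples) / total_area

print(spatial_average(samples))   # 2.225, vs. an unweighted mean of ~5.17
```

Because each sample is weighted by the area it represents, a hot spot sampled densely over a small area no longer dominates the EPC the way it can in an unweighted mean of the raw data.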


However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A and demonstrated by Singh et al. (EPA, 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.
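A basic (non-spatial) percentile bootstrap of the mean can be sketched as follows; the concentration data are hypothetical, and a spatial bootstrap would additionally resample in a manner that honors sample locations and density:

```python
# Percentile-bootstrap 95% UCL on the arithmetic mean (hypothetical data).
import random
from statistics import mean

random.seed(7)

def bootstrap_ucl(data, n_boot=5000, percentile=0.95):
    """Resample the data with replacement and take the 95th percentile of
    the resulting distribution of sample means."""
    boot_means = sorted(
        mean(random.choices(data, k=len(data))) for _ in range(n_boot)
    )
    return boot_means[int(percentile * n_boot)]

data = [0.2, 0.4, 0.5, 0.7, 1.1, 1.3, 2.0, 3.5, 6.2, 14.0]  # mg/kg, skewed
ucl = bootstrap_ucl(data)
print("arithmetic mean:", mean(data))   # 2.99
print("bootstrap 95% UCL:", ucl)        # exceeds the mean, below the maximum
```

Because the bootstrap works from the empirical distribution of the data, it requires no lognormality assumption and cannot produce the runaway UCLs associated with the H-statistic.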

References

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1990. National Contingency Plan. U.S. Environmental Protection Agency, Washington, DC.

EPA. 1992a. Final Guidelines for Exposure Assessment; Notice. U.S. Environmental Protection Agency, Washington, DC. 57 Federal Register 22888-22938. May 29.

EPA. 1992b. Guidance on Risk Characterization for Risk Managers and Risk Assessors. Memorandum from F. Henry Habicht II, Deputy Administrator, to Assistant Administrators and Regional Administrators. Office of the Administrator, Washington, DC.

EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

EPA. 1993. Memo from W.H. Farland and H.L. Longest to Regional Directors, Re: Use of IRIS values in Superfund risk assessments. U.S. Environmental Protection Agency, Washington, DC. OSWER Directive 9285.7-16. December 21.

EPA. 1995a. Policy for Risk Characterization at the U.S. Environmental Protection Agency. U.S. Environmental Protection Agency, Office of the Administrator, Washington, DC. March.

EPA. 1995b. Guidance for Risk Characterization. U.S. Environmental Protection Agency, Science Policy Council, Washington, DC. February.

EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the U.S. Environmental Protection Agency. U.S. Environmental Protection Agency, Office of Research and Development, Washington, DC. May 15.

EPA. 1997b. Guiding Principles for Monte Carlo Analysis. U.S. Environmental Protection Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 p.

EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1998. Memo from Timothy Fields, Acting Administrator, to Regional Directors, Re: Approach for addressing dioxin in soil at CERCLA and RCRA sites. U.S. Environmental Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.

EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, U.S. EPA, to Jane Magee, General Electric Company, Re: Pittsfield 1-0147, EPA Area 6, Housatonic River Human Health Risk Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader, Re: Protectiveness of Cleanup Levels for Removal Actions Outside the River - Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, U.S. Dist. Court for Dist. of Mass.). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I - Human Health Evaluation Manual, Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2. Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3 (Part A, Process for Conducting Probabilistic Risk Assessment). Draft, Revision 5. U.S. Environmental Protection Agency, Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the U.S. Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition, December 4-7, Baltimore, MD.

Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA4203. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.

Johnson, T. and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds, Proceedings Dioxin '96 - 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen 30:61-65.

Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC. October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meeting Proceedings. Abstract.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition. Abstract.

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan - Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19. November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD. December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4): 823-843.

Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 11.

ATTACHMENT A

A COMPARISON OF ALTERNATIVE METHODS

FOR CALCULATING EXPOSURE POINT CONCENTRATIONS

Prepared on Behalf of the General Electric Company

by Ogden Environmental and Energy Services Co., Inc.


Introduction

The U.S. Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston, 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student-t statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets where no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and their representation of the actual levels of exposures that are likely to occur under various scenarios.

There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student-t statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that, when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.

GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will result in more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper 1½ Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic, namely the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.

Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL

In 1992, EPA issued a guidance document (EPA, 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area; thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.

The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA, 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For log-normally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA, 1992):

UCL = exp( x̄ + 0.5s² + sH/√(n−1) )

Where:

UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic accounting for interaction of the distribution, developed by Land (1975)
n = number of analytical samples
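The equation above can be sketched in code as follows. This is purely an illustrative sketch, not part of GE's comments; the function name is our own, and the H-statistic value must be looked up in Land's (1975) tables for the given sample size and log-scale standard deviation, so it is supplied here by the caller rather than computed.

```python
import math
import statistics

def h_ucl(data, h):
    """95% UCL on the arithmetic mean per the H-statistic equation above.

    `data` are untransformed concentrations; `h` is the tabulated
    Land (1975) H-statistic for the given n and s (caller-supplied,
    since the tables are not reproduced in this sketch)."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    xbar = statistics.mean(logs)   # mean of log-transformed data
    s = statistics.stdev(logs)     # sd of log-transformed data
    return math.exp(xbar + 0.5 * s**2 + s * h / math.sqrt(n - 1))
```

With skewed data and a small n, this quantity readily exceeds the maximum observed concentration, which is the behavior criticized in the text.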

Frequently, the H-statistic calculates extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.

Use of the H-statistic also does not calculate a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.

Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA, 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.

Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when an exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.

Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA, 1999).

In view of these considerations, EPA contractors and consultants have recommended that the H-statistic should not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA, 1997a) recommended that "in environmental applications, the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. . . . [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).

Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the Lower Housatonic River; consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.

This problem can be substantially mitigated by area-weighting the sample results, so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, unbiased estimates of concentration and variance change with distance, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava, 1989).

Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.

The area inside each polygon is then used to weight the samples in statistical calculations, which is, in essence, saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

x̄sc = Σ (pi × Ci), summed over i = 1 through n

Where:

x̄sc = the area-weighted mean concentration (mg/kg)
Ci = the concentration representing the condition within polygon i, where there are i = 1 through n polygons, and
pi = the proportion of the total area that is incorporated in polygon i (unitless)
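The weighted sum above reduces to a few lines of code. The sketch below is illustrative only (the function name is our own); it takes the polygon areas directly and forms each weight pi as the polygon's share of the total area.

```python
def area_weighted_mean(concs, areas):
    """Area-weighted (spatial average) concentration: each polygon's
    concentration Ci is weighted by the fraction of the total exposure
    area that its polygon occupies (pi = area_i / total area)."""
    total = float(sum(areas))
    return sum(c * (a / total) for c, a in zip(concs, areas))
```

For example, a 2-ppm sample representing three-quarters of a property and a 10-ppm sample representing the remaining quarter yield an area-weighted mean of 4.0 ppm, rather than the simple average of 6.0 ppm.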

This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then the average concentrations for each grid were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the

[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods, such as kriging or triangulation, that take into account the spatial distribution of contamination to characterize exposure. (p. 15)

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and as demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process whereby a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that, instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter; as such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.
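The resample-and-record procedure described above can be sketched as follows. This is an illustrative sketch only (the function name, iteration count, and fixed seed are our own choices, not part of the Work Plan): each iteration draws n values from the dataset with replacement, the mean of each resample is archived, and the 95th percentile of the archived means is taken as the 95% UCL.

```python
import random
import statistics

def bootstrap_95_ucl(data, iterations=5000, seed=1):
    """95% UCL of the arithmetic mean by bootstrapping: resample n
    values with replacement, record the resample mean, repeat, and
    take the 95th percentile of the accumulated means."""
    rng = random.Random(seed)  # fixed seed for a reproducible sketch
    n = len(data)
    means = sorted(statistics.mean(rng.choices(data, k=n))
                   for _ in range(iterations))
    return means[int(0.95 * iterations) - 1]
```

Because every resample mean is bounded by the observed minimum and maximum, the bootstrap 95% UCL cannot exceed the maximum detected concentration, unlike the H-UCL.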

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area, and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that accounts for the locations and densities of samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.

Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset to reflect the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
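The three steps above can be sketched in code as follows. This is an illustrative sketch under our own assumptions (function name, default expansion factor of 5, 250 draws per iteration, 5,000 iterations, and a fixed seed), not a definitive implementation of the procedure:

```python
import random
import statistics

def spatial_bootstrap_95_ucl(concs, areas, factor=5,
                             draws=250, iterations=5000, seed=1):
    """Spatial bootstrap per the three steps above:
    (1) weight each sample by its Thiessen polygon area;
    (2) build an expanded dataset in which each sample is replicated
        in proportion to its rounded area ratio to the smallest
        polygon, times an expansion factor;
    (3) bootstrap means from the expanded dataset and take the 95th
        percentile as the 95% UCL."""
    smallest = min(areas)
    expanded = []
    for c, a in zip(concs, areas):
        # round(a / smallest) is the normalized area ratio, rounded
        expanded.extend([c] * (round(a / smallest) * factor))
    rng = random.Random(seed)
    means = sorted(statistics.mean(rng.choices(expanded, k=draws))
                   for _ in range(iterations))
    return means[int(0.95 * iterations) - 1]
```

Because samples from large polygons appear more often in the expanded dataset, each bootstrap draw reproduces the area-weighted probabilities of encountering each concentration, so the distribution of means centers on the area-weighted mean rather than the simple arithmetic mean.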


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC, in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has, in the past, often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area, and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.

These data have been used to calculate mean and 95% UCL EPCs using four different approaches: (1) the EPA approach (arithmetic mean and H-statistic); (2) an area-weighted average; (3) a bootstrap; and (4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping; this is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas; thus, they are weighted more heavily in approaches that include area-weighting.

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL that substantially exceeds the highest measured concentration, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing, in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above, and as demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculate the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River, and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993), and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or for areas where EPA determines that the data are insufficient to support that approach and decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A.K. Singh, and M. Engelhardt. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. U.S. Environmental Protection Agency, Region I. October.

Isaaks, E.H. and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River. Final. Roy F. Weston, Inc., Manchester, NH. DCN: GEP2-02-000-AAME. February 2000.

c DOCS jRBGE2ooooEPcPaper2 m 16 Ogden Environmental and Energy Services

Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number   Polygon Area (ft2)   Total PCB (ppm)
      1               138219             0.022
      2               105217             0.028
      3               110214             1.07
      4                31021             1.63
      5                11399             1.67
      6                 7004             2.01
      7                14268             2.19
      8                12121             3.25
      9                 3697             3.77
     10                13297             7.58
     11                33087             7.61
     12                65541             7.62
     13                57494             7.71
     14                 9014             7.99
     15                 9994             8.72
     16                 2123             9.69
     17                 2341             9.72
     18                 7446            12.7
     19                 2613            14.3
     20                44231            14.3
     21                 2362            15.4
     22                 8745            16.3
     23                 5008            18.7
     24                18762            19.1
     25                24296            20.2
     26                51356            30.8
     27                12996            32.9
     28                18019            38.9
     29               119814            40.6
     30               132012            42.6


a. Exposure frequencies for residential and recreational scenarios

Some of the exposure frequencies used by EPA to derive the SRBCs for the residential and recreational scenarios are unrealistic and implausible. These include the assumed frequency of two days per week for seven months per year for low-contact residential exposure to riverbank soils and sediments and low-contact recreational exposure to floodplain and riverbank soils and sediments, as well as the assumed frequency of three days per week for seven months per year for high-contact residential and recreational exposure to sediments.

The Work Plan indicates that the determination of high-contact versus low-contact areas depends on the accessibility of the area -- e.g., judgments about the likelihood that the area has "physical features consistent with a child (or adult) contacting riverbank soil on a regular basis" (p. 6-26). Physical features that would lead to a low-contact determination would presumably include, at a minimum, steep slopes, undercut river banks, rocky areas with limited or no access to soil or sediment, and marshy areas. Such conditions are not consistent with contact with the contaminated media on a regular basis. In these circumstances, it is unreasonable to assume an exposure frequency of two days per week for each week in the seven-month exposure period. Such a frequency clearly represents regular contact, which is inconsistent with the limited accessibility of these areas. A more appropriate assumption for low-contact areas would be to assume occasional contact (e.g., one day per week or two days per month) during the five warmer months of the year.

For high-contact areas, EPA has assumed an exposure frequency of three days per week for seven months. While this is a highly conservative estimate even for contact with floodplain and riverbank soils, it is totally unreasonable for contact with sediments, particularly given the assumed exposed skin surface areas. Even for high-contact sediment areas, it is extremely unlikely that children and adults would venture into the river and be exposed to sediments as often as three days per week during every week from April through October. In fact, children would probably not play in the river at all in the cooler portions of this period. Thus, it would be more reasonable to assume an average exposure frequency of around one day per week during this seven-month period.
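For illustration only, the difference between the assumed and the more realistic contact frequencies can be expressed in annual exposure days. The four-weeks-per-month convention below is an assumption on our part, chosen because it reproduces the 84 days/year figure that follows from EPA's high-contact frequency:

```python
# Annual exposure-day counts implied by the contact frequencies discussed above.
# The 4-weeks-per-month convention is an assumption for illustration.
WEEKS_PER_MONTH = 4

def annual_days(days_per_week, months_per_year):
    # days/week x weeks/month x months/year
    return days_per_week * WEEKS_PER_MONTH * months_per_year

epa_low_contact  = annual_days(2, 7)  # EPA: low contact, 2 days/week, 7 months
epa_high_contact = annual_days(3, 7)  # EPA: high contact, 3 days/week, 7 months
ge_low_contact   = annual_days(1, 5)  # GE proposal: 1 day/week, 5 warmer months
ge_high_contact  = annual_days(1, 7)  # GE proposal: ~1 day/week, 7 months

print(epa_low_contact, epa_high_contact, ge_low_contact, ge_high_contact)
```

Under these conventions the assumed frequencies imply 56 and 84 exposure days per year, versus roughly 20 to 28 days per year under the occasional-contact frequencies GE considers more appropriate.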


b. Exposed skin surface areas for residential and recreational scenarios

The excessive conservatism in the exposure frequencies is exacerbated by the exposed skin surface areas assumed by EPA in deriving the SRBCs for the residential and recreational scenarios. For these scenarios, EPA has assumed that the entire head, hands, forearms, lower legs, and feet will be in contact with soil or sediment every time that the individuals are present in the affected areas (Tables 6.2-3 and 6.2-4). This assumption does not take account of the fact that the exposed skin surface areas during the spring and fall months are likely to be less, because individuals will be wearing heavier clothing -- i.e., shoes, socks, long pants, and long-sleeved shirts. While the level of exposure assumed by EPA may occur during the summer months, it would not occur every time the individual visits the affected area. Hence, EPA should reduce the exposed skin surface areas to take account of the lesser areas exposed during the spring and fall, and thus develop time-weighted averages for the exposed skin surface areas.
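A time-weighted average of this kind is a simple calculation; the sketch below uses hypothetical surface-area values and a hypothetical three-month/four-month seasonal split, not figures from the Work Plan:

```python
# Illustrative time-weighted average (TWA) exposed skin surface area over the
# seven-month season. The cm^2 values and month split are hypothetical
# placeholders, not Work Plan inputs.
summer_sa = 5700       # cm^2: head, hands, forearms, lower legs, feet (hypothetical)
spring_fall_sa = 2500  # cm^2: head and hands only, under heavier clothing (hypothetical)
summer_months, shoulder_months = 3, 4  # assumed split of the 7-month season

twa = (summer_sa * summer_months + spring_fall_sa * shoulder_months) / (
    summer_months + shoulder_months
)
print(round(twa))  # cm^2
```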

c. Soil ingestion rates for recreational scenario

For recreational exposures (both high-contact and low-contact), EPA has used daily soil ingestion rates of 200 mg/day for small children and 100 mg/day for adults (Table 6.2-2). GE does not agree with these ingestion rates for the reasons given in its June 3, 1999 comments (pp. 33-34). However, if these daily soil ingestion rates are to be used, EPA should at least apply some factor to account for the fraction of the daily soil ingestion that is derived from the affected recreational area, as provided in the Agency's Risk Assessment Guidance for Superfund (EPA, 1989). Under real-life conditions, total daily soil ingestion will occur over the entire day and will include soil and/or sediments from a variety of different locations, some of which will not be contaminated. However, EPA has assumed that total daily soil ingestion will come from a single contaminated area and has not included a factor for the fraction of soil ingested from the site. While it is presumed that this omission reflects EPA's concern that some individuals may occasionally obtain all of their ingested soil from a single contaminated location, such an assumption is not reasonable when coupled with the exposure frequencies that are assumed. In addition, some highly accessible areas may be very limited in size, as is the case on some floodplain properties. Because of this, even if all of the soil is obtained from an exposure area, only a portion of that soil will be contaminated.


Moreover, EPA's assumption is excessively conservative when one considers that the SRBCs for soil and sediment are separately derived and that each SRBC assumes that 100 percent of the soil or sediment ingested daily comes from the site. In reality, individuals who are engaged in river-based activities will ingest a combination of soil and sediment during the course of their activity. Given these considerations, GE believes that a factor of 50 percent should be used to represent the fraction of ingested soil or sediment that is derived from contaminated recreational areas during recreational activities. This will correct for the fact that daily ingestion exposure occurs with a combination of soil and sediment. It will also correct for the fact that only limited portions of the day will be spent in recreational areas. In addition, such a factor will more accurately reflect exposures at those areas where the area of contamination is very limited. This approach is consistent with EPA's approach in its development of the cleanup goals under the October 7, 1999 Consent Decree for recreational areas outside the River (EPA, 1999a).
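The effect of such a factor is straightforward in a RAGS-style intake equation. The parameter values in this sketch are generic placeholders for illustration; the point is only that a fraction-ingested term of 0.5 halves the estimated intake relative to EPA's implicit value of 1.0:

```python
# RAGS-style soil/sediment ingestion intake with a fraction-ingested (FI)
# term. All parameter values are illustrative placeholders.
def soil_intake(conc_mg_kg, ir_mg_day, fi, ef_days_yr, ed_yr, bw_kg, at_days):
    """Chronic daily intake (mg/kg-day) = C x IR x FI x EF x ED / (BW x AT).

    The 1e-6 factor converts the ingested soil mass from mg to kg so that the
    mg/kg concentration applies.
    """
    return (conc_mg_kg * 1e-6 * ir_mg_day * fi * ef_days_yr * ed_yr) / (bw_kg * at_days)

epa_intake = soil_intake(10.0, 200, 1.0, 84, 6, 15, 6 * 365)  # implicit FI = 1.0
ge_intake  = soil_intake(10.0, 200, 0.5, 84, 6, 15, 6 * 365)  # proposed FI = 0.5
print(ge_intake / epa_intake)  # 0.5
```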

d. Soil ingestion rate for utility worker scenario

EPA has used a soil ingestion rate of 200 mg/day to develop the SRBC for the utility worker scenario because of likely heavy exposure to soils (p. 6-17). While an enhanced soil ingestion rate may be appropriate for utility workers, who will have direct contact with soil, the ingestion rate of 200 mg/day is too high. GE previously proposed use of an enhanced soil ingestion rate of 137 mg/day for farmers at the Housatonic River site, who are likewise expected to have heavy contact with soil (GE, 1997), and EPA approved that proposal (EPA and MDEP, 1998). EPA itself used that soil ingestion rate in deriving a cleanup level for the utility worker scenario for Areas Outside the River under the Consent Decree (EPA, 1999a). This ingestion rate was based on an assumed upper-bound soil adherence rate of 1 mg/cm² for the hands. In recent draft dermal risk assessment guidance, however, EPA (1999b) recommends soil adherence factors for utility workers ranging from a central-tendency estimate of 0.2 mg/cm² to an upper-bound estimate of 0.8 mg/cm². Use of that upper-bound adherence factor of 0.8 mg/cm² would result in an enhanced soil ingestion rate for utility workers of 110 mg/day. GE submits that this enhanced soil ingestion rate should be used in developing a revised SRBC for the utility worker scenario.
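The 137 to 110 mg/day step appears to follow from simple proportional scaling of the approved rate by the ratio of adherence factors; that proportionality is our reading of the derivation, shown here for illustration:

```python
# Scaling the previously approved enhanced ingestion rate by the ratio of
# hand adherence factors, as described above.
approved_rate_mg_day = 137          # based on a 1 mg/cm^2 adherence factor
revised = approved_rate_mg_day * 0.8 / 1.0  # 0.8 mg/cm^2 upper bound (EPA, 1999b)
print(round(revised))  # ~110 mg/day
```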


B. Comments on Phase 2 - Baseline HHRA Approach

For those properties or other exposure points that remain after the Phase 1 screening, the Work Plan outlines the approach that EPA plans to use in conducting a baseline HHRA. GE has a number of comments on that proposed approach.

1. Selection of non-PCB contaminants of potential concern

The Work Plan outlines a two-step approach for selecting non-PCB contaminants of potential concern (COPCs) to be included in the baseline HHRA (Section 6.3.2.3). In the first step, for each area that has PCB concentrations above the applicable SRBC, the available sampling data on other constituents will be compared with several criteria to determine the need for additional sampling in that area. These criteria include: (a) background concentrations established for the area; (b) EPA's Preliminary Remediation Goal (PRG) for dioxins/furans in residential areas, which is a 1 ppb Toxicity Equivalent (TEQ) concentration; and (c) the Massachusetts Department of Environmental Protection's (MDEP's) Method 1 soil standards set out in the Massachusetts Contingency Plan (MCP) (see Work Plan, p. 6-30). If these or other (unspecified) criteria are exceeded, EPA will conduct additional characterization sampling (or other evaluation) in the area.¹ If they are not exceeded, the Work Plan states that the baseline HHRA will be performed using the available analytical data (p. 6-31). The latter statement appears to mean that, even if the concentrations of non-PCB constituents do not exceed the criteria for further sampling (i.e., are below background concentrations, the dioxin/furan PRG, or the Method 1 soil standards), those constituents will nevertheless be carried through to the second step in the selection of COPCs.

In the second step, all constituents detected will be screened for the selection of COPCs by comparing the maximum detected concentrations with contaminant-specific and medium-specific risk-based concentrations. The Work Plan states that the principal criterion for selection of a COPC will be an exceedance of the medium-specific PRG by the maximum concentration of the constituent (p. 6-31). Although the Work Plan does not give the source of the PRGs to be used, it states that a target cancer risk of 1 x 10⁻⁶ and a target non-cancer hazard quotient (HQ) of 0.1 will be used. Any constituent for which the maximum concentration exceeds the PRG will be carried forward for the risk assessment. In addition, the Work Plan states that constituents which are known to bioaccumulate in biological tissues but do not exceed the PRGs will be further evaluated for possible inclusion in the risk assessment.
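The two-step logic as the Work Plan describes it can be sketched as follows. All criteria values are hypothetical placeholders; note that, under this logic, a constituent that passes the Step 1 criteria is still screened again in Step 2 rather than being dropped:

```python
# Minimal sketch of the two-step screen as the Work Plan describes it.
# All numeric criteria below are hypothetical placeholders.
def needs_more_sampling(max_conc, background, method1_std):
    # Step 1: exceedance of any listed criterion triggers further sampling
    return max_conc > background or max_conc > method1_std

def is_copc(max_conc, prg):
    # Step 2: principal criterion - maximum concentration exceeds the PRG
    return max_conc > prg

# Hypothetical constituent: Step 1 criteria not exceeded, yet still flagged
# as a COPC in Step 2 under the Work Plan's stated logic.
print(needs_more_sampling(8.0, background=10.0, method1_std=20.0))  # False
print(is_copc(8.0, prg=0.4))  # True
```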

GE has several concerns about this proposed two-step approach.

First, GE believes that if the concentration of non-PCB constituents in a given area does not exceed the criteria for further sampling (as listed above), those constituents should be dropped from further consideration and not carried through to the second step in selecting COPCs. As EPA has recognized in the Statement of Work for Removal Actions Outside the River (SOW), which is Appendix E to the Consent Decree, concentrations of non-PCB constituents that are consistent with local background levels, or are less than the 1 ppb TEQ concentration for dioxins/furans, or are below the applicable Method 1 soil standards, are considered to pose no significant risk to human health. Hence, such constituents need not and should not be considered further in the selection of COPCs.

Second, for those constituents that are carried forward to the second step in the COPC selection process, EPA's proposed primary approach for selecting COPCs -- i.e., comparison of maximum concentrations to PRGs -- is overly restrictive. In recent discussions with GE, EPA explained that it is currently planning to use the EPA Region 9 PRGs for this purpose, but to reduce the non-cancer-based PRGs by 10 times due to the fact that those PRGs were based on a non-cancer HQ of 1.0, not 0.1. This approach will likely lead to the selection of a long list of COPCs, many or most of which in fact make no significant contribution to actual risks. This would undercut the purpose of the screening approach and will greatly and needlessly complicate the risk assessment without any corresponding benefit in terms of achieving additional health protection. Instead, GE recommends that, in selecting the COPCs for the Rest of the River, EPA should use an approach similar to the approach approved by EPA and specified in the SOW for Areas Outside the River, including the non-bank portions of certain floodplain properties. This approach utilizes not only

¹ The Work Plan also states that, in addition to these specified criteria, EPA will also use other applicable criteria in the first step of this process, but it does not say what those additional criteria are. EPA should specify those criteria.


the Region 9 PRGs but also the other criteria listed on page 6-30 of the Work Plan, as discussed below.

For dioxins and furans, EPA should use the PRGs for dioxin/furan TEQs that are prescribed in recent guidance from EPA headquarters for use in addressing these constituents in soil at CERCLA and RCRA sites (EPA, 1998). Those PRGs are 1 ppb for dioxin/furan TEQs in residential areas (also applicable to recreational areas) and 5-20 ppb for dioxin/furan TEQs in commercial/industrial areas. The SOW provides specifically for use of these PRGs in screening out dioxin/furan compounds in soil in Areas Outside the River. EPA should likewise use these PRGs in selecting COPCs for the Rest of the River. (Indeed, as discussed below, GE believes that these PRGs should be used not only as screening criteria but also as remedial goals, which is consistent with the approach embodied in the SOW for Areas Outside the River. See Section B.2 below.)

For other constituents, EPA should follow a step-wise approach that begins with comparisons to the Region 9 PRGs. Since these PRGs are highly conservative and are specified in the SOW for use in screening such constituents without further reduction, they should not be reduced by an additional 10 times. For constituents for which such PRGs do not exist, a similar screening step should be employed using PRGs for surrogate compounds.²

EPA should then apply the other criteria listed on page 6-30 of the Work Plan -- i.e., comparison to background levels and MCP Method 1 standards. Comparison of concentrations of non-PCB constituents to site-specific background levels (e.g., concentrations upstream of the GE facility) is appropriate to ensure that only those constituents attributable to releases from the GE facility are included in the risk assessment. This step is allowed in EPA's Risk Assessment Guidance for

² Generally, when toxicity criteria have not been derived for particular constituents, it is because there are no reliable toxicity data on which to base them. If such constituents are not screened but are simply carried through to the risk assessment as COPCs, that would unnecessarily complicate the risk assessment and would result in largely meaningless or highly uncertain estimates, given the lack of reliable toxicity data and criteria. Hence, EPA should screen these compounds using PRGs for surrogate compounds, as described in GE's June 3, 1999 comments (p. 6).


Superfund (EPA, 1989) and has been accepted by EPA as a basis for selecting non-PCB COPCs for both soils and sediments in the Upper ½ Mile Reach of the Housatonic River (as stated in the Upper ½ Mile Reach Work Plan) and for Areas Outside the River (as provided in the SOW). It is also consistent with the MCP, which provides that if the concentration of a constituent is at or below background levels, then that constituent will be considered to pose No Significant Risk (310 CMR 40.0902(3)). Similarly, comparison to the MCP Method 1 soil standards is appropriate because those standards have been developed as conservative risk-based standards (using a target cancer risk of 1 x 10⁻⁶) that will ensure protection of human health and the environment. Again, EPA has already accepted comparisons to those standards as a basis for selecting non-PCB COPCs in soil for the Upper ½ Mile Reach of the River and Areas Outside the River. Moreover, in its Engineering Evaluation/Cost Analysis (EE/CA) for the 1½ Mile Reach of the River, EPA itself has used the MCP Method 1 soil standards, as well as background levels, as a basis for evaluating non-PCB constituents in bank soils (Weston, 2000b). Thus, use of comparisons to background levels and MCP Method 1 standards in selecting COPCs for the Rest of the River is justified by considerable precedent and will ensure that a consistent approach is employed to address such constituents throughout the GE-Pittsfield/Housatonic River Site. EPA has presented no sound basis for departing from that approach for the Rest of the River portion of the Site.

Moreover, use solely of the maximum detected concentrations in selecting COPCs from these constituents is overly conservative and inappropriate. For example, comparisons to site-specific background levels should be made using an appropriate statistical technique or summary statistics, as described in MDEP (1995) guidance, while comparisons to the MCP Method 1 standards are generally to be made based on average exposure point concentrations (310 CMR 40.0926(3)). Again, these approaches have been agreed upon by EPA as fully protective for Areas Outside the River, as set forth in the SOW, and should likewise be used for the Rest of the River to eliminate constituents that are not likely to contribute to Site-related risks.

Finally, the Work Plan does not explain how the potential for bioaccumulation (discussed on p. 6-31) will be considered or what criteria will be used to examine this. Unless the approach used


to identify and select such COPCs is clearly defined, it could result in the unnecessary inclusion of certain constituents. EPA should provide supplemental information that outlines the specific approach and criteria that will be used to identify and select these constituents for inclusion.

2. Risk assessment of dioxins/furans

The Work Plan states that EPA intends to evaluate the potential risks associated with dioxins and furans by converting them to TEQs (toxic equivalents of 2,3,7,8-TCDD) using Toxicity Equivalency Factors (TEFs) and then applying the provisional cancer slope factor (CSF) of 150,000 (mg/kg-day)⁻¹ listed on HEAST for 2,3,7,8-TCDD (Section 6.3.3.2.4). As discussed in detail in GE's June 3, 1999 comments (pp. 19-22), the TEF approach has substantial limitations, and the provisional CSF of 150,000 (mg/kg-day)⁻¹ is uncertain and unsupportable. A wide range of CSFs, spanning from 9,000 to 156,000 (mg/kg-day)⁻¹, have been proposed for TCDD under the linear nonthreshold cancer model, with differences resulting from the selection of tumor classification scheme, the interspecies scaling factor used, and the model used for low-dose extrapolation. Thus, there is enormous uncertainty associated with any selected CSF.
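The practical effect of that CSF range can be illustrated with a simple calculation; the TEQ dose used below is an arbitrary hypothetical value, not a site estimate:

```python
# The spread in cancer-risk estimates implied by the CSF range quoted above,
# for an arbitrary (hypothetical) lifetime average daily TEQ dose.
dose = 1e-9  # mg/kg-day, hypothetical
csf_low, csf_high = 9_000, 156_000  # (mg/kg-day)^-1, range cited above

print(dose * csf_low, dose * csf_high)   # risk estimates at the two extremes
print(round(csf_high / csf_low, 1))      # ~17-fold spread from CSF choice alone
```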

In these circumstances, GE believes that, to the extent possible in this risk assessment, EPA should evaluate dioxins and furans through application of the dioxin/furan TEQ PRGs set forth in EPA (1998) guidance and discussed above, using those PRGs as remedial goals as well as screening levels, rather than conducting forward risk calculations for these compounds. This approach would avoid the uncertainties associated with selecting a particular CSF for TCDD and is consistent with the approach adopted by EPA for Areas Outside the River, as described in the SOW.

However, to the extent that EPA includes dioxins and furans in forward risk calculations, GE believes that EPA should use a CSF of 30,000 (mg/kg-day)⁻¹. As shown in our June 3, 1999 comments, based on current information, that CSF is more supportable than the CSF of 150,000 (mg/kg-day)⁻¹ specified in the Work Plan.


3. Use of TEF/TEQ approach for dioxin-like PCBs

In the Work Plan, EPA also proposes to use the TEF/TEQ approach for estimating risks of dioxin-like PCB congeners (Section 6.3.3.2.5). Under this approach, the concentrations of these PCB congeners are converted to toxic equivalents (TEQs) of 2,3,7,8-TCDD using TEFs, and the provisional CSF of 150,000 (mg/kg-day)⁻¹ for 2,3,7,8-TCDD is then applied to estimate the risks. In its June 3, 1999 comments, GE presented several reasons why this approach is not appropriate or scientifically defensible. In discussions with GE in November 1999, EPA stated that, since it will make risk management decisions on the basis of total PCB concentrations rather than congener-specific data, it intends to use this TEF/TEQ approach for PCBs only if it finds a good correlation between the concentrations of the dioxin-like congeners and concentrations of total PCBs. There is no indication in the Work Plan, however, that EPA has established a good correlation between the congener-specific data and the total PCB data. Indeed, in more recent discussions, EPA has indicated that it has not to date evaluated this issue. If and when EPA makes such an analysis, GE requests the opportunity to review the analysis and discuss it with EPA. The absence of such a correlation would itself make the TEQ data irrelevant to risk management decisions.

In any event, GE continues to believe, for the same reasons given in its June 3, 1999 comments (pp. 23-25), that application of the TEF/TEQ approach to PCBs is not scientifically supportable. These reasons include: (a) the lack of sufficient congener-specific data to provide reliable estimates of true congener concentrations; (b) the double counting of carcinogenic risks that would result from use of both the TCDD CSF and the PCB CSF in the same assessment; and (c) the uncertainties and unsupportability of the TEFs for PCB congeners and of the CSF (for 2,3,7,8-TCDD) to be used to evaluate the resulting TEQs.
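For reference, the TEF/TEQ conversion at issue is a weighted sum: each congener concentration is multiplied by its TEF and the products are summed into a 2,3,7,8-TCDD equivalent. The concentrations below are hypothetical; the TEFs shown are the WHO-98 values commonly cited for these congeners:

```python
# Sketch of the TEF/TEQ conversion described above. Concentrations are
# hypothetical; TEFs are the WHO-98 values for these dioxin-like congeners.
tefs = {"PCB-126": 0.1, "PCB-169": 0.01, "PCB-77": 0.0001}
conc_ppm = {"PCB-126": 0.0002, "PCB-169": 0.0005, "PCB-77": 0.01}  # hypothetical

teq = sum(conc_ppm[c] * tefs[c] for c in tefs)  # ppm 2,3,7,8-TCDD equivalents
print(teq)
```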

4. Failure to consider alternative CSFs and RfDs for PCBs

The Work Plan states that EPA intends to use the upper-bound CSF for PCBs of 2 (mg/kg-day)⁻¹ and the Reference Dose (RfD) for Aroclor 1254 of 2 x 10⁻⁵ mg/kg-day, as published in its Integrated Risk Information System (IRIS) (Sections 6.3.3.2.3 and 6.3.3.3.2). In its June 3, 1999 comments (pp. 16-19 and Appendix A), GE presented substantial reasons why these toxicity values


overestimate the carcinogenic potential and non-carcinogenic impacts of PCBs, and it recommended the development of alternative toxicity values for PCBs. EPA has made no reference to this presentation in its Work Plan.

This failure to consider the alternative toxicological information presented by GE is unlawful. Under a settlement agreement with GE (Settlement Agreement in General Electric Company v. Browner, No. 93-1251, D.C. Cir., October 25, 1993) and a subsequent EPA guidance memorandum (EPA, 1993), if an outside party questions the use of IRIS values during the course of an EPA risk assessment and presents alternative toxicological information that may be used in place of the IRIS values, EPA has an obligation to consider all credible and relevant evidence before it. There is no indication that EPA has considered the alternative toxicological information presented by GE. Rather, it has ignored GE's showing and chosen simply to defer to the current IRIS values (which is precisely the course of conduct that the settlement and guidance prohibit). EPA should rectify this deficiency by now giving careful consideration to the alternative toxicological information presented by GE.

5. Use of subchronic RfD for PCBs

The Work Plan states that EPA will consider using the subchronic RfD for PCBs only if it determines that exposure is intermittent enough to result in an exposure period that is substantially reduced (p. 6-42). Later, on page 6-43, the Work Plan states that this will occur when the total exposure period is substantially below the 7-year period that is considered by EPA to represent chronic exposure. There is no further discussion of this issue, so it is not clear what the Agency will use as a benchmark to identify intermittent exposure.

Many of the proposed exposure scenarios are likely to be extremely intermittent in nature. For example, in developing the SRBC for utility workers, EPA assumes that exposure occurs 5 days/year for a period of 25 years. Clearly, 5 days/year is intermittent exposure. In fact, even when considered over the entire 25-year exposure period, the total exposure consists of 125 days, which amounts to 1/3 of a year, or less than 5 percent of the 7-year chronic exposure period. This exposure would certainly be considered intermittent, despite the fact that the total exposure period lasts for 25 years.


As a second example, in developing its SRBCs for recreational exposures, EPA has assumed that children aged 1-6 are exposed for six years during recreational activities. Even using the highly conservative exposure frequency for high-contact areas of 3 days/week for a period of 7 months per year (a total of 84 days/year, or 504 days for the entire exposure period), the number of days exposed is equivalent to 1.4 years, or less than 20 percent of the available days during the 7-year chronic exposure period. Nevertheless, as stated on page 6-42, EPA does not intend to use the subchronic RfD to evaluate noncancer hazards to young children.
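The arithmetic behind these two examples, expressed against the 7-year chronic benchmark, is as follows:

```python
# The two intermittency examples above, as fractions of the 7-year chronic
# exposure benchmark.
chronic_days = 7 * 365          # 2,555 days

utility_worker_days = 5 * 25    # 5 days/year for 25 years = 125 days
child_recreator_days = 84 * 6   # 84 days/year for ages 1-6 = 504 days

print(round(100 * utility_worker_days / chronic_days, 1))   # < 5 percent
print(round(100 * child_recreator_days / chronic_days, 1))  # < 20 percent
```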

In short, the Work Plan states that EPA will use the subchronic RfD when exposures during a given scenario are sufficiently intermittent to result in a total exposure that is far less than the 7-year chronic exposure period identified by EPA. It is clear, however, that even for scenarios where this intermittent exposure is likely, EPA does not intend to incorporate the subchronic RfD. EPA should reconsider its position on the use of the subchronic RfD and incorporate it into those scenarios where it is relevant.

6. Inappropriate use of a single exposure scenario for all direct-contact recreational activities

The Work Plan provides that, while consumption of fish and game will be evaluated separately, all direct-contact recreational activities will be evaluated using a single exposure scenario and a single set of exposure assumptions (pp. 6-52 - 6-53). These activities include hiking, walking, canoeing, picnicking, swimming, wading, and dirt biking, as well as fishing and hunting (but not consumption of fish or game). In its June 3, 1999 comments (pp. 11-12, 29-30), GE explained that use of a single exposure scenario and set of assumptions for such diverse activities is inappropriate because it will force EPA either to use the most conservative assumptions for any of these activities (which would clearly result in grossly overestimated exposure estimates) or to pick compromise parameters that will not be truly representative of any of these activities (which would be subject to considerable uncertainty). Based on discussions with EPA in the fall of 1999, GE had understood that the Agency recognized this problem and intended to consider further whether and how to divide this single direct-contact recreational scenario into a number of more specific scenarios (e.g., hiking/walking, canoeing, picnicking, swimming/wading). However, the Work Plan does not do that. GE again urges EPA to develop such specific scenarios so as to take


account of the significant differences in exposure parameters among these types of activities. In doing so, actual recreational exposures will be more closely approximated, and more realistic and appropriate cleanup goals can be developed.

7. Lack of information about specific exposure assumptions to be used

As in the prior draft, the Work Plan does not present the specific exposure parameter values and equations to be used in EPA's risk assessment, but simply states that they will be developed based on EPA and MDEP guidance along with site-specific data (Section 6.3.4.5). GE requested such information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has advised GE that the Agency's risk assessment staff will hold a series of technical meetings with GE representatives and consultants, and at such meetings will be prepared to discuss the most appropriate exposure assumptions and parameter values to be used. GE reiterates its request for such meetings and looks forward to discussing these issues with EPA.

8. Use of probabilistic risk analysis

In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that probabilistic approaches, such as Monte Carlo analysis, will be considered for those COPCs and exposure pathways that result in significant risk estimates. However, it does not mention probabilistic risk assessment techniques in connection with any of the main risk assessments. In prior discussions with GE, EPA indicated that one of the principal reasons why it collected a large amount of fish tissue PCB data is for use in a probabilistic risk assessment of fish consumption. As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic exposure model in the main risk analysis of the fish consumption pathway. However, the Work Plan presents no discussion of or plans for conducting such an assessment. In fact, EPA does not even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk assessment guidance documents and information sources that EPA intends to use in conducting the HHRA.

During the last few years, EPA's policies and guidelines have focused on improving risk management by presenting decision-makers with the entire range of possible risks rather than a


single point estimate (EPA, 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that numerical risk assessments should be accompanied by a full characterization of the uncertainties, limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D) probabilistic analysis, such as Microexposure analysis, can more effectively characterize the impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed population by considering time-dependent changes.

Microexposure analysis has been used by EPA and by independent researchers to simulate duration of residential exposure (EPA, 1992c; Johnson and Capel, 1992; Sielken, 1994). It also has been used to evaluate childhood exposures to lead (Goodrum et al., 1994), exposure to contaminants in tap water (Harrington et al., 1995), and exposure to dioxins from the consumption of freshwater fish (Keenan et al., 1993a,b; 1995; 1996a; 1997a,b). Microexposure analysis was employed in the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon Group, 1994; Keenan et al., 1996b). In addition, Microexposure analysis has been described in EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A -- External Review Draft as a viable alternative for modeling time-dependent variability in concentrations, daily activity patterns, and other behavioral exposure factors (EPA, 1999c).

Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of probabilistic risk analysis through application to the fish ingestion pathway at the Sangamo-Weston/Lake Hartwell Superfund site (Simon, 1999). Simon (1999) found that the Microexposure event model gave risk estimates that were approximately an order of magnitude lower than those derived using deterministic and other probabilistic approaches. He attributed this decrease to the power of the method to incorporate changes over time.

GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model, to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in Appendix E of its draft guidance for conducting probabilistic risk assessments (EPA, 1999c), some types of exposures, including fish consumption, can be expected to vary over time. Anglers may change their activity levels and their fish consumption behaviors as they age. In addition, different species and sizes of fish may be consumed from a variety of locations on the river over time, and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates, and will provide information that will allow better informed risk management decisions to be made (EPA, 1999c).

9. Use of 95% UCLs or maximum values as exposure point concentrations

The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% UCL of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).

GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment, and demonstrated by EPA contractors and consultants (EPA, 1997c; Schulz and Griffin, 1999), use of the H-statistic to derive a 95% UCL can, and frequently does, produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.

Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½ Mile Reach and for Areas Outside the River: a spatial averaging approach. That approach will (a) allow calculation of EPCs that are much more truly representative of actual exposures than use of EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper ½ Mile Reach and Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling, if necessary, to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.


However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A, and demonstrated by Singh et al. (EPA, 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.
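The calculation GE has in mind can be sketched as follows. This is a hypothetical illustration of a spatial (area-weighted) bootstrap, with invented concentrations and polygon-area fractions rather than actual Housatonic data.

```python
import random

def spatial_bootstrap_ucl(concs, area_fracs, n_boot=2000, seed=7):
    """95% UCL of the area-weighted mean via weighted bootstrap resampling."""
    rng = random.Random(seed)
    n = len(concs)
    means = []
    for _ in range(n_boot):
        # Resample with replacement, weighting each sample by the fraction
        # of the exposure area its polygon covers, so densely sampled hot
        # spots are not over-represented.
        resample = rng.choices(concs, weights=area_fracs, k=n)
        means.append(sum(resample) / n)
    means.sort()
    return means[int(0.95 * n_boot)]   # 95th percentile of the bootstrapped means

concs = [0.5, 1.2, 2.0, 35.0, 0.8, 0.3]            # mg/kg PCB (hypothetical)
area_fracs = [0.30, 0.25, 0.15, 0.05, 0.15, 0.10]  # share of exposure area (hypothetical)
ucl = spatial_bootstrap_ucl(concs, area_fracs)
```

The resulting UCL sits above the area-weighted mean but, unlike an H-statistic result on skewed data, cannot exceed the range of the observations.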

References

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1990. National Contingency Plan. U.S. Environmental Protection Agency, Washington, DC.

EPA. 1992a. Final Guidelines for Exposure Assessment; Notice. U.S. Environmental Protection Agency, Washington, DC. 57 Federal Register 22888-22938. May 29.

EPA. 1992b. Guidance on Risk Characterization for Risk Managers and Risk Assessors. Memorandum from F. Henry Habicht II, Deputy Administrator, to Assistant Administrators and Regional Administrators. Office of the Administrator, Washington, DC.


EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

EPA. 1993. Memo from W.H. Farland and H.L. Longest to Regional Directors re: Use of IRIS values in Superfund risk assessments. U.S. Environmental Protection Agency, Washington, DC. OSWER Directive 9285.7-16. December 21.

EPA. 1995a. Policy for Risk Characterization at the U.S. Environmental Protection Agency. U.S. Environmental Protection Agency, Office of the Administrator, Washington, DC. March.

EPA. 1995b. Guidance for Risk Characterization. U.S. Environmental Protection Agency, Science Policy Council, Washington, DC. February.

EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the U.S. Environmental Protection Agency. U.S. Environmental Protection Agency, Office of Research and Development, Washington, DC. May 15.

EPA. 1997b. Guiding Principles for Monte Carlo Analysis. U.S. Environmental Protection Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 p.

EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1998. Memo from Timothy Fields, Acting Administrator, to Regional Directors re: Approach for addressing dioxin in soil at CERCLA and RCRA sites. U.S. Environmental Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.


EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, U.S. EPA, to Jane Magee, General Electric Company, re: Pittsfield 1-0147, EPA Area 6, Housatonic River Human Health Risk Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader, re: Protectiveness of Cleanup Levels for Removal Actions Outside the River - Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, U.S. Dist. Court for Dist. of Mass.). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual, Supplemental Guidance; Dermal Risk Assessment, Interim Guidance. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2 Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3, Part A: Process for Conducting Probabilistic Risk Assessment. Draft, Revision 5. U.S. Environmental Protection Agency, Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the U.S. Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition. December 4-7. Baltimore, MD.


Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA4203. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.

Johnson, T. and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds. Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference. Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA, Annual Meeting and Exposition. Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds, Proceedings, Dioxin '96: 16th International Symposium on Chlorinated Dioxins and Related Compounds. Amsterdam, The Netherlands. Organohalogen Compounds 30:61-65.


Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference. Washington, DC. October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meeting Proceedings. (Abstract)

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition. (Abstract)

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan - Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19. November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition. Baltimore, MD. December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4):823-843.


Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 11.

ATTACHMENT A

A COMPARISON OF ALTERNATIVE METHODS

FOR CALCULATING EXPOSURE POINT CONCENTRATIONS

Prepared on Behalf of the General Electric Company

by Ogden Environmental and Energy Services Co., Inc.


Introduction

The U.S. Environmental Protection Agency's (EPA's) final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston, 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets where no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and their representation of the actual levels of exposures that are likely to occur under various scenarios.

There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of actual levels of exposure that will occur within a given exposure area.

GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will result in more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic: namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.

Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL

In 1992, EPA issued a guidance document (EPA, 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.


The guidance document goes on to note that because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA, 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA, 1992):

UCL = exp(x̄ + 0.5·s² + s·H/√(n-1))

Where:

UCL = a specified limit (i.e., the 95% UCL) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic, developed by Land (1975), accounting for the shape of the distribution
n = the number of analytical samples
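The H-UCL equation above can be applied directly, as the following sketch shows. The dataset is hypothetical, and the H value used is a placeholder for illustration; in practice, H must be looked up in Land's (1975) tables for the given n and standard deviation.

```python
import math

def h_ucl(data, H):
    """Land's H-UCL: exp(xbar + 0.5*s**2 + s*H / sqrt(n - 1)),
    computed on the log-transformed data, per the equation above."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    xbar = sum(logs) / n
    s2 = sum((y - xbar) ** 2 for y in logs) / (n - 1)   # sample variance s^2
    s = math.sqrt(s2)
    return math.exp(xbar + 0.5 * s2 + s * H / math.sqrt(n - 1))

sample = [0.4, 1.1, 0.9, 3.2, 12.0, 0.7, 2.5, 0.6]  # mg/kg (hypothetical)
ucl = h_ucl(sample, H=2.8)       # H = 2.8 is a placeholder, not a tabulated value
arith_mean = sum(sample) / len(sample)
# Even on this small skewed dataset the H-UCL lands well above the
# arithmetic mean, illustrating the inflation discussed in the text.
```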

Frequently, the H-statistic calculates extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.


Use of the H-statistic also does not calculate a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.

Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA, 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.

Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when an exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.

Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA, 1999).

In view of these considerations, EPA contractors and consultants have recommended that the H-statistic should not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA, 1997a) recommended that "in environmental applications, the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less....[U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).


Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the lower Housatonic River. Consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.

This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, unbiased estimates of concentration and variance change with distance, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava, 1989).

Because kriging is data intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.

The area inside each polygon is then used to weight the samples in statistical calculations, which is in essence saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted mean concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

X̄sc = Σ pᵢCᵢ (summed over polygons i = 1 through n)

Where:

X̄sc = the area-weighted mean concentration (mg/kg)
Cᵢ = the concentration representing the condition within polygon i, where there are i = 1 through n polygons
pᵢ = the proportion of the total area that is incorporated in polygon i (unitless)
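As a rough stand-in for the GIS operation described above, the polygon area fractions can be approximated by assigning the cells of a fine grid to their nearest sample point (a discrete Thiessen assignment). All coordinates and concentrations below are invented for illustration; the clustered pair of samples represents a deliberately over-sampled hot spot.

```python
def thiessen_weights(points, nx=200, ny=200, width=100.0, height=100.0):
    """Approximate each sample's Thiessen-polygon share of a rectangular
    exposure area by nearest-sample assignment of grid cells."""
    counts = [0] * len(points)
    for i in range(nx):
        for j in range(ny):
            x = (i + 0.5) * width / nx
            y = (j + 0.5) * height / ny
            # assign this cell to the nearest sample (squared distance)
            nearest = min(range(len(points)),
                          key=lambda k: (points[k][0] - x) ** 2
                                      + (points[k][1] - y) ** 2)
            counts[nearest] += 1
    total = nx * ny
    return [c / total for c in counts]

points = [(20.0, 20.0), (80.0, 80.0), (85.0, 85.0)]  # sample locations, m (hypothetical)
concs = [0.5, 40.0, 35.0]   # mg/kg; the two clustered samples sit in a hot spot

weights = thiessen_weights(points)
area_weighted_mean = sum(p * c for p, c in zip(weights, concs))
simple_mean = sum(concs) / len(concs)   # gives the clustered hot-spot pair 2/3 weight
```

Because the clustered hot-spot samples share one region of the area, the area-weighted mean falls below the simple mean, which is exactly the bias correction described in the text.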

This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then the average concentrations for each grid were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund: Human Health Evaluation Manual. In that review, the SAB stated that the:

[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure. (p. 15)

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative, superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative, preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.
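For reference, the H-statistic approach computes the one-sided 95% UCL from the log-transformed data; its standard form (per EPA's 1992 guidance on calculating the concentration term, with H values taken from Land's (1975) tables) is:

```latex
% \bar{y} and s_y are the mean and standard deviation of the
% log-transformed concentrations, n is the sample size, and
% H_{0.95} is Land's tabulated statistic (Land, 1975).
\mathrm{UCL}_{95} = \exp\!\left( \bar{y} + \frac{s_y^{2}}{2}
                  + \frac{s_y \, H_{0.95}}{\sqrt{n-1}} \right)
```

Because H grows with the log-scale standard deviation and shrinks only slowly with n, small and highly variable datasets drive the exponent, and hence the H-UCL, sharply upward, which is the behavior criticized above.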

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
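A minimal sketch of this procedure in Python (the dataset and iteration count below are illustrative only, not values from the Work Plan):

```python
import random
import statistics

def bootstrap_ucl95(data, n_iterations=5000, seed=1):
    """Bootstrap the 95% UCL of the mean: repeatedly resample the
    dataset with replacement, archive each resample's mean, and take
    the 95th percentile of the archived means."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_iterations):
        resample = [rng.choice(data) for _ in data]
        means.append(statistics.fmean(resample))
    means.sort()
    return means[int(0.95 * n_iterations) - 1]

# Hypothetical PCB concentrations (ppm)
concentrations = [0.022, 0.028, 1.07, 1.63, 2.19, 3.25, 7.58, 12.7, 20.2, 42.6]
ucl = bootstrap_ucl95(concentrations)  # falls between the mean and the maximum
```

Because the 95th percentile of the resampled means is used, the estimate exceeds the sample mean but, unlike the H-UCL in the example below, remains anchored to the observed data.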

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of samples collected on each property. This modified bootstrapping approach (spatial bootstrapping), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.


Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
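The three steps can be sketched as follows; the polygon areas and concentrations are hypothetical, and the parameter values (expansion factor of 5, subsamples of 250, 5,000 iterations) simply follow the description above:

```python
import random
import statistics

def spatial_bootstrap_ucl95(samples, expansion_factor=5,
                            subsample_size=250, n_iterations=5000, seed=1):
    """Area-weighted (spatial) bootstrap of the 95% UCL.

    `samples` is a list of (polygon_area_ft2, concentration_ppm) pairs.
    Steps 1-2: normalize each Thiessen polygon area to the smallest
    polygon, round to the nearest integer, and replicate each sample
    result (rounded ratio x expansion_factor) times in an expanded
    dataset. Step 3: repeatedly draw subsamples with replacement,
    archive the means, and take the 95th percentile.
    """
    smallest = min(area for area, _ in samples)
    expanded = []
    for area, conc in samples:
        replicates = round(area / smallest) * expansion_factor
        expanded.extend([conc] * replicates)
    rng = random.Random(seed)
    means = []
    for _ in range(n_iterations):
        draw = [rng.choice(expanded) for _ in range(subsample_size)]
        means.append(statistics.fmean(draw))
    means.sort()
    return means[int(0.95 * n_iterations) - 1]

# Hypothetical polygons: one large low-concentration area and
# progressively smaller, higher-concentration areas
samples = [(138219, 0.022), (31021, 1.63), (14268, 7.58), (2123, 42.6)]
ucl = spatial_bootstrap_ucl95(samples)
```

In this illustration the largest polygon carries the lowest concentration, so the area-weighted UCL sits well below the unweighted mean of the four results, showing how spatial weighting corrects for sampling-density bias.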


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has, in the past, often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: (1) the EPA approach (arithmetic mean and H-statistic); (2) an area-weighted average; (3) a bootstrap; and (4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained from straight bootstrapping. This is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
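As a check, the arithmetic mean reported above can be reproduced directly from the 30 Table 1 concentrations (values as reconstructed here):

```python
import statistics

# The 30 hypothetical Total PCB results (ppm) from Table 1
concentrations = [
    0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
    7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
    15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6,
]

arithmetic_mean = statistics.fmean(concentrations)  # about 13.3 ppm
```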

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large (containing 30 samples) and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL (106 ppm) that far exceeds the highest measured concentration (42.6 ppm), with the result that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly concentrated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above, and demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993) and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½-Mile Reach of the River and for the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or for areas where EPA determines that the data are insufficient to support that approach and decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhardt. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½-Mile Removal Action, and Consolidation. U.S. Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River, Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.


Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number   Polygon Area (ft2)   Total PCB (ppm)
      1              138,219              0.022
      2              105,217              0.028
      3              110,214              1.07
      4               31,021              1.63
      5               11,399              1.67
      6                7,004              2.01
      7               14,268              2.19
      8               12,121              3.25
      9                3,697              3.77
     10               13,297              7.58
     11               33,087              7.61
     12               65,541              7.62
     13               57,494              7.71
     14                9,014              7.99
     15                9,994              8.72
     16                2,123              9.69
     17                2,341              9.72
     18                7,446             12.7
     19                2,613             14.3
     20               44,231             14.3
     21                2,362             15.4
     22                8,745             16.3
     23                5,008             18.7
     24               18,762             19.1
     25               24,296             20.2
     26               51,356             30.8
     27               12,996             32.9
     28               18,019             38.9
     29              119,814             40.6
     30              132,012             42.6

Table 2. Mean and 95% UCL EPC Estimates Calculated Using Four Approaches

[The body of Table 2 is not legible in the source. As reported in the accompanying text, the arithmetic and bootstrapped means are 13.3 ppm; the bootstrap and spatial bootstrap 95% UCLs are 17.06 and 16.38 ppm, respectively; and the H-statistic 95% UCL is 106 ppm.]


b. Exposed skin surface areas for residential and recreational scenarios

The excessive conservatism in the exposure frequencies is exacerbated by the exposed skin surface areas assumed by EPA in deriving the SRBCs for the residential and recreational scenarios. For these scenarios, EPA has assumed that the entire head, hands, forearms, lower legs, and feet will be in contact with soil or sediment every time that the individuals are present in the affected areas (Tables 6.2-3 and 6.2-4). This assumption does not take account of the fact that the exposed skin surface areas during the spring and fall months are likely to be less, because individuals will be wearing heavier clothing (i.e., shoes, socks, long pants, and long-sleeved shirts). While the level of exposure assumed by EPA may occur during the summer months, it would not occur every time the individual visits the affected area. Hence, EPA should reduce the exposed skin surface areas to take account of the lesser areas exposed during the spring and fall, and thus develop time-weighted averages for the exposed skin surface areas.
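A time-weighted average of this kind is straightforward to compute; the seasonal surface areas and event counts below are purely hypothetical placeholders, not values from the Work Plan:

```python
# Hypothetical seasonal exposure: full warm-weather skin surface area
# (head, hands, forearms, lower legs, feet) in summer, and a reduced
# area (head and hands only, under heavier clothing) in spring/fall.
summer_events, summer_area_cm2 = 60, 5700
spring_fall_events, spring_fall_area_cm2 = 40, 2900

total_events = summer_events + spring_fall_events
time_weighted_area_cm2 = (
    summer_events * summer_area_cm2
    + spring_fall_events * spring_fall_area_cm2
) / total_events
# time_weighted_area_cm2 lies between the two seasonal values
```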

c. Soil ingestion rates for recreational scenario

For recreational exposures (both high-contact and low-contact), EPA has used daily soil ingestion rates of 200 mg/day for small children and 100 mg/day for adults (Table 6.2-2). GE does not agree with these ingestion rates, for the reasons given in its June 3, 1999 comments (pp. 33-34). However, if these daily soil ingestion rates are to be used, EPA should at least apply some factor to account for the fraction of the daily soil ingestion that is derived from the affected recreational area, as provided in the Agency's Risk Assessment Guidance for Superfund (EPA, 1989). Under real-life conditions, total daily soil ingestion will occur over the entire day and will include soil and/or sediments from a variety of different locations, some of which will not be contaminated. However, EPA has assumed that total daily soil ingestion will come from a single contaminated area and has not included a factor for the fraction of soil ingested from the site. While it is presumed that this omission is due to the fact that EPA is concerned that some individuals may occasionally obtain all of their ingested soil from a single contaminated location, such an assumption is not reasonable when coupled with the exposure frequencies that are assumed. In addition, some highly accessible areas may be very limited in size, as is the case on some floodplain properties. Because of this, even if all of the soil is obtained from an exposure area, only a portion of that soil will be contaminated.


Moreover, EPA's assumption is excessively conservative when one considers that the SRBCs for soil and sediment are separately derived, and that each SRBC assumes that 100 percent of the soil or sediment ingested daily comes from the site. In reality, individuals who are engaged in river-based activities will ingest a combination of soil and sediment during the course of their activity. Given these considerations, GE believes that a factor of 50 percent should be used to represent the fraction of ingested soil or sediment that is derived from contaminated recreational areas during recreational activities. This will correct for the fact that daily ingestion exposure occurs with a combination of soil and sediment. It will also correct for the fact that only limited portions of the day will be spent in recreational areas. In addition, such a factor will more accurately reflect exposures at those areas where the area of contamination is very limited. This approach is consistent with EPA's approach in its development of the cleanup goals under the October 7, 1999 Consent Decree for recreational areas outside the River (EPA, 1999a).

d. Soil ingestion rate for utility worker scenario

EPA has used a soil ingestion rate of 200 mg/day to develop the SRBC for the utility worker scenario because of likely heavy exposure to soils (p. 6-17). While an enhanced soil ingestion rate may be appropriate for utility workers, who will have direct contact with soil, the ingestion rate of 200 mg/day is too high. GE previously proposed use of an enhanced soil ingestion rate of 137 mg/day for farmers at the Housatonic River site, who are likewise expected to have heavy contact with soil (GE, 1997), and EPA approved that proposal (EPA and MDEP, 1998). EPA itself used that soil ingestion rate in deriving a cleanup level for the utility worker scenario for Areas Outside the River under the Consent Decree (EPA, 1999a). This ingestion rate was based on an assumed upper-bound soil adherence rate of 1 mg/cm2 for the hands. In recent draft dermal risk assessment guidance, however, EPA (1999b) recommends soil adherence factors for utility workers ranging from a central-tendency estimate of 0.2 mg/cm2 to an upper-bound estimate of 0.8 mg/cm2. Use of that upper-bound adherence factor of 0.8 mg/cm2 would result in an enhanced soil ingestion rate for utility workers of 110 mg/day. GE submits that this enhanced soil ingestion rate should be used in developing a revised SRBC for the utility worker scenario.
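The 110 mg/day figure follows from scaling the previously approved rate by the ratio of the adherence factors, a sketch of the arithmetic as we read it:

```python
# Enhanced soil ingestion rate previously approved for farmers,
# derived from an upper-bound hand adherence factor of 1.0 mg/cm2
base_rate_mg_per_day = 137.0
base_adherence_mg_per_cm2 = 1.0

# Revised upper-bound adherence factor for utility workers from
# EPA's draft dermal guidance
revised_adherence_mg_per_cm2 = 0.8

revised_rate = base_rate_mg_per_day * (
    revised_adherence_mg_per_cm2 / base_adherence_mg_per_cm2
)
rounded_rate = round(revised_rate)  # about 110 mg/day
```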


B. Comments on Phase 2 - Baseline HHRA Approach

For those properties or other exposure points that remain after the Phase 1 screening, the Work Plan outlines the approach that EPA plans to use in conducting a baseline HHRA. GE has a number of comments on that proposed approach.

1. Selection of non-PCB contaminants of potential concern

The Work Plan outlines a two-step approach for selecting non-PCB contaminants of potential concern (COPCs) to be included in the baseline HHRA (Section 6.3.2.3). In the first step, for each area that has PCB concentrations above the applicable SRBC, the available sampling data on other constituents will be compared with several criteria to determine the need for additional sampling in that area. These criteria include: (a) background concentrations established for the area; (b) EPA's Preliminary Remediation Goal (PRG) for dioxins/furans in residential areas, which is a 1 ppb Toxicity Equivalent (TEQ) concentration; and (c) the Massachusetts Department of Environmental Protection's (MDEP's) Method 1 soil standards set out in the Massachusetts Contingency Plan (MCP) (see Work Plan, p. 6-30). If these or other (unspecified) criteria are exceeded, EPA will conduct additional characterization sampling (or other evaluation) in the area. If they are not exceeded, the Work Plan states that the baseline HHRA will be performed using the available analytical data (p. 6-31). The latter statement appears to mean that even if the concentrations of non-PCB constituents do not exceed the criteria for further sampling (i.e., are below background concentrations, the dioxin/furan PRG, or the Method 1 soil standards), those constituents will nevertheless be carried through to the second step in the selection of COPCs.

In the second step, all constituents detected will be screened for the selection of COPCs by comparing the maximum detected concentrations with contaminant-specific and medium-specific risk-based concentrations. The Work Plan states that the principal criterion for selection of a COPC will be an exceedance of the medium-specific PRG by the maximum concentration of the constituent (p. 6-31). Although the Work Plan does not give the source of the PRGs to be used, it states that a target cancer risk of 1 x 10^-6 and a target non-cancer hazard quotient (HQ) of 0.1 will be used. Any constituent for which the maximum concentration exceeds the PRG will be carried forward for the risk assessment. In addition, the Work Plan states that constituents which are known to bioaccumulate in biological tissues, but do not exceed the PRGs, will be further evaluated for possible inclusion in the risk assessment.
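As we read it, the second screening step reduces to a maximum-versus-PRG comparison, sketched below with hypothetical constituents and values (not actual Region 9 PRGs):

```python
def select_copcs(max_detects_ppm, prgs_ppm):
    """Flag as COPCs those constituents whose maximum detected
    concentration exceeds the medium-specific PRG."""
    return sorted(
        name for name, cmax in max_detects_ppm.items()
        if name in prgs_ppm and cmax > prgs_ppm[name]
    )

# Hypothetical maximum detected concentrations and PRGs (ppm)
max_detects = {"constituent_a": 0.4, "constituent_b": 22.0, "constituent_c": 3.0}
prgs = {"constituent_a": 0.6, "constituent_b": 0.4, "constituent_c": 30.0}

copcs = select_copcs(max_detects, prgs)  # only constituent_b exceeds its PRG
```

Because the comparison is keyed to the maximum detect rather than an average, it tends to over-select COPCs relative to an average-based comparison.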

GE has several concerns about this proposed two-step approach.

First, GE believes that if the concentrations of non-PCB constituents in a given area do not exceed the criteria for further sampling (as listed above), those constituents should be dropped from further consideration and not carried through to the second step in selecting COPCs. As EPA has recognized in the Statement of Work for Removal Actions Outside the River (SOW), which is Appendix E to the Consent Decree, concentrations of non-PCB constituents that are consistent with local background levels, or are less than the 1 ppb TEQ concentration for dioxins/furans, or are below the applicable Method 1 soil standards, are considered to pose no significant risk to human health. Hence, such constituents need not, and should not, be considered further in the selection of COPCs.

Second, for those constituents that are carried forward to the second step in the COPC selection process, EPA's proposed primary approach for selecting COPCs (i.e., comparison of maximum concentrations to PRGs) is overly restrictive. In recent discussions with GE, EPA explained that it is currently planning to use the EPA Region 9 PRGs for this purpose, but to reduce the non-cancer-based PRGs by 10 times, due to the fact that those PRGs were based on a non-cancer HQ of 1.0, not 0.1. This approach will likely lead to the selection of a long list of COPCs, many or most of which in fact make no significant contribution to actual risks. This would undercut the purpose of the screening approach and will greatly and needlessly complicate the risk assessment, without any corresponding benefit in terms of achieving additional health protection. Instead, GE recommends that, in selecting the COPCs for the Rest of the River, EPA should use an approach similar to the approach approved by EPA and specified in the SOW for Areas Outside the River, including the non-bank portions of certain floodplain properties. This approach utilizes not only the Region 9 PRGs, but also the other criteria listed on page 6-30 of the Work Plan, as discussed below.

1 The Work Plan also states that, in addition to these specified criteria, EPA will also use other applicable criteria in the first step of this process, but it does not say what those additional criteria are. EPA should specify those criteria.

For dioxins and furans, EPA should use the PRGs for dioxin/furan TEQs that are prescribed in recent guidance from EPA headquarters for use in addressing these constituents in soil at CERCLA and RCRA sites (EPA, 1998). Those PRGs are 1 ppb for dioxin/furan TEQs in residential areas (also applicable to recreational areas) and 5-20 ppb for dioxin/furan TEQs in commercial/industrial areas. The SOW provides specifically for use of these PRGs in screening out dioxin/furan compounds in soil in Areas Outside the River. EPA should likewise use these PRGs in selecting COPCs for the Rest of the River. (Indeed, as discussed below, GE believes that these PRGs should be used not only as screening criteria but also as remedial goals, which is consistent with the approach embodied in the SOW for Areas Outside the River. See Section B.2 below.)

For other constituents, EPA should follow a step-wise approach that begins with comparisons to the Region 9 PRGs. Since these PRGs are highly conservative, and are specified in the SOW for use in screening such constituents without further reduction, they should not be reduced by an additional 10 times. For constituents for which such PRGs do not exist, a similar screening step should be employed using PRGs for surrogate compounds.2

EPA should then apply the other criteria listed on page 6-30 of the Work Plan (i.e., comparison to background levels and MCP Method 1 standards). Comparison of concentrations of non-PCB constituents to site-specific background levels (e.g., concentrations upstream of the GE facility) is appropriate to ensure that only those constituents attributable to releases from the GE facility are included in the risk assessment. This step is allowed in EPA's Risk Assessment Guidance for Superfund (EPA, 1989) and has been accepted by EPA as a basis for selecting non-PCB COPCs for both soils and sediments in the Upper ½-Mile Reach of the Housatonic River (as stated in the Upper ½-Mile Reach Work Plan) and for Areas Outside the River (as provided in the SOW). It is also consistent with the MCP, which provides that if the concentration of a constituent is at or below background levels, then that constituent will be considered to pose No Significant Risk (310 CMR 40.0902(3)). Similarly, comparison to the MCP Method 1 soil standards is appropriate because those standards have been developed as conservative risk-based standards (using a target cancer risk of 1 x 10^-6) that will ensure protection of human health and the environment. Again, EPA has already accepted comparisons to those standards as a basis for selecting non-PCB COPCs in soil for the Upper ½-Mile Reach of the River and Areas Outside the River. Moreover, in its Engineering Evaluation/Cost Analysis (EE/CA) for the 1½-Mile Reach of the River, EPA itself has used the MCP Method 1 soil standards, as well as background levels, as a basis for evaluating non-PCB constituents in bank soils (Weston, 2000b). Thus, use of comparisons to background levels and MCP Method 1 standards in selecting COPCs for the Rest of the River is justified by considerable precedent and will ensure that a consistent approach is employed to address such constituents throughout the GE-Pittsfield/Housatonic River Site. EPA has presented no sound basis for departing from that approach for the Rest of the River portion of the Site.

2 Generally, when toxicity criteria have not been derived for particular constituents, it is because there are no reliable toxicity data on which to base them. If such constituents are not screened, but are simply carried through to the risk assessment as COPCs, that would unnecessarily complicate the risk assessment and would result in largely meaningless or highly uncertain estimates, given the lack of reliable toxicity data and criteria. Hence, EPA should screen these compounds using PRGs for surrogate compounds, as described in GE's June 3, 1999 comments (p. 6).

Moreover, use solely of the maximum detected concentrations in selecting COPCs from these constituents is overly conservative and inappropriate. For example, comparisons to site-specific background levels should be made using an appropriate statistical technique or summary statistics, as described in MDEP (1995) guidance, while comparisons to the MCP Method 1 standards are generally to be made based on average exposure point concentrations (310 CMR 40.0926(3)). Again, these approaches have been agreed upon by EPA as fully protective for Areas Outside the River, as set forth in the SOW, and should likewise be used for the Rest of the River to eliminate constituents that are not likely to contribute to Site-related risks.
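The kind of summary-statistic background comparison described above can be sketched as follows. All concentration values here are invented for illustration, and the screening rule (site mean versus background mean plus two standard deviations) is only one of several summary statistics that MDEP (1995) guidance would allow.

```python
import statistics

# Hypothetical example: screening a non-PCB constituent against site-specific
# background. Concentration values (mg/kg) are invented for illustration only.
background = [4.2, 5.1, 3.8, 6.0, 4.9, 5.5, 4.4, 5.0]   # upstream samples
site       = [5.0, 4.6, 5.8, 4.1, 5.3, 4.8, 5.6, 4.5]   # exposure-area samples

# One simple summary-statistic screen: compare the site mean to an upper
# bound on background (here, mean + 2 standard deviations).
background_upper = statistics.mean(background) + 2 * statistics.pstdev(background)
site_mean = statistics.mean(site)

if site_mean <= background_upper:
    print("Constituent is consistent with background; candidate for screening out")
else:
    print("Constituent exceeds background; retain as COPC")
```

A formal application would use a statistical test appropriate to the data distribution rather than this simple bound, but the structure of the screen is the same.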

Finally, the Work Plan does not explain how the potential for bioaccumulation (discussed on p. 6-31) will be considered or what criteria will be used to examine this. Unless the approach used


to identify and select such COPCs is clearly defined, it could result in the unnecessary inclusion of certain constituents. EPA should provide supplemental information that outlines the specific approach and criteria that will be used to identify and select these constituents for inclusion.

2. Risk assessment of dioxins/furans

The Work Plan states that EPA intends to evaluate the potential risks associated with dioxins and furans by converting them to TEQs (toxic equivalents of 2,3,7,8-TCDD) using Toxicity Equivalency Factors (TEFs), and then applying the provisional cancer slope factor (CSF) of 150,000 (mg/kg-day)⁻¹ listed on HEAST for 2,3,7,8-TCDD (Section 6.3.3.2.4). As discussed in detail in GE's June 3, 1999 comments (pp. 19-22), the TEF approach has substantial limitations, and the provisional CSF of 150,000 (mg/kg-day)⁻¹ is uncertain and unsupportable. A wide range of CSFs, spanning from 9,000 to 156,000 (mg/kg-day)⁻¹, have been proposed for TCDD under the linear nonthreshold cancer model, with differences resulting from the selection of the tumor classification scheme, the interspecies scaling factor used, and the model used for low-dose extrapolation. Thus, there is enormous uncertainty associated with any selected CSF.

In these circumstances, GE believes that, to the extent possible in this risk assessment, EPA should evaluate dioxins and furans through application of the dioxin/furan TEQ PRGs set forth in EPA (1998) guidance and discussed above, using those PRGs as remedial goals as well as screening levels, rather than conducting forward risk calculations for these compounds. This approach would avoid the uncertainties associated with selecting a particular CSF for TCDD and is consistent with the approach adopted by EPA for Areas Outside the River, as described in the SOW.

However, to the extent that EPA includes dioxins and furans in forward risk calculations, GE believes that EPA should use a CSF of 30,000 (mg/kg-day)⁻¹. As shown in our June 3, 1999 comments, based on current information, that CSF is more supportable than the CSF of 150,000 (mg/kg-day)⁻¹ specified in the Work Plan.
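The arithmetic at issue can be made concrete with a minimal sketch. The congener concentrations and the intake factor below are hypothetical placeholders (not site data); the TEFs follow the WHO-style scheme, and the three CSFs bracket the range discussed above. The point is that the risk estimate scales linearly with whichever CSF is selected, which is why the choice matters so much.

```python
# Illustrative sketch (hypothetical concentrations) of how the choice of CSF for
# 2,3,7,8-TCDD propagates directly into the risk estimate for a dioxin/furan TEQ.

tefs = {"2,3,7,8-TCDD": 1.0, "1,2,3,7,8-PeCDD": 1.0, "2,3,7,8-TCDF": 0.1}
concentrations_mg_kg = {"2,3,7,8-TCDD": 1e-6, "1,2,3,7,8-PeCDD": 2e-6, "2,3,7,8-TCDF": 5e-6}

# TEQ = sum of (congener concentration x TEF)
teq = sum(concentrations_mg_kg[c] * tefs[c] for c in tefs)

intake_factor = 1e-6          # hypothetical intake term (kg soil/kg body weight-day)
dose = teq * intake_factor    # mg/kg-day, expressed as 2,3,7,8-TCDD equivalents

for csf in (9_000, 30_000, 150_000):   # (mg/kg-day)^-1, the range discussed in text
    print(f"CSF {csf:>7,}: risk = {dose * csf:.1e}")
```

A factor-of-five difference in CSF (30,000 versus 150,000) produces exactly a factor-of-five difference in the computed risk, with no change in the underlying data.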


3. Use of TEF/TEQ approach for dioxin-like PCBs

In the Work Plan, EPA also proposes to use the TEF/TEQ approach for estimating risks of dioxin-like PCB congeners (Section 6.3.3.2.5). Under this approach, the concentrations of these PCB congeners are converted to toxic equivalents (TEQs) of 2,3,7,8-TCDD using TEFs, and the provisional CSF of 150,000 (mg/kg-day)⁻¹ for 2,3,7,8-TCDD is then applied to estimate the risks. In its June 3, 1999 comments, GE presented several reasons why this approach is not appropriate or scientifically defensible. In discussions with GE in November 1999, EPA stated that, since it will make risk management decisions on the basis of total PCB concentrations rather than congener-specific data, it intends to use this TEF/TEQ approach for PCBs only if it finds a good correlation between the concentrations of the dioxin-like congeners and concentrations of total PCBs. There is no indication in the Work Plan, however, that EPA has established a good correlation between the congener-specific data and the total PCB data. Indeed, in more recent discussions, EPA has indicated that it has not to date evaluated this issue. If and when EPA makes such an analysis, GE requests the opportunity to review the analysis and discuss it with EPA. The absence of such a correlation would itself make the TEQ data irrelevant to risk management decisions.

In any event, GE continues to believe, for the same reasons given in its June 3, 1999 comments (pp. 23-25), that application of the TEF/TEQ approach to PCBs is not scientifically supportable. These reasons include: (a) the lack of sufficient congener-specific data to provide reliable estimates of true congener concentrations; (b) the double counting of carcinogenic risks that would result from use of both the TCDD CSF and the PCB CSF in the same assessment; and (c) the uncertainties and unsupportability of the TEFs for PCB congeners and of the CSF (for 2,3,7,8-TCDD) to be used to evaluate the resulting TEQs.

4. Failure to consider alternative CSFs and RfDs for PCBs

The Work Plan states that EPA intends to use the upper-bound CSF for PCBs of 2 (mg/kg-day)⁻¹ and the Reference Dose (RfD) for Aroclor 1254 of 2 x 10⁻⁵ mg/kg-day, as published in its Integrated Risk Information System (IRIS) (Sections 6.3.3.2.3 and 6.3.3.3.2). In its June 3, 1999 comments (pp. 16-19 and Appendix A), GE presented substantial reasons why these toxicity values


overestimate the carcinogenic potential and non-carcinogenic impacts of PCBs, and it recommended the development of alternative toxicity values for PCBs. EPA has made no reference to this presentation in its Work Plan.

This failure to consider the alternative toxicological information presented by GE is unlawful. Under a settlement agreement with GE (Settlement Agreement in General Electric Company v. Browner, No. 93-1251, D.C. Cir., October 25, 1993) and a subsequent EPA guidance memorandum (EPA, 1993), if an outside party questions the use of IRIS values during the course of an EPA risk assessment and presents alternative toxicological information that may be used in place of the IRIS values, EPA has an obligation to consider all credible and relevant evidence before it. There is no indication that EPA has considered the alternative toxicological information presented by GE. Rather, it has ignored GE's showing and chosen simply to defer to the current IRIS values (which is precisely the course of conduct which the settlement and guidance prohibit). EPA should rectify this deficiency by now giving careful consideration to the alternative toxicological information presented by GE.

5. Use of subchronic RfD for PCBs

The Work Plan states that EPA will consider using the subchronic RfD for PCBs only if it determines that exposure is intermittent enough to result in an exposure period that is substantially reduced (p. 6-42). Later, on page 6-43, the Work Plan states that this will occur when the total exposure period is substantially below the 7-year period that is considered by EPA to represent chronic exposure. There is no further discussion of this issue, so it is not clear what the Agency will use as a benchmark to identify intermittent exposure.

Many of the proposed exposure scenarios are likely to be extremely intermittent in nature. For example, in developing the SRBC for utility workers, EPA assumes that exposure occurs 5 days/year for a period of 25 years. Clearly, 5 days/year is intermittent exposure. In fact, even when considered over the entire 25-year exposure period, the total exposure consists of 125 days, which amounts to about one-third of a year, or less than 5 percent of the 7-year chronic exposure period. This exposure would certainly be considered intermittent, despite the fact that the total exposure period lasts for 25 years.


As a second example, in developing its SRBCs for recreational exposures, EPA has assumed that children aged 1-6 are exposed for six years during recreational activities. Even using the highly conservative exposure frequency for high-contact areas of 3 days/week for a period of 7 months per year (a total of 84 days/year, or 504 days for the entire exposure period), the number of days exposed is equivalent to about 1.4 years, or less than 20 percent of the available days during the 7-year chronic exposure period. Nevertheless, as stated on page 6-42, EPA does not intend to use the subchronic RfD to evaluate noncancer hazards to young children.
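The arithmetic behind these two examples can be verified directly. The exposure frequencies and durations below are the ones stated for the SRBC scenarios in the text; only the script itself is new:

```python
# Arithmetic behind the two intermittency examples: total exposure days as a
# fraction of EPA's 7-year chronic exposure benchmark.

CHRONIC_PERIOD_DAYS = 7 * 365          # 7-year chronic exposure period

# Utility worker: 5 days/year for 25 years
utility_days = 5 * 25                                  # 125 days total
utility_fraction = utility_days / CHRONIC_PERIOD_DAYS  # under 5 percent
print(f"Utility worker: {utility_days} days = {utility_days / 365:.2f} years "
      f"({utility_fraction:.1%} of the 7-year period)")

# Young child recreation: 84 days/year (3 days/week, 7 months/year) for 6 years
child_days = 84 * 6                                    # 504 days total
child_fraction = child_days / CHRONIC_PERIOD_DAYS      # under 20 percent
print(f"Child recreator: {child_days} days = {child_days / 365:.2f} years "
      f"({child_fraction:.1%} of the 7-year period)")
```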

In short, the Work Plan states that EPA will use the subchronic RfD when exposures during a given scenario are sufficiently intermittent to result in a total exposure that is far less than the 7-year chronic exposure period identified by EPA. It is clear, however, that even for scenarios where this intermittent exposure is likely, EPA does not intend to incorporate the subchronic RfD. EPA should reconsider its position on the use of the subchronic RfD and incorporate it into those scenarios where it is relevant.

6. Inappropriate use of a single exposure scenario for all direct-contact recreational activities

The Work Plan provides that, while consumption of fish and game will be evaluated separately, all direct-contact recreational activities will be evaluated using a single exposure scenario and a single set of exposure assumptions (pp. 6-52 - 6-53). These activities include hiking, walking, canoeing, picnicking, swimming, wading, and dirt biking, as well as fishing and hunting (but not consumption of fish or game). In its June 3, 1999 comments (pp. 11-12, 29-30), GE explained that use of a single exposure scenario and set of assumptions for such diverse activities is inappropriate because it will force EPA either to use the most conservative assumptions for any of these activities (which would clearly result in grossly overestimated exposure estimates) or to pick compromise parameters that will not be truly representative of any of these activities (which would be subject to considerable uncertainty). Based on discussions with EPA in the fall of 1999, GE had understood that the Agency recognized this problem and intended to consider further whether and how to divide this single direct-contact recreational scenario into a number of more specific scenarios (e.g., hiking/walking, canoeing, picnicking, swimming/wading). However, the Work Plan does not do that. GE again urges EPA to develop such specific scenarios so as to take


account of the significant differences in exposure parameters among these types of activities. In doing so, actual recreational exposures will be more closely approximated, and more realistic and appropriate cleanup goals can be developed.

7. Lack of information about specific exposure assumptions to be used

As in the prior draft, the Work Plan does not present the specific exposure parameter values and equations to be used in EPA's risk assessment, but simply states that they will be developed based on EPA and MDEP guidance, along with site-specific data (Section 6.3.4.5). GE requested such information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has advised GE that the Agency's risk assessment staff will hold a series of technical meetings with GE representatives and consultants, and at such meetings will be prepared to discuss the most appropriate exposure assumptions and parameter values to be used. GE reiterates its request for such meetings and looks forward to discussing these issues with EPA.

8. Use of probabilistic risk analysis

In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that probabilistic approaches, such as Monte Carlo analysis, will be considered for those COPCs and exposure pathways that result in significant risk estimates. However, it does not mention probabilistic risk assessment techniques in connection with any of the main risk assessments. In prior discussions with GE, EPA indicated that one of the principal reasons why it collected a large amount of fish tissue PCB data is for use in a probabilistic risk assessment of fish consumption. As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic exposure model in the main risk analysis of the fish consumption pathway. However, the Work Plan presents no discussion of or plans for conducting such an assessment. In fact, EPA does not even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk assessment guidance documents and information sources that EPA intends to use in conducting the HHRA.

During the last few years, EPA's policies and guidelines have focused on improving risk management by presenting decision-makers with the entire range of possible risks, rather than a


single point estimate (EPA, 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that numerical risk assessments should be accompanied by a full characterization of the uncertainties, limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D) probabilistic analysis, such as Microexposure analysis, can more effectively characterize the impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed population by considering time-dependent changes.

Microexposure analysis has been used by EPA and by independent researchers to simulate duration of residential exposure (EPA, 1992c; Johnson and Capel, 1992; Sielken, 1994). It also has been used to evaluate childhood exposures to lead (Goodrum et al., 1994), exposure to contaminants in tap water (Harrington et al., 1995), and exposure to dioxins from the consumption of freshwater fish (Keenan et al., 1993a,b; 1995; 1996a; 1997a,b). Microexposure analysis was employed in the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon Group, 1994; Keenan et al., 1996b). In addition, Microexposure analysis has been described in EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A (External Review Draft) as a viable alternative for modeling time-dependent variability in concentrations, daily activity patterns, and other behavioral exposure factors (EPA, 1999c).

Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of probabilistic risk analysis through application to the fish ingestion pathway at the Sangamo Weston/Lake Hartwell Superfund site (Simon, 1999). Simon (1999) found that the Microexposure event model gave risk estimates that were approximately an order of magnitude lower than those derived using deterministic and other probabilistic approaches. He attributed this decrease to the power of the method to incorporate changes over time.

GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model, to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in Appendix E of its draft guidance for conducting probabilistic risk assessments (EPA, 1999c), some types of exposures, including fish consumption, can be expected to vary over time. Anglers may change their activity levels and their fish consumption behaviors as they age. In addition, different


species and sizes of fish may be consumed from a variety of locations on the river over time, and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates, and will provide information that will allow better informed risk management decisions to be made (EPA, 1999c).
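The structure of such a 2-D analysis can be sketched in a few lines. Everything below is a deliberately simplified placeholder (the distributions, parameter values, and loop sizes are invented, not EPA's or GE's inputs); the point is the nesting: an outer loop over uncertain parameters, and an inner loop over variable anglers whose behavior and fish tissue concentrations change year to year.

```python
import random

def simulate_angler(rng, log_mean_conc):
    """Lifetime average daily dose (mg/kg-day) for one angler, built year by year
    so that behavior and fish tissue concentration can change over time."""
    total_intake = 0.0
    years = rng.randint(5, 30)                         # angling duration varies
    for _ in range(years):
        meals_per_year = rng.randint(5, 40)            # behavior changes over time
        conc = rng.lognormvariate(log_mean_conc, 0.5)  # fish PCB conc., mg/kg
        meal_size_kg = 0.227
        body_weight_kg = 70.0
        total_intake += meals_per_year * meal_size_kg * conc / body_weight_kg
    return total_intake / (70 * 365)                   # averaged over a 70-yr lifetime

rng = random.Random(42)
uncertainty_results = []
for _ in range(200):                                   # outer loop: uncertainty
    mu = rng.uniform(-1.0, 0.0)                        # uncertain log-mean concentration
    doses = sorted(simulate_angler(rng, mu) for _ in range(500))  # inner: variability
    uncertainty_results.append(doses[len(doses) // 2])            # median angler

uncertainty_results.sort()
print(f"Median-angler dose, 5th-95th percentile of uncertainty: "
      f"{uncertainty_results[10]:.2e} to {uncertainty_results[189]:.2e} mg/kg-day")
```

The output separates variability (the spread across anglers within one inner loop) from uncertainty (the spread across outer-loop iterations), which is the information a point estimate cannot provide.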

9. Use of 95% UCLs or maximum values as exposure point concentrations

The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% UCL of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).

GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment, and demonstrated by EPA contractors and consultants (EPA, 1997c; Schulz and Griffin, 1999), use of the H-statistic to derive a 95% UCL can, and frequently does, produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.

Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½ Mile Reach and for Areas Outside the River: a spatial averaging approach. That approach will (a) allow calculation of EPCs that are much more representative of actual exposures than use of EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper ½ Mile Reach and Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling, if necessary, to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.


However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A, and demonstrated by Singh et al. (EPA, 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.
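The basic bootstrap machinery underlying this recommendation is simple to illustrate. The concentrations below are invented, and this sketch is non-spatial: a true spatial bootstrap would additionally weight each sample by the area it represents (e.g., resampling within grid cells or polygons) rather than treating all samples as equally representative.

```python
import random
import statistics

# Hypothetical skewed exposure-area dataset (mg/kg)
concentrations = [0.3, 0.5, 0.8, 1.1, 1.6, 2.0, 2.4, 3.9, 7.2, 25.0]

rng = random.Random(0)
boot_means = []
for _ in range(5000):
    # Resample the dataset with replacement and record the resample's mean
    resample = [rng.choice(concentrations) for _ in concentrations]
    boot_means.append(statistics.mean(resample))

boot_means.sort()
ucl95 = boot_means[int(0.95 * len(boot_means))]   # 95th percentile of bootstrap means
print(f"Sample mean = {statistics.mean(concentrations):.2f} mg/kg")
print(f"Bootstrap 95% UCL on the mean = {ucl95:.2f} mg/kg")
```

Because the bootstrap resamples the observed data directly, it requires no distributional assumption, which is why it remains usable where neither the normal nor the lognormal model fits.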

References

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A), Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1990. National Contingency Plan. U.S. Environmental Protection Agency, Washington, DC.

EPA. 1992a. Final Guidelines for Exposure Assessment; Notice. U.S. Environmental Protection Agency, Washington, DC. 57 Federal Register 22888-22938. May 29.

EPA. 1992b. Guidance on Risk Characterization for Risk Managers and Risk Assessors. Memorandum from F. Henry Habicht II, Deputy Administrator, to Assistant Administrators and Regional Administrators. Office of the Administrator, Washington, DC.


EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

EPA. 1993. Memorandum from W.H. Farland and H.L. Longest to Regional Directors re: Use of IRIS values in Superfund risk assessments. U.S. Environmental Protection Agency, Washington, DC. OSWER Directive 9285.7-16. December 21.

EPA. 1995a. Policy for Risk Characterization at the U.S. Environmental Protection Agency. U.S. Environmental Protection Agency, Office of the Administrator, Washington, DC. March.

EPA. 1995b. Guidance for Risk Characterization. U.S. Environmental Protection Agency, Science Policy Council, Washington, DC. February.

EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the U.S. Environmental Protection Agency. U.S. Environmental Protection Agency, Office of Research and Development, Washington, DC. May 15.

EPA. 1997b. Guiding Principles for Monte Carlo Analysis. U.S. Environmental Protection Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 p.

EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A.K. Singh, and M. Engelhardt. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1998. Memorandum from Timothy Fields, Acting Administrator, to Regional Directors re: Approach for addressing dioxin in soil at CERCLA and RCRA sites. U.S. Environmental Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.


EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, U.S. EPA, to Jane Magee, General Electric Company, re: Pittsfield 1-0147, EPA Area 6 Housatonic River Human Health Risk Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader, re: Protectiveness of Cleanup Levels for Removal Actions Outside the River - Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, U.S. Dist. Court for Dist. of Mass.). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual, Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2. Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3, Part A: Process for Conducting Probabilistic Risk Assessment. Draft, Revision 5. U.S. Environmental Protection Agency, Office of Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the U.S. Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition. December 4-7, Baltimore, MD.


Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA4203. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.

Johnson, T. and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA, Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds, Proceedings, Dioxin '96: 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen 30:61-65.


Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC. October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meetings, Proceedings. Abstract 1.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition. (Abstract)

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan, Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19. November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD. December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4): 823-843.


Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 11.

ATTACHMENT A

A COMPARISON OF ALTERNATIVE METHODS

FOR CALCULATING EXPOSURE POINT CONCENTRATIONS

Prepared on Behalf of the General Electric Company

by Ogden Environmental and Energy Services Co Inc


Introduction

The U.S. Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston, 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets where no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and their representation of the actual levels of exposures that are likely to occur under various scenarios.

There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that, when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and


the maximum value are poor estimates of actual levels of exposure that will occur within a given exposure area.

GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will result in more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic: namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.

Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL

In 1992, EPA issued a guidance document (EPA, 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC, regardless of the statistical distribution of observed sample concentrations.

Ogden Environmental and Energy Services

The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region 1 (EPA, 1994). In fact, EPA Region 1 has taken the position that the 95% UCL is appropriate for both the central-tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA, 1992):

UCL = exp(x̄ + 0.5s² + Hs/√(n - 1))

where:

UCL = a specified limit (i.e., the 95% UCL) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic accounting for the interaction of the distribution, developed by Land (1975)
n = the number of analytical samples
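The calculation can be sketched as follows. Note that H must be looked up in Land's (1975) tables for the applicable sample size and log-space standard deviation; the sketch below treats it as a caller-supplied input rather than computing it, and the function name is illustrative.

```python
import math
import statistics

def h_ucl(data, H):
    """H-statistic 95% UCL on the arithmetic mean (EPA, 1992).

    H is Land's (1975) tabulated statistic for the given sample size and
    log-space standard deviation; it is supplied rather than computed here."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    xbar = statistics.mean(logs)   # mean of the log-transformed data
    s = statistics.stdev(logs)     # std. deviation of the log-transformed data
    return math.exp(xbar + 0.5 * s ** 2 + H * s / math.sqrt(n - 1))
```

Because s enters through exp(0.5s² + Hs/√(n - 1)), modest increases in log-space variance inflate the result multiplicatively, which is the behavior at issue in the discussion that follows.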

Frequently, the H-statistic produces extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.


Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.

Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA, 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling datasets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes orders of magnitude higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., n < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.

Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when the exponential estimator of the mean does not equal the typical calculation of the sum of all sample values divided by the number of samples.

Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and it does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA, 1999).
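The substitution rule described above amounts to capping the UCL at the maximum observed value; a minimal sketch (function name and structure are illustrative, not taken from EPA guidance):

```python
def select_epc(ucl_95, observations):
    """EPA's 1992 default EPC selection: if the computed 95% UCL exceeds
    the maximum observed concentration, substitute the maximum as the EPC."""
    return min(ucl_95, max(observations))
```

For example, select_epc(106.0, [0.022, 42.6]) returns 42.6: the maximum detected value becomes the EPC whenever the H-UCL overshoots it, however unrepresentative that maximum may be of the exposure area as a whole.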

In view of these considerations, EPA contractors and consultants have recommended that the H-statistic not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA, 1997a) recommended that "in environmental applications the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. . . . [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).


Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased toward more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the Lower Housatonic River; consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected to be contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL, because it is unlikely to be representative of the overall exposure area.

This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, unbiased estimates of how concentration and variance change with distance are developed, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging; Isaaks and Srivastava, 1989).

Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest of the River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.

The area inside each polygon is then used to weight the samples in statistical calculations, which in essence says that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

x̄_sa = Σ p_i C_i, summed over i = 1 through n

where:

x̄_sa = the area-weighted (spatially averaged) mean concentration (mg/kg)
C_i = the concentration representing the conditions within polygon i, where there are i = 1 through n polygons
p_i = the proportion of the total area that is incorporated in polygon i (unitless)
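The formula reduces to a weighted sum of the polygon concentrations, with the weights given by the polygon area fractions; a minimal sketch (names are illustrative):

```python
def area_weighted_mean(concentrations, areas):
    """Spatially averaged mean: each sample's concentration is weighted by
    the fraction of the total exposure area that its Thiessen polygon covers."""
    total = float(sum(areas))
    return sum(c * (a / total) for c, a in zip(concentrations, areas))
```

For example, with three polygons covering 50, 30, and 20 percent of a property at 1.0, 2.0, and 10.0 ppm, the spatial average is 0.5 + 0.6 + 2.0 = 3.1 ppm, however unevenly the sampling was distributed.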

This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid cell using the sampling data available within it. The average concentrations for the grid cells were then averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the

[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. . . . Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods, such as kriging or triangulation, that take into account the spatial distribution of contamination to characterize exposure. (p. 15)

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and it is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would be necessary only in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support such spatial averaging, there are alternative methods of calculating the 95% UCL that would result in more reliable and representative EPCs than the H-statistic. As discussed previously, and as demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative, superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or a lognormal distribution. This technique is described below and is recommended as an alternative, preferable method to the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random, and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
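The procedure described above can be sketched as follows; the function name, the iteration count, and the fixed seed are illustrative choices, not prescribed values.

```python
import random
import statistics

def bootstrap_ucl(data, n_iterations=5000, seed=1):
    """Bootstrap 95% UCL on the arithmetic mean: repeatedly resample the
    dataset with replacement, record each resample's mean, and take the
    95th percentile of the resulting distribution of means."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(
        statistics.mean(rng.choices(data, k=n)) for _ in range(n_iterations)
    )
    # The 95th percentile of the bootstrapped means is the 95% UCL.
    return means[int(0.95 * n_iterations) - 1]
```

Because each resample is the same size as the original dataset and no distributional form is assumed, the method applies equally to normal, lognormal, and irregular datasets.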

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and it is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter; as such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of the samples collected on each property. This modified bootstrapping approach (spatial bootstrapping), which incorporates area-weighting into the bootstrap method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.


Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps:

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded, normalized areas by a selected value; the resulting values represent the area-weighted number of replicates of each sample result in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
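The three steps can be sketched as follows. The expansion factor of 5 and subsample size of 250 follow the example values given above; the function name and fixed seed are illustrative.

```python
import random
import statistics

def spatial_bootstrap_ucl(concentrations, polygon_areas,
                          expansion_factor=5, subsample_size=250,
                          n_iterations=5000, seed=1):
    """Spatial bootstrap 95% UCL, following the three steps above:
    (1) weight each sample by its Thiessen-polygon area;
    (2) build an expanded dataset in which each sample is replicated in
        proportion to its polygon area, normalized to the smallest polygon
        and multiplied by an expansion factor;
    (3) bootstrap means from the expanded dataset and take the 95th
        percentile of their distribution."""
    smallest = min(polygon_areas)
    expanded = []
    for conc, area in zip(concentrations, polygon_areas):
        replicates = round(area / smallest) * expansion_factor
        expanded.extend([conc] * replicates)
    rng = random.Random(seed)
    means = sorted(
        statistics.mean(rng.choices(expanded, k=subsample_size))
        for _ in range(n_iterations)
    )
    return means[int(0.95 * n_iterations) - 1]
```

Samples with large polygons thus appear more often in the expanded dataset, so each bootstrap iteration draws them with probability proportional to the area they represent.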


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is preferable to the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples in each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data; thus, it can be applied successfully to all sampling datasets, including those that fit neither a normal nor a lognormal distribution.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: (1) EPA's approach (arithmetic mean and H-statistic); (2) an area-weighted average; (3) a bootstrap; and (4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained from straight bootstrapping, which is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but it is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30; Table 1) have very large polygon areas; thus, they are weighted more heavily in the approaches that include area-weighting.

Estimates of the 95% UCL, by contrast, vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatially bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large (containing 30 samples) and the range of concentrations is fairly typical of those likely to be found on floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL (106 ppm) that exceeds the highest measured concentration by a factor of about 2.5, with the result that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly concentrated areas, and/or sample numbers are more limited. Given the tendency of the H-statistic to overestimate the 95% UCL, it is likely that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above, and as demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculate the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area when the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993), and it would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and for the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A), Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.


EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region 1, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV. Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A.K. Singh, and M. Engelhardt. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision, Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. US Environmental Protection Agency, Region 1. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River, Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.


Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft²)    Total PCB (ppm)
1                138,219               0.022
2                105,217               0.028
3                110,214               1.07
4                31,021                1.63
5                11,399                1.67
6                7,004                 2.01
7                14,268                2.19
8                12,121                3.25
9                3,697                 3.77
10               13,297                7.58
11               33,087                7.61
12               65,541                7.62
13               57,494                7.71
14               9,014                 7.99
15               9,994                 8.72
16               2,123                 9.69
17               2,341                 9.72
18               7,446                 12.7
19               2,613                 14.3
20               44,231                14.3
21               2,362                 15.4
22               8,745                 16.3
23               5,008                 18.7
24               18,762                19.1
25               24,296                20.2
26               51,356                30.8
27               12,996                32.9
28               18,019                38.9
29               119,814               40.6
30               132,012               42.6





Moreover, EPA's assumption is excessively conservative when one considers that the SRBCs for soil and sediment are separately derived and that each SRBC assumes that 100 percent of the soil or sediment ingested daily comes from the site. In reality, individuals who are engaged in river-based activities will ingest a combination of soil and sediment during the course of their activity. Given these considerations, GE believes that a factor of 50 percent should be used to represent the fraction of ingested soil or sediment that is derived from contaminated recreational areas during recreational activities. This will correct for the fact that daily ingestion exposure occurs with a combination of soil and sediment. It will also correct for the fact that only limited portions of the day will be spent in recreational areas. In addition, such a factor will more accurately reflect exposures at those areas where the area of contamination is very limited. This approach is consistent with EPA's approach in its development of the cleanup goals under the October 7, 1999 Consent Decree for recreational areas outside the River (EPA, 1999a).
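The effect of such a fractional-intake term on a screening level can be sketched in a few lines. This is an illustrative calculation only: the base SRBC value below is hypothetical, not a number from the Work Plan, and the function simply reflects that a screening concentration scales inversely with the assumed fraction of daily ingestion attributed to the site.

```python
# Sketch: effect of a fractional-intake (FI) term on a screening risk-based
# concentration (SRBC). The base SRBC value is a hypothetical placeholder.

def srbc_with_fraction(srbc_base_ppm, fraction_from_site):
    """An SRBC derived assuming 100% of ingested soil/sediment comes from
    the site scales up inversely when a smaller fraction is assumed."""
    return srbc_base_ppm / fraction_from_site

base = 10.0  # hypothetical SRBC (ppm) derived with FI = 1.0 (100 percent)
print(srbc_with_fraction(base, 1.0))  # EPA's assumption: 10.0
print(srbc_with_fraction(base, 0.5))  # GE's proposed 50 percent factor: 20.0
```

Under this sketch, halving the assumed site-derived fraction doubles the screening concentration, all other exposure parameters held constant.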

d. Soil ingestion rate for utility worker scenario

EPA has used a soil ingestion rate of 200 mg/day to develop the SRBC for the utility worker scenario because of likely heavy exposure to soils (p. 6-17). While an enhanced soil ingestion rate may be appropriate for utility workers who will have direct contact with soil, the ingestion rate of 200 mg/day is too high. GE previously proposed use of an enhanced soil ingestion rate of 137 mg/day for farmers at the Housatonic River site, who are likewise expected to have heavy contact with soil (GE, 1997), and EPA approved that proposal (EPA and MDEP, 1998). EPA itself used that soil ingestion rate in deriving a cleanup level for the utility worker scenario for Areas Outside the River under the Consent Decree (EPA, 1999a). This ingestion rate was based on an assumed upper-bound soil adherence rate of 1 mg/cm2 for the hands. In recent draft dermal risk assessment guidance, however, EPA (1999b) recommends soil adherence factors for utility workers ranging from a central-tendency estimate of 0.2 mg/cm2 to an upper-bound estimate of 0.8 mg/cm2. Use of that upper-bound adherence factor of 0.8 mg/cm2 would result in an enhanced soil ingestion rate for utility workers of 110 mg/day. GE submits that this enhanced soil ingestion rate should be used in developing a revised SRBC for the utility worker scenario.
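The arithmetic behind the proposed 110 mg/day rate appears to be a linear rescaling of the 137 mg/day rate by the ratio of adherence factors, which can be checked directly (a sketch of the implied calculation, not a formula stated in the Work Plan):

```python
# The 137 mg/day enhanced ingestion rate was premised on a 1 mg/cm2 hand
# adherence rate; scaling linearly to the 0.8 mg/cm2 upper-bound adherence
# factor from EPA's draft dermal guidance yields the proposed rate.

def scaled_ingestion_rate(base_rate_mg_day, base_adherence, new_adherence):
    """Scale a soil ingestion rate in proportion to the adherence factor."""
    return base_rate_mg_day * new_adherence / base_adherence

rate = scaled_ingestion_rate(137.0, 1.0, 0.8)
print(round(rate))  # 110 mg/day (rounded)
```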


B. Comments on Phase 2 - Baseline HHRA Approach

For those properties or other exposure points that remain after the Phase 1 screening, the Work Plan outlines the approach that EPA plans to use in conducting a baseline HHRA. GE has a number of comments on that proposed approach.

1. Selection of non-PCB contaminants of potential concern

The Work Plan outlines a two-step approach for selecting non-PCB contaminants of potential concern (COPCs) to be included in the baseline HHRA (Section 6.3.2.3). In the first step, for each area that has PCB concentrations above the applicable SRBC, the available sampling data on other constituents will be compared with several criteria to determine the need for additional sampling in that area. These criteria include: (a) background concentrations established for the area; (b) EPA's Preliminary Remediation Goal (PRG) for dioxins/furans in residential areas, which is a 1 ppb Toxicity Equivalent (TEQ) concentration; and (c) the Massachusetts Department of Environmental Protection's (MDEP's) Method 1 soil standards set out in the Massachusetts Contingency Plan (MCP) (see Work Plan, p. 6-30). If these or other (unspecified) criteria are exceeded, EPA will conduct additional characterization sampling (or other evaluation) in the area. If they are not exceeded, the Work Plan states that the baseline HHRA will be performed using the available analytical data (p. 6-31). The latter statement appears to mean that, even if the concentrations of non-PCB constituents do not exceed the criteria for further sampling (i.e., are below background concentrations, the dioxin/furan PRG, or the Method 1 soil standards), those constituents will nevertheless be carried through to the second step in the selection of COPCs.

In the second step, all constituents detected will be screened for the selection of COPCs by comparing the maximum detected concentrations with contaminant-specific and medium-specific risk-based concentrations. The Work Plan states that the principal criterion for selection of a COPC will be an exceedance of the medium-specific PRG by the maximum concentration of the constituent (p. 6-31). Although the Work Plan does not give the source of the PRGs to be used, it states that a target cancer risk of 1 x 10^-6 and a target non-cancer hazard quotient (HQ) of 0.1 will be used. Any constituent for which the maximum concentration exceeds the PRG will be carried forward for the risk assessment. In addition, the Work Plan states that constituents which are known to bioaccumulate in biological tissues but do not exceed the PRGs will be further evaluated for possible inclusion in the risk assessment.
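As GE reads it, the second-step screen reduces to a simple comparison, with the added wrinkle that non-cancer PRGs published at an HQ of 1.0 would be scaled down to the target HQ of 0.1. The sketch below illustrates that logic with hypothetical values; it is not code from, or endorsed by, the Work Plan.

```python
# Sketch of the Work Plan's second-step COPC screen as GE reads it:
# a constituent is retained if its maximum detected concentration exceeds
# the medium-specific PRG. Concentrations and PRGs here are hypothetical.

def is_copc(max_detect, prg, noncancer=False, target_hq=0.1):
    """Region 9 style non-cancer PRGs are published at HQ = 1.0; the Work
    Plan would scale them to a target HQ of 0.1 before comparing."""
    screen_level = prg * target_hq if noncancer else prg
    return max_detect > screen_level

print(is_copc(0.5, 1.0))                  # cancer-based PRG: not a COPC
print(is_copc(0.5, 1.0, noncancer=True))  # 10x-reduced PRG: retained
```

The example makes GE's point concrete: the same maximum concentration that passes the published PRG fails once the 10-fold reduction is applied.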

GE has several concerns about this proposed two-step approach.

First, GE believes that if the concentration of non-PCB constituents in a given area does not exceed the criteria for further sampling (as listed above), those constituents should be dropped from further consideration and not carried through to the second step in selecting COPCs. As EPA has recognized in the Statement of Work for Removal Actions Outside the River (SOW), which is Appendix E to the Consent Decree, concentrations of non-PCB constituents that are consistent with local background levels, or are less than the 1 ppb TEQ concentration for dioxins/furans, or are below the applicable Method 1 soil standards, are considered to pose no significant risk to human health. Hence, such constituents need not and should not be considered further in the selection of COPCs.

Second, for those constituents that are carried forward to the second step in the COPC selection process, EPA's proposed primary approach for selecting COPCs, i.e., comparison of maximum concentrations to PRGs, is overly restrictive. In recent discussions with GE, EPA explained that it is currently planning to use the EPA Region 9 PRGs for this purpose, but to reduce the non-cancer-based PRGs by 10 times due to the fact that those PRGs were based on a non-cancer HQ of 1.0, not 0.1. This approach will likely lead to the selection of a long list of COPCs, many or most of which in fact make no significant contribution to actual risks. This would undercut the purpose of the screening approach and will greatly and needlessly complicate the risk assessment without any corresponding benefit in terms of achieving additional health protection. Instead, GE recommends that, in selecting the COPCs for the Rest of the River, EPA should use an approach similar to the approach approved by EPA and specified in the SOW for Areas Outside the River, including the non-bank portions of certain floodplain properties. This approach utilizes not only the Region 9 PRGs but also the other criteria listed on page 6-30 of the Work Plan, as discussed below.

1 The Work Plan also states that, in addition to these specified criteria, EPA will also use other applicable criteria in the first step of this process, but it does not say what those additional criteria are. EPA should specify those criteria.

For dioxins and furans, EPA should use the PRGs for dioxin/furan TEQs that are prescribed in recent guidance from EPA headquarters for use in addressing these constituents in soil at CERCLA and RCRA sites (EPA, 1998). Those PRGs are 1 ppb for dioxin/furan TEQs in residential areas (also applicable to recreational areas) and 5-20 ppb for dioxin/furan TEQs in commercial/industrial areas. The SOW provides specifically for use of these PRGs in screening out dioxin/furan compounds in soil in Areas Outside the River. EPA should likewise use these PRGs in selecting COPCs for the Rest of the River. (Indeed, as discussed below, GE believes that these PRGs should be used not only as screening criteria but also as remedial goals, which is consistent with the approach embodied in the SOW for Areas Outside the River. See Section B.2 below.)

For other constituents, EPA should follow a step-wise approach that begins with comparisons to the Region 9 PRGs. Since these PRGs are highly conservative and are specified in the SOW for use in screening such constituents without further reduction, they should not be reduced by an additional 10 times. For constituents for which such PRGs do not exist, a similar screening step should be employed using PRGs for surrogate compounds.2

EPA should then apply the other criteria listed on page 6-30 of the Work Plan, i.e., comparison to background levels and MCP Method 1 standards. Comparison of concentrations of non-PCB constituents to site-specific background levels (e.g., concentrations upstream of the GE facility) is appropriate to ensure that only those constituents attributable to releases from the GE facility are included in the risk assessment. This step is allowed in EPA's Risk Assessment Guidance for Superfund (EPA, 1989) and has been accepted by EPA as a basis for selecting non-PCB COPCs for both soils and sediments in the Upper ½ Mile Reach of the Housatonic River (as stated in the Upper ½ Mile Reach Work Plan) and for Areas Outside the River (as provided in the SOW). It is also consistent with the MCP, which provides that if the concentration of a constituent is at or below background levels, then that constituent will be considered to pose No Significant Risk (310 CMR 40.0902(3)). Similarly, comparison to the MCP Method 1 soil standards is appropriate because those standards have been developed as conservative risk-based standards (using a target cancer risk of 1 x 10^-6) that will ensure protection of human health and the environment. Again, EPA has already accepted comparisons to those standards as a basis for selecting non-PCB COPCs in soil for the Upper ½ Mile Reach of the River and Areas Outside the River. Moreover, in its Engineering Evaluation/Cost Analysis (EE/CA) for the 1½ Mile Reach of the River, EPA itself has used the MCP Method 1 soil standards, as well as background levels, as a basis for evaluating non-PCB constituents in bank soils (Weston, 2000b). Thus, use of comparisons to background levels and MCP Method 1 standards in selecting COPCs for the Rest of the River is justified by considerable precedent and will ensure that a consistent approach is employed to address such constituents throughout the GE-Pittsfield/Housatonic River Site. EPA has presented no sound basis for departing from that approach for the Rest of the River portion of the Site.

2 Generally, when toxicity criteria have not been derived for particular constituents, it is because there are no reliable toxicity data on which to base them. If such constituents are not screened but are simply carried through to the risk assessment as COPCs, that would unnecessarily complicate the risk assessment and would result in largely meaningless or highly uncertain estimates, given the lack of reliable toxicity data and criteria. Hence, EPA should screen these compounds using PRGs for surrogate compounds, as described in GE's June 3, 1999 comments (p. 6).

Moreover, use solely of the maximum detected concentrations in selecting COPCs from these constituents is overly conservative and inappropriate. For example, comparisons to site-specific background levels should be made using an appropriate statistical technique or summary statistics, as described in MDEP (1995) guidance, while comparisons to the MCP Method 1 standards are generally to be made based on average exposure point concentrations (310 CMR 40.0926(3)). Again, these approaches have been agreed upon by EPA as fully protective for Areas Outside the River, as set forth in the SOW, and should likewise be used for the Rest of the River to eliminate constituents that are not likely to contribute to Site-related risks.

Finally, the Work Plan does not explain how the potential for bioaccumulation (discussed on p. 6-31) will be considered or what criteria will be used to examine this. Unless the approach used to identify and select such COPCs is clearly defined, it could result in the unnecessary inclusion of certain constituents. EPA should provide supplemental information that outlines the specific approach and criteria that will be used to identify and select these constituents for inclusion.

2. Risk assessment of dioxins/furans

The Work Plan states that EPA intends to evaluate the potential risks associated with dioxins and furans by converting them to TEQs (toxic equivalents of 2,3,7,8-TCDD) using Toxicity Equivalency Factors (TEFs), and then applying the provisional cancer slope factor (CSF) of 150,000 (mg/kg-day)^-1 listed on HEAST for 2,3,7,8-TCDD (Section 6.3.3.2.4). As discussed in detail in GE's June 3, 1999 comments (pp. 19-22), the TEF approach has substantial limitations, and the provisional CSF of 150,000 (mg/kg-day)^-1 is uncertain and unsupportable. A wide range of CSFs, spanning from 9,000 to 156,000 (mg/kg-day)^-1, have been proposed for TCDD under the linear nonthreshold cancer model, with differences resulting from the selection of tumor classification scheme, the interspecies scaling factor used, and the model used for low-dose extrapolation. Thus, there is enormous uncertainty associated with any selected CSF.

In these circumstances, GE believes that, to the extent possible in this risk assessment, EPA should evaluate dioxins and furans through application of the dioxin/furan TEQ PRGs set forth in EPA (1998) guidance and discussed above, using those PRGs as remedial goals as well as screening levels, rather than conducting forward risk calculations for these compounds. This approach would avoid the uncertainties associated with selecting a particular CSF for TCDD and is consistent with the approach adopted by EPA for Areas Outside the River, as described in the SOW.

However, to the extent that EPA includes dioxins and furans in forward risk calculations, GE believes that EPA should use a CSF of 30,000 (mg/kg-day)^-1. As shown in our June 3, 1999 comments, based on current information, that CSF is more supportable than the CSF of 150,000 (mg/kg-day)^-1 specified in the Work Plan.
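Mechanically, the TEF/TEQ conversion the Work Plan describes is a weighted sum of congener concentrations. The sketch below illustrates that arithmetic with hypothetical congener concentrations; the TEFs shown are representative WHO-style values, not inputs taken from the Work Plan.

```python
# Illustrative TEF/TEQ arithmetic: each congener concentration is weighted
# by its TEF and summed to a single 2,3,7,8-TCDD toxic equivalent (TEQ).
# Concentrations are hypothetical; TEFs are representative WHO-style values.

tefs = {"2,3,7,8-TCDD": 1.0, "1,2,3,7,8-PeCDD": 1.0, "2,3,7,8-TCDF": 0.1}
conc_ppt = {"2,3,7,8-TCDD": 0.5, "1,2,3,7,8-PeCDD": 0.2, "2,3,7,8-TCDF": 3.0}

teq = sum(conc_ppt[c] * tefs[c] for c in conc_ppt)
print(teq)  # 1.0 ppt TEQ
```

The selected CSF is then applied to the intake computed from this single TEQ value, which is why, as discussed above, the uncertainty in the TCDD CSF propagates directly into every TEQ-based risk estimate.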


3. Use of TEF/TEQ approach for dioxin-like PCBs

In the Work Plan, EPA also proposes to use the TEF/TEQ approach for estimating risks of dioxin-like PCB congeners (Section 6.3.3.2.5). Under this approach, the concentrations of these PCB congeners are converted to toxic equivalents (TEQs) of 2,3,7,8-TCDD using TEFs, and the provisional CSF of 150,000 (mg/kg-day)^-1 for 2,3,7,8-TCDD is then applied to estimate the risks. In its June 3, 1999 comments, GE presented several reasons why this approach is not appropriate or scientifically defensible. In discussions with GE in November 1999, EPA stated that, since it will make risk management decisions on the basis of total PCB concentrations rather than congener-specific data, it intends to use this TEF/TEQ approach for PCBs only if it finds a good correlation between the concentrations of the dioxin-like congeners and concentrations of total PCBs. There is no indication in the Work Plan, however, that EPA has established a good correlation between the congener-specific data and the total PCB data. Indeed, in more recent discussions, EPA has indicated that it has not to date evaluated this issue. If and when EPA makes such an analysis, GE requests the opportunity to review the analysis and discuss it with EPA. The absence of such a correlation would itself make the TEQ data irrelevant to risk management decisions.

In any event, GE continues to believe, for the same reasons given in its June 3, 1999 comments (pp. 23-25), that application of the TEF/TEQ approach to PCBs is not scientifically supportable. These reasons include: (a) the lack of sufficient congener-specific data to provide reliable estimates of true congener concentrations; (b) the double counting of carcinogenic risks that would result from use of both the TCDD CSF and the PCB CSF in the same assessment; and (c) the uncertainties and unsupportability of the TEFs for PCB congeners and of the CSF (for 2,3,7,8-TCDD) to be used to evaluate the resulting TEQs.

4. Failure to consider alternative CSFs and RfDs for PCBs

The Work Plan states that EPA intends to use the upper-bound CSF for PCBs of 2 (mg/kg-day)^-1 and the Reference Dose (RfD) for Aroclor 1254 of 2 x 10^-5 mg/kg-day, as published in its Integrated Risk Information System (IRIS) (Sections 6.3.3.2.3 and 6.3.3.3.2). In its June 3, 1999 comments (pp. 16-19 and Appendix A), GE presented substantial reasons why these toxicity values overestimate the carcinogenic potential and non-carcinogenic impacts of PCBs, and it recommended the development of alternative toxicity values for PCBs. EPA has made no reference to this presentation in its Work Plan.

This failure to consider the alternative toxicological information presented by GE is unlawful. Under a settlement agreement with GE (Settlement Agreement in General Electric Company v. Browner, No. 93-1251, D.C. Cir., October 25, 1993) and a subsequent EPA guidance memorandum (EPA, 1993), if an outside party questions the use of IRIS values during the course of an EPA risk assessment and presents alternative toxicological information that may be used in place of the IRIS values, EPA has an obligation to consider all credible and relevant evidence before it. There is no indication that EPA has considered the alternative toxicological information presented by GE. Rather, it has ignored GE's showing and chosen simply to defer to the current IRIS values (which is precisely the course of conduct which the settlement and guidance prohibit). EPA should rectify this deficiency by now giving careful consideration to the alternative toxicological information presented by GE.

5. Use of subchronic RfD for PCBs

The Work Plan states that EPA will consider using the subchronic RfD for PCBs only if it determines that exposure is intermittent enough to result in an exposure period that is substantially reduced (p. 6-42). Later, on page 6-43, the Work Plan states that this will occur when the total exposure period is substantially below the 7-year period that is considered by EPA to represent chronic exposure. There is no further discussion of this issue, so it is not clear what the Agency will use as a benchmark to identify intermittent exposure.

Many of the proposed exposure scenarios are likely to be extremely intermittent in nature. For example, in developing the SRBC for utility workers, EPA assumes that exposure occurs 5 days/year for a period of 25 years. Clearly, 5 days/year is intermittent exposure. In fact, even when considered over the entire 25-year exposure period, the total exposure consists of 125 days, which amounts to about 1/3 of a year, or less than 5 percent of the 7-year chronic exposure period. This exposure would certainly be considered intermittent despite the fact that the total exposure period lasts for 25 years.


As a second example, in developing its SRBCs for recreational exposures, EPA has assumed that children aged 1-6 are exposed for six years during recreational activities. Even using the highly conservative exposure frequency for high contact areas of 3 days/week for a period of 7 months per year (a total of 84 days/year, or 504 days for the entire exposure period), the number of days exposed is equivalent to 1.4 years, or less than 20 percent of the available days during the 7-year chronic exposure period. Nevertheless, as stated on page 6-42, EPA does not intend to use the subchronic RfD to evaluate noncancer hazards to young children.
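The day-count arithmetic in the two examples above can be verified in a few lines (this simply reproduces the figures stated in the text):

```python
# Reproducing the day-count arithmetic for the two scenarios discussed above.
chronic_days = 7 * 365          # 2,555 days in EPA's 7-year chronic period

utility_days = 5 * 25           # 5 days/yr for 25 yr = 125 days total
child_days = 84 * 6             # 84 days/yr (3 days/wk, 7 mo/yr) for 6 yr = 504

print(utility_days / chronic_days)  # under 5 percent of the chronic period
print(child_days / chronic_days)    # under 20 percent of the chronic period
```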

In short, the Work Plan states that EPA will use the subchronic RfD when exposures during a given scenario are sufficiently intermittent to result in a total exposure that is far less than the 7-year chronic exposure period identified by EPA. It is clear, however, that even for scenarios where this intermittent exposure is likely, EPA does not intend to incorporate the subchronic RfD. EPA should reconsider its position on the use of the subchronic RfD and incorporate it into those scenarios where it is relevant.

6. Inappropriate use of a single exposure scenario for all direct-contact recreational activities

The Work Plan provides that, while consumption of fish and game will be evaluated separately, all direct-contact recreational activities will be evaluated using a single exposure scenario and a single set of exposure assumptions (pp. 6-52 to 6-53). These activities include hiking, walking, canoeing, picnicking, swimming, wading, and dirt biking, as well as fishing and hunting (but not consumption of fish or game). In its June 3, 1999 comments (pp. 11-12, 29-30), GE explained that use of a single exposure scenario and set of assumptions for such diverse activities is inappropriate because it will force EPA either to use the most conservative assumptions for any of these activities (which would clearly result in grossly overestimated exposure estimates) or to pick compromise parameters that will not be truly representative of any of these activities (which would be subject to considerable uncertainty). Based on discussions with EPA in the fall of 1999, GE had understood that the Agency recognized this problem and intended to consider further whether and how to divide this single direct-contact recreational scenario into a number of more specific scenarios (e.g., hiking/walking, canoeing, picnicking, swimming/wading). However, the Work Plan does not do that. GE again urges EPA to develop such specific scenarios so as to take account of the significant differences in exposure parameters among these types of activities. In doing so, actual recreational exposures will be more closely approximated, and more realistic and appropriate cleanup goals can be developed.

7. Lack of information about specific exposure assumptions to be used

As in the prior draft, the Work Plan does not present the specific exposure parameter values and equations to be used in EPA's risk assessment, but simply states that they will be developed based on EPA and MDEP guidance along with site-specific data (Section 6.3.4.5). GE requested such information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has advised GE that the Agency's risk assessment staff will hold a series of technical meetings with GE representatives and consultants, and at such meetings will be prepared to discuss the most appropriate exposure assumptions and parameter values to be used. GE reiterates its request for such meetings and looks forward to discussing these issues with EPA.

8. Use of probabilistic risk analysis

In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that probabilistic approaches such as Monte Carlo analysis will be considered for those COPCs and exposure pathways that result in significant risk estimates. However, it does not mention probabilistic risk assessment techniques in connection with any of the main risk assessments. In prior discussions with GE, EPA indicated that one of the principal reasons why it collected a large amount of fish tissue PCB data is for use in a probabilistic risk assessment of fish consumption. As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic exposure model in the main risk analysis of the fish consumption pathway. However, the Work Plan presents no discussion of or plans for conducting such an assessment. In fact, EPA does not even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk assessment guidance documents and information sources that EPA intends to use in conducting the HHRA.

During the last few years, EPA's policies and guidelines have focused on improving risk management by presenting decision-makers with the entire range of possible risks rather than a single point estimate (EPA, 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that numerical risk assessments should be accompanied by a full characterization of the uncertainties, limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D) probabilistic analysis, such as Microexposure analysis, can more effectively characterize the impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed population by considering time-dependent changes.

Microexposure analysis has been used by EPA and by independent researchers to simulate duration of residential exposure (EPA, 1992c; Johnson and Capel, 1992; Sielken, 1994). It also has been used to evaluate childhood exposures to lead (Goodrum et al., 1994), exposure to contaminants in tap water (Harrington et al., 1995), and exposure to dioxins from the consumption of freshwater fish (Keenan et al., 1993a,b; 1995; 1996a; 1997a,b). Microexposure analysis was employed in the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon Group, 1994; Keenan et al., 1996b). In addition, Microexposure analysis has been described in EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A - External Review Draft as a viable alternative for modeling time-dependent variability in concentrations, daily activity patterns, and other behavioral exposure factors (EPA, 1999c).

Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of probabilistic risk analysis through application to the fish ingestion pathway at the Sangamo-Weston/Lake Hartwell Superfund site (Simon, 1999). Simon (1999) found that the Microexposure event model gave risk estimates that were approximately an order of magnitude lower than those derived using deterministic and other probabilistic approaches. He attributed this decrease to the power of the method to incorporate changes over time.

GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model, to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in Appendix E of its draft guidance for conducting probabilistic risk assessments (EPA, 1999c), some types of exposures, including fish consumption, can be expected to vary over time. Anglers may change their activity levels and their fish consumption behaviors as they age. In addition, different species and sizes of fish may be consumed from a variety of locations on the river over time and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates and will provide information that will allow better informed risk management decisions to be made (EPA, 1999c).
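To illustrate the general idea, the sketch below runs a simple one-dimensional Monte Carlo simulation of a fish-ingestion dose distribution. It is not the Microexposure model itself (which additionally simulates each angler's behavior event-by-event through time), and every distribution and parameter value in it is a hypothetical placeholder rather than a site-specific input.

```python
# Minimal 1-D Monte Carlo sketch of a fish-ingestion dose distribution.
# All distributions and parameters are hypothetical placeholders; the
# Microexposure approach would additionally vary these through time.
import random

random.seed(1)

def simulate_doses(n=10000):
    """Average daily dose (mg/kg-day) ~ C * IR / BW, sampled per 'angler'."""
    doses = []
    for _ in range(n):
        c_fish = random.lognormvariate(0.0, 0.8)  # PCB in fillet, mg/kg
        ir_g_day = random.triangular(5, 60, 20)   # fish intake, g/day
        bw = max(random.normalvariate(70, 12), 30.0)  # body weight, kg
        doses.append(c_fish * (ir_g_day / 1000.0) / bw)
    return sorted(doses)

doses = simulate_doses()
print(doses[len(doses) // 2])         # median dose
print(doses[int(0.95 * len(doses))])  # 95th percentile dose
```

Even this simplified version shows why a distributional output is more informative than a single point estimate: the decision-maker sees the median, the upper tail, and everything between, rather than one compounded upper-bound value.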

9. Use of 95% UCLs or maximum values as exposure point concentrations

The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% UCL of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).

GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment, and demonstrated by EPA contractors and consultants (EPA, 1997c; Schulz and Griffin, 1999), use of the H-statistic to derive a 95% UCL can and frequently does produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.

Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½ Mile Reach and for Areas Outside the River: a spatial averaging approach. That approach will (a) allow calculation of EPCs that are much more truly representative of actual exposures than use of EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper ½ Mile Reach and Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling, if necessary, to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.


However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A and demonstrated by Singh et al. (EPA, 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.
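The two alternatives GE proposes can be sketched together: an area-weighted (spatial) mean EPC, and a bootstrap 95% UCL of that mean. The sketch below is a simplified illustration, not the procedure detailed in Attachment A: it uses a small hypothetical subset of Table 1-style polygon areas and concentrations, and it resamples sample/polygon pairs, which is only one of several ways a spatial bootstrap could be set up.

```python
# Sketch: area-weighted (spatial) mean EPC and a simple bootstrap 95% UCL
# of that mean. Data are a hypothetical subset of Table 1-style values
# (polygon areas in ft2, total PCB in ppm); the pair-resampling scheme is
# an illustrative simplification of a spatial bootstrap.
import random

random.seed(7)

areas = [138219, 105217, 31021, 7004, 2123, 44231]
pcb   = [0.022, 1.07, 2.19, 7.62, 15.4, 42.6]

def spatial_mean(a, c):
    """Polygon-area-weighted average concentration."""
    return sum(ai * ci for ai, ci in zip(a, c)) / sum(a)

def bootstrap_ucl95(a, c, n_boot=5000):
    """95% UCL of the spatial mean via resampling sample/polygon pairs."""
    stats = []
    indices = list(range(len(c)))
    for _ in range(n_boot):
        pick = [random.choice(indices) for _ in indices]
        stats.append(spatial_mean([a[i] for i in pick], [c[i] for i in pick]))
    stats.sort()
    return stats[int(0.95 * n_boot)]

print(round(spatial_mean(areas, pcb), 2))    # point-estimate spatial mean
print(round(bootstrap_ucl95(areas, pcb), 2)) # bootstrap 95% UCL of the mean
```

Because the bootstrap works from the empirical distribution of the data, it requires no lognormality assumption, which is the property that makes it more robust than the H-statistic when that assumption is violated.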

References

EPA 1989 Risk Assessment Guidance for Superfund Volume I Human Health Evaluation

Manual (Pan A) -Interim Final US Environmental Protection Agency Office of

Emergency and Remedial Response Washington DC EPA5401-89-002 July

EPA 1990 National Contingency Plan US Environmental Protection Agency

Washington DC

EPA 1992a Final Guidelines for Exposure Assessment Notice US Environmental Protection

Agency Washington DC 57 Federal Register 14 22888-22938 May 29

EPA 1992b Guidance on Risk Characterization for Risk Managers and Risk Assessors

Memorandum from F Henry Habicht III Deputy Administrator to Assistant Administrators

and Regional Administrators Office of the Administrator Washington DC

- 19 shy

EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

EPA. 1993. Memo from W.H. Farland and H.L. Longest to Regional Directors, Re: Use of IRIS values in Superfund risk assessments. U.S. Environmental Protection Agency, Washington, DC. OSWER Directive 9285.7-16. December 21.

EPA. 1995a. Policy for Risk Characterization at the U.S. Environmental Protection Agency. U.S. Environmental Protection Agency, Office of the Administrator, Washington, DC. March.

EPA. 1995b. Guidance for Risk Characterization. U.S. Environmental Protection Agency, Science Policy Council, Washington, DC. February.

EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the U.S. Environmental Protection Agency. U.S. Environmental Protection Agency, Office of Research and Development, Washington, DC. May 15.

EPA. 1997b. Guiding Principles for Monte Carlo Analysis. U.S. Environmental Protection Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 p.

EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhardt. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1998. Memo from Timothy Fields, Acting Administrator, to Regional Directors, Re: Approach for addressing dioxin in soil at CERCLA and RCRA sites. U.S. Environmental Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.


EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, U.S. EPA, to Jane Magee, General Electric Company, Re: Pittsfield 1-0147, EPA Area 6, Housatonic River Human Health Risk Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader, Re: Protectiveness of Cleanup Levels for Removal Actions Outside the River - Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, U.S. Dist. Court for Dist. of Mass.). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual, Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2. Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3, Part A: Process for Conducting Probabilistic Risk Assessment. Draft, Revision 5. U.S. Environmental Protection Agency, Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½-Mile Removal Action, and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the U.S. Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition, December 4-7, Baltimore, MD.


Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA42.03. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.

Johnson, T. and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA, Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds, Proceedings, Dioxin '96: 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen Compounds 30:61-65.


Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC. October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meeting, Proceedings. Abstract.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis, Proceedings, Annual Meeting and Exposition. Abstract.

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan, Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19. November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD. December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4):823-843.


Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 11.

ATTACHMENT A

A COMPARISON OF ALTERNATIVE METHODS

FOR CALCULATING EXPOSURE POINT CONCENTRATIONS

Prepared on Behalf of the General Electric Company

by Ogden Environmental and Energy Services Co., Inc.


Introduction

The U.S. Environmental Protection Agency's (EPA's) final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston, 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets for which no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and on how well those values represent the actual levels of exposure that are likely to occur under various scenarios.

There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that, when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.

GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating EPCs that will produce more representative results. In GE's view, the most appropriate approach is the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½-Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than one derived using the H-statistic, namely the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper also illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.

Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL

In 1992, EPA issued a guidance document (EPA, 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area; thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of the observed sample concentrations.


The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region 1 (EPA, 1994). In fact, EPA Region 1 has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA, 1992):

    UCL = exp(x̄ + 0.5s² + sH/√(n−1))

Where:

    UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean
    x̄ = the mean of the log-transformed data
    s² = the variance of the log-transformed sample distribution
    s = the standard deviation of the log-transformed sample distribution
    H = a statistic accounting for the shape of the distribution, developed by Land (1975)
    n = the number of analytical samples
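For illustration, the equation above can be sketched in a few lines of Python. This is only a sketch of the formula as written: the H-statistic itself cannot be computed in closed form and must be looked up in Land's published tables for the given sample size and log-scale standard deviation, so the H value passed in here (and the example concentrations) are hypothetical.

```python
import math

def h_ucl(logs, H):
    """95% UCL of the arithmetic mean under Land's H-statistic method.

    `logs` are the log-transformed sample concentrations; `H` is Land's
    H-statistic, taken from published tables (Land, 1975) for the given
    sample size and log-scale standard deviation.
    """
    n = len(logs)
    xbar = sum(logs) / n                              # mean of log data
    s2 = sum((x - xbar) ** 2 for x in logs) / (n - 1)  # variance of log data
    s = math.sqrt(s2)                                  # std. dev. of log data
    return math.exp(xbar + 0.5 * s2 + s * H / math.sqrt(n - 1))

# Hypothetical concentrations (mg/kg) and an illustrative H value:
ucl = h_ucl([math.log(c) for c in (0.5, 1.2, 4.8, 20.0)], H=2.5)
```

Note how the H term sits in the exponent: a modest increase in the log-scale standard deviation s inflates the result multiplicatively, which is the behavior criticized in the discussion that follows.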

Frequently, the H-statistic yields extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.


Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers," and not necessarily due to lognormally distributed data. Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and its associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.

Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA, 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order[s] of magnitude higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.

Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when the exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.

Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and it does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½-Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA, 1999).

In view of these considerations, EPA contractors and consultants have recommended that the H-statistic not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic with those calculated using other approaches, the EPA technical support document by Singh et al. (EPA, 1997a) recommended that, in environmental applications, the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided. These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. . . . [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).


Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased toward more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the Lower Housatonic River; consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected to be contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.

This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, unbiased estimates of how concentration and variance change with distance are developed, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava, 1989).

Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.

The area inside each polygon is then used to weight the samples in statistical calculations, which is in essence saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area of that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

    x̄ = Σᵢ pᵢCᵢ

Where:

    x̄ = the area-weighted mean concentration (mg/kg)
    Cᵢ = the concentration representing the condition within polygon i, where there are i = 1 through n polygons
    pᵢ = the proportion of the total area that is incorporated in polygon i (unitless)
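The area-weighted calculation can be sketched as follows, assuming the Thiessen polygon areas have already been computed in a GIS; the concentrations and areas below are hypothetical.

```python
def area_weighted_mean(concs, areas):
    """Mean concentration with each sample weighted by its polygon's
    share of the total exposure area (x-bar = sum of p_i * C_i)."""
    total = sum(areas)
    return sum(c * a / total for c, a in zip(concs, areas))

concs = [50.0, 2.0, 1.0]        # hypothetical PCB results (mg/kg)
areas = [100.0, 400.0, 500.0]   # hypothetical Thiessen polygon areas (m^2)
epc = area_weighted_mean(concs, areas)  # 50*0.1 + 2*0.4 + 1*0.5 = 6.3
```

In this hypothetical case the area-weighted mean (6.3 mg/kg) is far below the simple arithmetic mean (about 17.7 mg/kg), because the high result represents only a tenth of the exposure area.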

This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½-Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid cell using the sampling data available within it. Then the average concentrations for the grid cells were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the:

[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure. (p. 15)

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½-Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and it is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL that would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and as demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative, superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or a lognormal distribution. This technique is described below and is recommended as an alternative, preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
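The procedure described above can be sketched in a few lines of Python. This is a minimal percentile-bootstrap sketch; the concentration values, iteration count, and random seed are hypothetical choices for illustration.

```python
import random
import statistics

def bootstrap_95_ucl(data, iterations=5000, seed=1):
    """Percentile-bootstrap 95% UCL of the arithmetic mean."""
    rng = random.Random(seed)
    n = len(data)
    # Resample the dataset with replacement and archive each mean.
    means = [statistics.fmean(rng.choices(data, k=n)) for _ in range(iterations)]
    means.sort()
    # The 95th percentile of the distribution of means is the 95% UCL.
    return means[int(0.95 * iterations)]

data = [0.5, 0.8, 1.2, 2.0, 3.5, 4.1, 7.9, 25.0]  # hypothetical mg/kg results
ucl = bootstrap_95_ucl(data)
```

Because each bootstrap mean averages over the whole dataset, the resulting UCL sits above the sample mean but, unlike the H-UCL, cannot drift orders of magnitude beyond the observed concentrations.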

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and it is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution from the bootstrap run may not appear normal. This is because the number of samples required in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter; as such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that accounts for the locations and densities of the samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.


Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with

the bootstrap method discussed above It is conducted in a series of three steps

First each sample is area-weighted using the Thiessen polygon method The weight given to

each sample is determined by calculating the proportion of the total exposure area that is

represented by the samples polygon

Second the smallest polygon area is identified and all other polygon areas are normalized with

respect to its size and are weighted accordingly in the sampling database In other words if the

area of one polygon is five times larger than the smallest polygon it is given a weight that is five

times greater in the sampling database from which values are selected for the bootstrap The

normalized areas are rounded off to the closest integer An expanded dataset is then created

from the original dataset and reflects the ratios of each polygon area to the smallest polygon area

This expansion is accomplished by multiplying the rounded normalized areas by a selected

value The resulting values represent an area-weighted number of replicates of the sample

results in the expanded dataset For example if the ratio of the sample X area to the smallest area

is 7 and the dataset is expanded using a factor of five then sample X is represented in the

expanded dataset as 35 (7 times 5) entries

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
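For illustration only, the three steps described above can be sketched in Python as follows. This is a minimal sketch, not the implementation used in the sample calculations: the function name is hypothetical, polygon areas are assumed to have been computed already (e.g., by a GIS), and the expansion factor, per-iteration draw counts (250-500), and 5,000 iterations follow the values given in the text.

```python
import random

def spatial_bootstrap_ucl(concs, areas, expand_factor=5,
                          n_iterations=5000, seed=1):
    """Sketch of the spatial (area-weighted) bootstrap 95% UCL.

    concs - sample concentrations (ppm)
    areas - Thiessen polygon area for each sample (same order)
    """
    # Steps 1-2: normalize each polygon area to the smallest polygon,
    # round to the nearest integer, and build an expanded dataset in
    # which each sample appears (rounded ratio x expand_factor) times.
    smallest = min(areas)
    expanded = []
    for conc, area in zip(concs, areas):
        replicates = round(area / smallest) * expand_factor
        expanded.extend([conc] * replicates)

    # Step 3: repeatedly resample the expanded dataset with replacement
    # and record the mean of each resample.
    rng = random.Random(seed)
    n_draws = min(max(len(expanded) // 10, 250), 500)  # 250-500 per text
    means = []
    for _ in range(n_iterations):
        resample = [rng.choice(expanded) for _ in range(n_draws)]
        means.append(sum(resample) / n_draws)

    # The median of this distribution approximates the area-weighted
    # mean; the 95th percentile is the 95% UCL used as the EPC.
    means.sort()
    return means[int(0.95 * n_iterations)]
```

Because each sample's replicate count is proportional to its polygon area, resampling the expanded dataset reproduces the area-weighting automatically.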


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).
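For reference, the H-statistic method (Land, 1975) computes a one-sided 95% UCL from the log-transformed data. A minimal sketch follows; the function name is illustrative, and the H value must be supplied from Land's tables, since it depends on the sample size and the log-scale standard deviation:

```python
import math

def h_ucl(data, h_095):
    """Land's H-statistic 95% UCL for a lognormally distributed dataset.

    h_095 must be looked up in Land's (1975) tables for the given
    sample size and log-scale standard deviation.
    """
    n = len(data)
    logs = [math.log(x) for x in data]
    y_bar = sum(logs) / n
    # sample standard deviation of the log-transformed data
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in logs) / (n - 1))
    # UCL = exp(mean + s^2/2 + s*H/sqrt(n-1))
    return math.exp(y_bar + 0.5 * s_y ** 2 + s_y * h_095 / math.sqrt(n - 1))
```

Because the log-scale standard deviation enters the exponent both directly and through the H term, the estimate grows rapidly with log-scale variance, which is the source of the overestimation discussed here.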

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: (1) EPA's approach (arithmetic mean and H-statistic); (2) an area-weighted average; (3) a bootstrap; and (4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas; thus, they are weighted more heavily in approaches that include area-weighting.
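The two mean estimates can be checked directly from the Table 1 data (concentrations in ppm, polygon areas in square feet, transcribed in sample-number order):

```python
concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
         7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
         15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6]
areas = [138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121, 3697,
         13297, 33087, 65541, 57494, 9014, 9994, 2123, 2341, 7446, 2613,
         44231, 2362, 8745, 5008, 18762, 24296, 51356, 12996, 18019,
         119814, 132012]

# Straight arithmetic mean: every sample weighted equally.
arith_mean = sum(concs) / len(concs)  # 13.3 ppm

# Area-weighted mean: each sample weighted by its polygon's share
# of the total exposure area.
weighted_mean = sum(c * a for c, a in zip(concs, areas)) / sum(areas)
```

The area-weighted mean comes out higher than the arithmetic mean because samples 29 and 30 (40.6 and 42.6 ppm) occupy the two largest polygons.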

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few


showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration by a factor of more than two, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above and demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area, and hence that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data


necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993) and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A), Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant), Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision, Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. U.S. Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River, Final. Roy F. Weston, Inc., Manchester, NH. DCN: GEP2-02-000-AAME. February 2000.


Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number   Polygon Area (ft2)   Total PCB (ppm)
1               138,219              0.022
2               105,217              0.028
3               110,214              1.07
4               31,021               1.63
5               11,399               1.67
6               7,004                2.01
7               14,268               2.19
8               12,121               3.25
9               3,697                3.77
10              13,297               7.58
11              33,087               7.61
12              65,541               7.62
13              57,494               7.71
14              9,014                7.99
15              9,994                8.72
16              2,123                9.69
17              2,341                9.72
18              7,446                12.7
19              2,613                14.3
20              44,231               14.3
21              2,362                15.4
22              8,745                16.3
23              5,008                18.7
24              18,762               19.1
25              24,296               20.2
26              51,356               30.8
27              12,996               32.9
28              18,019               38.9
29              119,814              40.6
30              132,012              42.6


[Table 2, presenting the mean and 95% UCL EPC estimates calculated by the four approaches, is not legible in this copy; its key values are discussed in the accompanying text.]



B. Comments on Phase 2 - Baseline HHRA Approach

For those properties or other exposure points that remain after the Phase 1 screening, the Work Plan outlines the approach that EPA plans to use in conducting a baseline HHRA. GE has a number of comments on that proposed approach.

1. Selection of non-PCB contaminants of potential concern

The Work Plan outlines a two-step approach for selecting non-PCB contaminants of potential concern (COPCs) to be included in the baseline HHRA (Section 6.3.2.3). In the first step, for each area that has PCB concentrations above the applicable SRBC, the available sampling data on other constituents will be compared with several criteria to determine the need for additional sampling in that area. These criteria include: (a) background concentrations established for the area; (b) EPA's Preliminary Remediation Goal (PRG) for dioxins/furans in residential areas, which is a 1 ppb Toxicity Equivalent (TEQ) concentration; and (c) the Massachusetts Department of Environmental Protection's (MDEP's) Method 1 soil standards set out in the Massachusetts Contingency Plan (MCP) (see Work Plan, p. 6-30). If these or other (unspecified) criteria are exceeded, EPA will conduct additional characterization sampling (or other evaluation) in the area. If they are not exceeded, the Work Plan states that the baseline HHRA will be performed using the available analytical data (p. 6-31). The latter statement appears to mean that, even if the concentrations of non-PCB constituents do not exceed the criteria for further sampling (i.e., are below background concentrations, the dioxin/furan PRG, or the Method 1 soil standards), those constituents will nevertheless be carried through to the second step in the selection of COPCs.

In the second step, all constituents detected will be screened for the selection of COPCs by comparing the maximum detected concentrations with contaminant-specific and medium-specific risk-based concentrations. The Work Plan states that the principal criterion for selection of a COPC will be an exceedance of the medium-specific PRG by the maximum concentration of the constituent (p. 6-31). Although the Work Plan does not give the source of the PRGs to be used, it states that a target cancer risk of 1 x 10-6 and a target non-cancer hazard quotient (HQ) of 0.1 will be used. Any constituent for which the maximum concentration exceeds the PRG will be carried forward for the risk assessment. In addition, the Work Plan states that constituents which


are known to bioaccumulate in biological tissues but do not exceed the PRGs will be further evaluated for possible inclusion in the risk assessment.
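Mechanically, the two-step selection described above can be sketched as follows. This is an illustrative sketch only; the function names, criteria values, and constituent data are hypothetical placeholders, not site values or EPA's actual implementation.

```python
def needs_more_sampling(conc, background, method1_std, dioxin_prg=None):
    """Step 1: flag an area for additional characterization sampling
    when a constituent exceeds any of the listed criteria
    (background, MCP Method 1 standard, dioxin/furan TEQ PRG)."""
    if dioxin_prg is not None and conc > dioxin_prg:
        return True
    return conc > background or conc > method1_std

def select_copcs(max_concs, prgs):
    """Step 2: a constituent becomes a COPC when its maximum detected
    concentration exceeds the medium-specific PRG."""
    return [name for name, conc in max_concs.items()
            if conc > prgs.get(name, float("inf"))]
```

For example, with placeholder values, `select_copcs({"lead": 500.0, "arsenic": 5.0}, {"lead": 400.0, "arsenic": 20.0})` retains only lead.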

GE has several concerns about this proposed two-step approach.

First, GE believes that if the concentration of non-PCB constituents in a given area does not exceed the criteria for further sampling (as listed above), those constituents should be dropped from further consideration and not carried through to the second step in selecting COPCs. As EPA has recognized in the Statement of Work for Removal Actions Outside the River (SOW), which is Appendix E to the Consent Decree, concentrations of non-PCB constituents that are consistent with local background levels, or are less than the 1 ppb TEQ concentration for dioxins/furans, or are below the applicable Method 1 soil standards are considered to pose no significant risk to human health. Hence, such constituents need not and should not be considered further in the selection of COPCs.

Second, for those constituents that are carried forward to the second step in the COPC selection process, EPA's proposed primary approach for selecting COPCs (i.e., comparison of maximum concentrations to PRGs) is overly restrictive. In recent discussions with GE, EPA explained that it is currently planning to use the EPA Region 9 PRGs for this purpose, but to reduce the non-cancer-based PRGs by 10 times, due to the fact that those PRGs were based on a non-cancer HQ of 1.0, not 0.1. This approach will likely lead to the selection of a long list of COPCs, many or most of which in fact make no significant contribution to actual risks. This would undercut the purpose of the screening approach and will greatly and needlessly complicate the risk assessment without any corresponding benefit in terms of achieving additional health protection. Instead, GE recommends that, in selecting the COPCs for the Rest of the River, EPA should use an approach similar to the approach approved by EPA and specified in the SOW for Areas Outside the River, including the non-bank portions of certain floodplain properties. This approach utilizes not only

1 The Work Plan also states that, in addition to these specified criteria, EPA will also use other applicable criteria in the first step of this process, but it does not say what those additional criteria are. EPA should specify those criteria.


the Region 9 PRGs, but also the other criteria listed on page 6-30 of the Work Plan, as discussed below.

For dioxins and furans, EPA should use the PRGs for dioxin/furan TEQs that are prescribed in recent guidance from EPA headquarters for use in addressing these constituents in soil at CERCLA and RCRA sites (EPA, 1998). Those PRGs are 1 ppb for dioxin/furan TEQs in residential areas (also applicable to recreational areas) and 5-20 ppb for dioxin/furan TEQs in commercial/industrial areas. The SOW provides specifically for use of these PRGs in screening out dioxin/furan compounds in soil in Areas Outside the River. EPA should likewise use these PRGs in selecting COPCs for the Rest of the River. (Indeed, as discussed below, GE believes that these PRGs should be used not only as screening criteria but also as remedial goals, which is consistent with the approach embodied in the SOW for Areas Outside the River. See Section B.2 below.)

For other constituents, EPA should follow a step-wise approach that begins with comparisons to the Region 9 PRGs. Since these PRGs are highly conservative and are specified in the SOW for use in screening such constituents without further reduction, they should not be reduced by an additional 10 times. For constituents for which such PRGs do not exist, a similar screening step should be employed using PRGs for surrogate compounds.2

EPA should then apply the other criteria listed on page 6-30 of the Work Plan (i.e., comparison to background levels and MCP Method 1 standards). Comparison of concentrations of non-PCB constituents to site-specific background levels (e.g., concentrations upstream of the GE facility) is appropriate to ensure that only those constituents attributable to releases from the GE facility are included in the risk assessment. This step is allowed in EPA's Risk Assessment Guidance for

2 Generally, when toxicity criteria have not been derived for particular constituents, it is because there are no reliable toxicity data on which to base them. If such constituents are not screened, but are simply carried through to the risk assessment as COPCs, that would unnecessarily complicate the risk assessment and would result in largely meaningless or highly uncertain estimates, given the lack of reliable toxicity data and criteria. Hence, EPA should screen these compounds using PRGs for surrogate compounds, as described in GE's June 3, 1999 comments (p. 6).


Superfund (EPA, 1989) and has been accepted by EPA as a basis for selecting non-PCB COPCs for both soils and sediments in the Upper ½ Mile Reach of the Housatonic River (as stated in the Upper ½ Mile Reach Work Plan) and for Areas Outside the River (as provided in the SOW). It is also consistent with the MCP, which provides that if the concentration of a constituent is at or below background levels, then that constituent will be considered to pose No Significant Risk (310 CMR 40.0902(3)). Similarly, comparison to the MCP Method 1 soil standards is appropriate because those standards have been developed as conservative risk-based standards (using a target cancer risk of 1 x 10-6) that will ensure protection of human health and the environment. Again, EPA has already accepted comparisons to those standards as a basis for selecting non-PCB COPCs in soil for the Upper ½ Mile Reach of the River and Areas Outside the River. Moreover, in its Engineering Evaluation/Cost Analysis (EE/CA) for the 1½ Mile Reach of the River, EPA itself has used the MCP Method 1 soil standards, as well as background levels, as a basis for evaluating non-PCB constituents in bank soils (Weston, 2000b). Thus, use of comparisons to background levels and MCP Method 1 standards in selecting COPCs for the Rest of the River is justified by considerable precedent and will ensure that a consistent approach is employed to address such constituents throughout the GE-Pittsfield/Housatonic River Site. EPA has presented no sound basis for departing from that approach for the Rest of the River portion of the Site.

Moreover, use solely of the maximum detected concentrations in selecting COPCs from these constituents is overly conservative and inappropriate. For example, comparisons to site-specific background levels should be made using an appropriate statistical technique or summary statistics, as described in MDEP (1995) guidance, while comparisons to the MCP Method 1 standards are generally to be made based on average exposure point concentrations (310 CMR 40.0926(3)). Again, these approaches have been agreed upon by EPA as fully protective for Areas Outside the River, as set forth in the SOW, and should likewise be used for the Rest of the River to eliminate constituents that are not likely to contribute to Site-related risks.

Finally, the Work Plan does not explain how the potential for bioaccumulation (discussed on p. 6-31) will be considered or what criteria will be used to examine this. Unless the approach used


to identify and select such COPCs is clearly defined, it could result in the unnecessary inclusion of certain constituents. EPA should provide supplemental information that outlines the specific approach and criteria that will be used to identify and select these constituents for inclusion.

2. Risk assessment of dioxins/furans

The Work Plan states that EPA intends to evaluate the potential risks associated with dioxins and furans by converting them to TEQs (toxic equivalents of 2,3,7,8-TCDD) using Toxicity Equivalency Factors (TEFs), and then applying the provisional cancer slope factor (CSF) of 150,000 (mg/kg-day)-1 listed on HEAST for 2,3,7,8-TCDD (Section 6.3.3.2.4). As discussed in detail in GE's June 3, 1999 comments (pp. 19-22), the TEF approach has substantial limitations, and the provisional CSF of 150,000 (mg/kg-day)-1 is uncertain and unsupportable. A wide range of CSFs, spanning from 9,000 to 156,000 (mg/kg-day)-1, have been proposed for TCDD under the linear nonthreshold cancer model, with differences resulting from the selection of tumor classification scheme, the interspecies scaling factor used, and the model used for low-dose extrapolation. Thus, there is enormous uncertainty associated with any selected CSF.

In these circumstances, GE believes that, to the extent possible in this risk assessment, EPA should evaluate dioxins and furans through application of the dioxin/furan TEQ PRGs set forth in EPA (1998) guidance and discussed above, using those PRGs as remedial goals as well as screening levels, rather than conducting forward risk calculations for these compounds. This approach would avoid the uncertainties associated with selecting a particular CSF for TCDD and is consistent with the approach adopted by EPA for Areas Outside the River, as described in the SOW.

However, to the extent that EPA includes dioxins and furans in forward risk calculations, GE believes that EPA should use a CSF of 30,000 (mg/kg-day)-1. As shown in our June 3, 1999 comments, based on current information, that CSF is more supportable than the CSF of 150,000 (mg/kg-day)-1 specified in the Work Plan.


3. Use of TEF/TEQ approach for dioxin-like PCBs

In the Work Plan, EPA also proposes to use the TEF/TEQ approach for estimating risks of dioxin-like PCB congeners (Section 6.3.3.2.5). Under this approach, the concentrations of these PCB congeners are converted to toxic equivalents (TEQs) of 2,3,7,8-TCDD using TEFs, and the provisional CSF of 150,000 (mg/kg-day)-1 for 2,3,7,8-TCDD is then applied to estimate the risks.

In its June 3, 1999 comments, GE presented several reasons why this approach is not appropriate or scientifically defensible. In discussions with GE in November 1999, EPA stated that, since it will make risk management decisions on the basis of total PCB concentrations rather than congener-specific data, it intends to use this TEF/TEQ approach for PCBs only if it finds a good correlation between the concentrations of the dioxin-like congeners and concentrations of total PCBs. There is no indication in the Work Plan, however, that EPA has established a good correlation between the congener-specific data and the total PCB data. Indeed, in more recent discussions, EPA has indicated that it has not to date evaluated this issue. If and when EPA makes such an analysis, GE requests the opportunity to review the analysis and discuss it with EPA. The absence of such a correlation would itself make the TEQ data irrelevant to risk management decisions.

In any event, GE continues to believe, for the same reasons given in its June 3, 1999 comments (pp. 23-25), that application of the TEF/TEQ approach to PCBs is not scientifically supportable. These reasons include: (a) the lack of sufficient congener-specific data to provide reliable estimates of true congener concentrations; (b) the double counting of carcinogenic risks that would result from use of both the TCDD CSF and the PCB CSF in the same assessment; and (c) the uncertainties and unsupportability of the TEFs for PCB congeners and of the CSF (for 2,3,7,8-TCDD) to be used to evaluate the resulting TEQs.

4. Failure to consider alternative CSFs and RfDs for PCBs

The Work Plan states that EPA intends to use the upper-bound CSF for PCBs of 2 (mg/kg-day)-1 and the Reference Dose (RfD) for Aroclor 1254 of 2 x 10-5 mg/kg-day, as published in its Integrated Risk Information System (IRIS) (Sections 6.3.3.2.3 and 6.3.3.3.2). In its June 3, 1999 comments (pp. 16-19 and Appendix A), GE presented substantial reasons why these toxicity values


overestimate the carcinogenic potential and non-carcinogenic impacts of PCBs, and it recommended the development of alternative toxicity values for PCBs. EPA has made no reference to this presentation in its Work Plan.

This failure to consider the alternative toxicological information presented by GE is unlawful. Under a settlement agreement with GE (Settlement Agreement in General Electric Company v. Browner, No. 93-1251, D.C. Cir., October 25, 1993) and a subsequent EPA guidance memorandum (EPA, 1993), if an outside party questions the use of IRIS values during the course of an EPA risk assessment and presents alternative toxicological information that may be used in place of the IRIS values, EPA has an obligation to consider all credible and relevant evidence before it. There is no indication that EPA has considered the alternative toxicological information presented by GE. Rather, it has ignored GE's showing and chosen simply to defer to the current IRIS values (which is precisely the course of conduct which the settlement and guidance prohibit). EPA should rectify this deficiency by now giving careful consideration to the alternative toxicological information presented by GE.

5. Use of subchronic RfD for PCBs

The Work Plan states that EPA will consider using the subchronic RfD for PCBs only if it determines that exposure is intermittent enough to result in an exposure period that is substantially reduced (p. 6-42). Later, on page 6-43, the Work Plan states that this will occur when the total exposure period is substantially below the 7-year period that is considered by EPA to represent chronic exposure. There is no further discussion of this issue, so it is not clear what the Agency will use as a benchmark to identify intermittent exposure.

Many of the proposed exposure scenarios are likely to be extremely intermittent in nature. For example, in developing the SRBC for utility workers, EPA assumes that exposure occurs 5 days/year for a period of 25 years. Clearly, 5 days/year is intermittent exposure. In fact, even when considered over the entire 25-year exposure period, the total exposure consists of 125 days, which amounts to 1/3 of a year, or less than 5 percent of the 7-year chronic exposure period. This exposure would certainly be considered intermittent, despite the fact that the total exposure period lasts for 25 years.


As a second example, in developing its SRBCs for recreational exposures, EPA has assumed that children aged 1-6 are exposed for six years during recreational activities. Even using the highly conservative exposure frequency for high contact areas of 3 days/week for a period of 7 months per year (a total of 84 days/year, or 504 days for the entire exposure period), the number of days exposed is equivalent to 1.4 years, or less than 20 percent of the available days during the 7-year chronic exposure period. Nevertheless, as stated on page 6-42, EPA does not intend to use the subchronic RfD to evaluate noncancer hazards to young children.
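The arithmetic behind both examples is a quick check (all inputs are the values quoted above; the 84 days/year figure assumes 4 weeks per month):

```python
CHRONIC_YEARS = 7  # EPA's chronic exposure benchmark

# Utility worker SRBC: 5 days/year for 25 years
utility_days = 5 * 25                        # 125 days total
utility_years = utility_days / 365           # about 1/3 of a year
utility_fraction = utility_years / CHRONIC_YEARS   # under 5 percent

# Child recreational scenario: 3 days/week, 7 months/year, for 6 years
rec_days_per_year = 3 * 4 * 7                # 84 days/year
rec_days = rec_days_per_year * 6             # 504 days total
rec_years = rec_days / 365                   # about 1.4 years
rec_fraction = rec_years / CHRONIC_YEARS     # just under 20 percent
```

Both fractions fall below the thresholds cited in the text, which is the basis for characterizing these scenarios as intermittent.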

In short, the Work Plan states that EPA will use the subchronic RfD when exposures during a given scenario are sufficiently intermittent to result in a total exposure that is far less than the 7-year chronic exposure period identified by EPA. It is clear, however, that even for scenarios where this intermittent exposure is likely, EPA does not intend to incorporate the subchronic RfD. EPA should reconsider its position on the use of the subchronic RfD and incorporate it into those scenarios where it is relevant.

6. Inappropriate use of a single exposure scenario for all direct-contact recreational activities

The Work Plan provides that, while consumption of fish and game will be evaluated separately, all direct-contact recreational activities will be evaluated using a single exposure scenario and a single set of exposure assumptions (pp. 6-52 to 6-53). These activities include hiking, walking, canoeing, picnicking, swimming, wading, and dirt biking, as well as fishing and hunting (but not consumption of fish or game). In its June 3, 1999 comments (pp. 11-12, 29-30), GE explained that use of a single exposure scenario and set of assumptions for such diverse activities is inappropriate, because it will force EPA either to use the most conservative assumptions for any of these activities (which would clearly result in grossly overestimated exposure estimates) or to pick compromise parameters that will not be truly representative of any of these activities (which would be subject to considerable uncertainty). Based on discussions with EPA in the fall of 1999, GE had understood that the Agency recognized this problem and intended to consider further whether and how to divide this single direct-contact recreational scenario into a number of more specific scenarios (e.g., hiking/walking, canoeing, picnicking, swimming/wading). However, the Work Plan does not do that. GE again urges EPA to develop such specific scenarios so as to take


account of the significant differences in exposure parameters among these types of activities. In doing so, actual recreational exposures will be more closely approximated, and more realistic and appropriate cleanup goals can be developed.

7. Lack of information about specific exposure assumptions to be used

As in the prior draft, the Work Plan does not present the specific exposure parameter values and equations to be used in EPA's risk assessment, but simply states that they will be developed based on EPA and MDEP guidance along with site-specific data (Section 6.3.4.5). GE requested such information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has advised GE that the Agency's risk assessment staff will hold a series of technical meetings with GE representatives and consultants, and at such meetings will be prepared to discuss the most appropriate exposure assumptions and parameter values to be used. GE reiterates its request for such meetings and looks forward to discussing these issues with EPA.

8. Use of probabilistic risk analysis

In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that probabilistic approaches, such as Monte Carlo analysis, will be considered for those COPCs and exposure pathways that result in significant risk estimates. However, it does not mention probabilistic risk assessment techniques in connection with any of the main risk assessments. In prior discussions with GE, EPA indicated that one of the principal reasons why it collected a large amount of fish tissue PCB data is for use in a probabilistic risk assessment of fish consumption. As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic exposure model in the main risk analysis of the fish consumption pathway. However, the Work Plan presents no discussion of, or plans for conducting, such an assessment. In fact, EPA does not even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk assessment guidance documents and information sources that EPA intends to use in conducting the HHRA.

During the last few years, EPA's policies and guidelines have focused on improving risk management by presenting decision-makers with the entire range of possible risks rather than a single point estimate (EPA, 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that numerical risk assessments should be accompanied by a full characterization of the uncertainties, limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D) probabilistic analysis, such as Microexposure analysis, can more effectively characterize the impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed population by considering time-dependent changes.

Microexposure analysis has been used by EPA and by independent researchers to simulate duration of residential exposure (EPA, 1992c; Johnson and Capel, 1992; Sielken, 1994). It also has been used to evaluate childhood exposures to lead (Goodrum et al., 1994), exposure to contaminants in tap water (Harrington et al., 1995), and exposure to dioxins from the consumption of freshwater fish (Keenan et al., 1993a,b; 1995; 1996a; 1997a,b). Microexposure analysis was employed in the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon Group, 1994; Keenan et al., 1996b). In addition, Microexposure analysis has been described in EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A (External Review Draft) as a viable alternative for modeling time-dependent variability in concentrations, daily activity patterns, and other behavioral exposure factors (EPA, 1999c).

Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of probabilistic risk analysis through application to the fish ingestion pathway at the Sangamo Weston/Lake Hartwell Superfund site (Simon, 1999). Simon (1999) found that the Microexposure event model gave risk estimates that were approximately an order of magnitude lower than those derived using deterministic and other probabilistic approaches. He attributed this decrease to the power of the method to incorporate changes over time.

GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model, to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in Appendix E of its draft guidance for conducting probabilistic risk assessments (EPA, 1999c), some types of exposures, including fish consumption, can be expected to vary over time. Anglers may change their activity levels and their fish consumption behaviors as they age. In addition, different species and sizes of fish may be consumed from a variety of locations on the river over time and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates, and will provide information that will allow better-informed risk management decisions to be made (EPA, 1999c).
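To make the microexposure-event idea concrete, the sketch below simulates each angler's dose as a series of discrete fish-meal events whose frequency, fish concentration, and meal size are re-drawn event by event. All distributions and parameter values are illustrative placeholders rather than Housatonic data, and a full two-dimensional analysis would add an outer loop that re-samples the uncertain parameters themselves.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_angler(n_years=30):
    """Lifetime average daily dose (mg/kg-day) for one simulated angler,
    built up from discrete meal events. All inputs are hypothetical."""
    body_weight = rng.normal(70.0, 12.0)          # kg
    total_intake = 0.0                            # mg PCB ingested
    for _ in range(n_years):
        n_meals = rng.poisson(12)                 # meal frequency re-drawn each year
        for _ in range(n_meals):
            conc = rng.lognormal(0.0, 1.0)        # mg/kg PCB in fillet (placeholder)
            meal_size = max(rng.normal(0.227, 0.05), 0.0)   # kg per meal
            cooking_loss = 0.3                    # fraction lost in cooking
            total_intake += conc * meal_size * (1.0 - cooking_loss)
    return total_intake / (body_weight * n_years * 365.0)

# Simulating many anglers yields a dose distribution, not a single point estimate.
doses = np.array([simulate_angler() for _ in range(500)])
print(f"median dose: {np.median(doses):.2e} mg/kg-day")
print(f"95th percentile: {np.percentile(doses, 95):.2e} mg/kg-day")
```

Because each meal's concentration and size are sampled independently, the simulated long-term dose reflects averaging over many events, which is why event-level models tend to produce lower upper-percentile doses than point estimates that hold every input at a conservative value.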

9. Use of 95% UCLs or maximum values as exposure point concentrations

The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).

GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment, and as demonstrated by EPA contractors and consultants (EPA, 1997c; Schulz and Griffin, 1999), use of the H-statistic to derive a 95% UCL can, and frequently does, produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.

Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½ Mile Reach and for the Areas Outside the River: a spatial averaging approach. That approach will (a) allow calculation of EPCs that are much more truly representative of actual exposures than EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper ½ Mile Reach and the Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling, if necessary, to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.


However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A, and as demonstrated by Singh et al. (EPA, 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance; provides more reliable estimates of the 95% UCLs; considers the location and density of each sample in each exposure area; mitigates the statistical impact of a small database; and can be used regardless of the distribution of the sampling data.

References

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A), Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1990. National Contingency Plan. U.S. Environmental Protection Agency, Washington, DC.

EPA. 1992a. Final Guidelines for Exposure Assessment; Notice. U.S. Environmental Protection Agency, Washington, DC. 57 Federal Register 22888-22938. May 29.

EPA. 1992b. Guidance on Risk Characterization for Risk Managers and Risk Assessors. Memorandum from F. Henry Habicht III, Deputy Administrator, to Assistant Administrators and Regional Administrators. Office of the Administrator, Washington, DC.

EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

EPA. 1993. Memo from W.H. Farland and H.L. Longest to Regional Directors re: use of IRIS values in Superfund risk assessments. U.S. Environmental Protection Agency, Washington, DC. OSWER Directive 9285.7-16. December 21.

EPA. 1995a. Policy for Risk Characterization at the U.S. Environmental Protection Agency. U.S. Environmental Protection Agency, Office of the Administrator, Washington, DC. March.

EPA. 1995b. Guidance for Risk Characterization. U.S. Environmental Protection Agency, Science Policy Council, Washington, DC. February.

EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the U.S. Environmental Protection Agency. U.S. Environmental Protection Agency, Office of Research and Development, Washington, DC. May 15.

EPA. 1997b. Guiding Principles for Monte Carlo Analysis. U.S. Environmental Protection Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 p.

EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1998. Memo from Timothy Fields, Acting Administrator, to Regional Directors re: approach for addressing dioxin in soil at CERCLA and RCRA sites. U.S. Environmental Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.

EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, U.S. EPA, to Jane Magee, General Electric Company, re: Pittsfield 1-0147, EPA Area 6, Housatonic River Human Health Risk Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader, re: Protectiveness of Cleanup Levels for Removal Actions Outside the River - Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, U.S. Dist. Court for Dist. of Mass.). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual, Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2 Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3, Part A: Process for Conducting Probabilistic Risk Assessment. Draft, Revision 5. U.S. Environmental Protection Agency, Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the U.S. Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition, December 4-7, Baltimore, MD.

Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA4203. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.

Johnson, T. and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA, Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds, Proceedings, Dioxin '96: 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen 30:61-65.

Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC. October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meetings Proceedings. Abstract.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition. Abstract.

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan, Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19. November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD. December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4):823-843.

Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 11.

ATTACHMENT A

A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS

Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co., Inc.


Introduction

The U.S. Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston, 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets for which no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and on how well they represent the actual levels of exposure that are likely to occur under various scenarios.

There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that, when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.

GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating EPCs that will produce more representative results. In GE's view, the most appropriate approach is the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic: namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.

Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL

In 1992, EPA issued a guidance document (EPA, 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.

The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA, 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA, 1992):

    UCL = exp(x̄ + 0.5s² + sH/√(n − 1))

Where:

    UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean
    x̄ = the mean of the log-transformed data
    s² = the variance of the log-transformed sample distribution
    s = the standard deviation of the log-transformed sample distribution
    H = a statistic, developed by Land (1975), accounting for the shape of the distribution
    n = the number of analytical samples
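To illustrate how this formula behaves, the short sketch below applies it to a small, positively skewed, hypothetical sample (not site data). Note that in practice the H value must be looked up in Land's (1975) tables as a function of the log-standard deviation and sample size; the value used here is illustrative only.

```python
import math

def h_ucl(data, H):
    """Land's H-UCL: exp(xbar + 0.5*s^2 + s*H/sqrt(n - 1)), where xbar and s
    are the mean and standard deviation of the log-transformed data.
    H must come from Land's (1975) tables; the caller supplies it."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    xbar = sum(logs) / n
    s2 = sum((y - xbar) ** 2 for y in logs) / (n - 1)   # sample variance of logs
    return math.exp(xbar + 0.5 * s2 + math.sqrt(s2) * H / math.sqrt(n - 1))

# Hypothetical skewed sample (mg/kg); H = 3.0 is illustrative, not a table value.
data = [0.2, 0.4, 0.5, 0.8, 1.1, 1.6, 2.4, 9.7]
mean = sum(data) / len(data)
ucl = h_ucl(data, H=3.0)
print(f"arithmetic mean = {mean:.2f} mg/kg; H-UCL = {ucl:.2f} mg/kg")
```

Even on this modest sample, the H-UCL lands several times above the arithmetic mean, which is the inflation described in the discussion that follows.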

Frequently, the H-statistic produces extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.

Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and its associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.

Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA, 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.

Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when an exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.

Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and it does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, "receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations," where exposure is equally likely throughout a given area (EPA, 1999).

In view of these considerations, EPA contractors and consultants have recommended that the H-statistic not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA, 1997a) recommended that, in environmental applications, the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided. These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less," and that "[u]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).
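A minimal sketch of the (non-spatial) bootstrap alternative, using a hypothetical sample rather than site data: the 95% UCL of the arithmetic mean is read directly from the distribution of resampled means, with no distributional assumption about the data.

```python
import numpy as np

def bootstrap_ucl(data, n_boot=5000, conf=0.95, seed=0):
    """Nonparametric bootstrap UCL of the arithmetic mean: resample the data
    with replacement, record each resample's mean, and take the `conf`
    quantile of those means."""
    rng = np.random.default_rng(seed)
    arr = np.asarray(data, dtype=float)
    means = np.array([rng.choice(arr, size=arr.size, replace=True).mean()
                      for _ in range(n_boot)])
    return float(np.quantile(means, conf))

# Hypothetical skewed sample (mg/kg), not site data.
data = [0.2, 0.4, 0.5, 0.8, 1.1, 1.6, 2.4, 9.7]
ucl = bootstrap_ucl(data)
print(f"arithmetic mean = {np.mean(data):.2f} mg/kg; bootstrap 95% UCL = {ucl:.2f} mg/kg")
```

Unlike the H-UCL, this estimate cannot exceed the largest observed value; a spatial bootstrap differs only in that each sample is drawn with a probability proportional to the area it represents.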


Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the lower Housatonic River; consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.

This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, unbiased estimates of how concentration and variance change with distance are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava, 1989).

Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest of the River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so practical is that the construction of Thiessen polygons is a standard, built-in operation in most commercially available Geographic Information System (GIS) applications.

The area inside each polygon is then used to weight the samples in the statistical calculations, which is in essence saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

    X̄ = Σᵢ pᵢCᵢ

Where:

    X̄ = the area-weighted mean concentration (mg/kg)
    Cᵢ = the concentration representing the condition within polygon i, where there are i = 1 through n polygons
    pᵢ = the proportion of the total exposure area that is incorporated in polygon i (unitless)
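Once the polygon areas are in hand, the weighted-mean calculation itself is simple. The sketch below uses hypothetical concentrations and Thiessen-polygon areas in which the two highest results sit in small polygons clustered at a suspected hot spot, so area weighting discounts them.

```python
def area_weighted_mean(concs, areas):
    """Spatial average: each concentration is weighted by the fraction of the
    total exposure area its polygon covers (p_i = area_i / total area)."""
    total = sum(areas)
    return sum(c * a / total for c, a in zip(concs, areas))

# Hypothetical data: concentrations (mg/kg) and Thiessen polygon areas (m^2).
# The 12.0 and 9.0 results come from small polygons at a suspected hot spot.
concs = [12.0, 9.0, 1.0, 0.5, 0.3]
areas = [50.0, 75.0, 400.0, 500.0, 475.0]
simple_mean = sum(concs) / len(concs)        # gives clustered samples undue weight
spatial_mean = area_weighted_mean(concs, areas)
print(f"unweighted mean    = {simple_mean:.2f} mg/kg")
print(f"area-weighted mean = {spatial_mean:.2f} mg/kg")
```

Because the clustered high results represent only a small fraction of the total area, the spatial average falls well below the unweighted mean of the same data.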

This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE, and approved by EPA, for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then the average concentrations for each grid were


averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the:

    [e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods (such as kriging or triangulation) that take into account the spatial distribution of contamination to characterize exposure. (p. 15)

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and


representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below, and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process whereby a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal


distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that, instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
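The procedure just described can be sketched in Python; the dataset below is hypothetical and the percentile-of-means estimator is shown in its simplest form:

```python
import random

def bootstrap_ucl95(data, n_iter=5000, seed=1):
    """Percentile-bootstrap 95% UCL of the mean: resample the dataset with
    replacement, archive each resample's mean, and take the 95th percentile
    of the resulting distribution of means."""
    rng = random.Random(seed)  # fixed seed for a reproducible sketch
    n = len(data)
    means = sorted(sum(rng.choices(data, k=n)) / n for _ in range(n_iter))
    return means[int(0.95 * n_iter) - 1]  # 95th percentile of the means

# Hypothetical skewed concentration data (mg/kg):
pcb = [0.3, 0.9, 1.6, 2.4, 4.8, 7.1, 12.5, 19.8]
ucl = bootstrap_ucl95(pcb)
```

Because every bootstrap mean is an average of observed values, the resulting UCL sits above the sample mean but, unlike an H-UCL on a small skewed dataset, can never exceed the maximum observed concentration.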

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.
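For reference, the H-UCL discussed in these comparisons is Land's (1975) estimator, computed from the mean (ȳ) and standard deviation (s) of the log-transformed concentrations:

    UCL95 = exp[ ȳ + s²/2 + (s × H0.95) / √(n − 1) ]

where H0.95 is Land's tabulated factor. Because H0.95 grows rapidly with s when n is small, a highly variable dataset can drive this estimate well above every measured concentration, which is the behavior at issue in these comments.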

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that accounts for the locations and densities of the samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.


Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
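The three steps above can be sketched as follows; the concentrations and areas are illustrative, and the iteration counts mirror those described in the text:

```python
import random

def spatial_bootstrap_ucl95(conc, areas, expand_factor=5,
                            n_per_iter=250, n_iter=5000, seed=1):
    """Spatial (area-weighted) bootstrap sketch following the three steps above."""
    # Steps 1-2: normalize each polygon area to the smallest polygon, round to
    # the nearest integer, and expand the dataset so that each sample appears
    # round(area_i / min_area) * expand_factor times.
    min_area = min(areas)
    expanded = []
    for c, a in zip(conc, areas):
        expanded.extend([c] * (round(a / min_area) * expand_factor))
    # Step 3: draw bootstrap means (with replacement) from the expanded dataset
    # and take the 95th percentile of the distribution of means.
    rng = random.Random(seed)  # fixed seed for a reproducible sketch
    means = sorted(sum(rng.choices(expanded, k=n_per_iter)) / n_per_iter
                   for _ in range(n_iter))
    return means[int(0.95 * n_iter) - 1]

# Illustrative inputs (hypothetical): a large, high-concentration polygon
# dominates the area-weighted result.
conc = [1.0, 2.0, 10.0]           # mg/kg
areas = [100.0, 200.0, 700.0]     # ft^2 Thiessen polygon areas
print(spatial_bootstrap_ucl95(conc, areas))
```

The 50th percentile of the bootstrap distribution converges on the area-weighted mean (here 0.1×1 + 0.2×2 + 0.7×10 = 7.5 mg/kg), and the 95th percentile provides the UCL.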


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has, in the past, often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area, and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.
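The quoted summary statistics can be reproduced directly from the Table 1 values (a check, assuming the concentration and polygon-area columns pair row by row in the order listed):

```python
# Total PCB concentrations (ppm) for the 30 hypothetical samples in Table 1:
pcb = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
       7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
       15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6]
# Corresponding Thiessen polygon areas (ft^2) from Table 1:
area = [138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121, 3697, 13297,
        33087, 65541, 57494, 9014, 9994, 2123, 2341, 7446, 2613, 44231,
        2362, 8745, 5008, 18762, 24296, 51356, 12996, 18019, 119814, 132012]

mean = sum(pcb) / len(pcb)
weighted = sum(c * a for c, a in zip(pcb, area)) / sum(area)

print(f"range: {min(pcb)} to {max(pcb)} ppm")    # 0.022 to 42.6 ppm
print(f"arithmetic mean: {mean:.1f} ppm")         # 13.3 ppm
print(f"area-weighted mean: {weighted:.1f} ppm")  # somewhat higher, as noted in the text
```

The area-weighted mean comes out above the arithmetic mean because the two highest concentrations (samples 29 and 30) sit in the two largest polygons.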


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) the EPA approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained through straight bootstrapping. This is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are weighted more heavily in the approaches that include area-weighting.

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large (containing 30 samples), and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few


showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL that exceeds the bootstrapped estimates by a factor of more than six, and even exceeds the highest measured concentration, with the result that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly concentrated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above, and as demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to be experienced by individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River, and to collect the data


necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993), and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and for the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or for areas where EPA determines that the data are insufficient to support that approach and decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.


EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A.K. Singh, and M. Engelhardt. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. U.S. Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River - Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.


Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
     1                138,219               0.022
     2                105,217               0.028
     3                110,214               1.07
     4                 31,021               1.63
     5                 11,399               1.67
     6                  7,004               2.01
     7                 14,268               2.19
     8                 12,121               3.25
     9                  3,697               3.77
    10                 13,297               7.58
    11                 33,087               7.61
    12                 65,541               7.62
    13                 57,494               7.71
    14                  9,014               7.99
    15                  9,994               8.72
    16                  2,123               9.69
    17                  2,341               9.72
    18                  7,446              12.7
    19                  2,613              14.3
    20                 44,231              14.3
    21                  2,362              15.4
    22                  8,745              16.3
    23                  5,008              18.7
    24                 18,762              19.1
    25                 24,296              20.2
    26                 51,356              30.8
    27                 12,996              32.9
    28                 18,019              38.9
    29                119,814              40.6
    30                132,012              42.6

[Table 2. Calculated Mean and 95% UCL Exposure Point Concentrations for the Hypothetical Dataset, comparing the EPA approach (arithmetic mean and H-statistic), the area-weighted average, the bootstrap, and the spatial bootstrap - table values not legible in this copy]


are known to bioaccumulate in biological tissues, but do not exceed the PRGs, will be further evaluated for possible inclusion in the risk assessment.

GE has several concerns about this proposed two-step approach.

First, GE believes that if the concentration of non-PCB constituents in a given area does not exceed the criteria for further sampling (as listed above), those constituents should be dropped from further consideration and not carried through to the second step in selecting COPCs. As EPA has recognized in the Statement of Work for Removal Actions Outside the River (SOW), which is Appendix E to the Consent Decree, concentrations of non-PCB constituents that are consistent with local background levels, or are less than the 1 ppb TEQ concentration for dioxins/furans, or are below the applicable Method 1 soil standards, are considered to pose no significant risk to human health. Hence, such constituents need not, and should not, be considered further in the selection of COPCs.

Second, for those constituents that are carried forward to the second step in the COPC selection process, EPA's proposed primary approach for selecting COPCs (i.e., comparison of maximum concentrations to PRGs) is overly restrictive. In recent discussions with GE, EPA explained that it is currently planning to use the EPA Region 9 PRGs for this purpose, but to reduce the non-cancer-based PRGs by 10 times, due to the fact that those PRGs were based on a non-cancer hazard quotient (HQ) of 1.0, not 0.1. This approach will likely lead to the selection of a long list of COPCs, many or most of which in fact make no significant contribution to actual risks. This would undercut the purpose of the screening approach, and will greatly and needlessly complicate the risk assessment without any corresponding benefit in terms of achieving additional health protection. Instead, GE recommends that, in selecting the COPCs for the Rest of the River, EPA should use an approach similar to the approach approved by EPA and specified in the SOW for Areas Outside the River, including the non-bank portions of certain floodplain properties. This approach utilizes not only

1 The Work Plan also states that, in addition to these specified criteria, EPA will also use other applicable criteria in the first step of this process, but it does not say what those additional criteria are. EPA should specify those criteria.


the Region 9 PRGs, but also the other criteria listed on page 6-30 of the Work Plan, as discussed below.

For dioxins and furans, EPA should use the PRGs for dioxin/furan TEQs that are prescribed in recent guidance from EPA headquarters for use in addressing these constituents in soil at CERCLA and RCRA sites (EPA, 1998). Those PRGs are 1 ppb for dioxin/furan TEQs in residential areas (also applicable to recreational areas) and 5-20 ppb for dioxin/furan TEQs in commercial/industrial areas. The SOW provides specifically for use of these PRGs in screening out dioxin/furan compounds in soil in Areas Outside the River. EPA should likewise use these PRGs in selecting COPCs for the Rest of the River. (Indeed, as discussed below, GE believes that these PRGs should be used not only as screening criteria, but also as remedial goals, which is consistent with the approach embodied in the SOW for Areas Outside the River. See Section B.2 below.)

For other constituents, EPA should follow a step-wise approach that begins with comparisons to the Region 9 PRGs. Since these PRGs are highly conservative, and are specified in the SOW for use in screening such constituents without further reduction, they should not be reduced by an additional 10 times. For constituents for which such PRGs do not exist, a similar screening step should be employed using PRGs for surrogate compounds.2

EPA should then apply the other criteria listed on page 6-30 of the Work Plan (i.e., comparison to background levels and MCP Method 1 standards). Comparison of concentrations of non-PCB constituents to site-specific background levels (e.g., concentrations upstream of the GE facility) is appropriate to ensure that only those constituents attributable to releases from the GE facility are included in the risk assessment. This step is allowed in EPA's Risk Assessment Guidance for

2 Generally, when toxicity criteria have not been derived for particular constituents, it is because there are no reliable toxicity data on which to base them. If such constituents are not screened, but are simply carried through to the risk assessment as COPCs, that would unnecessarily complicate the risk assessment and would result in largely meaningless or highly uncertain estimates, given the lack of reliable toxicity data and criteria. Hence, EPA should screen these compounds using PRGs for surrogate compounds, as described in GE's June 3, 1999 comments (p. 6).


Superfund (EPA, 1989) and has been accepted by EPA as a basis for selecting non-PCB COPCs for both soils and sediments in the Upper ½ Mile Reach of the Housatonic River (as stated in the Upper ½ Mile Reach Work Plan) and for Areas Outside the River (as provided in the SOW). It is also consistent with the MCP, which provides that if the concentration of a constituent is at or below background levels, then that constituent will be considered to pose No Significant Risk (310 CMR 40.0902(3)). Similarly, comparison to the MCP Method 1 soil standards is appropriate because those standards have been developed as conservative risk-based standards (using a target cancer risk of 1 x 10-6) that will ensure protection of human health and the environment. Again, EPA has already accepted comparisons to those standards as a basis for selecting non-PCB COPCs in soil for the Upper ½ Mile Reach of the River and Areas Outside the River. Moreover, in its Engineering Evaluation/Cost Analysis (EE/CA) for the 1½ Mile Reach of the River, EPA itself has used the MCP Method 1 soil standards, as well as background levels, as a basis for evaluating non-PCB constituents in bank soils (Weston, 2000b). Thus, use of comparisons to background levels and MCP Method 1 standards in selecting COPCs for the Rest of the River is justified by considerable precedent, and will ensure that a consistent approach is employed to address such constituents throughout the GE-Pittsfield/Housatonic River Site. EPA has presented no sound basis for departing from that approach for the Rest of the River portion of the Site.

Moreover, use solely of the maximum detected concentrations in selecting COPCs from these constituents is overly conservative and inappropriate. For example, comparisons to site-specific background levels should be made using an appropriate statistical technique or summary statistics, as described in MDEP (1995) guidance, while comparisons to the MCP Method 1 standards are generally to be made based on average exposure point concentrations (310 CMR 40.0926(3)). Again, these approaches have been agreed upon by EPA as fully protective for Areas Outside the River, as set forth in the SOW, and should likewise be used for the Rest of the River to eliminate constituents that are not likely to contribute to Site-related risks.

Finally, the Work Plan does not explain how the potential for bioaccumulation (discussed on p. 6-31) will be considered, or what criteria will be used to examine this. Unless the approach used


to identify and select such COPCs is clearly defined, it could result in the unnecessary inclusion of certain constituents. EPA should provide supplemental information that outlines the specific approach and criteria that will be used to identify and select these constituents for inclusion.

2. Risk assessment of dioxins/furans

The Work Plan states that EPA intends to evaluate the potential risks associated with dioxins and furans by converting them to TEQs (toxic equivalents of 2,3,7,8-TCDD) using Toxicity Equivalency Factors (TEFs), and then applying the provisional cancer slope factor (CSF) of 150,000 (mg/kg-day)-1 listed on HEAST for 2,3,7,8-TCDD (Section 6.3.3.2.4). As discussed in detail in GE's June 3, 1999 comments (pp. 19-22), the TEF approach has substantial limitations, and the provisional CSF of 150,000 (mg/kg-day)-1 is uncertain and unsupportable. A wide range of CSFs, spanning from 9,000 to 156,000 (mg/kg-day)-1, have been proposed for TCDD under the linear nonthreshold cancer model, with differences resulting from the selection of the tumor classification scheme, the interspecies scaling factor used, and the model used for low-dose extrapolation. Thus, there is enormous uncertainty associated with any selected CSF.

In these circumstances, GE believes that, to the extent possible in this risk assessment, EPA should evaluate dioxins and furans through application of the dioxin/furan TEQ PRGs set forth in EPA (1998) guidance and discussed above, using those PRGs as remedial goals as well as screening levels, rather than conducting forward risk calculations for these compounds. This approach would avoid the uncertainties associated with selecting a particular CSF for TCDD, and is consistent with the approach adopted by EPA for Areas Outside the River, as described in the SOW.

However, to the extent that EPA includes dioxins and furans in forward risk calculations, GE believes that EPA should use a CSF of 30,000 (mg/kg-day)-1. As shown in our June 3, 1999 comments, based on current information, that CSF is more supportable than the CSF of 150,000 (mg/kg-day)-1 specified in the Work Plan.


3. Use of TEF/TEQ approach for dioxin-like PCBs

In the Work Plan, EPA also proposes to use the TEF/TEQ approach for estimating risks of dioxin-like PCB congeners (Section 6.3.3.2.5). Under this approach, the concentrations of these PCB congeners are converted to toxic equivalents (TEQs) of 2,3,7,8-TCDD using TEFs, and the provisional CSF of 150,000 (mg/kg-day)-1 for 2,3,7,8-TCDD is then applied to estimate the risks.
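Mechanically, the conversion just described is a simple weighted sum; a minimal sketch follows (the TEFs shown are the WHO 1998 values for two dioxin-like congeners, included for illustration only; the Work Plan's own TEF table would govern in practice, and the sample concentrations are hypothetical):

```python
# TEQ = Σ (concentration_i × TEF_i) over the dioxin-like congeners, expressing
# the mixture as an equivalent concentration of 2,3,7,8-TCDD.

# Illustrative WHO-1998 TEFs for two dioxin-like PCB congeners (assumed values
# for this sketch):
TEFS = {"PCB-126": 0.1, "PCB-169": 0.01}

def tcdd_teq(congener_conc_ppb):
    """Convert congener concentrations (ppb) to a 2,3,7,8-TCDD TEQ (ppb)."""
    return sum(conc * TEFS[name] for name, conc in congener_conc_ppb.items())

sample = {"PCB-126": 0.004, "PCB-169": 0.020}  # hypothetical congener data
print(tcdd_teq(sample))  # 0.004*0.1 + 0.020*0.01 ≈ 0.0006 ppb TEQ
```

The TCDD CSF is then applied to the resulting TEQ, which is why the reliability of both the TEFs and the TCDD CSF is central to the concerns discussed here.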

In its June 3, 1999 comments, GE presented several reasons why this approach is not appropriate or scientifically defensible. In discussions with GE in November 1999, EPA stated that, since it will make risk management decisions on the basis of total PCB concentrations rather than congener-specific data, it intends to use this TEF/TEQ approach for PCBs only if it finds a good correlation between the concentrations of the dioxin-like congeners and concentrations of total PCBs. There is no indication in the Work Plan, however, that EPA has established a good correlation between the congener-specific data and the total PCB data. Indeed, in more recent discussions, EPA has indicated that it has not, to date, evaluated this issue. If and when EPA makes such an analysis, GE requests the opportunity to review the analysis and discuss it with EPA. The absence of such a correlation would itself make the TEQ data irrelevant to risk management decisions.

In any event, GE continues to believe, for the same reasons given in its June 3, 1999 comments (pp. 23-25), that application of the TEF/TEQ approach to PCBs is not scientifically supportable. These reasons include: (a) the lack of sufficient congener-specific data to provide reliable estimates of true congener concentrations; (b) the double counting of carcinogenic risks that would result from use of both the TCDD CSF and the PCB CSF in the same assessment; and (c) the uncertainties and unsupportability of the TEFs for PCB congeners and of the CSF (for 2,3,7,8-TCDD) to be used to evaluate the resulting TEQs.

4. Failure to consider alternative CSFs and RfDs for PCBs

The Work Plan states that EPA intends to use the upper-bound CSF for PCBs of 2 (mg/kg-day)-1 and the Reference Dose (RfD) for Aroclor 1254 of 2 x 10-5 mg/kg-day, as published in its Integrated Risk Information System (IRIS) (Sections 6.3.3.2.3 and 6.3.3.3.2). In its June 3, 1999 comments (pp. 16-19 and Appendix A), GE presented substantial reasons why these toxicity values overestimate the carcinogenic potential and non-carcinogenic impacts of PCBs, and it recommended the development of alternative toxicity values for PCBs. EPA has made no reference to this presentation in its Work Plan.

This failure to consider the alternative toxicological information presented by GE is unlawful. Under a settlement agreement with GE (Settlement Agreement in General Electric Company v. Browner, No. 93-1251, D.C. Cir., October 25, 1993) and a subsequent EPA guidance memorandum (EPA, 1993), if an outside party questions the use of IRIS values during the course of an EPA risk assessment and presents alternative toxicological information that may be used in place of the IRIS values, EPA has an obligation to consider all credible and relevant evidence before it. There is no indication that EPA has considered the alternative toxicological information presented by GE. Rather, it has ignored GE's showing and chosen simply to defer to the current IRIS values (which is precisely the course of conduct that the settlement and guidance prohibit). EPA should rectify this deficiency by now giving careful consideration to the alternative toxicological information presented by GE.

5. Use of subchronic RfD for PCBs

The Work Plan states that EPA will consider using the subchronic RfD for PCBs only if it determines that exposure is intermittent enough to result in an exposure period that is substantially reduced (p. 6-42). Later, on page 6-43, the Work Plan states that this will occur when the total exposure period is substantially below the 7-year period that is considered by EPA to represent chronic exposure. There is no further discussion of this issue, so it is not clear what the Agency will use as a benchmark to identify intermittent exposure.

Many of the proposed exposure scenarios are likely to be extremely intermittent in nature. For example, in developing the SRBC for utility workers, EPA assumes that exposure occurs 5 days/year for a period of 25 years. Clearly, 5 days/year is intermittent exposure. In fact, even when considered over the entire 25-year exposure period, the total exposure consists of 125 days, which amounts to about 1/3 of a year, or less than 5 percent of the 7-year chronic exposure period. This exposure would certainly be considered intermittent, despite the fact that the total exposure period lasts for 25 years.


As a second example, in developing its SRBCs for recreational exposures, EPA has assumed that children aged 1-6 are exposed for six years during recreational activities. Even using the highly conservative exposure frequency for high contact areas of 3 days/week for a period of 7 months per year (a total of 84 days/year, or 504 days for the entire exposure period), the number of days exposed is equivalent to 1.4 years, or less than 20 percent of the available days during the 7-year chronic exposure period. Nevertheless, as stated on page 6-42, EPA does not intend to use the subchronic RfD to evaluate noncancer hazards to young children.
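The arithmetic behind these two examples can be checked directly (the exposure frequencies and durations are those cited above from the Work Plan's SRBC scenarios):

```python
# Verify the intermittency arithmetic for the two scenarios discussed above.
CHRONIC_YEARS = 7.0  # EPA's benchmark chronic exposure period

# Utility worker: 5 days/year for 25 years
utility_days = 5 * 25                    # 125 total exposure days
utility_fraction = (utility_days / 365.0) / CHRONIC_YEARS

# Child recreation: 3 days/week for 7 months/year (~84 days/year), 6 years
child_days = 84 * 6                      # 504 total exposure days
child_fraction = (child_days / 365.0) / CHRONIC_YEARS

print(f"utility worker: {utility_days} days = {utility_fraction:.1%} of chronic period")
print(f"child recreation: {child_days} days = {child_fraction:.1%} of chronic period")
```

The fractions come out to roughly 4.9 and 19.7 percent, matching the "less than 5 percent" and "less than 20 percent" figures in the text.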

In short, the Work Plan states that EPA will use the subchronic RfD when exposures during a given scenario are sufficiently intermittent to result in a total exposure that is far less than the 7-year chronic exposure period identified by EPA. It is clear, however, that even for scenarios where this intermittent exposure is likely, EPA does not intend to incorporate the subchronic RfD. EPA should reconsider its position on the use of the subchronic RfD and incorporate it into those scenarios where it is relevant.

6. Inappropriate use of a single exposure scenario for all direct-contact recreational activities

The Work Plan provides that, while consumption of fish and game will be evaluated separately, all direct-contact recreational activities will be evaluated using a single exposure scenario and a single set of exposure assumptions (pp. 6-52 - 6-53). These activities include hiking, walking, canoeing, picnicking, swimming, wading, and dirt biking, as well as fishing and hunting (but not consumption of fish or game). In its June 3, 1999 comments (pp. 11-12, 29-30), GE explained that use of a single exposure scenario and set of assumptions for such diverse activities is inappropriate because it will force EPA either to use the most conservative assumptions for any of these activities (which would clearly result in grossly overestimated exposure estimates) or to pick compromise parameters that will not be truly representative of any of these activities (which would be subject to considerable uncertainty). Based on discussions with EPA in the fall of 1999, GE had understood that the Agency recognized this problem and intended to consider further whether and how to divide this single direct-contact recreational scenario into a number of more specific scenarios (e.g., hiking/walking, canoeing, picnicking, swimming/wading). However, the Work Plan does not do that. GE again urges EPA to develop such specific scenarios so as to take account of the significant differences in exposure parameters among these types of activities. In doing so, actual recreational exposures will be more closely approximated, and more realistic and appropriate cleanup goals can be developed.

7. Lack of information about specific exposure assumptions to be used

As in the prior draft, the Work Plan does not present the specific exposure parameter values and equations to be used in EPA's risk assessment, but simply states that they will be developed based on EPA and MDEP guidance along with site-specific data (Section 6.3.4.5). GE requested such information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has advised GE that the Agency's risk assessment staff will hold a series of technical meetings with GE representatives and consultants, and at such meetings will be prepared to discuss the most appropriate exposure assumptions and parameter values to be used. GE reiterates its request for such meetings and looks forward to discussing these issues with EPA.

8. Use of probabilistic risk analysis

In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that probabilistic approaches, such as Monte Carlo analysis, will be considered for those COPCs and exposure pathways that result in significant risk estimates. However, it does not mention probabilistic risk assessment techniques in connection with any of the main risk assessments. In prior discussions with GE, EPA indicated that one of the principal reasons why it collected a large amount of fish tissue PCB data is for use in a probabilistic risk assessment of fish consumption. As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic exposure model in the main risk analysis of the fish consumption pathway. However, the Work Plan presents no discussion of, or plans for conducting, such an assessment. In fact, EPA does not even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk assessment guidance documents and information sources that EPA intends to use in conducting the HHRA.

During the last few years, EPA's policies and guidelines have focused on improving risk management by presenting decision-makers with the entire range of possible risks rather than a single point estimate (EPA, 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that numerical risk assessments should be accompanied by a full characterization of the uncertainties, limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D) probabilistic analysis, such as Microexposure analysis, can more effectively characterize the impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed population by considering time-dependent changes.

Microexposure analysis has been used by EPA and by independent researchers to simulate duration of residential exposure (EPA, 1992c; Johnson and Capel, 1992; Sielken, 1994). It also has been used to evaluate childhood exposures to lead (Goodrum et al., 1994), exposure to contaminants in tap water (Harrington et al., 1995), and exposure to dioxins from the consumption of freshwater fish (Keenan et al., 1993a,b, 1995, 1996a, 1997a,b). Microexposure analysis was employed in the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon Group, 1994; Keenan et al., 1996b). In addition, Microexposure analysis has been described in EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A (External Review Draft) as a viable alternative for modeling time-dependent variability in concentrations, daily activity patterns, and other behavioral exposure factors (EPA, 1999c).

Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of probabilistic risk analysis through application to the fish ingestion pathway at the Sangamo-Weston/Lake Hartwell Superfund site (Simon, 1999). Simon (1999) found that the Microexposure event model gave risk estimates that were approximately an order of magnitude lower than those derived using deterministic and other probabilistic approaches. He attributed this decrease to the power of the method to incorporate changes over time.

GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model, to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in Appendix E of its draft guidance for conducting probabilistic risk assessments (EPA, 1999c), some types of exposures, including fish consumption, can be expected to vary over time. Anglers may change their activity levels and their fish consumption behaviors as they age. In addition, different species and sizes of fish may be consumed from a variety of locations on the river over time, and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates, and will provide information that will allow better informed risk management decisions to be made (EPA, 1999c).
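To make the Microexposure concept concrete, the sketch below simulates anglers whose angling duration, meal frequency, meal size, and per-meal fish concentration all vary event by event, then summarizes the resulting distribution of lifetime average daily doses. Every distribution and parameter value here is a hypothetical placeholder, not a value proposed for the Housatonic assessment.

```python
import random

random.seed(42)

def simulate_angler():
    """One microexposure-event history: intake is accumulated meal by meal,
    so dose varies across simulated anglers rather than being a point estimate."""
    duration_years = random.randint(1, 30)        # years of angling (placeholder)
    total_intake_mg = 0.0
    for _ in range(duration_years):
        meals = random.randint(0, 40)             # fish meals in this year
        for _ in range(meals):
            conc_mg_kg = random.lognormvariate(0.0, 1.0)  # PCB in fillet, mg/kg
            meal_kg = random.uniform(0.1, 0.3)            # meal size, kg
            total_intake_mg += conc_mg_kg * meal_kg
    body_weight_kg = 70.0                         # held constant for simplicity
    exposure_days = duration_years * 365
    return total_intake_mg / (body_weight_kg * exposure_days)  # mg/kg-day

doses = sorted(simulate_angler() for _ in range(5000))
median = doses[len(doses) // 2]
p95 = doses[int(0.95 * len(doses))]
print(f"median dose: {median:.2e} mg/kg-day; 95th percentile: {p95:.2e}")
```

A full 2-D analysis would add an outer loop over uncertain parameters (e.g., the true concentration distribution), separating variability from uncertainty; this sketch shows only the inner, variability dimension.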

9. Use of 95% UCLs or maximum values as exposure point concentrations

The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% UCL of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).

GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment, and demonstrated by EPA contractors and consultants (EPA, 1997c; Schulz and Griffin, 1999), use of the H-statistic to derive a 95% UCL can, and frequently does, produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.

Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½ Mile Reach and for Areas Outside the River: a spatial averaging approach. That approach will (a) allow calculation of EPCs that are much more truly representative of actual exposures than use of EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper ½ Mile Reach and Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling, if necessary, to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.
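The spatial averaging concept can be sketched as a simple area-weighted mean, assuming each sample is assigned the area it represents within the exposure area (e.g., via Thiessen polygons); the concentrations and areas below are hypothetical.

```python
# Area-weighted (spatial) average EPC: each sample's concentration is weighted
# by the portion of the exposure area it represents.
def spatial_average(samples):
    """samples: list of (concentration in mg/kg, represented area in m^2)."""
    total_area = sum(area for _, area in samples)
    return sum(conc * area for conc, area in samples) / total_area

# Hypothetical exposure area: one small hot spot plus two larger, cleaner zones.
samples = [(12.0, 50.0), (3.0, 400.0), (0.5, 550.0)]
epc = spatial_average(samples)
print(f"spatial-average EPC: {epc:.2f} mg/kg")
```

Because the 12 mg/kg hot spot covers only 5 percent of the area, the spatial average stays near the concentrations a receptor moving randomly across the parcel would actually contact, rather than being driven by the maximum.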


However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A, and demonstrated by Singh et al. (EPA, 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.
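The non-spatial core of this alternative can be sketched as a percentile-bootstrap 95% UCL of the arithmetic mean; a true spatial bootstrap would additionally weight the resampling by each sample's represented area, which is omitted here, and the data are hypothetical.

```python
import random
import statistics

random.seed(7)

def bootstrap_ucl(data, n_boot=5000, percentile=0.95):
    """Percentile-bootstrap UCL: resample with replacement, collect the
    resample means, and take the chosen percentile of that distribution."""
    means = sorted(
        statistics.mean(random.choices(data, k=len(data)))
        for _ in range(n_boot)
    )
    return means[int(percentile * n_boot)]

pcb = [0.5, 0.8, 1.2, 2.0, 3.5, 6.0, 11.0, 45.0]   # hypothetical, skewed mg/kg data
ucl = bootstrap_ucl(pcb)
print(f"arithmetic mean: {statistics.mean(pcb):.2f}; bootstrap 95% UCL: {ucl:.2f}")
```

Unlike the H-statistic, the bootstrap makes no distributional assumption, and its UCL cannot exceed the maximum observed concentration.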

References

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A), Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1990. National Contingency Plan. US Environmental Protection Agency, Washington, DC.

EPA. 1992a. Final Guidelines for Exposure Assessment; Notice. US Environmental Protection Agency, Washington, DC. 57 Federal Register 14: 22888-22938. May 29.

EPA. 1992b. Guidance on Risk Characterization for Risk Managers and Risk Assessors. Memorandum from F. Henry Habicht III, Deputy Administrator, to Assistant Administrators and Regional Administrators. Office of the Administrator, Washington, DC.

EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General US Population. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

EPA. 1993. Memo from W.H. Farland and H.L. Longest to Regional Directors. Re: Use of IRIS values in Superfund risk assessments. US Environmental Protection Agency, Washington, DC. OSWER Directive 9285.7-16. December 21.

EPA. 1995a. Policy for Risk Characterization at the US Environmental Protection Agency. US Environmental Protection Agency, Office of the Administrator, Washington, DC. March.

EPA. 1995b. Guidance for Risk Characterization. US Environmental Protection Agency, Science Policy Council, Washington, DC. February.

EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the US Environmental Protection Agency. US Environmental Protection Agency, Office of Research and Development, Washington, DC. May 15.

EPA. 1997b. Guiding Principles for Monte Carlo Analysis. US Environmental Protection Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 p.

EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1998. Memo from Timothy Fields, Acting Administrator, to Regional Directors. Re: Approach for addressing dioxin in soil at CERCLA and RCRA sites. US Environmental Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.


EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, US EPA, to Jane Magee, General Electric Company. Re: Pittsfield 1-0147, EPA Area 6, Housatonic River Human Health Risk Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader. Re: Protectiveness of Cleanup Levels for Removal Actions Outside the River - Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, US Dist. Court for Dist. of Mass.). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual, Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2. Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3, Part A: Process for Conducting Probabilistic Risk Assessment. Draft, Revision 5. US Environmental Protection Agency, Office of Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the US Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition, December 4-7, Baltimore, MD.


Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA4203. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.

Johnson, T. and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General US Population. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA, Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds, Proceedings, Dioxin '96: 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen 30:61-65.


Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC. October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meeting Proceedings. Abstract.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition. Abstract.

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan, Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19. November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD. December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4):823-843.


Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 11.

ATTACHMENT A

A COMPARISON OF ALTERNATIVE METHODS

FOR CALCULATING EXPOSURE POINT CONCENTRATIONS

Prepared on Behalf of the General Electric Company

by Ogden Environmental and Energy Services Co., Inc.


Introduction

The US Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston, 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student-t statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets where no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and on their representation of the actual levels of exposure that are likely to occur under various scenarios.

There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student-t statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that, when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.

GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will result in more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic: namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.

Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL

In 1992, EPA issued a guidance document (EPA, 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.


The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region 1 (EPA, 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA, 1992):

UCL = exp(x-bar + 0.5*s^2 + s*H/sqrt(n-1))

Where:

UCL = a specified limit (i.e., the 95% UCL) on the estimate of the arithmetic mean

x-bar = the mean of the log-transformed data

s^2 = the variance of the log-transformed sample distribution

s = the standard deviation of the log-transformed sample distribution

H = a statistic, developed by Land (1975), that accounts for the shape of the distribution

n = the number of analytical samples
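As a rough illustration of how this equation behaves, the sketch below computes the H-UCL for a small, skewed dataset; the sample concentrations are hypothetical, and the H value is an illustrative placeholder rather than a lookup from Land's (1975) tables.

```python
import math
import statistics

def h_ucl(samples, h):
    """Land's H-UCL: exp(x-bar + 0.5*s^2 + s*H/sqrt(n-1)) on log-transformed data."""
    logs = [math.log(x) for x in samples]
    n = len(logs)
    x_bar = statistics.mean(logs)
    s = statistics.stdev(logs)  # standard deviation of the log-transformed data
    return math.exp(x_bar + 0.5 * s ** 2 + s * h / math.sqrt(n - 1))

data = [0.5, 0.8, 1.2, 2.0, 3.5, 6.0, 11.0, 45.0]  # hypothetical PCB results, mg/kg
ucl = h_ucl(data, h=3.0)                            # illustrative H value
print(f"max observed: {max(data)}, H-UCL: {ucl:.1f}")
```

With a log-scale standard deviation near 1.5, even this illustrative H value pushes the computed 95% UCL above the maximum observed concentration, which is the behavior criticized in the discussion that follows.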

Frequently, the H-statistic calculates extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.


Use of the H-statistic also does not calculate a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers," and not necessarily due to lognormally distributed data. Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.

Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA, 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.

Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when an exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.
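This mismatch can be illustrated with a short sketch. The helper name and the data below are hypothetical; the exponential estimator shown, exp(ȳ + s²/2), is the standard lognormal-theory estimate of the mean, and it can diverge from the ordinary arithmetic mean when the data are not truly lognormal:

```python
import math
import statistics as st

def lognormal_mean_estimate(data):
    """Exponential (lognormal-theory) estimator of the mean, exp(ybar + s^2/2),
    where ybar and s^2 are the mean and variance of the log-transformed data."""
    logs = [math.log(x) for x in data]
    return math.exp(st.mean(logs) + st.variance(logs) / 2.0)

# Hypothetical, right-skewed concentration data (ppm)
data = [0.5, 1.2, 2.0, 3.3, 4.8, 7.1, 9.4, 15.0, 60.0]
print(st.mean(data))                  # ordinary arithmetic mean
print(lognormal_mean_estimate(data))  # diverges when the data are not lognormal
```

When the two estimates disagree noticeably, that is one indication that lognormal-theory methods such as the H-statistic may misrepresent the true mean.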

Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA, 1999).
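The 1992 convention can be stated in one line. This is a sketch only; the function name is ours, and the 106 and 42.6 ppm figures are taken from the hypothetical sample calculation presented later in these comments:

```python
def epc_with_max_cap(ucl95, max_observed):
    """EPA's 1992 convention: if the calculated 95% UCL exceeds the maximum
    observed concentration, substitute the maximum as the EPC."""
    return min(ucl95, max_observed)

print(epc_with_max_cap(106.0, 42.6))  # the maximum, 42.6 ppm, becomes the EPC
```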

In view of these considerations, EPA contractors and consultants have recommended that the H-statistic should not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA, 1997a) recommended that in environmental applications "the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. . . . [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).
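Of the alternatives named in that passage, the Chebyshev inequality method is the simplest to state. A minimal, distribution-free sketch follows; the function name is ours, and the formula shown (x̄ + √(1/α − 1)·s/√n with α = 0.05) is the commonly used Chebyshev-based 95% UCL:

```python
import math
import statistics as st

def chebyshev_ucl95(data):
    """Distribution-free 95% UCL of the mean via the Chebyshev inequality:
    xbar + sqrt(1/alpha - 1) * s / sqrt(n), with alpha = 0.05."""
    n = len(data)
    return st.mean(data) + math.sqrt(1 / 0.05 - 1) * st.stdev(data) / math.sqrt(n)

print(chebyshev_ucl95([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Because it makes no distributional assumption, this bound is conservative but remains stable for datasets that fit neither a normal nor a lognormal model.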


Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the lower Housatonic River. Consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL, because it is unlikely to be representative of the overall exposure area.

This problem can be substantially mitigated by area-weighting the sample results, so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, unbiased estimates of how concentration and variance change with distance are used to construct a spatial grid of estimated concentrations (ordinary kriging; Isaaks and Srivastava, 1989).

Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.

The area inside each polygon is then used to weight the samples in statistical calculations, which is, in essence, saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

X̄_sc = Σ (p_i × C_i), summed over i = 1 through n

Where:

X̄_sc = area-weighted mean concentration (mg/kg)
C_i = the concentration representing the condition within polygon i, where there are i = 1 through n polygons, and
p_i = the proportion of the total area that is incorporated in polygon i (unitless)
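The calculation above can be sketched in a few lines; the function name and the two-sample numbers below are hypothetical:

```python
def area_weighted_mean(concentrations, areas):
    """Area-weighted mean: each sample's concentration is weighted by the
    fraction of the total exposure area its Thiessen polygon represents."""
    total = float(sum(areas))
    return sum(c * (a / total) for c, a in zip(concentrations, areas))

# Hypothetical: a 10 ppm sample in a small polygon, 2 ppm in a larger one
print(area_weighted_mean([10.0, 2.0], [1000.0, 3000.0]))  # 4.0, not the simple mean of 6.0
```

Note that the weights p_i = a_i / Σa sum to one, so the result is a true weighted average that falls between the minimum and maximum sample concentrations.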

This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE, and approved by EPA, for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then the average concentrations for each grid were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the

[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure. (p. 15)

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously and demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process where a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
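The procedure just described can be sketched compactly. The function name, seed, and data are ours, and the iteration count is illustrative:

```python
import random
import statistics as st

def bootstrap_ucl95(data, iterations=5000, seed=0):
    """Bootstrap 95% UCL of the mean: resample the dataset with replacement,
    archive each resample's mean, and take the 95th percentile of the means."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(st.mean(rng.choices(data, k=n)) for _ in range(iterations))
    return means[int(0.95 * iterations)]

# Hypothetical concentrations (ppm)
data = [0.5, 1.2, 2.0, 3.3, 4.8, 7.1, 9.4, 15.0, 60.0]
print(bootstrap_ucl95(data))  # falls above the arithmetic mean, below the maximum
```

Because each resample has the same size as the original dataset, the spread of the bootstrapped means reflects the sampling variability of the mean itself, which is what the UCL is meant to capture.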

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.


Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
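The three steps can be combined in a compact sketch. The function name and data are ours; for simplicity this version expands the dataset by the rounded area ratios directly (rather than multiplying by an additional expansion factor) and uses a fixed subsample size of 250:

```python
import random
import statistics as st

def spatial_bootstrap_ucl95(concs, areas, iterations=5000, subsample=250, seed=0):
    """Spatial bootstrap: replicate each sample in proportion to its polygon's
    normalized (rounded) area, then bootstrap means from the expanded dataset."""
    smallest = min(areas)
    expanded = []
    for c, a in zip(concs, areas):
        expanded.extend([c] * max(1, round(a / smallest)))
    rng = random.Random(seed)
    means = sorted(st.mean(rng.choices(expanded, k=subsample))
                   for _ in range(iterations))
    return means[int(0.95 * iterations)]

# Hypothetical: the 10 ppm sample's polygon is three times the 1 ppm sample's
print(spatial_bootstrap_ucl95([1.0, 10.0], [100.0, 300.0]))
```

Because the replication weights each sample by its polygon area, the center of the resulting distribution sits at the area-weighted mean rather than the simple arithmetic mean.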


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) the EPA approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing, in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly concentrated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above and demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area, and hence that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993) and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.


EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A.K. Singh, and M. Engelhardt. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision, Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. U.S. Environmental Protection Agency, Region I. October.

Isaaks, E.H. and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River, Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.


Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft²)    Total PCB (ppm)
 1               138,219               0.022
 2               105,217               0.028
 3               110,214               1.07
 4                31,021               1.63
 5                11,399               1.67
 6                 7,004               2.01
 7                14,268               2.19
 8                12,121               3.25
 9                 3,697               3.77
10                13,297               7.58
11                33,087               7.61
12                65,541               7.62
13                57,494               7.71
14                 9,014               7.99
15                 9,994               8.72
16                 2,123               9.69
17                 2,341               9.72
18                 7,446               12.7
19                 2,613               14.3
20                44,231               14.3
21                 2,362               15.4
22                 8,745               16.3
23                 5,008               18.7
24                18,762               19.1
25                24,296               20.2
26                51,356               30.8
27                12,996               32.9
28                18,019               38.9
29               119,814               40.6
30               132,012               42.6

Table 2. Calculated Mean and 95% UCL EPC Values for the Hypothetical Floodplain Property (four approaches: EPA arithmetic mean/H-statistic, area-weighted average, bootstrap, and spatial bootstrap; individual results are discussed in the text)


the Region 9 PRGs, but also the other criteria listed on page 6-30 of the Work Plan, as discussed below.

For dioxins and furans, EPA should use the PRGs for dioxin/furan TEQs that are prescribed in recent guidance from EPA headquarters for use in addressing these constituents in soil at CERCLA and RCRA sites (EPA, 1998). Those PRGs are 1 ppb for dioxin/furan TEQs in residential areas (also applicable to recreational areas) and 5-20 ppb for dioxin/furan TEQs in commercial/industrial areas. The SOW provides specifically for use of these PRGs in screening out dioxin/furan compounds in soil in Areas Outside the River. EPA should likewise use these PRGs in selecting COPCs for the Rest of the River. (Indeed, as discussed below, GE believes that these PRGs should be used not only as screening criteria but also as remedial goals, which is consistent with the approach embodied in the SOW for Areas Outside the River. See Section B.2 below.)

For other constituents, EPA should follow a step-wise approach that begins with comparisons to the Region 9 PRGs. Since these PRGs are highly conservative and are specified in the SOW for use in screening such constituents without further reduction, they should not be reduced by an additional 10 times. For constituents for which such PRGs do not exist, a similar screening step should be employed using PRGs for surrogate compounds.²

EPA should then apply the other criteria listed on page 6-30 of the Work Plan, i.e., comparison to background levels and MCP Method 1 standards. Comparison of concentrations of non-PCB constituents to site-specific background levels (e.g., concentrations upstream of the GE facility) is appropriate to ensure that only those constituents attributable to releases from the GE facility are included in the risk assessment. This step is allowed in EPA's Risk Assessment Guidance for Superfund (EPA, 1989) and has been accepted by EPA as a basis for selecting non-PCB COPCs for both soils and sediments in the Upper ½ Mile Reach of the Housatonic River (as stated in the Upper ½ Mile Reach Work Plan) and for Areas Outside the River (as provided in the SOW). It is also consistent with the MCP, which provides that if the concentration of a constituent is at or below background levels, then that constituent will be considered to pose No Significant Risk (310 CMR 40.0902(3)). Similarly, comparison to the MCP Method 1 soil standards is appropriate because those standards have been developed as conservative risk-based standards (using a target cancer risk of 1 × 10⁻⁶) that will ensure protection of human health and the environment. Again, EPA has already accepted comparisons to those standards as a basis for selecting non-PCB COPCs in soil for the Upper ½ Mile Reach of the River and Areas Outside the River. Moreover, in its Engineering Evaluation/Cost Analysis (EE/CA) for the 1½ Mile Reach of the River, EPA itself has used the MCP Method 1 soil standards, as well as background levels, as a basis for evaluating non-PCB constituents in bank soils (Weston, 2000b). Thus, use of comparisons to background levels and MCP Method 1 standards in selecting COPCs for the Rest of the River is justified by considerable precedent and will ensure that a consistent approach is employed to address such constituents throughout the GE-Pittsfield/Housatonic River Site. EPA has presented no sound basis for departing from that approach for the Rest of the River portion of the Site.

² Generally, when toxicity criteria have not been derived for particular constituents, it is because there are no reliable toxicity data on which to base them. If such constituents are not screened but are simply carried through to the risk assessment as COPCs, that would unnecessarily complicate the risk assessment and would result in largely meaningless or highly uncertain estimates, given the lack of reliable toxicity data and criteria. Hence, EPA should screen these compounds using PRGs for surrogate compounds, as described in GE's June 3, 1999 comments (p. 6).

Moreover, use solely of the maximum detected concentrations in selecting COPCs from these constituents is overly conservative and inappropriate. For example, comparisons to site-specific background levels should be made using an appropriate statistical technique or summary statistics, as described in MDEP (1995) guidance, while comparisons to the MCP Method 1 standards are generally to be made based on average exposure point concentrations (310 CMR 40.0926(3)). Again, these approaches have been agreed upon by EPA as fully protective for Areas Outside the River, as set forth in the SOW, and should likewise be used for the Rest of the River to eliminate constituents that are not likely to contribute to Site-related risks.

Finally, the Work Plan does not explain how the potential for bioaccumulation (discussed on p. 6-31) will be considered or what criteria will be used to examine this. Unless the approach used to identify and select such COPCs is clearly defined, it could result in the unnecessary inclusion of certain constituents. EPA should provide supplemental information that outlines the specific approach and criteria that will be used to identify and select these constituents for inclusion.

2. Risk assessment of dioxins/furans

The Work Plan states that EPA intends to evaluate the potential risks associated with dioxins and furans by converting them to TEQs (toxic equivalents of 2,3,7,8-TCDD) using Toxicity Equivalency Factors (TEFs), and then applying the provisional cancer slope factor (CSF) of 150,000 (mg/kg-day)^-1 listed on HEAST for 2,3,7,8-TCDD (Section 6.3.3.2.4). As discussed in detail in GE's June 3, 1999 comments (pp. 19-22), the TEF approach has substantial limitations, and the provisional CSF of 150,000 (mg/kg-day)^-1 is uncertain and unsupportable. A wide range of CSFs, spanning from 9,000 to 156,000 (mg/kg-day)^-1, have been proposed for TCDD under the linear nonthreshold cancer model, with differences resulting from the selection of the tumor classification scheme, the interspecies scaling factor used, and the model used for low-dose extrapolation. Thus, there is enormous uncertainty associated with any selected CSF.

In these circumstances, GE believes that, to the extent possible in this risk assessment, EPA should evaluate dioxins and furans through application of the dioxin/furan TEQ PRGs set forth in EPA (1998) guidance and discussed above, using those PRGs as remedial goals as well as screening levels, rather than conducting forward risk calculations for these compounds. This approach would avoid the uncertainties associated with selecting a particular CSF for TCDD, and is consistent with the approach adopted by EPA for Areas Outside the River, as described in the SOW.

However, to the extent that EPA includes dioxins and furans in forward risk calculations, GE believes that EPA should use a CSF of 30,000 (mg/kg-day)^-1. As shown in our June 3, 1999 comments, based on current information, that CSF is more supportable than the CSF of 150,000 (mg/kg-day)^-1 specified in the Work Plan.


3. Use of TEF/TEQ approach for dioxin-like PCBs

In the Work Plan, EPA also proposes to use the TEF/TEQ approach for estimating risks of dioxin-like PCB congeners (Section 6.3.3.2.5). Under this approach, the concentrations of these PCB congeners are converted to toxic equivalents (TEQs) of 2,3,7,8-TCDD using TEFs, and the provisional CSF of 150,000 (mg/kg-day)^-1 for 2,3,7,8-TCDD is then applied to estimate the risks.
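For illustration only, the arithmetic of that TEF/TEQ conversion can be sketched as follows. The congener concentrations, TEF values, and intake factor below are hypothetical placeholders rather than Site data; only the 150,000 (mg/kg-day)^-1 CSF comes from the Work Plan.

```python
# Hypothetical congener concentrations (mg/kg) and hypothetical TEFs;
# none of these values are taken from the Work Plan or Site data.
congener_conc = {"PCB-126": 2.0e-6, "PCB-169": 5.0e-6, "PCB-105": 1.0e-4}
tef = {"PCB-126": 0.1, "PCB-169": 0.01, "PCB-105": 0.0001}

# Step 1: convert each congener to 2,3,7,8-TCDD toxic equivalents and sum.
teq = sum(congener_conc[c] * tef[c] for c in congener_conc)  # mg TEQ/kg

# Step 2: apply the provisional 2,3,7,8-TCDD cancer slope factor to the
# TEQ-based intake (the intake factor is a hypothetical placeholder).
csf = 150000.0          # (mg/kg-day)^-1, provisional HEAST value for TCDD
intake = teq * 1.0e-6   # assumed chronic daily intake factor (kg soil/kg-day)
risk = intake * csf

print(f"TEQ = {teq:.3e} mg/kg, incremental cancer risk = {risk:.1e}")
```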

In its June 3, 1999 comments, GE presented several reasons why this approach is not appropriate or scientifically defensible. In discussions with GE in November 1999, EPA stated that, since it will make risk management decisions on the basis of total PCB concentrations rather than congener-specific data, it intends to use this TEF/TEQ approach for PCBs only if it finds a good correlation between the concentrations of the dioxin-like congeners and concentrations of total PCBs. There is no indication in the Work Plan, however, that EPA has established a good correlation between the congener-specific data and the total PCB data. Indeed, in more recent discussions, EPA has indicated that it has not to date evaluated this issue. If and when EPA makes such an analysis, GE requests the opportunity to review the analysis and discuss it with EPA. The absence of such a correlation would itself make the TEQ data irrelevant to risk management decisions.

In any event, GE continues to believe, for the same reasons given in its June 3, 1999 comments (pp. 23-25), that application of the TEF/TEQ approach to PCBs is not scientifically supportable. These reasons include: (a) the lack of sufficient congener-specific data to provide reliable estimates of true congener concentrations; (b) the double counting of carcinogenic risks that would result from use of both the TCDD CSF and the PCB CSF in the same assessment; and (c) the uncertainties and unsupportability of the TEFs for PCB congeners and of the CSF (for 2,3,7,8-TCDD) to be used to evaluate the resulting TEQs.

4. Failure to consider alternative CSFs and RfDs for PCBs

The Work Plan states that EPA intends to use the upper-bound CSF for PCBs of 2 (mg/kg-day)^-1 and the Reference Dose (RfD) for Aroclor 1254 of 2 x 10^-5 mg/kg-day, as published in its Integrated Risk Information System (IRIS) (Sections 6.3.3.2.3 and 6.3.3.3.2). In its June 3, 1999 comments (pp. 16-19 and Appendix A), GE presented substantial reasons why these toxicity values overestimate the carcinogenic potential and non-carcinogenic impacts of PCBs, and it recommended the development of alternative toxicity values for PCBs. EPA has made no reference to this presentation in its Work Plan.

This failure to consider the alternative toxicological information presented by GE is unlawful. Under a settlement agreement with GE (Settlement Agreement in General Electric Company v. Browner, No. 93-1251, D.C. Cir., October 25, 1993) and a subsequent EPA guidance memorandum (EPA, 1993), if an outside party questions the use of IRIS values during the course of an EPA risk assessment and presents alternative toxicological information that may be used in place of the IRIS values, EPA has an obligation to consider all credible and relevant evidence before it. There is no indication that EPA has considered the alternative toxicological information presented by GE. Rather, it has ignored GE's showing and chosen simply to defer to the current IRIS values (which is precisely the course of conduct which the settlement and guidance prohibit). EPA should rectify this deficiency by now giving careful consideration to the alternative toxicological information presented by GE.

5. Use of subchronic RfD for PCBs

The Work Plan states that EPA will consider using the subchronic RfD for PCBs only if it determines that exposure is intermittent enough to result in an exposure period that is substantially reduced (p. 6-42). Later, on page 6-43, the Work Plan states that this will occur when the total exposure period is substantially below the 7-year period that is considered by EPA to represent chronic exposure. There is no further discussion of this issue, so it is not clear what the Agency will use as a benchmark to identify intermittent exposure.

Many of the proposed exposure scenarios are likely to be extremely intermittent in nature. For example, in developing the SRBC for utility workers, EPA assumes that exposure occurs 5 days/year for a period of 25 years. Clearly, 5 days/year is intermittent exposure. In fact, even when considered over the entire 25-year exposure period, the total exposure consists of 125 days, which amounts to 1/3 of a year, or less than 5 percent of the 7-year chronic exposure period. This exposure would certainly be considered intermittent, despite the fact that the total exposure period lasts for 25 years.


As a second example, in developing its SRBCs for recreational exposures, EPA has assumed that children aged 1-6 are exposed for six years during recreational activities. Even using the highly conservative exposure frequency for high-contact areas of 3 days/week for a period of 7 months per year (a total of 84 days/year, or 504 days for the entire exposure period), the number of days exposed is equivalent to 1.4 years, or less than 20 percent of the available days during the 7-year chronic exposure period. Nevertheless, as stated on page 6-42, EPA does not intend to use the subchronic RfD to evaluate noncancer hazards to young children.
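The arithmetic behind these two examples can be checked directly (a sketch using only the frequencies and durations stated above):

```python
# Verify the exposure-frequency arithmetic for the two scenarios above.
CHRONIC_PERIOD_DAYS = 7 * 365          # EPA's 7-year chronic exposure benchmark

# Utility worker: 5 days/year for 25 years.
utility_days = 5 * 25                  # 125 days of total exposure
utility_fraction = utility_days / CHRONIC_PERIOD_DAYS

# Child recreation (ages 1-6): 3 days/week, 7 months/year, for 6 years
# (using 4 weeks/month, which reproduces the 84 days/year in the text).
child_days_per_year = 3 * 4 * 7        # 84 days/year
child_days = child_days_per_year * 6   # 504 days of total exposure

child_fraction = child_days / CHRONIC_PERIOD_DAYS

print(f"utility worker: {utility_days} days = {utility_fraction:.1%} of the chronic period")
print(f"child recreation: {child_days} days = {child_fraction:.1%} of the chronic period")
```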

In short, the Work Plan states that EPA will use the subchronic RfD when exposures during a given scenario are sufficiently intermittent to result in a total exposure that is far less than the 7-year chronic exposure period identified by EPA. It is clear, however, that even for scenarios where this intermittent exposure is likely, EPA does not intend to incorporate the subchronic RfD. EPA should reconsider its position on the use of the subchronic RfD and incorporate it into those scenarios where it is relevant.

6. Inappropriate use of a single exposure scenario for all direct-contact recreational activities

The Work Plan provides that, while consumption of fish and game will be evaluated separately, all direct-contact recreational activities will be evaluated using a single exposure scenario and a single set of exposure assumptions (pp. 6-52 to 6-53). These activities include hiking, walking, canoeing, picnicking, swimming, wading, and dirt biking, as well as fishing and hunting (but not consumption of fish or game). In its June 3, 1999 comments (pp. 11-12, 29-30), GE explained that use of a single exposure scenario and set of assumptions for such diverse activities is inappropriate, because it will force EPA either to use the most conservative assumptions for any of these activities (which would clearly result in grossly overestimated exposure estimates) or to pick compromise parameters that will not be truly representative of any of these activities (which would be subject to considerable uncertainty). Based on discussions with EPA in the fall of 1999, GE had understood that the Agency recognized this problem and intended to consider further whether and how to divide this single direct-contact recreational scenario into a number of more specific scenarios (e.g., hiking/walking, canoeing, picnicking, swimming/wading). However, the Work Plan does not do that. GE again urges EPA to develop such specific scenarios so as to take account of the significant differences in exposure parameters among these types of activities. In doing so, actual recreational exposures will be more closely approximated, and more realistic and appropriate cleanup goals can be developed.

7. Lack of information about specific exposure assumptions to be used

As in the prior draft, the Work Plan does not present the specific exposure parameter values and equations to be used in EPA's risk assessment, but simply states that they will be developed based on EPA and MDEP guidance, along with site-specific data (Section 6.3.4.5). GE requested such information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has advised GE that the Agency's risk assessment staff will hold a series of technical meetings with GE representatives and consultants, and at such meetings will be prepared to discuss the most appropriate exposure assumptions and parameter values to be used. GE reiterates its request for such meetings and looks forward to discussing these issues with EPA.

8. Use of probabilistic risk analysis

In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that probabilistic approaches, such as Monte Carlo analysis, will be considered for those COPCs and exposure pathways that result in significant risk estimates. However, it does not mention probabilistic risk assessment techniques in connection with any of the main risk assessments. In prior discussions with GE, EPA indicated that one of the principal reasons why it collected a large amount of fish tissue PCB data is for use in a probabilistic risk assessment of fish consumption. As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic exposure model in the main risk analysis of the fish consumption pathway. However, the Work Plan presents no discussion of or plans for conducting such an assessment. In fact, EPA does not even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk assessment guidance documents and information sources that EPA intends to use in conducting the HHRA.

During the last few years, EPA's policies and guidelines have focused on improving risk management by presenting decision-makers with the entire range of possible risks, rather than a single point estimate (EPA, 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that numerical risk assessments should be accompanied by a full characterization of the uncertainties, limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D) probabilistic analysis, such as Microexposure analysis, can more effectively characterize the impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed population by considering time-dependent changes.
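As a minimal sketch of the idea (not EPA's model; all distributions and parameter values below are invented for illustration), a microexposure-style simulation re-draws the inputs each year, so that long-term average doses reflect time-dependent variability rather than a single fixed point estimate:

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative microexposure-style simulation; all distributions and
# parameter values are hypothetical placeholders.
n_anglers = 5_000
years = 30
body_weight = 70.0   # kg, assumed adult body weight
meal_size = 0.227    # kg of fish per meal (hypothetical)

doses = np.empty(n_anglers)
for i in range(n_anglers):
    # Re-draw behavior and fish tissue concentration every year, so the
    # long-term average dose reflects time-dependent variability.
    meals_per_year = rng.poisson(12, size=years)           # meals vary by year
    conc = rng.lognormal(mean=0.0, sigma=0.5, size=years)  # mg PCB/kg fish
    yearly_intake = meals_per_year * meal_size * conc      # mg/year
    doses[i] = yearly_intake.mean() / 365.0 / body_weight  # mg/kg-day

print(f"median dose = {np.median(doses):.2e} mg/kg-day")
print(f"95th percentile dose = {np.percentile(doses, 95):.2e} mg/kg-day")
```

The output is a full distribution of long-term average doses across the simulated population, rather than one deterministic estimate.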

Microexposure analysis has been used by EPA and by independent researchers to simulate duration of residential exposure (EPA, 1992c; Johnson and Capel, 1992; Sielken, 1994). It also has been used to evaluate childhood exposures to lead (Goodrum et al., 1994), exposure to contaminants in tap water (Harrington et al., 1995), and exposure to dioxins from the consumption of freshwater fish (Keenan et al., 1993a,b; 1995; 1996a; 1997a,b). Microexposure analysis was employed in the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon Group, 1994; Keenan et al., 1996b). In addition, Microexposure analysis has been described in EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A (External Review Draft) as a viable alternative for modeling time-dependent variability in concentrations, daily activity patterns, and other behavioral exposure factors (EPA, 1999c).

Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of probabilistic risk analysis through application to the fish ingestion pathway at the Sangamo-Weston/Lake Hartwell Superfund site (Simon, 1999). Simon (1999) found that the Microexposure event model gave risk estimates that were approximately an order of magnitude lower than those derived using deterministic and other probabilistic approaches. He attributed this decrease to the power of the method to incorporate changes over time.

GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model, to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in Appendix E of its draft guidance for conducting probabilistic risk assessments (EPA, 1999c), some types of exposures, including fish consumption, can be expected to vary over time. Anglers may change their activity levels and their fish consumption behaviors as they age. In addition, different species and sizes of fish may be consumed from a variety of locations on the river over time, and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates, and will provide information that will allow better informed risk management decisions to be made (EPA, 1999c).

9. Use of 95% UCLs or maximum values as exposure point concentrations

The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% UCL of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).

GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment, and demonstrated by EPA contractors and consultants (EPA, 1997c; Schulz and Griffin, 1999), use of the H-statistic to derive a 95% UCL can, and frequently does, produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.

Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½ Mile Reach and for Areas Outside the River: a spatial averaging approach. That approach will: (a) allow calculation of EPCs that are much more truly representative of actual exposures than use of EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper ½ Mile Reach and Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling, if necessary, to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.
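As an illustration of the area-weighting idea (the sample concentrations and represented areas below are hypothetical; a real application would derive the weights from, e.g., Thiessen polygons around each sample location):

```python
import numpy as np

# Hypothetical samples for one exposure area: a heavily sampled hot spot
# plus sparser samples over the larger, cleaner remainder of the parcel.
conc = np.array([25.0, 8.0, 1.5, 0.9])        # mg/kg at each sample point
area = np.array([50.0, 200.0, 800.0, 950.0])  # m^2 represented by each sample

spatial_avg = np.average(conc, weights=area)  # area-weighted EPC
naive_avg = conc.mean()                       # unweighted mean of samples

print(f"simple mean = {naive_avg:.2f} mg/kg, spatial average = {spatial_avg:.2f} mg/kg")
```

Because the hot-spot samples represent only a small fraction of the exposure area, the area-weighted value sits well below the unweighted sample mean, which is the point of the spatial approach when sampling is biased toward suspected contamination.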


However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A, and demonstrated by Singh et al. (EPA, 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.
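A minimal sketch of the bootstrap calculation, using invented data (the spatial variant would additionally weight each sample by the area it represents before resampling):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical skewed concentration data for one exposure area (mg/kg).
data = np.array([0.4, 0.7, 1.1, 1.5, 2.2, 3.0, 4.8, 7.5, 12.0, 35.0])

# Simple percentile bootstrap of the arithmetic mean: resample the data
# with replacement many times and take the 95th percentile of the
# resulting distribution of means as the 95% UCL.
n_boot = 10_000
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(n_boot)
])
ucl95 = np.percentile(boot_means, 95)

print(f"arithmetic mean = {data.mean():.2f} mg/kg")
print(f"bootstrap 95% UCL = {ucl95:.2f} mg/kg")
```

Unlike the H-statistic, this procedure makes no distributional assumption, and the resulting UCL cannot explode far beyond the range of the observed data.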

References

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A), Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1990. National Contingency Plan. U.S. Environmental Protection Agency, Washington, DC.

EPA. 1992a. Final Guidelines for Exposure Assessment; Notice. U.S. Environmental Protection Agency, Washington, DC. 57 Federal Register 22888-22938. May 29.

EPA. 1992b. Guidance on Risk Characterization for Risk Managers and Risk Assessors. Memorandum from F. Henry Habicht, Deputy Administrator, to Assistant Administrators and Regional Administrators. Office of the Administrator, Washington, DC.


EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

EPA. 1993. Memo from W.H. Farland and H.L. Longest to Regional Directors. Re: Use of IRIS values in Superfund risk assessments. U.S. Environmental Protection Agency, Washington, DC. OSWER Directive 9285.7-16. December 21.

EPA. 1995a. Policy for Risk Characterization at the U.S. Environmental Protection Agency. U.S. Environmental Protection Agency, Office of the Administrator, Washington, DC. March.

EPA. 1995b. Guidance for Risk Characterization. U.S. Environmental Protection Agency, Science Policy Council, Washington, DC. February.

EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the U.S. Environmental Protection Agency. U.S. Environmental Protection Agency, Office of Research and Development, Washington, DC. May 15.

EPA. 1997b. Guiding Principles for Monte Carlo Analysis. U.S. Environmental Protection Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 p.

EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1998. Memo from Timothy Fields, Acting Administrator, to Regional Directors. Re: Approach for addressing dioxin in soil at CERCLA and RCRA sites. U.S. Environmental Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.


EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, U.S. EPA, to Jane Magee, General Electric Company. Re: Pittsfield 1-0147, EPA Area 6 Housatonic River Human Health Risk Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader. Re: Protectiveness of Cleanup Levels for Removal Actions Outside the River - Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, U.S. Dist. Court for Dist. of Mass.). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual, Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2. Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3, Part A: Process for Conducting Probabilistic Risk Assessment. Draft, Revision 5. U.S. Environmental Protection Agency, Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the U.S. Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition, December 4-7, Baltimore, MD.


Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA4203. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.

Johnson, T. and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds, Proceedings Dioxin '96: 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen 30:61-65.


Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC. October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meetings Proceedings. (Abstract)

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition. (Abstract)

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan, Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19, November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD. December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4): 823-843.


Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 11.

ATTACHMENT A

A COMPARISON OF ALTERNATIVE METHODS

FOR CALCULATING EXPOSURE POINT CONCENTRATIONS

Prepared on Behalf of the General Electric Company

by Ogden Environmental and Energy Services Co., Inc.


Introduction

The U.S. Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston, 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets where no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and their representation of the actual levels of exposures that are likely to occur under various scenarios.

There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that, when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of actual levels of exposure that will occur within a given exposure area.

GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will result in more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic, namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.

Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL

In 1992, EPA issued a guidance document (EPA, 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC, regardless of the statistical distribution of observed sample concentrations.


The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA, 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For log-normally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA, 1992):

UCL = exp( x(bar) + 0.5 s^2 + s H / sqrt(n - 1) )

Where:

UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean
x(bar) = the mean of the log-transformed data
s^2 = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic accounting for interaction of the distribution, developed by Land (1975)
n = number of analytical samples
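The calculation can be sketched in a few lines of code. This is a minimal illustration, not EPA's implementation: the H value must be looked up in Land's (1975) tables for the given sample size and log-scale standard deviation, so it is passed in here as an assumed input, and the data values are hypothetical.

```python
import math
from statistics import mean, stdev

def h_ucl(data, h):
    """95% UCL on the arithmetic mean via Land's H-statistic.

    `h` is the tabulated H(0.95) value from Land (1975), which depends on
    the sample size and the standard deviation of the log-transformed
    data; it is supplied by the caller here rather than looked up.
    """
    logs = [math.log(x) for x in data]
    n = len(logs)
    xbar = mean(logs)          # mean of log-transformed data
    s = stdev(logs)            # std. deviation of log-transformed data
    return math.exp(xbar + 0.5 * s**2 + s * h / math.sqrt(n - 1))
```

With a skewed hypothetical sample such as 1, 2, 4, ..., 64 ppm and an assumed H of 3.0, the resulting H-UCL already exceeds the largest observed concentration, illustrating the inflation discussed in the text.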

Frequently, the H-statistic calculates extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.


Use of the H-statistic also does not calculate a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.

Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA, 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may


overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.

Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when an exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.
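This check can be sketched directly. The snippet below is a hypothetical illustration (the data values and function name are ours): it compares the ordinary sample mean with the exponential, lognormal-theory estimator exp(x(bar) + s^2/2); a large disagreement between the two signals that the lognormal model underlying the H-statistic may be inappropriate for the dataset.

```python
import math
from statistics import mean, variance

def lognormal_mean_estimate(data):
    """Exponential (lognormal-theory) estimate of the arithmetic mean:
    exp(xbar + s^2/2), computed from the log-transformed sample moments."""
    logs = [math.log(x) for x in data]
    return math.exp(mean(logs) + variance(logs) / 2)

data = [0.5, 1.2, 2.0, 4.5, 9.0, 60.0]   # hypothetical skewed sample
simple = sum(data) / len(data)            # ordinary arithmetic mean
expo = lognormal_mean_estimate(data)      # lognormal-based estimate
# If `expo` and `simple` differ substantially, the lognormal assumption
# (and hence the H-UCL built on it) is suspect for these data.
```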

Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper 1/2 Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA, 1999).

In view of these considerations, EPA contractors and consultants have recommended that the H-statistic should not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA, 1997a) recommended that "in environmental applications, the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. . . . [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).


Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the lower Housatonic River. Consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.

This problem can be substantially mitigated by area-weighting the sample results, so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, unbiased estimates of concentration and variance change with distance, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava, 1989).

Because kriging is data intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a


hardwired operation in most commercially-available Geographic Information System (GIS) applications.
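For readers without GIS software, the Thiessen polygon weights can also be approximated numerically. The sketch below is an illustration only, not the GIS procedure itself: it overlays a dense grid on an assumed rectangular exposure area and assigns each grid node to its nearest sample, so each sample's weight is the fraction of nodes falling in its polygon.

```python
def thiessen_weights(samples, width, height, step=1.0):
    """Approximate Thiessen-polygon area fractions by assigning each node
    of a dense grid over the exposure area to its nearest sample.

    `samples` is a list of (x, y) sample locations; the exposure area is
    taken to be the rectangle [0, width] x [0, height]. Returns one area
    fraction per sample (the fractions sum to 1).
    """
    counts = [0] * len(samples)
    y = step / 2
    while y < height:
        x = step / 2
        while x < width:
            nearest = min(range(len(samples)),
                          key=lambda i: (samples[i][0] - x) ** 2 +
                                        (samples[i][1] - y) ** 2)
            counts[nearest] += 1
            x += step
        y += step
    total = sum(counts)
    return [c / total for c in counts]
```

For example, two samples placed symmetrically in a square area each receive a weight of one-half, as the polygon construction would dictate.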

The area inside each polygon is then used to weight the samples in statistical calculations, which is in essence saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

x(bar)aw = sum of (pi x Ci) over i = 1 through n

Where:

x(bar)aw = area-weighted mean concentration (mg/kg)
Ci = the concentration representing the condition within polygon i, where there are i = 1 through n polygons, and
pi = the proportion of the total area that is incorporated in polygon i (unitless)
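The formula amounts to a single weighted sum. A minimal sketch (function and variable names are ours):

```python
def area_weighted_mean(concentrations, areas):
    """Area-weighted (spatial) average: each sample's concentration is
    weighted by the fraction of the total exposure area covered by its
    Thiessen polygon (p_i = a_i / total area)."""
    total = sum(areas)
    return sum(c * a / total for c, a in zip(concentrations, areas))
```

For example, a 10 ppm sample representing one quarter of the area and a 2 ppm sample representing the remainder average to 4.0 ppm, rather than the unweighted 6.0 ppm.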

This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE, and approved by EPA, for calculating EPCs for the banks of the Upper 1/2 Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then, the average concentrations for each grid were


averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the

[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure. (p. 15)

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper 1/2 Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and


representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process where a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal


distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
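The procedure can be sketched in a few lines. This is an illustrative implementation of the ordinary (non-spatial) bootstrap just described; the function name, seed, and iteration defaults are ours:

```python
import random
from statistics import mean

def bootstrap_ucl(data, n_iter=5000, q=0.95, seed=1):
    """95% UCL on the arithmetic mean by bootstrapping: resample the
    dataset with replacement, record each resample's mean, and read off
    the 95th percentile of the resulting distribution of means."""
    rng = random.Random(seed)
    means = sorted(mean(rng.choices(data, k=len(data)))
                   for _ in range(n_iter))
    return means[int(q * n_iter) - 1]
```

For any nondegenerate dataset, the resulting UCL lies above the sample mean but, unlike the H-UCL, cannot exceed the largest observed concentration.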

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used to reflect the locations and densities of samples collected on each property. This modified bootstrapping approach (spatial bootstrapping), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.


Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset, reflecting the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 x 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
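The three steps can be sketched as follows. This is an illustrative reading of the procedure, with names and defaults ours; note that a constant expansion factor (five in the example above) does not change the sampling weights when draws are made with replacement, so it is omitted here.

```python
import random
from statistics import mean

def spatial_bootstrap_ucl(concentrations, areas,
                          n_iter=5000, k=250, q=0.95, seed=1):
    """Spatial bootstrap: expand the dataset so each sample appears in
    proportion to its Thiessen-polygon area, then bootstrap means from
    the expanded dataset and take the 95th percentile as the 95% UCL.

    `k` is the number of values drawn per iteration (the text describes
    using 250 to 500, depending on the expanded dataset's size).
    """
    smallest = min(areas)
    # Step 2: replicate each result in proportion to its rounded area ratio.
    expanded = []
    for c, a in zip(concentrations, areas):
        expanded.extend([c] * round(a / smallest))
    # Step 3: bootstrap means from the expanded, area-weighted dataset.
    rng = random.Random(seed)
    means = sorted(mean(rng.choices(expanded, k=k))
                   for _ in range(n_iter))
    return means[int(q * n_iter) - 1]
```

The 50th percentile of the stored means converges on the area-weighted mean, and the 95th percentile serves as the 95% UCL.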


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has, in the past, often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) EPA's approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few


showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing, in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in high-concentration areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above, and demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data


necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993), and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper 1/2 Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach and decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.


EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant), Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, 1/2 Mile Removal Action, and Consolidation. U.S. Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River. Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.


Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number   Polygon Area (ft2)   Total PCB (ppm)
 1                  138,219              0.022
 2                  105,217              0.028
 3                  110,214              1.07
 4                   31,021              1.63
 5                   11,399              1.67
 6                    7,004              2.01
 7                   14,268              2.19
 8                   12,121              3.25
 9                    3,697              3.77
10                   13,297              7.58
11                   33,087              7.61
12                   65,541              7.62
13                   57,494              7.71
14                    9,014              7.99
15                    9,994              8.72
16                    2,123              9.69
17                    2,341              9.72
18                    7,446              12.7
19                    2,613              14.3
20                   44,231              14.3
21                    2,362              15.4
22                    8,745              16.3
23                    5,008              18.7
24                   18,762              19.1
25                   24,296              20.2
26                   51,356              30.8
27                   12,996              32.9
28                   18,019              38.9
29                  119,814              40.6
30                  132,012              42.6
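The summary statistics quoted in the text can be reproduced from Table 1. The sketch below recomputes the arithmetic mean and the area-weighted mean from the tabulated values, assuming the areas and concentrations pair up in row order:

```python
# (polygon area in sq ft, total PCB in ppm) for the 30 hypothetical samples
table1 = [
    (138219, 0.022), (105217, 0.028), (110214, 1.07), (31021, 1.63),
    (11399, 1.67), (7004, 2.01), (14268, 2.19), (12121, 3.25),
    (3697, 3.77), (13297, 7.58), (33087, 7.61), (65541, 7.62),
    (57494, 7.71), (9014, 7.99), (9994, 8.72), (2123, 9.69),
    (2341, 9.72), (7446, 12.7), (2613, 14.3), (44231, 14.3),
    (2362, 15.4), (8745, 16.3), (5008, 18.7), (18762, 19.1),
    (24296, 20.2), (51356, 30.8), (12996, 32.9), (18019, 38.9),
    (119814, 40.6), (132012, 42.6),
]
# Unweighted arithmetic mean of the 30 concentrations (about 13.3 ppm).
arithmetic_mean = sum(c for _, c in table1) / len(table1)
# Area-weighted mean: each concentration weighted by its polygon's share
# of the total area; higher, because samples 29 and 30 combine high
# concentrations with very large polygons.
total_area = sum(a for a, _ in table1)
area_weighted = sum(a * c for a, c in table1) / total_area
```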

Table 2. Mean and 95% UCL EPCs for the Hypothetical Property Calculated Using the Four Approaches



Superfund (EPA, 1989) and has been accepted by EPA as a basis for selecting non-PCB COPCs for both soils and sediments in the Upper ½ Mile Reach of the Housatonic River (as stated in the Upper ½ Mile Reach Work Plan) and for Areas Outside the River (as provided in the SOW). It is also consistent with the MCP, which provides that if the concentration of a constituent is at or below background levels, then that constituent will be considered to pose No Significant Risk (310 CMR 40.0902(3)). Similarly, comparison to the MCP Method 1 soil standards is appropriate because those standards have been developed as conservative risk-based standards (using a target cancer risk of 1 x 10^-6) that will ensure protection of human health and the environment. Again, EPA has already accepted comparisons to those standards as a basis for selecting non-PCB COPCs in soil for the Upper ½ Mile Reach of the River and Areas Outside the River. Moreover, in its Engineering Evaluation/Cost Analysis (EE/CA) for the 1½ Mile Reach of the River, EPA itself has used the MCP Method 1 soil standards, as well as background levels, as a basis for evaluating non-PCB constituents in bank soils (Weston, 2000b). Thus, use of comparisons to background levels and MCP Method 1 standards in selecting COPCs for the Rest of the River is justified by considerable precedent and will ensure that a consistent approach is employed to address such constituents throughout the GE-Pittsfield/Housatonic River Site. EPA has presented no sound basis for departing from that approach for the Rest of the River portion of the Site.

Moreover, use solely of the maximum detected concentrations in selecting COPCs from these constituents is overly conservative and inappropriate. For example, comparisons to site-specific background levels should be made using an appropriate statistical technique or summary statistics, as described in MDEP (1995) guidance, while comparisons to the MCP Method 1 standards are generally to be made based on average exposure point concentrations (310 CMR 40.0926(3)). Again, these approaches have been agreed upon by EPA as fully protective for Areas Outside the River, as set forth in the SOW, and should likewise be used for the Rest of the River to eliminate constituents that are not likely to contribute to Site-related risks.

Finally, the Work Plan does not explain how the potential for bioaccumulation (discussed on p. 6-31) will be considered or what criteria will be used to examine this. Unless the approach used to identify and select such COPCs is clearly defined, it could result in the unnecessary inclusion of certain constituents. EPA should provide supplemental information that outlines the specific approach and criteria that will be used to identify and select these constituents for inclusion.

2. Risk assessment of dioxins/furans

The Work Plan states that EPA intends to evaluate the potential risks associated with dioxins and furans by converting them to TEQs (toxic equivalents of 2,3,7,8-TCDD) using Toxicity Equivalency Factors (TEFs) and then applying the provisional cancer slope factor (CSF) of 150,000 (mg/kg-day)^-1 listed on HEAST for 2,3,7,8-TCDD (Section 6.3.3.2.4). As discussed in detail in GE's June 3, 1999 comments (pp. 19-22), the TEF approach has substantial limitations, and the provisional CSF of 150,000 (mg/kg-day)^-1 is uncertain and unsupportable. A wide range of CSFs, spanning from 9,000 to 156,000 (mg/kg-day)^-1, have been proposed for TCDD under the linear nonthreshold cancer model, with differences resulting from the selection of tumor classification scheme, the interspecies scaling factor used, and the model used for low-dose extrapolation. Thus, there is enormous uncertainty associated with any selected CSF.

In these circumstances, GE believes that, to the extent possible in this risk assessment, EPA should evaluate dioxins and furans through application of the dioxin/furan TEQ PRGs set forth in EPA (1998) guidance and discussed above, using those PRGs as remedial goals as well as screening levels, rather than conducting forward risk calculations for these compounds. This approach would avoid the uncertainties associated with selecting a particular CSF for TCDD and is consistent with the approach adopted by EPA for Areas Outside the River, as described in the SOW.

However, to the extent that EPA includes dioxins and furans in forward risk calculations, GE believes that EPA should use a CSF of 30,000 (mg/kg-day)^-1. As shown in our June 3, 1999 comments, based on current information, that CSF is more supportable than the CSF of 150,000 (mg/kg-day)^-1 specified in the Work Plan.


3. Use of TEF/TEQ approach for dioxin-like PCBs

In the Work Plan, EPA also proposes to use the TEF/TEQ approach for estimating risks of dioxin-like PCB congeners (Section 6.3.3.2.5). Under this approach, the concentrations of these PCB congeners are converted to toxic equivalents (TEQs) of 2,3,7,8-TCDD using TEFs, and the provisional CSF of 150,000 (mg/kg-day)^-1 for 2,3,7,8-TCDD is then applied to estimate the risks.
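The conversion step itself is a simple weighted sum. The sketch below illustrates it; the congener names, TEF assignments, and concentrations are hypothetical placeholders for illustration only, not values taken from the Work Plan:

```python
# Sketch of a TEF/TEQ conversion: each dioxin-like congener concentration is
# multiplied by its TEF and the products are summed into a single TCDD TEQ.
# TEFs and concentrations below are invented for illustration.
def total_teq(congener_conc_ppm, tefs):
    """Collapse congener concentrations into one 2,3,7,8-TCDD TEQ (ppm)."""
    return sum(conc * tefs[name] for name, conc in congener_conc_ppm.items())

tefs = {"PCB-126": 0.1, "PCB-169": 0.01}      # hypothetical TEF assignments
sample = {"PCB-126": 0.004, "PCB-169": 0.02}  # hypothetical congener data, ppm
teq = total_teq(sample, tefs)                 # 0.0004 + 0.0002 = 0.0006 ppm TEQ
```

The risk calculation EPA proposes would then apply the TCDD CSF to the intake computed from this TEQ, which is why the choice of CSF discussed above matters for every dioxin-like congener at once.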

In its June 3, 1999 comments, GE presented several reasons why this approach is not appropriate or scientifically defensible. In discussions with GE in November 1999, EPA stated that, since it will make risk management decisions on the basis of total PCB concentrations rather than congener-specific data, it intends to use this TEF/TEQ approach for PCBs only if it finds a good correlation between the concentrations of the dioxin-like congeners and concentrations of total PCBs. There is no indication in the Work Plan, however, that EPA has established a good correlation between the congener-specific data and the total PCB data. Indeed, in more recent discussions, EPA has indicated that it has not to date evaluated this issue. If and when EPA makes such an analysis, GE requests the opportunity to review the analysis and discuss it with EPA. The absence of such a correlation would itself make the TEQ data irrelevant to risk management decisions.

In any event, GE continues to believe, for the same reasons given in its June 3, 1999 comments (pp. 23-25), that application of the TEF/TEQ approach to PCBs is not scientifically supportable. These reasons include: (a) the lack of sufficient congener-specific data to provide reliable estimates of true congener concentrations; (b) the double counting of carcinogenic risks that would result from use of both the TCDD CSF and the PCB CSF in the same assessment; and (c) the uncertainties and unsupportability of the TEFs for PCB congeners and of the CSF (for 2,3,7,8-TCDD) to be used to evaluate the resulting TEQs.

4. Failure to consider alternative CSFs and RfDs for PCBs

The Work Plan states that EPA intends to use the upper-bound CSF for PCBs of 2 (mg/kg-day)^-1 and the Reference Dose (RfD) for Aroclor 1254 of 2 x 10^-5 mg/kg-day, as published in its Integrated Risk Information System (IRIS) (Sections 6.3.3.2.3 and 6.3.3.3.2). In its June 3, 1999 comments (pp. 16-19 and Appendix A), GE presented substantial reasons why these toxicity values overestimate the carcinogenic potential and non-carcinogenic impacts of PCBs, and it recommended the development of alternative toxicity values for PCBs. EPA has made no reference to this presentation in its Work Plan.

This failure to consider the alternative toxicological information presented by GE is unlawful. Under a settlement agreement with GE (Settlement Agreement in General Electric Company v. Browner, No. 93-1251, D.C. Cir., October 25, 1993) and a subsequent EPA guidance memorandum (EPA, 1993), if an outside party questions the use of IRIS values during the course of an EPA risk assessment and presents alternative toxicological information that may be used in place of the IRIS values, EPA has an obligation to consider all credible and relevant evidence before it. There is no indication that EPA has considered the alternative toxicological information presented by GE. Rather, it has ignored GE's showing and chosen simply to defer to the current IRIS values (which is precisely the course of conduct which the settlement and guidance prohibit). EPA should rectify this deficiency by now giving careful consideration to the alternative toxicological information presented by GE.

5. Use of subchronic RfD for PCBs

The Work Plan states that EPA will consider using the subchronic RfD for PCBs only if it determines that exposure is intermittent enough to result in an exposure period that is substantially reduced (p. 6-42). Later, on page 6-43, the Work Plan states that this will occur when the total exposure period is substantially below the 7-year period that is considered by EPA to represent chronic exposure. There is no further discussion of this issue, so it is not clear what the Agency will use as a benchmark to identify intermittent exposure.

Many of the proposed exposure scenarios are likely to be extremely intermittent in nature. For example, in developing the SRBC for utility workers, EPA assumes that exposure occurs 5 days/year for a period of 25 years. Clearly, 5 days/year is intermittent exposure. In fact, even when considered over the entire 25-year exposure period, the total exposure consists of 125 days, which amounts to about one-third of a year, or less than 5 percent of the 7-year chronic exposure period. This exposure would certainly be considered intermittent despite the fact that the total exposure period lasts for 25 years.
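The arithmetic behind the utility-worker example is simple enough to check directly:

```python
# Reproduces the intermittency arithmetic for the utility-worker example:
# 5 days/year for 25 years, compared against EPA's 7-year chronic benchmark.
days_per_year = 5
years = 25
chronic_years = 7.0

total_days = days_per_year * years         # 125 days of actual exposure
total_years = total_days / 365.0           # about one-third of a year
fraction_of_chronic = total_years / chronic_years
# fraction_of_chronic falls below 0.05, i.e. under 5% of the chronic period
```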


As a second example, in developing its SRBCs for recreational exposures, EPA has assumed that children aged 1-6 are exposed for six years during recreational activities. Even using the highly conservative exposure frequency for high contact areas of 3 days/week for a period of 7 months per year (a total of 84 days/year, or 504 days for the entire exposure period), the number of days exposed is equivalent to 1.4 years, or less than 20 percent of the available days during the 7-year chronic exposure period. Nevertheless, as stated on page 6-42, EPA does not intend to use the subchronic RfD to evaluate noncancer hazards to young children.

In short, the Work Plan states that EPA will use the subchronic RfD when exposures during a given scenario are sufficiently intermittent to result in a total exposure that is far less than the 7-year chronic exposure period identified by EPA. It is clear, however, that even for scenarios where this intermittent exposure is likely, EPA does not intend to incorporate the subchronic RfD. EPA should reconsider its position on the use of the subchronic RfD and incorporate it into those scenarios where it is relevant.

6. Inappropriate use of a single exposure scenario for all direct-contact recreational activities

The Work Plan provides that, while consumption of fish and game will be evaluated separately, all direct-contact recreational activities will be evaluated using a single exposure scenario and a single set of exposure assumptions (pp. 6-52 - 6-53). These activities include hiking, walking, canoeing, picnicking, swimming, wading, and dirt biking, as well as fishing and hunting (but not consumption of fish or game). In its June 3, 1999 comments (pp. 11-12, 29-30), GE explained that use of a single exposure scenario and set of assumptions for such diverse activities is inappropriate because it will force EPA either to use the most conservative assumptions for any of these activities (which would clearly result in grossly overestimated exposure estimates) or to pick compromise parameters that will not be truly representative of any of these activities (which would be subject to considerable uncertainty). Based on discussions with EPA in the fall of 1999, GE had understood that the Agency recognized this problem and intended to consider further whether and how to divide this single direct-contact recreational scenario into a number of more specific scenarios (e.g., hiking/walking, canoeing, picnicking, swimming/wading). However, the Work Plan does not do that. GE again urges EPA to develop such specific scenarios so as to take account of the significant differences in exposure parameters among these types of activities. In doing so, actual recreational exposures will be more closely approximated, and more realistic and appropriate cleanup goals can be developed.

7. Lack of information about specific exposure assumptions to be used

As in the prior draft, the Work Plan does not present the specific exposure parameter values and equations to be used in EPA's risk assessment, but simply states that they will be developed based on EPA and MDEP guidance along with site-specific data (Section 6.3.4.5). GE requested such information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has advised GE that the Agency's risk assessment staff will hold a series of technical meetings with GE representatives and consultants, and at such meetings will be prepared to discuss the most appropriate exposure assumptions and parameter values to be used. GE reiterates its request for such meetings and looks forward to discussing these issues with EPA.

8. Use of probabilistic risk analysis

In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that probabilistic approaches, such as Monte Carlo analysis, will be considered for those COPCs and exposure pathways that result in significant risk estimates. However, it does not mention probabilistic risk assessment techniques in connection with any of the main risk assessments. In prior discussions with GE, EPA indicated that one of the principal reasons why it collected a large amount of fish tissue PCB data is for use in a probabilistic risk assessment of fish consumption. As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic exposure model in the main risk analysis of the fish consumption pathway. However, the Work Plan presents no discussion of, or plans for conducting, such an assessment. In fact, EPA does not even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk assessment guidance documents and information sources that EPA intends to use in conducting the HHRA.

During the last few years, EPA's policies and guidelines have focused on improving risk management by presenting decision-makers with the entire range of possible risks, rather than a single point estimate (EPA, 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that numerical risk assessments should be accompanied by a full characterization of the uncertainties, limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D) probabilistic analysis, such as Microexposure analysis, can more effectively characterize the impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed population by considering time-dependent changes.

Microexposure analysis has been used by EPA and by independent researchers to simulate duration of residential exposure (EPA, 1992c; Johnson and Capel, 1992; Sielken, 1994). It also has been used to evaluate childhood exposures to lead (Goodrum et al., 1994), exposure to contaminants in tap water (Harrington et al., 1995), and exposure to dioxins from the consumption of freshwater fish (Keenan et al., 1993a,b; 1995; 1996a; 1997a,b). Microexposure analysis was employed in the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon Group, 1994; Keenan et al., 1996b). In addition, Microexposure analysis has been described in EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A (External Review Draft) as a viable alternative for modeling time-dependent variability in concentrations, daily activity patterns, and other behavioral exposure factors (EPA, 1999c).

Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of probabilistic risk analysis through application to the fish ingestion pathway at the Sangamo-Weston/Lake Hartwell Superfund site (Simon, 1999). Simon (1999) found that the Microexposure event model gave risk estimates that were approximately an order of magnitude lower than those derived using deterministic and other probabilistic approaches. He attributed this decrease to the power of the method to incorporate changes over time.

GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model, to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in Appendix E of its draft guidance for conducting probabilistic risk assessments (EPA, 1999c), some types of exposures, including fish consumption, can be expected to vary over time. Anglers may change their activity levels and their fish consumption behaviors as they age. In addition, different species and sizes of fish may be consumed from a variety of locations on the river over time and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates, and will provide information that will allow better informed risk management decisions to be made (EPA, 1999c).
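The core idea of a microexposure-style simulation, letting fish tissue concentrations, meal frequency, and meal size vary event by event rather than fixing them at single point estimates, can be sketched as below. All parameter distributions here are invented for illustration; a full 2-D analysis would add an outer loop sampling the uncertain parameters of those distributions:

```python
import random

# Microexposure-style sketch for a fish-consumption pathway: one simulated
# angler accumulates intake year by year, with tissue concentration, meal
# count, and meal size all varying over time. Distributions are hypothetical.
def avg_daily_dose(rng, years=30, body_weight_kg=70.0):
    """Average daily dose (mg/kg-day) over one angler's exposure period."""
    total_intake_mg = 0.0
    for _ in range(years):
        conc_mg_kg = rng.lognormvariate(0.0, 0.5)  # fish tissue PCBs, mg/kg
        meals = rng.randint(0, 24)                 # meals eaten that year
        meal_kg = rng.uniform(0.1, 0.3)            # meal size, kg
        total_intake_mg += conc_mg_kg * meals * meal_kg
    return total_intake_mg / (years * 365.0) / body_weight_kg

rng = random.Random(1)
doses = sorted(avg_daily_dose(rng) for _ in range(2000))
median, p95 = doses[1000], doses[1900]  # distribution, not a single point
```

The output is a distribution of doses across simulated anglers, from which a median, a 95th percentile, or any other statistic can be reported to the risk manager.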

9. Use of 95% UCLs or maximum values as exposure point concentrations

The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% UCL of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).

GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment, and demonstrated by EPA contractors and consultants (EPA, 1997c; Schulz and Griffin, 1999), use of the H-statistic to derive a 95% UCL can, and frequently does, produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.
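The overestimation tendency is easy to reproduce. Land's H-UCL is exp(ybar + s^2/2 + s*H/sqrt(n-1)), where ybar and s are the mean and standard deviation of the log-transformed data and H comes from Land's (1975) tables. The sketch below uses an invented, highly skewed dataset and a fixed placeholder H factor (real applications interpolate the tabled H values for the given n and s):

```python
import math

# Illustrates how Land's H-UCL can exceed every observed value when the
# log-scale standard deviation is large. H_FACTOR is a placeholder standing
# in for a value interpolated from Land's (1975) tables; data are invented.
def h_ucl(data, h_factor):
    logs = [math.log(x) for x in data]
    n = len(logs)
    ybar = sum(logs) / n
    s = math.sqrt(sum((y - ybar) ** 2 for y in logs) / (n - 1))
    return math.exp(ybar + s * s / 2 + s * h_factor / math.sqrt(n - 1))

pcb = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]  # hypothetical ppm, skewed
ucl = h_ucl(pcb, h_factor=3.0)
# ucl lands well above max(pcb) = 512, the behavior criticized above, which
# is what triggers EPA's fallback to the maximum detected concentration
```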

Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper 1½ Mile Reach and for Areas Outside the River: a spatial averaging approach. That approach will (a) allow calculation of EPCs that are much more truly representative of actual exposures than use of EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper 1½ Mile Reach and Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling, if necessary, to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.


However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A, and demonstrated by Singh et al. (EPA, 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.
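One concrete form such a spatial bootstrap could take is sketched below: samples are resampled with probability proportional to the area of the polygon each represents, and the 95th percentile of the resampled means serves as the UCL. The polygon areas and concentrations are hypothetical stand-ins for the kind of data shown in Table 1:

```python
import random

# Sketch of an area-weighted (spatial) bootstrap UCL. Each sample is drawn
# with probability proportional to its polygon area, so samples representing
# large areas dominate the resampled means. All values are hypothetical.
def spatial_mean(conc, areas):
    """Area-weighted average concentration over the exposure area."""
    return sum(c * a for c, a in zip(conc, areas)) / sum(areas)

def bootstrap_ucl95(conc, areas, n_boot=5000, rng=None):
    rng = rng or random.Random(0)
    weights = [a / sum(areas) for a in areas]
    idx = range(len(conc))
    means = []
    for _ in range(n_boot):
        draw = rng.choices(idx, weights=weights, k=len(conc))
        means.append(sum(conc[i] for i in draw) / len(conc))
    means.sort()
    return means[int(0.95 * n_boot)]  # 95th percentile of resampled means

conc = [0.02, 1.1, 2.2, 7.6, 9.7, 14.3, 20.2, 42.6]                # ppm
areas = [138219, 105217, 31021, 7004, 9994, 44231, 18762, 132012]  # ft2
ucl = bootstrap_ucl95(conc, areas)
```

Unlike the H-statistic, this estimator makes no distributional assumption, and its UCL cannot run away from the observed data.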

References

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A), Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1990. National Contingency Plan. U.S. Environmental Protection Agency, Washington, DC.

EPA. 1992a. Final Guidelines for Exposure Assessment; Notice. U.S. Environmental Protection Agency, Washington, DC. 57 Federal Register 22888-22938. May 29.

EPA. 1992b. Guidance on Risk Characterization for Risk Managers and Risk Assessors. Memorandum from F. Henry Habicht III, Deputy Administrator, to Assistant Administrators and Regional Administrators. Office of the Administrator, Washington, DC.

EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

EPA. 1993. Memo from W.H. Farland and H.L. Longest to Regional Directors re: Use of IRIS values in Superfund risk assessments. U.S. Environmental Protection Agency, Washington, DC. OSWER Directive 9285.7-16. December 21.

EPA. 1995a. Policy for Risk Characterization at the U.S. Environmental Protection Agency. U.S. Environmental Protection Agency, Office of the Administrator, Washington, DC. March.

EPA. 1995b. Guidance for Risk Characterization. U.S. Environmental Protection Agency, Science Policy Council, Washington, DC. February.

EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the U.S. Environmental Protection Agency. U.S. Environmental Protection Agency, Office of Research and Development, Washington, DC. May 15.

EPA. 1997b. Guiding Principles for Monte Carlo Analysis. U.S. Environmental Protection Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 p.

EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1998. Memo from Timothy Fields, Acting Administrator, to Regional Directors re: Approach for addressing dioxin in soil at CERCLA and RCRA sites. U.S. Environmental Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.

EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, U.S. EPA, to Jane Magee, General Electric Company, re: Pittsfield 1-0147, EPA Area 6, Housatonic River Human Health Risk Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader, re: Protectiveness of Cleanup Levels for Removal Actions Outside the River - Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, U.S. Dist. Court for Dist. of Mass.). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual, Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2. Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3 (Part A: Process for Conducting Probabilistic Risk Assessment). Draft Revision 5. U.S. Environmental Protection Agency, Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the U.S. Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition, December 4-7, Baltimore, MD.

Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA42.03. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.

Johnson, T. and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds, Proceedings Dioxin '96: 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen 30:61-65.

Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC. October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meeting Proceedings. Abstract.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition. Abstract.

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan, Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19, November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD. December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4):823-843.

Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 11.

ATTACHMENT A

A COMPARISON OF ALTERNATIVE METHODS

FOR CALCULATING EXPOSURE POINT CONCENTRATIONS

Prepared on Behalf of the General Electric Company

by Ogden Environmental and Energy Services Co Inc


Introduction

The U.S. Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston, 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student-t statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets where no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and their representation of the actual levels of exposures that are likely to occur under various scenarios.

There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student-t statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that, when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of actual levels of exposure that will occur within a given exposure area.

GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will result in more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper 1½ Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic: namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.

Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL

In 1992, EPA issued a guidance document (EPA, 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area; thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.


The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA, 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA, 1992):

UCL = exp(x̄ + 0.5s² + sH/√(n − 1))

Where:

UCL = a specified limit (i.e., the 95% upper confidence limit) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic, developed by Land (1975), that accounts for the shape of the distribution
n = the number of analytical samples
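Stated in code, the calculation is brief. The sketch below is our own illustrative function (not part of the Work Plan); it takes the H value as an input because H must be looked up in Land's (1975) tables for the sample size and log-standard deviation at hand.

```python
import math

def h_ucl(data, H):
    """95% UCL on the arithmetic mean under Land's H-statistic method.

    data: untransformed sample concentrations (assumed lognormal)
    H: Land's H-statistic, looked up in Land (1975) for this n and s
    """
    logs = [math.log(x) for x in data]
    n = len(logs)
    xbar = sum(logs) / n                                   # mean of log data
    s2 = sum((v - xbar) ** 2 for v in logs) / (n - 1)      # variance of log data
    s = math.sqrt(s2)                                      # std. dev. of log data
    return math.exp(xbar + 0.5 * s2 + s * H / math.sqrt(n - 1))
```

Because s enters the exponent both directly and as s², even moderate log-scale variance drives the result up exponentially, which is the source of the inflation discussed in this paper.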

Frequently, the H-statistic produces extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.


Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.

Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA, 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling datasets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes orders of magnitude higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.

Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when the exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.
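This divergence is easy to see numerically. Under a lognormal assumption, the mean embedded in the H-UCL machinery is the exponential estimator exp(x̄ + s²/2) of the log-transformed data, which can differ sharply from the simple sum-divided-by-n average. The function and dataset below are illustrative only.

```python
import math

def exponential_mean(data):
    """Lognormal-theory estimate of the arithmetic mean: exp(xbar + s^2 / 2)."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    xbar = sum(logs) / n
    s2 = sum((v - xbar) ** 2 for v in logs) / (n - 1)
    return math.exp(xbar + s2 / 2)

data = [1.0, 10.0, 100.0]            # highly skewed illustrative sample
simple_mean = sum(data) / len(data)  # the typical sum-over-n calculation (37.0)
# For this sample, the exponential estimator is several times the simple
# average, a warning sign that the H-statistic will misrepresent the mean.
```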

Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and it does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA, 1999).

In view of these considerations, EPA contractors and consultants have recommended that the H-statistic not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA, 1997a) recommended that, in environmental applications, "the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. . . . [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).


Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased toward more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the Lower Housatonic River; consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected to be contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.

This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, unbiased estimates of how concentration and variance change with distance are used to construct a spatial grid of estimated concentrations (ordinary kriging; Isaaks and Srivastava, 1989).

Because kriging is data intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.

The area inside each polygon is then used to weight the samples in statistical calculations, which in essence says that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

X̄ = Σ pᵢCᵢ (summed over i = 1 through n polygons)

Where:

X̄ = the area-weighted mean concentration (mg/kg)
Cᵢ = the concentration representing the condition within polygon i, where there are i = 1 through n polygons; and
pᵢ = the proportion of the total area that is incorporated in polygon i (unitless)
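Once the polygon areas are in hand, the computation reduces to a weighted sum. The following minimal sketch (our own, with made-up numbers) shows how a high concentration confined to a small polygon contributes little to the EPC:

```python
def area_weighted_mean(concs, areas):
    """Area-weighted mean: sum of p_i * C_i, where p_i = polygon area / total area."""
    total = float(sum(areas))
    return sum(c * (a / total) for c, a in zip(concs, areas))

# Hypothetical example: 50 mg/kg in a 1,000 ft^2 polygon carries only a
# 1% weight within a 100,000 ft^2 exposure area.
concs = [50.0, 2.0, 1.0]                # mg/kg
areas = [1_000.0, 45_000.0, 54_000.0]   # ft^2
epc = area_weighted_mean(concs, areas)  # 0.5 + 0.9 + 0.54 = 1.94 mg/kg
```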

This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a nearest-neighbor approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid cell using the sampling data available within it. Then the average concentrations for the grid cells were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the:

"[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure." (p. 15)

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and it is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would be necessary only in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support such spatial averaging, there are alternative methods of calculating the 95% UCL that would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously and demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative, superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or a lognormal distribution. This technique is described below and is recommended as an alternative, preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
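A bare-bones sketch of this procedure follows. The function name and defaults are ours, and the variant shown is the common percentile bootstrap, which resamples n values (with replacement) per iteration and takes the 95th percentile of the archived means.

```python
import random
import statistics

def bootstrap_ucl(data, n_iter=5000, seed=1):
    """95% UCL via the percentile bootstrap: resample with replacement,
    archive the mean of each resample, and take the 95th percentile."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_iter)
    )
    return means[int(0.95 * n_iter) - 1]
```

Because the result is an empirical percentile of the data's own resampled means, it can never wander above the maximum observed concentration the way the H-UCL can.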

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and it is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance of the distribution of means becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that accounts for the locations and densities of the samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.


Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It proceeds in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset to reflect the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded, normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th-percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
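The three steps above can be sketched as follows. The function and its defaults are illustrative only; the polygon areas are assumed to have been computed beforehand (e.g., in a GIS), the expansion factor of five matches the example in the text, and 250 results per iteration over 5,000 iterations are used.

```python
import random
import statistics

def spatial_bootstrap_ucl(concs, areas, factor=5, per_iter=250,
                          n_iter=5000, seed=1):
    """Spatial bootstrap of the three steps described above:
    1) weight each sample by its Thiessen polygon area (supplied by caller),
    2) expand the dataset in proportion to rounded area ratios times `factor`,
    3) bootstrap means from the expanded dataset; return the 95th percentile."""
    smallest = min(areas)
    expanded = []
    for c, a in zip(concs, areas):
        expanded.extend([c] * (round(a / smallest) * factor))
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(expanded, k=per_iter))
        for _ in range(n_iter)
    )
    return means[int(0.95 * n_iter) - 1]
```

Because each concentration appears in the expanded dataset in proportion to its polygon area, a hot spot confined to a small polygon is sampled correspondingly rarely, so the resulting UCL reflects the spatial layout of the data rather than the sampling bias.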


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples in each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data; thus, it can be applied successfully to all sampling datasets, including those that do not fit either a normal or a lognormal distribution.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) EPA's approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained from straight bootstrapping; this is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30; Table 1) have very large polygon areas, and thus they are weighted more heavily in approaches that include area-weighting.

Estimates of the 95% UCL, however, vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the concentrations likely to be found on floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration and is more than six times the bootstrapped estimates, with the result that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values would need to be used for many of those properties. Such an end result is unsupportable and would result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above, and as demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculate the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area when the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to be experienced by individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993), and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and for the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support it, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A), Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.


EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision, Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA 1999 Responsiveness Summary for Allendale School Removal Action Mile Removal

Action and Consolidation US Environmental Protection Agency Region I October

Isaaks EH and RM Srivastava 1989 Applied Geostatistics Oxford University Press New

York

Land CE 1975 Tables of confidence limits for linear functions of the normal mean and

variance Selected Tables in Mathematical Statistics 3365-410

Schulz TW and S Griffin 1999 Estimating risk assessment exposure point concentrations

when the data are not normal or lognormal Risk Analysis 19577-584

Weston 2000 Supplemental Investigation Work Plan for the Lower Housatonic River Final

Roy F Weston Inc Manchester NH DCN GEP2-02-000-AAME February 2000


Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
      1               138219                0.022
      2               105217                0.028
      3               110214                1.07
      4                31021                1.63
      5                11399                1.67
      6                 7004                2.01
      7                14268                2.19
      8                12121                3.25
      9                 3697                3.77
     10                13297                7.58
     11                33087                7.61
     12                65541                7.62
     13                57494                7.71
     14                 9014                7.99
     15                 9994                8.72
     16                 2123                9.69
     17                 2341                9.72
     18                 7446               12.7
     19                 2613               14.3
     20                44231               14.3
     21                 2362               15.4
     22                 8745               16.3
     23                 5008               18.7
     24                18762               19.1
     25                24296               20.2
     26                51356               30.8
     27                12996               32.9
     28                18019               38.9
     29               119814               40.6
     30               132012               42.6




to identify and select such COPCs is clearly defined, it could result in the unnecessary inclusion of certain constituents. EPA should provide supplemental information that outlines the specific approach and criteria that will be used to identify and select these constituents for inclusion.

2. Risk assessment of dioxins/furans

The Work Plan states that EPA intends to evaluate the potential risks associated with dioxins and furans by converting them to TEQs (toxic equivalents of 2,3,7,8-TCDD) using Toxicity Equivalency Factors (TEFs), and then applying the provisional cancer slope factor (CSF) of 150,000 (mg/kg-day)⁻¹ listed on HEAST for 2,3,7,8-TCDD (Section 6.3.3.2.4). As discussed in detail in GE's June 3, 1999 comments (pp. 19-22), the TEF approach has substantial limitations, and the provisional CSF of 150,000 (mg/kg-day)⁻¹ is uncertain and unsupportable. A wide range of CSFs, spanning from 9,000 to 156,000 (mg/kg-day)⁻¹, have been proposed for TCDD under the linear nonthreshold cancer model, with differences resulting from the selection of the tumor classification scheme, the interspecies scaling factor used, and the model used for low-dose extrapolation. Thus, there is enormous uncertainty associated with any selected CSF.

In these circumstances, GE believes that, to the extent possible in this risk assessment, EPA should evaluate dioxins and furans through application of the dioxin/furan TEQ PRGs set forth in EPA (1998) guidance and discussed above, using those PRGs as remedial goals as well as screening levels, rather than conducting forward risk calculations for these compounds. This approach would avoid the uncertainties associated with selecting a particular CSF for TCDD and is consistent with the approach adopted by EPA for Areas Outside the River, as described in the SOW.

However, to the extent that EPA includes dioxins and furans in forward risk calculations, GE believes that EPA should use a CSF of 30,000 (mg/kg-day)⁻¹. As shown in our June 3, 1999 comments, based on current information, that CSF is more supportable than the CSF of 150,000 (mg/kg-day)⁻¹ specified in the Work Plan.


3. Use of TEF/TEQ approach for dioxin-like PCBs

In the Work Plan, EPA also proposes to use the TEF/TEQ approach for estimating risks of dioxin-like PCB congeners (Section 6.3.3.2.5). Under this approach, the concentrations of these PCB congeners are converted to toxic equivalents (TEQs) of 2,3,7,8-TCDD using TEFs, and the provisional CSF of 150,000 (mg/kg-day)⁻¹ for 2,3,7,8-TCDD is then applied to estimate the risks. In its June 3, 1999 comments, GE presented several reasons why this approach is not appropriate or scientifically defensible. In discussions with GE in November 1999, EPA stated that, since it will make risk management decisions on the basis of total PCB concentrations rather than congener-specific data, it intends to use this TEF/TEQ approach for PCBs only if it finds a good correlation between the concentrations of the dioxin-like congeners and concentrations of total PCBs. There is no indication in the Work Plan, however, that EPA has established a good correlation between the congener-specific data and the total PCB data. Indeed, in more recent discussions, EPA has indicated that it has not to date evaluated this issue. If and when EPA makes such an analysis, GE requests the opportunity to review the analysis and discuss it with EPA. The absence of such a correlation would itself make the TEQ data irrelevant to risk management decisions.
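The mechanics of the conversion EPA proposes are simple to state; the sketch below is for illustration only, using TEF values patterned on the WHO (1998) scheme for three dioxin-like congeners and hypothetical soil concentrations (neither the congener list nor the values are the Work Plan's inputs):

```python
# Illustrative TEQ calculation for dioxin-like PCB congeners.
# TEFs patterned on the WHO-1998 scheme; concentrations are hypothetical.

tefs = {"PCB-77": 0.0001, "PCB-126": 0.1, "PCB-169": 0.01}

# Hypothetical soil concentrations (mg/kg) for the same congeners.
concentrations = {"PCB-77": 0.05, "PCB-126": 0.002, "PCB-169": 0.004}

# TEQ = sum over congeners of (concentration x TEF)
teq = sum(concentrations[c] * tefs[c] for c in tefs)

# The 2,3,7,8-TCDD CSF is then applied to this TEQ dose -- which is how
# double counting can arise if the total-PCB CSF is applied in the same
# assessment to the same material.
print(round(teq, 6))  # 0.000245
```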

In any event, GE continues to believe, for the same reasons given in its June 3, 1999 comments (pp. 23-25), that application of the TEF/TEQ approach to PCBs is not scientifically supportable. These reasons include: (a) the lack of sufficient congener-specific data to provide reliable estimates of true congener concentrations; (b) the double counting of carcinogenic risks that would result from use of both the TCDD CSF and the PCB CSF in the same assessment; and (c) the uncertainties and unsupportability of the TEFs for PCB congeners and of the CSF (for 2,3,7,8-TCDD) to be used to evaluate the resulting TEQs.

4. Failure to consider alternative CSFs and RfDs for PCBs

The Work Plan states that EPA intends to use the upper-bound CSF for PCBs of 2 (mg/kg-day)⁻¹ and the Reference Dose (RfD) for Aroclor 1254 of 2 x 10⁻⁵ mg/kg-day, as published in its Integrated Risk Information System (IRIS) (Sections 6.3.3.2.3 and 6.3.3.3.2). In its June 3, 1999 comments (pp. 16-19 and Appendix A), GE presented substantial reasons why these toxicity values overestimate the carcinogenic potential and non-carcinogenic impacts of PCBs, and it recommended the development of alternative toxicity values for PCBs. EPA has made no reference to this presentation in its Work Plan.

This failure to consider the alternative toxicological information presented by GE is unlawful. Under a settlement agreement with GE (Settlement Agreement in General Electric Company v. Browner, No. 93-1251, D.C. Cir., October 25, 1993) and a subsequent EPA guidance memorandum (EPA, 1993), if an outside party questions the use of IRIS values during the course of an EPA risk assessment and presents alternative toxicological information that may be used in place of the IRIS values, EPA has an obligation to consider all credible and relevant evidence before it. There is no indication that EPA has considered the alternative toxicological information presented by GE. Rather, it has ignored GE's showing and chosen simply to defer to the current IRIS values (which is precisely the course of conduct that the settlement and guidance prohibit). EPA should rectify this deficiency by now giving careful consideration to the alternative toxicological information presented by GE.

5. Use of subchronic RfD for PCBs

The Work Plan states that EPA will consider using the subchronic RfD for PCBs only if it determines that exposure is intermittent enough to result in an exposure period that is substantially reduced (p. 6-42). Later, on page 6-43, the Work Plan states that this will occur when the total exposure period is substantially below the 7-year period that is considered by EPA to represent chronic exposure. There is no further discussion of this issue, so it is not clear what the Agency will use as a benchmark to identify intermittent exposure.

Many of the proposed exposure scenarios are likely to be extremely intermittent in nature. For example, in developing the SRBC for utility workers, EPA assumes that exposure occurs 5 days/year for a period of 25 years. Clearly, 5 days/year is intermittent exposure. In fact, even when considered over the entire 25-year exposure period, the total exposure consists of 125 days, which amounts to about 1/3 of a year, or less than 5 percent of the 7-year chronic exposure period. This exposure would certainly be considered intermittent, despite the fact that the total exposure period lasts for 25 years.


As a second example, in developing its SRBCs for recreational exposures, EPA has assumed that children aged 1-6 are exposed for six years during recreational activities. Even using the highly conservative exposure frequency for high contact areas of 3 days/week for a period of 7 months per year (a total of 84 days/year, or 504 days for the entire exposure period), the number of days exposed is equivalent to about 1.4 years, or less than 20 percent of the available days during the 7-year chronic exposure period. Nevertheless, as stated on page 6-42, EPA does not intend to use the subchronic RfD to evaluate noncancer hazards to young children.
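The arithmetic behind these two examples is simple enough to verify directly; the following is a back-of-envelope check using the exposure frequencies cited above:

```python
# Back-of-envelope check of the two intermittency examples discussed above,
# using the scenario frequencies cited from the Work Plan.

CHRONIC_YEARS = 7.0  # EPA's chronic exposure benchmark

# Utility worker: 5 days/year for 25 years.
utility_days = 5 * 25                  # 125 days total
utility_years = utility_days / 365.0   # about 1/3 of a year
print(utility_years / CHRONIC_YEARS)   # < 0.05, i.e. under 5% of the chronic period

# Young child recreation: 3 days/week, 7 months/year, for 6 years.
child_days_per_year = 3 * 4 * 7        # ~84 days/year
child_days = child_days_per_year * 6   # 504 days total
child_years = child_days / 365.0       # about 1.4 years
print(child_years / CHRONIC_YEARS)     # < 0.20, i.e. under 20% of the chronic period
```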

In short, the Work Plan states that EPA will use the subchronic RfD when exposures during a given scenario are sufficiently intermittent to result in a total exposure that is far less than the 7-year chronic exposure period identified by EPA. It is clear, however, that even for scenarios where such intermittent exposure is likely, EPA does not intend to incorporate the subchronic RfD. EPA should reconsider its position on the use of the subchronic RfD and incorporate it into those scenarios where it is relevant.

6. Inappropriate use of a single exposure scenario for all direct-contact recreational activities

The Work Plan provides that, while consumption of fish and game will be evaluated separately, all direct-contact recreational activities will be evaluated using a single exposure scenario and a single set of exposure assumptions (pp. 6-52 - 6-53). These activities include hiking, walking, canoeing, picnicking, swimming, wading, and dirt biking, as well as fishing and hunting (but not consumption of fish or game). In its June 3, 1999 comments (pp. 11-12, 29-30), GE explained that use of a single exposure scenario and set of assumptions for such diverse activities is inappropriate because it will force EPA either to use the most conservative assumptions for any of these activities (which would clearly result in grossly overestimated exposure estimates) or to pick compromise parameters that will not be truly representative of any of these activities (which would be subject to considerable uncertainty). Based on discussions with EPA in the fall of 1999, GE had understood that the Agency recognized this problem and intended to consider further whether and how to divide this single direct-contact recreational scenario into a number of more specific scenarios (e.g., hiking/walking, canoeing, picnicking, swimming/wading). However, the Work Plan does not do that. GE again urges EPA to develop such specific scenarios so as to take account of the significant differences in exposure parameters among these types of activities. Doing so will more closely approximate actual recreational exposures and allow development of more realistic and appropriate cleanup goals.

7. Lack of information about specific exposure assumptions to be used

As in the prior draft, the Work Plan does not present the specific exposure parameter values and equations to be used in EPA's risk assessment, but simply states that they will be developed based on EPA and MDEP guidance, along with site-specific data (Section 6.3.4.5). GE requested such information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has advised GE that the Agency's risk assessment staff will hold a series of technical meetings with GE representatives and consultants, and at such meetings will be prepared to discuss the most appropriate exposure assumptions and parameter values to be used. GE reiterates its request for such meetings and looks forward to discussing these issues with EPA.

8. Use of probabilistic risk analysis

In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that probabilistic approaches, such as Monte Carlo analysis, will be considered for those COPCs and exposure pathways that result in significant risk estimates. However, it does not mention probabilistic risk assessment techniques in connection with any of the main risk assessments. In prior discussions with GE, EPA indicated that one of the principal reasons why it collected a large amount of fish tissue PCB data is for use in a probabilistic risk assessment of fish consumption. As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic exposure model in the main risk analysis of the fish consumption pathway. However, the Work Plan presents no discussion of, or plans for conducting, such an assessment. In fact, EPA does not even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk assessment guidance documents and information sources that EPA intends to use in conducting the HHRA.

During the last few years, EPA's policies and guidelines have focused on improving risk management by presenting decision-makers with the entire range of possible risks, rather than a single point estimate (EPA, 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that numerical risk assessments should be accompanied by a full characterization of the uncertainties, limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D) probabilistic analysis, such as Microexposure analysis, can more effectively characterize the impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed population by considering time-dependent changes.

Microexposure analysis has been used by EPA and by independent researchers to simulate duration of residential exposure (EPA, 1992c; Johnson and Capel, 1992; Sielken, 1994). It also has been used to evaluate childhood exposures to lead (Goodrum et al., 1994), exposure to contaminants in tap water (Harrington et al., 1995), and exposure to dioxins from the consumption of freshwater fish (Keenan et al., 1993a,b; 1995; 1996a; 1997a,b). Microexposure analysis was employed in the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon Group, 1994; Keenan et al., 1996b). In addition, Microexposure analysis has been described in EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A (External Review Draft) as a viable alternative for modeling time-dependent variability in concentrations, daily activity patterns, and other behavioral exposure factors (EPA, 1999c).

Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of probabilistic risk analysis through application to the fish ingestion pathway at the Sangamo Weston/Lake Hartwell Superfund site (Simon, 1999). Simon (1999) found that the Microexposure event model gave risk estimates that were approximately an order of magnitude lower than those derived using deterministic and other probabilistic approaches. He attributed this decrease to the power of the method to incorporate changes over time.

GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model, to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in Appendix E of EPA's draft guidance for conducting probabilistic risk assessments (EPA, 1999c), some types of exposures, including fish consumption, can be expected to vary over time. Anglers may change their activity levels and their fish consumption behaviors as they age. In addition, different species and sizes of fish may be consumed from a variety of locations on the river over time, and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates, and will provide information that will allow better informed risk management decisions to be made (EPA, 1999c).
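To illustrate the structure of such an analysis (and not the Work Plan's actual inputs), the sketch below nests a variability loop over simulated anglers inside an uncertainty loop over the mean fish tissue concentration, with meal frequency, meal size, and tissue concentration allowed to change year by year. Every distribution and parameter value here is a hypothetical placeholder:

```python
import random

# Minimal two-dimensional (2-D), microexposure-style sketch of the fish
# consumption pathway: an outer loop over *uncertainty* (here, the log-scale
# mean tissue concentration) and an inner loop over *variability* among
# anglers, with exposure factors varying year to year. All values are
# hypothetical placeholders used only to show the structure.

random.seed(1)

def simulate_angler(log_mean_conc):
    """Lifetime average daily dose (mg/kg-day) for one simulated angler."""
    body_weight = random.lognormvariate(4.25, 0.2)   # kg; varies by person
    duration = random.randint(5, 30)                 # years of angling
    total_intake = 0.0
    for _year in range(duration):
        # Tissue concentration and consumption behavior change over time.
        conc = random.lognormvariate(log_mean_conc, 0.5)  # mg/kg in fish
        meals_per_year = random.randint(5, 50)
        grams_per_meal = random.uniform(100.0, 300.0)
        total_intake += conc * meals_per_year * grams_per_meal / 1000.0  # mg
    averaging_days = 70 * 365.0                      # lifetime averaging time
    return total_intake / (body_weight * averaging_days)

doses = []
for _ in range(50):                       # outer loop: uncertainty
    log_mean_conc = random.gauss(0.0, 0.25)
    # inner loop: variability among anglers, given that uncertain mean
    doses.extend(simulate_angler(log_mean_conc) for _ in range(200))

doses.sort()
print("median dose:", doses[len(doses) // 2])
print("95th percentile dose:", doses[int(0.95 * len(doses))])
```

The output is a full distribution of doses, from which any percentile (and its uncertainty) can be read, rather than a single point estimate.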

9. Use of 95% UCLs or maximum values as exposure point concentrations

The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% UCL of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).

GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment, and demonstrated by EPA contractors and consultants (EPA, 1997c; Schulz and Griffin, 1999), use of the H-statistic to derive a 95% UCL can, and frequently does, produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.

Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½-Mile Reach and for the Areas Outside the River: a spatial averaging approach. That approach will: (a) allow calculation of EPCs that are much more truly representative of actual exposures than use of EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper ½-Mile Reach and Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling, if necessary, to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.
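The mechanics of the area-weighted (spatial averaging) computation are straightforward; the sketch below uses three hypothetical samples patterned after Table 1, with each sample's polygon area serving as its weight:

```python
# Area-weighted (spatial average) EPC versus the unweighted arithmetic mean.
# The three samples are hypothetical, patterned after Table 1; in practice
# every sample in an exposure area is weighted by its polygon area.

samples = [
    # (polygon area in ft2, total PCB in ppm)
    (138219, 0.022),
    (31021, 1.63),
    (2123, 9.69),
]

total_area = sum(area for area, _ in samples)
spatial_mean = sum(area * conc for area, conc in samples) / total_area

# The unweighted mean over-weights small, densely sampled (and here more
# contaminated) polygons relative to the area a person actually contacts.
arithmetic_mean = sum(conc for _, conc in samples) / len(samples)

print(round(spatial_mean, 3))     # 0.433
print(round(arithmetic_mean, 3))  # 3.781
```

In this illustration the spatial average is almost an order of magnitude below the unweighted mean, because the cleanest sample represents by far the largest share of the exposure area.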


However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A, and demonstrated by Singh et al. (EPA, 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.
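For illustration, a basic (non-spatial) percentile bootstrap of the arithmetic mean looks like the following; a spatial bootstrap would additionally weight or stratify the resampling by sample location and polygon area, but the resampling mechanics are the same. The data are hypothetical:

```python
import random

# Percentile-bootstrap 95% UCL of the arithmetic mean, on hypothetical data.
# This simplified version ignores spatial weighting; it shows only the
# resampling mechanics common to both plain and spatial bootstrapping.

random.seed(7)

data = [0.022, 0.028, 1.07, 1.63, 2.19, 3.25, 7.58, 9.69, 14.3, 42.6]  # ppm

boot_means = []
for _ in range(5000):
    # Resample n observations with replacement and record the mean.
    resample = [random.choice(data) for _ in data]
    boot_means.append(sum(resample) / len(resample))

boot_means.sort()
ucl95 = boot_means[int(0.95 * len(boot_means))]  # 95th percentile of means

print(round(sum(data) / len(data), 3))  # arithmetic mean: 8.236
print(round(ucl95, 2))                  # bootstrap 95% UCL of the mean
```

Unlike the H-statistic, the bootstrap UCL is anchored to the observed data and cannot wander orders of magnitude above the sample mean.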

References

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1990. National Contingency Plan. U.S. Environmental Protection Agency, Washington, DC.

EPA. 1992a. Final Guidelines for Exposure Assessment; Notice. U.S. Environmental Protection Agency, Washington, DC. 57 Federal Register 14, 22888-22938. May 29.

EPA. 1992b. Guidance on Risk Characterization for Risk Managers and Risk Assessors. Memorandum from F. Henry Habicht III, Deputy Administrator, to Assistant Administrators and Regional Administrators. Office of the Administrator, Washington, DC.

EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

EPA. 1993. Memo from W.H. Farland and H.L. Longest to Regional Directors re: Use of IRIS values in Superfund risk assessments. U.S. Environmental Protection Agency, Washington, DC. OSWER Directive 9285.7-16. December 21.

EPA. 1995a. Policy for Risk Characterization at the U.S. Environmental Protection Agency. U.S. Environmental Protection Agency, Office of the Administrator, Washington, DC. March.

EPA. 1995b. Guidance for Risk Characterization. U.S. Environmental Protection Agency, Science Policy Council, Washington, DC. February.

EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the U.S. Environmental Protection Agency. U.S. Environmental Protection Agency, Office of Research and Development, Washington, DC. May 15.

EPA. 1997b. Guiding Principles for Monte Carlo Analysis. U.S. Environmental Protection Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 p.

EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1998. Memo from Timothy Fields, Acting Administrator, to Regional Directors re: Approach for addressing dioxin in soil at CERCLA and RCRA sites. U.S. Environmental Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.

EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, U.S. EPA, to Jane Magee, General Electric Company, re: Pittsfield 1-0147, EPA Area 6, Housatonic River Human Health Risk Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader, re: Protectiveness of Cleanup Levels for Removal Actions Outside the River - Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, U.S. Dist. Court for Dist. of Mass.). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual, Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2 Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3 (Part A: Process for Conducting Probabilistic Risk Assessment). Draft, Revision 5. U.S. Environmental Protection Agency, Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the U.S. Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition, December 4-7, Baltimore, MD.

Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA42.03. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.

Johnson, T. and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA, Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds: Proceedings, Dioxin '96 - 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen 30:61-65.

Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC. October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meetings Proceedings. Abstract 1.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition. Abstract.

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan - Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19, November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD. December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4):823-843.

Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 11.

ATTACHMENT A

A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS

Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co., Inc.

Introduction

The U.S. Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston, 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student-t statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets for which no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and on how well they represent the actual levels of exposure that are likely to occur under various scenarios.

There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student-t statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that, when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.

GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will yield more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½-Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic: namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.

Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL

In 1992, EPA issued a guidance document (EPA 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.


The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For log-normally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA 1992):

UCL = exp(x̄ + 0.5s² + sH/√(n − 1))

Where:

UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic, tabulated by Land (1975), that depends on the sample size and the standard deviation of the log-transformed data
n = number of analytical samples
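The equation above can be sketched in a few lines of code. This is a minimal illustration only: the H value used here (3.0) is purely hypothetical, since in practice H must be taken from Land's (1975) tables for the chosen confidence level, the sample size, and the standard deviation of the log-transformed data.

```python
import math

def h_ucl(data, h_stat):
    """95% UCL on the arithmetic mean via the H-statistic equation above.

    h_stat is not computed here; it must be looked up in Land's (1975)
    tables for the given sample size and log-space standard deviation.
    """
    logs = [math.log(x) for x in data]
    n = len(logs)
    xbar = sum(logs) / n                               # mean of log-transformed data
    s2 = sum((v - xbar) ** 2 for v in logs) / (n - 1)  # variance of the logs
    s = math.sqrt(s2)                                  # standard deviation of the logs
    return math.exp(xbar + 0.5 * s2 + s * h_stat / math.sqrt(n - 1))

# A small, skewed toy dataset with a hypothetical H value: the H-UCL can
# far exceed every observed concentration, which is the problem discussed below.
ucl = h_ucl([1.0, 10.0, 100.0], h_stat=3.0)
```

With this toy dataset the computed H-UCL is far above the largest observed value (100), illustrating why the guidance then falls back to the maximum observed concentration.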

Frequently, the H-statistic calculates extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.


Use of the H-statistic also does not yield a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.

Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may


overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.

Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when the exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.

Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and it does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA 1999).

In view of these considerations, EPA contractors and consultants have recommended that the H-statistic should not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA 1997a) recommended that, "in environmental applications, the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. ... [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).


Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the Lower Housatonic River. Consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.

This problem can be substantially mitigated by area-weighting the sample results, so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al. 1995); in more complex approaches, unbiased estimates of how concentration and variance change with distance are developed, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging; Isaaks and Srivastava 1989).

Because kriging is data intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a


hardwired operation in most commercially available Geographic Information System (GIS) applications.

The area inside each polygon is then used to weight the samples in statistical calculations, which is, in essence, saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava 1989):

x̄ = Σᵢ (pᵢ × Cᵢ)

Where:

x̄ = area-weighted mean concentration (mg/kg)
Cᵢ = the concentration representing the condition within polygon i, where there are i = 1 through n polygons, and
pᵢ = the proportion of the total area that is incorporated in polygon i (unitless)
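As a sketch, the formula amounts to a weighted average in which each sample's weight is its polygon's share of the total exposure area. The two-sample parcel below is illustrative only, not taken from the site data.

```python
def area_weighted_mean(concs, areas):
    """Area-weighted mean: x-bar = sum over i of p_i * C_i, where
    p_i = area_i / total_area is the fraction of the exposure area
    represented by sample i's Thiessen polygon."""
    total = float(sum(areas))
    return sum(c * (a / total) for c, a in zip(concs, areas))

# Hypothetical parcel: a clean sample representing 3/4 of the area and a
# contaminated sample representing 1/4 of it.
epc = area_weighted_mean([0.0, 10.0], [75.0, 25.0])  # 2.5, versus an unweighted mean of 5.0
```

Note how the clean sample, which represents most of the parcel, pulls the EPC well below the unweighted mean.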

This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then the average concentrations for each grid were


averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the:

"[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods, such as kriging or triangulation, that take into account the spatial distribution of contamination to characterize exposure." (p. 15)

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and it is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such additional sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and


representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously and demonstrated by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that, if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal


distribution. This Theorem applies to any population regardless of its statistical distribution. Bootstrapping relies on the same theory, except that, instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and it is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA 1997a; Schulz and Griffin 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.
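The procedure described above can be sketched in a few lines. The iteration count and the percentile rule follow the description in the text; the dataset is a hypothetical skewed one, used only for illustration.

```python
import random
import statistics

def bootstrap_95_ucl(data, n_iter=5000, seed=1):
    """Bootstrap 95% UCL of the arithmetic mean: repeatedly resample the
    dataset with replacement, archive each resample's mean, and take the
    95th percentile of the resulting distribution of means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    means = sorted(statistics.fmean(rng.choices(data, k=n)) for _ in range(n_iter))
    return means[int(0.95 * n_iter) - 1]

# A skewed hypothetical dataset: the bootstrap UCL stays within the range
# of the observed data rather than ballooning the way an H-UCL can.
data = [0.5, 0.8, 1.2, 2.0, 3.5, 4.1, 7.9, 12.4, 20.6, 42.6]
ucl = bootstrap_95_ucl(data)
```

Because every resample mean is bounded by the observed minimum and maximum, the bootstrap 95% UCL can never exceed the largest measured concentration, unlike the H-UCL.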

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that accounts for the locations and densities of samples collected on each property. This modified bootstrapping approach (spatial bootstrapping), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.


Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
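The three steps above can be sketched as follows, assuming the Thiessen polygon areas (step one) have already been computed. The expansion factor, draw size, and iteration count follow the ranges given in the text; the three-sample parcel is hypothetical.

```python
import random
import statistics

def spatial_bootstrap_95_ucl(concs, areas, factor=5, draw=250, n_iter=5000, seed=1):
    """Spatial bootstrap sketch.

    Step 1 is assumed done: `areas` holds each sample's Thiessen polygon area.
    Step 2: normalize areas to the smallest polygon, round to integers, and
    expand the dataset so each sample appears in proportion to the area it
    represents. Step 3: bootstrap the expanded dataset and take the 95th
    percentile of the resample means.
    """
    smallest = min(areas)
    expanded = []
    for c, a in zip(concs, areas):
        expanded.extend([c] * (round(a / smallest) * factor))
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = sorted(statistics.fmean(rng.choices(expanded, k=draw)) for _ in range(n_iter))
    return means[int(0.95 * n_iter) - 1]

# Hypothetical parcel: the high concentration sits in the smallest polygon,
# so area-weighting keeps the UCL near the area-weighted mean (5.2 ppm here).
ucl = spatial_bootstrap_95_ucl([1.0, 2.0, 40.0], [60000.0, 30000.0, 10000.0])
```

The centre of the resulting distribution of means equals the area-weighted mean of the original data, as the text describes, with the 95th percentile a modest increment above it.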


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has, in the past, often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA 1997a; Schulz and Griffin 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) the EPA approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few


showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL that substantially exceeds the highest measured concentration, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in high-concentration areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above and demonstrated by EPA contractors and consultants (EPA 1997a; Schulz and Griffin 1999), use of the H-statistic approach to calculate the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data


necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA 1989, 1993), and it would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA 1997a; Schulz and Griffin 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.


EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision, Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H. and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston 2000 Supplemental Investigation Work Plan for the Lower Housatonic River Final

Roy F Weston Inc Manchester NH DCN GEP2-02-000-AAME February 2000


Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
1                138219                0.022
2                105217                0.028
3                110214                1.07
4                31021                 1.63
5                11399                 1.67
6                7004                  2.01
7                14268                 2.19
8                12121                 3.25
9                3697                  3.77
10               13297                 7.58
11               33087                 7.61
12               65541                 7.62
13               57494                 7.71
14               9014                  7.99
15               9994                  8.72
16               2123                  9.69
17               2341                  9.72
18               7446                  12.7
19               2613                  14.3
20               44231                 14.3
21               2362                  15.4
22               8745                  16.3
23               5008                  18.7
24               18762                 19.1
25               24296                 20.2
26               51356                 30.8
27               12996                 32.9
28               18019                 38.9
29               119814                40.6
30               132012                42.6



3. Use of TEF/TEQ approach for dioxin-like PCBs

In the Work Plan, EPA also proposes to use the TEF/TEQ approach for estimating risks of dioxin-like PCB congeners (Section 6.3.3.2.5). Under this approach, the concentrations of these PCB congeners are converted to toxic equivalents (TEQs) of 2,3,7,8-TCDD using TEFs, and the provisional CSF of 150,000 (mg/kg-day)^-1 for 2,3,7,8-TCDD is then applied to estimate the risks. In its June 3, 1999 comments, GE presented several reasons why this approach is not appropriate or scientifically defensible. In discussions with GE in November 1999, EPA stated that, since it will make risk management decisions on the basis of total PCB concentrations rather than congener-specific data, it intends to use this TEF/TEQ approach for PCBs only if it finds a good correlation between the concentrations of the dioxin-like congeners and the concentrations of total PCBs. There is no indication in the Work Plan, however, that EPA has established a good correlation between the congener-specific data and the total PCB data. Indeed, in more recent discussions, EPA has indicated that it has not, to date, evaluated this issue. If and when EPA makes such an analysis, GE requests the opportunity to review the analysis and discuss it with EPA. The absence of such a correlation would itself make the TEQ data irrelevant to risk management decisions.

In any event, GE continues to believe, for the same reasons given in its June 3, 1999 comments (pp. 23-25), that application of the TEF/TEQ approach to PCBs is not scientifically supportable. These reasons include: (a) the lack of sufficient congener-specific data to provide reliable estimates of true congener concentrations; (b) the double counting of carcinogenic risks that would result from use of both the TCDD CSF and the PCB CSF in the same assessment; and (c) the uncertainties and unsupportability of the TEFs for PCB congeners and of the CSF (for 2,3,7,8-TCDD) to be used to evaluate the resulting TEQs.
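For reference, the arithmetic of the TEF/TEQ conversion described above can be sketched as follows. The congener names, daily doses, and TEF values in this sketch are hypothetical placeholders; only the provisional 150,000 (mg/kg-day)^-1 slope factor is taken from the Work Plan.

```python
# Hedged sketch of the TEF/TEQ risk calculation described above. The congener
# doses and TEF values are illustrative placeholders; only the provisional
# TCDD slope factor comes from the Work Plan.

TCDD_CSF = 1.5e5  # provisional CSF for 2,3,7,8-TCDD, (mg/kg-day)^-1

def teq_dose(congener_doses, tefs):
    """Convert congener-specific doses (mg/kg-day) to a 2,3,7,8-TCDD-equivalent dose."""
    return sum(dose * tefs[name] for name, dose in congener_doses.items())

doses = {"PCB-126": 2e-9, "PCB-169": 5e-10}  # hypothetical daily doses, mg/kg-day
tefs = {"PCB-126": 0.1, "PCB-169": 0.01}     # illustrative TEFs, not the WHO values

cancer_risk = teq_dose(doses, tefs) * TCDD_CSF  # incremental lifetime cancer risk
```

Note that applying the PCB CSF to the same congeners in the same assessment would count their carcinogenicity twice, which is point (b) above.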

4. Failure to consider alternative CSFs and RfDs for PCBs

The Work Plan states that EPA intends to use the upper-bound CSF for PCBs of 2 (mg/kg-day)^-1 and the Reference Dose (RfD) for Aroclor 1254 of 2 x 10^-5 mg/kg-day, as published in its Integrated Risk Information System (IRIS) (Sections 6.3.3.2.3 and 6.3.3.3.2). In its June 3, 1999 comments (pp. 16-19 and Appendix A), GE presented substantial reasons why these toxicity values overestimate the carcinogenic potential and non-carcinogenic impacts of PCBs, and it recommended the development of alternative toxicity values for PCBs. EPA has made no reference to this presentation in its Work Plan.

This failure to consider the alternative toxicological information presented by GE is unlawful. Under a settlement agreement with GE (Settlement Agreement in General Electric Company v. Browner, No. 93-1251, D.C. Cir., October 25, 1993) and a subsequent EPA guidance memorandum (EPA, 1993), if an outside party questions the use of IRIS values during the course of an EPA risk assessment and presents alternative toxicological information that may be used in place of the IRIS values, EPA has an obligation to consider all credible and relevant evidence before it. There is no indication that EPA has considered the alternative toxicological information presented by GE. Rather, it has ignored GE's showing and chosen simply to defer to the current IRIS values (which is precisely the course of conduct that the settlement and guidance prohibit). EPA should rectify this deficiency by now giving careful consideration to the alternative toxicological information presented by GE.

5. Use of subchronic RfD for PCBs

The Work Plan states that EPA will consider using the subchronic RfD for PCBs only if it determines that exposure is intermittent enough to result in an exposure period that is substantially reduced (p. 6-42). Later, on page 6-43, the Work Plan states that this will occur when the total exposure period is substantially below the 7-year period that is considered by EPA to represent chronic exposure. There is no further discussion of this issue, so it is not clear what the Agency will use as a benchmark to identify intermittent exposure.

Many of the proposed exposure scenarios are likely to be extremely intermittent in nature. For example, in developing the SRBC for utility workers, EPA assumes that exposure occurs 5 days/year for a period of 25 years. Clearly, 5 days/year is intermittent exposure. In fact, even when considered over the entire 25-year exposure period, the total exposure consists of 125 days, which amounts to about 1/3 of a year, or less than 5 percent of the 7-year chronic exposure period. This exposure would certainly be considered intermittent, despite the fact that the total exposure period lasts for 25 years.

As a second example, in developing its SRBCs for recreational exposures, EPA has assumed that children aged 1-6 are exposed for six years during recreational activities. Even using the highly conservative exposure frequency for high-contact areas of 3 days/week for a period of 7 months per year (a total of 84 days/year, or 504 days for the entire exposure period), the number of days exposed is equivalent to 1.4 years, or less than 20 percent of the available days during the 7-year chronic exposure period. Nevertheless, as stated on page 6-42, EPA does not intend to use the subchronic RfD to evaluate noncancer hazards to young children.
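The intermittency arithmetic in the two examples above can be verified directly; the inputs below simply restate the Work Plan assumptions quoted in the text.

```python
# Verification of the intermittency arithmetic discussed above, using the
# exposure assumptions quoted from the Work Plan.

CHRONIC_PERIOD_DAYS = 7 * 365  # EPA's 7-year chronic exposure benchmark

def fraction_of_chronic(days_per_year, years):
    """Total exposure days expressed as a fraction of the 7-year chronic period."""
    return (days_per_year * years) / CHRONIC_PERIOD_DAYS

# Utility worker: 5 days/year for 25 years = 125 days total
utility_fraction = fraction_of_chronic(5, 25)   # under 5 percent of the chronic period

# Child recreator: 3 days/week for 7 months/year (~84 days/year) for 6 years = 504 days
child_fraction = fraction_of_chronic(84, 6)     # under 20 percent of the chronic period
```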

In short, the Work Plan states that EPA will use the subchronic RfD when exposures during a given scenario are sufficiently intermittent to result in a total exposure that is far less than the 7-year chronic exposure period identified by EPA. It is clear, however, that even for scenarios where this intermittent exposure is likely, EPA does not intend to incorporate the subchronic RfD. EPA should reconsider its position on the use of the subchronic RfD and incorporate it into those scenarios where it is relevant.

6. Inappropriate use of a single exposure scenario for all direct-contact recreational activities

The Work Plan provides that, while consumption of fish and game will be evaluated separately, all direct-contact recreational activities will be evaluated using a single exposure scenario and a single set of exposure assumptions (pp. 6-52 and 6-53). These activities include hiking, walking, canoeing, picnicking, swimming, wading, and dirt biking, as well as fishing and hunting (but not consumption of fish or game). In its June 3, 1999 comments (pp. 11-12, 29-30), GE explained that use of a single exposure scenario and set of assumptions for such diverse activities is inappropriate because it will force EPA either to use the most conservative assumptions for any of these activities (which would clearly result in grossly overestimated exposure estimates) or to pick compromise parameters that will not be truly representative of any of these activities (which would be subject to considerable uncertainty). Based on discussions with EPA in the fall of 1999, GE had understood that the Agency recognized this problem and intended to consider further whether and how to divide this single direct-contact recreational scenario into a number of more specific scenarios (e.g., hiking/walking, canoeing, picnicking, swimming/wading). However, the Work Plan does not do that. GE again urges EPA to develop such specific scenarios so as to take account of the significant differences in exposure parameters among these types of activities. Doing so will more closely approximate actual recreational exposures and allow more realistic and appropriate cleanup goals to be developed.

7. Lack of information about specific exposure assumptions to be used

As in the prior draft, the Work Plan does not present the specific exposure parameter values and equations to be used in EPA's risk assessment, but simply states that they will be developed based on EPA and MDEP guidance, along with site-specific data (Section 6.3.4.5). GE requested such information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has advised GE that the Agency's risk assessment staff will hold a series of technical meetings with GE representatives and consultants, and at such meetings will be prepared to discuss the most appropriate exposure assumptions and parameter values to be used. GE reiterates its request for such meetings and looks forward to discussing these issues with EPA.

8. Use of probabilistic risk analysis

In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that probabilistic approaches, such as Monte Carlo analysis, will be considered for those COPCs and exposure pathways that result in significant risk estimates. However, it does not mention probabilistic risk assessment techniques in connection with any of the main risk assessments. In prior discussions with GE, EPA indicated that one of the principal reasons why it collected a large amount of fish tissue PCB data was for use in a probabilistic risk assessment of fish consumption. As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic exposure model in the main risk analysis of the fish consumption pathway. However, the Work Plan presents no discussion of, or plans for conducting, such an assessment. In fact, EPA does not even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk assessment guidance documents and information sources that EPA intends to use in conducting the HHRA.

During the last few years, EPA's policies and guidelines have focused on improving risk management by presenting decision-makers with the entire range of possible risks, rather than a single point estimate (EPA, 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that numerical risk assessments should be accompanied by a full characterization of the uncertainties, limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D) probabilistic analysis, such as Microexposure analysis, can more effectively characterize the impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed population by considering time-dependent changes.

Microexposure analysis has been used by EPA and by independent researchers to simulate the duration of residential exposure (EPA, 1992c; Johnson and Capel, 1992; Sielken, 1994). It also has been used to evaluate childhood exposures to lead (Goodrum et al., 1994), exposure to contaminants in tap water (Harrington et al., 1995), and exposure to dioxins from the consumption of freshwater fish (Keenan et al., 1993a,b; 1995; 1996a; 1997a,b). Microexposure analysis was employed in the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon Group, 1994; Keenan et al., 1996b). In addition, Microexposure analysis has been described in EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A - External Review Draft as a viable alternative for modeling time-dependent variability in concentrations, daily activity patterns, and other behavioral exposure factors (EPA, 1999c).

Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of probabilistic risk analysis through application to the fish ingestion pathway at the Sangamo Weston/Lake Hartwell Superfund site (Simon, 1999). Simon (1999) found that the Microexposure event model gave risk estimates that were approximately an order of magnitude lower than those derived using deterministic and other probabilistic approaches. He attributed this decrease to the power of the method to incorporate changes over time.

GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model, to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in Appendix E of EPA's draft guidance for conducting probabilistic risk assessments (EPA, 1999c), some types of exposures, including fish consumption, can be expected to vary over time. Anglers may change their activity levels and their fish consumption behaviors as they age. In addition, different species and sizes of fish may be consumed from a variety of locations on the river over time, and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates, and will provide information that will allow better-informed risk management decisions to be made (EPA, 1999c).
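For illustration only, a microexposure-event simulation of the fish consumption pathway might look like the following sketch, in which each simulated angler's meal frequency and the fish tissue concentration change year by year. Every distribution and parameter value shown is a hypothetical placeholder, not a value proposed by GE or EPA.

```python
# Illustrative microexposure-event sketch for the fish ingestion pathway:
# meal frequency and tissue concentration vary over time for each simulated
# angler, so the lifetime average daily dose reflects time-dependent
# variability. All parameter values and distributions are placeholders.
import random

MEAL_SIZE_KG = 0.227  # roughly an 8-oz fish meal (placeholder)

def simulate_angler(years=30, body_weight_kg=70.0, rng=random):
    """Lifetime average daily dose (mg/kg-day) for one simulated angler."""
    total_mg = 0.0
    meals_per_year = rng.randint(5, 50)  # angler-specific starting behavior
    for _ in range(years):
        meals_per_year = max(1, meals_per_year + rng.randint(-3, 3))  # habits drift
        tissue_ppm = rng.lognormvariate(0.0, 0.8)  # mg/kg PCB; varies by year/species
        total_mg += meals_per_year * MEAL_SIZE_KG * tissue_ppm
    return total_mg / (years * 365.0 * body_weight_kg)

random.seed(7)
doses = sorted(simulate_angler() for _ in range(2000))
median_dose, p95_dose = doses[1000], doses[1900]  # a range of risks, not one point
```

The point of the sketch is the output: a distribution of doses across the simulated population, from which decision-makers can read the median, the 95th percentile, or any other statistic, rather than a single deterministic estimate.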

9. Use of 95% UCLs or maximum values as exposure point concentrations

The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% UCL of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).

GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment, and as demonstrated by EPA contractors and consultants (EPA, 1997c; Schulz and Griffin, 1999), use of the H-statistic to derive a 95% UCL can, and frequently does, produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.

Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper 1/2 Mile Reach and for the Areas Outside the River - a spatial averaging approach. That approach will: (a) allow calculation of EPCs that are much more truly representative of actual exposures than use of EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper 1/2 Mile Reach and the Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling, if necessary, to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.

However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A, and as demonstrated by Singh et al. (EPA, 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.
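One way to sketch a spatial bootstrap is to resample sample locations with probability proportional to the area each represents and take the 95th percentile of the resampled means. This is a simplified illustration under stated assumptions, not the specific algorithm detailed in Attachment A, and the data are placeholders.

```python
# Simplified sketch of a spatial bootstrap 95% UCL: locations are resampled
# with probability proportional to the area each sample represents, and the
# 95th percentile of the bootstrap means serves as the UCL. Illustrative data.
import random

def spatial_bootstrap_ucl(concs, areas, n_boot=2000, seed=0):
    rng = random.Random(seed)
    n = len(concs)
    boot_means = []
    for _ in range(n_boot):
        draw = rng.choices(concs, weights=areas, k=n)  # area-weighted resample
        boot_means.append(sum(draw) / n)
    boot_means.sort()
    return boot_means[int(0.95 * n_boot)]  # 95th percentile of bootstrap means

concs = [0.5, 1.2, 3.4, 0.8, 2.1]             # ppm, placeholder results
areas = [138219, 105217, 31021, 11399, 7004]  # ft^2, placeholder polygons

ucl95 = spatial_bootstrap_ucl(concs, areas)
```

Unlike the H-statistic, this resampling approach makes no assumption about the underlying distribution, and the UCL it produces can never exceed the maximum observed concentration.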

References

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1990. National Contingency Plan. U.S. Environmental Protection Agency, Washington, DC.

EPA. 1992a. Final Guidelines for Exposure Assessment; Notice. U.S. Environmental Protection Agency, Washington, DC. 57 Federal Register 14, 22888-22938. May 29.

EPA. 1992b. Guidance on Risk Characterization for Risk Managers and Risk Assessors. Memorandum from F. Henry Habicht III, Deputy Administrator, to Assistant Administrators and Regional Administrators. Office of the Administrator, Washington, DC.

EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

EPA. 1993. Memo from W.H. Farland and H.L. Longest to Regional Directors, Re: Use of IRIS values in Superfund risk assessments. U.S. Environmental Protection Agency, Washington, DC. OSWER Directive 9285.7-16. December 21.

EPA. 1995a. Policy for Risk Characterization at the U.S. Environmental Protection Agency. U.S. Environmental Protection Agency, Office of the Administrator, Washington, DC. March.

EPA. 1995b. Guidance for Risk Characterization. U.S. Environmental Protection Agency, Science Policy Council, Washington, DC. February.

EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the U.S. Environmental Protection Agency. U.S. Environmental Protection Agency, Office of Research and Development, Washington, DC. May 15.

EPA. 1997b. Guiding Principles for Monte Carlo Analysis. U.S. Environmental Protection Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 p.

EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1998. Memo from Timothy Fields, Acting Administrator, to Regional Directors, Re: Approach for addressing dioxin in soil at CERCLA and RCRA sites. U.S. Environmental Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.

EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, U.S. EPA, to Jane Magee, General Electric Company, Re: Pittsfield 1-0147, EPA Area 6, Housatonic River Human Health Risk Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader, Re: Protectiveness of Cleanup Levels for Removal Actions Outside the River - Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, U.S. Dist. Court for Dist. of Mass.). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I - Human Health Evaluation Manual, Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2. Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3 (Part A: Process for Conducting Probabilistic Risk Assessment). Draft, Revision 5. U.S. Environmental Protection Agency, Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, 1/2 Mile Removal Action, and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the U.S. Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition, December 4-7, Baltimore, MD.

Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA4203. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.

Johnson, T. and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA, Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds, Proceedings, Dioxin '96 - 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen 30:61-65.

Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC. October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter, Annual Meeting Proceedings. Abstract.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition. Abstract.

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan - Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19, November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD. December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4):823-843.

Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 11.

ATTACHMENT A

A COMPARISON OF ALTERNATIVE METHODS

FOR CALCULATING EXPOSURE POINT CONCENTRATIONS

Prepared on Behalf of the General Electric Company

by Ogden Environmental and Energy Services Co Inc


Introduction

The U.S. Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston, 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets for which no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and on how well they represent the actual levels of exposure that are likely to occur under various scenarios.

There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that, when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.

GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will produce more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper 1/2 Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic, namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.

Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL

In 1992, EPA issued a guidance document (EPA, 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.

The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA, 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA, 1992):

    UCL = exp(x̄ + 0.5s² + sH/√(n-1))

Where:

    UCL = a specified limit (i.e., the 95% UCL) on the estimate of the arithmetic mean
    x̄ = the mean of the log-transformed data
    s² = the variance of the log-transformed sample distribution
    s = the standard deviation of the log-transformed sample distribution
    H = a statistic accounting for the shape of the distribution, developed and tabulated by Land (1975)
    n = the number of analytical samples
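The equation above can be transcribed directly. Note that H must be taken from Land's (1975) tables as a function of s and n, so the H value and the sample data used in this sketch are illustrative placeholders rather than actual table entries or site results.

```python
# Direct transcription of the H-UCL equation above. H depends on s and n and
# must be looked up in Land's (1975) tables; the H value and data passed in
# below are illustrative placeholders.
import math

def h_ucl(data, H):
    """95% H-UCL of the arithmetic mean, computed from log-transformed data."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    x_bar = sum(logs) / n
    s2 = sum((v - x_bar) ** 2 for v in logs) / (n - 1)  # variance of logged data
    s = math.sqrt(s2)
    return math.exp(x_bar + 0.5 * s2 + s * H / math.sqrt(n - 1))

sample = [0.5, 1.2, 3.4, 0.8, 2.1, 7.6]  # ppm, placeholder data
ucl = h_ucl(sample, H=3.0)               # H = 3.0 is a placeholder value
```

Even in this small example the H-UCL lands several times above the arithmetic mean of the data, which illustrates the inflation discussed in the text.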

Frequently the H-statistic calculates extraordinarily high 95 UCL values In some cases this

inflated estimate of the mean reflects the uncertainty associated with having a limited number of

samples from a skewed data population This uncertainty is compounded when as is the case

with much of the Housatonic River investigation sampling efforts are concentrated in areas that

are suspected of having elevated levels of PCBs Because the sampling of areas that are not

likely to be contaminated is much more limited despite the fact that those areas are equally

important components of the total exposure area the datasets for most parcels are likely to be

highly biased

c DOCS jRBGE2oooEPCplaquoplaquor an 3 Ogden Environmental and Energy Services

Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.

Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA, 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order[s] of magnitude higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.

Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when the exponential (lognormal) estimator of the mean does not equal the typical calculation of the mean, i.e., the sum of all sample results divided by the number of samples.

Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and it does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA, 1999).

In view of these considerations, EPA contractors and consultants have recommended that the H-statistic not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA, 1997a) recommended that, in environmental applications, "the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. ... [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).


Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the Lower Housatonic River; consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.

This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, unbiased estimates of how concentration and variance change with distance are developed, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging; Isaaks and Srivastava, 1989).

Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest of the River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular bisectors that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so straightforward is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.
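As an illustration of the weighting idea (not of exact polygon construction, which a GIS would perform), the Thiessen weight of each sample, i.e., the fraction of the exposure area lying closer to it than to any other sample, can be approximated by a simple Monte Carlo sketch over a hypothetical rectangular exposure area:

```python
import random

def thiessen_weights(points, xmin, xmax, ymin, ymax, n_trials=100_000, seed=1):
    """Approximate Thiessen-polygon area weights by Monte Carlo: the weight
    of each sample location is the fraction of random points in the exposure
    area that lie closest to it. Illustrative only; a GIS constructs the
    polygons exactly."""
    rng = random.Random(seed)
    counts = [0] * len(points)
    for _ in range(n_trials):
        x, y = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
        nearest = min(range(len(points)),
                      key=lambda i: (points[i][0] - x) ** 2 + (points[i][1] - y) ** 2)
        counts[nearest] += 1
    return [c / n_trials for c in counts]
```

The weights sum to 1.0, and a sample sitting alone in a sparsely sampled corner of the property receives a correspondingly large weight.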

The area inside each polygon is then used to weight the samples in statistical calculations, which is, in essence, saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

x-bar(aw) = SUM(i = 1 to n) [ p_i * C_i ]

Where:

x-bar(aw) = the area-weighted mean concentration (mg/kg)

C_i = the concentration representing the condition within polygon i, where there are i = 1 through n polygons; and

p_i = the proportion of the total area that is incorporated in polygon i (unitless)
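The formula itself is a one-liner; a minimal sketch, with polygon areas standing in for the proportions p_i:

```python
def area_weighted_mean(concs, areas):
    """Area-weighted mean: each concentration C_i is weighted by
    p_i = area_i / total area of the exposure area."""
    total = sum(areas)
    return sum(c * a / total for c, a in zip(concs, areas))
```

For example, two samples with concentrations of 10 and 0 ppm whose polygons cover 3/4 and 1/4 of the exposure area give an area-weighted mean of 7.5 ppm, rather than the unweighted 5 ppm.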

This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid cell using the sampling data available within it. Then the average concentrations for the grid cells were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the draft Risk Assessment Guidance for Superfund: Human Health Evaluation Manual. In that review, the SAB stated that the:

"[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure." (p. 15)

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and it is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would be necessary only in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and as demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process in which a subsample of the overall dataset of analytical values is selected at random (with replacement) and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that, if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that, instead of repeatedly taking new samples, bootstrapping selects repeatedly from a single dataset.
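The procedure described above can be sketched in a few lines of Python. This is a simple percentile bootstrap; the iteration count is an illustrative choice.

```python
import random
import statistics

def bootstrap_ucl95(data, n_boot=5000, seed=1):
    """95% UCL of the mean by bootstrapping: resample the dataset with
    replacement, archive each resample's mean, and take the 95th
    percentile of the resulting distribution of means."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(statistics.fmean(rng.choices(data, k=n))
                   for _ in range(n_boot))
    return means[int(0.95 * n_boot) - 1]
```

Unlike the H-UCL, the result is bounded by the observed data: no resample mean can exceed the largest measured concentration.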

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and it is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution of means for the bootstrap run may not appear normal. This is because the number of samples required in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance of the distribution of bootstrapped means becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of the samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrap method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.


Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded, normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 x 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
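The three steps can be sketched as follows. The expansion factor and the rule for the per-iteration draw count are illustrative assumptions; the draw count follows the 250-to-500 range described above.

```python
import random
import statistics

def spatial_bootstrap_ucl95(concs, areas, expansion=5, n_boot=5000, seed=1):
    """Spatial bootstrap sketch: replicate each sample in proportion to its
    normalized, rounded Thiessen polygon area, then bootstrap the expanded
    dataset and take the 95th percentile of the resample means."""
    smallest = min(areas)
    expanded = []
    for c, a in zip(concs, areas):
        # area-weighted number of replicates for this sample
        expanded.extend([c] * (round(a / smallest) * expansion))
    rng = random.Random(seed)
    k = min(500, max(250, len(expanded) // 2))  # draws per bootstrap iteration
    means = sorted(statistics.fmean(rng.choices(expanded, k=k))
                   for _ in range(n_boot))
    return means[int(0.95 * n_boot) - 1]
```

Because the expanded dataset already encodes the polygon areas, an ordinary bootstrap of that dataset yields an area-weighted distribution of means.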


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has, in the past, often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: (1) the EPA approach (arithmetic mean and H-statistic); (2) an area-weighted average; (3) a bootstrap; and (4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but it is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
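The arithmetic and area-weighted means can be checked directly from the Table 1 data (assuming, as the discussion of samples 29 and 30 suggests, that the listed areas and concentrations pair up in sample-number order):

```python
# Recomputing the summary statistics for the hypothetical property in
# Table 1 (30 samples; values transcribed from the table).
concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
         7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
         15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6]
areas = [138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121, 3697,
         13297, 33087, 65541, 57494, 9014, 9994, 2123, 2341, 7446, 2613,
         44231, 2362, 8745, 5008, 18762, 24296, 51356, 12996, 18019,
         119814, 132012]
arith_mean = sum(concs) / len(concs)
aw_mean = sum(c * a for c, a in zip(concs, areas)) / sum(areas)
print(round(arith_mean, 1), round(aw_mean, 1))  # prints: 13.3 15.7
```

The area-weighted mean comes out higher than the arithmetic mean because the two highest concentrations sit in two of the largest polygons, consistent with the discussion above.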

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL that substantially exceeds the highest measured concentration, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in high-concentration areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above, and as demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculate the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993), and it would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or for areas where EPA determines that the data are insufficient to support that approach and decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight, Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H. and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River: Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.


Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number   Polygon Area (ft2)   Total PCB (ppm)
 1                 138,219              0.022
 2                 105,217              0.028
 3                 110,214              1.07
 4                  31,021              1.63
 5                  11,399              1.67
 6                   7,004              2.01
 7                  14,268              2.19
 8                  12,121              3.25
 9                   3,697              3.77
10                  13,297              7.58
11                  33,087              7.61
12                  65,541              7.62
13                  57,494              7.71
14                   9,014              7.99
15                   9,994              8.72
16                   2,123              9.69
17                   2,341              9.72
18                   7,446             12.7
19                   2,613             14.3
20                  44,231             14.3
21                   2,362             15.4
22                   8,745             16.3
23                   5,008             18.7
24                  18,762             19.1
25                  24,296             20.2
26                  51,356             30.8
27                  12,996             32.9
28                  18,019             38.9
29                 119,814             40.6
30                 132,012             42.6

Table 2. Mean and 95% UCL EPCs for the Hypothetical Property, Calculated Using Four Approaches

Approach                                       Mean (ppm)   95% UCL (ppm)
EPA approach (arithmetic mean/H-statistic)     13.3         106
Area-weighted average                          15.7         --
Bootstrap                                      13.3         17.06
Spatial bootstrap                              15.7         16.38

Note: Because the H-UCL (106 ppm) exceeds the maximum detected concentration, the maximum value (42.6 ppm) would be substituted as the EPC under the EPA approach.


overestimate the carcinogenic potential and non-carcinogenic impacts of PCBs, and it recommended the development of alternative toxicity values for PCBs. EPA has made no reference to this presentation in its Work Plan.

This failure to consider the alternative toxicological information presented by GE is unlawful. Under a settlement agreement with GE (Settlement Agreement in General Electric Company v. Browner, No. 93-1251, D.C. Cir., October 25, 1993) and a subsequent EPA guidance memorandum (EPA, 1993), if an outside party questions the use of IRIS values during the course of an EPA risk assessment and presents alternative toxicological information that may be used in place of the IRIS values, EPA has an obligation to consider all credible and relevant evidence before it. There is no indication that EPA has considered the alternative toxicological information presented by GE. Rather, it has ignored GE's showing and chosen simply to defer to the current IRIS values (which is precisely the course of conduct which the settlement and guidance prohibit). EPA should rectify this deficiency by now giving careful consideration to the alternative toxicological information presented by GE.

5. Use of subchronic RfD for PCBs

The Work Plan states that EPA will consider using the subchronic RfD for PCBs only if it determines that exposure is intermittent enough to result in an exposure period that is substantially reduced (p. 6-42). Later, on page 6-43, the Work Plan states that this will occur when the total exposure period is substantially below the 7-year period that is considered by EPA to represent chronic exposure. There is no further discussion of this issue, so it is not clear what the Agency will use as a benchmark to identify intermittent exposure.

Many of the proposed exposure scenarios are likely to be extremely intermittent in nature. For example, in developing the SRBC for utility workers, EPA assumes that exposure occurs 5 days/year for a period of 25 years. Clearly, 5 days/year is intermittent exposure. In fact, even when considered over the entire 25-year exposure period, the total exposure consists of 125 days, which amounts to 1/3 of a year, or less than 5 percent of the 7-year chronic exposure period. This exposure would certainly be considered intermittent despite the fact that the total exposure period lasts for 25 years.


As a second example, in developing its SRBCs for recreational exposures, EPA has assumed that children aged 1-6 are exposed for six years during recreational activities. Even using the highly conservative exposure frequency for high-contact areas of 3 days/week for a period of 7 months per year (a total of 84 days/year, or 504 days for the entire exposure period), the number of days exposed is equivalent to 1.4 years, or less than 20 percent of the available days during the 7-year chronic exposure period. Nevertheless, as stated on page 6-42, EPA does not intend to use the subchronic RfD to evaluate noncancer hazards to young children.
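The arithmetic behind the two examples above can be checked directly. This is a minimal sketch; the day counts and the 7-year chronic benchmark are taken from the scenarios as described in these comments.

```python
# Check of the intermittency arithmetic for the two Work Plan scenarios
# discussed above (7 years is the period EPA treats as chronic exposure).
CHRONIC_YEARS = 7

# Utility worker SRBC: 5 days/year for 25 years.
utility_days = 5 * 25                       # 125 total exposure days
utility_fraction = (utility_days / 365) / CHRONIC_YEARS   # ~1/3 year vs. 7 years

# Child recreational scenario: 3 days/week, 7 months/year, for 6 years.
child_days_per_year = 3 * 4 * 7             # ~84 days/year
child_days = child_days_per_year * 6        # 504 days over the 6-year period
child_fraction = (child_days / 365) / CHRONIC_YEARS       # ~1.4 years vs. 7 years

print(utility_days, round(utility_fraction * 100, 1))     # under 5 percent
print(child_days, round(child_fraction * 100, 1))         # under 20 percent
```

Both scenarios fall well below the 7-year chronic period, which is the point of the comment above.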

In short, the Work Plan states that EPA will use the subchronic RfD when exposures during a given scenario are sufficiently intermittent to result in a total exposure that is far less than the 7-year chronic exposure period identified by EPA. It is clear, however, that even for scenarios where this intermittent exposure is likely, EPA does not intend to incorporate the subchronic RfD. EPA should reconsider its position on the use of the subchronic RfD and incorporate it into those scenarios where it is relevant.

6. Inappropriate use of a single exposure scenario for all direct-contact recreational activities

The Work Plan provides that, while consumption of fish and game will be evaluated separately, all direct-contact recreational activities will be evaluated using a single exposure scenario and a single set of exposure assumptions (pp. 6-52 - 6-53). These activities include hiking, walking, canoeing, picnicking, swimming, wading, and dirt biking, as well as fishing and hunting (but not consumption of fish or game). In its June 3, 1999 comments (pp. 11-12, 29-30), GE explained that use of a single exposure scenario and set of assumptions for such diverse activities is inappropriate because it will force EPA either to use the most conservative assumptions for any of these activities (which would clearly result in grossly overestimated exposure estimates) or to pick compromise parameters that will not be truly representative of any of these activities (which would be subject to considerable uncertainty). Based on discussions with EPA in the fall of 1999, GE had understood that the Agency recognized this problem and intended to consider further whether and how to divide this single direct-contact recreational scenario into a number of more specific scenarios (e.g., hiking/walking, canoeing, picnicking, swimming/wading). However, the Work Plan does not do that. GE again urges EPA to develop such specific scenarios so as to take account of the significant differences in exposure parameters among these types of activities. Doing so will more closely approximate actual recreational exposures and allow more realistic and appropriate cleanup goals to be developed.

7. Lack of information about specific exposure assumptions to be used

As in the prior draft, the Work Plan does not present the specific exposure parameter values and equations to be used in EPA's risk assessment, but simply states that they will be developed based on EPA and MDEP guidance along with site-specific data (Section 6.3.4.5). GE requested such information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has advised GE that the Agency's risk assessment staff will hold a series of technical meetings with GE representatives and consultants, and at such meetings will be prepared to discuss the most appropriate exposure assumptions and parameter values to be used. GE reiterates its request for such meetings and looks forward to discussing these issues with EPA.

8. Use of probabilistic risk analysis

In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that probabilistic approaches, such as Monte Carlo analysis, will be considered for those COPCs and exposure pathways that result in significant risk estimates. However, it does not mention probabilistic risk assessment techniques in connection with any of the main risk assessments. In prior discussions with GE, EPA indicated that one of the principal reasons why it collected a large amount of fish tissue PCB data is for use in a probabilistic risk assessment of fish consumption. As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic exposure model in the main risk analysis of the fish consumption pathway. However, the Work Plan presents no discussion of, or plans for conducting, such an assessment. In fact, EPA does not even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk assessment guidance documents and information sources that EPA intends to use in conducting the HHRA.

During the last few years, EPA's policies and guidelines have focused on improving risk management by presenting decision-makers with the entire range of possible risks rather than a single point estimate (EPA, 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that numerical risk assessments should be accompanied by a full characterization of the uncertainties, limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D) probabilistic analysis, such as Microexposure analysis, can more effectively characterize the impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed population by considering time-dependent changes.

Microexposure analysis has been used by EPA and by independent researchers to simulate duration of residential exposure (EPA, 1992c; Johnson and Capel, 1992; Sielken, 1994). It also has been used to evaluate childhood exposures to lead (Goodrum et al., 1994), exposure to contaminants in tap water (Harrington et al., 1995), and exposure to dioxins from the consumption of freshwater fish (Keenan et al., 1993a,b; 1995; 1996a; 1997a,b). Microexposure analysis was employed in the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon Group, 1994; Keenan et al., 1996b). In addition, Microexposure analysis has been described in EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A (External Review Draft) as a viable alternative for modeling time-dependent variability in concentrations, daily activity patterns, and other behavioral exposure factors (EPA, 1999c).

Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of probabilistic risk analysis through application to the fish ingestion pathway at the Sangamo-Weston/Lake Hartwell Superfund site (Simon, 1999). Simon (1999) found that the Microexposure event model gave risk estimates that were approximately an order of magnitude lower than those derived using deterministic and other probabilistic approaches. He attributed this decrease to the power of the method to incorporate changes over time.

GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model, to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in Appendix E of its draft guidance for conducting probabilistic risk assessments (EPA, 1999c), some types of exposures, including fish consumption, can be expected to vary over time. Anglers may change their activity levels and their fish consumption behaviors as they age. In addition, different species and sizes of fish may be consumed from a variety of locations on the river over time and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates, and will provide information that will allow better-informed risk management decisions to be made (EPA, 1999c).
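To illustrate the kind of event-based simulation being urged here, the sketch below is a deliberately simplified Monte Carlo model of the fish consumption pathway. All distributions and parameter values are hypothetical placeholders, not values proposed by EPA or GE, and a full Microexposure analysis would additionally vary tissue concentrations and angler behavior over time within a single simulated individual.

```python
import random

random.seed(1)

# Hypothetical deterministic inputs (placeholders, not proposed values).
BW = 70.0          # body weight, kg
AT = 70 * 365.0    # averaging time, days (lifetime)

def simulated_dose():
    """One simulated angler: dose summed meal-by-meal ('event' basis)."""
    years = random.randint(1, 30)               # exposure duration, years
    meals_per_year = random.randint(5, 50)      # meal frequency
    intake = 0.0
    for _ in range(years * meals_per_year):     # each meal is an event
        conc = random.lognormvariate(0.0, 1.0)  # fish tissue PCB, mg/kg (hypothetical)
        meal = random.uniform(0.1, 0.3)         # meal size, kg (hypothetical)
        intake += conc * meal
    return intake / (BW * AT)                   # lifetime average daily dose, mg/kg-day

doses = sorted(simulated_dose() for _ in range(2000))
print("median:", doses[1000], "95th percentile:", doses[1900])
```

The output is a distribution of doses across simulated anglers rather than a single point estimate, which is what allows the full range of risks to be characterized.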

9. Use of 95% UCLs or maximum values as exposure point concentrations

The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% UCL of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).

GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment, and as demonstrated by EPA contractors and consultants (EPA, 1997c; Schulz and Griffin, 1999), use of the H-statistic to derive a 95% UCL can and frequently does produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.

Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½ Mile Reach and for Areas Outside the River, namely a spatial averaging approach. That approach will (a) allow calculation of EPCs that are much more truly representative of actual exposures than use of EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper ½ Mile Reach and Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling if necessary to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.


However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A, and as demonstrated by Singh et al. (EPA, 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.
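For illustration, a minimal (non-spatial) bootstrap of the 95% UCL of the arithmetic mean might look like the following sketch; a spatial bootstrap would additionally weight each sample by the area it represents before resampling. The data values are hypothetical.

```python
import random

random.seed(0)

# Hypothetical PCB results for one exposure area, mg/kg (positively skewed).
data = [0.5, 1.2, 2.0, 3.5, 8.0, 25.0]
n = len(data)

# Resample the dataset with replacement many times and record each mean;
# the 95% UCL is taken as the 95th percentile of the bootstrap means.
boot_means = sorted(
    sum(random.choice(data) for _ in range(n)) / n
    for _ in range(5000)
)
ucl95 = boot_means[int(0.95 * 5000)]
print(round(ucl95, 2))   # exceeds the sample mean but stays below the maximum
```

Unlike the H-statistic, this estimate cannot explode above the observed data range, because every bootstrap mean is an average of observed values.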

References

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A). Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1990. National Contingency Plan. US Environmental Protection Agency, Washington, DC.

EPA. 1992a. Final Guidelines for Exposure Assessment; Notice. US Environmental Protection Agency, Washington, DC. 57 Federal Register 14: 22888-22938. May 29.

EPA. 1992b. Guidance on Risk Characterization for Risk Managers and Risk Assessors. Memorandum from F. Henry Habicht III, Deputy Administrator, to Assistant Administrators and Regional Administrators. Office of the Administrator, Washington, DC.


EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General US Population. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

EPA. 1993. Memo from W.H. Farland and H.L. Longest to Regional Directors, Re: Use of IRIS values in Superfund risk assessments. US Environmental Protection Agency, Washington, DC. OSWER Directive 9285.7-16. December 21.

EPA. 1995a. Policy for Risk Characterization at the US Environmental Protection Agency. US Environmental Protection Agency, Office of the Administrator, Washington, DC. March.

EPA. 1995b. Guidance for Risk Characterization. US Environmental Protection Agency, Science Policy Council, Washington, DC. February.

EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the US Environmental Protection Agency. US Environmental Protection Agency, Office of Research and Development, Washington, DC. May 15.

EPA. 1997b. Guiding Principles for Monte Carlo Analysis. US Environmental Protection Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 p.

EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1998. Memo from Timothy Fields, Acting Administrator, to Regional Directors, Re: Approach for addressing dioxin in soil at CERCLA and RCRA sites. US Environmental Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.


EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, US EPA, to Jane Magee, General Electric Company, Re: Pittsfield 1-0147, EPA Area 6 Housatonic River Human Health Risk Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader, Re: Protectiveness of Cleanup Levels for Removal Actions Outside the River - Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, US Dist. Court for Dist. of Mass.). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual. Supplemental Guidance: Dermal Risk Assessment. Interim Guidance. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2. Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3 (Part A: Process for Conducting Probabilistic Risk Assessment). Draft, Revision 5. United States Environmental Protection Agency, Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the US Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition, December 4-7, Baltimore, MD.


Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA4203. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.

Johnson, T. and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General US Population. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA, Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds, Proceedings Dioxin '96: 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen 30: 61-65.


Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC, October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meeting Proceedings. (Abstract)

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition. (Abstract)

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan. Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Work Plan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19. November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD, December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4): 823-843.


Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 11.

ATTACHMENT A

A COMPARISON OF ALTERNATIVE METHODS

FOR CALCULATING EXPOSURE POINT CONCENTRATIONS

Prepared on Behalf of the General Electric Company

by Ogden Environmental and Energy Services Co., Inc.


Introduction

The US Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston, 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets where no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and on their representation of the actual levels of exposures that are likely to occur under various scenarios.

There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that, when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.

GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will produce more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic, namely the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.

Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL

In 1992, EPA issued a guidance document (EPA, 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.


The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA, 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA, 1992):

UCL = exp(x̄ + 0.5s² + sH/√(n-1))

Where:

UCL = a specified limit (i.e., the 95% UCL) on the estimate of the arithmetic mean

x̄ = the mean of the log-transformed data

s² = the variance of the log-transformed sample distribution

s = the standard deviation of the log-transformed sample distribution

H = a statistic accounting for the skewness of the distribution, developed by Land (1975)

n = the number of analytical samples
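As a numerical illustration of the equation above, the sketch below applies it to a small hypothetical dataset. The H value used is a placeholder, since the actual statistic must be looked up in Land's (1975) tables for the given s and n.

```python
import math

# Hypothetical PCB results for one exposure area, mg/kg.
data = [0.5, 1.2, 2.0, 3.5, 8.0, 25.0]
logs = [math.log(x) for x in data]
n = len(logs)
xbar = sum(logs) / n                               # mean of log-transformed data
s2 = sum((y - xbar) ** 2 for y in logs) / (n - 1)  # variance of log-transformed data
s = math.sqrt(s2)
H = 3.0                                            # placeholder for Land's tabulated H

h_ucl = math.exp(xbar + 0.5 * s2 + s * H / math.sqrt(n - 1))
arith_mean = sum(data) / n
print(round(arith_mean, 2), round(h_ucl, 1))
```

With these skewed data, the computed H-UCL exceeds not only the arithmetic mean but the maximum observed value, which is exactly the behavior criticized below.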

Frequently, the H-statistic calculates extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.


Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers," and not necessarily due to lognormally distributed data. Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.

Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA, 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.

Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when an exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.

Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA, 1999).

In view of these considerations, EPA contractors and consultants have recommended that the H-statistic not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA, 1997a) recommended that, in environmental applications, the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided. These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less.... [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).
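Of the alternatives named by Singh et al., the Chebychev inequality method is the simplest to illustrate: it is distribution-free, taking UCL = mean + sqrt(1/alpha - 1) * s / sqrt(n). The sketch below applies it to hypothetical data.

```python
import math

# Hypothetical PCB results for one exposure area, mg/kg.
data = [0.5, 1.2, 2.0, 3.5, 8.0, 25.0]
n = len(data)
mean = sum(data) / n
s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))  # sample std. dev.

alpha = 0.05                                   # 1 - alpha = 95% confidence
ucl95 = mean + math.sqrt(1 / alpha - 1) * s / math.sqrt(n)
print(round(mean, 2), round(ucl95, 2))
```

For these data the Chebychev 95% UCL stays below the maximum observed value, in contrast to the H-UCL behavior described above.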


Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the lower Housatonic River. Consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.

This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, unbiased estimates of how concentration and variance change with distance are developed, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava, 1989).

Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest of the River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.
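Where GIS software is not at hand, each sample's Thiessen-polygon share of the exposure area can be approximated numerically: assign every point of a fine grid over the area to its nearest sample, and take the fraction of grid cells captured by each sample as that sample's area weight. A minimal sketch, using hypothetical sample coordinates (not Work Plan data):

```python
def thiessen_weights(samples, extent=1000.0, n=200):
    """Approximate each sample's Thiessen-polygon share of a square
    exposure area by nearest-sample assignment on an n x n grid."""
    counts = [0] * len(samples)
    for i in range(n):
        x = extent * (i + 0.5) / n
        for j in range(n):
            y = extent * (j + 0.5) / n
            # Nearest sample (squared distance suffices for comparison)
            k = min(range(len(samples)),
                    key=lambda s: (x - samples[s][0]) ** 2 + (y - samples[s][1]) ** 2)
            counts[k] += 1
    return [c / (n * n) for c in counts]

# Hypothetical sample coordinates (ft) in a 1,000 x 1,000 ft property
samples = [(200.0, 800.0), (700.0, 750.0), (500.0, 300.0), (850.0, 150.0)]
weights = thiessen_weights(samples)
print([round(w, 3) for w in weights])  # proportions summing to 1
```

The grid resolution controls the accuracy of the approximation; exact polygon areas from a GIS serve the same purpose.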

The area inside each polygon is then used to weight the samples in statistical calculations, which is, in essence, saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

    X̄sw = Σi pi Ci

where:

X̄sw = area-weighted mean concentration (mg/kg);

Ci = the concentration representing the condition within polygon i, where there are i = 1 through n polygons; and

pi = the proportion of the total area that is incorporated in polygon i (unitless).
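In code, the formula reduces to a single weighted sum. The three polygons and values below are invented for illustration:

```python
def area_weighted_mean(areas, concs):
    """X̄sw = Σ p_i * C_i, where p_i is polygon i's share of the total area."""
    total = sum(areas)
    return sum((a / total) * c for a, c in zip(areas, concs))

# Three hypothetical polygons (ft^2) and their PCB concentrations (ppm)
areas = [50_000.0, 30_000.0, 20_000.0]
concs = [2.0, 10.0, 40.0]
print(area_weighted_mean(areas, concs))  # 0.5*2 + 0.3*10 + 0.2*40 = 12.0
```

Note that the high concentration (40 ppm) contributes only in proportion to its polygon's 20% share of the area, which is the point of the method.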

This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½-Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then the average concentrations for each grid were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the:

[E]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods (such as kriging or triangulation) that take into account the spatial distribution of contamination to characterize exposure. (p. 15)

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½-Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and it is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process whereby a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that, if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that, instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and it is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.
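The bootstrap procedure described above can be sketched in a few lines. The data and parameter choices here are hypothetical and illustrative, not those specified in these comments:

```python
import random
import statistics

def bootstrap_ucl(data, n_iter=5000, alpha=0.05, seed=1):
    """95% UCL of the mean: resample the dataset with replacement,
    archive each resampled mean, and take the 95th percentile of the
    resulting distribution of means."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(statistics.fmean(rng.choices(data, k=n)) for _ in range(n_iter))
    return means[int((1 - alpha) * n_iter) - 1]

# Hypothetical right-skewed concentrations (ppm); illustrative only
data = [0.02, 0.5, 1.1, 2.3, 4.0, 6.8, 9.7, 14.3, 21.0, 42.6]
ucl = bootstrap_ucl(data)
print(round(statistics.fmean(data), 2), round(ucl, 2))
```

No distributional assumption is made at any step; the same code applies whether the underlying data are normal, lognormal, or neither.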

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of the samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.


Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset, reflecting the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
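The three steps can be sketched as follows. The polygon areas, concentrations, replicate expansion factor, and resample size below are hypothetical and illustrative:

```python
import random
import statistics

def spatial_bootstrap_ucl(areas, concs, factor=5, n_iter=5000, k=250, seed=1):
    """Spatial bootstrap: weight each sample by its Thiessen-polygon area,
    expand the dataset so each sample appears in proportion to that weight,
    then bootstrap the expanded dataset."""
    smallest = min(areas)
    # Step 2: normalize areas to the smallest polygon, round, and expand
    expanded = []
    for a, c in zip(areas, concs):
        replicates = round(a / smallest) * factor
        expanded.extend([c] * replicates)
    # Step 3: resample the expanded dataset with replacement
    rng = random.Random(seed)
    means = sorted(statistics.fmean(rng.choices(expanded, k=k)) for _ in range(n_iter))
    return means[int(0.95 * n_iter) - 1]

areas = [2_000.0, 5_000.0, 11_000.0, 40_000.0]   # ft^2 (hypothetical)
concs = [30.0, 12.0, 4.0, 0.5]                   # ppm  (hypothetical)
print(round(spatial_bootstrap_ucl(areas, concs), 2))
```

Because the highest concentration here sits in the smallest polygon, the expansion step sharply reduces its influence on the resampled means, which is the intended correction for biased sampling.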


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has, in the past, often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) the EPA approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30; Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.
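These means can be reproduced directly from the Table 1 values, pairing each polygon area with its concentration in the order listed (the sample-by-sample pairing is assumed from the table's ordering):

```python
# Table 1 polygon areas (ft^2) and total PCB concentrations (ppm)
areas = [138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121, 3697,
         13297, 33087, 65541, 57494, 9014, 9994, 2123, 2341, 7446, 2613,
         44231, 2362, 8745, 5008, 18762, 24296, 51356, 12996, 18019,
         119814, 132012]
concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
         7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
         15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6]

arith_mean = sum(concs) / len(concs)
total_area = sum(areas)
weighted_mean = sum(a / total_area * c for a, c in zip(areas, concs))
print(round(arith_mean, 1), round(weighted_mean, 1))  # weighted mean is higher
```

The area-weighted mean comes out somewhat above the arithmetic mean, as the text describes, because the large polygons of samples 29 and 30 carry high concentrations.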

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL that exceeds the bootstrapped estimates by a factor of more than six and exceeds even the highest measured concentration, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in high-concentration areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above, and as demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993), and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½-Mile Reach of the River and for the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or for areas where EPA determines that the data are insufficient to support that approach and decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A), Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. U.S. Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River, Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.


Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number   Polygon Area (ft²)   Total PCB (ppm)
1               138,219              0.022
2               105,217              0.028
3               110,214              1.07
4               31,021               1.63
5               11,399               1.67
6               7,004                2.01
7               14,268               2.19
8               12,121               3.25
9               3,697                3.77
10              13,297               7.58
11              33,087               7.61
12              65,541               7.62
13              57,494               7.71
14              9,014                7.99
15              9,994                8.72
16              2,123                9.69
17              2,341                9.72
18              7,446                12.7
19              2,613                14.3
20              44,231               14.3
21              2,362                15.4
22              8,745                16.3
23              5,008                18.7
24              18,762               19.1
25              24,296               20.2
26              51,356               30.8
27              12,996               32.9
28              18,019               38.9
29              119,814              40.6
30              132,012              42.6


Table 2. Mean and 95% UCL Exposure Point Concentrations for the Hypothetical Property, Calculated Using the Four Approaches Described in the Text (arithmetic mean/H-statistic, area-weighted average, bootstrap, and spatial bootstrap)


As a second example, in developing its SRBCs for recreational exposures, EPA has assumed that children aged 1-6 are exposed for six years during recreational activities. Even using the highly conservative exposure frequency for high-contact areas of 3 days/week for a period of 7 months per year (a total of 84 days/year, or 504 days for the entire exposure period), the number of days exposed is equivalent to 1.4 years, or less than 20 percent of the available days during the 7-year chronic exposure period. Nevertheless, as stated on page 6-42, EPA does not intend to use the subchronic RfD to evaluate noncancer hazards to young children.

In short, the Work Plan states that EPA will use the subchronic RfD when exposures during a given scenario are sufficiently intermittent to result in a total exposure that is far less than the 7-year chronic exposure period identified by EPA. It is clear, however, that even for scenarios where this intermittent exposure is likely, EPA does not intend to incorporate the subchronic RfD. EPA should reconsider its position on the use of the subchronic RfD and incorporate it into those scenarios where it is relevant.

6. Inappropriate use of a single exposure scenario for all direct-contact recreational activities

The Work Plan provides that, while consumption of fish and game will be evaluated separately, all direct-contact recreational activities will be evaluated using a single exposure scenario and a single set of exposure assumptions (pp. 6-52 - 6-53). These activities include hiking, walking, canoeing, picnicking, swimming, wading, and dirt biking, as well as fishing and hunting (but not consumption of fish or game). In its June 3, 1999 comments (pp. 11-12, 29-30), GE explained that use of a single exposure scenario and set of assumptions for such diverse activities is inappropriate, because it will force EPA either to use the most conservative assumptions for any of these activities (which would clearly result in grossly overestimated exposure estimates) or to pick compromise parameters that will not be truly representative of any of these activities (which would be subject to considerable uncertainty). Based on discussions with EPA in the fall of 1999, GE had understood that the Agency recognized this problem and intended to consider further whether and how to divide this single direct-contact recreational scenario into a number of more specific scenarios (e.g., hiking/walking, canoeing, picnicking, swimming/wading). However, the Work Plan does not do that. GE again urges EPA to develop such specific scenarios so as to take account of the significant differences in exposure parameters among these types of activities. Doing so will more closely approximate actual recreational exposures and allow more realistic and appropriate cleanup goals to be developed.

7. Lack of information about specific exposure assumptions to be used

As in the prior draft, the Work Plan does not present the specific exposure parameter values and equations to be used in EPA's risk assessment, but simply states that they will be developed based on EPA and MDEP guidance, along with site-specific data (Section 6.3.4.5). GE requested such information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has advised GE that the Agency's risk assessment staff will hold a series of technical meetings with GE representatives and consultants, and at such meetings will be prepared to discuss the most appropriate exposure assumptions and parameter values to be used. GE reiterates its request for such meetings and looks forward to discussing these issues with EPA.

8. Use of probabilistic risk analysis

In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that probabilistic approaches, such as Monte Carlo analysis, will be considered for those COPCs and exposure pathways that result in significant risk estimates. However, it does not mention probabilistic risk assessment techniques in connection with any of the main risk assessments. In prior discussions with GE, EPA indicated that one of the principal reasons why it collected a large amount of fish tissue PCB data is for use in a probabilistic risk assessment of fish consumption. As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic exposure model in the main risk analysis of the fish consumption pathway. However, the Work Plan presents no discussion of, or plans for conducting, such an assessment. In fact, EPA does not even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk assessment guidance documents and information sources that EPA intends to use in conducting the HHRA.

During the last few years, EPA's policies and guidelines have focused on improving risk management by presenting decision-makers with the entire range of possible risks, rather than a single point estimate (EPA, 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that numerical risk assessments should be accompanied by a full characterization of the uncertainties, limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D) probabilistic analysis, such as Microexposure analysis, can more effectively characterize the impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed population by considering time-dependent changes.

Microexposure analysis has been used by EPA and by independent researchers to simulate duration of residential exposure (EPA, 1992c; Johnson and Capel, 1992; Sielken, 1994). It also has been used to evaluate childhood exposures to lead (Goodrum et al., 1994), exposure to contaminants in tap water (Harrington et al., 1995), and exposure to dioxins from the consumption of freshwater fish (Keenan et al., 1993a,b; 1995; 1996a; 1997a,b). Microexposure analysis was employed in the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon Group, 1994; Keenan et al., 1996b). In addition, Microexposure analysis has been described in EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A (External Review Draft) as a viable alternative for modeling time-dependent variability in concentrations, daily activity patterns, and other behavioral exposure factors (EPA, 1999c).

Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of probabilistic risk analysis through application to the fish ingestion pathway at the Sangamo Weston/Lake Hartwell Superfund site (Simon, 1999). Simon (1999) found that the Microexposure event model gave risk estimates that were approximately an order of magnitude lower than those derived using deterministic and other probabilistic approaches. He attributed this decrease to the power of the method to incorporate changes over time.

GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model, to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in Appendix E of its draft guidance for conducting probabilistic risk assessments (EPA, 1999c), some types of exposures, including fish consumption, can be expected to vary over time. Anglers may change their activity levels and their fish consumption behaviors as they age. In addition, different species and sizes of fish may be consumed from a variety of locations on the river over time, and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates, and will provide information that will allow better-informed risk management decisions to be made (EPA, 1999c).

9. Use of 95% UCLs or maximum values as exposure point concentrations

The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% UCL of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).
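As a concrete illustration of this selection rule, the following sketch computes a Student's t 95% UCL on the arithmetic mean and caps it at the maximum detected value. The sample concentrations are hypothetical, and the t-value is read from a standard table.

```python
import math
import statistics

def epc(data, t_95):
    """Work Plan selection rule: the EPC is the 95% UCL on the arithmetic
    mean (Student's t form, for normally distributed data) or the maximum
    detected concentration, whichever is lower. t_95 is the one-sided 95th
    percentile of Student's t with n-1 degrees of freedom, from a t table."""
    n = len(data)
    ucl = statistics.fmean(data) + t_95 * statistics.stdev(data) / math.sqrt(n)
    return min(ucl, max(data))

# Ten hypothetical soil PCB results (mg/kg); t(0.95, df=9) = 1.833
samples = [0.4, 0.9, 1.2, 1.5, 2.0, 2.3, 3.1, 4.0, 5.2, 12.0]
epc_value = epc(samples, 1.833)   # about 5.23 mg/kg, below the 12.0 maximum
```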

GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment, and demonstrated by EPA contractors and consultants (EPA, 1997c; Schulz and Griffin, 1999), use of the H-statistic to derive a 95% UCL can, and frequently does, produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.

Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½ Mile Reach and for Areas Outside the River: a spatial averaging approach. That approach will (a) allow calculation of EPCs that are much more truly representative of actual exposures than use of EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper ½ Mile Reach and Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling, if necessary, to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.


However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A, and demonstrated by Singh et al. (EPA, 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.

References

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1990. National Contingency Plan. US Environmental Protection Agency, Washington, DC.

EPA. 1992a. Final Guidelines for Exposure Assessment; Notice. US Environmental Protection Agency, Washington, DC. 57 Federal Register 14: 22888-22938. May 29.

EPA. 1992b. Guidance on Risk Characterization for Risk Managers and Risk Assessors. Memorandum from F. Henry Habicht III, Deputy Administrator, to Assistant Administrators and Regional Administrators. Office of the Administrator, Washington, DC.

EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General US Population. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

EPA. 1993. Memo from W.H. Farland and H.L. Longest to Regional Directors. Re: Use of IRIS values in Superfund risk assessments. US Environmental Protection Agency, Washington, DC. OSWER Directive 9285.7-16. December 21.

EPA. 1995a. Policy for Risk Characterization at the US Environmental Protection Agency. US Environmental Protection Agency, Office of the Administrator, Washington, DC. March.

EPA. 1995b. Guidance for Risk Characterization. US Environmental Protection Agency, Science Policy Council, Washington, DC. February.

EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the US Environmental Protection Agency. US Environmental Protection Agency, Office of Research and Development, Washington, DC. May 15.

EPA. 1997b. Guiding Principles for Monte Carlo Analysis. US Environmental Protection Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 p.

EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1998. Memo from Timothy Fields, Acting Administrator, to Regional Directors. Re: Approach for addressing dioxin in soil at CERCLA and RCRA sites. US Environmental Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.

EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, US EPA, to Jane Magee, General Electric Company. Re: Pittsfield 1-0147, EPA Area 6 Housatonic River Human Health Risk Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader. Re: Protectiveness of Cleanup Levels for Removal Actions Outside the River - Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, US Dist. Court for Dist. of Mass.). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual, Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2. Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3, Part A: Process for Conducting Probabilistic Risk Assessment. Draft, Revision 5. United States Environmental Protection Agency, Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the US Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition. December 4-7, Baltimore, MD.

Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA4203. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.

Johnson, T. and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General US Population. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds. Proceedings, Dioxin '96 - 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen 30: 61-65.

Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC. October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meeting Proceedings. Abstract 1.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition. Abstract.

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan - Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19. November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD. December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4): 823-843.

Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 11.

ATTACHMENT A

A COMPARISON OF ALTERNATIVE METHODS

FOR CALCULATING EXPOSURE POINT CONCENTRATIONS

Prepared on Behalf of the General Electric Company

by Ogden Environmental and Energy Services Co Inc


Introduction

The US Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston, 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets where no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and their representation of the actual levels of exposures that are likely to occur under various scenarios.

There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.

GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will result in more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic: namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.

Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL

In 1992, EPA issued a guidance document (EPA, 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.

The guidance document goes on to note that because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region 1 (EPA, 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA, 1992):

UCL = exp(x̄ + 0.5s² + sH/√(n - 1))

Where:

UCL = a specified limit (i.e., the 95% upper confidence limit) on the estimate of the arithmetic mean

x̄ = the mean of the log-transformed data

s² = the variance of the log-transformed sample distribution

s = the standard deviation of the log-transformed sample distribution

H = a statistic accounting for the interaction of the distribution, developed by Land (1975)

n = the number of analytical samples
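A brief sketch shows how the H-UCL inflates for skewed data. The concentrations are hypothetical, and the H value is an illustrative placeholder rather than an entry from Land's tables (which tabulate H as a function of s and n).

```python
import math

def h_ucl(data, H):
    """Land's H-UCL on the arithmetic mean for lognormal data:
    UCL = exp(x_bar + 0.5*s^2 + s*H / sqrt(n - 1)), where x_bar and s are
    the mean and standard deviation of the log-transformed data."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    x_bar = sum(logs) / n
    s = math.sqrt(sum((v - x_bar) ** 2 for v in logs) / (n - 1))
    return math.exp(x_bar + 0.5 * s * s + s * H / math.sqrt(n - 1))

# A small, highly skewed dataset (mg/kg): s of the log data is about 2.5,
# and even a modest illustrative H drives the UCL far past the maximum.
samples = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 50.0, 200.0]
ucl = h_ucl(samples, H=3.0)   # exceeds 1,000 mg/kg vs. a 200 mg/kg maximum
```

This is the behavior criticized below: the computed "upper confidence limit" can exceed every concentration actually observed.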

Frequently, the H-statistic calculates extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.

Use of the H-statistic also does not calculate a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.

Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA, 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order[s] of magnitude higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.

Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when an exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.

Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA, 1999).

In view of these considerations, EPA contractors and consultants have recommended that the H-statistic should not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA, 1997a) recommended that in environmental applications "the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. ... [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).

Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the lower Housatonic River. Consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.

This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, unbiased estimates of concentration and variance change with distance, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava, 1989).

Because kriging is data intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.

The area inside each polygon is then used to weight the samples in the statistical calculations, which is, in essence, saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

x̄_sc = Σ (p_i × C_i)

Where:

x̄_sc = the area-weighted mean concentration (mg/kg)

C_i = the concentration representing the condition within polygon i, where there are i = 1 through n polygons, and

p_i = the proportion of the total area that is incorporated in polygon i (unitless)
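The area-weighted calculation reduces to a few lines once polygon areas are known. In this hypothetical sketch, sampling was concentrated in a small contaminated corner of the exposure area, and area-weighting pulls the EPC down toward the concentration representative of most of the parcel.

```python
def spatial_average(polygons):
    """Area-weighted (Thiessen polygon) mean: the sum of p_i * C_i, where
    p_i is the fraction of the total exposure area in polygon i and C_i is
    the concentration of the sample that polygon represents."""
    total_area = sum(area for area, _ in polygons)
    return sum((area / total_area) * conc for area, conc in polygons)

# (polygon area in m^2, PCB concentration in mg/kg): a hypothetical parcel
# where sampling was biased toward a small contaminated corner.
polygons = [(50.0, 20.0), (60.0, 15.0), (400.0, 1.0), (490.0, 0.5)]
unweighted = sum(c for _, c in polygons) / len(polygons)   # 9.125 mg/kg
weighted = spatial_average(polygons)                       # 2.545 mg/kg
```

The unweighted mean treats each sample equally, while the weighted mean reflects that the two high results together represent only 11 percent of the parcel's area.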

This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a nearest-neighbor approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then the average concentrations for each grid were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund: Human Health Evaluation Manual. In that review, the SAB stated that the:

[E]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure. (p. 15)

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and demonstrated by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.
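For reference, the H-statistic upper confidence limit discussed above is computed from the log-transformed data; the notation below is the standard presentation of Land's (1975) method, not a formula quoted from the Work Plan:

```latex
% Land's (1975) one-sided 95% UCL for the mean of a lognormal population:
% \bar{y} and s_y are the mean and standard deviation of the log-transformed
% concentrations, n is the number of samples, and H_{0.95} is the tabulated
% H-statistic from Land (1975).
UCL_{0.95} = \exp\!\left( \bar{y} + \frac{s_y^{2}}{2}
             + \frac{s_y \, H_{0.95}}{\sqrt{n-1}} \right)
```

Because the H-multiplier grows with the log-scale standard deviation and with decreasing sample size, slight departures from lognormality can inflate this bound sharply, which is the behavior documented by Singh et al. (EPA 1997a).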

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal


distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA 1997a; Schulz and Griffin 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.
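The procedure described above can be sketched in a few lines. The dataset and iteration count here are illustrative assumptions, not values taken from the Work Plan:

```python
import random
import statistics

def bootstrap_95_ucl(data, n_iterations=5000, seed=1):
    """Estimate the 95% UCL of the mean by bootstrapping: repeatedly
    resample the dataset with replacement, archive each resampled mean,
    and take the 95th percentile of the resulting distribution."""
    rng = random.Random(seed)
    means = [statistics.mean(rng.choices(data, k=len(data)))
             for _ in range(n_iterations)]
    means.sort()
    return means[int(0.95 * n_iterations) - 1]

# Illustrative dataset (hypothetical PCB concentrations, ppm)
data = [0.5, 1.2, 2.4, 3.1, 4.8, 7.9, 9.6, 12.3, 18.4, 25.0]
ucl = bootstrap_95_ucl(data)
# The bootstrapped 95% UCL lies above the arithmetic mean but, unlike
# the H-UCL, it can never exceed the maximum observed concentration.
```

Note that each bootstrapped mean is an average of observed values, so the resulting UCL is bounded by the data themselves; this is the property that prevents the pathological results described for the H-statistic.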

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area, and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of samples collected on each property. This modified bootstrapping approach (spatial bootstrapping), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.


Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
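The three steps above can be sketched as follows; the expansion factor of five mirrors the example in the text, while the function name and default subsample size are illustrative assumptions:

```python
import random
import statistics

def spatial_bootstrap_95_ucl(concs, areas, expansion=5,
                             subsample=250, n_iterations=5000, seed=1):
    """Area-weighted (spatial) bootstrap of the 95% UCL. Each sample is
    replicated in proportion to the ratio of its Thiessen polygon area
    to the smallest polygon area, and the expanded dataset is then
    bootstrapped with replacement."""
    smallest = min(areas)
    expanded = []
    for conc, area in zip(concs, areas):
        # Normalize to the smallest polygon, round to the nearest
        # integer, and multiply by the expansion factor to set the
        # number of replicates for this sample.
        replicates = round(area / smallest) * expansion
        expanded.extend([conc] * replicates)
    rng = random.Random(seed)
    k = min(subsample, len(expanded))
    means = sorted(statistics.mean(rng.choices(expanded, k=k))
                   for _ in range(n_iterations))
    return means[int(0.95 * n_iterations) - 1]
```

With two hypothetical samples whose polygons have areas of 100 and 700 square units, the second sample (ratio 7, times the factor of five) appears 35 times in the expanded dataset, exactly as in the sample X example above, so the bootstrap distribution centers on the area-weighted mean rather than the simple arithmetic mean.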


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has, in the past, often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA 1997a; Schulz and Griffin 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) the EPA approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.
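The first two of these quantities can be reproduced directly from the Table 1 data (decimal points reconstructed from the range and mean stated in the text):

```python
# Hypothetical Table 1 data: Thiessen polygon areas (ft2) and total
# PCB concentrations (ppm) for the 30 surface soil samples.
areas = [138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121,
         3697, 13297, 33087, 65541, 57494, 9014, 9994, 2123, 2341,
         7446, 2613, 44231, 2362, 8745, 5008, 18762, 24296, 51356,
         12996, 18019, 119814, 132012]
concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77,
         7.58, 7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3,
         14.3, 15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9,
         40.6, 42.6]

# Arithmetic mean of the concentrations: approximately 13.3 ppm.
arithmetic_mean = sum(concs) / len(concs)

# Area-weighted average: each concentration weighted by its polygon's
# share of the total area; somewhat higher (about 15.7 ppm) because
# the high-concentration samples 29 and 30 have very large polygons.
area_weighted_mean = (sum(a * c for a, c in zip(areas, concs))
                      / sum(areas))
```

This small calculation illustrates why area-weighting matters: the two largest polygons carry both the lowest (sample 1) and the highest (sample 30) concentrations, so the weighting changes the mean without being driven by any single outlier.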

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few


showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL (106 ppm) that exceeds the highest measured concentration (42.6 ppm) by a factor of approximately 2.5, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above, and demonstrated by EPA contractors and consultants (EPA 1997a; Schulz and Griffin 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River, and to collect the data


necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA 1989, 1993) and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½-Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or for areas where EPA determines that the data are insufficient to support that approach and decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA 1997a; Schulz and Griffin 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant), Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision, Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½-Mile Removal Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H. and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River - Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.

Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
1                138,219               0.022
2                105,217               0.028
3                110,214               1.07
4                31,021                1.63
5                11,399                1.67
6                7,004                 2.01
7                14,268                2.19
8                12,121                3.25
9                3,697                 3.77
10               13,297                7.58
11               33,087                7.61
12               65,541                7.62
13               57,494                7.71
14               9,014                 7.99
15               9,994                 8.72
16               2,123                 9.69
17               2,341                 9.72
18               7,446                 12.7
19               2,613                 14.3
20               44,231                14.3
21               2,362                 15.4
22               8,745                 16.3
23               5,008                 18.7
24               18,762                19.1
25               24,296                20.2
26               51,356                30.8
27               12,996                32.9
28               18,019                38.9
29               119,814               40.6
30               132,012               42.6

[Table 2, comparing the mean and 95% UCL EPC estimates calculated using the four approaches (arithmetic mean/H-statistic, area-weighted average, bootstrap, and spatial bootstrap), is not legible in this copy; its key values are quoted in the text.]


account of the significant differences in exposure parameters among these types of activities. In doing so, actual recreational exposures will be more closely approximated, and more realistic and appropriate cleanup goals can be developed.

7. Lack of information about specific exposure assumptions to be used

As in the prior draft, the Work Plan does not present the specific exposure parameter values and equations to be used in EPA's risk assessment, but simply states that they will be developed based on EPA and MDEP guidance, along with site-specific data (Section 6.3.4.5). GE requested such information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has advised GE that the Agency's risk assessment staff will hold a series of technical meetings with GE representatives and consultants, and at such meetings will be prepared to discuss the most appropriate exposure assumptions and parameter values to be used. GE reiterates its request for such meetings and looks forward to discussing these issues with EPA.

8. Use of probabilistic risk analysis

In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that probabilistic approaches, such as Monte Carlo analysis, will be considered for those COPCs and exposure pathways that result in significant risk estimates. However, it does not mention probabilistic risk assessment techniques in connection with any of the main risk assessments. In prior discussions with GE, EPA indicated that one of the principal reasons why it collected a large amount of fish tissue PCB data is for use in a probabilistic risk assessment of fish consumption. As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic exposure model in the main risk analysis of the fish consumption pathway. However, the Work Plan presents no discussion of, or plans for conducting, such an assessment. In fact, EPA does not even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk assessment guidance documents and information sources that EPA intends to use in conducting the HHRA.

During the last few years, EPA's policies and guidelines have focused on improving risk management by presenting decision-makers with the entire range of possible risks rather than a


single point estimate (EPA 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that numerical risk assessments should be accompanied by a full characterization of the uncertainties, limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D) probabilistic analysis, such as Microexposure analysis, can more effectively characterize the impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed population by considering time-dependent changes.

Microexposure analysis has been used by EPA and by independent researchers to simulate duration of residential exposure (EPA 1992c; Johnson and Capel 1992; Sielken 1994). It also has been used to evaluate childhood exposures to lead (Goodrum et al. 1994), exposure to contaminants in tap water (Harrington et al. 1995), and exposure to dioxins from the consumption of freshwater fish (Keenan et al. 1993a,b; 1995; 1996a; 1997a,b). Microexposure analysis was employed in the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon Group 1994; Keenan et al. 1996b). In addition, Microexposure analysis has been described in EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A - External Review Draft as a viable alternative for modeling time-dependent variability in concentrations, daily activity patterns, and other behavioral exposure factors (EPA 1999c).

Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of probabilistic risk analysis through application to the fish ingestion pathway at the Sangamo-Weston/Lake Hartwell Superfund site (Simon 1999). Simon (1999) found that the Microexposure event model gave risk estimates that were approximately an order of magnitude lower than those derived using deterministic and other probabilistic approaches. He attributed this decrease to the power of the method to incorporate changes over time.

GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model, to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in Appendix E of its draft guidance for conducting probabilistic risk assessments (EPA 1999c), some types of exposures, including fish consumption, can be expected to vary over time. Anglers may change their activity levels and their fish consumption behaviors as they age. In addition, different species and sizes of fish may be consumed from a variety of locations on the river over time, and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates, and will provide information that will allow better informed risk management decisions to be made (EPA 1999c).
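The essence of a Microexposure-style simulation is that exposure factors are re-drawn for each time step of each simulated individual, rather than held constant for an entire exposure duration. The sketch below is purely illustrative: every distribution, parameter value, and the meal size are assumptions for demonstration, not inputs proposed for the Housatonic HHRA:

```python
import random

def simulate_angler_dose(rng):
    """One simulated angler's average daily PCB intake (mg/day),
    illustrative only. Meals per year and fish concentration are
    re-drawn each year, so exposure varies over the angling duration
    as in a Microexposure event analysis."""
    duration = rng.randint(5, 30)            # years of angling (assumed)
    total_intake = 0.0
    for _ in range(duration):
        meals = rng.randint(0, 40)           # fish meals this year (assumed)
        conc = rng.lognormvariate(0.5, 1.0)  # fish PCB conc., ppm (assumed)
        total_intake += meals * 0.227 * conc # 0.227 kg per meal (assumed)
    return total_intake / (duration * 365.0)

# Outer loop over simulated anglers yields a distribution of doses,
# from which percentiles (not a single point estimate) are reported.
rng = random.Random(7)
doses = sorted(simulate_angler_dose(rng) for _ in range(10000))
median, p95 = doses[5000], doses[9500]
```

Reporting the median alongside an upper percentile is what gives risk managers the range of exposures described above; a full 2-D analysis would additionally separate variability (the loops shown here) from uncertainty in the input distributions themselves.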

9. Use of 95% UCLs or maximum values as exposure point concentrations

The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% UCL of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).

GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment, and demonstrated by EPA contractors and consultants (EPA 1997c; Schulz and Griffin 1999), use of the H-statistic to derive a 95% UCL can, and frequently does, produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.

Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½-Mile Reach and for Areas Outside the River: a spatial averaging approach. That approach will: (a) allow calculation of EPCs that are much more truly representative of actual exposures than use of EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper ½-Mile Reach and Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling, if necessary, to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.


However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A, and demonstrated by Singh et al. (EPA 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.

References

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1990. National Contingency Plan. US Environmental Protection Agency, Washington, DC.

EPA. 1992a. Final Guidelines for Exposure Assessment; Notice. US Environmental Protection Agency, Washington, DC. 57 Federal Register 22888-22938. May 29.

EPA. 1992b. Guidance on Risk Characterization for Risk Managers and Risk Assessors. Memorandum from F. Henry Habicht II, Deputy Administrator, to Assistant Administrators and Regional Administrators. Office of the Administrator, Washington, DC.

EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General US Population. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

EPA. 1993. Memo from W.H. Farland and H.L. Longest to Regional Directors, Re: Use of IRIS values in Superfund risk assessments. US Environmental Protection Agency, Washington, DC. OSWER Directive 9285.7-16. December 21.

EPA. 1995a. Policy for Risk Characterization at the US Environmental Protection Agency. US Environmental Protection Agency, Office of the Administrator, Washington, DC. March.

EPA. 1995b. Guidance for Risk Characterization. US Environmental Protection Agency, Science Policy Council, Washington, DC. February.

EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the US Environmental Protection Agency. US Environmental Protection Agency, Office of Research and Development, Washington, DC. May 15.

EPA. 1997b. Guiding Principles for Monte Carlo Analysis. US Environmental Protection Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 pp.

EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1998. Memo from Timothy Fields, Acting Administrator, to Regional Directors, Re: Approach for addressing dioxin in soil at CERCLA and RCRA sites. US Environmental Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.

EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, US EPA, to Jane Magee, General Electric Company, Re: Pittsfield 1-0147, EPA Area 6, Housatonic River Human Health Risk Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader, Re: Protectiveness of Cleanup Levels for Removal Actions Outside the River - Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, US Dist. Court for Dist. of Mass.). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I - Human Health Evaluation Manual, Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2. Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3 (Part A, Process for Conducting Probabilistic Risk Assessment). Draft, Revision 5. US Environmental Protection Agency, Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½-Mile Removal Action, and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the US Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition, December 4-7, Baltimore, MD.


Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA4203. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.

Johnson, T. and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General US Population. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA, Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds, Proceedings, Dioxin '96: 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen 30:61-65.

Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC. October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meetings, Proceedings. Abstract.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis, Proceedings, Annual Meeting and Exposition. Abstract.

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan - Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19. November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for

Risk Analysis Annual Conference and Exposition Baltimore MD December 4-7

Simon TW 1999 Two-dimensional Monte Carlo simulation and beyond A comparison of

several probabilistic risk assessment methods applied to a Superfund site Hum Ecol Risk

Assess 5 (4) 823-843

- 23 shy

Weston 2000a Supplemental Investigation Work Plan for the Lower Housatonic River

Prepared for US Army Corps of Engineers by Roy F Weston Inc February 22

Weston 2000b Engineering EvaluationCost Analysis for the Upper Reach of the Housatonic

River Prepared for US Army Corps of Engineers by Roy F Weston Inc February 11

ATTACHMENT A

A COMPARISON OF ALTERNATIVE METHODS

FOR CALCULATING EXPOSURE POINT CONCENTRATIONS

Prepared on Behalf of the General Electric Company

by Ogden Environmental and Energy Services Co Inc


Introduction

The U.S. Environmental Protection Agency's (EPA's) final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston, 2000a), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets for which no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and on how well they represent the actual levels of exposure that are likely to occur under various scenarios.

There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom follow a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially affect the results of the risk assessment. Use of the H-statistic frequently produces extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that, when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.

GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating EPCs that will yield more representative results. In GE's view, the most appropriate approach is the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than the H-statistic: namely, the bootstrapping technique and a modification of that approach known as "spatial bootstrapping." Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.

Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL

In 1992, EPA issued a guidance document (EPA, 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area; thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of the observed sample concentrations.


The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA, 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA, 1992):

    UCL = exp[ x̄ + 0.5s² + sH/√(n − 1) ]

where:

    UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean
    x̄ = the mean of the log-transformed data
    s² = the variance of the log-transformed sample distribution
    s = the standard deviation of the log-transformed sample distribution
    H = a statistic, developed by Land (1975), that accounts for the shape of the distribution
    n = the number of analytical samples
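As a concrete illustration, the H-UCL computation above can be sketched in a few lines of Python. The dataset and the H value below are hypothetical: H has no closed form and in practice must be taken from Land's (1975) published tables for the observed n and s, so the value passed here is illustrative only.

```python
import math

def h_ucl(data, h_stat):
    """Land's H-UCL: exp(xbar + 0.5*s^2 + s*H/sqrt(n - 1)), where xbar
    and s are the mean and standard deviation of the log-transformed
    concentrations, and H is supplied by the caller from Land's tables."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    xbar = sum(logs) / n
    var = sum((y - xbar) ** 2 for y in logs) / (n - 1)  # sample variance s^2
    s = math.sqrt(var)
    return math.exp(xbar + 0.5 * var + s * h_stat / math.sqrt(n - 1))

# Hypothetical skewed dataset (ppm) with an illustrative (not tabulated) H:
concs = [0.05, 0.3, 1.1, 4.8, 26.0]
print(sum(concs) / len(concs))        # arithmetic mean
print(h_ucl(concs, h_stat=3.0))       # H-UCL, far above the maximum value
```

Even on this small, unremarkable dataset, the H-UCL lands well above the highest observed concentration, which is the pathology discussed in the text.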

The H-statistic frequently yields extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.


Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers," and not necessarily due to lognormally distributed data. Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.

Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA, 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling datasets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes orders of magnitude higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.

Finally, use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when the exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.

Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and it does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA, 1999).

In view of these considerations, EPA contractors and consultants have recommended that the H-statistic not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA, 1997a) recommended that, in environmental applications, "the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. . . . [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).


Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased toward more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the Lower Housatonic River; consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.

This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, unbiased estimates of how concentration and variance change with distance are developed, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava, 1989).

Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.

The area inside each polygon is then used to weight the samples in the statistical calculations, which is, in essence, saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted mean concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

    x̄_sa = Σᵢ pᵢCᵢ

where:

    x̄_sa = the area-weighted mean concentration (mg/kg)
    Cᵢ = the concentration representing the conditions within polygon i, where there are i = 1 through n polygons
    pᵢ = the proportion of the total area that is incorporated in polygon i (unitless)
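In code, the area-weighted mean reduces to a weighted sum. The concentrations and polygon areas below are hypothetical, chosen only to show how a high concentration in a small polygon is down-weighted relative to the unweighted arithmetic mean:

```python
def area_weighted_mean(concs, areas):
    """Spatial average per the formula above: sum of p_i * C_i, where p_i
    is the fraction of the total exposure area covered by sample i's
    Thiessen polygon and C_i is that sample's concentration."""
    total = float(sum(areas))
    return sum(c * (a / total) for c, a in zip(concs, areas))

# Hypothetical samples: the hot sample sits in a small polygon, so the
# weighted mean falls well below the unweighted arithmetic mean.
concs = [42.6, 9.5, 1.2, 0.022]          # ppm
areas = [2123, 21000, 88000, 138219]     # square feet
print(sum(concs) / len(concs))           # unweighted arithmetic mean
print(area_weighted_mean(concs, areas))  # area-weighted mean
```

The same weights could equally be derived from polygon areas produced by a GIS package; only the proportions matter.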

This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid cell using the sampling data available within it. The average concentrations for the grid cells were then averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the draft Risk Assessment Guidance for Superfund: Human Health Evaluation Manual. In that review, the SAB stated that the "[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods, such as kriging or triangulation, that take into account the spatial distribution of contamination to characterize exposure." (p. 15)

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and it is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would be necessary only in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL that would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously and demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative, superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or a lognormal distribution. This technique is described below and is recommended as an alternative, preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that, if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This theorem applies to any population regardless of its statistical distribution. Bootstrapping relies on the same theory, except that, instead of repeatedly taking new samples, bootstrapping models sample repeatedly from a single dataset.
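The procedure just described can be sketched as follows. The dataset is hypothetical, and the iteration count and random seed are arbitrary choices made so the sketch is repeatable:

```python
import random
import statistics

def bootstrap_ucl(data, n_iter=5000, seed=0):
    """Bootstrap 95% UCL on the arithmetic mean: resample the dataset
    with replacement many times, record each resample's mean, and take
    the 95th percentile of the resulting distribution of means."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means = sorted(
        statistics.fmean(rng.choices(data, k=len(data)))  # with replacement
        for _ in range(n_iter)
    )
    return means[int(0.95 * n_iter) - 1]

# Hypothetical skewed dataset (ppm): the bootstrap UCL lands between the
# arithmetic mean and the maximum value, unlike an inflated H-UCL.
concs = [0.1, 0.4, 0.9, 2.2, 5.0, 11.0, 24.0]
print(statistics.fmean(concs), bootstrap_ucl(concs), max(concs))
```

Note that the UCL is read directly off the empirical distribution of resampled means, so no distributional assumption (normal or lognormal) is required.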

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter; as such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that accounts for the locations and densities of the samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrap method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.


Spatial bootstrapping combines area-weighting of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded to the closest integer. An expanded dataset is then created from the original dataset to reflect the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded, normalized areas by a selected value; the resulting values represent the area-weighted number of replicates of each sample result in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
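A minimal sketch of the three steps, assuming the Thiessen-polygon areas have already been computed; the concentrations, areas, expansion factor, and draw count below are illustrative values within the ranges described above:

```python
import random
import statistics

def spatial_bootstrap_ucl(concs, areas, expand_factor=5,
                          draws_per_iter=250, n_iter=5000, seed=0):
    """Spatial bootstrap per the three steps above: (1) weight each sample
    by its Thiessen-polygon area; (2) build an expanded dataset in which
    each sample is replicated in proportion to its polygon's rounded ratio
    to the smallest polygon, times an expansion factor; (3) bootstrap the
    mean of the expanded dataset and take the 95th percentile."""
    smallest = min(areas)
    expanded = []
    for c, a in zip(concs, areas):
        expanded.extend([c] * (round(a / smallest) * expand_factor))
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means = sorted(
        statistics.fmean(rng.choices(expanded, k=draws_per_iter))
        for _ in range(n_iter)
    )
    return means[int(0.95 * n_iter) - 1]

# Hypothetical samples: the hot sample's small polygon keeps the spatial
# bootstrap UCL far below the hot concentration itself.
concs = [42.6, 9.5, 1.2, 0.022]          # ppm
areas = [2123, 21000, 88000, 138219]     # square feet
print(spatial_bootstrap_ucl(concs, areas))
```

With equal polygon areas the expanded dataset is just uniform replication, and the procedure reduces to the ordinary bootstrap of the preceding section.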


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has, in the past, often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples in each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data; thus, for all sampling datasets, including those that fit neither a normal nor a lognormal distribution, this approach can be used successfully.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: (1) EPA's approach (arithmetic mean and H-statistic); (2) an area-weighted average; (3) a bootstrap; and (4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained through straight bootstrapping; this is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but it is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas; thus, they are weighted more heavily in the approaches that include area-weighting.

Estimates of the 95% UCL, however, vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few


showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL that exceeds the bootstrapped estimates by a factor of more than six and exceeds the highest measured concentration, with the result that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above, and as demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data


necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993), and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or for areas where EPA determines that the data are insufficient to support that approach and decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A), Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.


EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant), Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc., EPA Region IV. Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhardt. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision, Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. U.S. Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River, Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.


Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number   Polygon Area (ft2)   Total PCB (ppm)
 1                  138,219              0.022
 2                  105,217              0.028
 3                  110,214              1.07
 4                   31,021              1.63
 5                   11,399              1.67
 6                    7,004              2.01
 7                   14,268              2.19
 8                   12,121              3.25
 9                    3,697              3.77
10                   13,297              7.58
11                   33,087              7.61
12                   65,541              7.62
13                   57,494              7.71
14                    9,014              7.99
15                    9,994              8.72
16                    2,123              9.69
17                    2,341              9.72
18                    7,446             12.7
19                    2,613             14.3
20                   44,231             14.3
21                    2,362             15.4
22                    8,745             16.3
23                    5,008             18.7
24                   18,762             19.1
25                   24,296             20.2
26                   51,356             30.8
27                   12,996             32.9
28                   18,019             38.9
29                  119,814             40.6
30                  132,012             42.6

Table 2. Mean and 95% UCL Exposure Point Concentrations for the Hypothetical Floodplain Property Calculated Using Four Approaches


single point estimate (EPA, 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that numerical risk assessments should be accompanied by a full characterization of the uncertainties, limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D) probabilistic analysis, such as Microexposure analysis, can more effectively characterize the impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed population by considering time-dependent changes.

Microexposure analysis has been used by EPA and by independent researchers to simulate duration of residential exposure (EPA, 1992c; Johnson and Capel, 1992; Sielken, 1994). It also has been used to evaluate childhood exposures to lead (Goodrum et al., 1994), exposure to contaminants in tap water (Harrington et al., 1995), and exposure to dioxins from the consumption of freshwater fish (Keenan et al., 1993a,b, 1995, 1996a, 1997a,b). Microexposure analysis was employed in the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon Group, 1994; Keenan et al., 1996b). In addition, Microexposure analysis has been described in EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A (External Review Draft) as a viable alternative for modeling time-dependent variability in concentrations, daily activity patterns, and other behavioral exposure factors (EPA, 1999c).

Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of probabilistic risk analysis through application to the fish ingestion pathway at the Sangamo Weston/Lake Hartwell Superfund site (Simon, 1999). Simon (1999) found that the Microexposure event model gave risk estimates that were approximately an order of magnitude lower than those derived using deterministic and other probabilistic approaches. He attributed this decrease to the power of the method to incorporate changes over time.

GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model, to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in Appendix E of its draft guidance for conducting probabilistic risk assessments (EPA, 1999c), some types of exposures, including fish consumption, can be expected to vary over time. Anglers may change their activity levels and their fish consumption behaviors as they age. In addition, different


species and sizes of fish may be consumed from a variety of locations on the river over time, and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates, and will provide information that will allow better-informed risk management decisions to be made (EPA, 1999c).
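The structure of a microexposure event simulation can be conveyed in a short sketch. All parameter values and distributions below are hypothetical placeholders, not inputs proposed by GE or EPA; a full 2-D analysis would add an outer loop over uncertain parameters.

```python
import random

random.seed(7)

def simulate_angler():
    """One simulated angler whose behavior and fish concentrations vary by year."""
    years = random.randint(1, 30)                 # fishing duration (hypothetical)
    total_intake_mg = 0.0
    for _ in range(years):
        meals = random.randint(0, 24)             # meals eaten this year
        for _ in range(meals):
            conc_ppm = random.lognormvariate(0.5, 1.0)  # fillet PCB level, varies by meal
            meal_g = random.uniform(100.0, 300.0)       # meal size in grams
            cook_loss = random.uniform(0.0, 0.5)        # PCB reduction from preparation
            total_intake_mg += conc_ppm * meal_g * (1.0 - cook_loss) / 1000.0
    return total_intake_mg / (years * 365.0)      # average intake, mg/day

# Variability dimension: a population of simulated anglers
doses = sorted(simulate_angler() for _ in range(2000))
mean_dose = sum(doses) / len(doses)
p95_dose = doses[int(0.95 * len(doses))]
```

The time-varying inputs, meal by meal and year by year, are what distinguish this from a one-dimensional calculation in which a single point estimate is held constant over the full exposure period.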

9. Use of 95% UCLs or maximum values as exposure point concentrations

The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% UCL of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).

GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment, and as demonstrated by EPA contractors and consultants (EPA, 1997c; Schulz and Griffin, 1999), use of the H-statistic to derive a 95% UCL can, and frequently does, produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.

Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½ Mile Reach and for Areas Outside the River: a spatial averaging approach. That approach will (a) allow calculation of EPCs that are much more truly representative of actual exposures than use of EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper ½ Mile Reach and Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling, if necessary, to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.


However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A, and as demonstrated by Singh et al. (EPA, 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance; provides more reliable estimates of the 95% UCLs; considers the location and density of each sample in each exposure area; mitigates the statistical impact of a small database; and can be used regardless of the distribution of the sampling data.
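One way such a spatial bootstrap can be implemented is sketched below. This rests on an assumption the comments leave open: that resampling probabilities are made proportional to each sample's Thiessen polygon area.

```python
import random

def spatial_bootstrap_ucl(concs, areas, n_boot=5000, conf=0.95, seed=0):
    """95% UCL of the spatial (area-weighted) mean by weighted resampling.

    Each bootstrap draw picks sample locations with probability proportional
    to their polygon areas, so clusters of samples in small hot spots are not
    over-represented in the resampled means.
    """
    rng = random.Random(seed)
    n = len(concs)
    means = sorted(sum(rng.choices(concs, weights=areas, k=n)) / n
                   for _ in range(n_boot))
    return means[int(conf * n_boot)]
```

Under this weighting, a hot spot covering a small polygon contributes to the UCL in proportion to its footprint rather than its sample count.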

References

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A), Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1990. National Contingency Plan. U.S. Environmental Protection Agency, Washington, DC.

EPA. 1992a. Final Guidelines for Exposure Assessment; Notice. U.S. Environmental Protection Agency, Washington, DC. 57 Federal Register 22888-22938. May 29.

EPA. 1992b. Guidance on Risk Characterization for Risk Managers and Risk Assessors. Memorandum from F. Henry Habicht II, Deputy Administrator, to Assistant Administrators and Regional Administrators. Office of the Administrator, Washington, DC.


EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

EPA. 1993. Memo from W.H. Farland and H.L. Longest to Regional Directors re: Use of IRIS values in Superfund risk assessments. U.S. Environmental Protection Agency, Washington, DC. OSWER Directive 9285.7-16. December 21.

EPA. 1995a. Policy for Risk Characterization at the U.S. Environmental Protection Agency. U.S. Environmental Protection Agency, Office of the Administrator, Washington, DC. March.

EPA. 1995b. Guidance for Risk Characterization. U.S. Environmental Protection Agency, Science Policy Council, Washington, DC. February.

EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the U.S. Environmental Protection Agency. U.S. Environmental Protection Agency, Office of Research and Development, Washington, DC. May 15.

EPA. 1997b. Guiding Principles for Monte Carlo Analysis. U.S. Environmental Protection Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 p.

EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhardt. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1998. Memo from Timothy Fields, Acting Administrator, to Regional Directors re: Approach for addressing dioxin in soil at CERCLA and RCRA sites. U.S. Environmental Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.


EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, U.S. EPA, to Jane Magee, General Electric Company, re: Pittsfield 1-0147, EPA Area 6 Housatonic River Human Health Risk Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader, re: Protectiveness of Cleanup Levels for Removal Actions Outside the River - Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, U.S. Dist. Court for Dist. of Mass.). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual, Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2 Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3 (Part A: Process for Conducting Probabilistic Risk Assessment), Draft, Revision 5. U.S. Environmental Protection Agency, Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the U.S. Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition, December 4-7, Baltimore, MD.


Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA42.03. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.

Johnson, T., and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA, Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds, Proceedings Dioxin '96: 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen 30:61-65.


Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC. October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meetings Proceedings. Abstract.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition. Abstract.

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan, Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19. November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD. December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4):823-843.


Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 11.

ATTACHMENT A

A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS

Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co., Inc.


Introduction

The U.S. Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston, 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets where no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and on how well those values represent the actual levels of exposure that are likely to occur under various scenarios.

There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that, when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and


the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.

GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will produce more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic: namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.

Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL

In 1992, EPA issued a guidance document (EPA, 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC, regardless of the statistical distribution of observed sample concentrations.


The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA, 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA, 1992):

UCL = exp(x̄ + 0.5s² + sH/√(n - 1))

where:

UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic, developed by Land (1975), accounting for the interaction of the distribution
n = the number of analytical samples
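The equation translates directly into code. In the sketch below, the H value must still be looked up in Land's (1975) tables for the observed s and n, so it is supplied by the caller; this is an illustration, not a substitute for the tabulated values.

```python
import math
import statistics

def h_ucl(data, H):
    """95% UCL on the arithmetic mean of lognormal data via Land's H-statistic.

    H is Land's tabulated statistic for the sample's log-standard deviation
    and sample size; it must be taken from Land (1975) and passed in.
    """
    logs = [math.log(x) for x in data]
    n = len(logs)
    xbar = statistics.mean(logs)   # mean of the log-transformed data
    s = statistics.stdev(logs)     # standard deviation of the log-transformed data
    return math.exp(xbar + 0.5 * s * s + s * H / math.sqrt(n - 1))
```

Because s enters the exponent both directly and squared, modest log-scale variances drive the UCL up exponentially, which is the behavior criticized in the discussion that follows.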

Frequently, the H-statistic calculates extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.


Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.

Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA, 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may


overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.

Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when an exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.
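This distinction can be illustrated numerically. The data below are hypothetical, and the exponential estimator shown is the lognormal-theory mean implicit in the H-UCL equation above.

```python
import math
import statistics

def lognormal_mean_estimate(data):
    """Exponential (lognormal-theory) estimator of the arithmetic mean."""
    logs = [math.log(x) for x in data]
    return math.exp(statistics.mean(logs) + 0.5 * statistics.variance(logs))

# A skewed dataset drawn from two populations, hence not lognormal
data = [0.1] * 10 + [50.0] * 2
sample_mean = statistics.mean(data)        # sum of all samples / number of samples
exp_mean = lognormal_mean_estimate(data)   # diverges from the sample mean
```

When the two estimates diverge in this way, the lognormal assumption underlying the H-statistic is not supported by the data.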

Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and it does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA, 1999).

In view of these considerations, EPA contractors and consultants have recommended that the H-statistic should not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA, 1997a) recommended that "in environmental applications, the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. . . . [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).


Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the Lower Housatonic River; consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL, because it is unlikely to be representative of the overall exposure area.

This problem can be substantially mitigated by area-weighting the sample results, so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, models of how concentration and variance change with distance are used to construct a spatial grid of unbiased concentration estimates (ordinary kriging; Isaaks and Srivastava, 1989).

Because kriging is data intensive and unlikely to be feasible for many of the exposure areas in the Rest of the River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.

The area inside each polygon is then used to weight the samples in statistical calculations, which is in essence saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

    X̄sc = Σ (i = 1 to n) pi × Ci

Where:

    X̄sc = area-weighted mean concentration (mg/kg)

    Ci = the concentration representing the condition within polygon i, where there are i = 1 through n polygons; and

    pi = the proportion of the total area that is incorporated in polygon i (unitless)
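As a minimal sketch of this formula (the polygon areas and concentrations below are made up for illustration), the calculation reduces to a weighted sum:

```python
# Hypothetical Thiessen polygon areas (ft^2) and the PCB concentration
# (ppm) of the sample inside each polygon -- illustrative values only.
areas = [40000.0, 25000.0, 20000.0, 10000.0, 5000.0]
concs = [0.5, 2.0, 4.0, 12.0, 30.0]

total_area = sum(areas)
weights = [a / total_area for a in areas]              # p_i, sums to 1
x_bar_sc = sum(p * c for p, c in zip(weights, concs))  # area-weighted mean

simple_mean = sum(concs) / len(concs)
print(round(x_bar_sc, 2), round(simple_mean, 2))  # 4.2 9.7
```

Here the hottest samples sit in the smallest polygons, so the area-weighted mean (4.2 ppm) falls well below the unweighted mean (9.7 ppm); with biased sampling, the unweighted mean overstates what a receptor moving randomly across the area would contact.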

This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½-Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then, the average concentrations for each grid were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the:

    "[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure." (p. 15)

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½-Mile Reach Removal Action, EPA specifically justified this spatial averaging approach "so long as sufficient data are available" (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously and demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below, and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process whereby a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
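The procedure just described can be sketched in a few lines. The dataset below is hypothetical, and the 5,000-iteration count is a common choice rather than a requirement of the method:

```python
import random

def bootstrap_95_ucl(data, n_iter=5000, seed=1):
    """95% UCL of the mean via the basic bootstrap: resample the dataset
    with replacement, archive each resample's mean, and take the 95th
    percentile of the resulting distribution of means."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(sum(rng.choices(data, k=n)) / n for _ in range(n_iter))
    return means[int(0.95 * n_iter)]

# Hypothetical concentrations (ppm) for one exposure area
data = [0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0, 25.0, 60.0, 150.0]
ucl = bootstrap_95_ucl(data)
print(round(sum(data) / len(data), 1), round(ucl, 1))
```

Because the bootstrap UCL is a percentile of resampled means, it necessarily falls between the arithmetic mean and the maximum observation, rather than orders of magnitude above the largest value.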

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.


Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7 and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
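The three steps can be sketched as follows. The polygon areas and concentrations are hypothetical, and the expansion factor of 5, the 250-value subsamples, and the 5,000 iterations follow the description above:

```python
import random

def spatial_bootstrap(areas, concs, factor=5, n_iter=5000,
                      subsample=250, seed=1):
    """Spatial bootstrap: expand the dataset in proportion to each sample's
    Thiessen polygon area, then bootstrap the expanded set.
    Returns the 50th percentile (area-weighted mean) and the 95% UCL."""
    # Steps 1-2: normalize areas to the smallest polygon, round to integers,
    # and replicate each result (rounded ratio x expansion factor) times.
    smallest = min(areas)
    expanded = []
    for a, c in zip(areas, concs):
        expanded.extend([c] * (round(a / smallest) * factor))
    # Step 3: bootstrap means (with replacement) from the expanded dataset.
    rng = random.Random(seed)
    means = sorted(sum(rng.choices(expanded, k=subsample)) / subsample
                   for _ in range(n_iter))
    return means[int(0.50 * n_iter)], means[int(0.95 * n_iter)]

# Hypothetical polygons: the hot samples sit in small polygons, so the
# spatial statistics are pulled below what an unweighted bootstrap would give.
areas = [40000.0, 25000.0, 20000.0, 10000.0, 5000.0]
concs = [0.5, 2.0, 4.0, 12.0, 30.0]
median, ucl = spatial_bootstrap(areas, concs)
print(round(median, 2), round(ucl, 2))
```

As the text describes, the 50th percentile of the bootstrapped means converges on the area-weighted mean of the original dataset (4.2 ppm here), and the 95th percentile serves as the UCL.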


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) EPA's approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.
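The first two of these summary statistics can be checked directly from the Table 1 dataset; the bootstrap and spatial bootstrap results require the resampling procedures described above:

```python
# Polygon areas (ft^2) and total PCB (ppm) for the 30 hypothetical
# samples in Table 1, in sample-number order.
areas = [138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121, 3697,
         13297, 33087, 65541, 57494, 9014, 9994, 2123, 2341, 7446, 2613,
         44231, 2362, 8745, 5008, 18762, 24296, 51356, 12996, 18019,
         119814, 132012]
concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
         7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
         15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6]

arith_mean = sum(concs) / len(concs)
total_area = sum(areas)
weighted_mean = sum(a / total_area * c for a, c in zip(areas, concs))

print(round(arith_mean, 1))     # 13.3 -- the arithmetic mean cited in the text
print(round(weighted_mean, 1))  # the area-weighted mean, somewhat higher
```

The arithmetic mean reproduces the 13.3 ppm cited in the text, and the area-weighted mean comes out modestly higher (about 15.7 ppm), reflecting the large polygons assigned to high-concentration samples 29 and 30.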

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL that far exceeds the highest measured concentration, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above and demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993) and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½-Mile Reach of the River and the Removal Actions Outside the River (including certain floodplain properties along the River).

However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund. Volume I: Human Health Evaluation Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV. Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. U.S. Environmental Protection Agency, Region I. October.

Isaaks, E.H. and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River: Final. Roy F. Weston, Inc., Manchester, NH. DCN: GEP2-02-000-AAME. February 2000.


Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

    Sample Number    Polygon Area (ft²)    Total PCB (ppm)
    1                138,219               0.022
    2                105,217               0.028
    3                110,214               1.07
    4                31,021                1.63
    5                11,399                1.67
    6                7,004                 2.01
    7                14,268                2.19
    8                12,121                3.25
    9                3,697                 3.77
    10               13,297                7.58
    11               33,087                7.61
    12               65,541                7.62
    13               57,494                7.71
    14               9,014                 7.99
    15               9,994                 8.72
    16               2,123                 9.69
    17               2,341                 9.72
    18               7,446                 12.7
    19               2,613                 14.3
    20               44,231                14.3
    21               2,362                 15.4
    22               8,745                 16.3
    23               5,008                 18.7
    24               18,762                19.1
    25               24,296                20.2
    26               51,356                30.8
    27               12,996                32.9
    28               18,019                38.9
    29               119,814               40.6
    30               132,012               42.6

Table 2. Mean and 95% UCL EPCs for the Hypothetical Floodplain Property Calculated Using the Four Statistical Approaches


species and sizes of fish may be consumed from a variety of locations on the river over time, and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates, and will provide information that will allow better-informed risk management decisions to be made (EPA, 1999c).

9. Use of 95% UCLs or maximum values as exposure point concentrations

The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% UCL of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).

GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment and demonstrated by EPA contractors and consultants (EPA, 1997c; Schulz and Griffin, 1999), use of the H-statistic to derive a 95% UCL can, and frequently does, produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.

Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½-Mile Reach and for Areas Outside the River: a spatial averaging approach. That approach will (a) allow calculation of EPCs that are much more truly representative of actual exposures than use of EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper ½-Mile Reach and Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling, if necessary, to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.


However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A and demonstrated by Singh et al. (EPA, 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.

References

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A), Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1990. National Contingency Plan. U.S. Environmental Protection Agency, Washington, DC.

EPA. 1992a. Final Guidelines for Exposure Assessment; Notice. U.S. Environmental Protection Agency, Washington, DC. 57 Federal Register 22888-22938. May 29.

EPA. 1992b. Guidance on Risk Characterization for Risk Managers and Risk Assessors. Memorandum from F. Henry Habicht II, Deputy Administrator, to Assistant Administrators and Regional Administrators. Office of the Administrator, Washington, DC.


EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

EPA. 1993. Memo from W.H. Farland and H.L. Longest to Regional Directors re: Use of IRIS values in Superfund risk assessments. U.S. Environmental Protection Agency, Washington, DC. OSWER Directive 9285.7-16. December 21.

EPA. 1995a. Policy for Risk Characterization at the U.S. Environmental Protection Agency. U.S. Environmental Protection Agency, Office of the Administrator, Washington, DC. March.

EPA. 1995b. Guidance for Risk Characterization. U.S. Environmental Protection Agency, Science Policy Council, Washington, DC. February.

EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the U.S. Environmental Protection Agency. U.S. Environmental Protection Agency, Office of Research and Development, Washington, DC. May 15.

EPA. 1997b. Guiding Principles for Monte Carlo Analysis. U.S. Environmental Protection Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 p.

EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhardt. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1998. Memo from Timothy Fields, Acting Administrator, to Regional Directors re: Approach for addressing dioxin in soil at CERCLA and RCRA sites. U.S. Environmental Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.


EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, U.S. EPA, to Jane Magee, General Electric Company, re: Pittsfield 1-0147, EPA Area 6, Housatonic River Human Health Risk Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader, re: Protectiveness of Cleanup Levels for Removal Actions Outside the River - Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, U.S. Dist. Court for Dist. of Mass.). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual, Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2. Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3, Part A: Process for Conducting Probabilistic Risk Assessment. Draft, Revision 5. U.S. Environmental Protection Agency, Office of Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the U.S. Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition. December 4-7. Baltimore, MD.


Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA4203. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association. San Antonio, Texas. June.

Johnson, T. and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds. Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference. Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA, Annual Meeting and Exposition. Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds: Proceedings, Dioxin '96 - 16th International Symposium on Chlorinated Dioxins and Related Compounds. Amsterdam, The Netherlands. Organohalogen Compounds 30: 61-65.


Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference. Washington, DC. October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meeting: Proceedings. Abstract.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis Annual Meeting and Exposition: Proceedings. Abstract.

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan. Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19. November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition. Baltimore, MD. December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4): 823-843.


Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 11.

ATTACHMENT A

A COMPARISON OF ALTERNATIVE METHODS

FOR CALCULATING EXPOSURE POINT CONCENTRATIONS

Prepared on Behalf of the General Electric Company

by Ogden Environmental and Energy Services Co., Inc.


Introduction

The U.S. Environmental Protection Agency's (EPA's) final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston, 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets for which no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and on how well they represent the actual levels of exposure that are likely to occur under various scenarios.

There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.

GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating EPCs that will yield more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½-Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than one based on the H-statistic: namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.

Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL

In 1992, EPA issued a guidance document (EPA, 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area; thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.


The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region 1 (EPA, 1994). In fact, EPA Region 1 has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA, 1992):

UCL = exp( x̄ + 0.5s² + sH / √(n − 1) )

Where:

UCL = a specified limit (i.e., the 95% UCL) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic, developed by Land (1975), that accounts for the skewness of the distribution
n = the number of analytical samples
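To make the mechanics of this equation concrete, the calculation can be sketched as follows. This is an illustrative sketch only: the H value depends on n and s and must be taken from Land's published tables, so it is supplied here as an input rather than computed.

```python
import math

def h_ucl(log_data, h_stat):
    """95% H-UCL of the arithmetic mean for lognormal data (Land's method).

    log_data: natural-log-transformed sample concentrations.
    h_stat:   Land's H-statistic for this n and s, read from Land's
              tables (hypothetical input here, not computed).
    """
    n = len(log_data)
    mean_log = sum(log_data) / n
    var_log = sum((x - mean_log) ** 2 for x in log_data) / (n - 1)
    s = math.sqrt(var_log)
    # UCL = exp(x-bar + 0.5*s^2 + s*H / sqrt(n - 1))
    return math.exp(mean_log + 0.5 * var_log + s * h_stat / math.sqrt(n - 1))
```

Because s enters the exponent twice, modest growth in the spread of the log-transformed data inflates the result multiplicatively, which is the behavior criticized in the text.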

Frequently, the H-statistic produces extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.


Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.

Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA, 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order[s] of magnitude higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.

Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when an exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.

Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and it does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½-Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA, 1999).

In view of these considerations, EPA contractors and consultants have recommended that the H-statistic not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA, 1997a) recommended that "in environmental applications, the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. . . . [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).


Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the lower Housatonic River; consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.

This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, unbiased estimates of how concentration and variance change with distance are developed, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava, 1989).

Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest of the River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon; polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.

The area inside each polygon is then used to weight the samples in statistical calculations, which is in essence saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

x̄_aw = Σ (p_i × C_i), summed over i = 1 through n polygons

Where:

x̄_aw = the area-weighted mean concentration (mg/kg)
C_i = the concentration representing the condition within polygon i, where there are i = 1 through n polygons
p_i = the proportion of the total area that is incorporated in polygon i (unitless)
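As a simple illustration of this weighting (a hypothetical sketch with made-up polygon areas and concentrations, not site data):

```python
def area_weighted_mean(concentrations, areas):
    """Area-weighted (spatial) mean: each sample's concentration is
    weighted by the fraction of the exposure area its polygon covers."""
    total_area = sum(areas)
    return sum(c * a / total_area for c, a in zip(concentrations, areas))

# One hot-spot sample (10 mg/kg) whose polygon covers only a tenth of
# the area: the simple mean is 4.0 mg/kg, but the area-weighted mean
# is 10*0.1 + 1*0.4 + 1*0.5 = 1.9 mg/kg.
pcb = [10.0, 1.0, 1.0]        # mg/kg, one value per polygon (hypothetical)
area = [100.0, 400.0, 500.0]  # polygon areas (hypothetical units)
```

The contrast between the two means shows how area-weighting corrects for sampling that is deliberately concentrated in contaminated spots.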

This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE, and approved by EPA, for calculating EPCs for the banks of the Upper ½-Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. The average concentrations for each grid were then averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the draft Risk Assessment Guidance for Superfund: Human Health Evaluation Manual. In that review, the SAB stated that the

[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure. (p. 15)

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½-Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and it is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL that would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously and demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
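The procedure just described can be sketched in a few lines. This is an illustrative sketch; the iteration count, resample size, and percentile rule here are common conventions, not prescriptions from the Work Plan.

```python
import random
import statistics

def bootstrap_ucl(data, n_iter=5000, seed=1):
    """95% UCL of the arithmetic mean by bootstrap resampling.

    Each iteration draws len(data) values from the dataset at random,
    with replacement, and records the arithmetic mean; the 95th
    percentile of the resulting distribution of means is taken as
    the 95% UCL.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = sorted(
        statistics.fmean(rng.choices(data, k=len(data)))
        for _ in range(n_iter)
    )
    return means[int(n_iter * 0.95) - 1]
```

Unlike the H-UCL, the result can never wander above the maximum observed value, because every bootstrap mean is an average of observed values.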

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter; as such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.


Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with

the bootstrap method discussed above It is conducted in a series of three steps

First each sample is area-weighted using the Thiessen polygon method The weight given to

each sample is determined by calculating the proportion of the total exposure area that is

represented by the samples polygon

Second the smallest polygon area is identified and all other polygon areas are normalized with

respect to its size and are weighted accordingly in the sampling database In other words if the

area of one polygon is five times larger than the smallest polygon it is given a weight that is five

times greater in the sampling database from which values are selected for the bootstrap The

normalized areas are rounded off to the closest integer An expanded dataset is then created

from the original dataset and reflects the ratios of each polygon area to the smallest polygon area

This expansion is accomplished by multiplying the rounded normalized areas by a selected

value The resulting values represent an area-weighted number of replicates of the sample

results in the expanded dataset For example if the ratio of the sample X area to the smallest area

is 7 and the dataset is expanded using a factor of five then sample X is represented in the

expanded dataset as 35 (7 times 5) entries

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
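The three steps can be sketched together in Python. This is an illustrative implementation under stated assumptions (Thiessen polygon areas already computed; the per-iteration draw count rule is our simplification of the 250-500 range described above):

```python
import random

def spatial_bootstrap_ucl(samples, areas, factor=5, n_iter=5000, seed=1):
    """Spatial (area-weighted) bootstrap 95% UCL of the mean.

    samples -- measured concentrations, one per Thiessen polygon
    areas   -- polygon areas, in the same order as samples
    """
    rng = random.Random(seed)
    smallest = min(areas)
    # Steps 1-2: expand the dataset in proportion to each polygon's
    # normalized, rounded area
    expanded = []
    for value, area in zip(samples, areas):
        expanded.extend([value] * (round(area / smallest) * factor))
    # Step 3: bootstrap means (with replacement) from the expanded dataset
    k = min(max(len(expanded), 250), 500)  # draws per iteration
    means = sorted(
        sum(rng.choice(expanded) for _ in range(k)) / k
        for _ in range(n_iter)
    )
    return means[int(0.95 * n_iter)]  # 95th percentile = 95% UCL
```

Because the expanded dataset replicates each result in proportion to its polygon area, the distribution of bootstrapped means centers on the area-weighted mean, as described above.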


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: (1) EPA's approach (arithmetic mean and H-statistic); (2) an area-weighted average; (3) a bootstrap; and (4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
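The arithmetic and area-weighted means for this example can be checked directly from the Table 1 values (a quick sketch; decimal points in the data are as restored in the text discussion):

```python
# Table 1 data: polygon areas (sq ft) and total PCB (ppm), samples 1-30
areas = [138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121,
         3697, 13297, 33087, 65541, 57494, 9014, 9994, 2123, 2341,
         7446, 2613, 44231, 2362, 8745, 5008, 18762, 24296, 51356,
         12996, 18019, 119814, 132012]
pcb = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
       7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
       15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6]

# simple average of the 30 results
arith_mean = sum(pcb) / len(pcb)
# average weighted by each sample's Thiessen polygon area
aw_mean = sum(a * c for a, c in zip(areas, pcb)) / sum(areas)

print(f"arithmetic mean:    {arith_mean:.1f} ppm")  # ~13.3 ppm
print(f"area-weighted mean: {aw_mean:.1f} ppm")     # somewhat higher
```

The area-weighted mean comes out somewhat higher than the arithmetic mean because samples 29 and 30 combine high concentrations with very large polygons, as noted above.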

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration, and exceeds the more reliable bootstrapped estimates by a factor of more than six, with the result that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above and demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area, and hence that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993) and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper 1/2 Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant), Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision, Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, 1/2 Mile Removal Action, and Consolidation. U.S. Environmental Protection Agency, Region I. October.

Isaaks, E.H. and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River. Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.


Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
 1                  138,219                 0.022
 2                  105,217                 0.028
 3                  110,214                 1.07
 4                   31,021                 1.63
 5                   11,399                 1.67
 6                    7,004                 2.01
 7                   14,268                 2.19
 8                   12,121                 3.25
 9                    3,697                 3.77
10                   13,297                 7.58
11                   33,087                 7.61
12                   65,541                 7.62
13                   57,494                 7.71
14                    9,014                 7.99
15                    9,994                 8.72
16                    2,123                 9.69
17                    2,341                 9.72
18                    7,446                12.7
19                    2,613                14.3
20                   44,231                14.3
21                    2,362                15.4
22                    8,745                16.3
23                    5,008                18.7
24                   18,762                19.1
25                   24,296                20.2
26                   51,356                30.8
27                   12,996                32.9
28                   18,019                38.9
29                  119,814                40.6
30                  132,012                42.6

Table 2. Mean and 95% UCL Exposure Point Concentrations for the Hypothetical Floodplain Property, Calculated Using Four Approaches (EPA arithmetic mean/H-statistic; area-weighted average; bootstrap; spatial bootstrap)



However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A and demonstrated by Singh et al. (EPA, 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.

References

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1990. National Contingency Plan. U.S. Environmental Protection Agency, Washington, DC.

EPA. 1992a. Final Guidelines for Exposure Assessment; Notice. U.S. Environmental Protection Agency, Washington, DC. 57 Federal Register 22888-22938. May 29.

EPA. 1992b. Guidance on Risk Characterization for Risk Managers and Risk Assessors. Memorandum from F. Henry Habicht II, Deputy Administrator, to Assistant Administrators and Regional Administrators. Office of the Administrator, Washington, DC.


EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

EPA. 1993. Memo from W.H. Farland and H.L. Longest to Regional Directors, Re: Use of IRIS values in Superfund risk assessments. U.S. Environmental Protection Agency, Washington, DC. OSWER Directive 9285.7-16. December 21.

EPA. 1995a. Policy for Risk Characterization at the U.S. Environmental Protection Agency. U.S. Environmental Protection Agency, Office of the Administrator, Washington, DC. March.

EPA. 1995b. Guidance for Risk Characterization. U.S. Environmental Protection Agency, Science Policy Council, Washington, DC. February.

EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the U.S. Environmental Protection Agency. U.S. Environmental Protection Agency, Office of Research and Development, Washington, DC. May 15.

EPA. 1997b. Guiding Principles for Monte Carlo Analysis. U.S. Environmental Protection Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 p.

EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1998. Memo from Timothy Fields, Acting Administrator, to Regional Directors, Re: Approach for addressing dioxin in soil at CERCLA and RCRA sites. U.S. Environmental Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.


EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, U.S. EPA, to Jane Magee, General Electric Company, Re: Pittsfield 1-0147, EPA Area 6 Housatonic River Human Health Risk Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader, Re: Protectiveness of Cleanup Levels for Removal Actions Outside the River - Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, U.S. Dist. Court for Dist. of Mass.). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual, Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2 Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3 (Part A: Process for Conducting Probabilistic Risk Assessment). Draft, Revision 5. U.S. Environmental Protection Agency, Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, 1/2 Mile Removal Action, and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the U.S. Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition, December 4-7, Baltimore, MD.


Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA4203. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.

Johnson, T. and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA, Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds, Proceedings, Dioxin '96: 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen 30:61-65.


Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC. October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meeting Proceedings. Abstract.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition. Abstract.

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan - Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19, November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD. December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4):823-843.


Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 11.

ATTACHMENT A

A COMPARISON OF ALTERNATIVE METHODS

FOR CALCULATING EXPOSURE POINT CONCENTRATIONS

Prepared on Behalf of the General Electric Company

by Ogden Environmental and Energy Services Co., Inc.


Introduction

The U.S. Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston, 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student-t statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets where no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and their representation of the actual levels of exposures that are likely to occur under various scenarios.

There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student-t statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that, when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of actual levels of exposure that will occur within a given exposure area.

GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will result in more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper 1/2 Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic: namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.

Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL

In 1992, EPA issued a guidance document (EPA, 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.


The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA, 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA, 1992):

UCL = exp(x̄ + 0.5·s² + (s·H)/√(n-1))

Where:

UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic accounting for interaction of the distribution, developed by Land (1975)
n = number of analytical samples
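This equation can be transcribed directly as a sketch (the H value itself must still be taken from the Land (1975) tables for the given sample size and standard deviation; the function name is ours):

```python
import math

def h_ucl(data, H):
    """Land's H-UCL for lognormal data, per the equation above.
    H is the tabulated Land (1975) statistic for the given n and s."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    xbar = sum(logs) / n                                 # mean of log data
    s2 = sum((y - xbar) ** 2 for y in logs) / (n - 1)    # variance of log data
    return math.exp(xbar + 0.5 * s2 + math.sqrt(s2) * H / math.sqrt(n - 1))
```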

Frequently, the H-statistic calculates extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.


Use of the H-statistic also does not calculate a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.

Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA, 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.

Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when an exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.
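This divergence is easy to demonstrate: for skewed data, the exponential estimator exp(x̄ + 0.5s²) implied by a lognormal fit can differ sharply from the ordinary sample mean. A hypothetical illustration (the dataset and function name are ours):

```python
import math

def lognormal_mle_mean(data):
    """Exponential estimator of the arithmetic mean implied by a
    lognormal fit: exp(xbar + 0.5 * s^2) of the log-transformed data."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    xbar = sum(logs) / n
    s2 = sum((y - xbar) ** 2 for y in logs) / (n - 1)
    return math.exp(xbar + 0.5 * s2)

# heavily skewed hypothetical dataset: the exponential estimator far
# exceeds the ordinary sample mean
data = [0.2, 1.0, 5.0, 25.0, 125.0]
print(lognormal_mle_mean(data), sum(data) / len(data))
```

When the two estimates disagree this strongly, an H-UCL built on the lognormal assumption cannot be a reliable bound on the true mean.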

Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper 1/2 Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA, 1999).

In view of these considerations, EPA contractors and consultants have recommended that the H-statistic should not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA, 1997a) recommended that in environmental applications "the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that the use of the jackknife method, the bootstrap method, or the Chebychev inequality method "provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. . . . [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).
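The contrast between the two families of estimators can be made concrete in a short sketch. The H-UCL function below implements Land's standard lognormal UCL formula, with the H value to be taken from the Land (1975) tables cited in the references; the Chebyshev UCL is the distribution-free inequality method recommended by Singh et al. The numeric inputs are hypothetical illustrations, not site data.

```python
import math

def h_ucl(log_mean, log_sd, n, H):
    """Land's H-UCL for the mean of a lognormal population.
    log_mean and log_sd are the mean and standard deviation of the
    log-transformed data; H must be looked up in Land (1975) for the
    given n, log_sd, and confidence level."""
    return math.exp(log_mean + 0.5 * log_sd ** 2
                    + log_sd * H / math.sqrt(n - 1))

def chebyshev_ucl(data, alpha=0.05):
    """Distribution-free (1 - alpha) UCL on the mean from the
    Chebyshev inequality: mean + sqrt(1/alpha - 1) * s / sqrt(n)."""
    n = len(data)
    m = sum(data) / n
    s = math.sqrt(sum((x - m) ** 2 for x in data) / (n - 1))
    return m + math.sqrt(1.0 / alpha - 1.0) * s / math.sqrt(n)
```

Because the log-scale standard deviation enters the H-UCL through an exponential, a modest increase in that standard deviation multiplies the resulting UCL, which is precisely the behavior criticized above.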


Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the lower Housatonic River. Consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.

This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, unbiased estimates of the change in concentration and variance with distance are used to construct a spatial grid of estimated concentrations (ordinary kriging; Isaaks and Srivastava, 1989).

Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.
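Outside a GIS, the polygon areas can also be approximated numerically. The sketch below (with hypothetical coordinates, not site data) rasterizes a rectangular exposure area into small cells and assigns each cell to its nearest sample; this approximates the Thiessen polygon areas and automatically truncates peripheral polygons at the property boundary, since cells are only generated inside it.

```python
# Approximate Thiessen (Voronoi) polygon areas by nearest-neighbor
# rasterization over a rectangular exposure area.
def thiessen_areas(samples, width, height, step=1.0):
    """Return {sample index: approximate polygon area} for samples
    given as (x, y) coordinates inside a width x height area."""
    areas = {i: 0.0 for i in range(len(samples))}
    cell = step * step
    y = step / 2.0
    while y < height:
        x = step / 2.0
        while x < width:
            # Assign this grid cell to the closest sample point.
            nearest = min(range(len(samples)),
                          key=lambda i: (samples[i][0] - x) ** 2 +
                                        (samples[i][1] - y) ** 2)
            areas[nearest] += cell
            x += step
        y += step
    return areas

# Four symmetric samples in a hypothetical 100 ft x 100 ft area:
# each polygon captures one quarter of the total area.
pts = [(25, 25), (75, 25), (25, 75), (75, 75)]
areas = thiessen_areas(pts, 100, 100)
```

A finer `step` improves the approximation at the cost of run time; for irregular property boundaries, the inner loop would simply skip cells outside the boundary.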

The area inside each polygon is then used to weight the samples in statistical calculations, which is, in essence, saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

    X̄sc = Σ pi Ci   (summed over i = 1 through n)

Where:

    X̄sc = area-weighted mean concentration (mg/kg);

    Ci = the concentration representing the condition within polygon i, where there are i = 1 through n polygons; and

    pi = the proportion of the total area that is incorporated in polygon i (unitless).
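In code, the formula amounts to a weighted sum; a minimal sketch:

```python
def area_weighted_mean(concs, areas):
    """Area-weighted mean concentration: the sum of p_i * C_i, where
    p_i is the fraction of the total exposure area falling in sample
    i's Thiessen polygon."""
    total = float(sum(areas))
    return sum(c * (a / total) for c, a in zip(concs, areas))

# Two hypothetical samples: 1.0 ppm representing 75% of the area and
# 3.0 ppm representing 25%; the weighted mean is 1.5 ppm, whereas the
# unweighted arithmetic mean would be 2.0 ppm.
m = area_weighted_mean([1.0, 3.0], [75.0, 25.0])
```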

This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then, the average concentrations for each grid were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the "[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods, such as kriging or triangulation, that take into account the spatial distribution of contamination to characterize exposure." (p. 15)

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and it is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and as demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process whereby a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population regardless of its statistical distribution. Bootstrapping relies on the same theory, except that, instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
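The resampling loop described above takes only a few lines of code; this sketch (standard library only) returns the 95th percentile of the bootstrap means as the 95% UCL.

```python
import random

def bootstrap_ucl(data, n_boot=5000, alpha=0.05, seed=1):
    """Bootstrap (1 - alpha) UCL of the mean: resample the dataset
    with replacement, archive each resample's mean, and take the
    (1 - alpha) percentile of the distribution of means."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(sum(rng.choices(data, k=n)) / n for _ in range(n_boot))
    return means[int((1.0 - alpha) * n_boot) - 1]
```

A fixed seed is used here only to make the sketch reproducible; in practice the number of iterations is increased until the distribution of means stabilizes.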

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.


Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded, normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
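The three steps above can be sketched directly (standard library only; the expansion factor, subsample size, and iteration count follow the values given in the text, and the two-sample dataset in the usage example is hypothetical):

```python
import random

def spatial_bootstrap_ucl(concs, areas, factor=5, subsample=250,
                          n_boot=5000, alpha=0.05, seed=1):
    """Spatial bootstrap: (1) weight each sample by its polygon area;
    (2) expand the dataset so each result appears in proportion to its
    rounded area ratio times a chosen factor; (3) bootstrap means from
    the expanded dataset and take the 95th percentile as the UCL."""
    smallest = min(areas)
    expanded = []
    for c, a in zip(concs, areas):
        # Replicate each result by (rounded area ratio) x factor.
        expanded.extend([c] * (int(round(a / smallest)) * factor))
    rng = random.Random(seed)
    means = sorted(sum(rng.choices(expanded, k=subsample)) / subsample
                   for _ in range(n_boot))
    return means[int((1.0 - alpha) * n_boot) - 1]

# Hypothetical illustration: a 1.0 ppm sample representing 90% of the
# area and a 10.0 ppm sample representing 10%; the distribution of
# bootstrap means centers on the area-weighted mean of 1.9 ppm.
ucl = spatial_bootstrap_ucl([1.0, 10.0], [90.0, 10.0])
```

Replicating entries is equivalent to drawing each sample with probability proportional to its polygon area, which is why the 50th percentile of the bootstrap means converges on the area-weighted mean.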


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: (1) the EPA approach (arithmetic mean and H-statistic); (2) an area-weighted average; (3) a bootstrap; and (4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30; Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.
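As a check on these figures, the arithmetic and area-weighted means can be recomputed from the Table 1 data (decimal points restored from the stated range of 0.022 to 42.6 ppm); the exact area-weighted value is derived from those data rather than stated in the original.

```python
# Polygon areas (ft2) and total PCB concentrations (ppm) for the 30
# hypothetical samples in Table 1, in sample-number order.
areas = [138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121,
         3697, 13297, 33087, 65541, 57494, 9014, 9994, 2123, 2341,
         7446, 2613, 44231, 2362, 8745, 5008, 18762, 24296, 51356,
         12996, 18019, 119814, 132012]
concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77,
         7.58, 7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3,
         14.3, 15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6,
         42.6]

arith = sum(concs) / len(concs)          # arithmetic mean, ~13.3 ppm
total = float(sum(areas))
weighted = sum(c * a / total for c, a in zip(concs, areas))
# The very large polygons around samples 29 and 30 pull the
# area-weighted mean above the arithmetic mean.
```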

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration by more than a factor of two, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in high-concentration areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above, and as demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993) and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV. Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H. and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River - Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.


Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
1                138,219               0.022
2                105,217               0.028
3                110,214               1.07
4                31,021                1.63
5                11,399                1.67
6                7,004                 2.01
7                14,268                2.19
8                12,121                3.25
9                3,697                 3.77
10               13,297                7.58
11               33,087                7.61
12               65,541                7.62
13               57,494                7.71
14               9,014                 7.99
15               9,994                 8.72
16               2,123                 9.69
17               2,341                 9.72
18               7,446                 12.7
19               2,613                 14.3
20               44,231                14.3
21               2,362                 15.4
22               8,745                 16.3
23               5,008                 18.7
24               18,762                19.1
25               24,296                20.2
26               51,356                30.8
27               12,996                32.9
28               18,019                38.9
29               119,814               40.6
30               132,012               42.6


Table 2. Comparison of Mean and 95% UCL EPCs Calculated Using Four Approaches

Approach                                       Mean (ppm)    95% UCL (ppm)
EPA approach (arithmetic mean/H-statistic)     13.3          106 (exceeds maximum of 42.6, which would be used as the EPC)
Area-weighted average                          15.7          --
Bootstrap                                      13.3          17.06
Spatial bootstrap                              15.7          16.38

Page 19: Comments of the Genera Electril Companc yComments of the Genera Electril Companc y on EPA's Fina Humal Healtn h Ris Assessmenk Wort k Plan ... Consent Decree). Phas 1 involvee s the

- 19 shy

EPA 1992c A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its

Application to the General US Population US Environmental Protection Agency Office of

Air Quality Planning and Standards Research Triangle Park NC EPA-4503-92-011

August

EPA 1993 Memo from WH Farland and HL Longest to Regional Directors Re Use of

IRIS values in Superfund risk assessments US Environmental Protection Agency

Washington DC OSWER Directive 92857-16 December 21

EPA 1995a Policy for Risk Characterization at the US Environmental Protection Agency

US Environmental Protection Agency Office of the Administrator Washington DC

March

EPA 1995b Guidance for Risk Characterization US Environmental Protection Agency

Science Policy Council Washington DC February

EPA 1997a Policy for Use of Probabilistic Analysis in Risk Assessment at the US

Environmental Protection Agency US Environmental Protection Agency Office of Research

and Development Washington DC May 15

EPA 1997b Guiding Principles for Monte Carlo Analysis US Environmental Protection

Agency Risk Assessment Forum Washington DC EPA630R-97001 March 35 p

EPA 1997c The Lognormal Distribution in Environmental Applications Prepared for EPA

by A Singh A Singh and M Engelhard US Environmental Protection Agency Office of

Research and Development EPA600R-97006 December

EPA 1998 Memo from Timothy Fields Acting Administrator to Regional Directors Re

Approach for addressing dioxin in soil at CERCLA and RCRA sites US Environmental

Protection Agency Washington DC OSWER Directive 92004-26 April 13

- 20 shy

EPA and MDEP 1998 Letter from Anna Symington Acting Section Chief Massachusetts

DEP and Bryan Olson Project Manager US EPA to Jane Magee General Electric

Company Re Pittsfield 1-0147 EPA Area 6 Housatonic River Human Health Risk

Assessment January 29

EPA 1999a Memorandum from Ann-Marie Burke Toxicologist to Richard Cavagnero GE

Project Leader Re Protectiveness of Cleanup Levels for Removal Actions Outside the River

- Protection of Human Health Attachment A to Appendix D to Consent Decree lodged in

United States et al v General Electric Company (No 99-30225-MAP US Dist Court for

Dist of Mass) August 4

EPA 1999b Risk Assessment Guidance for Superfund Volume I - Human Health Evaluation

Manual Supplemental Guidance Dermal Risk Assessment Interim Guidance US

Environmental Protection Agency Office of Emergency and Remedial Response Washington

DC March 2 Draft

EPA 1999c Risk Assessment Guidance for Superfund Volume 3- (Pan A Process for

Conducting Probabilistic Risk Assessment) Draft Revision 5 United States Environmental

Protection Agency Solid Waste and Emergency Response December

EPA 1999d Responsiveness Summary for Allendale School Removal Action A Mile

Removal Action and Consolidation Prepared by EPA Region I October

GE 1997 Proposal for Alternative Input Values for the Human Health Risk Assessment

Memorandum from General Electric to the Massachusetts Department of Environmental

Protection and the US Environmental Protection Agency September 11

Goodrum PE JM Hassett DL Johnson and ME Dakins 1994 Applications of

microexposure Monte Carlo modeling to human health risk assessments A case study of

modeling childhood lead exposure Society for Risk Analysis Annual conference and

Exposition December 4-7 Baltimore MD

- 21 shy

Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA4203. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.

Johnson, T. and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA, Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds, Proceedings of Dioxin '96: 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen 30:61-65.


Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC. October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meeting Proceedings. Abstract 1.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition. Abstract.

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan - Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Work Plan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19, November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD. December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4):823-843.


Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 11.

ATTACHMENT A

A COMPARISON OF ALTERNATIVE METHODS

FOR CALCULATING EXPOSURE POINT CONCENTRATIONS

Prepared on Behalf of the General Electric Company

by Ogden Environmental and Energy Services Co Inc


Introduction

The U.S. Environmental Protection Agency's (EPA's) final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston, 2000a), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets where no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and on how well they represent the actual levels of exposure that are likely to occur under various scenarios.

There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that, when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.

GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating EPCs that will produce more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½-Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic: namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.

Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL

In 1992, EPA issued a guidance document (EPA, 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.


The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA, 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA, 1992):

    UCL = exp(x̄ + 0.5s² + sH/√(n−1))

Where:

UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic accounting for interaction of the distribution, developed by Land (1975)
n = the number of analytical samples
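For illustration, the computation can be sketched in a few lines of Python. The function name `h_ucl` is a hypothetical helper for this sketch, and the H value must be supplied externally from Land's (1975) tables for the given sample size and log-space standard deviation:

```python
import math

def h_ucl(data, H):
    """Land's H-UCL (EPA, 1992): exp(xbar + 0.5*s^2 + s*H/sqrt(n-1)),
    computed on the natural logs of the sample concentrations.
    H must be looked up in Land's (1975) tables for the sample size
    and the standard deviation of the log-transformed data."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    xbar = sum(logs) / n                                # mean of logs
    s2 = sum((y - xbar) ** 2 for y in logs) / (n - 1)   # variance of logs
    s = math.sqrt(s2)                                   # std. dev. of logs
    return math.exp(xbar + 0.5 * s2 + s * H / math.sqrt(n - 1))
```

Because both s² and H grow with the skew of the log-transformed data, the exponent, and hence the UCL, rises rapidly for small, skewed datasets.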

Frequently, the H-statistic produces extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.


Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.

Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA, 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes orders of magnitude higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.

Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when the exponential estimator of the mean is not equal to the typical calculation of the sum of all sample results divided by the number of samples.
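To make this check concrete, the following sketch (hypothetical function name, with a dataset invented purely for illustration) compares the exponential (lognormal) estimator of the mean, exp(x̄ + 0.5s²) computed on the log-transformed data, with the simple arithmetic mean; a large disagreement between the two signals that the lognormal model underlying the H-UCL is suspect:

```python
import math

def lognormal_mean_estimate(data):
    """Exponential estimator of the arithmetic mean under a lognormal
    assumption: exp(mean of logs + half the variance of logs)."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    xbar = sum(logs) / n
    s2 = sum((y - xbar) ** 2 for y in logs) / (n - 1)
    return math.exp(xbar + 0.5 * s2)

# Illustrative skewed dataset (ppm); not from the Housatonic record.
data = [0.1, 0.2, 0.5, 1.0, 50.0]
arithmetic = sum(data) / len(data)        # typical sum-over-n calculation
lognormal = lognormal_mean_estimate(data)
```

For this dataset the two estimators disagree by more than 50 percent, the kind of indication the text describes.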

Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½-Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA, 1999).

In view of these considerations, EPA contractors and consultants have recommended that the H-statistic should not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA, 1997a) recommended that, "in environmental applications, the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. . . . [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).


Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased toward more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the Lower Housatonic River; consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected to be contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL, because it is unlikely to be representative of the overall exposure area.

This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, unbiased estimates of how concentration and variance change with distance are used to construct a spatial grid of estimated concentrations (ordinary kriging; Isaaks and Srivastava, 1989).

Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.

The area inside each polygon is then used to weight the samples in statistical calculations, which is in essence saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted mean concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

    x̄aw = Σ pᵢCᵢ   (summed over i = 1 through n polygons)

Where:

x̄aw = the area-weighted mean concentration (mg/kg)
Cᵢ = the concentration representing the condition within polygon i, where there are i = 1 through n polygons, and
pᵢ = the proportion of the total area that is incorporated in polygon i (unitless)
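The formula reduces to a few lines of code. In this sketch (the function name is hypothetical), each sample's concentration is weighted by its Thiessen polygon's share of the total exposure area:

```python
def area_weighted_mean(concs, areas):
    """Area-weighted mean: sum of p_i * C_i, where p_i is each
    Thiessen polygon's fraction of the total exposure area."""
    total = float(sum(areas))
    return sum((a / total) * c for c, a in zip(concs, areas))
```

With equal polygon areas this collapses to the simple arithmetic mean; unequal areas shift the weight toward the samples that represent more of the exposure area.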

This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½-Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then, the average concentrations for each grid were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the

[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure. (p. 15)

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½-Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously and demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
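The procedure just described can be sketched as follows (function name and iteration count are illustrative): resample the dataset with replacement, archive each mean, and read off the 95th percentile of the archived means:

```python
import random

def bootstrap_ucl95(data, n_iter=5000, seed=1):
    """Bootstrap 95% UCL of the arithmetic mean: draw len(data)
    values with replacement, compute the mean, repeat n_iter times,
    and take the 95th percentile of the archived means."""
    rng = random.Random(seed)   # fixed seed for reproducibility
    n = len(data)
    means = sorted(
        sum(rng.choice(data) for _ in range(n)) / n
        for _ in range(n_iter)
    )
    return means[int(0.95 * n_iter)]
```

Nothing in this procedure assumes a lognormal (or any other) distribution; the distribution of the resampled means tends toward normality on its own, as the Central Limit Theorem suggests.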

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter; as such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of the samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.


Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded, normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
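The three steps can be combined into a single sketch (hypothetical names; the expansion factor, subsample size, and iteration count are the illustrative values from the text):

```python
import random

def spatial_bootstrap_ucl95(concs, areas, expand=5, subsample=250,
                            n_iter=5000, seed=1):
    """Spatial bootstrap: (1) weight each sample by its Thiessen
    polygon area; (2) build an expanded dataset whose replicate
    counts are the rounded area ratios times an expansion factor;
    (3) bootstrap means of the expanded dataset and take the 95th
    percentile as the 95% UCL."""
    smallest = min(areas)
    expanded = []
    for c, a in zip(concs, areas):
        reps = round(a / smallest) * expand   # area-weighted replicates
        expanded.extend([c] * reps)
    rng = random.Random(seed)                 # fixed seed for reproducibility
    means = sorted(
        sum(rng.choice(expanded) for _ in range(subsample)) / subsample
        for _ in range(n_iter)
    )
    return means[int(0.95 * n_iter)]
```

The median of the archived means recovers the area-weighted mean of the original dataset, while the spread of the distribution carries the sampling uncertainty.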


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data; thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: (1) EPA's approach (arithmetic mean and H-statistic); (2) an area-weighted average; (3) a bootstrap; and (4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained through straight bootstrapping. This is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas; thus, they are weighted more heavily in the approaches that include area-weighting.

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration (42.6 ppm) by a factor of approximately 2.5, with the result that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above and demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculate the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area when the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993) and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½-Mile Reach of the River and for the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or for areas where EPA determines that the data are insufficient to support that approach and decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant), Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A.K. Singh, and M. Engelhardt. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½-Mile Removal Action, and Consolidation. U.S. Environmental Protection Agency, Region I. October.

Isaaks, E.H. and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River. Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.

Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

  Sample Number    Polygon Area (ft2)    Total PCB (ppm)
        1               138,219               0.022
        2               105,217               0.028
        3               110,214               1.07
        4                31,021               1.63
        5                11,399               1.67
        6                 7,004               2.01
        7                14,268               2.19
        8                12,121               3.25
        9                 3,697               3.77
       10                13,297               7.58
       11                33,087               7.61
       12                65,541               7.62
       13                57,494               7.71
       14                 9,014               7.99
       15                 9,994               8.72
       16                 2,123               9.69
       17                 2,341               9.72
       18                 7,446              12.7
       19                 2,613              14.3
       20                44,231              14.3
       21                 2,362              15.4
       22                 8,745              16.3
       23                 5,008              18.7
       24                18,762              19.1
       25                24,296              20.2
       26                51,356              30.8
       27                12,996              32.9
       28                18,019              38.9
       29               119,814              40.6
       30               132,012              42.6

Table 2. Mean and 95% UCL EPC Estimates for the Hypothetical Floodplain Property, Calculated Using Four Approaches

  Approach                                      Mean (ppm)    95% UCL (ppm)
  EPA approach (arithmetic mean/H-statistic)       13.3           106
  Area-weighted average                            15.7           --
  Bootstrap                                        13.3           17.06
  Spatial bootstrap                                15.7           16.38


EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, U.S. EPA, to Jane Magee, General Electric Company, Re: Pittsfield 1-0147, EPA Area 6 Housatonic River Human Health Risk Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader, Re: Protectiveness of Cleanup Levels for Removal Actions Outside the River - Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, U.S. Dist. Court for Dist. of Mass.). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual, Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2 Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3 (Part A: Process for Conducting Probabilistic Risk Assessment). Draft, Revision 5. U.S. Environmental Protection Agency, Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½-Mile Removal Action, and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the U.S. Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition, December 4-7, Baltimore, MD.


Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA4203. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.

Johnson, T. and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds, Proceedings, Dioxin '96: 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen 30:61-65.


Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC, October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meetings, Proceedings. Abstract 1.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition. Abstract.

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan - Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis 19:577-584.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD, December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4):823-843.


Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 11.

ATTACHMENT A

A COMPARISON OF ALTERNATIVE METHODS

FOR CALCULATING EXPOSURE POINT CONCENTRATIONS

Prepared on Behalf of the General Electric Company

by Ogden Environmental and Energy Services Co., Inc.


Introduction

The U.S. Environmental Protection Agency's (EPA's) final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston, 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student-t statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets for which no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and on how well they represent the actual levels of exposure that are likely to occur under various scenarios.

There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student-t statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.

GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will produce more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½-Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic: namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.

Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL

In 1992, EPA issued a guidance document (EPA, 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.


The guidance document goes on to note that because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA, 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA, 1992):

    UCL = exp(x-bar + 0.5*s^2 + s*H/sqrt(n - 1))

where:

    UCL   = a specified limit (i.e., the 95% UCL) on the estimate of the arithmetic mean
    x-bar = the mean of the log-transformed data
    s^2   = the variance of the log-transformed sample distribution
    s     = the standard deviation of the log-transformed sample distribution
    H     = a statistic accounting for the shape of the distribution, developed by Land (1975)
    n     = the number of analytical samples
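As an illustration of how this equation behaves, the sketch below (not part of the original comments) implements the formula directly. The concentrations are hypothetical, and in practice H must be looked up in Land's (1975) tables for the given n and s; the value used here is a placeholder.

```python
import math

def h_ucl(data, H):
    """Land's UCL on the arithmetic mean of assumed-lognormal data.

    H is a caller-supplied value from Land's (1975) tables for the
    given sample size and log-standard deviation; it is not computed here.
    """
    logs = [math.log(x) for x in data]
    n = len(logs)
    xbar = sum(logs) / n
    s2 = sum((v - xbar) ** 2 for v in logs) / (n - 1)  # sample variance of logs
    s = math.sqrt(s2)
    # UCL = exp(x-bar + 0.5*s^2 + s*H/sqrt(n - 1))
    return math.exp(xbar + 0.5 * s2 + s * H / math.sqrt(n - 1))

# Hypothetical skewed sample (ppm); H = 3.0 is a placeholder value.
sample = [0.4, 1.1, 2.7, 8.1, 33.0]
ucl_95 = h_ucl(sample, H=3.0)
# For this small, highly skewed sample the result exceeds the sample
# maximum (33 ppm), the pathology discussed in the text.
```

Because s enters the exponent twice, a large log-variance inflates the H-UCL exponentially, which is why small skewed datasets produce the extreme values criticized below.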

Frequently, the H-statistic calculates extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.


Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and its associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.

Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA, 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling datasets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes orders of magnitude higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.

Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when the exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.
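This divergence is easy to illustrate numerically. The sketch below uses hypothetical data (not from the Work Plan); exp(x-bar + s^2/2) is the exponential (lognormal) estimator of the mean referred to above.

```python
import math

# Hypothetical skewed sample (ppm)
data = [0.1, 0.2, 0.3, 0.5, 25.0]

logs = [math.log(x) for x in data]
n = len(logs)
xbar = sum(logs) / n
s2 = sum((v - xbar) ** 2 for v in logs) / (n - 1)

# Exponential estimator of the mean used in lognormal methods (~6.3 ppm)
lognormal_mean = math.exp(xbar + s2 / 2)

# The "typical calculation": sum of samples over number of samples (~5.2 ppm)
arithmetic_mean = sum(data) / n
```

When the two estimators disagree this much, the H-UCL is being anchored to a mean estimate that does not match the data's own average.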

Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and it does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½-Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA, 1999).

In view of these considerations, EPA contractors and consultants have recommended that the H-statistic not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA, 1997a) recommended that "in environmental applications, the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less...[U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).


Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased toward more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the Lower Housatonic River; consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected to be contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.

This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, unbiased estimates of how concentration and variance change with distance are developed, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava, 1989).

Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason this technique is so facile is that the construction of Thiessen polygons is a hard-wired operation in most commercially available Geographic Information System (GIS) applications.

The area inside each polygon is then used to weight the samples in statistical calculations, which is in essence saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted mean concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

    x-bar = sum over i of (p_i * C_i),  for i = 1 through n

where:

    x-bar = the area-weighted mean concentration (mg/kg)
    C_i   = the concentration representing the condition within polygon i, where there are i = 1 through n polygons, and
    p_i   = the proportion of the total exposure area that is incorporated in polygon i (unitless)
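Where GIS software is not at hand, the polygon proportions can be approximated by discretizing the exposure area into grid cells and assigning each cell to its nearest sample; as the grid is refined, the cell counts converge on the Thiessen-polygon areas. The sketch below is purely illustrative, with hypothetical coordinates and concentrations.

```python
# Hypothetical 100 x 100 unit exposure area with three samples: (x, y, ppm)
samples = [(10.0, 10.0, 1.0), (90.0, 10.0, 5.0), (50.0, 90.0, 20.0)]

cell = 1.0                   # grid resolution
counts = [0] * len(samples)  # cells falling in each sample's polygon
steps = int(100 / cell)
for iy in range(steps):
    for ix in range(steps):
        x, y = (ix + 0.5) * cell, (iy + 0.5) * cell  # cell centre
        nearest = min(range(len(samples)),
                      key=lambda k: (x - samples[k][0]) ** 2
                                    + (y - samples[k][1]) ** 2)
        counts[nearest] += 1

total = sum(counts)
proportions = [c / total for c in counts]   # the p_i weights
area_weighted_mean = sum(p * s[2] for p, s in zip(proportions, samples))
simple_mean = sum(s[2] for s in samples) / len(samples)
```

The two means differ because the high-concentration sample does not represent a third of the area, which is exactly the correction the weighting is meant to provide.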

This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE, and approved by EPA, for calculating EPCs for the banks of the Upper ½-Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid cell using the sampling data available within it. The average concentrations for the grid cells were then averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the:

    [e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure. (p. 15)

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½-Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and it is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would be necessary only in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In those areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and as demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
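The procedure just described can be sketched in a few lines of Python (a simplified illustration using hypothetical concentration data, not any Work Plan dataset):

```python
import random
import statistics

def bootstrap_ucl95(data, n_iter=5000, seed=7):
    """Percentile bootstrap: resample the dataset with replacement,
    archive the mean of each resample, and take the 95th percentile
    of the resulting distribution of means as the 95% UCL."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_iter):
        resample = [rng.choice(data) for _ in data]  # with replacement
        means.append(statistics.fmean(resample))
    means.sort()
    return means[int(0.95 * n_iter) - 1]

# Hypothetical, positively skewed soil concentrations (ppm)
data = [0.1, 0.4, 1.2, 2.5, 3.3, 4.8, 7.5, 9.1, 14.6, 38.0]
ucl95 = bootstrap_ucl95(data)
```

Because each bootstrapped mean can never exceed the largest observed value, the resulting 95% UCL is bounded by the maximum concentration, unlike the H-UCL.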

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance of the bootstrapped means becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that accounts for the locations and densities of samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.


Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
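The three steps above can be combined into a brief sketch (illustrative only; the concentrations, polygon areas, and expansion factor of five are hypothetical, following the example in the text):

```python
import random
import statistics

def spatial_bootstrap_ucl95(concs, areas, factor=5, n_draws=250,
                            n_iter=5000, seed=7):
    """Spatial bootstrap: (1) each sample's weight is its Thiessen
    polygon area; (2) areas are normalized to the smallest polygon,
    rounded to integers, and multiplied by a factor to build an
    expanded dataset of replicates; (3) means of repeated draws
    (with replacement) are collected, and the 95th percentile of
    the distribution of means is taken as the 95% UCL."""
    smallest = min(areas)
    expanded = []
    for conc, area in zip(concs, areas):
        expanded.extend([conc] * (round(area / smallest) * factor))
    rng = random.Random(seed)
    means = []
    for _ in range(n_iter):
        draw = [rng.choice(expanded) for _ in range(n_draws)]
        means.append(statistics.fmean(draw))
    means.sort()
    return means[int(0.95 * n_iter) - 1]

# Hypothetical samples: concentrations (ppm) and polygon areas (ft2)
concs = [0.5, 2.1, 4.7, 9.8, 21.0]
areas = [14000, 3500, 2000, 9000, 6500]
ucl95 = spatial_bootstrap_ucl95(concs, areas)
```

The median of the bootstrapped means converges on the area-weighted average of the expanded dataset, consistent with the description above.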


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: (1) the EPA approach (arithmetic mean and H-statistic); (2) an area-weighted average; (3) a bootstrap; and (4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.
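The two mean estimates can be reproduced directly from the hypothetical Table 1 data (a sketch; the exact value of the area-weighted mean is not reported in the text, so the computed value is illustrative):

```python
import statistics

# Table 1 data: Thiessen polygon areas (ft2) and total PCB (ppm)
areas = [138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121,
         3697, 13297, 33087, 65541, 57494, 9014, 9994, 2123, 2341,
         7446, 2613, 44231, 2362, 8745, 5008, 18762, 24296, 51356,
         12996, 18019, 119814, 132012]
concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77,
         7.58, 7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7,
         14.3, 14.3, 15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9,
         38.9, 40.6, 42.6]

# Simple arithmetic mean of the 30 samples (13.3 ppm, as in the text)
arithmetic_mean = statistics.fmean(concs)

# Area-weighted mean: each concentration weighted by its polygon area
area_weighted_mean = (sum(a * c for a, c in zip(areas, concs))
                      / sum(areas))
```

As the text notes, the area-weighted mean comes out somewhat higher because the two highest-concentration samples sit in two of the largest polygons.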

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL that exceeds the bootstrapped estimates by a factor of more than six and exceeds even the highest measured concentration, with the result that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above, and as demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993), and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and for the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or for areas where EPA determines that the data are insufficient to support that approach and decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.


EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. U.S. Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River: Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.


Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft²)    Total PCB (ppm)
 1                   138,219               0.022
 2                   105,217               0.028
 3                   110,214               1.07
 4                    31,021               1.63
 5                    11,399               1.67
 6                     7,004               2.01
 7                    14,268               2.19
 8                    12,121               3.25
 9                     3,697               3.77
10                    13,297               7.58
11                    33,087               7.61
12                    65,541               7.62
13                    57,494               7.71
14                     9,014               7.99
15                     9,994               8.72
16                     2,123               9.69
17                     2,341               9.72
18                     7,446              12.7
19                     2,613              14.3
20                    44,231              14.3
21                     2,362              15.4
22                     8,745              16.3
23                     5,008              18.7
24                    18,762              19.1
25                    24,296              20.2
26                    51,356              30.8
27                    12,996              32.9
28                    18,019              38.9
29                   119,814              40.6
30                   132,012              42.6

Table 2. Mean and 95% UCL Exposure Point Concentrations for the Hypothetical Property, Calculated Using Four Approaches

[The original table is not legible in this copy. As discussed in the text: the arithmetic and bootstrapped means are both 13.3 ppm, and the area-weighted and spatial bootstrapped means are identical to one another and somewhat higher; the 95% UCLs are 106 ppm for the H-statistic, 17.06 ppm for the bootstrap, and 16.38 ppm for the spatial bootstrap.]


Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA42.03. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.

Johnson, T., and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA, Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds: Proceedings, Dioxin '96 - 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen 30:61-65.


Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC. October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meeting Proceedings. Abstract 1.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition. Abstract.

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan - Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis 19:577-584. November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD. December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4):823-843.


Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 11.

ATTACHMENT A

A COMPARISON OF ALTERNATIVE METHODS

FOR CALCULATING EXPOSURE POINT CONCENTRATIONS

Prepared on Behalf of the General Electric Company

by Ogden Environmental and Energy Services Co Inc


Introduction

The U.S. Environmental Protection Agency's (EPA's) final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston, 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets for which no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and on how well they represent the actual levels of exposure that are likely to occur under various scenarios.

There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that, when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.

GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will produce more representative results. In GE's view, the most appropriate approach is the use of the area-weighting, or spatial averaging, technique that has been approved and specified by EPA for the banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic: namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.

Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL

In 1992, EPA issued a guidance document (EPA, 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.


The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA, 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA, 1992):

UCL = exp( x̄ + 0.5s² + sH/√(n−1) )

where:

UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic accounting for the interaction of the distribution, developed by Land (1975)
n = the number of analytical samples
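As a sketch of how this equation is applied: the H value must be drawn from Land's tables for the applicable s and n, and the value H = 2.0 below is a placeholder, not a tabulated value.

```python
import math

def h_ucl(x_bar, s, n, H):
    """95% H-UCL per Land (1975): exp(x_bar + 0.5*s**2 + s*H/sqrt(n - 1)),
    where x_bar and s are the mean and standard deviation of the
    log-transformed concentrations and H comes from Land's tables."""
    return math.exp(x_bar + 0.5 * s**2 + s * H / math.sqrt(n - 1))

# Hypothetical log-scale statistics for a 30-sample dataset
ucl = h_ucl(x_bar=1.0, s=1.0, n=30, H=2.0)
```

Because s appears in the exponent both directly and squared, modest increases in the log-scale standard deviation inflate the H-UCL multiplicatively, which is the behavior criticized in the text.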

Frequently, the H-statistic produces extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.


Use of the H-statistic also does not produce a reliable and representative 95% UCL when the data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.

Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA, 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order[s] of magnitude higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.
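The blow-up described above follows directly from the form of Land's H-UCL, exp(ȳ + 0.5·s² + s·H/√(n−1)), where ȳ and s are the mean and standard deviation of the log-transformed data. The sketch below is an illustration of this behavior, not part of the original comments; the H values passed in are illustrative placeholders rather than lookups from Land's (1975) tables.

```python
import math

def h_ucl(log_mean, log_sd, n, h):
    """Land's H-UCL for the mean of lognormally distributed data.

    log_mean and log_sd are the mean and standard deviation of the
    log-transformed sample; h is Land's H-statistic for the given n, log_sd,
    and confidence level (tabulated in Land, 1975 -- the values used below
    are illustrative placeholders, not table lookups).
    """
    return math.exp(log_mean + 0.5 * log_sd ** 2 + log_sd * h / math.sqrt(n - 1))

# log_sd enters the exponent twice (once squared), so a wider spread in the
# log data inflates the UCL multiplicatively, not additively:
moderate = h_ucl(1.0, 1.5, 15, 3.0)  # roughly 28
wide = h_ucl(1.0, 2.5, 15, 4.5)      # over 1,000
```

With the sample size held fixed at 15, widening the spread of the log data pushes the UCL from roughly 28 to over 1,000, which is the "orders of magnitude" behavior Singh et al. describe.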

Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when an exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.

Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC (EPA, 1992). This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, "are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area" (EPA, 1999).

In view of these considerations, EPA contractors and consultants have recommended that the H-statistic should not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA, 1997a) recommended that "in environmental applications, the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. . . . [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).


Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the Lower Housatonic River; consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.

This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, unbiased estimates of concentration and variance change with distance, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava, 1989).

Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest of the River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.
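Where a GIS is not at hand, the polygon areas can also be approximated numerically. The sketch below is an illustration (not part of GE's submission): it rasterizes a rectangular exposure area into small cells and assigns each cell to its nearest sampling location, which converges on the Thiessen-polygon areas as the cell size shrinks.

```python
def thiessen_areas(points, width, height, step=1.0):
    """Approximate Thiessen-polygon areas for sampling locations inside a
    width-by-height exposure area: rasterize the area into step-sized cells
    and assign each cell to its nearest sampling point. As step shrinks,
    the per-point cell totals converge on the true polygon areas."""
    counts = [0] * len(points)
    y = step / 2.0
    while y < height:
        x = step / 2.0
        while x < width:
            nearest = min(
                range(len(points)),
                key=lambda i: (points[i][0] - x) ** 2 + (points[i][1] - y) ** 2,
            )
            counts[nearest] += 1
            x += step
        y += step
    return [c * step * step for c in counts]

# Two samples placed symmetrically split a 10 x 10 area evenly:
areas = thiessen_areas([(0.0, 5.0), (10.0, 5.0)], 10.0, 10.0)
```

This grid-based shortcut also handles truncation at irregular exposure-area boundaries naturally, by simply skipping cells that fall outside the area.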

The area inside each polygon is then used to weight the samples in statistical calculations, which is, in essence, saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

X̄_sc = Σ (i = 1 to n) p_i × C_i

where:

X̄_sc = area-weighted mean concentration (mg/kg);

C_i = the concentration representing the condition within polygon i, where there are i = 1 through n polygons; and

p_i = the proportion of the total area that is incorporated in polygon i (unitless).
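In code, the formula reduces to a weighted sum. The sketch below is illustrative, using hypothetical numbers rather than site data:

```python
def area_weighted_mean(concs, areas):
    """Area-weighted mean concentration: each concentration C_i is weighted
    by p_i, the fraction of the total exposure area covered by its polygon."""
    total = float(sum(areas))
    return sum(c * (a / total) for c, a in zip(concs, areas))

# Hypothetical two-sample example: a clean sample representing 75% of the
# area dominates the weighted mean (0.75 * 1.0 + 0.25 * 9.0 = 3.0).
mean = area_weighted_mean([1.0, 9.0], [7500.0, 2500.0])
```

Note that when all polygons have equal areas, the calculation collapses to the ordinary arithmetic mean, which is why area-weighting is a generalization rather than a competing statistic.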

This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then the average concentrations for each grid were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the:

"[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. . . . Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods, such as kriging or triangulation, that take into account the spatial distribution of contamination to characterize exposure." (p. 15)

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously and demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process where a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that, instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
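The procedure just described can be sketched in a few lines. The following is an illustration of the percentile-bootstrap idea, not the exact implementation used in these comments:

```python
import random

def bootstrap_ucl(data, n_iter=5000, seed=1):
    """95% UCL of the mean via the percentile bootstrap: resample the
    dataset with replacement, record each resample's mean, and take the
    95th percentile of the resulting distribution of means."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(
        sum(rng.choice(data) for _ in range(n)) / n for _ in range(n_iter)
    )
    return means[int(0.95 * n_iter) - 1]

# For 30 evenly spread values (arithmetic mean 15.5), the bootstrapped 95%
# UCL sits modestly above the mean rather than orders of magnitude above it.
ucl = bootstrap_ucl([float(v) for v in range(1, 31)])
```

Because the statistic is a percentile of observed resample means, it can never exceed the maximum value in the dataset, avoiding the pathology that forces the H-UCL to be replaced by the maximum concentration.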

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that accounts for the locations and densities of samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.


Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
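The three steps above can be combined into the following sketch. It is an illustration under a simplifying assumption: the replicate counts are taken directly as the rounded area ratios (the additional expansion factor described above only scales the dataset and does not change the resampling proportions).

```python
import random

def spatial_bootstrap_ucl(concs, areas, n_iter=5000, subsample=250, seed=1):
    """Spatial bootstrap: expand the dataset so that each sample appears in
    proportion to its Thiessen-polygon area (normalized to the smallest
    polygon), then bootstrap subsample means from the expanded,
    area-weighted dataset and take their 95th percentile."""
    smallest = min(areas)
    expanded = []
    for c, a in zip(concs, areas):
        expanded.extend([c] * round(a / smallest))  # area-weighted replicates
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choice(expanded) for _ in range(subsample)) / subsample
        for _ in range(n_iter)
    )
    return means[int(0.95 * n_iter) - 1]

# Hypothetical example: a 9:1 area ratio weights the cleaner sample nine
# times as heavily, so the resampled means center on the area-weighted
# mean of 1.9 rather than the unweighted mean of 5.5.
ucl = spatial_bootstrap_ucl([1.0, 10.0], [9000.0, 1000.0])
```

The rounding step means very small polygons are represented coarsely; in practice the normalization keeps every sample in the expanded dataset at least once.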


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: (1) the EPA approach (arithmetic mean and H-statistic); (2) an area-weighted average; (3) a bootstrap; and (4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30; Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL (106 ppm) that far exceeds the highest measured concentration (42.6 ppm), with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above and demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993), and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and for the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A.K. Singh, and M. Engelhardt. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. U.S. Environmental Protection Agency, Region I. October.

Isaaks, E.H. and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River. Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.


Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number   Polygon Area (ft²)   Total PCB (ppm)
      1              138,219              0.022
      2              105,217              0.028
      3              110,214              1.07
      4               31,021              1.63
      5               11,399              1.67
      6                7,004              2.01
      7               14,268              2.19
      8               12,121              3.25
      9                3,697              3.77
     10               13,297              7.58
     11               33,087              7.61
     12               65,541              7.62
     13               57,494              7.71
     14                9,014              7.99
     15                9,994              8.72
     16                2,123              9.69
     17                2,341              9.72
     18                7,446             12.7
     19                2,613             14.3
     20               44,231             14.3
     21                2,362             15.4
     22                8,745             16.3
     23                5,008             18.7
     24               18,762             19.1
     25               24,296             20.2
     26               51,356             30.8
     27               12,996             32.9
     28               18,019             38.9
     29              119,814             40.6
     30              132,012             42.6


Table 2. Comparison of Mean and 95% UCL Exposure Point Concentrations for the Hypothetical Property, Calculated Using the Four Approaches (EPA arithmetic mean/H-statistic, area-weighted average, bootstrap, and spatial bootstrap)


Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC. October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meetings Proceedings. Abstract 1.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition. Abstract.

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan - Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19. November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD. December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4):823-843.

Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 11.

ATTACHMENT A

A COMPARISON OF ALTERNATIVE METHODS

FOR CALCULATING EXPOSURE POINT CONCENTRATIONS

Prepared on Behalf of the General Electric Company

by Ogden Environmental and Energy Services Co Inc


Introduction

U.S. Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston, 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student-t statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets where no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and their representation of the actual levels of exposures that are likely to occur under various scenarios.

There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that, when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and

c DOCSJRBGE 2oooEpcpaper2 an 1 Ogden Environmental and Energy Services

the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.

GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating EPCs that will produce more representative results. In GE's view, the most appropriate approach is the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½-Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic, namely the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.

Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL

In 1992, EPA issued a guidance document (EPA, 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area; thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.


The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA, 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA, 1992):

UCL = exp(x̄ + 0.5s² + sH/√(n−1))

Where:

UCL = a specified limit (i.e., the 95% upper confidence limit) on the estimate of the arithmetic mean;
x̄ = the mean of the log-transformed data;
s² = the variance of the log-transformed sample distribution;
s = the standard deviation of the log-transformed sample distribution;
H = a statistic, developed by Land (1975), that accounts for the skew of the distribution; and
n = the number of analytical samples.
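As a rough illustration (not from the Work Plan), the calculation can be sketched in Python. Note that the value of H must be looked up in Land's (1975) tables for the observed log-standard deviation and sample size, so it is treated here as an input supplied by the caller:

```python
import math
import statistics

def h_ucl(concentrations, H):
    """95% UCL on the arithmetic mean via Land's H-statistic.

    H must be obtained from Land's (1975) tables for the sample's
    log-standard deviation and size; it is supplied by the caller.
    """
    logs = [math.log(c) for c in concentrations]
    n = len(logs)
    x_bar = statistics.mean(logs)   # mean of log-transformed data
    s = statistics.stdev(logs)      # std. dev. of log-transformed data
    return math.exp(x_bar + 0.5 * s**2 + s * H / math.sqrt(n - 1))
```

Because s appears both squared and as a multiplier of H, a large log-scale standard deviation inflates the result exponentially, which is the source of the behavior discussed below.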

Frequently, the H-statistic calculates extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.


Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.

Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA, 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling datasets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes orders of magnitude higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.

Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when the exponential (lognormal-model) estimator of the mean is not equal to the typical calculation of the mean as the sum of all sample values divided by the number of samples.
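For the lognormal model, the usual exponential estimator of the mean is exp(x̄ + s²/2); a quick check of whether it agrees with the simple arithmetic mean can flag this problem. A minimal sketch, with an illustrative tolerance of 20%:

```python
import math
import statistics

def lognormal_mean_estimate(data):
    """Exponential (lognormal-model) estimator of the arithmetic mean:
    exp(mean of logs + variance of logs / 2)."""
    logs = [math.log(x) for x in data]
    return math.exp(statistics.mean(logs) + statistics.variance(logs) / 2)

def estimators_disagree(data, tolerance=0.2):
    """Flag datasets where the lognormal-model mean and the ordinary
    arithmetic mean differ by more than `tolerance` (as a fraction) --
    a sign the lognormal assumption may misrepresent the data."""
    arith = statistics.mean(data)
    return abs(lognormal_mean_estimate(data) - arith) / arith > tolerance
```

For example, for the skewed set [1, 10, 100] the exponential estimator is roughly 142 while the arithmetic mean is 37, so the check flags a disagreement.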

Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and it does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½-Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA, 1999).

In view of these considerations, EPA contractors and consultants have recommended that the H-statistic should not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA, 1997a) recommended that "in environmental applications, the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. ... [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).


Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the Lower Housatonic River; consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL, because it is unlikely to be representative of the overall exposure area.

This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, unbiased estimates of concentration and variance are developed as functions of the distance between locations, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava, 1989).

Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.
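The construction is easiest to see in one dimension, where each Thiessen "polygon" collapses to the interval between the midpoints of adjacent stations, truncated at the exposure-area boundary. A minimal sketch (station positions and boundaries are illustrative, not site data):

```python
def thiessen_weights_1d(positions, lo, hi):
    """Proportion of a 1-D exposure area [lo, hi] assigned to each
    sampling station: each station receives the span between the
    midpoints to its neighbors, truncated at the area boundary."""
    pts = sorted(positions)
    total = hi - lo
    weights = []
    for i, p in enumerate(pts):
        left = lo if i == 0 else (pts[i - 1] + p) / 2
        right = hi if i == len(pts) - 1 else (p + pts[i + 1]) / 2
        weights.append((right - left) / total)
    return weights

# Stations at 0, 10, and 40 feet along a 60-foot transect receive
# weights proportional to 5, 20, and 35 feet of that transect.
```

The two-dimensional construction follows the same midpoint logic, with the perpendicular bisectors between neighboring stations forming the polygon edges.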

The area inside each polygon is then used to weight the samples in statistical calculations, which is, in essence, saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area contained in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

X̄ = Σ (pᵢ × Cᵢ), summed over polygons i = 1 through n

Where:

X̄ = the area-weighted mean concentration (mg/kg);
Cᵢ = the concentration representing the condition within polygon i, where there are i = 1 through n polygons; and
pᵢ = the proportion of the total area that is incorporated in polygon i (unitless).
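Given the polygon proportions, the computation itself is a simple weighted sum; a sketch with hypothetical values:

```python
def area_weighted_mean(concentrations, proportions):
    """Area-weighted mean concentration: each sample counts in
    proportion to the fraction of the exposure area its polygon covers."""
    assert abs(sum(proportions) - 1.0) < 1e-9  # proportions must cover the whole area
    return sum(p * c for c, p in zip(concentrations, proportions))

# Hypothetical example: a 1 ppm sample whose polygon covers 25% of the
# area and a 3 ppm sample covering 75%:
# area_weighted_mean([1.0, 3.0], [0.25, 0.75]) -> 2.5 ppm
```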

This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½-Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid cell using the sampling data available within it. The average concentrations for the grid cells were then averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the

"[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure." (p. 15)

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½-Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and it is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support such spatial averaging, there are alternative methods of calculating the 95% UCL that would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and as demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative, superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative, preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping samples repeatedly from a single dataset.
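The procedure described above can be sketched in a few lines of Python (the iteration count and seed here are illustrative choices, not values from the Work Plan):

```python
import random
import statistics

def bootstrap_95_ucl(data, n_iterations=2000, seed=1):
    """95% UCL of the mean via bootstrapping: repeatedly resample the
    dataset with replacement, record each resample's mean, and take the
    95th percentile of the resulting distribution of means."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_iterations):
        resample = rng.choices(data, k=len(data))  # sample with replacement
        means.append(statistics.mean(resample))
    means.sort()
    return means[int(0.95 * n_iterations) - 1]
```

Because each bootstrap mean is an average of observed values, the resulting UCL can never exceed the maximum observed concentration, in contrast to the H-UCL.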

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and it is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter; as such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that accounts for the locations and densities of samples collected on each property. This modified approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrap method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.


Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset, reflecting the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded, normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
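The three steps can be sketched as follows. Rather than physically expanding the dataset, this sketch draws each resample with selection probabilities proportional to the polygon areas, which is statistically equivalent to the replicate-expansion described above (all concentrations and areas are hypothetical):

```python
import random
import statistics

def spatial_bootstrap_95_ucl(concs, areas, n_iterations=5000, k=250, seed=1):
    """95% UCL of the area-weighted mean. Each bootstrap draw selects a
    sample with probability proportional to its Thiessen polygon area,
    mirroring the area-weighted replicate expansion described above."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_iterations):
        resample = rng.choices(concs, weights=areas, k=k)
        means.append(statistics.mean(resample))
    means.sort()
    return means[int(0.95 * n_iterations) - 1]
```

With hypothetical samples of 1, 5, and 20 ppm occupying 50%, 30%, and 20% of the area, the distribution of weighted bootstrap means centers on the area-weighted mean of 6.0 ppm, and the 95th percentile sits modestly above it rather than orders of magnitude higher.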


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic, to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data; thus, it can be applied successfully to all sampling datasets, including those that fit neither a normal nor a lognormal distribution.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) EPA's approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping; this is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but it is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas; thus, they are weighted more heavily in approaches that include area-weighting.

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL (106 ppm) that exceeds the highest measured concentration (42.6 ppm) by a factor of roughly two and a half, with the result that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing, in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above, and as demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculate the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area when the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993), and it would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½-Mile Reach of the River and for the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant), Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A.K. Singh, and M. Engelhardt. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision, Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. U.S. Environmental Protection Agency, Region I. October.

Isaaks, E.H. and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River. Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.


Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number   Polygon Area (ft²)   Total PCB (ppm)
 1              138219               0.022
 2              105217               0.028
 3              110214               1.07
 4               31021               1.63
 5               11399               1.67
 6                7004               2.01
 7               14268               2.19
 8               12121               3.25
 9                3697               3.77
10               13297               7.58
11               33087               7.61
12               65541               7.62
13               57494               7.71
14                9014               7.99
15                9994               8.72
16                2123               9.69
17                2341               9.72
18                7446               12.7
19                2613               14.3
20               44231               14.3
21                2362               15.4
22                8745               16.3
23                5008               18.7
24               18762               19.1
25               24296               20.2
26               51356               30.8
27               12996               32.9
28               18019               38.9
29              119814               40.6
30              132012               42.6

Table 2. Comparison of Mean and 95% UCL EPC Estimates for the Hypothetical Property

[The scanned table is not recoverable; its results are discussed in the Sample Calculation section: the four approaches yield similar mean estimates (arithmetic mean 13.3 ppm), while the 95% UCLs are 17.06 ppm (bootstrap), 16.38 ppm (spatial bootstrap), and 106 ppm (H-statistic, which exceeds the 42.6 ppm maximum detected concentration).]


Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for U.S. Army Corps of Engineers by Roy F. Weston, Inc. February 11.

ATTACHMENT A

A COMPARISON OF ALTERNATIVE METHODS

FOR CALCULATING EXPOSURE POINT CONCENTRATIONS

Prepared on Behalf of the General Electric Company

by Ogden Environmental and Energy Services Co., Inc.


Introduction

The U.S. Environmental Protection Agency's (EPA's) final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston, 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets for which no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and on how well they represent the actual levels of exposure that are likely to occur under various scenarios.

There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that, when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.

GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will produce more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic, namely the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.

Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL

In 1992, EPA issued a guidance document (EPA, 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area; thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of the observed sample concentrations.


The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA, 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA, 1992):

UCL = exp(x̄ + 0.5s² + sH/√(n−1))

Where:

UCL = a specified limit (i.e., the 95% upper confidence limit) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic accounting for the interaction of the distribution, developed by Land (1975)
n = the number of analytical samples
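For concreteness, the calculation can be sketched in a few lines of Python. Note that the H value is not computed here; it must be taken from Land's (1975) tables for the given sample size and log-scale standard deviation, and is supplied as an input:

```python
import math

def h_ucl(data, H):
    """H-UCL on the arithmetic mean, per the equation above.

    H is Land's statistic, looked up in Land's (1975) tables for the
    given n and s; it is an input here rather than being computed.
    """
    logs = [math.log(x) for x in data]
    n = len(logs)
    xbar = sum(logs) / n
    # Sample variance and standard deviation of the log-transformed data.
    s2 = sum((v - xbar) ** 2 for v in logs) / (n - 1)
    s = math.sqrt(s2)
    return math.exp(xbar + 0.5 * s2 + s * H / math.sqrt(n - 1))
```

Because both H and s appear in the exponent, skewed datasets drive the H-UCL upward very rapidly, which is the behavior criticized below.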

Frequently, the H-statistic calculates extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.


Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.

Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA, 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.

Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when the exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.
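This divergence is easy to see numerically. The sketch below (using hypothetical concentrations) compares the exponential (lognormal) estimator of the mean, exp(x̄ + 0.5s²), which is the estimator implicit in the H-UCL, with the simple sample mean for a skewed dataset that is not truly lognormal:

```python
import math

def lognormal_mean_estimate(data):
    """Exponential estimator of the arithmetic mean, exp(xbar + 0.5*s^2),
    computed from the log-transformed data -- the mean implicitly used
    inside the H-UCL calculation."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    xbar = sum(logs) / n
    s2 = sum((v - xbar) ** 2 for v in logs) / (n - 1)
    return math.exp(xbar + 0.5 * s2)

# A skewed, not-quite-lognormal dataset (hypothetical concentrations, ppm).
data = [0.02, 0.03, 1.1, 1.6, 2.0, 9.7, 14.3, 42.6]
simple_mean = sum(data) / len(data)           # about 8.9 ppm
implied_mean = lognormal_mean_estimate(data)  # several times larger
```

When the two estimators disagree this sharply, the lognormal model underlying the H-statistic is misdescribing the data, and the resulting UCL inherits that error.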

Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA, 1999).

In view of these considerations, EPA contractors and consultants have recommended that the H-statistic should not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA, 1997a) recommended that, in environmental applications, "the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less ... [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).


Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased toward more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the Lower Housatonic River; consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.

This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, unbiased estimates of how concentration and variance change with distance are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava, 1989).

Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.

The area inside each polygon is then used to weight the samples in statistical calculations, which is, in essence, saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

X̄aw = Σᵢ pᵢCᵢ

Where:

X̄aw = the area-weighted mean concentration (mg/kg)
Cᵢ = the concentration representing the condition within polygon i, where there are i = 1 through n polygons, and
pᵢ = the proportion of the total area that is incorporated in polygon i (unitless)
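In code, this is a one-line weighted average. The polygon areas and concentrations below are hypothetical, chosen to show how a large, lightly contaminated polygon pulls the estimate down relative to the unweighted mean:

```python
def area_weighted_mean(areas, concs):
    """Weighted mean in which each concentration's weight p_i is the
    fraction of the total exposure area covered by its polygon."""
    total = sum(areas)
    return sum((a / total) * c for a, c in zip(areas, concs))

# Hypothetical example: two small polygons at 10 ppm, one large one at 1 ppm.
areas = [100.0, 100.0, 800.0]   # ft^2
concs = [10.0, 10.0, 1.0]       # ppm
weighted = area_weighted_mean(areas, concs)   # 2.8 ppm
unweighted = sum(concs) / len(concs)          # 7.0 ppm
```

The unweighted mean (7.0 ppm) over-represents the two small, densely sampled polygons; the area-weighted mean (2.8 ppm) reflects the proportion of the area a receptor would actually contact.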

This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then the average concentrations for each grid were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the

[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods, such as kriging or triangulation, that take into account the spatial distribution of contamination to characterize exposure. (p. 15)

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL that would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously and demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process whereby a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that, if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population regardless of its statistical distribution. Bootstrapping relies on the same theory, except that, instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
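The procedure just described can be sketched directly in Python (a minimal illustration, not the exact implementation used in the analyses discussed here):

```python
import random

def bootstrap_ucl(data, n_iter=5000, seed=0):
    """95% UCL of the mean by bootstrapping: resample the dataset with
    replacement, archive each resample's mean, and take the 95th
    percentile of the resulting distribution of means."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(
        sum(rng.choice(data) for _ in range(n)) / n
        for _ in range(n_iter)
    )
    return means[int(0.95 * n_iter) - 1]
```

On a small, skewed dataset, the bootstrap 95% UCL sits above the sample mean but, unlike the H-UCL, does not run past the observed maximum.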

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter; as such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of the samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.


Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded, normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
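The three steps above can be sketched as follows (a simplified illustration with hypothetical inputs; a production version would derive the polygon areas from a GIS):

```python
import random

def spatial_bootstrap_ucl(areas, concs, factor=5, subsample=250,
                          n_iter=5000, seed=0):
    """Spatial bootstrap: replicate each sample in proportion to its
    polygon area (normalized to the smallest polygon, rounded, then
    multiplied by an expansion factor), bootstrap means from the
    expanded dataset, and take the 95th percentile."""
    # Steps 1-2: build the area-weighted, expanded dataset.
    smallest = min(areas)
    expanded = []
    for area, conc in zip(areas, concs):
        expanded.extend([conc] * (round(area / smallest) * factor))
    # Step 3: bootstrap means (with replacement) from the expanded dataset.
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choice(expanded) for _ in range(subsample)) / subsample
        for _ in range(n_iter)
    )
    return means[int(0.95 * n_iter) - 1]
```

The median of the archived means approximates the area-weighted mean of the original dataset, and the 95th percentile is the spatially weighted 95% UCL.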


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) the EPA approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas; thus, they are weighted more heavily in approaches that include area-weighting.

Estimates of the 95% UCL, by contrast, vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few


showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration by more than a factor of two, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above, and as demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area, and hence that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data


necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993), and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½-Mile Reach of the River and for the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A), Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant), Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision, Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action and Consolidation. U.S. Environmental Protection Agency, Region I. October.

Isaaks, E.H. and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River, Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.


Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft²)    Total PCB (ppm)
 1                  138,219                 0.022
 2                  105,217                 0.028
 3                  110,214                 1.07
 4                   31,021                 1.63
 5                   11,399                 1.67
 6                    7,004                 2.01
 7                   14,268                 2.19
 8                   12,121                 3.25
 9                    3,697                 3.77
10                   13,297                 7.58
11                   33,087                 7.61
12                   65,541                 7.62
13                   57,494                 7.71
14                    9,014                 7.99
15                    9,994                 8.72
16                    2,123                 9.69
17                    2,341                 9.72
18                    7,446                12.7
19                    2,613                14.3
20                   44,231                14.3
21                    2,362                15.4
22                    8,745                16.3
23                    5,008                18.7
24                   18,762                19.1
25                   24,296                20.2
26                   51,356                30.8
27                   12,996                32.9
28                   18,019                38.9
29                  119,814                40.6
30                  132,012                42.6

[Table 2. Mean and 95% UCL EPCs Calculated for the Hypothetical Floodplain Property. The table itself is not legible in this copy; as discussed in the text, it compares the four approaches (arithmetic mean/H-statistic, area-weighted average, bootstrap, and spatial bootstrap), with means of 13.3 ppm for the arithmetic and bootstrap approaches, and 95% UCLs of 106 ppm (H-statistic), 17.06 ppm (bootstrap), and 16.38 ppm (spatial bootstrap).]

ATTACHMENT A

A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS

Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co., Inc.


Introduction

The U.S. Environmental Protection Agency's (EPA's) final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston, 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets for which no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and on how well they represent the actual levels of exposure that are likely to occur under various scenarios.

There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that, when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and


the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.

GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will produce more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½-Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic: namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.

Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL

In 1992, EPA issued a guidance document (EPA, 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area; thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.


The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA, 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA, 1992):

UCL = exp[ x̄ + 0.5s² + sH/√(n − 1) ]

Where:

UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic accounting for the interaction of the distribution, developed by Land (1975)
n = number of analytical samples
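For illustration only, the calculation embodied in this equation can be sketched as follows (a minimal sketch in Python; the function name is ours, and the H value must be supplied by the user from Land's (1975) tables as a function of n and the log-space standard deviation):

```python
import math

def h_ucl(log_data, h):
    """95% UCL of the arithmetic mean under Land's method.

    log_data: natural-log-transformed sample concentrations.
    h: Land's H-statistic, looked up in Land's (1975) tables; it is not
       computed here.
    """
    n = len(log_data)
    xbar = sum(log_data) / n
    s2 = sum((x - xbar) ** 2 for x in log_data) / (n - 1)  # sample variance
    # UCL = exp[ xbar + 0.5*s^2 + s*H / sqrt(n - 1) ]
    return math.exp(xbar + 0.5 * s2 + math.sqrt(s2) * h / math.sqrt(n - 1))
```

Note that when the log-space variance is zero, the expression collapses to exp(x̄), the common sample value itself; as the variance grows, the exp(0.5s²) and sH/√(n − 1) terms drive the UCL upward rapidly.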

Frequently, the H-statistic produces extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.


Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers," and not necessarily due to lognormally distributed data. Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.

Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA, 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order[s] of magnitude higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may


overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.

Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when the exponential (lognormal-model) estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.
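To illustrate this point, one can compare the ordinary arithmetic mean with the lognormal-model ("exponential") estimator exp(x̄ + s²/2) that underlies the H-UCL calculation (an illustrative sketch; the function name is ours):

```python
import math
import statistics

def mean_estimates(data):
    """Return (arithmetic mean, lognormal-model estimate exp(xbar + s^2/2))
    for a set of positive concentration values. A large gap between the
    two signals that the assumed lognormal model is misestimating the
    mean, and hence that the H-UCL built on it is suspect."""
    logs = [math.log(x) for x in data]
    xbar = statistics.fmean(logs)
    s2 = statistics.variance(logs)  # sample variance of the log data
    return statistics.fmean(data), math.exp(xbar + 0.5 * s2)
```

When the data truly follow a lognormal distribution, the two estimates agree closely; for mixed or biased datasets they can diverge sharply.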

Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½-Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA, 1999).

In view of these considerations, EPA contractors and consultants have recommended that the H-statistic not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA, 1997a) recommended that, in environmental applications, "the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. . . . [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).


Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the Lower Housatonic River; consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL, because it is unlikely to be representative of the overall exposure area.

This problem can be substantially mitigated by area-weighting the sample results, so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, unbiased estimates of how concentration and variance change with distance are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava, 1989).

Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest of the River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a


hardwired operation in most commercially available Geographic Information System (GIS) applications.

The area inside each polygon is then used to weight the samples in statistical calculations, which is, in essence, saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

x̄_aw = Σ p_i C_i   (summed over i = 1 through n polygons)

Where:

x̄_aw = the area-weighted mean concentration (mg/kg)
C_i = the concentration representing the condition within polygon i, where there are i = 1 through n polygons, and
p_i = the proportion of the total exposure area that is incorporated in polygon i (unitless)
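As an illustration of this formula, the area-weighted mean can be computed directly from the polygon areas and their associated concentrations (a minimal sketch; the function name is ours):

```python
def area_weighted_mean(areas, concs):
    """Area-weighted mean: the sum over polygons of p_i * C_i, where p_i
    is polygon i's fraction of the total exposure area and C_i is the
    concentration assigned to that polygon."""
    total = float(sum(areas))
    return sum((a / total) * c for a, c in zip(areas, concs))
```

For example, two polygons of 1 and 3 units of area with concentrations of 10 and 2 ppm yield (0.25)(10) + (0.75)(2) = 4 ppm, rather than the unweighted average of 6 ppm.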

This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE, and approved by EPA, for calculating EPCs for the banks of the Upper ½-Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid cell using the sampling data available within it. Then, the average concentrations for the grid cells were


averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the:

[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure. (p. 15)

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½-Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, to ensure that reasonable and


representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL that would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and as demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below, and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that, if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that, instead of repeatedly taking new samples, bootstrapping models sample repeatedly from a single dataset.
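The procedure described above can be sketched as follows (an illustrative percentile-bootstrap implementation; the function name, iteration count, and fixed seed are ours):

```python
import random
import statistics

def bootstrap_95_ucl(data, iterations=5000, seed=1):
    """95% UCL of the mean by the percentile bootstrap: resample the
    dataset with replacement many times, archive each resample's mean,
    and take the 95th percentile of the resulting distribution of means."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(iterations)
    )
    return means[int(0.95 * iterations)]
```

Because each resample is the same size as the original dataset, the spread of the bootstrapped means reflects the sampling uncertainty of the dataset itself, with no distributional assumption required.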

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that accounts for the locations and densities of samples collected on each property. This modified bootstrapping approach (spatial bootstrapping), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.


Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
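The three steps above can be sketched as follows. This is a hypothetical Python illustration; the concentrations, polygon areas, and parameter values are invented, with the area ratios chosen so that one sample reproduces the 7 × 5 = 35 replicate example from the text:

```python
import random
import statistics

def spatial_bootstrap_95_ucl(concs, areas, expansion=5, subsample=250,
                             n_iterations=5000, seed=1):
    """Spatial bootstrap per the three steps described above.

    Steps 1-2: each sample's Thiessen polygon area is normalized by the
    smallest polygon, rounded to an integer, multiplied by an expansion
    factor, and the sample value is replicated that many times to build
    the expanded, area-weighted dataset.
    Step 3: bootstrap means are drawn (with replacement) from the
    expanded dataset; the 95th percentile of the means is the 95% UCL.
    """
    smallest = min(areas)
    expanded = []
    for conc, area in zip(concs, areas):
        replicates = round(area / smallest) * expansion
        expanded.extend([conc] * replicates)
    rng = random.Random(seed)  # fixed seed for a reproducible sketch
    k = min(subsample, len(expanded))
    means = sorted(statistics.fmean(rng.choices(expanded, k=k))
                   for _ in range(n_iterations))
    return means[int(0.95 * n_iterations) - 1]

# Hypothetical samples: concentration (ppm) and Thiessen polygon area (sq ft)
concs = [0.022, 1.07, 7.58, 14.3, 42.6]
areas = [2123, 14268, 33087, 57494, 138219]
ucl = spatial_bootstrap_95_ucl(concs, areas)
```

Because the large polygons here carry the high concentrations, the area-weighted UCL sits above the simple arithmetic mean of the five values but still below the maximum observed concentration.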


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) the EPA approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large (containing 30 samples) and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration, and that exceeds the bootstrapped 95% UCL values by a factor of more than six, with the result that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values would need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above, and as demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area when the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993), and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and for the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc., EPA Region IV. Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhardt. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision, Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. U.S. Environmental Protection Agency, Region I. October.

Isaaks, E.H. and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River - Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.


Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
1                138,219               0.022
2                105,217               0.028
3                110,214               1.07
4                31,021                1.63
5                11,399                1.67
6                7,004                 2.01
7                14,268                2.19
8                12,121                3.25
9                3,697                 3.77
10               13,297                7.58
11               33,087                7.61
12               65,541                7.62
13               57,494                7.71
14               9,014                 7.99
15               9,994                 8.72
16               2,123                 9.69
17               2,341                 9.72
18               7,446                 12.7
19               2,613                 14.3
20               44,231                14.3
21               2,362                 15.4
22               8,745                 16.3
23               5,008                 18.7
24               18,762                19.1
25               24,296                20.2
26               51,356                30.8
27               12,996                32.9
28               18,019                38.9
29               119,814               40.6
30               132,012               42.6

Table 2. Mean and 95% UCL EPC Estimates for the Hypothetical Property Calculated Using Four Approaches

[Table values not recoverable from the source scan. As reported in the text: the arithmetic and bootstrapped means are each 13.3 ppm; the area-weighted and spatial bootstrapped means are identical to each other and somewhat higher; and the 95% UCLs are 106 ppm (H-statistic), 17.06 ppm (bootstrap), and 16.38 ppm (spatial bootstrap).]


A COMPARISON OF ALTERNATIVE METHODS

FOR CALCULATING EXPOSURE POINT CONCENTRATIONS

Prepared on Behalf of the General Electric Company

by Ogden Environmental and Energy Services Co., Inc.

Introduction

The U.S. Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston, 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student-t statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets for which no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and on their representation of the actual levels of exposure that are likely to occur under various scenarios.
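For reference, the Student-t calculation for normally distributed data takes the familiar form sketched below. This is a hypothetical illustration; the data are invented, and the critical value is the standard one-sided 95% Student-t table entry for 9 degrees of freedom:

```python
import math
import statistics

def t_95_ucl(data, t_crit):
    """One-sided 95% UCL on the arithmetic mean: x_bar + t * s / sqrt(n),
    with t_crit taken from a Student-t table for n - 1 degrees of freedom."""
    n = len(data)
    return (statistics.fmean(data)
            + t_crit * statistics.stdev(data) / math.sqrt(n))

# Hypothetical concentrations (ppm), n = 10
data = [0.022, 0.028, 1.07, 2.19, 3.25, 7.58, 9.69, 14.3, 20.2, 42.6]
ucl = t_95_ucl(data, t_crit=1.833)  # t(0.95) for df = 9, from a t table
```

For well-behaved data this estimate sits modestly above the arithmetic mean; the difficulty discussed below arises when a lognormal assumption and the H-statistic are used instead.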

There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student-t statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that, when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.

GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will produce more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic, namely the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.

Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL

In 1992, EPA issued a guidance document (EPA, 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.


The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA, 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA, 1992):

UCL = exp(x̄ + 0.5s² + sH/√(n − 1))

Where:

UCL = a specified limit (i.e., the 95% UCL) on the estimate of the arithmetic mean

x̄ = the mean of the log-transformed data

s² = the variance of the log-transformed sample distribution

s = the standard deviation of the log-transformed sample distribution

H = a statistic accounting for interaction of the distribution, developed by Land (1975)

n = number of analytical samples
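The equation above can be sketched in Python as follows. The dataset is invented, and the H value is a placeholder chosen purely for illustration; in an actual calculation, H must be taken from Land's (1975) tables as a function of s and n:

```python
import math
import statistics

def h_ucl(data, H):
    """95% H-UCL per the equation above: exp(x_bar + 0.5*s**2 + s*H/sqrt(n-1)),
    where x_bar and s are the mean and standard deviation of the
    log-transformed data and H is Land's (1975) tabulated statistic."""
    logs = [math.log(x) for x in data]
    x_bar = statistics.fmean(logs)
    s = statistics.stdev(logs)
    return math.exp(x_bar + 0.5 * s ** 2 + s * H / math.sqrt(len(logs) - 1))

data = [0.022, 1.07, 7.58, 14.3, 42.6]  # hypothetical ppm values
ucl = h_ucl(data, H=3.0)                # placeholder H, illustration only
```

Even with this modest placeholder H, the standard deviation of the log-transformed data exceeds 1.0 and the computed H-UCL lands far above the maximum observed value of 42.6 ppm, illustrating the blow-up behavior discussed in the text.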

Frequently, the H-statistic calculates extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.


Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers," and not necessarily due to lognormally distributed data. Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.

Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA, 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling datasets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that the H-UCL becomes orders of magnitude higher even when the data were obtained from a lognormal population, and can lead to incorrect conclusions; this is especially true for samples of smaller sizes (e.g., < 30). They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable," as it becomes orders of magnitude higher than the largest concentration observed when the standard deviation of the log-transformed data starts exceeding 1.0. These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.

Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when the exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.

Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and it does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA, 1999).

In view of these considerations, EPA contractors and consultants have recommended that the H-statistic not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA, 1997a) recommended that, in environmental applications, the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided. These authors concluded that the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs, especially when the number of samples is thirty or less, and that "[u]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).


Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the Lower Housatonic River; consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL, because it is unlikely to be representative of the overall exposure area.

This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, unbiased estimates of how concentration and variance change with distance are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava, 1989).

Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.

The area inside each polygon is then used to weight the samples in statistical calculations, which is in essence saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

x̄_sc = Σ (p_i × C_i), summed over polygons i = 1 through n

Where:

x̄_sc = the area-weighted mean concentration (mg/kg)

C_i = the concentration representing the condition within polygon i, where there are i = 1 through n polygons, and

p_i = the proportion of the total area that is incorporated in polygon i (unitless)
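The formula reduces to a one-line computation. A minimal sketch follows (hypothetical values; the polygon shares of 75% and 25% are chosen so the weighting is easy to follow):

```python
def area_weighted_mean(concs, areas):
    """Area-weighted (spatial) mean: the sum over polygons of p_i * C_i,
    where p_i is polygon i's share of the total exposure area and
    C_i is the concentration of the sample in polygon i."""
    total_area = sum(areas)
    return sum(c * (a / total_area) for c, a in zip(concs, areas))

# Two hypothetical polygons: the sample covering 75% of the area dominates
x_sc = area_weighted_mean([2.0, 10.0], [75.0, 25.0])  # 0.75*2 + 0.25*10
```

When all polygons are the same size, the result collapses to the simple arithmetic mean, which is the unbiased-sampling case described at the start of this section.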

This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid cell using the sampling data available within it. Then the average concentrations for the grid cells were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the "[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods, such as kriging or triangulation, that take into account the spatial distribution of contamination to characterize exposure" (p. 15).

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½-Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and data do not exactly fit either a normal or lognormal distribution. This technique is described below, and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
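The procedure just described can be sketched in a few lines of Python; the iteration count, seed, and example dataset below are illustrative choices for demonstration, not values prescribed by the Work Plan.

```python
import random
import statistics

def bootstrap_ucl(data, n_iter=5000, seed=1):
    """Bootstrap estimate of the 95% UCL of the arithmetic mean:
    resample the dataset with replacement, record each resample's
    mean, and take the 95th percentile of the collected means."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(statistics.fmean(rng.choices(data, k=n))
                   for _ in range(n_iter))
    return means[int(0.95 * n_iter) - 1]

# Example: a small, positively skewed hypothetical dataset (ppm).
data = [0.5, 1.1, 2.3, 3.8, 5.0, 8.4, 13.2, 21.7]
ucl = bootstrap_ucl(data)   # falls between the mean and the maximum
```

Because resampled means cluster around the arithmetic mean, the resulting 95% UCL sits modestly above the mean rather than at or beyond the maximum observed value, which is the behavior the text contrasts with the H-statistic.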

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.

c DOCS JRBGE^oooEPCpaplaquor2 an 10 Ogden Environmental and Energy Services

Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset to reflect the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
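The three steps above can be sketched as follows; the expansion factor, subsample size, iteration count, seed, and example inputs are illustrative choices of the kind described in the text, not fixed requirements.

```python
import random
import statistics

def spatial_bootstrap_ucl(concs, areas, expand_factor=5,
                          subsample=250, n_iter=5000, seed=1):
    """Spatial bootstrap: replicate each sample in proportion to its
    Thiessen polygon area (normalized to the smallest polygon), then
    bootstrap the expanded, area-weighted dataset."""
    # Steps 1-2: normalize polygon areas to the smallest, round to an
    # integer, and expand so each sample appears area-weighted times.
    smallest = min(areas)
    expanded = []
    for c, a in zip(concs, areas):
        expanded.extend([c] * (round(a / smallest) * expand_factor))
    # Step 3: bootstrap means from the expanded dataset and take the
    # 95th percentile as the 95% UCL.
    rng = random.Random(seed)
    k = min(subsample, len(expanded))
    means = sorted(statistics.fmean(rng.choices(expanded, k=k))
                   for _ in range(n_iter))
    return means[int(0.95 * n_iter) - 1]

# Example: three samples; the third sits in a polygon eight times the
# smallest, so it dominates the area-weighted result.
ucl = spatial_bootstrap_ucl([1.0, 2.0, 10.0], [100.0, 100.0, 800.0])
```

In this example the area-weighted mean of the expanded dataset is 8.3 ppm, and the spatial bootstrap 95% UCL falls modestly above it, mirroring the behavior described for the hypothetical property in Table 2.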


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic, to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area, and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) the EPA approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained through straight bootstrapping. This is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are weighted more heavily in the approaches that include area-weighting.

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL that substantially exceeds the highest measured concentration, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing, in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above, and as demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993), and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½-Mile Reach of the River and for the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½-Mile Removal Action, and Consolidation. U.S. Environmental Protection Agency, Region I. October.

Isaaks, E.H. and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River - Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.


Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
1                138,219               0.022
2                105,217               0.028
3                110,214               1.07
4                31,021                1.63
5                11,399                1.67
6                7,004                 2.01
7                14,268                2.19
8                12,121                3.25
9                3,697                 3.77
10               13,297                7.58
11               33,087                7.61
12               65,541                7.62
13               57,494                7.71
14               9,014                 7.99
15               9,994                 8.72
16               2,123                 9.69
17               2,341                 9.72
18               7,446                 12.7
19               2,613                 14.3
20               44,231                14.3
21               2,362                 15.4
22               8,745                 16.3
23               5,008                 18.7
24               18,762                19.1
25               24,296                20.2
26               51,356                30.8
27               12,996                32.9
28               18,019                38.9
29               119,814               40.6
30               132,012               42.6

[Table 2. Mean and 95% UCL EPCs calculated for the hypothetical property using the four approaches; the table is not legible in this copy, but its values are discussed in the text.]

the maximum value are poor estimates of actual levels of exposure that will occur within a given exposure area.

GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will result in more representative results. In GE's view, the most appropriate approach is the use of the area-weighting, or spatial averaging, technique that has been approved and specified by EPA for the banks of the Upper ½-Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic: namely, the bootstrapping technique and a modification of that approach known as "spatial bootstrapping." Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.

Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL

In 1992, EPA issued a guidance document (EPA, 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC, regardless of the statistical distribution of observed sample concentrations.


The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA, 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For log-normally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA, 1992):

    UCL = exp( x̄ + 0.5s² + sH / √(n − 1) )

Where:

    UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean
    x̄ = the mean of the log-transformed data
    s² = the variance of the log-transformed sample distribution
    s = the standard deviation of the log-transformed sample distribution
    H = a statistic accounting for interaction of the distribution, developed by Land (1975)
    n = number of analytical samples
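In code, the H-UCL computation looks like the sketch below. The H value itself must be taken from Land's (1975) tables for the given sample size and log-scale standard deviation, so it is simply passed in as a parameter here; the example dataset and H value are hypothetical, for illustration only.

```python
import math
import statistics

def h_ucl(data, H):
    """Land's H-UCL: exp(ybar + 0.5*s**2 + s*H / sqrt(n - 1)), where
    ybar and s are the mean and standard deviation of the
    log-transformed data, and H is looked up in Land (1975)."""
    n = len(data)
    logs = [math.log(x) for x in data]
    ybar = statistics.fmean(logs)
    s = statistics.stdev(logs)   # sample std dev of the log data
    return math.exp(ybar + 0.5 * s * s + s * H / math.sqrt(n - 1))

# Hypothetical skewed dataset (ppm) and an assumed H value.
ucl = h_ucl([0.5, 1.2, 2.8, 6.5, 15.0, 35.0], H=3.0)
```

Because s appears both in the 0.5s² term and in the H term, a large log-scale standard deviation inflates the result multiplicatively, which is the mechanism behind the overestimates discussed below.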

Frequently, the H-statistic calculates extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.


Use of the H-statistic also does not calculate a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers," and not necessarily due to lognormally distributed data. Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.

Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA, 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.

Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when an exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.

Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½-Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA, 1999).

In view of these considerations, EPA contractors and consultants have recommended that the H-statistic should not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA, 1997a) recommended that, in environmental applications, the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided. These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. ... [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).


Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the lower Housatonic River. Consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL, because it is unlikely to be representative of the overall exposure area.

This problem can be substantially mitigated by area-weighting the sample results, so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, unbiased estimates of concentration and variance change with distance, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava, 1989).

Because kriging is data intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.

The area inside each polygon is then used to weight the samples in statistical calculations, which is in essence saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

    X̄ = Σᵢ (pᵢ × Cᵢ)

Where:

X̄ = the area-weighted mean concentration (mg/kg);
Cᵢ = the concentration representing the condition within polygon i, where there are i = 1 through n polygons; and
pᵢ = the proportion of the total exposure area that is incorporated in polygon i (unitless).
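A minimal sketch of this calculation in Python (the polygon areas and concentrations below are illustrative values chosen for demonstration, not actual site data):

```python
# Area-weighted (spatially averaged) mean: each sample concentration C_i is
# weighted by p_i, its Thiessen polygon's share of the total exposure area.
# The areas and concentrations below are illustrative, not site data.

def area_weighted_mean(areas, concs):
    """Return sum(p_i * C_i), where p_i = areas[i] / sum(areas)."""
    total = sum(areas)
    return sum((a / total) * c for a, c in zip(areas, concs))

areas = [138219.0, 105217.0, 2123.0]   # polygon areas (ft^2)
concs = [0.022, 0.028, 9.69]           # total PCB (ppm)
print(area_weighted_mean(areas, concs))
```

Because the weights pᵢ sum to one, a sample whose polygon covers half of the exposure area contributes half of the final mean, regardless of how densely its neighborhood was sampled.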

This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then the average concentrations for each grid were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the "[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate.... Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods, such as kriging or triangulation, that take into account the spatial distribution of contamination to characterize exposure." (p. 15)

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of the extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process where a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.
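The resampling procedure described above can be sketched as follows (a simplified illustration; the dataset, iteration count, and seed are assumptions chosen for demonstration, not values from the Work Plan):

```python
import random
import statistics

# Bootstrap 95% UCL of the mean: resample the dataset with replacement,
# record each resample's mean, and take the 95th percentile of the
# resulting distribution of means. The dataset below is illustrative only.

def bootstrap_ucl95(data, iterations=5000, seed=1):
    rng = random.Random(seed)          # fixed seed for reproducibility
    n = len(data)
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(iterations)
    )
    return means[int(0.95 * iterations) - 1]   # 95th percentile of the means

data = [0.022, 1.07, 2.19, 7.62, 9.69, 14.3, 16.3, 20.2, 32.9, 42.6]
print(statistics.fmean(data), bootstrap_ucl95(data))
```

Note that a bootstrapped 95% UCL sits above the sample mean but, unlike an inflated H-UCL, can never exceed the maximum observed concentration, since every resampled mean is bounded by the data themselves.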

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used to reflect the locations and densities of samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.


Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset, reflecting the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
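The three steps above can be sketched as follows (a simplified illustration in Python; the polygon areas, concentrations, expansion factor, and subsample size are assumptions chosen for demonstration, not site data or prescribed values):

```python
import random
import statistics

# Spatial bootstrap: replicate each sample in proportion to its Thiessen
# polygon area (normalized to the smallest polygon and scaled by a chosen
# factor), then bootstrap means from the expanded, area-weighted dataset.
# All values below are illustrative, not site data.

def spatial_bootstrap_ucl95(areas, concs, factor=5, subsample=250,
                            iterations=5000, seed=1):
    smallest = min(areas)
    expanded = []
    for a, c in zip(areas, concs):
        # e.g., an area ratio of 7 with factor 5 yields 35 replicate entries
        expanded.extend([c] * (round(a / smallest) * factor))
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(expanded, k=subsample))
        for _ in range(iterations)
    )
    return means[int(0.95 * iterations) - 1]   # 95th percentile of the means

areas = [2123.0, 13297.0, 44231.0, 138219.0]   # polygon areas (ft^2)
concs = [9.69, 7.58, 14.3, 0.022]              # total PCB (ppm)
print(spatial_bootstrap_ucl95(areas, concs))
```

In this toy example the large, low-concentration polygon dominates the expanded dataset, so the resulting UCL tracks the area-weighted mean rather than the unweighted arithmetic mean.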


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area, and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) EPA's approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration by a factor of about two and a half, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in high-concentration areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above, and demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993), and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV. Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River: Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.


Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
 1                   138,219                0.022
 2                   105,217                0.028
 3                   110,214                1.07
 4                    31,021                1.63
 5                    11,399                1.67
 6                     7,004                2.01
 7                    14,268                2.19
 8                    12,121                3.25
 9                     3,697                3.77
10                    13,297                7.58
11                    33,087                7.61
12                    65,541                7.62
13                    57,494                7.71
14                     9,014                7.99
15                     9,994                8.72
16                     2,123                9.69
17                     2,341                9.72
18                     7,446               12.7
19                     2,613               14.3
20                    44,231               14.3
21                     2,362               15.4
22                     8,745               16.3
23                     5,008               18.7
24                    18,762               19.1
25                    24,296               20.2
26                    51,356               30.8
27                    12,996               32.9
28                    18,019               38.9
29                   119,814               40.6
30                   132,012               42.6


Table 2. Mean and 95% UCL EPCs for the Hypothetical Floodplain Property, Calculated Using the Four Approaches [table values not legible in the source; the key results are discussed in the text]



(EPA 1997a) recommended that in environmental applications the use of the H-UCL to obtain

an estimate of the upper confidence limit of the mean should be avoided These authors

concluded that the use of the jackknife method the bootstrap method or the Chebychev

inequality method provides better input to the risk assessors and may result in a significant

reduction in remediation costs This is especially true when the number of samples is thirty or

less[U]pper confidence limits based on any of the other estimation procedures appear to be

more stable and reliable than those based on the H-statistic Similar findings were reported by

Schulz and Griffin (1999)

c 0ocsjRBGEooopoundpcpipcr2an 5 Ogden Environmental and Energy Services

Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the Lower Housatonic River; consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected to be contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.

This problem can be substantially mitigated by area-weighting the sample results, so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995). In more complex approaches, models of how concentration variance changes with distance are used to develop unbiased estimates of concentration, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging; Isaaks and Srivastava, 1989).

Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest of the River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so easy to apply is that the construction of Thiessen polygons is a built-in operation in most commercially available Geographic Information System (GIS) applications.
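For readers who wish to reproduce this step without GIS software, the Thiessen polygon areas can be approximated numerically by assigning points on a fine grid to their nearest sample, since those nearest-neighbor cells are exactly the Thiessen polygons. The following Python sketch uses hypothetical sample coordinates and a rectangular exposure area; an actual assessment would use the GIS operation described above.

```python
# Sketch: approximate Thiessen-polygon area proportions by assigning
# each cell of a fine grid to its nearest sampling location (the set of
# cells nearest a sample is that sample's Thiessen/Voronoi polygon).
# Sample coordinates and property dimensions below are hypothetical.

def thiessen_proportions(samples, x_max, y_max, step=10.0):
    """Estimate the fraction of the exposure area represented by each sample.

    samples: list of (x, y) sampling locations within the exposure area.
    x_max, y_max: dimensions of the rectangular exposure area (feet).
    step: grid resolution (feet); smaller values give finer estimates.
    """
    counts = [0] * len(samples)
    y = step / 2.0
    while y < y_max:
        x = step / 2.0
        while x < x_max:
            nearest = min(range(len(samples)),
                          key=lambda i: (samples[i][0] - x) ** 2 +
                                        (samples[i][1] - y) ** 2)
            counts[nearest] += 1
            x += step
        y += step
    total = sum(counts)
    return [c / total for c in counts]  # proportions p_i summing to 1.0

# Hypothetical 1,000 ft x 1,000 ft property with four sampling locations:
props = thiessen_proportions([(100, 100), (900, 100), (500, 500), (500, 900)],
                             1000, 1000)
```

Because the grid cells partition the property, the resulting proportions sum to one and serve directly as the weights p<sub>i</sub> in the formula that follows in the text.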

The area inside each polygon is then used to weight the samples in statistical calculations, which is, in essence, saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

X(bar)sc = Σ (pi × Ci), summed over i = 1 through n

Where:

X(bar)sc = area-weighted mean concentration (mg/kg);
Ci = the concentration representing the condition within polygon i, where there are i = 1 through n polygons; and
pi = the proportion of the total exposure area that is incorporated in polygon i (unitless).
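As an illustration, the formula above can be applied directly; the sketch below uses hypothetical polygon proportions and concentrations.

```python
# Sketch of the area-weighted mean formula: X(bar) = sum of p_i * C_i,
# where p_i is each polygon's share of the exposure area (summing to 1)
# and C_i is the concentration measured in that polygon. The three
# polygons below are hypothetical.

def area_weighted_mean(proportions, concentrations):
    assert abs(sum(proportions) - 1.0) < 1e-9  # proportions must sum to 1
    return sum(p * c for p, c in zip(proportions, concentrations))

# Polygons covering 50%, 30%, and 20% of the area, in mg/kg:
epc = area_weighted_mean([0.5, 0.3, 0.2], [1.0, 10.0, 40.0])  # 11.5 mg/kg
```

Note that the small polygon with the highest concentration (40 mg/kg over 20% of the area) contributes proportionally, rather than dominating the estimate.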

This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE, and approved by EPA, for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid cell using the sampling data available within it. Then the average concentrations for the grid cells were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the "[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods (such as kriging or triangulation) that take into account the spatial distribution of contamination to characterize exposure" (p. 15).

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and it is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and as demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below, and it is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping selects repeatedly from a single dataset.
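The procedure described above can be sketched in a few lines of Python. The dataset below is hypothetical, and the iteration count and seed are illustrative choices.

```python
# Sketch of the basic bootstrap: resample the dataset with replacement
# many times, archive each mean, and read the 95% UCL off the 95th
# percentile of the archived means. Data and parameters are illustrative.
import random

def bootstrap_ucl95(data, iterations=5000, seed=1):
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means = []
    for _ in range(iterations):
        resample = rng.choices(data, k=len(data))  # draw with replacement
        means.append(sum(resample) / len(resample))
    means.sort()
    return means[int(0.95 * len(means)) - 1]  # 95th percentile of means

# Hypothetical positively skewed concentration data (ppm):
data = [0.5, 1.2, 2.0, 3.1, 4.4, 6.0, 7.5, 9.8, 15.0, 22.0]
ucl = bootstrap_ucl95(data)  # falls above the mean (7.15) but well below the max
```

No distributional assumption is made anywhere in this calculation, which is why the method remains usable when the data fit neither a normal nor a lognormal distribution.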

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and it is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the bootstrapped estimate is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter; as such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of the samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrap method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.

Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset to reflect the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of each sample result in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
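The three steps above can be sketched as follows. The polygon areas, concentrations, and expansion factor are hypothetical; the subsample size and iteration count follow the ranges described in the text.

```python
# Sketch of the three-step spatial bootstrap. Polygon areas and
# concentrations below are hypothetical; in practice they come from the
# Thiessen polygons constructed for an exposure area.
import random

def spatial_bootstrap_ucl95(areas, concs, expansion=5, k=250,
                            iterations=5000, seed=1):
    # Steps 1-2: normalize each polygon area to the smallest area, round
    # to the nearest integer, and replicate each sample result that many
    # times (times an expansion factor) in an expanded dataset.
    smallest = min(areas)
    expanded = []
    for a, c in zip(areas, concs):
        expanded.extend([c] * (round(a / smallest) * expansion))
    # Step 3: bootstrap means (with replacement) from the expanded
    # dataset and take the 95th percentile of the archived means.
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means = []
    for _ in range(iterations):
        resample = rng.choices(expanded, k=k)
        means.append(sum(resample) / k)
    means.sort()
    return means[int(0.95 * len(means)) - 1]

areas = [2000, 5000, 14000, 9000, 30000]   # polygon areas (ft^2), hypothetical
concs = [9.7, 18.7, 2.2, 7.6, 42.6]        # concentrations (ppm), hypothetical
ucl = spatial_bootstrap_ucl95(areas, concs)
```

Because replication is proportional to polygon area, the probability of drawing a given concentration equals that polygon's share of the exposure area, so the bootstrap distribution is centered on the area-weighted mean rather than the simple arithmetic mean.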

This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data; thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.

These data have been used to calculate mean and 95% UCL EPCs using four different approaches: (1) EPA's approach (arithmetic mean and H-statistic); (2) an area-weighted average; (3) a bootstrap; and (4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping; this is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but it is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas; thus, they are weighted more heavily in approaches that include area-weighting.
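As a cross-check on this comparison, the arithmetic and area-weighted means can be recomputed directly from the Table 1 values (with decimal points restored):

```python
# Check of the Table 1 statistics: the arithmetic mean weights every
# sample equally, while the area-weighted mean weights each sample by
# its polygon's share of the total property area.

areas = [138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121,
         3697, 13297, 33087, 65541, 57494, 9014, 9994, 2123, 2341,
         7446, 2613, 44231, 2362, 8745, 5008, 18762, 24296, 51356,
         12996, 18019, 119814, 132012]            # polygon areas, ft^2
concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77,
         7.58, 7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3,
         14.3, 15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6,
         42.6]                                    # total PCB, ppm

arithmetic_mean = sum(concs) / len(concs)  # 13.3 ppm, as stated in the text
total = sum(areas)
area_weighted = sum(a / total * c for a, c in zip(areas, concs))
# The area-weighted mean comes out somewhat higher than the arithmetic
# mean because samples 29 and 30 pair high concentrations with very
# large polygons.
```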

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large (containing 30 samples), and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration by more than a factor of two, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing, in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values would need to be used for many of those properties. Such an end result is unsupportable and would result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above, and as demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993), and it would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and for the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or for areas where EPA determines that the data are insufficient to support that approach and decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A), Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhardt. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. U.S. Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River, Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.

Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number | Polygon Area (ft²) | Total PCB (ppm)
1  | 138,219 | 0.022
2  | 105,217 | 0.028
3  | 110,214 | 1.07
4  | 31,021  | 1.63
5  | 11,399  | 1.67
6  | 7,004   | 2.01
7  | 14,268  | 2.19
8  | 12,121  | 3.25
9  | 3,697   | 3.77
10 | 13,297  | 7.58
11 | 33,087  | 7.61
12 | 65,541  | 7.62
13 | 57,494  | 7.71
14 | 9,014   | 7.99
15 | 9,994   | 8.72
16 | 2,123   | 9.69
17 | 2,341   | 9.72
18 | 7,446   | 12.7
19 | 2,613   | 14.3
20 | 44,231  | 14.3
21 | 2,362   | 15.4
22 | 8,745   | 16.3
23 | 5,008   | 18.7
24 | 18,762  | 19.1
25 | 24,296  | 20.2
26 | 51,356  | 30.8
27 | 12,996  | 32.9
28 | 18,019  | 38.9
29 | 119,814 | 40.6
30 | 132,012 | 42.6

[Table 2, which compares the mean and 95% UCL EPC values calculated using the four approaches, is not legible in this copy; its values are discussed in the text above.]

Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.

Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA, 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling datasets, and they reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.

Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when the exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.
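For reference, the H-UCL at issue is computed from the log-transformed data using Land's formula. The sketch below assumes a user-supplied H value, which in practice must be taken from the tables in Land (1975); the values passed in are illustrative placeholders, not table values.

```python
# Sketch of the H-statistic (Land) 95% UCL discussed above:
#   H-UCL = exp( ybar + 0.5 * sy**2 + sy * H / sqrt(n - 1) )
# where ybar and sy are the mean and standard deviation of the
# ln-transformed data, and H comes from Land (1975) for the given
# sample size and sy. H_095 below is a placeholder argument.
import math

def h_ucl(data, H_095):
    n = len(data)
    y = [math.log(x) for x in data]            # ln-transform the data
    ybar = sum(y) / n
    sy = math.sqrt(sum((v - ybar) ** 2 for v in y) / (n - 1))
    return math.exp(ybar + 0.5 * sy ** 2 + sy * H_095 / math.sqrt(n - 1))
```

Because sy enters the exponent twice, the H-UCL grows exponentially as the spread of the log-transformed data increases, which is how it can overshoot the largest observed concentration.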

Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach

leads to the unsupportable assumption that the average concentration within an exposure area is

best represented by the highest concentration encountered there and does not account in any way

for the spatial distribution of contamination As EPA itself has recognized in responding to

comments on the Upper l2 Mile Reach Removal Action receptors both human and ecological

are exposed to average contaminant concentrations not maximum concentrations where

exposure is equally likely throughout a given area (EPA 1999)

In view of these considerations EPA contractors and consultants have recommended that the H-

statistic should not be used particularly in cases where a lognormal distribution of the data is not

certain For example based on comparison of the UCLs calculated using the H-statistic versus

those calculated using other approaches the EPA technical support document by Singh et al

(EPA 1997a) recommended that in environmental applications the use of the H-UCL to obtain

an estimate of the upper confidence limit of the mean should be avoided These authors

concluded that the use of the jackknife method the bootstrap method or the Chebychev

inequality method provides better input to the risk assessors and may result in a significant

reduction in remediation costs This is especially true when the number of samples is thirty or

less[U]pper confidence limits based on any of the other estimation procedures appear to be

more stable and reliable than those based on the H-statistic Similar findings were reported by

Schulz and Griffin (1999)

c 0ocsjRBGEooopoundpcpipcr2an 5 Ogden Environmental and Energy Services

Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area and exposures are likely to occur

randomly and with equal likelihood in all parts of that exposure area the simple arithmetic mean

is most representative of the long-term EPC However if sampling is intentionally biased

towards more contaminated areas the sampling database no longer accurately represents the

areas of potential exposure This is the case with the sampling outlined in EPAs Work Plan for

the Lower Housatonic River Contaminant delineation is an important goal of the sampling

effort on the lower Housatonic River Consequently sampling is intentionally focused on areas

(eg floodplain soils) that are known or suspected of being contaminated Because many of the

samples in these datasets will be taken from high concentration areas with limited samples taken

from uncontaminated areas calculating a mean using typical statistical procedures gives undue

weight to the locations where high concentrations exist A dataset obtained in this way cannot be

used to reliably calculate an arithmetic mean or its 95 UCL because it is unlikely to be

representative of the overall exposure area

This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, unbiased estimates of how concentration and its variance change with distance are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava, 1989).

Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.

The area inside each polygon is then used to weight the samples in statistical calculations, which is, in essence, saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

    X̄sc = Σ (pi × Ci), summed over i = 1 through n

where:

    X̄sc = area-weighted mean concentration (mg/kg);
    Ci = the concentration representing the condition within polygon i, where there are i = 1 through n polygons; and
    pi = the proportion of the total exposure area that is incorporated in polygon i (unitless).
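The formula above amounts to a weighted sum, which can be sketched in a few lines of Python. The polygon areas and concentrations below are illustrative values, not site data:

```python
def area_weighted_mean(areas, concs):
    """Area-weighted mean: sum of p_i * C_i, where p_i is each
    polygon's share of the total exposure area."""
    total = sum(areas)
    return sum((a / total) * c for a, c in zip(areas, concs))

# Illustrative Thiessen-polygon areas (ft^2) and PCB concentrations (ppm)
areas = [40000, 25000, 20000, 10000, 5000]
concs = [0.5, 2.0, 4.0, 10.0, 30.0]
print(round(area_weighted_mean(areas, concs), 3))  # → 4.0
```

By comparison, the unweighted arithmetic mean of the same five concentrations is 9.3 ppm: weighting by polygon area prevents the small, high-concentration polygons from dominating the estimate.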

This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE, and approved by EPA, for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then the average concentrations for each grid were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the:

    "[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. . . . Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods (such as kriging or triangulation) that take into account the spatial distribution of contamination to characterize exposure." (p. 15)

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative, superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative, preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that, instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
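The procedure just described can be sketched as follows. This is a simplified illustration using NumPy; the dataset, iteration count, and random seed are assumptions for the example, not values drawn from the Work Plan:

```python
import numpy as np

def bootstrap_ucl(data, n_iter=5000, alpha=0.05, seed=0):
    """Bootstrap 95% UCL of the mean: repeatedly resample the dataset
    with replacement, archive each resample's mean, and take the 95th
    percentile of the resulting distribution of means."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    means = np.array([rng.choice(data, size=data.size, replace=True).mean()
                      for _ in range(n_iter)])
    return np.percentile(means, 100 * (1 - alpha))

# Hypothetical right-skewed concentration data (ppm)
concs = [0.1, 0.4, 0.9, 1.5, 2.2, 3.1, 4.8, 7.5, 12.0, 35.0]
print(round(float(bootstrap_ucl(concs)), 2))
```

Note that, unlike the H-statistic, the resulting UCL stays within the range of the observed data for a dataset like this one.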

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the bootstrap estimate is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that incorporates the locations and densities of the samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrap method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.


Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded, normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
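The three steps above can be sketched in NumPy as follows. All input values, the subsample size, and the expansion factor are illustrative assumptions chosen to match the procedure described, not site data:

```python
import numpy as np

def spatial_bootstrap_ucl(areas, concs, n_iter=5000, subsample=250,
                          expand_factor=5, alpha=0.05, seed=0):
    """Spatial bootstrap: (1) weight each sample by its Thiessen-polygon
    area; (2) build an expanded dataset in which each result is replicated
    in proportion to its polygon's rounded size ratio to the smallest
    polygon; (3) bootstrap means from the expanded dataset and take the
    95th percentile as the 95% UCL."""
    areas = np.asarray(areas, dtype=float)
    concs = np.asarray(concs, dtype=float)
    # Steps 1-2: normalize to the smallest polygon, round, and expand
    ratios = np.rint(areas / areas.min()).astype(int)
    expanded = np.repeat(concs, ratios * expand_factor)
    # Step 3: repeatedly draw subsamples (with replacement) and archive means
    rng = np.random.default_rng(seed)
    means = np.array([rng.choice(expanded, size=subsample, replace=True).mean()
                      for _ in range(n_iter)])
    return np.percentile(means, 100 * (1 - alpha))

areas = [40000, 25000, 20000, 10000, 5000]   # illustrative polygon areas (ft^2)
concs = [0.5, 2.0, 4.0, 10.0, 30.0]          # illustrative concentrations (ppm)
print(round(float(spatial_bootstrap_ucl(areas, concs)), 2))
```

With these inputs the expanded dataset has a mean equal to the area-weighted mean (4.0 ppm), and the 50th percentile of the bootstrapped means converges to that value, as the text describes.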


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has, in the past, often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: (1) the EPA approach (arithmetic mean and H-statistic); (2) an area-weighted average; (3) a bootstrap; and (4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30; Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large (containing 30 samples) and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration by a factor of about two and a half, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above, and demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993) and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and for the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or for areas where EPA determines that the data are insufficient to support that approach and decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. U.S. Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River - Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.

Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

    Sample Number    Polygon Area (ft2)    Total PCB (ppm)
    1                138,219               0.022
    2                105,217               0.028
    3                110,214               1.07
    4                31,021                1.63
    5                11,399                1.67
    6                7,004                 2.01
    7                14,268                2.19
    8                12,121                3.25
    9                3,697                 3.77
    10               13,297                7.58
    11               33,087                7.61
    12               65,541                7.62
    13               57,494                7.71
    14               9,014                 7.99
    15               9,994                 8.72
    16               2,123                 9.69
    17               2,341                 9.72
    18               7,446                 12.7
    19               2,613                 14.3
    20               44,231                14.3
    21               2,362                 15.4
    22               8,745                 16.3
    23               5,008                 18.7
    24               18,762                19.1
    25               24,296                20.2
    26               51,356                30.8
    27               12,996                32.9
    28               18,019                38.9
    29               119,814               40.6
    30               132,012               42.6
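The summary statistics quoted in the text for this hypothetical dataset can be reproduced directly from the 30 concentration values (decimal placement follows the ranges and mean given in the text):

```python
concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
         7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
         15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6]
mean = sum(concs) / len(concs)
print(round(mean, 1))             # → 13.3 (arithmetic mean, ppm)
print(min(concs), max(concs))     # → 0.022 42.6 (range, ppm)
```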

Table 2. Mean and 95% UCL EPC Values for the Hypothetical Property, Calculated Using the Four Approaches

    [Table 2 is not legible in this copy; its values are summarized in the accompanying text.]



(EPA 1997a) recommended that in environmental applications the use of the H-UCL to obtain

an estimate of the upper confidence limit of the mean should be avoided These authors

concluded that the use of the jackknife method the bootstrap method or the Chebychev

inequality method provides better input to the risk assessors and may result in a significant

reduction in remediation costs This is especially true when the number of samples is thirty or

less[U]pper confidence limits based on any of the other estimation procedures appear to be

more stable and reliable than those based on the H-statistic Similar findings were reported by

Schulz and Griffin (1999)

c 0ocsjRBGEooopoundpcpipcr2an 5 Ogden Environmental and Energy Services

Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area and exposures are likely to occur

randomly and with equal likelihood in all parts of that exposure area the simple arithmetic mean

is most representative of the long-term EPC However if sampling is intentionally biased

towards more contaminated areas the sampling database no longer accurately represents the

areas of potential exposure This is the case with the sampling outlined in EPAs Work Plan for

the Lower Housatonic River Contaminant delineation is an important goal of the sampling

effort on the lower Housatonic River Consequently sampling is intentionally focused on areas

(eg floodplain soils) that are known or suspected of being contaminated Because many of the

samples in these datasets will be taken from high concentration areas with limited samples taken

from uncontaminated areas calculating a mean using typical statistical procedures gives undue

weight to the locations where high concentrations exist A dataset obtained in this way cannot be

used to reliably calculate an arithmetic mean or its 95 UCL because it is unlikely to be

representative of the overall exposure area

This problem can be substantially mitigated by area-weighting the sample results so that the

importance of each sample in the calculation of the EPC is directly proportional to the fraction of

the exposure area that it represents Area-weighted averaging may be conducted in a number of

ways with differing levels of complexity In some cases polygonal areas are constructed by

drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen

polygons Clifford et al 1995) in more complex approaches unbiased estimates of

concentration and variance change with distance and the results are used to construct a spatial

grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava 1989)

Because kriging is data intensive and unlikely to be feasible for many of the exposure areas in

the Rest-of-the River investigation area-weighting using Thiessen polygons is a more

appropriate approach The process is to draw perpendicular lines that are exactly equidistant

between contiguous samples until each sample is surrounded by a polygon Polygons at the

periphery of an identified exposure area are truncated at its boundary (eg a property boundary)

One reason that this technique is so facile is that the construction of Thiessen polygons is a

c DOCS JRB GE2oooEPCpapcr an 6 Ogden Environmental and Energy Services

hardwired operation in most commercially-available Geographic Information System (GIS)

applications

The area inside each polygon is then used to weight the samples in statistical calculations which

is in essence saying that the probability of encountering the concentration represented by a

certain polygon is equal to the proportion of the total exposure area represented by the area in

that polygon The area-weighted concentration is calculated using the following formula (Isaaks

and Srivastava 1989)

Where

X(bar)sc = area weighted mean concentration (mgkg)

C| = the concentration representing the condition within polygon

where there are = 1 through n polygons and

pi = the proportion of the total area that is incorporated in polygon (unitless)

This approach includes sampling data from all portions of the exposure area and corrects for the

biases in the sampling data by taking into consideration the spacing of the samples within the

exposure area This is the approach that was used by GE and approved by EPA for calculating

EPCs for the banks of the Upper 2 Mile Reach of the Housatonic River and it is the approach

specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which

include a number of floodplain properties along the River) Hence use of this approach for the

Lower Housatonic River would lead to a consistent approach for the different portions of the

overall Site subject to the Consent Decree Moreover spatial averaging has also been used at

other Superfund sites For example such an approach was used at the Koppers Company

Superfund site in Charleston SC (EPA 1995) where biased sampling was also conducted To

offset that bias a nearest neighbor approach was used This involved placing the sampling

data in clusters by gridding the site and calculating an average concentration for each grid using

the sampling data available within it Then the average concentrations for each grid were

c xxxTSjRB GE2oooEPCpraquoper2 laquon 7 Ogden Environmental and Energy Services

averaged to yield an average concentration for the entire exposure area In addition at the

Fields Brook site in Ashtabula Ohio EPA Region V defined the residential cleanup goal as the

average goal across the exposure unit (EPA 1997b)

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the:

"[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure." (p. 15)

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper 2 Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and as demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
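The resampling procedure described above can be sketched as follows; the dataset is hypothetical, and numpy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is repeatable

# Hypothetical soil concentrations (ppm) for one exposure area.
data = np.array([0.1, 0.4, 0.9, 1.6, 2.2, 3.5, 4.1, 6.8, 9.0, 14.2])

# Draw a resample (with replacement) the same size as the dataset,
# archive its mean, and repeat until the distribution of means is stable.
n_iter = 5000
means = np.array([rng.choice(data, size=data.size, replace=True).mean()
                  for _ in range(n_iter)])

# The 95th percentile of the archived means approximates the 95% UCL.
ucl95 = float(np.percentile(means, 95))
```

For a dataset like this one, the bootstrap 95% UCL falls between the arithmetic mean (4.28 ppm here) and the maximum value, rather than above the maximum as the H-statistic can produce.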

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of the samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.

Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 x 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.

This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area, and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.

These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) the EPA approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
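These results can be reproduced in outline with a short script using the Table 1 values (decimal points restored); the bootstrap figure will vary slightly with the random seed, and no H-statistic calculation is attempted here, since it requires Land's tabulated H values.

```python
import numpy as np

rng = np.random.default_rng(42)

# Table 1 data: polygon areas (ft^2) and total PCB (ppm) for 30 samples.
areas = np.array([138219, 105217, 110214, 31021, 11399, 7004, 14268,
                  12121, 3697, 13297, 33087, 65541, 57494, 9014, 9994,
                  2123, 2341, 7446, 2613, 44231, 2362, 8745, 5008,
                  18762, 24296, 51356, 12996, 18019, 119814, 132012],
                 dtype=float)
conc = np.array([0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77,
                 7.58, 7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7,
                 14.3, 14.3, 15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9,
                 38.9, 40.6, 42.6])

arith_mean = float(conc.mean())                       # ~13.3 ppm
aw_mean = float(np.sum(areas / areas.sum() * conc))   # area-weighted mean

# Straight bootstrap 95% UCL of the mean.
boot = np.array([rng.choice(conc, size=conc.size, replace=True).mean()
                 for _ in range(5000)])
boot_ucl = float(np.percentile(boot, 95))
```

The area-weighted mean comes out above the arithmetic mean because samples 29 and 30 carry two of the largest polygons, and the bootstrap 95% UCL stays well below the 42.6 ppm maximum.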

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above, and as demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River, and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993), and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper 2 Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV. Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, Mile Removal Action, and Consolidation. U.S. Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River, Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.

Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
      1               138219                0.022
      2               105217                0.028
      3               110214                1.07
      4                31021                1.63
      5                11399                1.67
      6                 7004                2.01
      7                14268                2.19
      8                12121                3.25
      9                 3697                3.77
     10                13297                7.58
     11                33087                7.61
     12                65541                7.62
     13                57494                7.71
     14                 9014                7.99
     15                 9994                8.72
     16                 2123                9.69
     17                 2341                9.72
     18                 7446               12.7
     19                 2613               14.3
     20                44231               14.3
     21                 2362               15.4
     22                 8745               16.3
     23                 5008               18.7
     24                18762               19.1
     25                24296               20.2
     26                51356               30.8
     27                12996               32.9
     28                18019               38.9
     29               119814               40.6
     30               132012               42.6

Table 2. Comparison of Mean and 95% UCL Exposure Point Concentrations for the Hypothetical Floodplain Property Using the Four Approaches (EPA arithmetic mean/H-statistic, area-weighted average, bootstrap, and spatial bootstrap) [table values are not legible in the scanned original; the key results are given in the accompanying text]

Page 30: Comments of the Genera Electril Companc yComments of the Genera Electril Companc y on EPA's Fina Humal Healtn h Ris Assessmenk Wort k Plan ... Consent Decree). Phas 1 involvee s the

Area-Weighted Averaging (Spatial Averaging)

When samples are collected randomly within an exposure area and exposures are likely to occur

randomly and with equal likelihood in all parts of that exposure area the simple arithmetic mean

is most representative of the long-term EPC However if sampling is intentionally biased

towards more contaminated areas the sampling database no longer accurately represents the

areas of potential exposure This is the case with the sampling outlined in EPAs Work Plan for

the Lower Housatonic River Contaminant delineation is an important goal of the sampling

effort on the lower Housatonic River Consequently sampling is intentionally focused on areas

(eg floodplain soils) that are known or suspected of being contaminated Because many of the

samples in these datasets will be taken from high concentration areas with limited samples taken

from uncontaminated areas calculating a mean using typical statistical procedures gives undue

weight to the locations where high concentrations exist A dataset obtained in this way cannot be

used to reliably calculate an arithmetic mean or its 95 UCL because it is unlikely to be

representative of the overall exposure area

This problem can be substantially mitigated by area-weighting the sample results so that the

importance of each sample in the calculation of the EPC is directly proportional to the fraction of

the exposure area that it represents Area-weighted averaging may be conducted in a number of

ways with differing levels of complexity In some cases polygonal areas are constructed by

drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen

polygons Clifford et al 1995) in more complex approaches unbiased estimates of

concentration and variance change with distance and the results are used to construct a spatial

grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava 1989)

Because kriging is data intensive and unlikely to be feasible for many of the exposure areas in

the Rest-of-the River investigation area-weighting using Thiessen polygons is a more

appropriate approach The process is to draw perpendicular lines that are exactly equidistant

between contiguous samples until each sample is surrounded by a polygon Polygons at the

periphery of an identified exposure area are truncated at its boundary (eg a property boundary)

One reason that this technique is so facile is that the construction of Thiessen polygons is a

c DOCS JRB GE2oooEPCpapcr an 6 Ogden Environmental and Energy Services

hardwired operation in most commercially-available Geographic Information System (GIS)

applications

The area inside each polygon is then used to weight the samples in statistical calculations which

is in essence saying that the probability of encountering the concentration represented by a

certain polygon is equal to the proportion of the total exposure area represented by the area in

that polygon The area-weighted concentration is calculated using the following formula (Isaaks

and Srivastava 1989)

Where

X(bar)sc = area weighted mean concentration (mgkg)

C| = the concentration representing the condition within polygon

where there are = 1 through n polygons and

pi = the proportion of the total area that is incorporated in polygon (unitless)

This approach includes sampling data from all portions of the exposure area and corrects for the

biases in the sampling data by taking into consideration the spacing of the samples within the

exposure area This is the approach that was used by GE and approved by EPA for calculating

EPCs for the banks of the Upper 2 Mile Reach of the Housatonic River and it is the approach

specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which

include a number of floodplain properties along the River) Hence use of this approach for the

Lower Housatonic River would lead to a consistent approach for the different portions of the

overall Site subject to the Consent Decree Moreover spatial averaging has also been used at

other Superfund sites For example such an approach was used at the Koppers Company

Superfund site in Charleston SC (EPA 1995) where biased sampling was also conducted To

offset that bias a nearest neighbor approach was used This involved placing the sampling

data in clusters by gridding the site and calculating an average concentration for each grid using

the sampling data available within it Then the average concentrations for each grid were

c xxxTSjRB GE2oooEPCpraquoper2 laquon 7 Ogden Environmental and Energy Services

averaged to yield an average concentration for the entire exposure area In addition at the

Fields Brook site in Ashtabula Ohio EPA Region V defined the residential cleanup goal as the

average goal across the exposure unit (EPA 1997b)

Use of spatial averaging is also supported by the comments of EPAs Science Advisory Board

(SAB) (EPA 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human

Health Evaluation Manual In that review the SAB stated that the

[ejstimate of the RME cannot ignore the distribution of contamination at the site and the

distribution of individual behaviors which lead to exposure Rather the spatial distribution

of the concentration over the site must be considered along with a distribution reflecting the

relative frequency with which people are likely to visit different parts of the site For this

reason any summary measure of concentration (such as the average proposed in the RAGS

document) that does not take into account the spatial distribution of the underlying samples is

likely to be inadequate Therefore the Committee believes that the Agency should give

strong consideration to incorporating methods such as kriging or triangulation that take

into account the spatial distribution of contamination to characterize exposure (p 15)

In the Consent Decree for this Site EPA determined that use of the spatial averaging approach is

fully protective of human health and the environment Indeed in responding to comments on the

Upper 2 Mile Reach Removal Action EPA specifically justified this spatial averaging approach

so long as sufficient data are available (EPA 1999) As EPA recognized use of this approach

requires the availability of adequate data to support the calculation of representative spatial

averages over the exposure area However the need for adequate data exists regardless of the

averaging method used to derive the EPCs and is particularly marked when one attempts to use

the H-statistic approach Under the phased approach that EPA has outlined in the Human

Health Risk Assessment Work Plan such sampling would only be necessary in areas where the

95 UCL exceeds the applicable screening risk-based concentration (SRBC) In such areas the

conduct of additional sampling is warranted in order to avoid the use of the extreme and

unrepresentative 95 UCL values as the EPCs Thus GE urges EPA to collect adequate

additional samples as necessary for individual exposure areas to ensure that reasonable and

cvDocsjRBOE2oooEpcpj[rn 8 Ogden Environmental and Energy Services

representative estimates of the average concentration can be calculated for the risk assessment of

each exposure area

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously GE supports the use of area-weighted arithmetic averages as the most

representative means of calculating EPCs However if EPA rejects this approach or for areas

where the sampling data are not considered sufficient to support use of such spatial averaging

there are alternative methods of calculating the 95 UCL which would result in more reliable

and representative EPCs than use of the H-statistic As discussed previously and demonstrated

by Singh et al (EPA 1997a) and Schulz and Griffin (1999) the H-statistic is likely to

substantially overestimate the 95 UCL particularly where datasets are small These EPA

contractors and consultants have evaluated and suggested alternative and superior approaches for

calculating the 95 UCL Of these the most straightforward is bootstrapping which allows

calculation of a probability distribution of EPCs and is able to offset some of the statistical

limitations associated with small datasets It provides a method for calculating a more reliable

estimator of the EPC particularly when datasets are small and data do not exactly fit either a

normal or lognormal distribution This technique is described below and is recommended as an

alternative and preferable method to the use of the H-statistic for calculating the 95 UCL for

exposure areas in the Lower Housatonic River risk assessment

The Bootstrap Method

Similar to Monte Carlo sampling bootstrapping is a process where a subpopulation of the overall

dataset of analytical values is selected at random and a mean is calculated This mean is

archived and the statistical sampling procedure is repeated with replacement of the data The

bootstrapped means are collected until their distribution stabilizes (this often requires many

iterations of sampling) The 95 percentile of this distribution of means is equivalent to the 95

UCL of the mean Bootstrapping may be viewed as an imitation of the Central Limit Theorem of

statistics which states that if a population is repeatedly sampled for purposes of calculating an

arithmetic mean the distribution of the means will ultimately take the form of a normal

c -nocsjRB GE2oooEPcPraquoplaquo an 9 Ogden Environmental and Energy Services

distribution This Theorem applies to any population regardless of its statistical distribution

Bootstrapping relies on the same theory except that instead of repeatedly taking new samples

bootstrapping models select repeatedly from a single dataset

This procedure frequently results in a lower estimate of the 95 UCL than the 95 H-UCL and

is more technically justifiable The reduction obtained from this procedure varies according to

the amount of data available for sampling and its variance but is consistently lower than the 95

H-UCL (EPA 1997a Schulz and Griffin 1999) The difference between these two measures is

more dramatic for exposure areas with fewer samples Thus the bootstrapping method is

particularly well-suited to exposure areas like many along the Lower Housatonic River where

sampling is limited

When sample sizes are small the distribution for the bootstrap run may not appear normal This

is because the number of samples required to be included in each subsample in order to approach

normality is larger than is available from the site data This can be mitigated by further

sampling It should be noted however that as the sample size increases the variance becomes

tighter As such more samples will decrease the estimate of the 95 UCL Thus the 95

UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95

UCL that might be obtained from a larger dataset

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area or the biases associated with that distribution. To reflect this spatial distribution, a modified bootstrapping approach can be used that accounts for the locations and densities of the samples collected on each property. This modified approach (spatial bootstrapping), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.


Spatial bootstrapping combines area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded to the closest integer. An expanded dataset is then created from the original dataset to reflect the ratio of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded, normalized areas by a selected value; the resulting values give the area-weighted number of replicates of each sample result in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7 and the dataset is expanded using a factor of 5, then sample X is represented in the expanded dataset as 35 (7 x 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.

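The three steps above can be combined into a short sketch. The parameter defaults mirror figures quoted in the text (an expansion factor of 5, 250 results per iteration, 5,000 iterations), but the function itself is our illustration, not code from the Work Plan:

```python
import random
import statistics

def spatial_bootstrap_ucl95(concs, areas, expansion=5,
                            subsample=250, n_iter=5000, seed=1):
    # Steps 1-2: normalize each Thiessen polygon area to the smallest
    # polygon, round to the nearest integer, and replicate each sample
    # result (rounded ratio x expansion) times in the expanded dataset.
    smallest = min(areas)
    expanded = []
    for conc, area in zip(concs, areas):
        expanded.extend([conc] * (round(area / smallest) * expansion))
    # Step 3: draw `subsample` values with replacement, record the mean,
    # and repeat; the 95th percentile of the collected means is the
    # area-weighted 95% UCL, and their median sits near the
    # area-weighted mean.
    rng = random.Random(seed)
    means = sorted(statistics.fmean(rng.choices(expanded, k=subsample))
                   for _ in range(n_iter))
    return means[int(0.95 * n_iter) - 1]

# Three polygons covering 10%, 20%, and 70% of a hypothetical area
ucl = spatial_bootstrap_ucl95([1.0, 2.0, 10.0], [100.0, 200.0, 700.0])
```

In this example the area-weighted mean is 7.5 ppm, and the spatially bootstrapped 95% UCL falls modestly above it while remaining below the maximum observed value.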

This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is preferable to the H-statistic approach for several reasons. First, as shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples in each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data; it can therefore be applied successfully to all sampling datasets, including those that fit neither a normal nor a lognormal distribution.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.

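The summary statistics quoted above can be checked directly from the Table 1 concentrations (decimal points restored here from the range and mean stated in the text):

```python
import statistics

# Total PCB concentrations (ppm) for the 30 hypothetical surface soil
# samples listed in Table 1
concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
         7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
         15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6]

print(min(concs), max(concs))             # 0.022 42.6
print(round(statistics.fmean(concs), 1))  # 13.3
```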

These data have been used to calculate mean and 95% UCL EPCs using four different approaches: (1) EPA's approach (arithmetic mean and H-statistic); (2) an area-weighted average; (3) a bootstrap; and (4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained from straight bootstrapping, consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30 in Table 1) have very large polygon areas; they are therefore weighted more heavily in the approaches that include area-weighting.

Estimates of the 95% UCL, by contrast, vary substantially. The 95% UCL values obtained from bootstrapping and spatial bootstrapping are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatially bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large (30 samples) and the range of concentrations is fairly typical of the concentrations likely to be found on floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few


showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL (106 ppm) that exceeds the highest measured concentration (42.6 ppm) by a factor of roughly 2.5, with the result that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. This result is particularly troubling in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in areas of high concentration, and/or sample numbers are more limited. Given the tendency of the H-statistic to overestimate the 95% UCL, it is likely that maximum concentration values would need to be used for many of those properties. Such an end result is unsupportable: it would overstate site risks and could lead to the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above and demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculate the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply substituting the maximum detected concentration in the area whenever the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area represented by each sample. This approach allows a more realistic and representative estimate of the level of exposure likely to be experienced by individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data


necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993), and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper 2-Mile Reach of the River and for the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support it, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A), Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A.K. Singh, and M. Engelhardt. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision, Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, Mile Removal Action, and Consolidation. U.S. Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River, Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.


Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
      1               138,219               0.022
      2               105,217               0.028
      3               110,214               1.07
      4                31,021               1.63
      5                11,399               1.67
      6                 7,004               2.01
      7                14,268               2.19
      8                12,121               3.25
      9                 3,697               3.77
     10                13,297               7.58
     11                33,087               7.61
     12                65,541               7.62
     13                57,494               7.71
     14                 9,014               7.99
     15                 9,994               8.72
     16                 2,123               9.69
     17                 2,341               9.72
     18                 7,446              12.7
     19                 2,613              14.3
     20                44,231              14.3
     21                 2,362              15.4
     22                 8,745              16.3
     23                 5,008              18.7
     24                18,762              19.1
     25                24,296              20.2
     26                51,356              30.8
     27                12,996              32.9
     28                18,019              38.9
     29               119,814              40.6
     30               132,012              42.6

Table 2. Mean and 95% UCL EPC Estimates for the Hypothetical Property Calculated Using the Four Approaches (values quoted in the text)


hardwired operation in most commercially available Geographic Information System (GIS) applications.

The area inside each polygon is then used to weight the samples in statistical calculations. In essence, this says that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

    Xbar = SUM(i = 1 to n) [ p_i x C_i ]

where:

    Xbar = area-weighted mean concentration (mg/kg);
    C_i = the concentration representing the conditions within polygon i, where there are i = 1 through n polygons; and
    p_i = the proportion of the total area that is incorporated in polygon i (unitless).
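As a minimal sketch of this calculation (the function and variable names are ours, not from the cited reference):

```python
def area_weighted_mean(concs, areas):
    """Compute sum(p_i * C_i), where p_i is each Thiessen polygon's
    share of the total exposure area and C_i is its concentration."""
    total = sum(areas)
    return sum(c * (a / total) for c, a in zip(concs, areas))

# Two polygons: 3/4 of the area at 2 mg/kg, 1/4 at 10 mg/kg
print(area_weighted_mean([2.0, 10.0], [75.0, 25.0]))  # 4.0
```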

This approach includes sampling data from all portions of the exposure area and corrects for biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper 2-Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid cell using the sampling data available within it. The average concentrations for each grid cell were then


averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the "[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure" (p. 15).

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper 2-Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and it is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would be necessary only in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary for individual exposure areas, to ensure that reasonable and


representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support such spatial averaging, there are alternative methods of calculating the 95% UCL that would yield more reliable and representative EPCs than use of the H-statistic. As discussed previously, and as demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative, superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or a lognormal distribution. This technique is described below and is recommended as an alternative, preferable method to the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is

archived, and the statistical sampling procedure is repeated with replacement of the data. The bootstrapped means are collected until their distribution stabilizes; the 95th percentile of that distribution of means is the 95% UCL of the mean.

bootstrapping be used to calculate the 95 UCL This approach is more reliable and yields more

representative EPCs than the H-statistic method (EPA 1997a Schulz and Griffin 1999) offsets

many of the limitations associated with the H-statistic and is consistent with EPA guidance

References

Clifford PA DE Barchers DF Ludwig RL Sielken JS Klinginsmith RV Graham and

MI Banton 1995 Hazardrisk assessments An approach to quantifying spatial components of

exposure for ecological risk assessment Environmental Toxicology and Chemistry 14(5)895shy

906

EPA 1989 Risk Assessment Guidance for Superfund Volume I Human Health Evaluation

Manual (Part A) - Interim Final US Environmental Protection Agency Office of Emergency

and Remedial Response Washington DC EPA5401-89-002 July

EPA 1992 Supplemental Guidance to RAGS Calculating the Concentration Term OSWER

Bulletin Volume 1 Number 1 Publication 92857-08 May

EPA 1993 An SAB Report Superfund Site Health Risk Assessment Guidelines Review of the

Office of Solid Waste and Emergency Responses Draft Risk Assessment Guidance for Superfund

Human Health Evaluation Manual by the Environmental Health Committee EPA-SAB-EHCshy

93-007 February

c DOCS JRB GE2oooEPCpapcr2 m 15 Ogden Environmental and Energy Services

EPA 1994 A change in the approach for estimating the average and reasonable maximum

exposure scenario for human health risk assessments EPA Region I Risk Updates Number 2

August

EPA 1995 Final Baseline Risk Assessment for Koppers Company Inc (Charleston Plant)

Remedial InvestigationFeasibility Study Oversight Black and Veatch Waste Science Inc EPA

Region IV Contract 68-W9-0055

EPA 1997a The Lognormal Distribution in Environmental Applications Prepared for EPA by

A Singh A Singh and M Engelhard US Environmental Protection Agency Office of

Research and Development EPA600R-97006 December

EPA 1997b Record of Decision Fields Brook Superfund Site US Environmental Protection

Agency Region V EPA54lR-97070 August 15

EPA 1999 Responsiveness Summary for Allendale School Removal Action Mile Removal

Action and Consolidation US Environmental Protection Agency Region I October

Isaaks EH and RM Srivastava 1989 Applied Geostatistics Oxford University Press New

York

Land CE 1975 Tables of confidence limits for linear functions of the normal mean and

variance Selected Tables in Mathematical Statistics 3365-410

Schulz TW and S Griffin 1999 Estimating risk assessment exposure point concentrations

when the data are not normal or lognormal Risk Analysis 19577-584

Weston 2000 Supplemental Investigation Work Plan for the Lower Housatonic River Final

Roy F Weston Inc Manchester NH DCN GEP2-02-000-AAME February 2000

c DOCS jRBGE2ooooEPcPaper2 m 16 Ogden Environmental and Energy Services

5

10

15

20

25

30

Table 1 Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number

1

2 j

4

6

7

8

9

1 1

12

13

14

16

17

18

19

21

22

23

24

26

27

28

29

Polygon Area (ft2)

138219

105217

110214

31021

11399

7004

14268

12121

3697

13297

33087

65541

57494

9014

9994

2123

2341

7446

2613

44231

2362

8745

5008

18762

24296

51356

12996

18019

119814

132012

Total PCB (ppm)

0022

0028

107

163

167

201

2 19

325

377

758

761

762

771

799

872

969

972

127

143

143

154

163

187

191

202

308

329

389

406

426

Projects GE HousgtResl of RuerEPC MemoVEPC Fmraquol 05l600Tiblcslamp2 ill

R U

O C ^gt X R

M _C EL o 2 VI

o O 3)

is R c

J U 1

_

II ^

c R 4)

00 ri ^O

mdash

bdquo ^O IT)

c o o ra

R

B

C s s pound -5

W)e EL c RU en o O a

U U 3 V

bullI

C Ril

O t

p^

Hi

T3

X03 agt E pound u n pound1

pound0

o r-T

II u

O M R

4raquo gt

bullsi

0 0

IT ON

nshy

^

bull3

o ^ S

amp CJ) c

c z o laquo T3

UM laquo

O k

I I C CJC R C

is ltshy R o u

8 lt R 3

B SE c o x

R

3

^ R

2

bull| R

i X

C R

i

U

1

~o R

^laquo

^O

12

(N

O

c-i

H

T3 0gt

ra

Crt

S I 2

III iS mdash shybdquo 2 15 T3 C = u c o ltu ^ ogt C raquoshy r ^

pound 2 lt2 gt o u p o5 2 = pound 3 LO _ shy u ra

bullpound U X)

9 2 -5 = sect I J3 if shy

Q Q ra

V) W

ilaquo S ca ^ laquo u c mdash

a s e ta Q

Page 32: Comments of the Genera Electril Companc yComments of the Genera Electril Companc y on EPA's Fina Humal Healtn h Ris Assessmenk Wort k Plan ... Consent Decree). Phas 1 involvee s the

averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).

Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the "[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure" (p. 15).

In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper 2 Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and as demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative, superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative, preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.
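For reference, the H-statistic 95% UCL discussed here is computed from the log-transformed data using Land's (1975) tabulated factors. In the notation below (which is ours, not the Work Plan's), ȳ and s_y are the mean and standard deviation of the n log-transformed concentrations, and H0.95 is Land's tabulated value:

```latex
\mathrm{UCL}_{0.95} \;=\; \exp\!\left( \bar{y} \;+\; \frac{s_y^{2}}{2} \;+\; \frac{s_y \, H_{0.95}}{\sqrt{n-1}} \right)
```

Because s_y enters the exponent both directly and squared, the H-UCL grows very rapidly when the log-scale variance is large and n is small, which is the source of the overestimation behavior described above.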

The Bootstrap Method

Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
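The resample-and-archive procedure described above can be sketched in a few lines; this is a minimal illustration, and the function name and use of NumPy are ours, not the Work Plan's:

```python
import numpy as np

def bootstrap_ucl95(concentrations, n_iter=5000, seed=0):
    """Percentile-bootstrap 95% UCL of the arithmetic mean.

    Repeatedly resamples the dataset with replacement, archives the
    mean of each resample, and reports the 95th percentile of the
    resulting distribution of means.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(concentrations, dtype=float)
    means = [rng.choice(data, size=data.size, replace=True).mean()
             for _ in range(n_iter)]
    return float(np.percentile(means, 95))
```

Consistent with the Central Limit Theorem analogy above, after several thousand iterations the archived means settle into an approximately normal distribution regardless of the shape of the underlying data.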

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.

Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded, normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
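The three steps above can be sketched as follows; this is an illustrative implementation under our own naming, with the polygon areas assumed to have already been delineated by the Thiessen method:

```python
import numpy as np

def spatial_bootstrap_ucl95(concentrations, polygon_areas,
                            expand_factor=5, subsample_size=250,
                            n_iter=5000, seed=0):
    """Area-weighted ("spatial") bootstrap 95% UCL of the mean."""
    rng = np.random.default_rng(seed)
    concs = np.asarray(concentrations, dtype=float)
    areas = np.asarray(polygon_areas, dtype=float)
    # Steps 1-2: normalize each Thiessen polygon area to the smallest
    # polygon, round to the nearest integer, and expand the dataset so
    # each result appears (rounded ratio x expand_factor) times.
    ratios = np.rint(areas / areas.min()).astype(int)
    expanded = np.repeat(concs, ratios * expand_factor)
    # Step 3: draw subsamples with replacement from the expanded
    # dataset, archive each mean, and take the 95th percentile.
    means = [rng.choice(expanded, size=subsample_size,
                        replace=True).mean()
             for _ in range(n_iter)]
    return float(np.percentile(means, 95))
```

Because the replication in step 2 encodes the area weights, the median of the archived means converges on the area-weighted average of the original dataset, as the text notes.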

c jraquocsjRBGE2oooEpcpiper an 11 Ogden Environmental and Energy Services

This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has, in the past, often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.
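The arithmetic and area-weighted means for this dataset can be reproduced directly from the Table 1 values; the 13.3 ppm arithmetic mean matches the text, while the area-weighted figure is our own calculation from the same data:

```python
import numpy as np

# Table 1: Thiessen polygon areas (ft^2) and total PCB (ppm) for the
# 30 hypothetical surface soil samples, in sample-number order.
areas = np.array([
    138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121,
    3697, 13297, 33087, 65541, 57494, 9014, 9994, 2123, 2341,
    7446, 2613, 44231, 2362, 8745, 5008, 18762, 24296, 51356,
    12996, 18019, 119814, 132012], dtype=float)
pcbs = np.array([
    0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77,
    7.58, 7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7,
    14.3, 14.3, 15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9,
    38.9, 40.6, 42.6])

arithmetic_mean = pcbs.mean()                      # ~13.3 ppm
area_weighted_mean = np.average(pcbs, weights=areas)
print(round(arithmetic_mean, 1), round(area_weighted_mean, 1))
```

Consistent with the discussion of Table 2 below, the area-weighted mean comes out somewhat higher than the arithmetic mean, because samples 29 and 30 pair the highest concentrations with very large polygons.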


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: (1) the EPA approach (arithmetic mean and H-statistic); (2) an area-weighted average; (3) a bootstrap; and (4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30; Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration (and the bootstrapped 95% UCL estimates by a factor of more than six), with the result that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing, in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly concentrated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above, and as demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993), and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper 2 Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. U.S. Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River - Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.

Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number   Polygon Area (ft²)   Total PCB (ppm)
 1                 138219              0.022
 2                 105217              0.028
 3                 110214              1.07
 4                  31021              1.63
 5                  11399              1.67
 6                   7004              2.01
 7                  14268              2.19
 8                  12121              3.25
 9                   3697              3.77
10                  13297              7.58
11                  33087              7.61
12                  65541              7.62
13                  57494              7.71
14                   9014              7.99
15                   9994              8.72
16                   2123              9.69
17                   2341              9.72
18                   7446             12.7
19                   2613             14.3
20                  44231             14.3
21                   2362             15.4
22                   8745             16.3
23                   5008             18.7
24                  18762             19.1
25                  24296             20.2
26                  51356             30.8
27                  12996             32.9
28                  18019             38.9
29                 119814             40.6
30                 132012             42.6


[Table 2, presenting the mean and 95% UCL EPC estimates calculated by the four approaches (arithmetic mean/H-statistic, area-weighted average, bootstrap, and spatial bootstrap), is not legible in this copy.]

Page 33: Comments of the Genera Electril Companc yComments of the Genera Electril Companc y on EPA's Fina Humal Healtn h Ris Assessmenk Wort k Plan ... Consent Decree). Phas 1 involvee s the

representative estimates of the average concentration can be calculated for the risk assessment of

each exposure area

Bootstrapping Approach to Calculating the Upper Bound EPC

As stated previously GE supports the use of area-weighted arithmetic averages as the most

representative means of calculating EPCs However if EPA rejects this approach or for areas

where the sampling data are not considered sufficient to support use of such spatial averaging

there are alternative methods of calculating the 95 UCL which would result in more reliable

and representative EPCs than use of the H-statistic As discussed previously and demonstrated

by Singh et al (EPA 1997a) and Schulz and Griffin (1999) the H-statistic is likely to

substantially overestimate the 95 UCL particularly where datasets are small These EPA

contractors and consultants have evaluated and suggested alternative and superior approaches for

calculating the 95 UCL Of these the most straightforward is bootstrapping which allows

calculation of a probability distribution of EPCs and is able to offset some of the statistical

limitations associated with small datasets It provides a method for calculating a more reliable

estimator of the EPC particularly when datasets are small and data do not exactly fit either a

normal or lognormal distribution This technique is described below and is recommended as an

alternative and preferable method to the use of the H-statistic for calculating the 95 UCL for

exposure areas in the Lower Housatonic River risk assessment

The Bootstrap Method

Similar to Monte Carlo sampling bootstrapping is a process where a subpopulation of the overall

dataset of analytical values is selected at random and a mean is calculated This mean is

archived and the statistical sampling procedure is repeated with replacement of the data The

bootstrapped means are collected until their distribution stabilizes (this often requires many

iterations of sampling) The 95 percentile of this distribution of means is equivalent to the 95

UCL of the mean Bootstrapping may be viewed as an imitation of the Central Limit Theorem of

statistics which states that if a population is repeatedly sampled for purposes of calculating an

arithmetic mean the distribution of the means will ultimately take the form of a normal

c -nocsjRB GE2oooEPcPraquoplaquo an 9 Ogden Environmental and Energy Services

distribution This Theorem applies to any population regardless of its statistical distribution

Bootstrapping relies on the same theory except that instead of repeatedly taking new samples

bootstrapping models select repeatedly from a single dataset

This procedure frequently results in a lower estimate of the 95 UCL than the 95 H-UCL and

is more technically justifiable The reduction obtained from this procedure varies according to

the amount of data available for sampling and its variance but is consistently lower than the 95

H-UCL (EPA 1997a Schulz and Griffin 1999) The difference between these two measures is

more dramatic for exposure areas with fewer samples Thus the bootstrapping method is

particularly well-suited to exposure areas like many along the Lower Housatonic River where

sampling is limited

When sample sizes are small the distribution for the bootstrap run may not appear normal This

is because the number of samples required to be included in each subsample in order to approach

normality is larger than is available from the site data This can be mitigated by further

sampling It should be noted however that as the sample size increases the variance becomes

tighter As such more samples will decrease the estimate of the 95 UCL Thus the 95

UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95

UCL that might be obtained from a larger dataset

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area, or the biases associated with that distribution. To address this, a modified bootstrapping approach can be used that reflects the locations and densities of samples collected on each property. This modified bootstrapping approach (spatial bootstrapping), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.


Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It proceeds in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset, reflecting the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7 and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 x 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
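The three steps above can be sketched as follows. This is an illustrative implementation under stated assumptions: the sample data are hypothetical, the polygon areas would in practice come from a Thiessen tessellation of the actual sampling locations, the resample size of 300 is an assumed value within the 250-500 range described, and the extra expansion multiplier mentioned in the text is omitted because it only scales the replicate counts without changing their ratios.

```python
import random
import statistics

def spatial_bootstrap_95_ucl(concs, areas, resample_size=300,
                             n_iterations=5000, seed=0):
    """Area-weighted (spatial) bootstrap estimate of the 95% UCL.

    concs : sample concentrations (ppm)
    areas : Thiessen polygon area for each sample (same order)
    """
    # Step 2: normalize each polygon area to the smallest polygon and
    # round to the nearest integer, giving a replication count.
    smallest = min(areas)
    weights = [round(a / smallest) for a in areas]

    # Build the expanded dataset: each sample appears in proportion
    # to the fraction of the exposure area it represents.
    expanded = [c for c, w in zip(concs, weights) for _ in range(w)]

    # Step 3: resample the expanded dataset with replacement, record
    # the mean of each iteration, and take the 95th percentile.
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(expanded, k=resample_size))
        for _ in range(n_iterations)
    )
    return means[int(0.95 * n_iterations) - 1]

# Hypothetical three-sample illustration: the 42 ppm sample sits in a
# polygon ten times the size of the smallest, so it dominates the EPC.
concs = [1.5, 8.0, 42.0]
areas = [2000.0, 6000.0, 20000.0]
ucl = spatial_bootstrap_95_ucl(concs, areas)
```

In this toy case the area weighting pulls the EPC toward the concentration measured in the largest polygon, whereas an unweighted bootstrap would treat all three samples equally.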


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: (1) the EPA approach (arithmetic mean and H-statistic); (2) an area-weighted average; (3) a bootstrap; and (4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained from straight bootstrapping; this is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas; thus, they are weighted more heavily in approaches that include area-weighting.
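The two mean estimates can be reproduced directly from the hypothetical Table 1 data; the sketch below computes only the arithmetic and area-weighted means (the bootstrap quantities require the resampling procedures shown earlier):

```python
# Hypothetical Table 1 data: polygon areas (sq ft) and total PCB (ppm),
# listed in sample order 1-30.
areas = [138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121,
         3697, 13297, 33087, 65541, 57494, 9014, 9994, 2123, 2341,
         7446, 2613, 44231, 2362, 8745, 5008, 18762, 24296, 51356,
         12996, 18019, 119814, 132012]
concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77,
         7.58, 7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3,
         14.3, 15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6,
         42.6]

# Arithmetic mean: every sample counts equally.
arithmetic_mean = sum(concs) / len(concs)

# Area-weighted average: each sample counts in proportion to the
# fraction of the exposure area its Thiessen polygon represents.
weighted_mean = sum(a * c for a, c in zip(areas, concs)) / sum(areas)
```

Because the high-concentration samples 29 and 30 occupy two of the largest polygons, the area-weighted average (roughly 15.7 ppm on these assumed data) comes out somewhat above the arithmetic mean of 13.3 ppm, consistent with the comparison described for Table 2.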

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration by a factor of more than two, with the result that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly concentrated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above and demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993), and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper 2-Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89-002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant), Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhardt. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision, Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, Mile Removal Action, and Consolidation. U.S. Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River. Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.


Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
 1                  138,219               0.022
 2                  105,217               0.028
 3                  110,214               1.07
 4                   31,021               1.63
 5                   11,399               1.67
 6                    7,004               2.01
 7                   14,268               2.19
 8                   12,121               3.25
 9                    3,697               3.77
10                   13,297               7.58
11                   33,087               7.61
12                   65,541               7.62
13                   57,494               7.71
14                    9,014               7.99
15                    9,994               8.72
16                    2,123               9.69
17                    2,341               9.72
18                    7,446              12.7
19                    2,613              14.3
20                   44,231              14.3
21                    2,362              15.4
22                    8,745              16.3
23                    5,008              18.7
24                   18,762              19.1
25                   24,296              20.2
26                   51,356              30.8
27                   12,996              32.9
28                   18,019              38.9
29                  119,814              40.6
30                  132,012              42.6

Table 2. Comparison of Mean and 95% UCL EPC Estimates for the Hypothetical Floodplain Property

[The body of Table 2 is not legible in the source. Per the accompanying text, it compared the four approaches: the arithmetic mean (13.3 ppm) with the H-statistic 95% UCL (106 ppm); the bootstrap mean (13.3 ppm) with its 95% UCL (17.06 ppm); and the area-weighted average and spatial bootstrap mean (identical to each other) with the spatial bootstrap 95% UCL (16.38 ppm).]

Page 34: Comments of the Genera Electril Companc yComments of the Genera Electril Companc y on EPA's Fina Humal Healtn h Ris Assessmenk Wort k Plan ... Consent Decree). Phas 1 involvee s the

distribution This Theorem applies to any population regardless of its statistical distribution

Bootstrapping relies on the same theory except that instead of repeatedly taking new samples

bootstrapping models select repeatedly from a single dataset

This procedure frequently results in a lower estimate of the 95 UCL than the 95 H-UCL and

is more technically justifiable The reduction obtained from this procedure varies according to

the amount of data available for sampling and its variance but is consistently lower than the 95

H-UCL (EPA 1997a Schulz and Griffin 1999) The difference between these two measures is

more dramatic for exposure areas with fewer samples Thus the bootstrapping method is

particularly well-suited to exposure areas like many along the Lower Housatonic River where

sampling is limited

When sample sizes are small the distribution for the bootstrap run may not appear normal This

is because the number of samples required to be included in each subsample in order to approach

normality is larger than is available from the site data This can be mitigated by further

sampling It should be noted however that as the sample size increases the variance becomes

tighter As such more samples will decrease the estimate of the 95 UCL Thus the 95

UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95

UCL that might be obtained from a larger dataset

Spatial Bootstrapping

While bootstrapping is a more appropriate approach for estimating the 95 UCL than is the H-

statistic method it still does not take into consideration the spatial distribution of the sampling

locations across an exposure area and the biases associated with it To reflect this spatial

distribution a modified bootstrapping approach can be used to reflect the locations and densities

of samples collected on each property This modified bootstrapping approach (spatial

bootstrapping) which incorporates area-weighting into the bootstrapping method can be used to

calculate more reliable 95 UCL estimates of the EPC for each exposure area

c DOCS JRBGE^oooEPCpaplaquor2 an 10 Ogden Environmental and Energy Services

Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with

the bootstrap method discussed above It is conducted in a series of three steps

First each sample is area-weighted using the Thiessen polygon method The weight given to

each sample is determined by calculating the proportion of the total exposure area that is

represented by the samples polygon

Second the smallest polygon area is identified and all other polygon areas are normalized with

respect to its size and are weighted accordingly in the sampling database In other words if the

area of one polygon is five times larger than the smallest polygon it is given a weight that is five

times greater in the sampling database from which values are selected for the bootstrap The

normalized areas are rounded off to the closest integer An expanded dataset is then created

from the original dataset and reflects the ratios of each polygon area to the smallest polygon area

This expansion is accomplished by multiplying the rounded normalized areas by a selected

value The resulting values represent an area-weighted number of replicates of the sample

results in the expanded dataset For example if the ratio of the sample X area to the smallest area

is 7 and the dataset is expanded using a factor of five then sample X is represented in the

expanded dataset as 35 (7 times 5) entries

Finally mean concentrations are calculated (with replacement) from the expanded dataset using

between 250 and 500 of the results (depending upon the size of the expanded dataset) for each

iteration of the bootstrap This calculation is then repeated a total of 5000 times to yield a

cumulative distribution of estimated means After sufficient iterations this cumulative

distribution will attain the symmetry of a normal distribution with the 50th percentile result equal

to the area-weighted mean calculated for the original dataset The value at the 95th percentile of

this distribution of means is the 95 UCL

c jraquocsjRBGE2oooEpcpiper an 11 Ogden Environmental and Energy Services

This approach is both consistent with EPA guidance and preferable to the approach currently

proposed by EPA Although EPA guidance recommends that the 95 UCL of the mean be used

as the EPC in order to correct for the uncertainties and limitations of the sampling database the

guidance does not dictate which method must be used to calculate the 95 UCL While EPA

has in the past often relied on the H-statistic that method need not be used and recent reports

by EPA contractors and consultants have recommended against use of the H-statistic to avoid

systematic overestimates of the 95 UCL (EPA 1997a Schulz and Griffin 1999)

The spatial bootstrapping approach is better than the H-statistic approach for several reasons

First as has been shown by Singh et al (EPA 1997a) and Schulz and Griffin (1999)

bootstrapping provides more reliable estimates of the 95 UCL concentration Second spatial

bootstrapping allows for consideration of the location and density of the samples for each

exposure area and corrects for the relative contribution of each in the database Third this

approach mitigates the statistical impact of a small database Finally it can be used regardless of

the distribution of the sampling data Thus for all sampling datasets including those that do not

fit either a normal or lognormal distribution this approach can be used successfully

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the

calculation of the arithmetic mean and 95 UCL of a single dataset Ogden has conducted

sample calculations for a hypothetical property For this purpose Ogden developed a

hypothetical set of sampling data that are similar to the types of data that might be collected from

a single floodplain property As shown in Table 1 it is assumed that 30 surface soil samples

have been collected from a property that is roughly 1000 by 1000 feet in size The

concentrations for these hypothetical samples range from 0022 to 426 ppm with an arithmetic

mean of 133 ppm The assumed polygon areas range in size from 2123 square feet to 138219

square feet

c DOCS JRB GEY2oooEPCplaquofr2 in 12 Ogden Environmental and Energy Services

These data have been used to calculate mean and 95 UCL EPCs using four different

approaches 1) EPA approach (arithmetic mean and H-statistic) 2) an area-weighted average 3)

a bootstrap and 4) a spatial bootstrap Results of this analysis are provided in Table 2

As can be seen in Table 2 the estimated mean values are similar for all four approaches The

arithmetic mean (133 ppm) is identical to the mean obtained during straight bootstrapping This

is consistent with the Central Limit Theorem (EPA 1997a) The area-weighted average is

identical to the mean calculated in spatial bootstrapping but is somewhat higher than the

arithmetic and bootstrapped means This is because some of the sampling locations with high

concentrations (eg samples 29 and 30 Table 1) have very large polygon areas Thus they are

weighted more heavily in approaches that include area-weighting

Estimates of the 95 UCL vary substantially Using bootstrapping and spatial bootstrapping

the 95 UCL values are similar (1706 and 1638 ppm respectively) However the 95 UCL

calculated using the H-statistic (106 ppm) is considerably higher exceeding both the

bootstrapped and spatial bootstrapped values by a factor of more than six In fact the H-UCL

exceeds the maximum value (426 ppm) in the sampling database which means that the

maximum value would be substituted as the EPC Thus in this hypothetical case the EPC that

would need to be used for this exposure area would be the maximum detected concentration of

426 ppm This concentration is clearly not representative of the sampling dataset for this

hypothetical property

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach and

its result is consistent with the findings of Singh et al (EPA 1997a) and Schulz and Griffin

(1999) These limitations occur despite the fact that the sampling dataset for this hypothetical

property is reasonably large containing 30 samples and the range of concentrations is fairly

typical of the types of concentrations likely to be found in floodplain properties along the Lower

Housatonic There are no outliers to skew the results and the concentrations fall along a fairly

typical concentration gradient with a few samples showing very low concentrations a few

c noes jRBGE2oooEPCpapei an 13 Ogden Environmental and Energy Services

showing fairly high concentrations and the remainder fairly evenly distributed in between In

addition some of the largest polygon areas are associated with low concentration areas while

others are associated with the highest concentrations There is nothing about this dataset to

indicate that the samples reported are not representative of the range of concentrations that is

likely to be present on the property

Despite this the H-statistic approach results in a 95 UCL that exceeds the highest measured

concentration by a factor of more than six with the result being that the 95 UCL cannot be

used as a reliable estimate of the EPC in the risk assessment for this property Clearly this result

is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly

even distribution of sampling results These problems will be even more pronounced for certain

real properties along the Lower Housatonic River where the sampling results may be highly

skewed dense sampling occurs in high concentrated areas andor sample numbers are more

limited It is likely given the tendency of the H-statistic to overestimate the 95 UCL that

maximum concentration values will need to be used for many of those properties Such an end

result is unsupportable and will result in the overestimation of site risks It could also result in

the unnecessary remediation of a number of exposure areas

Conclusions

As discussed above and demonstrated by EPA contractors and consultants (EPA 1997a Schulz

and Griffin 1999) use of the H-statistic approach to calculating the 95 UCL in environmental

applications can and frequently does produce substantial overestimates of the true mean that are

in no way representative of the concentrations actually contacted by people in an exposure area

and hence that approach should be avoided Moreover this problem cannot be solved by simply

using the maximum detected concentration in the area if the calculated 95 UCL exceeds that

maximum These problems can be avoided through the use of area-weighted averaging which

takes into consideration the fraction of the exposure area that is represented by each sample

taken This approach allows for a more realistic and representative estimate of the level of

exposure likely to occur to individuals in the area over time Thus GE urges EPA to adopt the

area-weighted averaging approach for the Lower Housatonic River and to collect the data

c DOCS JRB GE2ooo poundpcpraquoplaquor2 tt 14 Ogden Environmental and Energy Services

necessary to support that approach Use of spatial average concentrations as EPCs is consistent

with EPA policies procedures and guidelines (EPA 1989 1993) and would be consistent with

the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the

Upper 2 Mile Reach of the River and the Removal Actions Outside the River including certain

floodplain properties along the River

However if EPA rejects this approach (or for areas where EPA determines that the data are

insufficient to support that approach and decides not to collect additional data) and insists on

estimating a 95 UCL of the mean for use as the EPC then GE recommends that spatial

bootstrapping be used to calculate the 95 UCL This approach is more reliable and yields more

representative EPCs than the H-statistic method (EPA 1997a Schulz and Griffin 1999) offsets

many of the limitations associated with the H-statistic and is consistent with EPA guidance

References

Clifford PA DE Barchers DF Ludwig RL Sielken JS Klinginsmith RV Graham and

MI Banton 1995 Hazardrisk assessments An approach to quantifying spatial components of

exposure for ecological risk assessment Environmental Toxicology and Chemistry 14(5)895shy

906

EPA 1989 Risk Assessment Guidance for Superfund Volume I Human Health Evaluation

Manual (Part A) - Interim Final US Environmental Protection Agency Office of Emergency

and Remedial Response Washington DC EPA5401-89-002 July

EPA 1992 Supplemental Guidance to RAGS Calculating the Concentration Term OSWER

Bulletin Volume 1 Number 1 Publication 92857-08 May

EPA 1993 An SAB Report Superfund Site Health Risk Assessment Guidelines Review of the

Office of Solid Waste and Emergency Responses Draft Risk Assessment Guidance for Superfund

Human Health Evaluation Manual by the Environmental Health Committee EPA-SAB-EHCshy

93-007 February

c DOCS JRB GE2oooEPCpapcr2 m 15 Ogden Environmental and Energy Services

EPA 1994 A change in the approach for estimating the average and reasonable maximum

exposure scenario for human health risk assessments EPA Region I Risk Updates Number 2

August

EPA 1995 Final Baseline Risk Assessment for Koppers Company Inc (Charleston Plant)

Remedial InvestigationFeasibility Study Oversight Black and Veatch Waste Science Inc EPA

Region IV Contract 68-W9-0055

EPA 1997a The Lognormal Distribution in Environmental Applications Prepared for EPA by

A Singh A Singh and M Engelhard US Environmental Protection Agency Office of

Research and Development EPA600R-97006 December

EPA 1997b Record of Decision Fields Brook Superfund Site US Environmental Protection

Agency Region V EPA54lR-97070 August 15

EPA 1999 Responsiveness Summary for Allendale School Removal Action Mile Removal

Action and Consolidation US Environmental Protection Agency Region I October

Isaaks EH and RM Srivastava 1989 Applied Geostatistics Oxford University Press New

York

Land CE 1975 Tables of confidence limits for linear functions of the normal mean and

variance Selected Tables in Mathematical Statistics 3365-410

Schulz TW and S Griffin 1999 Estimating risk assessment exposure point concentrations

when the data are not normal or lognormal Risk Analysis 19577-584

Weston 2000 Supplemental Investigation Work Plan for the Lower Housatonic River Final

Roy F Weston Inc Manchester NH DCN GEP2-02-000-AAME February 2000

c DOCS jRBGE2ooooEPcPaper2 m 16 Ogden Environmental and Energy Services

5

10

15

20

25

30

Table 1 Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number

1

2 j

4

6

7

8

9

1 1

12

13

14

16

17

18

19

21

22

23

24

26

27

28

29

Polygon Area (ft2)

138219

105217

110214

31021

11399

7004

14268

12121

3697

13297

33087

65541

57494

9014

9994

2123

2341

7446

2613

44231

2362

8745

5008

18762

24296

51356

12996

18019

119814

132012

Total PCB (ppm)

0022

0028

107

163

167

201

2 19

325

377

758

761

762

771

799

872

969

972

127

143

143

154

163

187

191

202

308

329

389

406

426

Projects GE HousgtResl of RuerEPC MemoVEPC Fmraquol 05l600Tiblcslamp2 ill

R U

O C ^gt X R

M _C EL o 2 VI

o O 3)

is R c

J U 1

_

II ^

c R 4)

00 ri ^O

mdash

bdquo ^O IT)

c o o ra

R

B

C s s pound -5

W)e EL c RU en o O a

U U 3 V

bullI

C Ril

O t

p^

Hi

T3

X03 agt E pound u n pound1

pound0

o r-T

II u

O M R

4raquo gt

bullsi

0 0

IT ON

nshy

^

bull3

o ^ S

amp CJ) c

c z o laquo T3

UM laquo

O k

I I C CJC R C

is ltshy R o u

8 lt R 3

B SE c o x

R

3

^ R

2

bull| R

i X

C R

i

U

1

~o R

^laquo

^O

12

(N

O

c-i

H

T3 0gt

ra

Crt

S I 2

III iS mdash shybdquo 2 15 T3 C = u c o ltu ^ ogt C raquoshy r ^

pound 2 lt2 gt o u p o5 2 = pound 3 LO _ shy u ra

bullpound U X)

9 2 -5 = sect I J3 if shy

Q Q ra

V) W

ilaquo S ca ^ laquo u c mdash

a s e ta Q

Page 35: Comments of the Genera Electril Companc yComments of the Genera Electril Companc y on EPA's Fina Humal Healtn h Ris Assessmenk Wort k Plan ... Consent Decree). Phas 1 involvee s the

Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with

the bootstrap method discussed above It is conducted in a series of three steps

First each sample is area-weighted using the Thiessen polygon method The weight given to

each sample is determined by calculating the proportion of the total exposure area that is

represented by the samples polygon

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7 and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
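The three steps above can be sketched in code. This is a minimal illustration, not Ogden's implementation: the function name is ours, the expansion factor is taken as 1 for simplicity, and the Thiessen polygon areas are assumed to be supplied rather than derived from a tessellation.

```python
import random
from statistics import mean

def spatial_bootstrap_ucl(concs, areas, n_iter=5000, seed=0):
    """Sketch of the three-step spatial bootstrap described above.
    `concs` are sample concentrations; `areas` their Thiessen polygon
    areas (step 1's weighting is carried by the areas themselves)."""
    rng = random.Random(seed)
    # Step 2: normalize every polygon area to the smallest one, round to
    # the nearest integer, and replicate each sample result that many
    # times to build the expanded dataset (expansion factor of 1 here).
    smallest = min(areas)
    expanded = []
    for c, a in zip(concs, areas):
        expanded.extend([c] * round(a / smallest))
    # Step 3: repeatedly resample the expanded dataset with replacement
    # and record the mean of each iteration; the document draws between
    # 250 and 500 results per iteration depending on dataset size.
    n_draw = min(len(expanded), 500)
    means = sorted(mean(rng.choices(expanded, k=n_draw))
                   for _ in range(n_iter))
    # The 95% UCL is read off at the 95th percentile of the means.
    return means[int(0.95 * n_iter)]
```

With enough iterations, the sorted means approximate the cumulative distribution described above, with its median near the area-weighted mean of the original data.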


This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, this approach can be used successfully for all sampling datasets, including those that do not fit either a normal or lognormal distribution.

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.


These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) the EPA approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
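The substitution convention described here, in which the EPC is capped at the maximum detected concentration whenever the calculated 95% UCL exceeds it, reduces to a one-line comparison. A minimal sketch (the function name is ours):

```python
def select_epc(ucl_95, detections):
    """EPC per the convention described above: use the 95% UCL unless
    it exceeds the maximum detected concentration, in which case the
    maximum detected concentration is substituted."""
    return min(ucl_95, max(detections))
```

For example, a 106 ppm H-UCL against a dataset whose maximum detection is 42.6 ppm yields 42.6 ppm, while a 17.06 ppm bootstrapped UCL would be used as calculated.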

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few


showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration, and the bootstrapped estimates by a factor of more than six, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in high-concentration areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.

Conclusions

As discussed above and demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data


necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993) and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper 2 Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.

However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.


EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision, Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, Mile Removal Action, and Consolidation. U.S. Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River - Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.


Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft²)    Total PCB (ppm)
      1               138,219               0.022
      2               105,217               0.028
      3               110,214               1.07
      4                31,021               1.63
      5                11,399               1.67
      6                 7,004               2.01
      7                14,268               2.19
      8                12,121               3.25
      9                 3,697               3.77
     10                13,297               7.58
     11                33,087               7.61
     12                65,541               7.62
     13                57,494               7.71
     14                 9,014               7.99
     15                 9,994               8.72
     16                 2,123               9.69
     17                 2,341               9.72
     18                 7,446               12.7
     19                 2,613               14.3
     20                44,231               14.3
     21                 2,362               15.4
     22                 8,745               16.3
     23                 5,008               18.7
     24                18,762               19.1
     25                24,296               20.2
     26                51,356               30.8
     27                12,996               32.9
     28                18,019               38.9
     29               119,814               40.6
     30               132,012               42.6
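As a check on the figures quoted in the text, the Table 1 values can be used to recompute the arithmetic and area-weighted means. Two assumptions are made here: each polygon area is paired with the concentration in the same row position of the flattened table, and the decimal points in the concentrations are restored to match the stated 0.022-42.6 ppm range.

```python
# Table 1 data: Thiessen polygon areas (ft^2) and total PCB (ppm),
# paired in row order as read from the flattened table (an assumption).
areas = [138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121,
         3697, 13297, 33087, 65541, 57494, 9014, 9994, 2123, 2341,
         7446, 2613, 44231, 2362, 8745, 5008, 18762, 24296, 51356,
         12996, 18019, 119814, 132012]
pcb = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
       7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
       15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6]

# Unweighted arithmetic mean (the concentration term in EPA's approach).
arith_mean = sum(pcb) / len(pcb)

# Area-weighted average: each sample weighted by the fraction of the
# exposure area its polygon represents (step 1 of the spatial bootstrap).
aw_mean = sum(a * c for a, c in zip(areas, pcb)) / sum(areas)
```

The arithmetic mean reproduces the 13.3 ppm quoted in the text, and the area-weighted mean comes out somewhat higher (about 15.7 ppm under this pairing), consistent with the observation that the high-concentration samples 29 and 30 occupy large polygons.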

Projects GE HousgtResl of RuerEPC MemoVEPC Fmraquol 05l600Tiblcslamp2 ill

R U

O C ^gt X R

M _C EL o 2 VI

o O 3)

is R c

J U 1

_

II ^

c R 4)

00 ri ^O

mdash

bdquo ^O IT)

c o o ra

R

B

C s s pound -5

W)e EL c RU en o O a

U U 3 V

bullI

C Ril

O t

p^

Hi

T3

X03 agt E pound u n pound1

pound0

o r-T

II u

O M R

4raquo gt

bullsi

0 0

IT ON

nshy

^

bull3

o ^ S

amp CJ) c

c z o laquo T3

UM laquo

O k

I I C CJC R C

is ltshy R o u

8 lt R 3

B SE c o x

R

3

^ R

2

bull| R

i X

C R

i

U

1

~o R

^laquo

^O

12

(N

O

c-i

H

T3 0gt

ra

Crt

S I 2

III iS mdash shybdquo 2 15 T3 C = u c o ltu ^ ogt C raquoshy r ^

pound 2 lt2 gt o u p o5 2 = pound 3 LO _ shy u ra

bullpound U X)

9 2 -5 = sect I J3 if shy

Q Q ra

V) W

ilaquo S ca ^ laquo u c mdash

a s e ta Q

Page 36: Comments of the Genera Electril Companc yComments of the Genera Electril Companc y on EPA's Fina Humal Healtn h Ris Assessmenk Wort k Plan ... Consent Decree). Phas 1 involvee s the

This approach is both consistent with EPA guidance and preferable to the approach currently

proposed by EPA Although EPA guidance recommends that the 95 UCL of the mean be used

as the EPC in order to correct for the uncertainties and limitations of the sampling database the

guidance does not dictate which method must be used to calculate the 95 UCL While EPA

has in the past often relied on the H-statistic that method need not be used and recent reports

by EPA contractors and consultants have recommended against use of the H-statistic to avoid

systematic overestimates of the 95 UCL (EPA 1997a Schulz and Griffin 1999)

The spatial bootstrapping approach is better than the H-statistic approach for several reasons

First as has been shown by Singh et al (EPA 1997a) and Schulz and Griffin (1999)

bootstrapping provides more reliable estimates of the 95 UCL concentration Second spatial

bootstrapping allows for consideration of the location and density of the samples for each

exposure area and corrects for the relative contribution of each in the database Third this

approach mitigates the statistical impact of a small database Finally it can be used regardless of

the distribution of the sampling data Thus for all sampling datasets including those that do not

fit either a normal or lognormal distribution this approach can be used successfully

Sample Calculation

To demonstrate the effect that the various statistical methods discussed above have on the

calculation of the arithmetic mean and 95 UCL of a single dataset Ogden has conducted

sample calculations for a hypothetical property For this purpose Ogden developed a

hypothetical set of sampling data that are similar to the types of data that might be collected from

a single floodplain property As shown in Table 1 it is assumed that 30 surface soil samples

have been collected from a property that is roughly 1000 by 1000 feet in size The

concentrations for these hypothetical samples range from 0022 to 426 ppm with an arithmetic

mean of 133 ppm The assumed polygon areas range in size from 2123 square feet to 138219

square feet

c DOCS JRB GEY2oooEPCplaquofr2 in 12 Ogden Environmental and Energy Services

These data have been used to calculate mean and 95 UCL EPCs using four different

approaches 1) EPA approach (arithmetic mean and H-statistic) 2) an area-weighted average 3)

a bootstrap and 4) a spatial bootstrap Results of this analysis are provided in Table 2

As can be seen in Table 2 the estimated mean values are similar for all four approaches The

arithmetic mean (133 ppm) is identical to the mean obtained during straight bootstrapping This

is consistent with the Central Limit Theorem (EPA 1997a) The area-weighted average is

identical to the mean calculated in spatial bootstrapping but is somewhat higher than the

arithmetic and bootstrapped means This is because some of the sampling locations with high

concentrations (eg samples 29 and 30 Table 1) have very large polygon areas Thus they are

weighted more heavily in approaches that include area-weighting

Estimates of the 95 UCL vary substantially Using bootstrapping and spatial bootstrapping

the 95 UCL values are similar (1706 and 1638 ppm respectively) However the 95 UCL

calculated using the H-statistic (106 ppm) is considerably higher exceeding both the

bootstrapped and spatial bootstrapped values by a factor of more than six In fact the H-UCL

exceeds the maximum value (426 ppm) in the sampling database which means that the

maximum value would be substituted as the EPC Thus in this hypothetical case the EPC that

would need to be used for this exposure area would be the maximum detected concentration of

426 ppm This concentration is clearly not representative of the sampling dataset for this

hypothetical property

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach and

its result is consistent with the findings of Singh et al (EPA 1997a) and Schulz and Griffin

(1999) These limitations occur despite the fact that the sampling dataset for this hypothetical

property is reasonably large containing 30 samples and the range of concentrations is fairly

typical of the types of concentrations likely to be found in floodplain properties along the Lower

Housatonic There are no outliers to skew the results and the concentrations fall along a fairly

typical concentration gradient with a few samples showing very low concentrations a few

c noes jRBGE2oooEPCpapei an 13 Ogden Environmental and Energy Services

showing fairly high concentrations and the remainder fairly evenly distributed in between In

addition some of the largest polygon areas are associated with low concentration areas while

others are associated with the highest concentrations There is nothing about this dataset to

indicate that the samples reported are not representative of the range of concentrations that is

likely to be present on the property

Despite this the H-statistic approach results in a 95 UCL that exceeds the highest measured

concentration by a factor of more than six with the result being that the 95 UCL cannot be

used as a reliable estimate of the EPC in the risk assessment for this property Clearly this result

is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly

even distribution of sampling results These problems will be even more pronounced for certain

real properties along the Lower Housatonic River where the sampling results may be highly

skewed dense sampling occurs in high concentrated areas andor sample numbers are more

limited It is likely given the tendency of the H-statistic to overestimate the 95 UCL that

maximum concentration values will need to be used for many of those properties Such an end

result is unsupportable and will result in the overestimation of site risks It could also result in

the unnecessary remediation of a number of exposure areas

Conclusions

As discussed above and demonstrated by EPA contractors and consultants (EPA 1997a Schulz

and Griffin 1999) use of the H-statistic approach to calculating the 95 UCL in environmental

applications can and frequently does produce substantial overestimates of the true mean that are

in no way representative of the concentrations actually contacted by people in an exposure area

and hence that approach should be avoided Moreover this problem cannot be solved by simply

using the maximum detected concentration in the area if the calculated 95 UCL exceeds that

maximum These problems can be avoided through the use of area-weighted averaging which

takes into consideration the fraction of the exposure area that is represented by each sample

taken This approach allows for a more realistic and representative estimate of the level of

exposure likely to occur to individuals in the area over time Thus GE urges EPA to adopt the

area-weighted averaging approach for the Lower Housatonic River and to collect the data

c DOCS JRB GE2ooo poundpcpraquoplaquor2 tt 14 Ogden Environmental and Energy Services

necessary to support that approach Use of spatial average concentrations as EPCs is consistent

with EPA policies procedures and guidelines (EPA 1989 1993) and would be consistent with

the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the

Upper 2 Mile Reach of the River and the Removal Actions Outside the River including certain

floodplain properties along the River

However if EPA rejects this approach (or for areas where EPA determines that the data are

insufficient to support that approach and decides not to collect additional data) and insists on

estimating a 95 UCL of the mean for use as the EPC then GE recommends that spatial

bootstrapping be used to calculate the 95 UCL This approach is more reliable and yields more

representative EPCs than the H-statistic method (EPA 1997a Schulz and Griffin 1999) offsets

many of the limitations associated with the H-statistic and is consistent with EPA guidance

References

Clifford PA DE Barchers DF Ludwig RL Sielken JS Klinginsmith RV Graham and

MI Banton 1995 Hazardrisk assessments An approach to quantifying spatial components of

exposure for ecological risk assessment Environmental Toxicology and Chemistry 14(5)895shy

906

EPA 1989 Risk Assessment Guidance for Superfund Volume I Human Health Evaluation

Manual (Part A) - Interim Final US Environmental Protection Agency Office of Emergency

and Remedial Response Washington DC EPA5401-89-002 July

EPA 1992 Supplemental Guidance to RAGS Calculating the Concentration Term OSWER

Bulletin Volume 1 Number 1 Publication 92857-08 May

EPA 1993 An SAB Report Superfund Site Health Risk Assessment Guidelines Review of the

Office of Solid Waste and Emergency Responses Draft Risk Assessment Guidance for Superfund

Human Health Evaluation Manual by the Environmental Health Committee EPA-SAB-EHCshy

93-007 February

c DOCS JRB GE2oooEPCpapcr2 m 15 Ogden Environmental and Energy Services

EPA 1994 A change in the approach for estimating the average and reasonable maximum

exposure scenario for human health risk assessments EPA Region I Risk Updates Number 2

August

EPA 1995 Final Baseline Risk Assessment for Koppers Company Inc (Charleston Plant)

Remedial InvestigationFeasibility Study Oversight Black and Veatch Waste Science Inc EPA

Region IV Contract 68-W9-0055

EPA 1997a The Lognormal Distribution in Environmental Applications Prepared for EPA by

A Singh A Singh and M Engelhard US Environmental Protection Agency Office of

Research and Development EPA600R-97006 December

EPA 1997b Record of Decision Fields Brook Superfund Site US Environmental Protection

Agency Region V EPA54lR-97070 August 15

EPA 1999 Responsiveness Summary for Allendale School Removal Action Mile Removal

Action and Consolidation US Environmental Protection Agency Region I October

Isaaks EH and RM Srivastava 1989 Applied Geostatistics Oxford University Press New

York

Land CE 1975 Tables of confidence limits for linear functions of the normal mean and

variance Selected Tables in Mathematical Statistics 3365-410

Schulz TW and S Griffin 1999 Estimating risk assessment exposure point concentrations

when the data are not normal or lognormal Risk Analysis 19577-584

Weston 2000 Supplemental Investigation Work Plan for the Lower Housatonic River Final

Roy F Weston Inc Manchester NH DCN GEP2-02-000-AAME February 2000

c DOCS jRBGE2ooooEPcPaper2 m 16 Ogden Environmental and Energy Services

5

10

15

20

25

30

Table 1 Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number

1

2 j

4

6

7

8

9

1 1

12

13

14

16

17

18

19

21

22

23

24

26

27

28

29

Polygon Area (ft2)

138219

105217

110214

31021

11399

7004

14268

12121

3697

13297

33087

65541

57494

9014

9994

2123

2341

7446

2613

44231

2362

8745

5008

18762

24296

51356

12996

18019

119814

132012

Total PCB (ppm)

0022

0028

107

163

167

201

2 19

325

377

758

761

762

771

799

872

969

972

127

143

143

154

163

187

191

202

308

329

389

406

426

Projects GE HousgtResl of RuerEPC MemoVEPC Fmraquol 05l600Tiblcslamp2 ill

R U

O C ^gt X R

M _C EL o 2 VI

o O 3)

is R c

J U 1

_

II ^

c R 4)

00 ri ^O

mdash

bdquo ^O IT)

c o o ra

R

B

C s s pound -5

W)e EL c RU en o O a

U U 3 V

bullI

C Ril

O t

p^

Hi

T3

X03 agt E pound u n pound1

pound0

o r-T

II u

O M R

4raquo gt

bullsi

0 0

IT ON

nshy

^

bull3

o ^ S

amp CJ) c

c z o laquo T3

UM laquo

O k

I I C CJC R C

is ltshy R o u

8 lt R 3

B SE c o x

R

3

^ R

2

bull| R

i X

C R

i

U

1

~o R

^laquo

^O

12

(N

O

c-i

H

T3 0gt

ra

Crt

S I 2

III iS mdash shybdquo 2 15 T3 C = u c o ltu ^ ogt C raquoshy r ^

pound 2 lt2 gt o u p o5 2 = pound 3 LO _ shy u ra

bullpound U X)

9 2 -5 = sect I J3 if shy

Q Q ra

V) W

ilaquo S ca ^ laquo u c mdash

a s e ta Q

Page 37: Comments of the Genera Electril Companc yComments of the Genera Electril Companc y on EPA's Fina Humal Healtn h Ris Assessmenk Wort k Plan ... Consent Decree). Phas 1 involvee s the

These data have been used to calculate mean and 95 UCL EPCs using four different

approaches 1) EPA approach (arithmetic mean and H-statistic) 2) an area-weighted average 3)

a bootstrap and 4) a spatial bootstrap Results of this analysis are provided in Table 2

As can be seen in Table 2 the estimated mean values are similar for all four approaches The

arithmetic mean (133 ppm) is identical to the mean obtained during straight bootstrapping This

is consistent with the Central Limit Theorem (EPA 1997a) The area-weighted average is

identical to the mean calculated in spatial bootstrapping but is somewhat higher than the

arithmetic and bootstrapped means This is because some of the sampling locations with high

concentrations (eg samples 29 and 30 Table 1) have very large polygon areas Thus they are

weighted more heavily in approaches that include area-weighting

Estimates of the 95 UCL vary substantially Using bootstrapping and spatial bootstrapping

the 95 UCL values are similar (1706 and 1638 ppm respectively) However the 95 UCL

calculated using the H-statistic (106 ppm) is considerably higher exceeding both the

bootstrapped and spatial bootstrapped values by a factor of more than six In fact the H-UCL

exceeds the maximum value (426 ppm) in the sampling database which means that the

maximum value would be substituted as the EPC Thus in this hypothetical case the EPC that

would need to be used for this exposure area would be the maximum detected concentration of

426 ppm This concentration is clearly not representative of the sampling dataset for this

hypothetical property

Discussion

The example presented above clearly demonstrates the limitations of the H-statistic approach and

its result is consistent with the findings of Singh et al (EPA 1997a) and Schulz and Griffin

(1999) These limitations occur despite the fact that the sampling dataset for this hypothetical

property is reasonably large containing 30 samples and the range of concentrations is fairly

typical of the types of concentrations likely to be found in floodplain properties along the Lower

Housatonic There are no outliers to skew the results and the concentrations fall along a fairly

typical concentration gradient with a few samples showing very low concentrations a few

c noes jRBGE2oooEPCpapei an 13 Ogden Environmental and Energy Services

showing fairly high concentrations and the remainder fairly evenly distributed in between In

addition some of the largest polygon areas are associated with low concentration areas while

others are associated with the highest concentrations There is nothing about this dataset to

indicate that the samples reported are not representative of the range of concentrations that is

likely to be present on the property

Despite this the H-statistic approach results in a 95 UCL that exceeds the highest measured

concentration by a factor of more than six with the result being that the 95 UCL cannot be

used as a reliable estimate of the EPC in the risk assessment for this property Clearly this result

is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly

even distribution of sampling results These problems will be even more pronounced for certain

real properties along the Lower Housatonic River where the sampling results may be highly

skewed dense sampling occurs in high concentrated areas andor sample numbers are more

limited It is likely given the tendency of the H-statistic to overestimate the 95 UCL that

maximum concentration values will need to be used for many of those properties Such an end

result is unsupportable and will result in the overestimation of site risks It could also result in

the unnecessary remediation of a number of exposure areas

Conclusions

As discussed above and demonstrated by EPA contractors and consultants (EPA 1997a Schulz

and Griffin 1999) use of the H-statistic approach to calculating the 95 UCL in environmental

applications can and frequently does produce substantial overestimates of the true mean that are

in no way representative of the concentrations actually contacted by people in an exposure area

and hence that approach should be avoided Moreover this problem cannot be solved by simply

using the maximum detected concentration in the area if the calculated 95 UCL exceeds that

maximum These problems can be avoided through the use of area-weighted averaging which

takes into consideration the fraction of the exposure area that is represented by each sample

taken This approach allows for a more realistic and representative estimate of the level of

exposure likely to occur to individuals in the area over time Thus GE urges EPA to adopt the

area-weighted averaging approach for the Lower Housatonic River and to collect the data

c DOCS JRB GE2ooo poundpcpraquoplaquor2 tt 14 Ogden Environmental and Energy Services

necessary to support that approach Use of spatial average concentrations as EPCs is consistent

with EPA policies procedures and guidelines (EPA 1989 1993) and would be consistent with

the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the

Upper 2 Mile Reach of the River and the Removal Actions Outside the River including certain

floodplain properties along the River

However if EPA rejects this approach (or for areas where EPA determines that the data are

insufficient to support that approach and decides not to collect additional data) and insists on

estimating a 95 UCL of the mean for use as the EPC then GE recommends that spatial

bootstrapping be used to calculate the 95 UCL This approach is more reliable and yields more

representative EPCs than the H-statistic method (EPA 1997a Schulz and Griffin 1999) offsets

many of the limitations associated with the H-statistic and is consistent with EPA guidance

References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines. Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc., EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision, Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, Mile Removal Action, and Consolidation. U.S. Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River - Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.


Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
1                138219                0.022
2                105217                0.028
3                110214                1.07
4                31021                 1.63
5                11399                 1.67
6                7004                  2.01
7                14268                 2.19
8                12121                 3.25
9                3697                  3.77
10               13297                 7.58
11               33087                 7.61
12               65541                 7.62
13               57494                 7.71
14               9014                  7.99
15               9994                  8.72
16               2123                  9.69
17               2341                  9.72
18               7446                  12.7
19               2613                  14.3
20               44231                 14.3
21               2362                  15.4
22               8745                  16.3
23               5008                  18.7
24               18762                 19.1
25               24296                 20.2
26               51356                 30.8
27               12996                 32.9
28               18019                 38.9
29               119814                40.6
30               132012                42.6
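As a sanity check on Table 1 (assuming its three columns pair row by row, in the order listed), the area-weighted and unweighted means can be computed directly. This is a sketch on our reading of the table, not part of GE's analysis; both estimates remain well below the 42.6 ppm maximum, in contrast to the H-statistic 95% UCL for this dataset, which the text notes exceeds the maximum by a factor of more than six:

```python
# Table 1 values, paired row by row as listed above.
areas = [138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121,
         3697, 13297, 33087, 65541, 57494, 9014, 9994, 2123, 2341,
         7446, 2613, 44231, 2362, 8745, 5008, 18762, 24296, 51356,
         12996, 18019, 119814, 132012]                  # ft^2
concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77,
         7.58, 7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7,
         14.3, 14.3, 15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9,
         38.9, 40.6, 42.6]                              # ppm

# Area-weighted mean: each concentration weighted by its polygon's
# share of the ~1.07 million ft^2 total area.
weighted = sum(a * c for a, c in zip(areas, concs)) / sum(areas)

# Simple (unweighted) arithmetic mean for comparison.
simple = sum(concs) / len(concs)
```

Here the two largest polygons carry the two highest concentrations, so the area-weighted mean (~15.7 ppm) comes out somewhat above the unweighted mean (~13.3 ppm); either way, both are far below the maximum of 42.6 ppm.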

