Comments of the General Electric Company on EPA's Final Human Health Risk Assessment Work Plan for the Lower Housatonic River

Prepared by Ogden Environmental and Energy Services, June 12, 2000
This document presents the comments of the General Electric Company (GE), prepared on GE's behalf by Ogden Environmental and Energy Services, on the U.S. Environmental Protection Agency's (EPA's) final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston 2000a). That Work Plan presents an iterative, two-phase approach to evaluating potential human health risks for the Lower River (designated as the Rest of the River in the October 7, 1999 Consent Decree). Phase 1 involves the screening of properties, based on their current uses and the available analytical data for PCBs, to determine whether there is a need for additional sampling and whether properties should be retained for more detailed risk assessment. Phase 2 involves a complex forward risk assessment approach for each property that is retained after completion of the Phase 1 screening.
Based on recent discussions with EPA, GE understands that EPA is currently in the process of conducting the Phase 1 screening portion of its human health risk assessment (HHRA) for the Lower Housatonic River. As a follow-up to those discussions, these comments provide GE's position on a number of assumptions underlying EPA's Phase 1 screening levels and also address EPA's proposed methodologies for Phase 2 as outlined in the Work Plan. GE previously provided comments on a draft of the Work Plan on June 3, 1999. While some changes have been made to the methodologies between that earlier draft and the final Work Plan, a number of the approaches outlined in the final Work Plan are the same as in the prior version. GE preserves its position on all points set forth in its June 3, 1999 comments. However, to avoid needless repetition, the present comments focus only on a few key issues in the final Work Plan, including certain new issues that are raised by the final Work Plan.
A. Comments on Phase 1 - Site Screening Approach
As discussed in the Work Plan, Phase 1 of the HHRA involves a screening of properties and other exposure points, based on available PCB data, to determine whether they should be retained for a more detailed risk assessment. To accomplish this screening, EPA has developed screening risk-based concentrations (SRBCs) for PCBs in floodplain soil, riverbank soil, and sediments for various exposure scenarios -- i.e., residential, recreational, agricultural, utility worker, and commercial/industrial groundskeeper scenarios. Separate low-contact and high-contact SRBCs have been calculated for the residential and recreational scenarios. The initial screening step involves comparison of the maximum PCB concentrations for each medium and exposure point (classified in the appropriate scenario) to the most conservative SRBC applicable to that medium and exposure point. If the maximum concentration is below that SRBC, the property or other exposure point will be dropped from further evaluation. If the maximum concentration exceeds the applicable SRBC, EPA will, if necessary, conduct additional sampling and will determine the 95% upper confidence limit (UCL) on the mean of the PCB data for such property or other exposure point. If the 95% UCL does not exceed the applicable SRBC, the property or exposure point will be dropped from further consideration. If the 95% UCL exceeds the SRBC, the property or exposure point will be carried through to Phase 2 of the HHRA.
GE supports the concept of a phased screening approach. However, it believes that several of the assumptions and other input parameters used by EPA to develop the SRBCs for PCBs are overly conservative, and that, as a result, the SRBCs are likewise excessively stringent, even for a screening-level analysis. In this connection, GE preserves all prior positions expressed to EPA regarding the toxicity of PCBs and appropriate exposure parameters. However, these comments are focused on a limited number of the parameters used by EPA in deriving the SRBCs. Although EPA is currently in the process of the Phase 1 screening, GE believes that EPA should take account of these comments and revise its SRBCs accordingly for purposes of conducting additional or more detailed screening.
1. Inconsistent target risk levels
According to the information provided in Tables 6.2-1, 6.2-5, and 6.2-6 of the Work Plan, EPA has not used a consistent target risk level for calculating its SRBCs. While a cancer risk of 5 x 10⁻⁶ is identified as the target risk level for the residential and recreational SRBCs, target risks of 1.6 x 10⁻⁶ and 1.1 x 10⁻⁶ are identified for the utility worker and commercial groundskeeper scenarios, respectively. The Work Plan presents no rationale for these differences in target risk levels.
To provide a more meaningful screening step, EPA should pick a reasonable and consistent target risk level and use it as the basis for calculating all scenario-specific SRBCs. This will ensure that all SRBCs are equally protective of public health. The purpose of conducting the Phase 1 screen is to identify those properties that do not need to be evaluated further in the HHRA and to drop them from future consideration, so that the HHRA can focus on key areas rather than all areas. If the screening SRBCs are overly stringent, numerous properties will be retained for the risk assessment despite the fact that they will likely be shown to have no unacceptable risks once the risk assessment is completed.
To eliminate this problem and increase the efficiency of the screening step, GE believes that the target cancer risk level should be established as 1 x 10⁻⁵ for all scenarios. This risk level is the midpoint of EPA's acceptable risk range (1 x 10⁻⁴ to 1 x 10⁻⁶) (EPA 1990). When coupled with conservative exposure parameters and upper-bound media concentrations, as are outlined in the Work Plan, this approach will successfully screen out those properties that do not pose a risk while retaining those that warrant further evaluation.
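Because a risk-based screening concentration is directly proportional to the target risk, raising the target from 5 x 10⁻⁶ to 1 x 10⁻⁵ exactly doubles every SRBC derived with it. The sketch below uses a generic soil-ingestion risk-based concentration formula (as in EPA's Superfund risk assessment guidance); all parameter values shown are hypothetical placeholders, not the Work Plan's values.

```python
# Generic risk-based screening concentration for a carcinogen via
# incidental soil ingestion: C = (TR * BW * AT) / (CSF * IR * CF * EF * ED).
# The point illustrated: the SRBC scales linearly with the target risk TR.

def srbc_ingestion(tr, bw_kg, at_days, csf, ir_mg_day, ef_days_yr, ed_yr):
    # 1e-6 converts the ingestion rate from mg to kg of soil
    intake = ir_mg_day * 1e-6 * ef_days_yr * ed_yr
    return (tr * bw_kg * at_days) / (csf * intake)

# Hypothetical parameter values for illustration only:
base = srbc_ingestion(tr=5e-6, bw_kg=70, at_days=25550,
                      csf=2.0, ir_mg_day=100, ef_days_yr=84, ed_yr=24)
doubled = srbc_ingestion(tr=1e-5, bw_kg=70, at_days=25550,
                         csf=2.0, ir_mg_day=100, ef_days_yr=84, ed_yr=24)
assert abs(doubled / base - 2.0) < 1e-9  # doubling TR doubles the SRBC
```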
2. Overly conservative exposure assumptions
Several of the exposure parameter values that are used to derive the SRBCs are extremely conservative and cannot be considered representative of likely activities, even for a screening-level assessment. These parameters include some of the exposure frequencies and exposed skin surface areas used for the residential and recreational scenarios and the soil ingestion rates used for the recreational and utility worker scenarios.
a. Exposure frequencies for residential and recreational scenarios
Some of the exposure frequencies used by EPA to derive the SRBCs for the residential and recreational scenarios are unrealistic and implausible. These include the assumed frequency of two days per week for seven months per year for low-contact residential exposure to riverbank soils and sediments and low-contact recreational exposure to floodplain and riverbank soils and sediments, as well as the assumed frequency of three days per week for seven months/year for high-contact residential and recreational exposure to sediments.
The Work Plan indicates that the determination of high-contact versus low-contact areas depends on the accessibility of the area -- e.g., judgments about the likelihood that the area has physical features consistent with a child (or adult) contacting riverbank soil on a regular basis (p. 6-26). Physical features that would lead to a low-contact determination would presumably include, at a minimum, steep slopes, undercut river banks, rocky areas with limited or no access to soil or sediment, and marshy areas. Such conditions are not consistent with contact with the contaminated media on a regular basis. In these circumstances, it is unreasonable to assume an exposure frequency of two days per week for each week in the seven-month exposure period. Such a frequency clearly represents regular contact, which is inconsistent with the limited accessibility of these areas. A more appropriate assumption for low-contact areas would be to assume occasional contact (e.g., one day per week or two days per month) during the five warmer months of the year.
For high-contact areas, EPA has assumed an exposure frequency of three days per week for seven months. While this is a highly conservative estimate even for contact with floodplain and riverbank soils, it is totally unreasonable for contact with sediments, particularly given the assumed exposed skin surface areas. Even for high-contact sediment areas, it is extremely unlikely that children and adults would venture into the river and be exposed to sediments as often as three days per week during every week from April through October. In fact, children would probably not play in the river at all in the cooler portions of this period. Thus, it would be more reasonable to assume an average exposure frequency of around one day per week during this seven-month period.
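The annual exposure days implied by the competing frequency assumptions above can be made explicit with simple arithmetic (approximating weeks per month as 52/12); the resulting day counts are back-of-the-envelope figures, not values stated in the Work Plan.

```python
# Annual exposure days implied by the frequency assumptions discussed above.

WEEKS_PER_MONTH = 52 / 12  # ~4.33 weeks per month

def days_per_year(days_per_week, months):
    return days_per_week * WEEKS_PER_MONTH * months

epa_low  = days_per_year(2, 7)  # EPA low-contact: ~61 days/year
ge_low   = days_per_year(1, 5)  # GE's occasional-contact alternative: ~22
epa_high = days_per_year(3, 7)  # EPA high-contact: ~91 days/year
ge_high  = days_per_year(1, 7)  # GE's sediment alternative: ~30 days/year
```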
b. Exposed skin surface areas for residential and recreational scenarios
The excessive conservatism in the exposure frequencies is exacerbated by the exposed skin surface areas assumed by EPA in deriving the SRBCs for the residential and recreational scenarios. For these scenarios, EPA has assumed that the entire head, hands, forearms, lower legs, and feet will be in contact with soil or sediment every time that the individuals are present in the affected areas (Tables 6.2-3 and 6.2-4). This assumption does not take account of the fact that the exposed skin surface areas during the spring and fall months are likely to be less, because individuals will be wearing heavier clothing -- i.e., shoes, socks, long pants, and long-sleeved shirts. While the level of exposure assumed by EPA may occur during the summer months, it would not occur every time the individual visits the affected area. Hence, EPA should reduce the exposed skin surface areas to take account of the lesser areas exposed during the spring and fall, and thus develop time-weighted averages for the exposed skin surface areas.
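The time-weighted averaging GE proposes amounts to weighting each season's exposed surface area by its share of the exposure period. The surface areas and month split below are made-up placeholders to illustrate the calculation, not values from the Work Plan or EPA guidance.

```python
# Hypothetical time-weighted exposed skin surface area.

def time_weighted_sa(sa_by_period):
    """sa_by_period: list of (months, exposed surface area in cm^2)."""
    total_months = sum(m for m, _ in sa_by_period)
    return sum(m * sa for m, sa in sa_by_period) / total_months

# Illustrative split of a seven-month season: 3 summer months with head,
# hands, forearms, lower legs, and feet exposed (placeholder 5700 cm^2),
# 4 spring/fall months with heavier clothing (placeholder 2000 cm^2).
sa = time_weighted_sa([(3, 5700), (4, 2000)])
assert round(sa) == 3586  # well below the full 5700 cm^2 assumption
```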
c. Soil ingestion rates for recreational scenario
For recreational exposures (both high-contact and low-contact), EPA has used daily soil ingestion rates of 200 mg/day for small children and 100 mg/day for adults (Table 6.2-2). GE does not agree with these ingestion rates, for the reasons given in its June 3, 1999 comments (pp. 33-34). However, if these daily soil ingestion rates are to be used, EPA should at least apply some factor to account for the fraction of the daily soil ingestion that is derived from the affected recreational area, as provided in the Agency's Risk Assessment Guidance for Superfund (EPA 1989). Under real-life conditions, total daily soil ingestion will occur over the entire day and will include soil and/or sediments from a variety of different locations, some of which will not be contaminated. However, EPA has assumed that total daily soil ingestion will come from a single contaminated area and has not included a factor for the fraction of soil ingested from the site. While it is presumed that this omission is due to the fact that EPA is concerned that some individuals may occasionally obtain all of their ingested soil from a single contaminated location, such an assumption is not reasonable when coupled with the exposure frequencies that are assumed. In addition, some highly accessible areas may be very limited in size, as is the case on some floodplain properties. Because of this, even if all of the soil is obtained from an exposure area, only a portion of that soil will be contaminated.
Moreover, EPA's assumption is excessively conservative when one considers that the SRBCs for soil and sediment are separately derived and that each SRBC assumes that 100 percent of the soil or sediment ingested daily comes from the site. In reality, individuals who are engaged in river-based activities will ingest a combination of soil and sediment during the course of their activity. Given these considerations, GE believes that a factor of 50 percent should be used to represent the fraction of ingested soil or sediment that is derived from contaminated recreational areas during recreational activities. This will correct for the fact that daily ingestion exposure occurs with a combination of soil and sediment. It will also correct for the fact that only limited portions of the day will be spent in recreational areas. In addition, such a factor will more accurately reflect exposures at those areas where the area of contamination is very limited. This approach is consistent with EPA's approach in its development of the cleanup goals under the October 7, 1999 Consent Decree for recreational areas outside the River (EPA 1999a).
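The fraction-ingested (FI) adjustment described above is a simple linear scaling of the daily intake attributed to the site. A minimal sketch, using the 200 mg/day child rate from the Work Plan and GE's proposed 50 percent factor:

```python
# Daily soil/sediment intake attributed to the site, scaled by the
# fraction-ingested (FI) factor.

def site_intake_mg(ir_mg_day, fraction_from_site):
    return ir_mg_day * fraction_from_site

assert site_intake_mg(200, 1.0) == 200  # Work Plan assumption (FI = 1.0)
assert site_intake_mg(200, 0.5) == 100  # GE's proposed 50 percent factor
```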
d. Soil ingestion rate for utility worker scenario
EPA has used a soil ingestion rate of 200 mg/day to develop the SRBC for the utility worker scenario because of likely heavy exposure to soils (p. 6-17). While an enhanced soil ingestion rate may be appropriate for utility workers, who will have direct contact with soil, the ingestion rate of 200 mg/day is too high. GE previously proposed use of an enhanced soil ingestion rate of 137 mg/day for farmers at the Housatonic River site, who are likewise expected to have heavy contact with soil (GE 1997), and EPA approved that proposal (EPA and MDEP 1998). EPA itself used that soil ingestion rate in deriving a cleanup level for the utility worker scenario for Areas Outside the River under the Consent Decree (EPA 1999a). This ingestion rate was based on an assumed upper-bound soil adherence rate of 1 mg/cm² for the hands. In recent draft dermal risk assessment guidance, however, EPA (1999b) recommends soil adherence factors for utility workers ranging from a central-tendency estimate of 0.2 mg/cm² to an upper-bound estimate of 0.8 mg/cm². Use of that upper-bound adherence factor of 0.8 mg/cm² would result in an enhanced soil ingestion rate for utility workers of 110 mg/day. GE submits that this enhanced soil ingestion rate should be used in developing a revised SRBC for the utility worker scenario.
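The 110 mg/day figure follows from rescaling the previously approved 137 mg/day rate in proportion to the adherence factor, since that rate was premised on 1 mg/cm² hand adherence:

```python
# Rescale an ingestion rate derived from one soil adherence factor to a
# different adherence factor (linear proportionality).

def rescaled_ingestion(base_rate_mg_day, base_adherence, new_adherence):
    return base_rate_mg_day * new_adherence / base_adherence

rate = rescaled_ingestion(137, 1.0, 0.8)
assert round(rate) == 110  # 137 * 0.8 = 109.6, i.e. ~110 mg/day
```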
B. Comments on Phase 2 - Baseline HHRA Approach
For those properties or other exposure points that remain after the Phase 1 screening, the Work Plan outlines the approach that EPA plans to use in conducting a baseline HHRA. GE has a number of comments on that proposed approach.
1. Selection of non-PCB contaminants of potential concern
The Work Plan outlines a two-step approach for selecting non-PCB contaminants of potential concern (COPCs) to be included in the baseline HHRA (Section 6.3.2.3). In the first step, for each area that has PCB concentrations above the applicable SRBC, the available sampling data on other constituents will be compared with several criteria to determine the need for additional sampling in that area. These criteria include (a) background concentrations established for the area, (b) EPA's Preliminary Remediation Goal (PRG) for dioxins/furans in residential areas, which is a 1 ppb Toxicity Equivalent (TEQ) concentration, and (c) the Massachusetts Department of Environmental Protection's (MDEP's) Method 1 soil standards set out in the Massachusetts Contingency Plan (MCP) (see Work Plan, p. 6-30). If these or other (unspecified) criteria are exceeded, EPA will conduct additional characterization sampling (or other evaluation) in the area. If they are not exceeded, the Work Plan states that the baseline HHRA will be performed using the available analytical data (p. 6-31). The latter statement appears to mean that, even if the concentrations of non-PCB constituents do not exceed the criteria for further sampling (i.e., are below background concentrations, the dioxin/furan PRG, or the Method 1 soil standards), those constituents will nevertheless be carried through to the second step in the selection of COPCs.
In the second step, all constituents detected will be screened for the selection of COPCs by comparing the maximum detected concentrations with contaminant-specific and medium-specific risk-based concentrations. The Work Plan states that the principal criterion for selection of a COPC will be an exceedance of the medium-specific PRG by the maximum concentration of the constituent (p. 6-31). Although the Work Plan does not give the source of the PRGs to be used, it states that a target cancer risk of 1 x 10⁻⁶ and a target non-cancer hazard quotient (HQ) of 0.1 will be used. Any constituent for which the maximum concentration exceeds the PRG will be carried forward for the risk assessment. In addition, the Work Plan states that constituents which are known to bioaccumulate in biological tissues, but do not exceed the PRGs, will be further evaluated for possible inclusion in the risk assessment.
GE has several concerns about this proposed two-step approach.

First, GE believes that if the concentration of non-PCB constituents in a given area does not exceed the criteria for further sampling (as listed above), those constituents should be dropped from further consideration and not carried through to the second step in selecting COPCs. As EPA has recognized in the Statement of Work for Removal Actions Outside the River (SOW), which is Appendix E to the Consent Decree, concentrations of non-PCB constituents that are consistent with local background levels, or are less than the 1 ppb TEQ concentration for dioxins/furans, or are below the applicable Method 1 soil standards, are considered to pose no significant risk to human health. Hence, such constituents need not, and should not, be considered further in the selection of COPCs.
Second, for those constituents that are carried forward to the second step in the COPC selection process, EPA's proposed primary approach for selecting COPCs -- i.e., comparison of maximum concentrations to PRGs -- is overly restrictive. In recent discussions with GE, EPA explained that it is currently planning to use the EPA Region 9 PRGs for this purpose, but to reduce the non-cancer-based PRGs by 10 times, due to the fact that those PRGs were based on a non-cancer HQ of 1.0, not 0.1.¹ This approach will likely lead to the selection of a long list of COPCs, many or most of which in fact make no significant contribution to actual risks. This would undercut the purpose of the screening approach and will greatly and needlessly complicate the risk assessment without any corresponding benefit in terms of achieving additional health protection. Instead, GE recommends that, in selecting the COPCs for the Rest of the River, EPA should use an approach similar to the approach approved by EPA and specified in the SOW for Areas Outside the River, including the non-bank portions of certain floodplain properties. This approach utilizes not only the Region 9 PRGs, but also the other criteria listed on page 6-30 of the Work Plan, as discussed below.

¹ The Work Plan also states that, in addition to these specified criteria, EPA will also use other applicable criteria in the first step of this process, but it does not say what those additional criteria are. EPA should specify those criteria.
For dioxins and furans, EPA should use the PRGs for dioxin/furan TEQs that are prescribed in recent guidance from EPA headquarters for use in addressing these constituents in soil at CERCLA and RCRA sites (EPA 1998). Those PRGs are 1 ppb for dioxin/furan TEQs in residential areas (also applicable to recreational areas) and 5-20 ppb for dioxin/furan TEQs in commercial/industrial areas. The SOW provides specifically for use of these PRGs in screening out dioxin/furan compounds in soil in Areas Outside the River. EPA should likewise use these PRGs in selecting COPCs for the Rest of the River. (Indeed, as discussed below, GE believes that these PRGs should be used not only as screening criteria but also as remedial goals, which is consistent with the approach embodied in the SOW for Areas Outside the River. See Section B.2 below.)
For other constituents, EPA should follow a step-wise approach that begins with comparisons to the Region 9 PRGs. Since these PRGs are highly conservative and are specified in the SOW for use in screening such constituents without further reduction, they should not be reduced by an additional 10 times. For constituents for which such PRGs do not exist, a similar screening step should be employed using PRGs for surrogate compounds.²
EPA should then apply the other criteria listed on page 6-30 of the Work Plan -- i.e., comparison to background levels and MCP Method 1 standards. Comparison of concentrations of non-PCB constituents to site-specific background levels (e.g., concentrations upstream of the GE facility) is appropriate to ensure that only those constituents attributable to releases from the GE facility are included in the risk assessment. This step is allowed in EPA's Risk Assessment Guidance for Superfund (EPA 1989) and has been accepted by EPA as a basis for selecting non-PCB COPCs for both soils and sediments in the Upper ½ Mile Reach of the Housatonic River (as stated in the Upper ½ Mile Reach Work Plan) and for Areas Outside the River (as provided in the SOW). It is also consistent with the MCP, which provides that if the concentration of a constituent is at or below background levels, then that constituent will be considered to pose No Significant Risk (310 CMR 40.0902(3)). Similarly, comparison to the MCP Method 1 soil standards is appropriate because those standards have been developed as conservative risk-based standards (using a target cancer risk of 1 x 10⁻⁶) that will ensure protection of human health and the environment. Again, EPA has already accepted comparisons to those standards as a basis for selecting non-PCB COPCs in soil for the Upper ½ Mile Reach of the River and Areas Outside the River. Moreover, in its Engineering Evaluation/Cost Analysis (EE/CA) for the 1½ Mile Reach of the River, EPA itself has used the MCP Method 1 soil standards, as well as background levels, as a basis for evaluating non-PCB constituents in bank soils (Weston 2000b). Thus, use of comparisons to background levels and MCP Method 1 standards in selecting COPCs for the Rest of the River is justified by considerable precedent and will ensure that a consistent approach is employed to address such constituents throughout the GE-Pittsfield/Housatonic River Site. EPA has presented no sound basis for departing from that approach for the Rest of the River portion of the Site.

² Generally, when toxicity criteria have not been derived for particular constituents, it is because there are no reliable toxicity data on which to base them. If such constituents are not screened, but are simply carried through to the risk assessment as COPCs, that would unnecessarily complicate the risk assessment and would result in largely meaningless or highly uncertain estimates, given the lack of reliable toxicity data and criteria. Hence, EPA should screen these compounds using PRGs for surrogate compounds, as described in GE's June 3, 1999 comments (p. 6).
Moreover, use solely of the maximum detected concentrations in selecting COPCs from these constituents is overly conservative and inappropriate. For example, comparisons to site-specific background levels should be made using an appropriate statistical technique or summary statistics, as described in MDEP (1995) guidance, while comparisons to the MCP Method 1 standards are generally to be made based on average exposure point concentrations (310 CMR 40.0926(3)). Again, these approaches have been agreed upon by EPA as fully protective for Areas Outside the River, as set forth in the SOW, and should likewise be used for the Rest of the River to eliminate constituents that are not likely to contribute to Site-related risks.
Finally, the Work Plan does not explain how the potential for bioaccumulation (discussed on p. 6-31) will be considered or what criteria will be used to examine this. Unless the approach used to identify and select such COPCs is clearly defined, it could result in the unnecessary inclusion of certain constituents. EPA should provide supplemental information that outlines the specific approach and criteria that will be used to identify and select these constituents for inclusion.
2. Risk assessment of dioxins/furans
The Work Plan states that EPA intends to evaluate the potential risks associated with dioxins and furans by converting them to TEQs (toxic equivalents of 2,3,7,8-TCDD) using Toxicity Equivalency Factors (TEFs), and then applying the provisional cancer slope factor (CSF) of 150,000 (mg/kg-day)⁻¹ listed on HEAST for 2,3,7,8-TCDD (Section 6.3.3.2.4). As discussed in detail in GE's June 3, 1999 comments (pp. 19-22), the TEF approach has substantial limitations, and the provisional CSF of 150,000 (mg/kg-day)⁻¹ is uncertain and unsupportable. A wide range of CSFs, spanning from 9,000 to 156,000 (mg/kg-day)⁻¹, have been proposed for TCDD under the linear nonthreshold cancer model, with differences resulting from the selection of tumor classification scheme, the interspecies scaling factor used, and the model used for low-dose extrapolation. Thus, there is enormous uncertainty associated with any selected CSF.
In these circumstances, GE believes that, to the extent possible in this risk assessment, EPA should evaluate dioxins and furans through application of the dioxin/furan TEQ PRGs set forth in EPA (1998) guidance and discussed above, using those PRGs as remedial goals as well as screening levels, rather than conducting forward risk calculations for these compounds. This approach would avoid the uncertainties associated with selecting a particular CSF for TCDD and is consistent with the approach adopted by EPA for Areas Outside the River, as described in the SOW.
However, to the extent that EPA includes dioxins and furans in forward risk calculations, GE believes that EPA should use a CSF of 30,000 (mg/kg-day)⁻¹. As shown in our June 3, 1999 comments, based on current information, that CSF is more supportable than the CSF of 150,000 (mg/kg-day)⁻¹ specified in the Work Plan.
3. Use of TEF/TEQ approach for dioxin-like PCBs
In the Work Plan, EPA also proposes to use the TEF/TEQ approach for estimating risks of dioxin-like PCB congeners (Section 6.3.3.2.5). Under this approach, the concentrations of these PCB congeners are converted to toxic equivalents (TEQs) of 2,3,7,8-TCDD using TEFs, and the provisional CSF of 150,000 (mg/kg-day)⁻¹ for 2,3,7,8-TCDD is then applied to estimate the risks. In its June 3, 1999 comments, GE presented several reasons why this approach is not appropriate or scientifically defensible. In discussions with GE in November 1999, EPA stated that, since it will make risk management decisions on the basis of total PCB concentrations rather than congener-specific data, it intends to use this TEF/TEQ approach for PCBs only if it finds a good correlation between the concentrations of the dioxin-like congeners and concentrations of total PCBs. There is no indication in the Work Plan, however, that EPA has established a good correlation between the congener-specific data and the total PCB data. Indeed, in more recent discussions, EPA has indicated that it has not to date evaluated this issue. If and when EPA makes such an analysis, GE requests the opportunity to review the analysis and discuss it with EPA. The absence of such a correlation would itself make the TEQ data irrelevant to risk management decisions.
In any event, GE continues to believe, for the same reasons given in its June 3, 1999 comments (pp. 23-25), that application of the TEF/TEQ approach to PCBs is not scientifically supportable. These reasons include (a) the lack of sufficient congener-specific data to provide reliable estimates of true congener concentrations, (b) the double counting of carcinogenic risks that would result from use of both the TCDD CSF and the PCB CSF in the same assessment, and (c) the uncertainties and unsupportability of the TEFs for PCB congeners and of the CSF (for 2,3,7,8-TCDD) to be used to evaluate the resulting TEQs.
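Mechanically, the TEF/TEQ conversion at issue multiplies each congener concentration by its TEF and sums the products into a single 2,3,7,8-TCDD-equivalent concentration, to which the TCDD CSF is then applied. A minimal sketch; the congener concentrations and TEF values below are illustrative placeholders, not the values EPA would use.

```python
# Sketch of the TEF/TEQ conversion: sum of (concentration * TEF) over
# the dioxin-like congeners gives a single TCDD-equivalent concentration.

def total_teq(congeners):
    """congeners: list of (concentration, TEF) pairs."""
    return sum(conc * tef for conc, tef in congeners)

teq = total_teq([(10.0, 0.0001),   # hypothetical congener A
                 (2.0, 0.1),       # hypothetical congener B
                 (50.0, 0.00001)]) # hypothetical congener C
assert abs(teq - 0.2015) < 1e-9
```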
4. Failure to consider alternative CSFs and RfDs for PCBs
The Work Plan states that EPA intends to use the upper-bound CSF for PCBs of 2 (mg/kg-day)⁻¹ and the Reference Dose (RfD) for Aroclor 1254 of 2 x 10⁻⁵ mg/kg-day, as published in its Integrated Risk Information System (IRIS) (Sections 6.3.3.2.3 and 6.3.3.3.2). In its June 3, 1999 comments (pp. 16-19 and Appendix A), GE presented substantial reasons why these toxicity values overestimate the carcinogenic potential and non-carcinogenic impacts of PCBs, and it recommended the development of alternative toxicity values for PCBs. EPA has made no reference to this presentation in its Work Plan.
This failure to consider the alternative toxicological information presented by GE is unlawful. Under a settlement agreement with GE (Settlement Agreement in General Electric Company v. Browner, No. 93-1251, D.C. Cir., October 25, 1993) and a subsequent EPA guidance memorandum (EPA 1993), if an outside party questions the use of IRIS values during the course of an EPA risk assessment and presents alternative toxicological information that may be used in place of the IRIS values, EPA has an obligation to consider all credible and relevant evidence before it. There is no indication that EPA has considered the alternative toxicological information presented by GE. Rather, it has ignored GE's showing and chosen simply to defer to the current IRIS values (which is precisely the course of conduct which the settlement and guidance prohibit). EPA should rectify this deficiency by now giving careful consideration to the alternative toxicological information presented by GE.
5. Use of subchronic RfD for PCBs
The Work Plan states that EPA will consider using the subchronic RfD for PCBs only if it determines that exposure is intermittent enough to result in an exposure period that is substantially reduced (p. 6-42). Later, on page 6-43, the Work Plan states that this will occur when the total exposure period is substantially below the 7-year period that is considered by EPA to represent chronic exposure. There is no further discussion of this issue, so it is not clear what the Agency will use as a benchmark to identify intermittent exposure.
Many of the proposed exposure scenarios are likely to be extremely intermittent in nature. For example, in developing the SRBC for utility workers, EPA assumes that exposure occurs 5 days/year for a period of 25 years. Clearly, 5 days/year is intermittent exposure. In fact, even when considered over the entire 25-year exposure period, the total exposure consists of 125 days, which amounts to about one-third of a year, or less than 5 percent of the 7-year chronic exposure period. This exposure would certainly be considered intermittent, despite the fact that the total exposure period lasts for 25 years.
As a second example, in developing its SRBCs for recreational exposures, EPA has assumed that children aged 1-6 are exposed for six years during recreational activities. Even using the highly conservative exposure frequency for high-contact areas of 3 days/week for a period of 7 months per year (a total of 84 days/year, or 504 days for the entire exposure period), the number of days exposed is equivalent to 1.4 years, or less than 20 percent of the available days during the 7-year chronic exposure period. Nevertheless, as stated on page 6-42, EPA does not intend to use the subchronic RfD to evaluate noncancer hazards to young children.
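The two intermittency calculations above can be verified directly, taking the 7-year chronic period as 7 x 365 = 2,555 days:

```python
# Total exposure days in the two examples, compared against the 7-year
# chronic exposure period (7 * 365 = 2555 days).

CHRONIC_DAYS = 7 * 365  # 2555

utility_days = 5 * 25  # utility worker: 5 days/year for 25 years = 125 days
child_days = 84 * 6    # child recreation: 84 days/year for 6 years = 504 days

assert utility_days / 365 < 0.35           # about one-third of a single year
assert utility_days / CHRONIC_DAYS < 0.05  # under 5% of the chronic period
assert child_days / CHRONIC_DAYS < 0.20    # under 20% of the chronic period
```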
In short, the Work Plan states that EPA will use the subchronic RfD when exposures during a given scenario are sufficiently intermittent to result in a total exposure that is far less than the 7-year chronic exposure period identified by EPA. It is clear, however, that even for scenarios where this intermittent exposure is likely, EPA does not intend to incorporate the subchronic RfD. EPA should reconsider its position on the use of the subchronic RfD and incorporate it into those scenarios where it is relevant.
6. Inappropriate use of a single exposure scenario for all direct-contact recreational activities
The Work Plan provides that, while consumption of fish and game will be evaluated separately, all direct-contact recreational activities will be evaluated using a single exposure scenario and a single set of exposure assumptions (pp. 6-52 to 6-53). These activities include hiking, walking, canoeing, picnicking, swimming, wading, and dirt biking, as well as fishing and hunting (but not consumption of fish or game). In its June 3, 1999 comments (pp. 11-12, 29-30), GE explained that use of a single exposure scenario and set of assumptions for such diverse activities is inappropriate because it will force EPA either to use the most conservative assumptions for any of these activities (which would clearly result in grossly overestimated exposure estimates) or to pick compromise parameters that will not be truly representative of any of these activities (which would be subject to considerable uncertainty). Based on discussions with EPA in the fall of 1999, GE had understood that the Agency recognized this problem and intended to consider further whether and how to divide this single direct-contact recreational scenario into a number of more specific scenarios (e.g., hiking/walking, canoeing, picnicking, swimming/wading). However, the Work Plan does not do that. GE again urges EPA to develop such specific scenarios so as to take
account of the significant differences in exposure parameters among these types of activities. In doing so, actual recreational exposures will be more closely approximated, and more realistic and appropriate cleanup goals can be developed.
7. Lack of information about specific exposure assumptions to be used
As in the prior draft, the Work Plan does not present the specific exposure parameter values and equations to be used in EPA's risk assessment, but simply states that they will be developed based on EPA and MDEP guidance along with site-specific data (Section 6.3.4.5). GE requested such information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has advised GE that the Agency's risk assessment staff will hold a series of technical meetings with GE representatives and consultants, and at such meetings will be prepared to discuss the most appropriate exposure assumptions and parameter values to be used. GE reiterates its request for such meetings and looks forward to discussing these issues with EPA.
8. Use of probabilistic risk analysis
In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that probabilistic approaches, such as Monte Carlo analysis, will be considered for those COPCs and exposure pathways that result in significant risk estimates. However, it does not mention probabilistic risk assessment techniques in connection with any of the main risk assessments. In prior discussions with GE, EPA indicated that one of the principal reasons why it collected a large amount of fish tissue PCB data is for use in a probabilistic risk assessment of fish consumption. As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic exposure model in the main risk analysis of the fish consumption pathway. However, the Work Plan presents no discussion of, or plans for conducting, such an assessment. In fact, EPA does not even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk assessment guidance documents and information sources that EPA intends to use in conducting the HHRA.
During the last few years, EPA's policies and guidelines have focused on improving risk management by presenting decision-makers with the entire range of possible risks rather than a single point estimate (EPA 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that numerical risk assessments should be accompanied by a full characterization of the uncertainties, limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D) probabilistic analysis, such as Microexposure analysis, can more effectively characterize the impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed population by considering time-dependent changes.
Microexposure analysis has been used by EPA and by independent researchers to simulate duration of residential exposure (EPA 1992c; Johnson and Capel 1992; Sielken 1994). It also has been used to evaluate childhood exposures to lead (Goodrum et al. 1994), exposure to contaminants in tap water (Harrington et al. 1995), and exposure to dioxins from the consumption of freshwater fish (Keenan et al. 1993a,b; 1995; 1996a; 1997a,b). Microexposure analysis was employed in the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon Group 1994; Keenan et al. 1996b). In addition, Microexposure analysis has been described in EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A (External Review Draft) as a viable alternative for modeling time-dependent variability in concentrations, daily activity patterns, and other behavioral exposure factors (EPA 1999c).
Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of probabilistic risk analysis through application to the fish ingestion pathway at the Sangamo Weston/Lake Hartwell Superfund site (Simon 1999). Simon (1999) found that the Microexposure event model gave risk estimates that were approximately an order of magnitude lower than those derived using deterministic and other probabilistic approaches. He attributed this decrease to the power of the method to incorporate changes over time.
GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model, to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in Appendix E of its draft guidance for conducting probabilistic risk assessments (EPA 1999c), some types of exposures, including fish consumption, can be expected to vary over time. Anglers may change their activity levels and their fish consumption behaviors as they age. In addition, different species and sizes of fish may be consumed from a variety of locations on the river over time, and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates, and will provide information that will allow better informed risk management decisions to be made (EPA 1999c).
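To illustrate the kind of analysis GE is requesting, a minimal one-dimensional Microexposure-style simulation of the fish consumption pathway might look like the following sketch. Every distribution and parameter value here is a hypothetical placeholder, not a value proposed by EPA or GE; a full 2-D analysis would add an outer loop over uncertain parameter values:

```python
import random

random.seed(1)

def simulate_angler(n_years=30):
    """One angler's lifetime average daily dose (mg/kg-day), letting behavior
    and fish tissue concentrations change from year to year."""
    body_weight = max(40.0, random.gauss(70, 10))   # kg; varies between anglers
    total_intake = 0.0                              # mg of PCB ingested over lifetime
    for _ in range(n_years):
        meals = random.randint(0, 52)               # activity level changes over time
        conc = random.lognormvariate(0.0, 1.0)      # mg/kg fish; varies by year/species/location
        total_intake += meals * 0.227 * conc        # 0.227 kg = one 8-oz meal
    return total_intake / (body_weight * n_years * 365)

# Looping over many anglers captures inter-individual variability; a full 2-D
# analysis would wrap this in an outer loop over uncertain parameter values.
doses = sorted(simulate_angler() for _ in range(1000))
print(f"median dose          = {doses[500]:.2e} mg/kg-day")
print(f"95th-percentile dose = {doses[950]:.2e} mg/kg-day")
```

The point of the structure, as in the Microexposure literature cited above, is that exposure is built up event by event over time rather than from a single fixed intake rate.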
9. Use of 95% UCLs or maximum values as exposure point concentrations
The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% UCL of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).
GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment and demonstrated by EPA contractors and consultants (EPA 1997c; Schulz and Griffin 1999), use of the H-statistic to derive a 95% UCL can, and frequently does, produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.
Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½ Mile Reach and for Areas Outside the River: a spatial averaging approach. That approach will (a) allow calculation of EPCs that are much more truly representative of actual exposures than use of EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper ½ Mile Reach and Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling if necessary to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.
However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A and demonstrated by Singh et al. (EPA 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.
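A minimal sketch of the spatial bootstrapping idea, using hypothetical concentrations and Thiessen polygon areas (not site data): each bootstrap replicate resamples the measurements with probability proportional to the area each sample represents, so spatially clustered hot-spot samples are not over-weighted.

```python
import random
import statistics

random.seed(3)

# Hypothetical exposure-area data: concentrations (mg/kg) and the area (m^2)
# of the Thiessen polygon each sample represents.
concs = [60.0, 9.5, 2.0, 0.5]
areas = [50.0, 400.0, 800.0, 1250.0]   # the hot spot covers only a small area

boot_means = sorted(
    statistics.fmean(random.choices(concs, weights=areas, k=len(concs)))
    for _ in range(5000)               # 5000 bootstrap replicates
)
ucl95 = boot_means[int(0.95 * len(boot_means))]   # 95th percentile of resampled means
print(f"spatial bootstrap 95% UCL = {ucl95:.2f} mg/kg (max detect = {max(concs):.1f})")
```

By construction the resulting UCL is bounded by the data themselves, unlike an H-statistic UCL, which can exceed the maximum detected concentration.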
References
EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A). Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.
EPA. 1990. National Contingency Plan. US Environmental Protection Agency, Washington, DC.
EPA. 1992a. Final Guidelines for Exposure Assessment; Notice. US Environmental Protection Agency, Washington, DC. 57 Federal Register 22888-22938. May 29.
EPA. 1992b. Guidance on Risk Characterization for Risk Managers and Risk Assessors. Memorandum from F. Henry Habicht II, Deputy Administrator, to Assistant Administrators and Regional Administrators. Office of the Administrator, Washington, DC.
EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General US Population. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.
EPA. 1993. Memo from W.H. Farland and H.L. Longest to Regional Directors, Re: Use of IRIS values in Superfund risk assessments. US Environmental Protection Agency, Washington, DC. OSWER Directive 9285.7-16. December 21.
EPA. 1995a. Policy for Risk Characterization at the US Environmental Protection Agency. US Environmental Protection Agency, Office of the Administrator, Washington, DC. March.
EPA. 1995b. Guidance for Risk Characterization. US Environmental Protection Agency, Science Policy Council, Washington, DC. February.
EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the US Environmental Protection Agency. US Environmental Protection Agency, Office of Research and Development, Washington, DC. May 15.
EPA. 1997b. Guiding Principles for Monte Carlo Analysis. US Environmental Protection Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 p.
EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.
EPA. 1998. Memo from Timothy Fields, Acting Administrator, to Regional Directors, Re: Approach for addressing dioxin in soil at CERCLA and RCRA sites. US Environmental Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.
EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, US EPA, to Jane Magee, General Electric Company, Re: Pittsfield 1-0147, EPA Area 6, Housatonic River Human Health Risk Assessment. January 29.
EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader, Re: Protectiveness of Cleanup Levels for Removal Actions Outside the River - Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, US Dist. Court for Dist. of Mass.). August 4.
EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual, Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2. Draft.
EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3, Part A: Process for Conducting Probabilistic Risk Assessment. Draft, Revision 5. US Environmental Protection Agency, Office of Solid Waste and Emergency Response. December.
EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. Prepared by EPA Region I. October.
GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the US Environmental Protection Agency. September 11.
Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition, December 4-7, Baltimore, MD.
Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA4203. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.
Johnson, T. and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General US Population. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.
Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.
Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.
Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA Annual Meeting and Exposition, Waikiki, HI. (Abstract)
Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds. Proceedings Dioxin '96: 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen 30:61-65.
Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings Superfund XVII Conference, Washington, DC. October 15-17.
Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meetings Proceedings. Abstract.
Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition. Abstract.
MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan. Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.
Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.
Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19. November.
Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD. December 4-7.
Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4):823-843.
Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 22.
Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 11.
ATTACHMENT A
A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS
Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co Inc
Introduction
US Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets where no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and their representation of the actual levels of exposures that are likely to occur under various scenarios.
There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that, when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of actual levels of exposure that will occur within a given exposure area.
GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will result in more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic: namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.
Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL
In 1992, EPA issued a guidance document (EPA 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.
The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region 1 (EPA 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.
For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA 1992):

UCL = exp( x̄ + 0.5·s² + (s·H)/√(n − 1) )

Where:
UCL = a specified limit (i.e., the 95% upper confidence limit) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic, developed by Land (1975), accounting for the skew of the distribution
n = the number of analytical samples
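As a sketch, Land's equation can be implemented as follows. The H value must be taken from Land's (1975) tables as a function of s and n, so the value used here (H = 3.0) is an illustrative placeholder, as are the concentration data:

```python
import math

def h_ucl(data, H):
    """95% UCL on the arithmetic mean via Land's H-statistic (equation above).

    H must come from Land's (1975) tables for the given s and n; it is
    passed in here rather than computed."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    xbar = sum(logs) / n                                 # mean of log data
    s2 = sum((v - xbar) ** 2 for v in logs) / (n - 1)    # variance of log data
    return math.exp(xbar + 0.5 * s2 + math.sqrt(s2) * H / math.sqrt(n - 1))

# Hypothetical skewed PCB results (mg/kg)
sample = [0.5, 1.2, 2.0, 3.1, 4.8, 9.5, 22.0, 60.0]
print(f"arithmetic mean = {sum(sample) / len(sample):.1f} mg/kg")
print(f"H-UCL           = {h_ucl(sample, H=3.0):.1f} mg/kg")
```

With these inputs, the H-UCL exceeds even the maximum detected concentration, which is precisely the behavior criticized in the discussion that follows.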
Frequently, the H-statistic calculates extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.
Use of the H-statistic also does not calculate a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.
Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.
Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when an exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.
Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA 1999).
In view of these considerations, EPA contractors and consultants have recommended that the H-statistic should not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA 1997a) recommended that in environmental applications the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided. These authors concluded that use of "the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).
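A minimal sketch of the percentile-bootstrap alternative these authors recommend, using hypothetical data: the 95% UCL is taken as the 95th percentile of the resampled means and, by construction, cannot exceed the maximum observed concentration.

```python
import random
import statistics

random.seed(7)

# Hypothetical skewed PCB dataset (mg/kg)
sample = [0.5, 1.2, 2.0, 3.1, 4.8, 9.5, 22.0, 60.0]

boot_means = sorted(
    statistics.fmean(random.choices(sample, k=len(sample)))  # resample with replacement
    for _ in range(5000)                                     # 5000 bootstrap replicates
)
ucl95 = boot_means[int(0.95 * len(boot_means))]              # percentile-bootstrap 95% UCL
print(f"arithmetic mean   = {statistics.fmean(sample):.1f} mg/kg")
print(f"bootstrap 95% UCL = {ucl95:.1f} mg/kg")
```

Because each bootstrap mean is an average of observed values, the resulting UCL stays within the range of the data, in contrast to the H-UCL behavior described above.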
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the lower Housatonic River. Consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.
This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al. 1995); in more complex approaches, unbiased estimates of concentration and variance change with distance, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava 1989).

Because kriging is data intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a
hardwired operation in most commercially available Geographic Information System (GIS) applications.

The area inside each polygon is then used to weight the samples in statistical calculations, which is, in essence, saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava 1989):

X̄sc = Σᵢ (pᵢ × Cᵢ), summed over i = 1 through n polygons

Where:
X̄sc = the area-weighted mean concentration (mg/kg)
Cᵢ = the concentration representing the condition within polygon i, where there are i = 1 through n polygons, and
pᵢ = the proportion of the total exposure area that is incorporated in polygon i (unitless)
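A minimal numerical sketch of this area-weighted calculation, using hypothetical concentrations and polygon areas: note how a hot-spot sample that represents only a small polygon no longer dominates the mean.

```python
# Hypothetical samples: (concentration mg/kg, Thiessen polygon area m^2)
samples = [
    (60.0, 50.0),     # hot-spot sample representing a small polygon
    (9.5, 400.0),
    (2.0, 800.0),
    (0.5, 1250.0),    # clean area represented by one large polygon
]

total_area = sum(area for _, area in samples)
proportions = [area / total_area for _, area in samples]         # the p_i values
spatial_mean = sum(c * p for (c, _), p in zip(samples, proportions))
simple_mean = sum(c for c, _ in samples) / len(samples)

print(f"simple arithmetic mean = {simple_mean:.2f} mg/kg")
print(f"area-weighted mean     = {spatial_mean:.2f} mg/kg")
```

Because the hot-spot polygon covers only 2 percent of the total area, the area-weighted mean is far lower than the unweighted mean of the biased dataset.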
This approach includes sampling data from all portions of the exposure area and corrects for the
biases in the sampling data by taking into consideration the spacing of the samples within the
exposure area. This is the approach that was used by GE and approved by EPA for calculating
EPCs for the banks of the Upper 2 Mile Reach of the Housatonic River, and it is the approach
specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which
include a number of floodplain properties along the River). Hence, use of this approach for the
Lower Housatonic River would lead to a consistent approach for the different portions of the
overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at
other Superfund sites. For example, such an approach was used at the Koppers Company
Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To
offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling
data in clusters by gridding the site and calculating an average concentration for each grid using
the sampling data available within it. Then the average concentrations for each grid were
averaged to yield an average concentration for the entire exposure area. In addition, at the
Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the
average goal across the exposure unit (EPA, 1997b).
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board
(SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human
Health Evaluation Manual. In that review, the SAB stated that the

"[e]stimate of the RME cannot ignore the distribution of contamination at the site and the
distribution of individual behaviors which lead to exposure. Rather, the spatial distribution
of the concentration over the site must be considered, along with a distribution reflecting the
relative frequency with which people are likely to visit different parts of the site. For this
reason, any summary measure of concentration (such as the average proposed in the RAGS
document) that does not take into account the spatial distribution of the underlying samples is
likely to be inadequate. Therefore, the Committee believes that the Agency should give
strong consideration to incorporating methods such as kriging or triangulation that take
into account the spatial distribution of contamination to characterize exposure." (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is
fully protective of human health and the environment. Indeed, in responding to comments on the
Upper 2 Mile Reach Removal Action, EPA specifically justified this spatial averaging approach
so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach
requires the availability of adequate data to support the calculation of representative spatial
averages over the exposure area. However, the need for adequate data exists regardless of the
averaging method used to derive the EPCs, and it is particularly marked when one attempts to use
the H-statistic approach. Under the phased approach that EPA has outlined in the Human
Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the
95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the
conduct of additional sampling is warranted in order to avoid the use of extreme and
unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate
additional samples as necessary for individual exposure areas to ensure that reasonable and
representative estimates of the average concentration can be calculated for the risk assessment of
each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most
representative means of calculating EPCs. However, if EPA rejects this approach, or for areas
where the sampling data are not considered sufficient to support use of such spatial averaging,
there are alternative methods of calculating the 95% UCL which would result in more reliable
and representative EPCs than use of the H-statistic. As discussed previously and demonstrated
by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to
substantially overestimate the 95% UCL, particularly where datasets are small. These EPA
contractors and consultants have evaluated and suggested alternative and superior approaches for
calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows
calculation of a probability distribution of EPCs and is able to offset some of the statistical
limitations associated with small datasets. It provides a method for calculating a more reliable
estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a
normal or lognormal distribution. This technique is described below and is recommended as an
alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for
exposure areas in the Lower Housatonic River risk assessment.
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall
dataset of analytical values is selected at random and a mean is calculated. This mean is
archived, and the statistical sampling procedure is repeated with replacement of the data. The
bootstrapped means are collected until their distribution stabilizes (this often requires many
iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95%
UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of
statistics, which states that if a population is repeatedly sampled for purposes of calculating an
arithmetic mean, the distribution of the means will ultimately take the form of a normal
distribution. This Theorem applies to any population regardless of its statistical distribution.
Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples,
bootstrapping models select repeatedly from a single dataset.
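The procedure just described can be sketched as follows. The function name and the small dataset are ours, chosen purely for illustration; a production implementation would work from the actual exposure-area data.

```python
import random

def bootstrap_ucl95(data, n_iter=5000, seed=1):
    """Percentile bootstrap: resample with replacement, archive each mean,
    and take the 95th percentile of the bootstrapped means as the 95% UCL."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_iter):
        # draw a resample of the same size as the original dataset, with replacement
        resample = [rng.choice(data) for _ in data]
        means.append(sum(resample) / len(resample))
    means.sort()
    return means[int(0.95 * n_iter)]  # 95th percentile of the bootstrapped means

data = [0.022, 1.07, 7.58, 9.69, 14.3, 42.6]  # hypothetical PCB results (ppm)
ucl = bootstrap_ucl95(data)
```

By construction the bootstrapped UCL can never exceed the maximum observed value, which avoids the pathology discussed later for the H-statistic.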
This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL and
is more technically justifiable. The reduction obtained from this procedure varies according to
the amount of data available for sampling and its variance, but the result is consistently lower than the 95%
H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is
more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is
particularly well-suited to exposure areas, like many along the Lower Housatonic River, where
sampling is limited.
When sample sizes are small, the distribution for the bootstrap run may not appear normal. This
is because the number of samples required to be included in each subsample in order to approach
normality is larger than is available from the site data. This can be mitigated by further
sampling. It should be noted, however, that as the sample size increases, the variance becomes
tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95%
UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95%
UCL that might be obtained from a larger dataset.
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-
statistic method, it still does not take into consideration the spatial distribution of the sampling
locations across an exposure area and the biases associated with it. To reflect this spatial
distribution, a modified bootstrapping approach can be used to reflect the locations and densities
of samples collected on each property. This modified bootstrapping approach ("spatial
bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to
calculate more reliable 95% UCL estimates of the EPC for each exposure area.
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with
the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to
each sample is determined by calculating the proportion of the total exposure area that is
represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with
respect to its size and are weighted accordingly in the sampling database. In other words, if the
area of one polygon is five times larger than the smallest polygon, it is given a weight that is five
times greater in the sampling database from which values are selected for the bootstrap. The
normalized areas are rounded off to the closest integer. An expanded dataset is then created
from the original dataset that reflects the ratios of each polygon area to the smallest polygon area.
This expansion is accomplished by multiplying the rounded normalized areas by a selected
value. The resulting values represent an area-weighted number of replicates of the sample
results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area
is 7 and the dataset is expanded using a factor of five, then sample X is represented in the
expanded dataset as 35 (7 x 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using
between 250 and 500 of the results (depending upon the size of the expanded dataset) for each
iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a
cumulative distribution of estimated means. After sufficient iterations, this cumulative
distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal
to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of
this distribution of means is the 95% UCL.
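A compact sketch of these steps follows. Rather than physically building the expanded dataset, it draws each value with probability proportional to its polygon area, which is statistically equivalent to resampling from the expanded dataset; the function name, areas, and concentrations are hypothetical.

```python
import random

def spatial_bootstrap(conc, areas, n_iter=5000, k=250, seed=1):
    """Area-weighted (spatial) bootstrap: each draw is weighted by the
    sample's Thiessen polygon area. Returns (median of means, 95% UCL)."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_iter):
        # weighted sampling with replacement stands in for the expanded dataset
        draw = rng.choices(conc, weights=areas, k=k)
        means.append(sum(draw) / k)
    means.sort()
    return means[n_iter // 2], means[int(0.95 * n_iter)]

areas = [138219.0, 105217.0, 31021.0, 2123.0]  # hypothetical polygon areas (ft^2)
conc = [0.022, 1.07, 7.58, 42.6]               # hypothetical PCB results (ppm)
median_mean, ucl = spatial_bootstrap(conc, areas)
```

As the text notes, the median of the bootstrapped means converges to the area-weighted mean of the original dataset, which provides a convenient internal check on the run.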
This approach is both consistent with EPA guidance and preferable to the approach currently
proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used
as the EPC in order to correct for the uncertainties and limitations of the sampling database, the
guidance does not dictate which method must be used to calculate the 95% UCL. While EPA
has in the past often relied on the H-statistic, that method need not be used, and recent reports
by EPA contractors and consultants have recommended against use of the H-statistic to avoid
systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).
The spatial bootstrapping approach is better than the H-statistic approach for several reasons.
First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999),
bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial
bootstrapping allows for consideration of the location and density of the samples for each
exposure area and corrects for the relative contribution of each in the database. Third, this
approach mitigates the statistical impact of a small database. Finally, it can be used regardless of
the distribution of the sampling data. Thus, for all sampling datasets, including those that do not
fit either a normal or lognormal distribution, this approach can be used successfully.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the
calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted
sample calculations for a hypothetical property. For this purpose, Ogden developed a
hypothetical set of sampling data that are similar to the types of data that might be collected from
a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples
have been collected from a property that is roughly 1,000 by 1,000 feet in size. The
concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic
mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219
square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different
approaches: 1) the EPA approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3)
a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.
As can be seen in Table 2, the estimated mean values are similar for all four approaches. The
arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This
is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is
identical to the mean calculated in spatial bootstrapping but is somewhat higher than the
arithmetic and bootstrapped means. This is because some of the sampling locations with high
concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are
weighted more heavily in approaches that include area-weighting.
Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping,
the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL
calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the
bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL
exceeds the maximum value (42.6 ppm) in the sampling database, which means that the
maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that
would need to be used for this exposure area would be the maximum detected concentration of
42.6 ppm. This concentration is clearly not representative of the sampling dataset for this
hypothetical property.
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and
its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin
(1999). These limitations occur despite the fact that the sampling dataset for this hypothetical
property is reasonably large, containing 30 samples, and the range of concentrations is fairly
typical of the types of concentrations likely to be found in floodplain properties along the Lower
Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly
typical concentration gradient, with a few samples showing very low concentrations, a few
showing fairly high concentrations, and the remainder fairly evenly distributed in between. In
addition, some of the largest polygon areas are associated with low-concentration areas, while
others are associated with the highest concentrations. There is nothing about this dataset to
indicate that the samples reported are not representative of the range of concentrations that is
likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured
concentration, and that exceeds the bootstrapped values by a factor of more than six, with the
result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk
assessment for this property. Clearly, this result is disturbing in that it occurs for a property that
has a reasonable number of samples and a fairly even distribution of sampling results. These
problems will be even more pronounced for certain real properties along the Lower Housatonic
River, where the sampling results may be highly skewed, dense sampling occurs in highly
contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of
the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be
used for many of those properties. Such an end result is unsupportable and will result in the
overestimation of site risks. It could also result in the unnecessary remediation of a number of
exposure areas.
Conclusions
As discussed above and demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz
and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental
applications can, and frequently does, produce substantial overestimates of the true mean that are
in no way representative of the concentrations actually contacted by people in an exposure area,
and hence that approach should be avoided. Moreover, this problem cannot be solved by simply
using the maximum detected concentration in the area if the calculated 95% UCL exceeds that
maximum. These problems can be avoided through the use of area-weighted averaging, which
takes into consideration the fraction of the exposure area that is represented by each sample
taken. This approach allows for a more realistic and representative estimate of the level of
exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the
area-weighted averaging approach for the Lower Housatonic River and to collect the data
necessary to support that approach. Use of spatial average concentrations as EPCs is consistent
with EPA policies, procedures, and guidelines (EPA, 1989, 1993) and would be consistent with
the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the
Upper 2 Mile Reach of the River and the Removal Actions Outside the River, including certain
floodplain properties along the River.
However, if EPA rejects this approach (or, for areas where EPA determines that the data are
insufficient to support that approach, decides not to collect additional data) and insists on
estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial
bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more
representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets
many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and
M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of
exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation
Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency
and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER
Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the
Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund
Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-
93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum
exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2.
August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant),
Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA
Region IV Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by
A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of
Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. US Environmental Protection
Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, Mile Removal
Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New
York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and
variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations
when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River, Final.
Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
1                138,219               0.022
2                105,217               0.028
3                110,214               1.07
4                31,021                1.63
5                11,399                1.67
6                7,004                 2.01
7                14,268                2.19
8                12,121                3.25
9                3,697                 3.77
10               13,297                7.58
11               33,087                7.61
12               65,541                7.62
13               57,494                7.71
14               9,014                 7.99
15               9,994                 8.72
16               2,123                 9.69
17               2,341                 9.72
18               7,446                 12.7
19               2,613                 14.3
20               44,231                14.3
21               2,362                 15.4
22               8,745                 16.3
23               5,008                 18.7
24               18,762                19.1
25               24,296                20.2
26               51,356                30.8
27               12,996                32.9
28               18,019                38.9
29               119,814               40.6
30               132,012               42.6
Table 2. Mean and 95% UCL EPC Estimates for the Hypothetical Dataset Calculated Using Four Approaches

[The original table could not be recovered from the source. As discussed in the accompanying
text, the arithmetic mean and the straight bootstrap both yield a mean of 13.3 ppm; the
area-weighted average and spatial bootstrap yield identical, somewhat higher means; the
bootstrap and spatial bootstrap 95% UCLs are 17.06 and 16.38 ppm, respectively; and the
H-statistic 95% UCL is 106 ppm, which exceeds the maximum detected concentration of 42.6 ppm.]
A. Comments on Phase 1 - Site Screening Approach

As discussed in the Work Plan, Phase 1 of the HHRA involves a screening of properties and other
exposure points, based on available PCB data, to determine whether they should be retained for
a more detailed risk assessment. To accomplish this screening, EPA has developed screening risk-
based concentrations (SRBCs) for PCBs in floodplain soil, riverbank soil, and sediments for
various exposure scenarios - i.e., residential, recreational, agricultural, utility worker, and
commercial/industrial groundskeeper scenarios. Separate low-contact and high-contact SRBCs
have been calculated for the residential and recreational scenarios. The initial screening step
involves comparison of the maximum PCB concentration for each medium and exposure point
(classified in the appropriate scenario) to the most conservative SRBC applicable to that medium
and exposure point. If the maximum concentration is below that SRBC, the property or other
exposure point will be dropped from further evaluation. If the maximum concentration exceeds
the applicable SRBC, EPA will, if necessary, conduct additional sampling and will determine the
95% upper confidence limit (UCL) on the mean of the PCB data for such property or other
exposure point. If the 95% UCL does not exceed the applicable SRBC, the property or exposure
point will be dropped from further consideration. If the 95% UCL exceeds the SRBC, the
property or exposure point will be carried through to Phase 2 of the HHRA.
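The decision sequence described above can be summarized in a short sketch; the function name and return labels are ours, not EPA's, and are illustrative only.

```python
def phase1_screen(max_conc, srbc, ucl95=None):
    """Sketch of the Phase 1 screening logic described in the Work Plan summary above."""
    if max_conc < srbc:
        return "drop"             # maximum below SRBC: no further evaluation
    if ucl95 is None:
        return "compute 95% UCL"  # exceedance triggers sampling and a UCL calculation
    if ucl95 <= srbc:
        return "drop"             # 95% UCL does not exceed the SRBC
    return "retain for Phase 2"   # 95% UCL exceeds the SRBC

print(phase1_screen(0.5, 2.0))
print(phase1_screen(5.0, 2.0, ucl95=3.1))
```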
GE supports the concept of a phased screening approach. However, it believes that several of the
assumptions and other input parameters used by EPA to develop the SRBCs for PCBs are overly
conservative and that, as a result, the SRBCs are likewise excessively stringent, even for a
screening-level analysis. In this connection, GE preserves all prior positions expressed to EPA
regarding the toxicity of PCBs and appropriate exposure parameters. However, these comments
are focused on a limited number of the parameters used by EPA in deriving the SRBCs. Although
EPA is currently in the process of the Phase 1 screening, GE believes that EPA should take
account of these comments and revise its SRBCs accordingly for purposes of conducting additional
or more detailed screening.
1. Inconsistent target risk levels

According to the information provided in Tables 6.2-1, 6.2-5, and 6.2-6 of the Work Plan, EPA
has not used a consistent target risk level for calculating its SRBCs. While a cancer risk of 5 x
10^-6 is identified as the target risk level for the residential and recreational SRBCs, target risks of
1.6 x 10^-6 and 1.1 x 10^-6 are identified for the utility worker and commercial groundskeeper
scenarios, respectively. The Work Plan presents no rationale for these differences in target risk
levels.
To provide a more meaningful screening step, EPA should pick a reasonable and consistent target
risk level and use it as the basis for calculating all scenario-specific SRBCs. This will ensure that
all SRBCs are equally protective of public health. The purpose of conducting the Phase 1 screen
is to identify those properties that do not need to be evaluated further in the HHRA and to drop
them from future consideration, so that the HHRA can focus on key areas rather than all areas.
If the screening SRBCs are overly stringent, numerous properties will be retained for the risk
assessment despite the fact that they will likely be shown to have no unacceptable risks once the
risk assessment is completed.
To eliminate this problem and increase the efficiency of the screening step, GE believes that the
target cancer risk level should be established as 1 x 10^-5 for all scenarios. This risk level is the
midpoint of EPA's acceptable risk range (1 x 10^-4 to 1 x 10^-6) (EPA, 1990). When coupled
with conservative exposure parameters and upper-bound media concentrations, as are outlined in
the Work Plan, this approach will successfully screen out those properties that do not pose a risk
while retaining those that warrant further evaluation.
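Because cancer-risk-based screening concentrations scale linearly with the target risk level, existing SRBCs can be restated at a single consistent target by simple proportion. The sketch below illustrates this; the SRBC value used is hypothetical, and linearity is an assumption that holds for standard slope-factor cancer risk calculations.

```python
def rescale_srbc(srbc, old_target, new_target):
    """Restate a risk-based concentration at a different target risk level
    (valid where calculated risk is linear in concentration)."""
    return srbc * (new_target / old_target)

# restating a hypothetical SRBC derived at 5 x 10^-6 to the proposed 1 x 10^-5 target
print(rescale_srbc(2.0, 5e-6, 1e-5))
```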
2. Overly conservative exposure assumptions

Several of the exposure parameter values that are used to derive the SRBCs are extremely
conservative and cannot be considered representative of likely activities, even for a screening-level
assessment. These parameters include some of the exposure frequencies and exposed skin surface
areas used for the residential and recreational scenarios and the soil ingestion rates used for the
recreational and utility worker scenarios.
a. Exposure frequencies for residential and recreational scenarios

Some of the exposure frequencies used by EPA to derive the SRBCs for the residential and
recreational scenarios are unrealistic and implausible. These include the assumed frequency of
two days per week for seven months per year for low-contact residential exposure to riverbank
soils and sediments and low-contact recreational exposure to floodplain and riverbank soils and
sediments, as well as the assumed frequency of three days per week for seven months/year for
high-contact residential and recreational exposure to sediments.

The Work Plan indicates that the determination of high-contact versus low-contact areas depends
on the accessibility of the area -- e.g., judgments about the likelihood that the area has "physical
features consistent with a child (or adult) contacting riverbank soil on a regular basis" (p. 6-26).
Physical features that would lead to a low-contact determination would presumably include, at a
minimum, steep slopes, undercut river banks, rocky areas with limited or no access to soil or
sediment, and marshy areas. Such conditions are not consistent with contact with the contaminated
media on a regular basis. In these circumstances, it is unreasonable to assume an exposure
frequency of two days per week for each week in the seven-month exposure period. Such a
frequency clearly represents regular contact, which is inconsistent with the limited accessibility
of these areas. A more appropriate assumption for low-contact areas would be to assume
occasional contact (e.g., one day per week or two days per month) during the five warmer months
of the year.

For high-contact areas, EPA has assumed an exposure frequency of three days per week for seven
months. While this is a highly conservative estimate even for contact with floodplain and
riverbank soils, it is totally unreasonable for contact with sediments, particularly given the
assumed exposed skin surface areas. Even for high-contact sediment areas, it is extremely
unlikely that children and adults would venture into the river and be exposed to sediments as often
as three days per week during every week from April through October. In fact, children would
probably not play in the river at all in the cooler portions of this period. Thus, it would be more
reasonable to assume an average exposure frequency of around one day per week during this
seven-month period.
b. Exposed skin surface areas for residential and recreational scenarios

The excessive conservatism in the exposure frequencies is exacerbated by the exposed skin surface
areas assumed by EPA in deriving the SRBCs for the residential and recreational scenarios. For
these scenarios, EPA has assumed that the entire head, hands, forearms, lower legs, and feet will
be in contact with soil or sediment every time that the individuals are present in the affected areas
(Tables 6.2-3 and 6.2-4). This assumption does not take account of the fact that the exposed skin
surface areas during the spring and fall months are likely to be less, because individuals will be
wearing heavier clothing -- i.e., shoes, socks, long pants, and long-sleeved shirts. While the level
of exposure assumed by EPA may occur during the summer months, it would not occur every time
the individual visits the affected area. Hence, EPA should reduce the exposed skin surface areas
to take account of the lesser areas exposed during the spring and fall, and thus develop time-
weighted averages for the exposed skin surface areas.
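A time-weighted average of this kind would be computed as follows. The seasonal durations and surface areas below are hypothetical placeholders for illustration, not values proposed by GE or EPA.

```python
# Time-weighted average exposed skin surface area over a 7-month exposure season.
# All numbers are hypothetical placeholders.
exposure_days = {"summer": 90, "spring_fall": 120}        # days in each clothing regime
exposed_area = {"summer": 5000.0, "spring_fall": 2000.0}  # exposed skin (cm^2)

total_days = sum(exposure_days.values())
twa_area = sum(exposed_area[s] * exposure_days[s] for s in exposure_days) / total_days
print(round(twa_area, 1))
```

With these placeholder inputs, the time-weighted area falls well below the full-summer value, illustrating the direction of the correction GE is requesting.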
c. Soil ingestion rates for recreational scenario

For recreational exposures (both high-contact and low-contact), EPA has used daily soil ingestion
rates of 200 mg/day for small children and 100 mg/day for adults (Table 6.2-2). GE does not
agree with these ingestion rates for the reasons given in its June 3, 1999 comments (pp. 33-34).
However, if these daily soil ingestion rates are to be used, EPA should at least apply some factor
to account for the fraction of the daily soil ingestion that is derived from the affected recreational
area, as provided in the Agency's Risk Assessment Guidance for Superfund (EPA, 1989). Under
real-life conditions, total daily soil ingestion will occur over the entire day and will include soil
and/or sediments from a variety of different locations, some of which will not be contaminated.
However, EPA has assumed that total daily soil ingestion will come from a single contaminated
area and has not included a factor for the fraction of soil ingested from the site. While it is
presumed that this omission is due to the fact that EPA is concerned that some individuals may
occasionally obtain all of their ingested soil from a single contaminated location, such an
assumption is not reasonable when coupled with the exposure frequencies that are assumed. In
addition, some highly accessible areas may be very limited in size, as is the case on some
floodplain properties. Because of this, even if all of the soil is obtained from an exposure area,
only a portion of that soil will be contaminated.
Moreover, EPA's assumption is excessively conservative when one considers that the SRBCs for soil and sediment are separately derived and that each SRBC assumes that 100 percent of the soil or sediment ingested daily comes from the site. In reality, individuals who are engaged in river-based activities will ingest a combination of soil and sediment during the course of their activity. Given these considerations, GE believes that a factor of 50 percent should be used to represent the fraction of ingested soil or sediment that is derived from contaminated recreational areas during recreational activities. This will correct for the fact that daily ingestion exposure occurs with a combination of soil and sediment. It will also correct for the fact that only limited portions of the day will be spent in recreational areas. In addition, such a factor will more accurately reflect exposures at those areas where the area of contamination is very limited. This approach is consistent with EPA's approach in its development of the cleanup goals for recreational areas outside the River under the October 7, 1999 Consent Decree (EPA 1999a).
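The effect of such a fractional-intake factor on a screening-level ingestion estimate can be sketched as follows (a simplified illustration only, not EPA's SRBC derivation; the ingestion rates are the ones discussed above):

```python
# Simplified daily soil/sediment ingestion estimate, illustrating the role of a
# fractional-intake factor (FI) for the portion derived from the site.
def daily_ingested_mass(ingestion_rate_mg_day, fraction_from_site):
    """Mass of site-derived soil/sediment ingested per exposure day (mg)."""
    return ingestion_rate_mg_day * fraction_from_site

# EPA's assumption: all 200 mg/day for a small child comes from the site.
epa_mass = daily_ingested_mass(200, 1.0)   # 200 mg from the site
# GE's proposal: 50 percent derives from the contaminated recreational area.
ge_mass = daily_ingested_mass(200, 0.5)    # 100 mg from the site
```

With the 50 percent factor, the site-derived ingested mass (and hence the computed dose) is halved, which in turn roughly doubles the risk-based screening concentration.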
d. Soil ingestion rate for utility worker scenario
EPA has used a soil ingestion rate of 200 mg/day to develop the SRBC for the utility worker scenario because of likely "heavy exposure to soils" (p. 6-17). While an enhanced soil ingestion rate may be appropriate for utility workers who will have direct contact with soil, the ingestion rate of 200 mg/day is too high. GE previously proposed use of an enhanced soil ingestion rate of 137 mg/day for farmers at the Housatonic River site, who are likewise expected to have heavy contact with soil (GE 1997), and EPA approved that proposal (EPA and MDEP 1998). EPA itself used that soil ingestion rate in deriving a cleanup level for the utility worker scenario for Areas Outside the River under the Consent Decree (EPA 1999a). This ingestion rate was based on an assumed upper-bound soil adherence rate of 1 mg/cm² for the hands. In recent draft dermal risk assessment guidance, however, EPA (1999b) recommends soil adherence factors for utility workers ranging from a central-tendency estimate of 0.2 mg/cm² to an upper-bound estimate of 0.8 mg/cm². Use of that upper-bound adherence factor of 0.8 mg/cm² would result in an enhanced soil ingestion rate for utility workers of 110 mg/day. GE submits that this enhanced soil ingestion rate should be used in developing a revised SRBC for the utility worker scenario.
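If, as the text implies, the enhanced ingestion rate scales linearly with the assumed hand adherence factor, the 110 mg/day figure can be reproduced as follows (an arithmetic check, not a method prescribed in the Work Plan):

```python
# The 137 mg/day rate was derived assuming a hand adherence factor of 1 mg/cm^2;
# rescale it to the 0.8 mg/cm^2 upper bound from EPA (1999b) draft guidance.
base_rate = 137.0          # mg/day, at 1 mg/cm^2 adherence
base_adherence = 1.0       # mg/cm^2
revised_adherence = 0.8    # mg/cm^2, upper-bound utility worker value

revised_rate = base_rate * (revised_adherence / base_adherence)
print(round(revised_rate))  # -> 110
```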
B. Comments on Phase 2 - Baseline HHRA Approach
For those properties or other exposure points that remain after the Phase 1 screening, the Work Plan outlines the approach that EPA plans to use in conducting a baseline HHRA. GE has a number of comments on that proposed approach.
1. Selection of non-PCB contaminants of potential concern
The Work Plan outlines a two-step approach for selecting non-PCB contaminants of potential concern (COPCs) to be included in the baseline HHRA (Section 6.3.2.3). In the first step, for each area that has PCB concentrations above the applicable SRBC, the available sampling data on other constituents will be compared with several criteria to determine the need for additional sampling in that area. These criteria include: (a) background concentrations established for the area; (b) EPA's Preliminary Remediation Goal (PRG) for dioxins/furans in residential areas, which is a 1 ppb Toxicity Equivalent (TEQ) concentration; and (c) the Massachusetts Department of Environmental Protection's (MDEP's) Method 1 soil standards set out in the Massachusetts Contingency Plan (MCP) (see Work Plan, p. 6-30). If these or other (unspecified) criteria are exceeded, EPA will conduct additional characterization sampling (or other evaluation) in the area. If they are not exceeded, the Work Plan states that the baseline HHRA will be performed "using the available analytical data" (p. 6-31). The latter statement appears to mean that, even if the concentrations of non-PCB constituents do not exceed the criteria for further sampling (i.e., are below background concentrations, the dioxin/furan PRG, or the Method 1 soil standards), those constituents will nevertheless be carried through to the second step in the selection of COPCs.
In the second step, all constituents detected will be screened for the selection of COPCs by comparing the maximum detected concentrations with contaminant-specific and medium-specific risk-based concentrations. The Work Plan states that the principal criterion for selection of a COPC will be an exceedance of the medium-specific PRG by the maximum concentration of the constituent (p. 6-31). Although the Work Plan does not give the source of the PRGs to be used, it states that a target cancer risk of 1 x 10⁻⁶ and a target non-cancer hazard quotient (HQ) of 0.1 will be used. Any constituent for which the maximum concentration exceeds the PRG will be carried forward for the risk assessment. In addition, the Work Plan states that constituents which are known to bioaccumulate in biological tissues but do not exceed the PRGs will be further evaluated for possible inclusion in the risk assessment.
GE has several concerns about this proposed two-step approach.
First, GE believes that if the concentration of non-PCB constituents in a given area does not exceed the criteria for further sampling (as listed above), those constituents should be dropped from further consideration and not carried through to the second step in selecting COPCs. As EPA has recognized in the Statement of Work for Removal Actions Outside the River (SOW), which is Appendix E to the Consent Decree, concentrations of non-PCB constituents that are consistent with local background levels, or are less than the 1 ppb TEQ concentration for dioxins/furans, or are below the applicable Method 1 soil standards, are considered to pose no significant risk to human health. Hence, such constituents need not and should not be considered further in the selection of COPCs.
Second, for those constituents that are carried forward to the second step in the COPC selection process, EPA's proposed primary approach for selecting COPCs (i.e., comparison of maximum concentrations to PRGs) is overly restrictive. In recent discussions with GE, EPA explained that it is currently planning to use the EPA Region 9 PRGs for this purpose, but to reduce the non-cancer-based PRGs by 10 times because those PRGs were based on a non-cancer HQ of 1.0, not 0.1. This approach will likely lead to the selection of a long list of COPCs, many or most of which in fact make no significant contribution to actual risks. This would undercut the purpose of the screening approach and will greatly and needlessly complicate the risk assessment without any corresponding benefit in terms of achieving additional health protection. Instead, GE recommends that, in selecting the COPCs for the Rest of the River, EPA should use an approach similar to the approach approved by EPA and specified in the SOW for Areas Outside the River, including the non-bank portions of certain floodplain properties.¹ This approach utilizes not only the Region 9 PRGs but also the other criteria listed on page 6-30 of the Work Plan, as discussed below.

¹ The Work Plan also states that, in addition to these specified criteria, EPA will use "other applicable criteria" in the first step of this process, but it does not say what those additional criteria are. EPA should specify those criteria.
For dioxins and furans, EPA should use the PRGs for dioxin/furan TEQs that are prescribed in recent guidance from EPA headquarters for use in addressing these constituents in soil at CERCLA and RCRA sites (EPA 1998). Those PRGs are 1 ppb for dioxin/furan TEQs in residential areas (also applicable to recreational areas) and 5-20 ppb for dioxin/furan TEQs in commercial/industrial areas. The SOW provides specifically for use of these PRGs in screening out dioxin/furan compounds in soil in Areas Outside the River. EPA should likewise use these PRGs in selecting COPCs for the Rest of the River. (Indeed, as discussed below, GE believes that these PRGs should be used not only as screening criteria but also as remedial goals, which is consistent with the approach embodied in the SOW for Areas Outside the River. See Section B.2 below.)
For other constituents, EPA should follow a step-wise approach that begins with comparisons to the Region 9 PRGs. Since these PRGs are highly conservative and are specified in the SOW for use in screening such constituents without further reduction, they should not be reduced by an additional 10 times. For constituents for which such PRGs do not exist, a similar screening step should be employed using PRGs for surrogate compounds.²
EPA should then apply the other criteria listed on page 6-30 of the Work Plan, i.e., comparison to background levels and MCP Method 1 standards. Comparison of concentrations of non-PCB constituents to site-specific background levels (e.g., concentrations upstream of the GE facility) is appropriate to ensure that only those constituents attributable to releases from the GE facility are included in the risk assessment. This step is allowed in EPA's Risk Assessment Guidance for Superfund (EPA 1989) and has been accepted by EPA as a basis for selecting non-PCB COPCs for both soils and sediments in the Upper ½ Mile Reach of the Housatonic River (as stated in the Upper ½ Mile Reach Work Plan) and for Areas Outside the River (as provided in the SOW). It is also consistent with the MCP, which provides that if the concentration of a constituent is at or below background levels, then that constituent will be considered to pose No Significant Risk (310 CMR 40.0902(3)). Similarly, comparison to the MCP Method 1 soil standards is appropriate because those standards have been developed as conservative risk-based standards (using a target cancer risk of 1 x 10⁻⁶) that will ensure protection of human health and the environment. Again, EPA has already accepted comparisons to those standards as a basis for selecting non-PCB COPCs in soil for the Upper ½ Mile Reach of the River and Areas Outside the River. Moreover, in its Engineering Evaluation/Cost Analysis (EE/CA) for the 1½ Mile Reach of the River, EPA itself has used the MCP Method 1 soil standards, as well as background levels, as a basis for evaluating non-PCB constituents in bank soils (Weston 2000b). Thus, use of comparisons to background levels and MCP Method 1 standards in selecting COPCs for the Rest of the River is justified by considerable precedent and will ensure that a consistent approach is employed to address such constituents throughout the GE-Pittsfield/Housatonic River Site. EPA has presented no sound basis for departing from that approach for the Rest of the River portion of the Site.

² Generally, when toxicity criteria have not been derived for particular constituents, it is because there are no reliable toxicity data on which to base them. If such constituents are not screened but are simply carried through to the risk assessment as COPCs, that would unnecessarily complicate the risk assessment and would result in largely meaningless or highly uncertain estimates, given the lack of reliable toxicity data and criteria. Hence, EPA should screen these compounds using PRGs for surrogate compounds, as described in GE's June 3, 1999 comments (p. 6).
Moreover, use solely of the maximum detected concentrations in selecting COPCs from these constituents is overly conservative and inappropriate. For example, comparisons to site-specific background levels should be made using an appropriate statistical technique or summary statistics, as described in MDEP (1995) guidance, while comparisons to the MCP Method 1 standards are generally to be made based on average exposure point concentrations (310 CMR 40.0926(3)). Again, these approaches have been agreed upon by EPA as fully protective for Areas Outside the River, as set forth in the SOW, and should likewise be used for the Rest of the River to eliminate constituents that are not likely to contribute to Site-related risks.
Finally, the Work Plan does not explain how the potential for bioaccumulation (discussed on p. 6-31) will be considered or what criteria will be used to examine it. Unless the approach used to identify and select such COPCs is clearly defined, it could result in the unnecessary inclusion of certain constituents. EPA should provide supplemental information that outlines the specific approach and criteria that will be used to identify and select these constituents for inclusion.
2. Risk assessment of dioxins/furans
The Work Plan states that EPA intends to evaluate the potential risks associated with dioxins and furans by converting them to TEQs (toxic equivalents of 2,3,7,8-TCDD) using Toxicity Equivalency Factors (TEFs) and then applying the provisional cancer slope factor (CSF) of 150,000 (mg/kg-day)⁻¹ listed on HEAST for 2,3,7,8-TCDD (Section 6.3.3.2.4). As discussed in detail in GE's June 3, 1999 comments (pp. 19-22), the TEF approach has substantial limitations, and the provisional CSF of 150,000 (mg/kg-day)⁻¹ is uncertain and unsupportable. A wide range of CSFs, spanning from 9,000 to 156,000 (mg/kg-day)⁻¹, have been proposed for TCDD under the linear nonthreshold cancer model, with differences resulting from the selection of the tumor classification scheme, the interspecies scaling factor used, and the model used for low-dose extrapolation. Thus, there is enormous uncertainty associated with any selected CSF.
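The TEQ-based calculation that the Work Plan contemplates has the following general form (a schematic sketch only; the congener list is truncated, the concentrations and intake factor are hypothetical, and the TEFs shown are WHO-style values):

```python
# Schematic TEF/TEQ risk calculation with hypothetical inputs.
# TEQ = sum over congeners of (concentration x TEF); risk = dose x CSF.
tef = {"2378-TCDD": 1.0, "12378-PeCDD": 1.0, "23478-PeCDF": 0.5}       # TEFs
conc_mg_kg = {"2378-TCDD": 1e-6, "12378-PeCDD": 2e-6, "23478-PeCDF": 4e-6}

teq_mg_kg = sum(conc_mg_kg[c] * tef[c] for c in tef)  # TCDD equivalents in soil

csf = 150000.0               # provisional HEAST CSF for 2,3,7,8-TCDD, (mg/kg-day)^-1
daily_intake_factor = 1e-6   # hypothetical (kg soil ingested)/(kg body weight)/day
lifetime_dose = teq_mg_kg * daily_intake_factor       # mg/kg-day, schematic
cancer_risk = lifetime_dose * csf
```

The point of the comment above is that every result of this chain is directly proportional to the selected CSF, so the order-of-magnitude uncertainty in the CSF propagates one-for-one into the risk estimate.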
In these circumstances, GE believes that, to the extent possible in this risk assessment, EPA should evaluate dioxins and furans through application of the dioxin/furan TEQ PRGs set forth in EPA (1998) guidance and discussed above, using those PRGs as remedial goals as well as screening levels, rather than conducting forward risk calculations for these compounds. This approach would avoid the uncertainties associated with selecting a particular CSF for TCDD and is consistent with the approach adopted by EPA for Areas Outside the River, as described in the SOW.

However, to the extent that EPA includes dioxins and furans in forward risk calculations, GE believes that EPA should use a CSF of 30,000 (mg/kg-day)⁻¹. As shown in our June 3, 1999 comments, based on current information, that CSF is more supportable than the CSF of 150,000 (mg/kg-day)⁻¹ specified in the Work Plan.
3. Use of TEF/TEQ approach for dioxin-like PCBs
In the Work Plan, EPA also proposes to use the TEF/TEQ approach for estimating risks of dioxin-like PCB congeners (Section 6.3.3.2.5). Under this approach, the concentrations of these PCB congeners are converted to toxic equivalents (TEQs) of 2,3,7,8-TCDD using TEFs, and the provisional CSF of 150,000 (mg/kg-day)⁻¹ for 2,3,7,8-TCDD is then applied to estimate the risks. In its June 3, 1999 comments, GE presented several reasons why this approach is not appropriate or scientifically defensible. In discussions with GE in November 1999, EPA stated that, since it will make risk management decisions on the basis of total PCB concentrations rather than congener-specific data, it intends to use this TEF/TEQ approach for PCBs only if it finds a good correlation between the concentrations of the dioxin-like congeners and concentrations of total PCBs. There is no indication in the Work Plan, however, that EPA has established a good correlation between the congener-specific data and the total PCB data. Indeed, in more recent discussions, EPA has indicated that it has not to date evaluated this issue. If and when EPA makes such an analysis, GE requests the opportunity to review the analysis and discuss it with EPA. The absence of such a correlation would itself make the TEQ data irrelevant to risk management decisions.
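The correlation check EPA described could take a form as simple as the following (a sketch with hypothetical paired measurements; the actual analysis would use the site congener and total PCB data):

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

total_pcb = [1.2, 3.4, 5.1, 8.0, 12.5]     # mg/kg, hypothetical samples
congener_teq = [0.4, 1.1, 1.6, 2.7, 4.0]   # ug/kg TEQ, hypothetical samples

r = pearson(total_pcb, congener_teq)
# An r near 1 would support using total PCB concentrations as a surrogate for
# the dioxin-like congener TEQ; a low r would make the TEQ data irrelevant to
# risk management decisions based on total PCBs, as the text argues.
```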
In any event, GE continues to believe, for the same reasons given in its June 3, 1999 comments (pp. 23-25), that application of the TEF/TEQ approach to PCBs is not scientifically supportable. These reasons include: (a) the lack of sufficient congener-specific data to provide reliable estimates of true congener concentrations; (b) the double counting of carcinogenic risks that would result from use of both the TCDD CSF and the PCB CSF in the same assessment; and (c) the uncertainties and unsupportability of the TEFs for PCB congeners and of the CSF (for 2,3,7,8-TCDD) to be used to evaluate the resulting TEQs.
4. Failure to consider alternative CSFs and RfDs for PCBs
The Work Plan states that EPA intends to use the upper-bound CSF for PCBs of 2 (mg/kg-day)⁻¹ and the Reference Dose (RfD) for Aroclor 1254 of 2 x 10⁻⁵ mg/kg-day, as published in its Integrated Risk Information System (IRIS) (Sections 6.3.3.2.3 and 6.3.3.3.2). In its June 3, 1999 comments (pp. 16-19 and Appendix A), GE presented substantial reasons why these toxicity values overestimate the carcinogenic potential and non-carcinogenic impacts of PCBs, and it recommended the development of alternative toxicity values for PCBs. EPA has made no reference to this presentation in its Work Plan.
This failure to consider the alternative toxicological information presented by GE is unlawful. Under a settlement agreement with GE (Settlement Agreement in General Electric Company v. Browner, No. 93-1251, D.C. Cir., October 25, 1993) and a subsequent EPA guidance memorandum (EPA 1993), if an outside party questions the use of IRIS values during the course of an EPA risk assessment and presents alternative toxicological information that may be used in place of the IRIS values, EPA has an obligation to consider "all credible and relevant evidence before it." There is no indication that EPA has considered the alternative toxicological information presented by GE. Rather, it has ignored GE's showing and chosen simply to defer to the current IRIS values (which is precisely the course of conduct that the settlement and guidance prohibit). EPA should rectify this deficiency by now giving careful consideration to the alternative toxicological information presented by GE.
5. Use of subchronic RfD for PCBs
The Work Plan states that EPA will consider using the subchronic RfD for PCBs only if it determines that exposure is "intermittent enough to result in an exposure period that is substantially reduced" (p. 6-42). Later, on page 6-43, the Work Plan states that this will occur when the total exposure period is substantially below the 7-year period that is considered by EPA to represent chronic exposure. There is no further discussion of this issue, so it is not clear what the Agency will use as a benchmark to identify intermittent exposure.
Many of the proposed exposure scenarios are likely to be extremely intermittent in nature. For example, in developing the SRBC for utility workers, EPA assumes that exposure occurs 5 days/year for a period of 25 years. Clearly, 5 days/year is intermittent exposure. In fact, even when considered over the entire 25-year exposure period, the total exposure consists of 125 days, which amounts to about 1/3 of a year, or less than 5 percent of the 7-year chronic exposure period. This exposure would certainly be considered intermittent, despite the fact that the total exposure period lasts for 25 years.
As a second example, in developing its SRBCs for recreational exposures, EPA has assumed that children aged 1-6 are exposed for six years during recreational activities. Even using the highly conservative exposure frequency for high-contact areas of 3 days/week for a period of 7 months per year (a total of 84 days/year, or 504 days for the entire exposure period), the number of days exposed is equivalent to 1.4 years, or less than 20 percent of the available days during the 7-year chronic exposure period. Nevertheless, as stated on page 6-42, EPA does not intend to use the subchronic RfD to evaluate noncancer hazards to young children.
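The arithmetic behind these two examples can be checked directly (the exposure assumptions are the ones quoted above from the Work Plan):

```python
# Arithmetic check on the two intermittency examples above.
CHRONIC_DAYS = 7 * 365              # EPA's 7-year chronic exposure benchmark

# Utility worker: 5 days/year for 25 years.
utility_days = 5 * 25               # 125 days total
utility_fraction = utility_days / CHRONIC_DAYS
print(utility_days, round(utility_fraction * 100, 1))   # 125 days, ~4.9 percent

# Young child recreator: ~3 days/week over a 7-month season, for 6 years.
child_days_per_year = 84
child_days = child_days_per_year * 6                    # 504 days total
child_fraction = child_days / CHRONIC_DAYS
print(child_days, round(child_fraction * 100, 1))       # 504 days, ~19.7 percent
```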
In short, the Work Plan states that EPA will use the subchronic RfD when exposures during a given scenario are sufficiently intermittent to result in a total exposure that is far less than the 7-year chronic exposure period identified by EPA. It is clear, however, that even for scenarios where this intermittent exposure is likely, EPA does not intend to incorporate the subchronic RfD. EPA should reconsider its position on the use of the subchronic RfD and incorporate it into those scenarios where it is relevant.
6. Inappropriate use of a single exposure scenario for all direct-contact recreational activities
The Work Plan provides that, while consumption of fish and game will be evaluated separately, all direct-contact recreational activities will be evaluated using a single exposure scenario and a single set of exposure assumptions (pp. 6-52 to 6-53). These activities include hiking, walking, canoeing, picnicking, swimming, wading, and dirt biking, as well as fishing and hunting (but not consumption of fish or game). In its June 3, 1999 comments (pp. 11-12, 29-30), GE explained that use of a single exposure scenario and set of assumptions for such diverse activities is inappropriate because it will force EPA either to use the most conservative assumptions for any of these activities (which would clearly result in grossly overestimated exposure estimates) or to pick compromise parameters that will not be truly representative of any of these activities (which would be subject to considerable uncertainty). Based on discussions with EPA in the fall of 1999, GE had understood that the Agency recognized this problem and intended to consider further whether and how to divide this single direct-contact recreational scenario into a number of more specific scenarios (e.g., hiking/walking, canoeing, picnicking, swimming/wading). However, the Work Plan does not do that. GE again urges EPA to develop such specific scenarios so as to take account of the significant differences in exposure parameters among these types of activities. In doing so, actual recreational exposures will be more closely approximated, and more realistic and appropriate cleanup goals can be developed.
7. Lack of information about specific exposure assumptions to be used
As in the prior draft, the Work Plan does not present the specific exposure parameter values and equations to be used in EPA's risk assessment, but simply states that they will be developed based on EPA and MDEP guidance along with site-specific data (Section 6.3.4.5). GE requested such information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has advised GE that the Agency's risk assessment staff will hold a series of technical meetings with GE representatives and consultants, and at such meetings will be prepared to discuss the most appropriate exposure assumptions and parameter values to be used. GE reiterates its request for such meetings and looks forward to discussing these issues with EPA.
8. Use of probabilistic risk analysis
In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that probabilistic approaches, such as Monte Carlo analysis, will be considered for those COPCs and exposure pathways that result in significant risk estimates. However, it does not mention probabilistic risk assessment techniques in connection with any of the main risk assessments. In prior discussions with GE, EPA indicated that one of the principal reasons it collected a large amount of fish tissue PCB data is for use in a probabilistic risk assessment of fish consumption. As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic exposure model in the main risk analysis of the fish consumption pathway. However, the Work Plan presents no discussion of, or plans for conducting, such an assessment. In fact, EPA does not even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk assessment guidance documents and information sources that EPA intends to use in conducting the HHRA.
During the last few years, EPA's policies and guidelines have focused on improving risk management by presenting decision-makers with the entire range of possible risks rather than a single point estimate (EPA 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that numerical risk assessments should be accompanied by a full characterization of the uncertainties, limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D) probabilistic analysis, such as Microexposure analysis, can more effectively characterize the impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed population by considering time-dependent changes.
Microexposure analysis has been used by EPA and by independent researchers to simulate duration of residential exposure (EPA 1992c; Johnson and Capel 1992; Sielken 1994). It also has been used to evaluate childhood exposures to lead (Goodrum et al. 1994), exposure to contaminants in tap water (Harrington et al. 1995), and exposure to dioxins from the consumption of freshwater fish (Keenan et al. 1993a,b; 1995; 1996a; 1997a,b). Microexposure analysis was employed in the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon Group 1994; Keenan et al. 1996b). In addition, Microexposure analysis has been described in EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A - External Review Draft, as a viable alternative for modeling time-dependent variability in concentrations, daily activity patterns, and other behavioral exposure factors (EPA 1999c).
Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of probabilistic risk analysis through application to the fish ingestion pathway at the Sangamo-Weston/Lake Hartwell Superfund site (Simon 1999). Simon (1999) found that the Microexposure event model gave risk estimates that were approximately an order of magnitude lower than those derived using deterministic and other probabilistic approaches. He attributed this decrease to the power of the method to incorporate changes over time.
GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model, to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in Appendix E of its draft guidance for conducting probabilistic risk assessments (EPA 1999c), some types of exposures, including fish consumption, can be expected to vary over time. Anglers may change their activity levels and their fish consumption behaviors as they age. In addition, different species and sizes of fish may be consumed from a variety of locations on the river over time, and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates, and will provide information that will allow better-informed risk management decisions to be made (EPA 1999c).
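The general idea of a microexposure-style simulation of the fish consumption pathway can be sketched as follows. This is a minimal illustration, not the Microexposure model itself: all distributions, parameter values, and the 30-year duration are hypothetical placeholders, and a real analysis would distinguish variability from uncertainty in a second simulation dimension.

```python
# Minimal sketch of a microexposure-style Monte Carlo simulation of angler dose.
import random

random.seed(1)

N_SIM = 10_000        # simulated anglers
YEARS = 30            # hypothetical angling duration
BODY_WEIGHT_KG = 70.0

def simulate_angler():
    """Average daily PCB dose (mg/kg-day) for one simulated angler."""
    total_intake_mg = 0.0
    meals_per_year = random.randint(5, 50)   # angler-specific consumption behavior
    for year in range(YEARS):
        # Tissue concentration varies year to year (species, size, location).
        conc_mg_kg = random.lognormvariate(0.0, 0.5)   # hypothetical, mg/kg
        meal_size_kg = 0.227                           # ~8 oz fillet, assumed
        total_intake_mg += meals_per_year * meal_size_kg * conc_mg_kg
    return total_intake_mg / (YEARS * 365 * BODY_WEIGHT_KG)

doses = sorted(simulate_angler() for _ in range(N_SIM))
mean_dose = sum(doses) / N_SIM
p95_dose = doses[int(0.95 * N_SIM)]   # an upper percentile, not a point estimate
```

The output is a distribution of doses across the simulated population rather than a single point estimate, which is the feature of probabilistic analysis that the text advocates.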
9. Use of 95% UCLs or maximum values as exposure point concentrations
The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% UCL of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).
GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment, and as demonstrated by EPA contractors and consultants (EPA 1997c; Schulz and Griffin 1999), use of the H-statistic to derive a 95% UCL can and frequently does produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.
Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½ Mile Reach and for Areas Outside the River, namely a spatial averaging approach. That approach will: (a) allow calculation of EPCs that are much more truly representative of actual exposures than use of EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper ½ Mile Reach and Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling if necessary to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.
However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A and demonstrated by Singh et al. (EPA 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.
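The basic device behind such an approach, a bootstrap 95% UCL of the arithmetic mean, can be sketched as follows. The concentrations are hypothetical, and a true spatial bootstrap (as discussed in Attachment A) would additionally weight samples by their location and density within the exposure area:

```python
# Sketch of a simple (non-spatial) percentile-bootstrap 95% UCL of the mean.
import random

random.seed(7)

conc = [0.5, 1.2, 0.8, 3.1, 0.4, 7.9, 2.2, 0.9, 1.5, 4.3]  # mg/kg, hypothetical

N_BOOT = 5000
boot_means = []
for _ in range(N_BOOT):
    resample = [random.choice(conc) for _ in conc]   # resample with replacement
    boot_means.append(sum(resample) / len(resample))

boot_means.sort()
ucl95 = boot_means[int(0.95 * N_BOOT)]   # 95th percentile of bootstrap means
sample_mean = sum(conc) / len(conc)      # 2.28 mg/kg for these data
```

Unlike the H-statistic, this estimate makes no lognormality assumption: the UCL is read directly off the empirical distribution of resampled means.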
References
EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1990. National Contingency Plan. US Environmental Protection Agency, Washington, DC.

EPA. 1992a. Final Guidelines for Exposure Assessment; Notice. US Environmental Protection Agency, Washington, DC. 57 Federal Register 22888-22938. May 29.

EPA. 1992b. Guidance on Risk Characterization for Risk Managers and Risk Assessors. Memorandum from F. Henry Habicht III, Deputy Administrator, to Assistant Administrators and Regional Administrators. Office of the Administrator, Washington, DC.
EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General US Population. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

EPA. 1993. Memo from W.H. Farland and H.L. Longest to Regional Directors re: Use of IRIS values in Superfund risk assessments. US Environmental Protection Agency, Washington, DC. OSWER Directive 9285.7-16. December 21.

EPA. 1995a. Policy for Risk Characterization at the US Environmental Protection Agency. US Environmental Protection Agency, Office of the Administrator, Washington, DC. March.

EPA. 1995b. Guidance for Risk Characterization. US Environmental Protection Agency, Science Policy Council, Washington, DC. February.

EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the US Environmental Protection Agency. US Environmental Protection Agency, Office of Research and Development, Washington, DC. May 15.

EPA. 1997b. Guiding Principles for Monte Carlo Analysis. US Environmental Protection Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 pp.

EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1998. Memo from Timothy Fields, Acting Administrator, to Regional Directors re: Approach for addressing dioxin in soil at CERCLA and RCRA sites. US Environmental Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.
EPA and MDEP 1998 Letter from Anna Symington Acting Section Chief Massachusetts
DEP and Bryan Olson Project Manager US EPA to Jane Magee General Electric
Company Re Pittsfield 1-0147 EPA Area 6 Housatonic River Human Health Risk
Assessment January 29
EPA 1999a Memorandum from Ann-Marie Burke Toxicologist to Richard Cavagnero GE
Project Leader Re Protectiveness of Cleanup Levels for Removal Actions Outside the River
- Protection of Human Health Attachment A to Appendix D to Consent Decree lodged in
United States et al v General Electric Company (No 99-30225-MAP US Dist Court for
Dist of Mass) August 4
EPA 1999b Risk Assessment Guidance for Superfund Volume I - Human Health Evaluation
Manual Supplemental Guidance Dermal Risk Assessment Interim Guidance US
Environmental Protection Agency Office of Emergency and Remedial Response Washington
DC March 2 Draft
EPA 1999c Risk Assessment Guidance for Superfund Volume 3 (Part A Process for
Conducting Probabilistic Risk Assessment) Draft Revision 5 United States Environmental
Protection Agency Solid Waste and Emergency Response December
EPA 1999d Responsiveness Summary for Allendale School Removal Action ½ Mile
Removal Action and Consolidation Prepared by EPA Region I October
GE 1997 Proposal for Alternative Input Values for the Human Health Risk Assessment
Memorandum from General Electric to the Massachusetts Department of Environmental
Protection and the US Environmental Protection Agency September 11
Goodrum PE JM Hassett DL Johnson and ME Dakins 1994 Applications of
microexposure Monte Carlo modeling to human health risk assessments A case study of
modeling childhood lead exposure Society for Risk Analysis Annual conference and
Exposition December 4-7 Baltimore MD
Harrington NW Curry CL and PS Price 1995 The MicroExposure Event Modeling
Approach to Probabilistic Exposure Assessment Paper No 95-TA4203 Proceedings of the
88th Annual Meeting of the Air and Waste Management Association San Antonio Texas
USA June
Johnson T and J Capel 1992 A Monte Carlo Approach to Simulating Residential
Occupancy Periods and Its Application to the General US Population US Environmental
Protection Agency Office of Air Quality Planning and Standards Research Triangle Park
NC EPA-450/3-92-011 August
Keenan RE MH Henning PE Goodrum MN Gray RA Sherer and PS Price
1993a Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to
evaluate the need for fish advisories Dioxin 93 13th International Symposium on
Chlorinated Dioxins and Related Compounds Vienna Austria
Keenan RE PS Price MH Henning PE Goodrum MN Gray RA Sherer and
WL Porter 1993b A Monte Carlo risk assessment for dioxin in Maine fish Using a
microexposure approach to evaluate the need for fish advisories TAPPI Proceedings 1993
Environmental Conference Boston MA
Keenan RE PS Price CL Curry JI McCrodden and JG Haggard 1995 Using a
microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion
of fish from the upper Hudson River In Society for Risk Analysis and the Japan Section of
SRA Annual Meeting and Exposition Waikiki HI (Abstract)
Keenan RE PS Price J McCrodden and ES Ebert 1996a Using a microexposure
event analysis to model potential exposures to PCBs through ingestion of fish from the Upper
Hudson River In Organohalogen Compounds Proceedings Dioxin 96-16th International
Symposium on Chlorinated Dioxins and Related Compounds Amsterdam The Netherlands
Organohalogen Compounds 30:61-65
Keenan RE NW Harrington PS Price and RO Richter 1996b Applying a
microexposure event analysis for a superfund site risk assessment Proceedings Superfund XVII
Conference Proceedings Washington DC October 15-17
Keenan RE JD Avantaggio and PS Price 1997a Should Maines rivers have fish
advisories for dioxin Using an integrated Microexposure Event and Toxicokinetic Model to
evaluate this question In SETAC North Atlantic Chapter Annual Meetings Proceedings
Abstract 1
Keenan RE JD Avantaggio and PS Price 1997b Using a combined Microexposure
Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden
dosimetric In Society for Risk Analysis Proceedings Annual Meeting and Exposition
Abstract
MDEP 1995 Guidance for Disposal Site Risk Characterization - In Support of the
Massachusetts Contingency Plan -Interim Final Policy Massachusetts Department of
Environmental Protection Bureau of Waste Site Cleanup and Office of Research and
Standards Boston MA WSC/ORS-95-141 July
Pyrite Canyon Group 1994 Workplan for the Health Risk Assessment of the Stringfellow
CERCLA Site in Riverside County California January
Schulz TW and S Griffin 1999 Estimating risk assessment exposure point concentrations
when data are not normal or lognormal Risk Analysis Vol 19 November
Sielken RL 1994 More realistic exposure durations for more realistic people Society for
Risk Analysis Annual Conference and Exposition Baltimore MD December 4-7
Simon TW 1999 Two-dimensional Monte Carlo simulation and beyond A comparison of
several probabilistic risk assessment methods applied to a Superfund site Hum Ecol Risk
Assess 5 (4) 823-843
Weston 2000a Supplemental Investigation Work Plan for the Lower Housatonic River
Prepared for US Army Corps of Engineers by Roy F Weston Inc February 22
Weston 2000b Engineering EvaluationCost Analysis for the Upper Reach of the Housatonic
River Prepared for US Army Corps of Engineers by Roy F Weston Inc February 11
ATTACHMENT A
A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS
Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co Inc
Introduction
The US Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95 UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95 UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets where no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and on how well they represent the actual levels of exposure that are likely to occur under various scenarios.
There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic frequently results in extraordinarily high 95 UCL values. This tendency is reflected in EPA's policy stating that when the 95 UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95 UCL and
the maximum value are poor estimates of actual levels of exposure that will occur within a given
exposure area
GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will produce more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95 UCL that is much more representative of actual exposure point concentrations than use of the H-statistic: namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.
Problems Associated with the Use of the H-Statistic to Calculate the 95 UCL
In 1992, EPA issued a guidance document (EPA 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC, regardless of the statistical distribution of observed sample concentrations.
The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95 UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region 1 (EPA 1994). In fact, EPA Region 1 has taken the position that the 95 UCL is appropriate for both the central tendency and reasonable maximum exposure computations.
For log-normally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95 UCL is calculated using the following equation (EPA 1992):

UCL = exp( x̄ + 0.5·s² + s·H / √(n − 1) )

Where:

UCL = a specified limit (i.e., 95 percent) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic accounting for interaction of the distribution, developed by Land (1975)
n = number of analytical samples
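As an illustration, the H-UCL can be evaluated directly once Land's statistic has been looked up. The sketch below is ours; the input values are hypothetical, and in practice H must be taken from Land's (1975) tables for the actual sample size, log-scale standard deviation, and confidence level.

```python
import math

def h_ucl(log_mean, log_sd, n, h):
    """Land's H-UCL: exp(x-bar + 0.5*s**2 + s*H/sqrt(n - 1)), where
    x-bar and s are the mean and standard deviation of the
    log-transformed data and H is Land's tabulated statistic."""
    return math.exp(log_mean + 0.5 * log_sd ** 2
                    + log_sd * h / math.sqrt(n - 1))

# Illustrative inputs only: 30 samples with log-mean 0.5 and log-SD 1.2;
# the H value of 2.8 is assumed here, not taken from Land's tables.
print(round(h_ucl(0.5, 1.2, 30, 2.8), 2))
```

Because H itself grows with the log-scale standard deviation, a large s drives the exponent twice over, which is why the H-UCL can exceed the maximum observed concentration for small, highly skewed datasets.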
Frequently, the H-statistic produces extraordinarily high 95 UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.
Use of the H-statistic also does not yield a reliable and representative 95 UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often
incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95 UCL that are not at all representative of site conditions.
Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may
overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous
waste sites.
Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when the exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.
Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95 UCL. To control this, EPA recommended that when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and it does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA 1999).
In view of these considerations, EPA contractors and consultants have recommended that the H-statistic should not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on comparison of the UCLs calculated using the H-statistic versus
those calculated using other approaches, the EPA technical support document by Singh et al. (EPA 1997a) recommended that in environmental applications "the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less.... [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the lower Housatonic River; consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95 UCL, because it is unlikely to be representative of the overall exposure area.
This problem can be substantially mitigated by area-weighting the sample results, so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al. 1995); in more complex approaches, unbiased estimates of how concentration and variance change with distance are used to construct a spatial grid of estimated concentrations (ordinary kriging; Isaaks and Srivastava 1989).
Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary).
One reason that this technique is so practical is that the construction of Thiessen polygons is a standard built-in operation in most commercially available Geographic Information System (GIS) applications.
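As a rough, dependency-free illustration of the same idea without a GIS (the function, grid resolution, and coordinates below are our own), polygon areas can be approximated by rasterizing the exposure area and assigning each grid cell to its nearest sample:

```python
import math

def thiessen_areas(sample_xy, width, height, cell=10.0):
    """Approximate Thiessen-polygon areas by rasterizing the exposure
    area into square cells and assigning each cell to its nearest
    sample point; cells are generated only inside the boundary, which
    truncates peripheral polygons at the property line."""
    counts = [0] * len(sample_xy)
    y = cell / 2
    while y < height:
        x = cell / 2
        while x < width:
            nearest = min(range(len(sample_xy)),
                          key=lambda i: math.dist((x, y), sample_xy[i]))
            counts[nearest] += 1
            x += cell
        y += cell
    return [n * cell * cell for n in counts]

# Hypothetical 1000 x 1000 ft property with three sampling locations
sample_xy = [(100.0, 100.0), (500.0, 500.0), (900.0, 900.0)]
areas = thiessen_areas(sample_xy, 1000.0, 1000.0)
```

The cell assignments converge on the true polygon areas as the cell size shrinks; a production analysis would instead use the exact polygons from a GIS package.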
The area inside each polygon is then used to weight the samples in statistical calculations, which is in essence saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava 1989):

x̄sc = Σ pᵢcᵢ (summed over i = 1 through n polygons)

Where:

x̄sc = area-weighted mean concentration (mg/kg)
cᵢ = the concentration representing the condition within polygon i, where there are i = 1 through n polygons, and
pᵢ = the proportion of the total area that is incorporated in polygon i (unitless)
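A minimal sketch of this computation (the function name, concentrations, and polygon areas are hypothetical):

```python
def area_weighted_mean(concentrations, polygon_areas):
    """Weight each sample's concentration by the fraction of the
    exposure area covered by its Thiessen polygon: sum of p_i * c_i."""
    total_area = sum(polygon_areas)
    return sum(c * a / total_area
               for c, a in zip(concentrations, polygon_areas))

# Hypothetical three-sample exposure area: the hot sample (10 ppm)
# covers only 10 percent of the area, so it no longer dominates.
concs = [10.0, 2.0, 0.5]          # ppm
areas = [100.0, 300.0, 600.0]     # square feet
print(area_weighted_mean(concs, areas))  # about 1.9 ppm, versus an
                                         # arithmetic mean of about 4.2 ppm
```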
This approach includes sampling data from all portions of the exposure area and corrects for the
biases in the sampling data by taking into consideration the spacing of the samples within the
exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach
specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which
include a number of floodplain properties along the River) Hence use of this approach for the
Lower Housatonic River would lead to a consistent approach for the different portions of the
overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then the average concentrations for each grid were
averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA 1997b).
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA 1993) in its review of the Draft Risk Assessment Guidance for Superfund: Human Health Evaluation Manual. In that review, the SAB stated that the
[e]stimate of the RME cannot ignore the distribution of contamination at the site and the
distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure. (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and it is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95 UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95 UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples as necessary for individual exposure areas to ensure that reasonable and
representative estimates of the average concentration can be calculated for the risk assessment of
each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95 UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously and demonstrated by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95 UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95 UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95 UCL for exposure areas in the Lower Housatonic River risk assessment.
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process where a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated with replacement of the data. The
bootstrapped means are collected until their distribution stabilizes (this often requires many
iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95 UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of
statistics which states that if a population is repeatedly sampled for purposes of calculating an
arithmetic mean the distribution of the means will ultimately take the form of a normal
distribution This Theorem applies to any population regardless of its statistical distribution
Bootstrapping relies on the same theory except that instead of repeatedly taking new samples
bootstrapping models select repeatedly from a single dataset
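The procedure just described can be sketched in a few lines; the function name and skewed dataset below are illustrative only, not drawn from the Housatonic investigation.

```python
import random
import statistics

def bootstrap_95_ucl(data, n_boot=5000, seed=7):
    """Percentile bootstrap of the 95 UCL on the mean: resample the
    dataset with replacement, archive each resample's mean, and take
    the 95th percentile of the resulting distribution of means."""
    rng = random.Random(seed)
    means = sorted(statistics.fmean(rng.choices(data, k=len(data)))
                   for _ in range(n_boot))
    return means[int(0.95 * n_boot)]

# Hypothetical skewed sampling dataset (ppm); the resulting UCL falls
# between the arithmetic mean and the maximum observed value.
data = [0.02, 0.1, 0.4, 0.9, 1.3, 2.2, 3.5, 5.1, 9.8, 42.6]
ucl = bootstrap_95_ucl(data)
```

Note that the estimate is driven by the spread of the resampled means, not by an assumed lognormal shape, which is why it avoids the H-statistic's inflation for small, skewed datasets.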
This procedure frequently results in a lower estimate of the 95 UCL than the 95 H-UCL, and it is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95 H-UCL (EPA 1997a; Schulz and Griffin 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.
When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter; as such, more samples will decrease the estimate of the 95 UCL. Thus, the 95 UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95 UCL that might be obtained from a larger dataset.
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95 UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of samples collected on each property. This modified bootstrapping approach (spatial bootstrapping), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95 UCL estimates of the EPC for each exposure area.
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with
the bootstrap method discussed above. It is conducted in a series of three steps.
First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.
Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset, reflecting the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7 and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 times 5) entries.
Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95 UCL.
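The three steps above can be sketched as follows. The data are hypothetical, and we omit the uniform expansion factor described in the text, since it only scales every replicate count equally and does not change sampling with replacement.

```python
import random
import statistics

def spatial_bootstrap_95_ucl(concs, polygon_areas, n_boot=5000, k=250, seed=7):
    """Spatial bootstrap sketch: (1) weight each sample by its Thiessen-
    polygon area; (2) build an expanded dataset in which each result is
    replicated in proportion to its normalized polygon area; (3) bootstrap
    means of k draws from the expanded dataset and take the 95th percentile."""
    smallest = min(polygon_areas)
    expanded = []
    for c, a in zip(concs, polygon_areas):
        expanded.extend([c] * max(1, round(a / smallest)))
    rng = random.Random(seed)
    means = sorted(statistics.fmean(rng.choices(expanded, k=k))
                   for _ in range(n_boot))
    return means[int(0.95 * n_boot)]

# Hypothetical five-sample exposure area: concentrations in ppm,
# Thiessen-polygon areas in square feet.
concs = [0.05, 0.4, 1.1, 3.2, 12.0]
areas = [9000.0, 22000.0, 15000.0, 4000.0, 2000.0]
ucl = spatial_bootstrap_95_ucl(concs, areas)
```

Because the hot sample here sits in the smallest polygon, its replicates are few, and the resulting UCL stays near the area-weighted mean rather than being pulled toward the maximum.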
This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95 UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95 UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95 UCL (EPA 1997a; Schulz and Griffin 1999).
The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95 UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95 UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1000 by 1000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) EPA's approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.
As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas; thus, they are weighted more heavily in approaches that include area-weighting.
Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
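The mean comparison reported in Table 2 can be reproduced directly from the Table 1 data. The short Python sketch below assumes the polygon areas are listed in the same sample-number order as the concentrations; the bootstrap and H-statistic UCLs are omitted for brevity.

```python
# Table 1 data: total PCB (ppm) and Thiessen polygon area (ft2),
# listed in sample-number order (samples 1-30).
concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
         7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
         15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6]
areas = [138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121, 3697,
         13297, 33087, 65541, 57494, 9014, 9994, 2123, 2341, 7446, 2613,
         44231, 2362, 8745, 5008, 18762, 24296, 51356, 12996, 18019,
         119814, 132012]

arith_mean = sum(concs) / len(concs)              # 13.3 ppm
area_wt_mean = (sum(c * a for c, a in zip(concs, areas))
                / sum(areas))                     # ~15.7 ppm

print(round(arith_mean, 1), round(area_wt_mean, 1))
```

The area-weighted mean exceeds the arithmetic mean because samples 29 and 30, with the highest concentrations, also carry the two largest polygons.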
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few
showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration by a factor of about 2.5, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in areas of high concentration, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above, and as demonstrated by EPA contractors and consultants (EPA 1997a; Schulz and Griffin 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River, and to collect the data
necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA 1989, 1993) and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.
However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA 1997a; Schulz and Griffin 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River. Final. Roy F. Weston, Inc., Manchester, NH. DCN: GEP2-02-000-AAME. February 2000.
c DOCS jRBGE2ooooEPcPaper2 m 16 Ogden Environmental and Energy Services
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
      1               138219                0.022
      2               105217                0.028
      3               110214                1.07
      4                31021                1.63
      5                11399                1.67
      6                 7004                2.01
      7                14268                2.19
      8                12121                3.25
      9                 3697                3.77
     10                13297                7.58
     11                33087                7.61
     12                65541                7.62
     13                57494                7.71
     14                 9014                7.99
     15                 9994                8.72
     16                 2123                9.69
     17                 2341                9.72
     18                 7446               12.7
     19                 2613               14.3
     20                44231               14.3
     21                 2362               15.4
     22                 8745               16.3
     23                 5008               18.7
     24                18762               19.1
     25                24296               20.2
     26                51356               30.8
     27                12996               32.9
     28                18019               38.9
     29               119814               40.6
     30               132012               42.6
Table 2. Comparison of Mean and 95% UCL EPC Estimates Calculated Using Four Approaches

[The body of this table is illegible in this copy. As stated in the accompanying text, the arithmetic and bootstrap means are both 13.3 ppm; the area-weighted and spatial bootstrap means are identical to one another and somewhat higher; and the 95% UCLs are 106 ppm (H-statistic), 17.06 ppm (bootstrap), and 16.38 ppm (spatial bootstrap).]
1 Inconsistent target risk levels
According to the information provided in Tables 6.2-1, 6.2-5, and 6.2-6 of the Work Plan, EPA has not used a consistent target risk level for calculating its SRBCs. While a cancer risk of 5 x 10^-6 is identified as the target risk level for the residential and recreational SRBCs, target risks of 1.6 x 10^-6 and 1.1 x 10^-6 are identified for the utility worker and commercial groundskeeper scenarios, respectively. The Work Plan presents no rationale for these differences in target risk levels.
To provide a more meaningful screening step, EPA should select a reasonable and consistent target risk level and use it as the basis for calculating all scenario-specific SRBCs. This will ensure that all SRBCs are equally protective of public health. The purpose of conducting the Phase 1 screen is to identify those properties that do not need to be evaluated further in the HHRA and to drop them from future consideration, so that the HHRA can focus on key areas rather than all areas. If the screening SRBCs are overly stringent, numerous properties will be retained for the risk assessment despite the fact that they will likely be shown to have no unacceptable risks once the risk assessment is completed.
To eliminate this problem and increase the efficiency of the screening step, GE believes that the target cancer risk level should be established as 1 x 10^-5 for all scenarios. This risk level is the midpoint of EPA's acceptable risk range (1 x 10^-4 to 1 x 10^-6) (EPA 1990). When coupled with conservative exposure parameters and upper-bound media concentrations, as are outlined in the Work Plan, this approach will successfully screen out those properties that do not pose a risk while retaining those that warrant further evaluation.
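Because the cancer-risk equation is linear in concentration, an SRBC back-calculated from it scales proportionally with the chosen target risk; moving from 5 x 10^-6 to 1 x 10^-5 exactly doubles every screening level. A minimal sketch, with illustrative placeholder parameters rather than the Work Plan's actual values:

```python
def srbc_soil_ingestion(target_risk, csf=2.0, ir_mg_day=100,
                        ef_days_yr=150, ed_yr=30, bw_kg=70,
                        at_days=25550):
    """Back-calculate a soil screening concentration (mg/kg).

    Inverts the standard cancer-risk equation for incidental soil
    ingestion (risk = C x IR x 1e-6 x EF x ED / (BW x AT) x CSF).
    All default parameter values here are illustrative placeholders.
    """
    intake_per_unit_conc = (ir_mg_day * 1e-6 * ef_days_yr * ed_yr) / (bw_kg * at_days)
    return target_risk / (intake_per_unit_conc * csf)

# Doubling the target risk doubles the screening level:
ratio = srbc_soil_ingestion(1e-5) / srbc_soil_ingestion(5e-6)  # 2.0
```

The choice of target risk therefore translates directly, and proportionally, into the stringency of the Phase 1 screen.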
2 Overly conservative exposure assumptions
Several of the exposure parameter values that are used to derive the SRBCs are extremely conservative and cannot be considered representative of likely activities, even for a screening-level assessment. These parameters include some of the exposure frequencies and exposed skin surface areas used for the residential and recreational scenarios, and the soil ingestion rates used for the recreational and utility worker scenarios.
a Exposure frequencies for residential and recreational scenarios
Some of the exposure frequencies used by EPA to derive the SRBCs for the residential and recreational scenarios are unrealistic and implausible. These include the assumed frequency of two days per week for seven months per year for low-contact residential exposure to riverbank soils and sediments and low-contact recreational exposure to floodplain and riverbank soils and sediments, as well as the assumed frequency of three days per week for seven months per year for high-contact residential and recreational exposure to sediments.
The Work Plan indicates that the determination of high-contact versus low-contact areas depends on the accessibility of the area -- e.g., judgments about the likelihood that the area has physical features consistent with a child (or adult) contacting riverbank soil on a regular basis (p. 6-26). Physical features that would lead to a low-contact determination would presumably include, at a minimum, steep slopes, undercut river banks, rocky areas with limited or no access to soil or sediment, and marshy areas. Such conditions are not consistent with contact with the contaminated media on a regular basis. In these circumstances, it is unreasonable to assume an exposure frequency of two days per week for each week in the seven-month exposure period. Such a frequency clearly represents regular contact, which is inconsistent with the limited accessibility of these areas. A more appropriate assumption for low-contact areas would be to assume occasional contact (e.g., one day per week or two days per month) during the five warmer months of the year.
For high-contact areas, EPA has assumed an exposure frequency of three days per week for seven months. While this is a highly conservative estimate even for contact with floodplain and riverbank soils, it is totally unreasonable for contact with sediments, particularly given the assumed exposed skin surface areas. Even for high-contact sediment areas, it is extremely unlikely that children and adults would venture into the river and be exposed to sediments as often as three days per week during every week from April through October. In fact, children would probably not play in the river at all in the cooler portions of this period. Thus, it would be more reasonable to assume an average exposure frequency of around one day per week during this seven-month period.
b Exposed skin surface areas for residential and recreational scenarios
The excessive conservatism in the exposure frequencies is exacerbated by the exposed skin surface areas assumed by EPA in deriving the SRBCs for the residential and recreational scenarios. For these scenarios, EPA has assumed that the entire head, hands, forearms, lower legs, and feet will be in contact with soil or sediment every time that the individuals are present in the affected areas (Tables 6.2-3 and 6.2-4). This assumption does not take account of the fact that the exposed skin surface areas during the spring and fall months are likely to be less, because individuals will be wearing heavier clothing -- i.e., shoes, socks, long pants, and long-sleeved shirts. While the level of exposure assumed by EPA may occur during the summer months, it would not occur every time the individual visits the affected area. Hence, EPA should reduce the exposed skin surface areas to take account of the lesser areas exposed during the spring and fall, and thus develop time-weighted averages for the exposed skin surface areas.
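The time-weighting GE proposes is a simple seasonally weighted average. The sketch below uses hypothetical surface areas: the 5,700 and 2,000 cm2 figures and the month assignments are illustrative assumptions, not values from the Work Plan.

```python
def time_weighted_skin_area(monthly_areas_cm2):
    """Time-weighted average exposed skin area over the exposure season.

    monthly_areas_cm2 maps each month in the season to the assumed
    exposed surface area (cm2) for that month.  The values used below
    are illustrative only, not the Work Plan's assumptions.
    """
    return sum(monthly_areas_cm2.values()) / len(monthly_areas_cm2)

# Hypothetical 7-month season: full summer exposure (head, hands,
# forearms, lower legs, feet; ~5700 cm2 assumed) in June-August, and
# reduced exposure (head and hands only; ~2000 cm2 assumed) in the
# cooler shoulder months.
season = {"Apr": 2000, "May": 2000, "Jun": 5700, "Jul": 5700,
          "Aug": 5700, "Sep": 2000, "Oct": 2000}
twa = time_weighted_skin_area(season)  # ~3586 cm2
```

Under these illustrative inputs, the time-weighted area is well below the full-summer value assumed for every visit in the Work Plan's approach.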
c Soil ingestion rates for recreational scenario
For recreational exposures (both high-contact and low-contact), EPA has used daily soil ingestion rates of 200 mg/day for small children and 100 mg/day for adults (Table 6.2-2). GE does not agree with these ingestion rates, for the reasons given in its June 3, 1999 comments (pp. 33-34). However, if these daily soil ingestion rates are to be used, EPA should at least apply some factor to account for the fraction of the daily soil ingestion that is derived from the affected recreational area, as provided in the Agency's Risk Assessment Guidance for Superfund (EPA 1989). Under real-life conditions, total daily soil ingestion will occur over the entire day and will include soil and/or sediments from a variety of different locations, some of which will not be contaminated. However, EPA has assumed that total daily soil ingestion will come from a single contaminated area and has not included a factor for the fraction of soil ingested from the site. While it is presumed that this omission is due to the fact that EPA is concerned that some individuals may occasionally obtain all of their ingested soil from a single contaminated location, such an assumption is not reasonable when coupled with the exposure frequencies that are assumed. In addition, some highly accessible areas may be very limited in size, as is the case on some floodplain properties. Because of this, even if all of the soil is obtained from an exposure area, only a portion of that soil will be contaminated.
Moreover, EPA's assumption is excessively conservative when one considers that the SRBCs for soil and sediment are separately derived, and that each SRBC assumes that 100 percent of the soil or sediment ingested daily comes from the site. In reality, individuals who are engaged in river-based activities will ingest a combination of soil and sediment during the course of their activity. Given these considerations, GE believes that a factor of 50 percent should be used to represent the fraction of ingested soil or sediment that is derived from contaminated recreational areas during recreational activities. This will correct for the fact that daily ingestion exposure occurs with a combination of soil and sediment. It will also correct for the fact that only limited portions of the day will be spent in recreational areas. In addition, such a factor will more accurately reflect exposures at those areas where the area of contamination is very limited. This approach is consistent with EPA's approach in its development of the cleanup goals under the October 7, 1999 Consent Decree for recreational areas outside the River (EPA 1999a).
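Mechanically, the correction GE requests is the FI (fraction ingested from the contaminated source) term of the standard RAGS Part A ingestion-dose equation; setting FI = 0.5 halves the estimated dose. A sketch with illustrative inputs (none of the numbers below are the Work Plan's values):

```python
def ingestion_dose_mg_kg_day(conc_mg_kg, ir_mg_day, fi,
                             ef_days_yr, ed_yr, bw_kg, at_days):
    """Chronic average daily dose from incidental soil/sediment ingestion.

    fi is the fraction of total daily soil ingestion attributable to the
    contaminated exposure area (the FI term of RAGS Part A).  All inputs
    used below are illustrative placeholders.
    """
    return (conc_mg_kg * ir_mg_day * 1e-6 * fi
            * ef_days_yr * ed_yr) / (bw_kg * at_days)

d_full = ingestion_dose_mg_kg_day(10, 200, 1.0, 30, 6, 15, 2190)
d_half = ingestion_dose_mg_kg_day(10, 200, 0.5, 30, 6, 15, 2190)
ratio = d_half / d_full  # 0.5: the FI factor scales the dose directly
```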
d Soil ingestion rate for utility worker scenario
EPA has used a soil ingestion rate of 200 mg/day to develop the SRBC for the utility worker scenario because of likely heavy exposure to soils (p. 6-17). While an enhanced soil ingestion rate may be appropriate for utility workers, who will have direct contact with soil, the ingestion rate of 200 mg/day is too high. GE previously proposed use of an enhanced soil ingestion rate of 137 mg/day for farmers at the Housatonic River site, who are likewise expected to have heavy contact with soil (GE 1997), and EPA approved that proposal (EPA and MDEP 1998). EPA itself used that soil ingestion rate in deriving a cleanup level for the utility worker scenario for Areas Outside the River under the Consent Decree (EPA 1999a). This ingestion rate was based on an assumed upper-bound soil adherence rate of 1 mg/cm2 for the hands. In recent draft dermal risk assessment guidance, however, EPA (1999b) recommends soil adherence factors for utility workers ranging from a central-tendency estimate of 0.2 mg/cm2 to an upper-bound estimate of 0.8 mg/cm2. Use of that upper-bound adherence factor of 0.8 mg/cm2 would result in an enhanced soil ingestion rate for utility workers of 110 mg/day. GE submits that this enhanced soil ingestion rate should be used in developing a revised SRBC for the utility worker scenario.
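The proposed 110 mg/day figure follows from scaling the previously approved 137 mg/day rate by the ratio of the adherence factors:

```python
# GE's prior enhanced ingestion rate of 137 mg/day was premised on a
# soil adherence of 1.0 mg/cm2.  Scaling to the 0.8 mg/cm2 upper-bound
# adherence factor in EPA's 1999 draft dermal guidance gives:
approved_rate_mg_day = 137
scaled_rate = approved_rate_mg_day * (0.8 / 1.0)
print(round(scaled_rate))  # 110
```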
B Comments on Phase 2 - Baseline HHRA Approach
For those properties or other exposure points that remain after the Phase 1 screening, the Work Plan outlines the approach that EPA plans to use in conducting a baseline HHRA. GE has a number of comments on that proposed approach.
1 Selection of non-PCB contaminants of potential concern
The Work Plan outlines a two-step approach for selecting non-PCB contaminants of potential concern (COPCs) to be included in the baseline HHRA (Section 6.3.2.3). In the first step, for each area that has PCB concentrations above the applicable SRBC, the available sampling data on other constituents will be compared with several criteria to determine the need for additional sampling in that area. These criteria include: (a) background concentrations established for the area; (b) EPA's Preliminary Remediation Goal (PRG) for dioxins/furans in residential areas, which is a 1 ppb Toxicity Equivalent (TEQ) concentration; and (c) the Massachusetts Department of Environmental Protection's (MDEP's) Method 1 soil standards set out in the Massachusetts Contingency Plan (MCP) (see Work Plan, p. 6-30). If these or other (unspecified) criteria are exceeded, EPA will conduct additional characterization sampling (or other evaluation) in the area. If they are not exceeded, the Work Plan states that the baseline HHRA will be performed using the available analytical data (p. 6-31). The latter statement appears to mean that, even if the concentrations of non-PCB constituents do not exceed the criteria for further sampling (i.e., are below background concentrations, the dioxin/furan PRG, or the Method 1 soil standards), those constituents will nevertheless be carried through to the second step in the selection of COPCs.
In the second step, all constituents detected will be screened for the selection of COPCs by comparing the maximum detected concentrations with contaminant-specific and medium-specific risk-based concentrations. The Work Plan states that the principal criterion for selection of a COPC will be an exceedance of the medium-specific PRG by the maximum concentration of the constituent (p. 6-31). Although the Work Plan does not give the source of the PRGs to be used, it states that a target cancer risk of 1 x 10^-6 and a target non-cancer hazard quotient (HQ) of 0.1 will be used. Any constituent for which the maximum concentration exceeds the PRG will be carried forward for the risk assessment. In addition, the Work Plan states that constituents which
are known to bioaccumulate in biological tissues but do not exceed the PRGs will be further evaluated for possible inclusion in the risk assessment.
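The second-step screen described above reduces to a simple comparison rule. The sketch below is our own illustrative formulation (the chemical names and PRG values are placeholders): non-cancer PRGs published at HQ = 1.0 are scaled to the target HQ, and a constituent is retained if its maximum detected concentration exceeds the governing (lower) PRG.

```python
def select_copcs(max_concs, cancer_prgs, noncancer_prgs, hq_target=0.1):
    """Flag constituents whose maximum concentration exceeds a PRG.

    Non-cancer PRGs (published at HQ = 1.0) are scaled down to the
    target hazard quotient; the governing PRG is the lower of the
    cancer-based and adjusted non-cancer values.  Inputs are
    illustrative, not actual Region 9 PRGs.
    """
    copcs = []
    for chem, cmax in max_concs.items():
        candidates = []
        if chem in cancer_prgs:
            candidates.append(cancer_prgs[chem])
        if chem in noncancer_prgs:
            candidates.append(noncancer_prgs[chem] * hq_target)
        if candidates and cmax > min(candidates):
            copcs.append(chem)
    return copcs

# Hypothetical example: "chemA" exceeds its cancer PRG; "chemB" does not
# exceed its HQ-adjusted non-cancer PRG.
selected = select_copcs({"chemA": 5.0, "chemB": 0.1},
                        {"chemA": 1.0}, {"chemB": 10.0})
```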
GE has several concerns about this proposed two-step approach
First, GE believes that if the concentration of non-PCB constituents in a given area does not exceed the criteria for further sampling (as listed above), those constituents should be dropped from further consideration and not carried through to the second step in selecting COPCs. As EPA has recognized in the Statement of Work for Removal Actions Outside the River (SOW), which is Appendix E to the Consent Decree, concentrations of non-PCB constituents that are consistent with local background levels, or are less than the 1 ppb TEQ concentration for dioxins/furans, or are below the applicable Method 1 soil standards, are considered to pose no significant risk to human health. Hence, such constituents need not, and should not, be considered further in the selection of COPCs.
Second, for those constituents that are carried forward to the second step in the COPC selection process, EPA's proposed primary approach for selecting COPCs -- i.e., comparison of maximum concentrations to PRGs -- is overly restrictive. In recent discussions with GE, EPA explained that it is currently planning to use the EPA Region 9 PRGs for this purpose, but to reduce the non-cancer-based PRGs by 10 times, due to the fact that those PRGs were based on a non-cancer HQ of 1.0, not 0.1. This approach will likely lead to the selection of a long list of COPCs, many or most of which in fact make no significant contribution to actual risks. This would undercut the purpose of the screening approach and will greatly and needlessly complicate the risk assessment without any corresponding benefit in terms of achieving additional health protection. Instead, GE recommends that, in selecting the COPCs for the Rest of the River, EPA should use an approach similar to the approach approved by EPA and specified in the SOW for Areas Outside the River, including the non-bank portions of certain floodplain properties. This approach utilizes not only
1 The Work Plan also states that, in addition to these specified criteria, EPA will also use other applicable criteria in the first step of this process, but it does not say what those additional criteria are. EPA should specify those criteria.
the Region 9 PRGs, but also the other criteria listed on page 6-30 of the Work Plan, as discussed below.
For dioxins and furans, EPA should use the PRGs for dioxin/furan TEQs that are prescribed in recent guidance from EPA headquarters for use in addressing these constituents in soil at CERCLA and RCRA sites (EPA 1998). Those PRGs are 1 ppb for dioxin/furan TEQs in residential areas (also applicable to recreational areas) and 5-20 ppb for dioxin/furan TEQs in commercial/industrial areas. The SOW provides specifically for use of these PRGs in screening out dioxin/furan compounds in soil in Areas Outside the River. EPA should likewise use these PRGs in selecting COPCs for the Rest of the River. (Indeed, as discussed below, GE believes that these PRGs should be used not only as screening criteria but also as remedial goals, which is consistent with the approach embodied in the SOW for Areas Outside the River. See Section B.2 below.)
For other constituents, EPA should follow a step-wise approach that begins with comparisons to the Region 9 PRGs. Since these PRGs are highly conservative, and are specified in the SOW for use in screening such constituents without further reduction, they should not be reduced by an additional 10 times. For constituents for which such PRGs do not exist, a similar screening step should be employed using PRGs for surrogate compounds.2
EPA should then apply the other criteria listed on page 6-30 of the Work Plan -- i.e., comparison to background levels and MCP Method 1 standards. Comparison of concentrations of non-PCB constituents to site-specific background levels (e.g., concentrations upstream of the GE facility) is appropriate to ensure that only those constituents attributable to releases from the GE facility are included in the risk assessment. This step is allowed in EPA's Risk Assessment Guidance for
2 Generally, when toxicity criteria have not been derived for particular constituents, it is because there are no reliable toxicity data on which to base them. If such constituents are not screened, but are simply carried through to the risk assessment as COPCs, that would unnecessarily complicate the risk assessment and would result in largely meaningless or highly uncertain estimates, given the lack of reliable toxicity data and criteria. Hence, EPA should screen these compounds using PRGs for surrogate compounds, as described in GE's June 3, 1999 comments (p. 6).
Superfund (EPA 1989) and has been accepted by EPA as a basis for selecting non-PCB COPCs for both soils and sediments in the Upper ½ Mile Reach of the Housatonic River (as stated in the Upper ½ Mile Reach Work Plan) and for Areas Outside the River (as provided in the SOW). It is also consistent with the MCP, which provides that if the concentration of a constituent is at or below background levels, then that constituent will be considered to pose No Significant Risk (310 CMR 40.0902(3)). Similarly, comparison to the MCP Method 1 soil standards is appropriate because those standards have been developed as conservative risk-based standards (using a target cancer risk of 1 x 10^-6) that will ensure protection of human health and the environment. Again, EPA has already accepted comparisons to those standards as a basis for selecting non-PCB COPCs in soil for the Upper ½ Mile Reach of the River and Areas Outside the River. Moreover, in its Engineering Evaluation/Cost Analysis (EE/CA) for the 1½ Mile Reach of the River, EPA itself has used the MCP Method 1 soil standards, as well as background levels, as a basis for evaluating non-PCB constituents in bank soils (Weston 2000b). Thus, use of comparisons to background levels and MCP Method 1 standards in selecting COPCs for the Rest of the River is justified by considerable precedent and will ensure that a consistent approach is employed to address such constituents throughout the GE-Pittsfield/Housatonic River Site. EPA has presented no sound basis for departing from that approach for the Rest of the River portion of the Site.
Moreover, use solely of the maximum detected concentrations in selecting COPCs from these constituents is overly conservative and inappropriate. For example, comparisons to site-specific background levels should be made using an appropriate statistical technique or summary statistics, as described in MDEP (1995) guidance, while comparisons to the MCP Method 1 standards are generally to be made based on average exposure point concentrations (310 CMR 40.0926(3)). Again, these approaches have been agreed upon by EPA as fully protective for Areas Outside the River, as set forth in the SOW, and should likewise be used for the Rest of the River to eliminate constituents that are not likely to contribute to Site-related risks.
Finally, the Work Plan does not explain how the potential for bioaccumulation (discussed on p. 6-31) will be considered, or what criteria will be used to examine this. Unless the approach used
to identify and select such COPCs is clearly defined, it could result in the unnecessary inclusion of certain constituents. EPA should provide supplemental information that outlines the specific approach and criteria that will be used to identify and select these constituents for inclusion.
2 Risk assessment of dioxins/furans
The Work Plan states that EPA intends to evaluate the potential risks associated with dioxins and furans by converting them to TEQs (toxic equivalents of 2,3,7,8-TCDD) using Toxicity Equivalency Factors (TEFs) and then applying the provisional cancer slope factor (CSF) of 150,000 (mg/kg-day)^-1 listed on HEAST for 2,3,7,8-TCDD (Section 6.3.3.2.4). As discussed in detail in GE's June 3, 1999 comments (pp. 19-22), the TEF approach has substantial limitations, and the provisional CSF of 150,000 (mg/kg-day)^-1 is uncertain and unsupportable. A wide range of CSFs, spanning from 9,000 to 156,000 (mg/kg-day)^-1, have been proposed for TCDD under the linear nonthreshold cancer model, with differences resulting from the selection of the tumor classification scheme, the interspecies scaling factor used, and the model used for low-dose extrapolation. Thus, there is enormous uncertainty associated with any selected CSF.
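The practical effect of that range is large: the estimated cancer risk is linear in the CSF, so for any fixed dose the bounds of the proposed range differ by more than seventeen-fold (the dose value below is purely illustrative).

```python
# Estimated cancer risk = lifetime average daily dose x CSF, so for a
# fixed dose the risk estimate scales directly with the chosen CSF.
dose_mg_kg_day = 1e-9            # illustrative TCDD dose, not a site value
risk_low = 9_000 * dose_mg_kg_day
risk_high = 156_000 * dose_mg_kg_day
spread = risk_high / risk_low    # ~17.3-fold spread in estimated risk
```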
In these circumstances, GE believes that, to the extent possible in this risk assessment, EPA should evaluate dioxins and furans through application of the dioxin/furan TEQ PRGs set forth in EPA (1998) guidance and discussed above, using those PRGs as remedial goals as well as screening levels, rather than conducting forward risk calculations for these compounds. This approach would avoid the uncertainties associated with selecting a particular CSF for TCDD and is consistent with the approach adopted by EPA for Areas Outside the River, as described in the SOW.
However, to the extent that EPA includes dioxins and furans in forward risk calculations, GE believes that EPA should use a CSF of 30,000 (mg/kg-day)^-1. As shown in our June 3, 1999 comments, based on current information, that CSF is more supportable than the CSF of 150,000 (mg/kg-day)^-1 specified in the Work Plan.
3 Use of TEF/TEQ approach for dioxin-like PCBs
In the Work Plan, EPA also proposes to use the TEF/TEQ approach for estimating risks of dioxin-like PCB congeners (Section 6.3.3.2.5). Under this approach, the concentrations of these PCB congeners are converted to toxic equivalents (TEQs) of 2,3,7,8-TCDD using TEFs, and the provisional CSF of 150,000 (mg/kg-day)^-1 for 2,3,7,8-TCDD is then applied to estimate the risks.
In its June 3, 1999 comments, GE presented several reasons why this approach is not appropriate or scientifically defensible. In discussions with GE in November 1999, EPA stated that, since it will make risk management decisions on the basis of total PCB concentrations rather than congener-specific data, it intends to use this TEF/TEQ approach for PCBs only if it finds a good correlation between the concentrations of the dioxin-like congeners and concentrations of total PCBs. There is no indication in the Work Plan, however, that EPA has established a good correlation between the congener-specific data and the total PCB data. Indeed, in more recent discussions, EPA has indicated that it has not to date evaluated this issue. If and when EPA makes such an analysis, GE requests the opportunity to review the analysis and discuss it with EPA. The absence of such a correlation would itself make the TEQ data irrelevant to risk management decisions.
In any event, GE continues to believe, for the same reasons given in its June 3, 1999 comments (pp. 23-25), that application of the TEF/TEQ approach to PCBs is not scientifically supportable. These reasons include (a) the lack of sufficient congener-specific data to provide reliable estimates of true congener concentrations; (b) the double counting of carcinogenic risks that would result from use of both the TCDD CSF and the PCB CSF in the same assessment; and (c) the uncertainties and unsupportability of the TEFs for PCB congeners and of the CSF (for 2,3,7,8-TCDD) to be used to evaluate the resulting TEQs.
4. Failure to consider alternative CSFs and RfDs for PCBs

The Work Plan states that EPA intends to use the upper-bound CSF for PCBs of 2 (mg/kg-day)⁻¹ and the Reference Dose (RfD) for Aroclor 1254 of 2 x 10⁻⁵ mg/kg-day, as published in its Integrated Risk Information System (IRIS) (Sections 6.3.3.2.3 and 6.3.3.3.2). In its June 3, 1999 comments (pp. 16-19 and Appendix A), GE presented substantial reasons why these toxicity values
overestimate the carcinogenic potential and non-carcinogenic impacts of PCBs, and it recommended the development of alternative toxicity values for PCBs. EPA has made no reference to this presentation in its Work Plan.
This failure to consider the alternative toxicological information presented by GE is unlawful. Under a settlement agreement with GE (Settlement Agreement in General Electric Company v. Browner, No. 93-1251, D.C. Cir., October 25, 1993) and a subsequent EPA guidance memorandum (EPA 1993), if an outside party questions the use of IRIS values during the course of an EPA risk assessment and presents alternative toxicological information that may be used in place of the IRIS values, EPA has an obligation to consider all credible and relevant evidence before it. There is no indication that EPA has considered the alternative toxicological information presented by GE. Rather, it has ignored GE's showing and chosen simply to defer to the current IRIS values (which is precisely the course of conduct that the settlement and guidance prohibit). EPA should rectify this deficiency by now giving careful consideration to the alternative toxicological information presented by GE.
5. Use of subchronic RfD for PCBs

The Work Plan states that EPA will consider using the subchronic RfD for PCBs only if it determines that exposure is intermittent enough to result in an exposure period that is substantially reduced (p. 6-42). Later, on page 6-43, the Work Plan states that this will occur when the total exposure period is substantially below the 7-year period that is considered by EPA to represent chronic exposure. There is no further discussion of this issue, so it is not clear what the Agency will use as a benchmark to identify intermittent exposure.
Many of the proposed exposure scenarios are likely to be extremely intermittent in nature. For example, in developing the SRBC for utility workers, EPA assumes that exposure occurs 5 days/year for a period of 25 years. Clearly, 5 days/year is intermittent exposure. In fact, even when considered over the entire 25-year exposure period, the total exposure consists of 125 days, which amounts to 1/3 of a year, or less than 5 percent of the 7-year chronic exposure period. This exposure would certainly be considered intermittent, despite the fact that the total exposure period lasts for 25 years.
As a second example, in developing its SRBCs for recreational exposures, EPA has assumed that children aged 1-6 are exposed for six years during recreational activities. Even using the highly conservative exposure frequency for high contact areas of 3 days/week for a period of 7 months per year (a total of 84 days/year, or 504 days for the entire exposure period), the number of days exposed is equivalent to 1.4 years, or less than 20 percent of the available days during the 7-year chronic exposure period. Nevertheless, as stated on page 6-42, EPA does not intend to use the subchronic RfD to evaluate noncancer hazards to young children.
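The exposure-day arithmetic behind these two examples is easily verified:

```python
# Verification of the exposure-day arithmetic for the two scenarios
# discussed above (utility worker and young-child recreation).
CHRONIC_DAYS = 7 * 365  # EPA's 7-year chronic exposure period, in days

# Utility worker: 5 days/year for 25 years
utility_days = 5 * 25
print(utility_days, utility_days / CHRONIC_DAYS)  # 125 days, under 5%

# Child recreation: 3 days/week for 7 months/year (~84 days/yr), 6 years
child_days = 84 * 6
print(child_days, child_days / 365, child_days / CHRONIC_DAYS)
# 504 days, about 1.4 years, under 20% of the chronic period
```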
In short, the Work Plan states that EPA will use the subchronic RfD when exposures during a given scenario are sufficiently intermittent to result in a total exposure that is far less than the 7-year chronic exposure period identified by EPA. It is clear, however, that even for scenarios where this intermittent exposure is likely, EPA does not intend to incorporate the subchronic RfD. EPA should reconsider its position on the use of the subchronic RfD and incorporate it into those scenarios where it is relevant.
6. Inappropriate use of a single exposure scenario for all direct-contact recreational activities

The Work Plan provides that, while consumption of fish and game will be evaluated separately, all direct-contact recreational activities will be evaluated using a single exposure scenario and a single set of exposure assumptions (pp. 6-52 - 6-53). These activities include hiking, walking, canoeing, picnicking, swimming, wading, and dirt biking, as well as fishing and hunting (but not consumption of fish or game). In its June 3, 1999 comments (pp. 11-12, 29-30), GE explained that use of a single exposure scenario and set of assumptions for such diverse activities is inappropriate because it will force EPA either to use the most conservative assumptions for any of these activities (which would clearly result in grossly overestimated exposure estimates) or to pick compromise parameters that will not be truly representative of any of these activities (which would be subject to considerable uncertainty). Based on discussions with EPA in the fall of 1999, GE had understood that the Agency recognized this problem and intended to consider further whether and how to divide this single direct-contact recreational scenario into a number of more specific scenarios (e.g., hiking/walking, canoeing, picnicking, swimming/wading). However, the Work Plan does not do that. GE again urges EPA to develop such specific scenarios so as to take
account of the significant differences in exposure parameters among these types of activities. In doing so, actual recreational exposures will be more closely approximated, and more realistic and appropriate cleanup goals can be developed.
7. Lack of information about specific exposure assumptions to be used

As in the prior draft, the Work Plan does not present the specific exposure parameter values and equations to be used in EPA's risk assessment, but simply states that they will be developed based on EPA and MDEP guidance, along with site-specific data (Section 6.3.4.5). GE requested such information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has advised GE that the Agency's risk assessment staff will hold a series of technical meetings with GE representatives and consultants and, at such meetings, will be prepared to discuss the most appropriate exposure assumptions and parameter values to be used. GE reiterates its request for such meetings and looks forward to discussing these issues with EPA.
8. Use of probabilistic risk analysis

In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that probabilistic approaches, such as Monte Carlo analysis, will be considered for those COPCs and exposure pathways that result in significant risk estimates. However, it does not mention probabilistic risk assessment techniques in connection with any of the main risk assessments. In prior discussions with GE, EPA indicated that one of the principal reasons why it collected a large amount of fish tissue PCB data is for use in a probabilistic risk assessment of fish consumption. As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic exposure model in the main risk analysis of the fish consumption pathway. However, the Work Plan presents no discussion of, or plans for conducting, such an assessment. In fact, EPA does not even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk assessment guidance documents and information sources that EPA intends to use in conducting the HHRA.
During the last few years, EPA's policies and guidelines have focused on improving risk management by presenting decision-makers with the entire range of possible risks rather than a
single point estimate (EPA 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that numerical risk assessments should be accompanied by a full characterization of the uncertainties, limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D) probabilistic analysis, such as Microexposure analysis, can more effectively characterize the impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed population by considering time-dependent changes.
Microexposure analysis has been used by EPA and by independent researchers to simulate duration of residential exposure (EPA 1992c; Johnson and Capel 1992; Sielken 1994). It also has been used to evaluate childhood exposures to lead (Goodrum et al. 1994), exposure to contaminants in tap water (Harrington et al. 1995), and exposure to dioxins from the consumption of freshwater fish (Keenan et al. 1993a,b, 1995, 1996a, 1997a,b). Microexposure analysis was employed in the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon Group 1994; Keenan et al. 1996b). In addition, Microexposure analysis has been described in EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A (External Review Draft) as a viable alternative for modeling time-dependent variability in concentrations, daily activity patterns, and other behavioral exposure factors (EPA 1999c).
Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of probabilistic risk analysis through application to the fish ingestion pathway at the Sangamo-Weston/Lake Hartwell Superfund site (Simon 1999). Simon (1999) found that the Microexposure event model gave risk estimates that were approximately an order of magnitude lower than those derived using deterministic and other probabilistic approaches. He attributed this decrease to the power of the method to incorporate changes over time.
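The basic idea of a microexposure-style simulation, re-drawing behavior and concentration at each exposure event rather than fixing them for a lifetime, can be illustrated with a minimal sketch. The distributions, meal size, body weight, and duration below are hypothetical and greatly simplified relative to an actual 2-D Microexposure analysis.

```python
# Minimal sketch (not EPA's model): a microexposure-style simulation in
# which an angler's meal frequency and the fish tissue concentration
# are re-drawn each year, yielding a distribution of lifetime average
# daily doses instead of a single point estimate. All values are
# hypothetical placeholders.
import random

random.seed(1)
YEARS, BW_KG, MEAL_G = 30, 70.0, 227.0  # duration, body weight, meal size

def simulated_lifetime_dose():
    total_mg = 0.0
    for _ in range(YEARS):
        meals_per_yr = random.choice([0, 6, 12, 24, 48])  # behavior varies yearly
        conc_mg_kg = random.lognormvariate(0.0, 0.5)      # tissue conc., mg/kg
        total_mg += meals_per_yr * (MEAL_G / 1000.0) * conc_mg_kg
    return total_mg / (BW_KG * YEARS * 365)  # mg/kg-day over the period

doses = sorted(simulated_lifetime_dose() for _ in range(5000))
mean_dose = sum(doses) / len(doses)
p95 = doses[int(0.95 * len(doses))]
print(f"mean = {mean_dose:.2e}, 95th percentile = {p95:.2e} mg/kg-day")
```

A full 2-D analysis would additionally separate variability (across anglers) from uncertainty (in the inputs), which this one-dimensional sketch does not attempt.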
GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model, to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in Appendix E of its draft guidance for conducting probabilistic risk assessments (EPA 1999c), some types of exposures, including fish consumption, can be expected to vary over time. Anglers may change their activity levels and their fish consumption behaviors as they age. In addition, different
species and sizes of fish may be consumed from a variety of locations on the river over time, and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates and will provide information that will allow better informed risk management decisions to be made (EPA 1999c).
9. Use of 95% UCLs or maximum values as exposure point concentrations

The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% UCL of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).
GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment, and demonstrated by EPA contractors and consultants (EPA 1997c; Schulz and Griffin 1999), use of the H-statistic to derive a 95% UCL can, and frequently does, produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.
Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½ Mile Reach and for Areas Outside the River: a spatial averaging approach. That approach will (a) allow calculation of EPCs that are much more truly representative of actual exposures than use of EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper ½ Mile Reach and Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling, if necessary, to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.
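The area-weighting concept can be illustrated as follows, assuming each sample has been assigned a representative area (for instance, via Thiessen polygons). The concentrations and areas below are hypothetical; the point is that a simple average over-weights densely sampled hot spots, while the spatial average does not.

```python
# Sketch of an area-weighted (spatial) average EPC. Each sample is
# paired with the area it represents; hot spots are typically sampled
# densely, so they carry small areas. All values are hypothetical.
samples = [  # (PCB concentration in mg/kg, represented area in m^2)
    (45.0, 100.0),   # hot spot, densely sampled -> small area each
    (38.0, 120.0),
    (1.2, 2500.0),   # sparsely sampled low-concentration ground
    (0.8, 2300.0),
]

simple_mean = sum(c for c, _ in samples) / len(samples)
spatial_mean = sum(c * a for c, a in samples) / sum(a for _, a in samples)
print(f"simple mean = {simple_mean:.2f} mg/kg, "
      f"spatial mean = {spatial_mean:.2f} mg/kg")
```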
However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A, and demonstrated by Singh et al. (EPA 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.
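A plain percentile-bootstrap UCL on the arithmetic mean can be sketched as follows, with hypothetical data; a spatial bootstrap would additionally weight each resampled value by the area it represents, as described in Attachment A.

```python
# Sketch of a percentile-bootstrap 95% UCL on the arithmetic mean.
# The dataset is hypothetical and deliberately right-skewed.
import random

random.seed(7)
data = [0.4, 0.9, 1.1, 1.6, 2.2, 2.8, 3.5, 5.1, 8.9, 24.0]  # mg/kg

# Resample with replacement many times; each resample yields one mean.
boot_means = sorted(
    sum(random.choices(data, k=len(data))) / len(data)
    for _ in range(10_000)
)
ucl95 = boot_means[int(0.95 * len(boot_means))]  # 95th percentile of means
print(f"arithmetic mean = {sum(data) / len(data):.2f}, "
      f"bootstrap 95% UCL = {ucl95:.2f}")
```

Unlike the H-UCL, this estimate is driven entirely by the observed data and requires no distributional assumption, so it cannot balloon past the observed concentrations the way the H-statistic can.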
References
EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A), Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1990. National Contingency Plan. US Environmental Protection Agency, Washington, DC.

EPA. 1992a. Final Guidelines for Exposure Assessment; Notice. US Environmental Protection Agency, Washington, DC. 57 Federal Register 14: 22888-22938. May 29.

EPA. 1992b. Guidance on Risk Characterization for Risk Managers and Risk Assessors. Memorandum from F. Henry Habicht III, Deputy Administrator, to Assistant Administrators and Regional Administrators. Office of the Administrator, Washington, DC.
EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General US Population. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

EPA. 1993. Memo from W.H. Farland and H.L. Longest to Regional Directors. Re: Use of IRIS values in Superfund risk assessments. US Environmental Protection Agency, Washington, DC. OSWER Directive 9285.7-16. December 21.

EPA. 1995a. Policy for Risk Characterization at the US Environmental Protection Agency. US Environmental Protection Agency, Office of the Administrator, Washington, DC. March.

EPA. 1995b. Guidance for Risk Characterization. US Environmental Protection Agency, Science Policy Council, Washington, DC. February.

EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the US Environmental Protection Agency. US Environmental Protection Agency, Office of Research and Development, Washington, DC. May 15.

EPA. 1997b. Guiding Principles for Monte Carlo Analysis. US Environmental Protection Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 p.

EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1998. Memo from Timothy Fields, Acting Administrator, to Regional Directors. Re: Approach for addressing dioxin in soil at CERCLA and RCRA sites. US Environmental Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.
EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, US EPA, to Jane Magee, General Electric Company. Re: Pittsfield 1-0147, EPA Area 6 Housatonic River Human Health Risk Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader. Re: Protectiveness of Cleanup Levels for Removal Actions Outside the River - Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, US Dist. Court for Dist. of Mass.). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual, Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2. Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3, Part A: Process for Conducting Probabilistic Risk Assessment. Draft, Revision 5. United States Environmental Protection Agency, Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the US Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition. December 4-7. Baltimore, MD.
Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA4203. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.

Johnson, T. and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General US Population. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA, Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds, Proceedings Dioxin '96: 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen 30:61-65.
Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC. October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meetings Proceedings. Abstract.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition. Abstract.

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan. Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19. November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD. December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4): 823-843.
Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 11.
ATTACHMENT A
A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS
Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co Inc
Introduction
The US Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets where no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and their representation of the actual levels of exposures that are likely to occur under various scenarios.
There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that, when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and
the maximum value are poor estimates of actual levels of exposure that will occur within a given exposure area.

GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will produce more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic, namely the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.
Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL
In 1992, EPA issued a guidance document (EPA 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.
The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region 1 (EPA 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA 1992):
UCL = exp(x̄ + 0.5s² + sH/√(n-1))

Where:

UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic accounting for interaction of the distribution, developed by Land (1975)
n = number of analytical samples
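The equation above can be transcribed directly, as sketched below. Note that the H value used here is an illustrative placeholder; the actual value must be taken from Land's tables as a function of s and n, and the dataset is hypothetical.

```python
# Direct transcription of the H-UCL equation above.
# The H value is an illustrative placeholder, not a tabulated Land value;
# the concentration data are hypothetical.
import math
import statistics

data = [0.5, 1.2, 2.0, 3.1, 4.8, 9.5, 21.0]  # hypothetical conc., mg/kg
logs = [math.log(x) for x in data]
xbar = statistics.mean(logs)   # mean of log-transformed data
s = statistics.stdev(logs)     # std. dev. of log-transformed data
n = len(data)
H = 3.0                        # placeholder for Land's H(s, n)

ucl = math.exp(xbar + 0.5 * s**2 + s * H / math.sqrt(n - 1))
print(f"H-UCL = {ucl:.1f} vs arithmetic mean = {statistics.mean(data):.1f}")
```

Even with this modest dataset, the computed H-UCL exceeds the largest observed concentration, which previews the problem discussed below.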
Frequently, the H-statistic calculates extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.
Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers," and not necessarily due to lognormally distributed data. Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.
Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may
overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.
Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when an exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.
Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC (EPA 1992). This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and it does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA 1999).
In view of these considerations, EPA contractors and consultants have recommended that the H-statistic should not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA 1997a) recommended that in environmental applications the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided. These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. . . . [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).
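Of the alternatives named in that passage, the Chebyshev inequality method is the simplest to state. A minimal sketch, which is distribution-free and therefore valid for any dataset:

```python
import math

def chebyshev_ucl(data, alpha=0.05):
    """Distribution-free (1 - alpha) UCL of the mean from the Chebyshev
    inequality: xbar + sqrt(1/alpha - 1) * s / sqrt(n). For a 95% UCL the
    multiplier is sqrt(19), about 4.36; the bound grows only linearly with
    the sample standard deviation, never exponentially."""
    n = len(data)
    xbar = sum(data) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))
    return xbar + math.sqrt(1.0 / alpha - 1.0) * s / math.sqrt(n)

# Illustrative skewed dataset (not site data)
data = [0.1, 0.5, 1.0, 2.0, 100.0]
print(chebyshev_ucl(data))
```

The method is deliberately conservative for very small n, but unlike the H-UCL its overshoot is bounded by the observed variability rather than by an exponential of it.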
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased toward more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the lower Housatonic River; consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL, because it is unlikely to be representative of the overall exposure area.
This problem can be substantially mitigated by area-weighting the sample results, so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al. 1995); in more complex approaches, unbiased estimates of how concentration and variance change with distance are used to construct a spatial grid of estimated concentrations (ordinary kriging; Isaaks and Srivastava 1989).
Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so straightforward to apply is that the construction of Thiessen polygons is a built-in operation in most commercially available Geographic Information System (GIS) applications.
The area inside each polygon is then used to weight the samples in statistical calculations, which is, in essence, saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava 1989):

X̄ = Σ (pi × Ci), for i = 1 through n

where:

X̄ = area-weighted mean concentration (mg/kg);
Ci = the concentration representing the condition within polygon i, where there are i = 1 through n polygons; and
pi = the proportion of the total area that is incorporated in polygon i (unitless).
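The polygon weights and the weighted average can be sketched without GIS software by approximating each Thiessen polygon's area on a fine grid (each grid cell is assigned to its nearest sample). The helper names and the two-sample layout below are ours, purely for illustration:

```python
def thiessen_weights(locations, bounds, step=1.0):
    """Approximate Thiessen-polygon area proportions by assigning each cell of
    a fine grid to its nearest sample location. A stand-in for the exact
    polygon construction that GIS packages perform."""
    (xmin, ymin), (xmax, ymax) = bounds
    counts = [0] * len(locations)
    y = ymin + step / 2
    while y < ymax:
        x = xmin + step / 2
        while x < xmax:
            nearest = min(range(len(locations)),
                          key=lambda i: (locations[i][0] - x) ** 2
                                        + (locations[i][1] - y) ** 2)
            counts[nearest] += 1
            x += step
        y += step
    total = sum(counts)
    return [c / total for c in counts]

def area_weighted_mean(concs, proportions):
    """X-bar = sum(p_i * C_i): each sample weighted by the share of the
    exposure area its polygon represents."""
    return sum(p * c for p, c in zip(proportions, concs))

# Two samples on a 10 x 10 unit area: each polygon covers half the area, so
# each concentration gets a weight of 0.5 regardless of sampling density.
locs = [(2.5, 5.0), (7.5, 5.0)]
p = thiessen_weights(locs, ((0.0, 0.0), (10.0, 10.0)))
print(area_weighted_mean([40.0, 2.0], p))  # 0.5*40 + 0.5*2 = 21.0
```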
This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then, the average concentrations for each grid were
averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA 1997b).
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the

[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure. (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and it is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples as necessary for individual exposure areas to ensure that reasonable and
representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously and demonstrated by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal
distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
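The procedure just described can be sketched in a few lines; the dataset below is illustrative, not site data:

```python
import random

def bootstrap_ucl(data, n_iter=5000, seed=1):
    """95% UCL of the mean by percentile bootstrap: resample the dataset with
    replacement, archive each resample's mean, and take the 95th percentile
    of the resulting distribution of means."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(sum(rng.choice(data) for _ in range(n)) / n
                   for _ in range(n_iter))
    return means[int(0.95 * n_iter)]

# Illustrative skewed concentrations (ppm)
data = [0.02, 0.5, 1.1, 2.4, 3.9, 7.6, 9.7, 14.3, 20.2, 42.6]
ucl = bootstrap_ucl(data)
print(sum(data) / len(data) < ucl < max(data))  # UCL above the mean, below the max
```

No distributional assumption is made at any step, which is why the method behaves sensibly whether or not the data are lognormal.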
This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and it is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA 1997a; Schulz and Griffin 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.
When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter; as such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of the samples collected on each property. This modified bootstrapping approach (spatial bootstrapping), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.
Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.
Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
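The three steps can be sketched as follows; the function names and the tiny two-sample dataset are ours, for illustration only:

```python
import random

def expand_by_area(concs, areas, factor=5):
    """Steps 1-2: replicate each sample result in proportion to its polygon
    area, normalized to the smallest polygon and rounded to the nearest
    integer, then multiplied by an expansion factor."""
    smallest = min(areas)
    expanded = []
    for c, a in zip(concs, areas):
        expanded.extend([c] * (round(a / smallest) * factor))
    return expanded

def spatial_bootstrap_ucl(concs, areas, n_iter=5000, subsample=250, seed=1):
    """Step 3: draw repeated means (with replacement) from the expanded,
    area-weighted dataset; the 95th percentile of the distribution of means
    is the 95% UCL."""
    pool = expand_by_area(concs, areas)
    rng = random.Random(seed)
    means = sorted(sum(rng.choice(pool) for _ in range(subsample)) / subsample
                   for _ in range(n_iter))
    return means[int(0.95 * n_iter)]

# Two illustrative samples: the 300 ft2 polygon gets 3x the weight of the
# 100 ft2 polygon, so the low concentration dominates the expanded pool.
concs = [1.0, 10.0]        # ppm
areas = [300.0, 100.0]     # ft2
ucl = spatial_bootstrap_ucl(concs, areas)
```

The area-weighted mean of this toy dataset is (0.75 × 1.0) + (0.25 × 10.0) = 3.25 ppm, and the bootstrapped 95% UCL sits modestly above it rather than ballooning toward the maximum.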
This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA 1997a; Schulz and Griffin 1999).
The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data; thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) the EPA approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.
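As a cross-check, the arithmetic and area-weighted means can be reproduced directly from the Table 1 values. The values are transcribed below with decimal points restored to the concentration column, so treat the transcription itself as an assumption:

```python
# Hypothetical Table 1 data: polygon areas (ft2) and total PCB (ppm),
# in sample order 1 through 30 (transcription with decimals restored).
pcb = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
       7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
       15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6]
area = [138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121, 3697, 13297,
        33087, 65541, 57494, 9014, 9994, 2123, 2341, 7446, 2613, 44231,
        2362, 8745, 5008, 18762, 24296, 51356, 12996, 18019, 119814, 132012]

arith = sum(pcb) / len(pcb)
aw = sum(c * a for c, a in zip(pcb, area)) / sum(area)
print(round(arith, 1))  # 13.3 ppm, matching the text
print(aw > arith)       # area weighting pulls the mean up: the largest
                        # polygons (samples 29 and 30) hold high concentrations
```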
As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping; this is consistent with the Central Limit Theorem (EPA 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but it is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas; thus, they are weighted more heavily in the approaches that include area-weighting.
Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few
showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL that substantially exceeds the highest measured concentration, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above, and as demonstrated by EPA contractors and consultants (EPA 1997a; Schulz and Griffin 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data
necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA 1989, 1993), and it would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and for the Removal Actions Outside the River, including certain floodplain properties along the River.
However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA 1997a; Schulz and Griffin 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant), Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River: Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number   Polygon Area (ft2)   Total PCB (ppm)
 1              138,219              0.022
 2              105,217              0.028
 3              110,214              1.07
 4               31,021              1.63
 5               11,399              1.67
 6                7,004              2.01
 7               14,268              2.19
 8               12,121              3.25
 9                3,697              3.77
10               13,297              7.58
11               33,087              7.61
12               65,541              7.62
13               57,494              7.71
14                9,014              7.99
15                9,994              8.72
16                2,123              9.69
17                2,341              9.72
18                7,446              12.7
19                2,613              14.3
20               44,231              14.3
21                2,362              15.4
22                8,745              16.3
23                5,008              18.7
24               18,762              19.1
25               24,296              20.2
26               51,356              30.8
27               12,996              32.9
28               18,019              38.9
29              119,814              40.6
30              132,012              42.6
[Table 2, Calculated Mean and 95% UCL EPCs for the Hypothetical Floodplain Property, is not legible in the source. As discussed in the text, it reported that the arithmetic and bootstrapped means are both 13.3 ppm; the area-weighted and spatial bootstrap means are identical to each other and somewhat higher; and the 95% UCLs are 106 ppm (H-statistic), 17.06 ppm (bootstrap), and 16.38 ppm (spatial bootstrap).]
a. Exposure frequencies for residential and recreational scenarios

Some of the exposure frequencies used by EPA to derive the SRBCs for the residential and recreational scenarios are unrealistic and implausible. These include the assumed frequency of two days per week for seven months per year for low-contact residential exposure to riverbank soils and sediments and for low-contact recreational exposure to floodplain and riverbank soils and sediments, as well as the assumed frequency of three days per week for seven months per year for high-contact residential and recreational exposure to sediments.
The Work Plan indicates that the determination of high-contact versus low-contact areas depends on the accessibility of the area -- e.g., judgments about the likelihood that the area has "physical features consistent with a child (or adult) contacting riverbank soil on a regular basis" (p. 6-26). Physical features that would lead to a low-contact determination would presumably include, at a minimum, steep slopes, undercut river banks, rocky areas with limited or no access to soil or sediment, and marshy areas. Such conditions are not consistent with contact with the contaminated media on a regular basis. In these circumstances, it is unreasonable to assume an exposure frequency of two days per week for each week in the seven-month exposure period; such a frequency clearly represents regular contact, which is inconsistent with the limited accessibility of these areas. A more appropriate assumption for low-contact areas would be to assume occasional contact (e.g., one day per week or two days per month) during the five warmer months of the year.
For high-contact areas, EPA has assumed an exposure frequency of three days per week for seven months. While this is a highly conservative estimate even for contact with floodplain and riverbank soils, it is totally unreasonable for contact with sediments, particularly given the assumed exposed skin surface areas. Even for high-contact sediment areas, it is extremely unlikely that children and adults would venture into the river and be exposed to sediments as often as three days per week during every week from April through October. In fact, children would probably not play in the river at all in the cooler portions of this period. Thus, it would be more reasonable to assume an average exposure frequency of around one day per week during this seven-month period.
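In days per year, the competing assumptions work out roughly as follows (the weeks-per-month conversion of 52/12, about 4.33, is our assumption for illustration):

```python
weeks_per_month = 52 / 12  # ~4.33 weeks per month, an averaging assumption

def days_per_year(days_per_week, months):
    """Annual exposure frequency implied by a days-per-week rate sustained
    over a given number of months."""
    return days_per_week * weeks_per_month * months

ef_low = round(days_per_year(2, 7))    # EPA low-contact assumption
ef_high = round(days_per_year(3, 7))   # EPA high-contact assumption
ef_occasional = round(days_per_year(1, 5))  # GE's proposed occasional contact
print(ef_low, ef_high, ef_occasional)  # 61 91 22
```

The roughly threefold difference between 61 and 22 days per year carries through linearly to the resulting intake estimates.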
b. Exposed skin surface areas for residential and recreational scenarios

The excessive conservatism in the exposure frequencies is exacerbated by the exposed skin surface areas assumed by EPA in deriving the SRBCs for the residential and recreational scenarios. For these scenarios, EPA has assumed that the entire head, hands, forearms, lower legs, and feet will be in contact with soil or sediment every time that the individuals are present in the affected areas (Tables 6.2-3 and 6.2-4). This assumption does not take account of the fact that the exposed skin surface areas during the spring and fall months are likely to be less, because individuals will be wearing heavier clothing (i.e., shoes, socks, long pants, and long-sleeved shirts). While the level of exposure assumed by EPA may occur during the summer months, it would not occur every time the individual visits the affected area. Hence, EPA should reduce the exposed skin surface areas to take account of the lesser areas exposed during the spring and fall, and thus develop time-weighted averages for the exposed skin surface areas.
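A time-weighted average could be computed along these lines; the surface-area values below are hypothetical placeholders, not figures from the Work Plan:

```python
def time_weighted_sa(sa_by_period):
    """Time-weighted average exposed skin surface area (cm2) over the exposure
    season: each period's assumed exposed area is weighted by its share of the
    total months."""
    total_months = sum(months for _, months in sa_by_period)
    return sum(sa * months for sa, months in sa_by_period) / total_months

# Hypothetical illustration: full head/hands/forearms/lower-legs/feet exposed
# in summer, but only head and hands exposed under heavier spring/fall
# clothing (both cm2 values are illustrative assumptions).
periods = [(5700.0, 3), (2000.0, 4)]   # (exposed cm2, months)
twa = time_weighted_sa(periods)
print(round(twa))  # 3586 cm2, well below the season-long 5700 cm2 assumption
```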
c. Soil ingestion rates for recreational scenario

For recreational exposures (both high-contact and low-contact), EPA has used daily soil ingestion rates of 200 mg/day for small children and 100 mg/day for adults (Table 6.2-2). GE does not agree with these ingestion rates, for the reasons given in its June 3, 1999 comments (pp. 33-34). However, if these daily soil ingestion rates are to be used, EPA should at least apply some factor to account for the fraction of the daily soil ingestion that is derived from the affected recreational area, as provided in the Agency's Risk Assessment Guidance for Superfund (EPA 1989). Under real-life conditions, total daily soil ingestion will occur over the entire day and will include soil and/or sediments from a variety of different locations, some of which will not be contaminated. However, EPA has assumed that total daily soil ingestion will come from a single contaminated area, and it has not included a factor for the fraction of soil ingested from the site. While it is presumed that this omission is due to the fact that EPA is concerned that some individuals may occasionally obtain all of their ingested soil from a single contaminated location, such an assumption is not reasonable when coupled with the exposure frequencies that are assumed. In addition, some highly accessible areas may be very limited in size, as is the case on some floodplain properties. Because of this, even if all of the soil is obtained from an exposure area, only a portion of that soil will be contaminated.
Moreover, EPA's assumption is excessively conservative when one considers that the SRBCs for
soil and sediment are separately derived and that each SRBC assumes that 100 percent of the soil
or sediment ingested daily comes from the site. In reality, individuals who are engaged in river-based activities will ingest a combination of soil and sediment during the course of their activity.
Given these considerations, GE believes that a factor of 50 percent should be used to represent the
fraction of ingested soil or sediment that is derived from contaminated recreational areas during
recreational activities. This will correct for the fact that daily ingestion exposure occurs with a
combination of soil and sediment. It will also correct for the fact that only limited portions of the
day will be spent in recreational areas. In addition, such a factor will more accurately reflect
exposures at those areas where the area of contamination is very limited. This approach is
consistent with EPA's approach in its development of the cleanup goals under the October 7, 1999
Consent Decree for recreational areas outside the River (EPA 1999a).
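The effect of the proposed fraction-ingested term can be seen in a standard RAGS-style intake equation; the concentration, body weight, and exposure values in this sketch are hypothetical placeholders:

```python
# Illustrative soil-ingestion intake with a fraction-ingested (FI) term added to
# the standard RAGS intake equation; all inputs except the 200 mg/day child
# ingestion rate (Table 6.2-2) are hypothetical.
def daily_intake_mg_kg_day(c_mg_kg, ir_mg_day, fi, ef_days_yr, ed_yr, bw_kg, at_days):
    # C (mg/kg) * 1e-6 converts ingested soil mass (mg) to contaminant mass (mg)
    return (c_mg_kg * 1e-6 * ir_mg_day * fi * ef_days_yr * ed_yr) / (bw_kg * at_days)

# Child recreational example (hypothetical 10 mg/kg PCBs, 15 kg child, 84 days/yr)
no_fi = daily_intake_mg_kg_day(10.0, 200, 1.0, 84, 6, 15, 6 * 365)
with_fi = daily_intake_mg_kg_day(10.0, 200, 0.5, 84, 6, 15, 6 * 365)
print(with_fi / no_fi)  # → 0.5: the FI factor halves the estimated intake
```

The design point is that FI enters the intake equation multiplicatively, so the 50 percent factor GE proposes translates directly into a 50 percent reduction in the estimated dose for this pathway.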
d. Soil ingestion rate for utility worker scenario

EPA has used a soil ingestion rate of 200 mg/day to develop the SRBC for the utility worker
scenario because of likely heavy exposure to soils (p. 6-17). While an enhanced soil ingestion
rate may be appropriate for utility workers, who will have direct contact with soil, the ingestion
rate of 200 mg/day is too high. GE previously proposed use of an enhanced soil ingestion rate of
137 mg/day for farmers at the Housatonic River site, who are likewise expected to have heavy
contact with soil (GE 1997), and EPA approved that proposal (EPA and MDEP 1998). EPA
itself used that soil ingestion rate in deriving a cleanup level for the utility worker scenario for
Areas Outside the River under the Consent Decree (EPA 1999a). This ingestion rate was based
on an assumed upper-bound soil adherence rate of 1 mg/cm² for the hands. In recent draft dermal
risk assessment guidance, however, EPA (1999b) recommends soil adherence factors for utility
workers ranging from a central-tendency estimate of 0.2 mg/cm² to an upper-bound estimate of
0.8 mg/cm². Use of that upper-bound adherence factor of 0.8 mg/cm² would result in an enhanced
soil ingestion rate for utility workers of 110 mg/day. GE submits that this enhanced soil ingestion
rate should be used in developing a revised SRBC for the utility worker scenario.
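The 110 mg/day figure follows from a simple proportional scaling of the previously approved rate; a sketch, assuming (as the text implies) that the ingestion rate scales linearly with the hand adherence factor:

```python
# The 137 mg/day rate was premised on a 1 mg/cm2 hand adherence factor; scaling
# it by the 0.8 mg/cm2 upper bound from the draft dermal guidance reproduces
# the 110 mg/day figure cited above (linear scaling assumed).
base_rate = 137          # mg/day, approved for farmers (GE 1997)
base_adherence = 1.0     # mg/cm2 assumed in deriving that rate
revised_adherence = 0.8  # mg/cm2 upper bound, EPA (1999b) draft guidance
revised_rate = base_rate * revised_adherence / base_adherence
print(round(revised_rate))  # → 110
```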
B. Comments on Phase 2 - Baseline HHRA Approach

For those properties or other exposure points that remain after the Phase 1 screening, the Work
Plan outlines the approach that EPA plans to use in conducting a baseline HHRA. GE has a
number of comments on that proposed approach.
1. Selection of non-PCB contaminants of potential concern

The Work Plan outlines a two-step approach for selecting non-PCB contaminants of potential
concern (COPCs) to be included in the baseline HHRA (Section 6.3.2.3). In the first step, for
each area that has PCB concentrations above the applicable SRBC, the available sampling data on
other constituents will be compared with several criteria to determine the need for additional
sampling in that area. These criteria include (a) background concentrations established for the
area, (b) EPA's Preliminary Remediation Goal (PRG) for dioxins/furans in residential areas,
which is a 1 ppb Toxicity Equivalent (TEQ) concentration, and (c) the Massachusetts Department
of Environmental Protection's (MDEP's) Method 1 soil standards set out in the Massachusetts
Contingency Plan (MCP) (see Work Plan, p. 6-30). If these or other (unspecified) criteria are
exceeded, EPA will conduct additional characterization sampling (or other evaluation) in the area.
If they are not exceeded, the Work Plan states that the baseline HHRA will be performed using
the available analytical data (p. 6-31). The latter statement appears to mean that, even if the
concentrations of non-PCB constituents do not exceed the criteria for further sampling (i.e., are
below background concentrations, the dioxin/furan PRG, or the Method 1 soil standards), those
constituents will nevertheless be carried through to the second step in the selection of COPCs.
In the second step, all constituents detected will be screened for the selection of COPCs by
comparing the maximum detected concentrations with contaminant-specific and medium-specific
risk-based concentrations. The Work Plan states that the principal criterion for selection of a
COPC will be an exceedance of the medium-specific PRG by the maximum concentration of the
constituent (p. 6-31). Although the Work Plan does not give the source of the PRGs to be used,
it states that a target cancer risk of 1 × 10⁻⁶ and a target non-cancer hazard quotient (HQ) of 0.1
will be used. Any constituent for which the maximum concentration exceeds the PRG will be
carried forward for the risk assessment. In addition, the Work Plan states that constituents which
are known to bioaccumulate in biological tissues but do not exceed the PRGs will be further
evaluated for possible inclusion in the risk assessment.
GE has several concerns about this proposed two-step approach.
First, GE believes that if the concentration of non-PCB constituents in a given area does not
exceed the criteria for further sampling (as listed above), those constituents should be dropped
from further consideration and not carried through to the second step in selecting COPCs. As
EPA has recognized in the Statement of Work for Removal Actions Outside the River (SOW),
which is Appendix E to the Consent Decree, concentrations of non-PCB constituents that are
consistent with local background levels, or are less than the 1 ppb TEQ concentration for
dioxins/furans, or are below the applicable Method 1 soil standards, are considered to pose no
significant risk to human health. Hence, such constituents need not and should not be considered
further in the selection of COPCs.
Second, for those constituents that are carried forward to the second step in the COPC selection
process, EPA's proposed primary approach for selecting COPCs (i.e., comparison of maximum
concentrations to PRGs) is overly restrictive. In recent discussions with GE, EPA explained that
it is currently planning to use the EPA Region 9 PRGs for this purpose, but to reduce the
non-cancer-based PRGs by 10 times, due to the fact that those PRGs were based on a non-cancer HQ
of 1.0, not 0.1. This approach will likely lead to the selection of a long list of COPCs, many or
most of which in fact make no significant contribution to actual risks. This would undercut the
purpose of the screening approach and will greatly and needlessly complicate the risk assessment
without any corresponding benefit in terms of achieving additional health protection. Instead, GE
recommends that, in selecting the COPCs for the Rest of the River, EPA should use an approach
similar to the approach approved by EPA and specified in the SOW for Areas Outside the River,
including the non-bank portions of certain floodplain properties. This approach utilizes not only
¹ The Work Plan also states that, in addition to these specified criteria, EPA will also use other applicable criteria in the first step of this process, but it does not say what those additional criteria are. EPA should specify those criteria.
the Region 9 PRGs but also the other criteria listed on page 6-30 of the Work Plan, as discussed
below.
For dioxins and furans, EPA should use the PRGs for dioxin/furan TEQs that are prescribed in
recent guidance from EPA headquarters for use in addressing these constituents in soil at
CERCLA and RCRA sites (EPA 1998). Those PRGs are 1 ppb for dioxin/furan TEQs in
residential areas (also applicable to recreational areas) and 5-20 ppb for dioxin/furan TEQs in
commercial/industrial areas. The SOW provides specifically for use of these PRGs in screening
out dioxin/furan compounds in soil in Areas Outside the River. EPA should likewise use these
PRGs in selecting COPCs for the Rest of the River. (Indeed, as discussed below, GE believes that
these PRGs should be used not only as screening criteria but also as remedial goals, which is
consistent with the approach embodied in the SOW for Areas Outside the River. See Section B.2
below.)
For other constituents, EPA should follow a step-wise approach that begins with comparisons to
the Region 9 PRGs. Since these PRGs are highly conservative and are specified in the SOW for
use in screening such constituents without further reduction, they should not be reduced by an
additional 10 times. For constituents for which such PRGs do not exist, a similar screening step
should be employed using PRGs for surrogate compounds.²
EPA should then apply the other criteria listed on page 6-30 of the Work Plan, i.e., comparison
to background levels and MCP Method 1 standards. Comparison of concentrations of non-PCB
constituents to site-specific background levels (e.g., concentrations upstream of the GE facility)
is appropriate to ensure that only those constituents attributable to releases from the GE facility
are included in the risk assessment. This step is allowed in EPA's Risk Assessment Guidance for
² Generally, when toxicity criteria have not been derived for particular constituents, it is because there are no reliable toxicity data on which to base them. If such constituents are not screened, but are simply carried through to the risk assessment as COPCs, that would unnecessarily complicate the risk assessment and would result in largely meaningless or highly uncertain estimates, given the lack of reliable toxicity data and criteria. Hence, EPA should screen these compounds using PRGs for surrogate compounds, as described in GE's June 3, 1999 comments (p. 6).
Superfund (EPA 1989) and has been accepted by EPA as a basis for selecting non-PCB COPCs
for both soils and sediments in the Upper ½ Mile Reach of the Housatonic River (as stated in the
Upper ½ Mile Reach Work Plan) and for Areas Outside the River (as provided in the SOW). It
is also consistent with the MCP, which provides that if the concentration of a constituent is at or
below background levels, then that constituent will be considered to pose No Significant Risk (310
CMR 40.0902(3)). Similarly, comparison to the MCP Method 1 soil standards is appropriate
because those standards have been developed as conservative risk-based standards (using a target
cancer risk of 1 × 10⁻⁶) that will ensure protection of human health and the environment. Again,
EPA has already accepted comparisons to those standards as a basis for selecting non-PCB COPCs
in soil for the Upper ½ Mile Reach of the River and Areas Outside the River. Moreover, in its
Engineering Evaluation/Cost Analysis (EE/CA) for the 1½ Mile Reach of the River, EPA itself
has used the MCP Method 1 soil standards, as well as background levels, as a basis for evaluating
non-PCB constituents in bank soils (Weston 2000b). Thus, use of comparisons to background
levels and MCP Method 1 standards in selecting COPCs for the Rest of the River is justified by
considerable precedent and will ensure that a consistent approach is employed to address such
constituents throughout the GE-Pittsfield/Housatonic River Site. EPA has presented no sound
basis for departing from that approach for the Rest of the River portion of the Site.
Moreover, use solely of the maximum detected concentrations in selecting COPCs from these
constituents is overly conservative and inappropriate. For example, comparisons to site-specific
background levels should be made using an appropriate statistical technique or summary statistics,
as described in MDEP (1995) guidance, while comparisons to the MCP Method 1 standards are
generally to be made based on average exposure point concentrations (310 CMR 40.0926(3)).
Again, these approaches have been agreed upon by EPA as fully protective for Areas Outside the
River, as set forth in the SOW, and should likewise be used for the Rest of the River to eliminate
constituents that are not likely to contribute to Site-related risks.
Finally, the Work Plan does not explain how the potential for bioaccumulation (discussed on p.
6-31) will be considered or what criteria will be used to examine this. Unless the approach used
to identify and select such COPCs is clearly defined, it could result in the unnecessary inclusion
of certain constituents. EPA should provide supplemental information that outlines the specific
approach and criteria that will be used to identify and select these constituents for inclusion.
2. Risk assessment of dioxins/furans

The Work Plan states that EPA intends to evaluate the potential risks associated with dioxins and
furans by converting them to TEQs (toxic equivalents of 2,3,7,8-TCDD) using Toxicity
Equivalency Factors (TEFs) and then applying the provisional cancer slope factor (CSF) of
150,000 (mg/kg-day)⁻¹ listed on HEAST for 2,3,7,8-TCDD (Section 6.3.3.2.4). As discussed in
detail in GE's June 3, 1999 comments (pp. 19-22), the TEF approach has substantial limitations,
and the provisional CSF of 150,000 (mg/kg-day)⁻¹ is uncertain and unsupportable. A wide range
of CSFs, spanning from 9,000 to 156,000 (mg/kg-day)⁻¹, have been proposed for TCDD under
the linear nonthreshold cancer model, with differences resulting from the selection of tumor
classification scheme, the interspecies scaling factor used, and the model used for low-dose
extrapolation. Thus, there is enormous uncertainty associated with any selected CSF.
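The mechanics of the TEF/TEQ conversion, and the sensitivity of the resulting risk estimate to the chosen CSF, can be sketched as follows; the congener concentrations, TEFs, and intake value below are hypothetical placeholders, not site data:

```python
# Illustrative TEQ calculation and CSF sensitivity (all concentrations, TEFs,
# and the intake value are hypothetical, for illustration only).
tefs = {"2,3,7,8-TCDD": 1.0, "1,2,3,7,8-PeCDD": 1.0, "2,3,4,7,8-PeCDF": 0.5}
conc_ppt = {"2,3,7,8-TCDD": 0.2, "1,2,3,7,8-PeCDD": 0.1, "2,3,4,7,8-PeCDF": 0.4}

# Each congener is weighted by its TEF and summed into 2,3,7,8-TCDD equivalents.
teq_ppt = sum(conc_ppt[c] * tefs[c] for c in tefs)

dose = 1e-12 * teq_ppt  # hypothetical chronic intake, mg/kg-day

# With dose fixed, the risk estimate spans the same ~17-fold range as the
# proposed CSFs (9,000 vs. 150,000 (mg/kg-day)^-1), illustrating the point above.
for csf in (9_000, 150_000):
    print(csf, dose * csf)
```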
In these circumstances, GE believes that, to the extent possible in this risk assessment, EPA
should evaluate dioxins and furans through application of the dioxin/furan TEQ PRGs set forth
in EPA (1998) guidance and discussed above, using those PRGs as remedial goals as well as
screening levels, rather than conducting forward risk calculations for these compounds. This
approach would avoid the uncertainties associated with selecting a particular CSF for TCDD and
is consistent with the approach adopted by EPA for Areas Outside the River, as described in the
SOW.

However, to the extent that EPA includes dioxins and furans in forward risk calculations, GE
believes that EPA should use a CSF of 30,000 (mg/kg-day)⁻¹. As shown in our June 3, 1999
comments, based on current information, that CSF is more supportable than the CSF of 150,000
(mg/kg-day)⁻¹ specified in the Work Plan.
3. Use of TEF/TEQ approach for dioxin-like PCBs

In the Work Plan, EPA also proposes to use the TEF/TEQ approach for estimating risks of dioxin-like PCB congeners (Section 6.3.3.2.5). Under this approach, the concentrations of these PCB
congeners are converted to toxic equivalents (TEQs) of 2,3,7,8-TCDD using TEFs, and the
provisional CSF of 150,000 (mg/kg-day)⁻¹ for 2,3,7,8-TCDD is then applied to estimate the risks.
In its June 3, 1999 comments, GE presented several reasons why this approach is not appropriate
or scientifically defensible. In discussions with GE in November 1999, EPA stated that, since it
will make risk management decisions on the basis of total PCB concentrations rather than
congener-specific data, it intends to use this TEF/TEQ approach for PCBs only if it finds a good
correlation between the concentrations of the dioxin-like congeners and concentrations of total
PCBs. There is no indication in the Work Plan, however, that EPA has established a good
correlation between the congener-specific data and the total PCB data. Indeed, in more recent
discussions, EPA has indicated that it has not to date evaluated this issue. If and when EPA makes
such an analysis, GE requests the opportunity to review the analysis and discuss it with EPA. The
absence of such a correlation would itself make the TEQ data irrelevant to risk management
decisions.

In any event, GE continues to believe, for the same reasons given in its June 3, 1999 comments
(pp. 23-25), that application of the TEF/TEQ approach to PCBs is not scientifically supportable.
These reasons include (a) the lack of sufficient congener-specific data to provide reliable
estimates of true congener concentrations, (b) the double counting of carcinogenic risks that would
result from use of both the TCDD CSF and the PCB CSF in the same assessment, and (c) the
uncertainties and unsupportability of the TEFs for PCB congeners and of the CSF (for
2,3,7,8-TCDD) to be used to evaluate the resulting TEQs.
4. Failure to consider alternative CSFs and RfDs for PCBs

The Work Plan states that EPA intends to use the upper-bound CSF for PCBs of 2 (mg/kg-day)⁻¹
and the Reference Dose (RfD) for Aroclor 1254 of 2 × 10⁻⁵ mg/kg-day, as published in its
Integrated Risk Information System (IRIS) (Sections 6.3.3.2.3 and 6.3.3.3.2). In its June 3, 1999
comments (pp. 16-19 and Appendix A), GE presented substantial reasons why these toxicity values
overestimate the carcinogenic potential and non-carcinogenic impacts of PCBs, and it
recommended the development of alternative toxicity values for PCBs. EPA has made no
reference to this presentation in its Work Plan.
This failure to consider the alternative toxicological information presented by GE is unlawful.
Under a settlement agreement with GE (Settlement Agreement in General Electric Company v.
Browner, No. 93-1251, D.C. Cir., October 25, 1993) and a subsequent EPA guidance
memorandum (EPA 1993), if an outside party questions the use of IRIS values during the course
of an EPA risk assessment and presents alternative toxicological information that may be used in
place of the IRIS values, EPA has an obligation to consider all credible and relevant evidence
before it. There is no indication that EPA has considered the alternative toxicological
information presented by GE. Rather, it has ignored GE's showing and chosen simply to defer
to the current IRIS values (which is precisely the course of conduct which the settlement and
guidance prohibit). EPA should rectify this deficiency by now giving careful consideration to the
alternative toxicological information presented by GE.
5. Use of subchronic RfD for PCBs

The Work Plan states that EPA will consider using the subchronic RfD for PCBs only if it
determines that exposure is intermittent enough to result in an exposure period that is substantially
reduced (p. 6-42). Later, on page 6-43, the Work Plan states that this will occur when the total
exposure period is substantially below the 7-year period that is considered by EPA to represent
chronic exposure. There is no further discussion of this issue, so it is not clear what the Agency
will use as a benchmark to identify intermittent exposure.
Many of the proposed exposure scenarios are likely to be extremely intermittent in nature. For
example, in developing the SRBC for utility workers, EPA assumes that exposure occurs 5
days/year for a period of 25 years. Clearly, 5 days/year is intermittent exposure. In fact, even
when considered over the entire 25-year exposure period, the total exposure consists of 125 days,
which amounts to 1/3 of a year, or less than 5 percent of the 7-year chronic exposure period. This
exposure would certainly be considered intermittent, despite the fact that the total exposure period
lasts for 25 years.
As a second example, in developing its SRBCs for recreational exposures, EPA has assumed that
children aged 1-6 are exposed for six years during recreational activities. Even using the highly
conservative exposure frequency for high-contact areas of 3 days/week for a period of 7 months
per year (a total of 84 days/year, or 504 days for the entire exposure period), the number of days
exposed is equivalent to 1.4 years, or less than 20 percent of the available days during the 7-year
chronic exposure period. Nevertheless, as stated on page 6-42, EPA does not intend to use the
subchronic RfD to evaluate noncancer hazards to young children.

In short, the Work Plan states that EPA will use the subchronic RfD when exposures during a
given scenario are sufficiently intermittent to result in a total exposure that is far less than the
7-year chronic exposure period identified by EPA. It is clear, however, that even for scenarios
where this intermittent exposure is likely, EPA does not intend to incorporate the subchronic RfD.
EPA should reconsider its position on the use of the subchronic RfD and incorporate it into those
scenarios where it is relevant.
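The arithmetic behind the two examples above can be checked directly:

```python
# Check of the intermittency arithmetic cited in the two examples above.
chronic_days = 7 * 365  # EPA's 7-year chronic exposure benchmark

# Utility worker: 5 days/year for 25 years
utility_days = 5 * 25
print(utility_days, utility_days / chronic_days)  # 125 days, ~0.049 (< 5%)

# Child recreation: 3 days/week over 7 months (~84 days/year) for 6 years
child_days = 84 * 6
print(child_days, child_days / chronic_days)      # 504 days, ~0.197 (< 20%)
```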
6. Inappropriate use of a single exposure scenario for all direct-contact recreational activities

The Work Plan provides that, while consumption of fish and game will be evaluated separately,
all direct-contact recreational activities will be evaluated using a single exposure scenario and a
single set of exposure assumptions (pp. 6-52 to 6-53). These activities include hiking, walking,
canoeing, picnicking, swimming, wading, and dirt biking, as well as fishing and hunting (but not
consumption of fish or game). In its June 3, 1999 comments (pp. 11-12, 29-30), GE explained
that use of a single exposure scenario and set of assumptions for such diverse activities is
inappropriate, because it will force EPA either to use the most conservative assumptions for any
of these activities (which would clearly result in grossly overestimated exposure estimates) or to
pick compromise parameters that will not be truly representative of any of these activities (which
would be subject to considerable uncertainty). Based on discussions with EPA in the fall of 1999,
GE had understood that the Agency recognized this problem and intended to consider further
whether and how to divide this single direct-contact recreational scenario into a number of more
specific scenarios (e.g., hiking/walking, canoeing, picnicking, swimming/wading). However, the
Work Plan does not do that. GE again urges EPA to develop such specific scenarios, so as to take
account of the significant differences in exposure parameters among these types of activities. In
doing so, actual recreational exposures will be more closely approximated, and more realistic and
appropriate cleanup goals can be developed.
7. Lack of information about specific exposure assumptions to be used

As in the prior draft, the Work Plan does not present the specific exposure parameter values and
equations to be used in EPA's risk assessment, but simply states that they will be developed based
on EPA and MDEP guidance, along with site-specific data (Section 6.3.4.5). GE requested such
information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has
advised GE that the Agency's risk assessment staff will hold a series of technical meetings with
GE representatives and consultants and, at such meetings, will be prepared to discuss the most
appropriate exposure assumptions and parameter values to be used. GE reiterates its request for
such meetings and looks forward to discussing these issues with EPA.
8. Use of probabilistic risk analysis

In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that
probabilistic approaches, such as Monte Carlo analysis, will be considered for those COPCs and
exposure pathways that result in significant risk estimates. However, it does not mention
probabilistic risk assessment techniques in connection with any of the main risk assessments. In
prior discussions with GE, EPA indicated that one of the principal reasons why it collected a large
amount of fish tissue PCB data is for use in a probabilistic risk assessment of fish consumption.
As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic
exposure model in the main risk analysis of the fish consumption pathway. However, the Work
Plan presents no discussion of, or plans for conducting, such an assessment. In fact, EPA does not
even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk
assessment guidance documents and information sources that EPA intends to use in conducting the
HHRA.
During the last few years, EPA's policies and guidelines have focused on improving risk
management by presenting decision-makers with the entire range of possible risks, rather than a
single point estimate (EPA 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that
numerical risk assessments should be accompanied by a full characterization of the uncertainties,
limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D)
probabilistic analysis, such as Microexposure analysis, can more effectively characterize the
impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed
population by considering time-dependent changes.
Microexposure analysis has been used by EPA and by independent researchers to simulate duration
of residential exposure (EPA 1992c; Johnson and Capel 1992; Sielken 1994). It also has been
used to evaluate childhood exposures to lead (Goodrum et al. 1994), exposure to contaminants
in tap water (Harrington et al. 1995), and exposure to dioxins from the consumption of freshwater
fish (Keenan et al. 1993a,b; 1995; 1996a; 1997a,b). Microexposure analysis was employed in
the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon
Group 1994; Keenan et al. 1996b). In addition, Microexposure analysis has been described in
EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A (External Review Draft) as a
viable alternative for modeling time-dependent variability in concentrations, daily activity patterns,
and other behavioral exposure factors (EPA 1999c).
Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of
probabilistic risk analysis through application to the fish ingestion pathway at the
Sangamo-Weston/Lake Hartwell Superfund site (Simon 1999). Simon (1999) found that the
Microexposure event model gave risk estimates that were approximately an order of magnitude
lower than those derived using deterministic and other probabilistic approaches. He attributed this
decrease to the power of the method to incorporate changes over time.
GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model,
to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to
evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in
Appendix E of its draft guidance for conducting probabilistic risk assessments (EPA 1999c), some
types of exposures, including fish consumption, can be expected to vary over time. Anglers may
change their activity levels and their fish consumption behaviors as they age. In addition, different
species and sizes of fish may be consumed from a variety of locations on the river over time, and
may be prepared using different methods. Thus, fish tissue concentrations will not be constant
among anglers or throughout the exposure period for a single angler. A 2-D Microexposure
analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can
be developed using point estimates, and will provide information that will allow better informed
risk management decisions to be made (EPA 1999c).
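A minimal sketch of the kind of time-stepped, Microexposure-style Monte Carlo simulation being advocated here, in which behavior and fish tissue concentrations vary year by year; all distributions and parameter values are hypothetical:

```python
# Minimal time-stepped ("microexposure"-style) Monte Carlo sketch of angler PCB
# dose; every distribution and parameter below is hypothetical, for illustration.
import random

random.seed(1)

def simulate_angler(years=30):
    """Accumulate dose year by year, letting behavior and concentration vary."""
    dose = 0.0
    for _ in range(years):
        if random.random() < 0.05:             # angler may stop fishing for good
            break
        meals = random.randint(0, 24)          # meals/year changes over time
        conc = random.lognormvariate(0.0, 1.0) # fish tissue PCB, mg/kg (varies)
        dose += meals * 0.227 * conc           # 0.227 kg per meal (assumed)
    return dose

doses = sorted(simulate_angler() for _ in range(10_000))
# Report a distribution (median, 95th percentile) rather than one point estimate.
print(doses[5_000], doses[9_500])
```

The design point is that each simulated angler gets a different trajectory of consumption behavior and tissue concentrations over time, so the output is a full dose distribution rather than the single deterministic estimate the Work Plan currently contemplates.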
9. Use of 95% UCLs or maximum values as exposure point concentrations

The Work Plan states that the exposure point concentration (EPC) for each property or other
exposure area will be based on the 95% UCL of the site data or the maximum detected
concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical
techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for
lognormally distributed data (p. 6-32).
GE's comments on this approach are set forth in detail in Attachment A, which was recently
provided to EPA. As discussed in that attachment, and demonstrated by EPA contractors and
consultants (EPA 1997c; Schulz and Griffin 1999), use of the H-statistic to derive a 95% UCL
can, and frequently does, produce large overestimates of the true mean, both when there are slight
deviations from the assumed lognormal distribution of the data and even when the data are
obtained from a lognormal population. Hence, that approach should not be used.
Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that
EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½
Mile Reach and for Areas Outside the River: a spatial averaging approach. That approach will
(a) allow calculation of EPCs that are much more truly representative of actual exposures than use
of EPA's currently proposed approach, (b) allow evaluation of the Rest of the River in a manner
that is consistent with the evaluation of the Upper ½ Mile Reach and Areas Outside the River, and
(c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent
of unnecessary removal or other remediation that will be required for the Rest of the River. GE
urges EPA to conduct additional sampling, if necessary, to ensure that sufficient data will be
available to calculate reliable spatial averages for each exposure area.
However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it
should still not use the H-statistic approach proposed in the Work Plan. As discussed in
Attachment A, and demonstrated by Singh et al. (EPA 1997c) and Schulz and Griffin (1999), there
are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that
are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial
averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ
a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to
estimate the EPC for each exposure area. This approach, which is discussed in detail in
Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95%
UCLs, considers the location and density of each sample in each exposure area, mitigates the
statistical impact of a small database, and can be used regardless of the distribution of the sampling
data.
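A percentile-bootstrap 95% UCL of the arithmetic mean can be sketched as follows. A simple (non-spatial) bootstrap is shown for illustration, and the data values are hypothetical; the spatial variant GE recommends would additionally weight resampling by each sample's location and density within the exposure area:

```python
# Sketch of a percentile-bootstrap 95% UCL on the arithmetic mean. Simple
# bootstrap shown; sample values are hypothetical PCB concentrations, mg/kg.
import random
import statistics

random.seed(7)
samples = [0.4, 0.9, 1.1, 1.6, 2.3, 3.8, 5.2, 9.7]

boot_means = []
for _ in range(5_000):
    # Resample the data with replacement and record the resample mean.
    resample = random.choices(samples, k=len(samples))
    boot_means.append(statistics.fmean(resample))
boot_means.sort()

# The 95% UCL is the 95th percentile of the bootstrap distribution of means.
ucl95 = boot_means[int(0.95 * len(boot_means))]
print(statistics.fmean(samples), ucl95)
```

Unlike the H-statistic, this procedure makes no lognormality assumption, and the resulting UCL cannot drift far above the range of the observed data.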
References
EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation
Manual (Part A). Interim Final. U.S. Environmental Protection Agency, Office of
Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1990. National Contingency Plan. U.S. Environmental Protection Agency,
Washington, DC.

EPA. 1992a. Final Guidelines for Exposure Assessment; Notice. U.S. Environmental Protection
Agency, Washington, DC. 57 Federal Register 14: 22888-22938. May 29.

EPA. 1992b. Guidance on Risk Characterization for Risk Managers and Risk Assessors.
Memorandum from F. Henry Habicht III, Deputy Administrator, to Assistant Administrators
and Regional Administrators. Office of the Administrator, Washington, DC.
EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its
Application to the General U.S. Population. U.S. Environmental Protection Agency, Office of
Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011.
August.

EPA. 1993. Memo from W.H. Farland and H.L. Longest to Regional Directors, Re: Use of
IRIS values in Superfund risk assessments. U.S. Environmental Protection Agency,
Washington, DC. OSWER Directive 9285.7-16. December 21.

EPA. 1995a. Policy for Risk Characterization at the U.S. Environmental Protection Agency.
U.S. Environmental Protection Agency, Office of the Administrator, Washington, DC.
March.

EPA. 1995b. Guidance for Risk Characterization. U.S. Environmental Protection Agency,
Science Policy Council, Washington, DC. February.

EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the U.S.
Environmental Protection Agency. U.S. Environmental Protection Agency, Office of Research
and Development, Washington, DC. May 15.

EPA. 1997b. Guiding Principles for Monte Carlo Analysis. U.S. Environmental Protection
Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 pp.

EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA
by A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of
Research and Development. EPA/600/R-97/006. December.

EPA. 1998. Memo from Timothy Fields, Acting Administrator, to Regional Directors, Re:
Approach for addressing dioxin in soil at CERCLA and RCRA sites. U.S. Environmental
Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.
- 20 shy
EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts
DEP, and Bryan Olson, Project Manager, US EPA, to Jane Magee, General Electric
Company, Re: Pittsfield 1-0147, EPA Area 6, Housatonic River Human Health Risk
Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE
Project Leader, Re: Protectiveness of Cleanup Levels for Removal Actions Outside the River
- Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in
United States et al. v. General Electric Company (No. 99-30225-MAP, US Dist. Court for
Dist. of Mass.). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation
Manual, Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. US
Environmental Protection Agency, Office of Emergency and Remedial Response, Washington,
DC. March 2. Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3, Part A: Process for
Conducting Probabilistic Risk Assessment. Draft, Revision 5. US Environmental
Protection Agency, Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½ Mile
Removal Action, and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment.
Memorandum from General Electric to the Massachusetts Department of Environmental
Protection and the US Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of
microexposure Monte Carlo modeling to human health risk assessments: A case study of
modeling childhood lead exposure. Society for Risk Analysis Annual Conference and
Exposition, December 4-7, Baltimore, MD.
Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling
Approach to Probabilistic Exposure Assessment. Paper No. 95-TA42.03. Proceedings of the
88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas,
USA. June.

Johnson, T. and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential
Occupancy Periods and Its Application to the General US Population. US Environmental
Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park,
NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price.
1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to
evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on
Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and
W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a
microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993
Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a
microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion
of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of
SRA, Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure
event analysis to model potential exposures to PCBs through ingestion of fish from the Upper
Hudson River. In: Organohalogen Compounds, Proceedings, Dioxin '96: 16th International
Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands.
Organohalogen 30:61-65.
Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a
microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund
XVII Conference, Washington, DC. October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish
advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to
evaluate this question. In: SETAC North Atlantic Chapter Annual Meeting, Proceedings.
Abstract.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure
Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden
dosimetric. In: Society for Risk Analysis Annual Meeting and Exposition, Proceedings.
Abstract.

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the
Massachusetts Contingency Plan, Interim Final Policy. Massachusetts Department of
Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and
Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow
CERCLA Site in Riverside County, California. January.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations
when data are not normal or lognormal. Risk Analysis, Vol. 19, November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for
Risk Analysis Annual Conference and Exposition, Baltimore, MD. December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of
several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk
Assess. 5(4): 823-843.

Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River.
Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic
River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 11.
ATTACHMENT A
A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS
Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co Inc
Introduction
US Environmental Protection Agency's final Human Health Risk Assessment Work Plan
(Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the
Lower Housatonic River (Weston 2000), states that the exposure point concentration (EPC) for
each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site
data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work
Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally
distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No
plan is presented for estimating the mean from datasets where no parametric distribution can be
defined. The way in which the EPCs are calculated can have a substantial impact on the values
derived and their representation of the actual levels of exposure that are likely to occur under
various scenarios.
There are serious limitations associated with using EPA's proposed approach for calculating
EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally
distributed, environmental samples seldom resemble a strict normal distribution. Consequently,
if EPA were to assume instead that the data for each exposure area are lognormally distributed,
then use of the H-statistic to calculate the EPCs for these areas may substantially impact the
results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results
in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's
policy stating that when the 95% UCL is greater than the maximum observed concentration, the
maximum value should instead be used as the EPC. However, both the inflated 95% UCL and
the maximum value are poor estimates of the actual levels of exposure that will occur within a
given exposure area.
GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk
assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods,
approved by EPA scientists and other experts, for calculating EPCs that more accurately
represent the actual exposure levels that may be experienced by individuals who come into
contact with contaminated soil or sediment. This paper provides an analysis of the limitations
associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it
presents an evaluation of alternative approaches for calculating the EPCs that will result in more
representative results. In GE's view, the most appropriate approach is the use of the area-
weighting or spatial averaging technique that has been approved and specified by EPA for the
banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River
under the October 1999 Consent Decree. However, if and where that approach is not used, this
paper describes other alternative techniques for deriving a 95% UCL that is much more
representative of actual exposure point concentrations than use of the H-statistic, namely the
bootstrapping technique and a modification of that approach known as spatial bootstrapping.
Of these, the latter is preferable because it takes into account the spatial distribution of the
sampling data. This paper illustrates the EPCs that would be calculated for an example property
using the various methods discussed herein.
Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL
In 1992, EPA issued a guidance document (EPA 1992) indicating that the appropriate
concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic
mean of the observed concentrations within an exposure area. This position is based on the
concept that chronic direct-contact exposure to soils most likely occurs as a result of short,
repeated, random contact events throughout the identified exposure area; thus, the mean
concentration is reflective of the long-term exposure concentration. Because it is this nature of
the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of
the EPC regardless of the statistical distribution of observed sample concentrations.
The guidance document goes on to note that, because there is uncertainty in the overall soil
concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the
arithmetic mean should be used to account for this uncertainty. This approach was adopted by
EPA Region I (EPA 1994). In fact, EPA Region I has taken the position that the 95% UCL is
appropriate for both the central tendency and reasonable maximum exposure computations.
For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the
EPCs. In this method, the 95% UCL is calculated using the following equation (EPA 1992):

UCL = e^(x̄ + 0.5s² + sH/√(n-1))

Where:

UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic accounting for interaction of the distribution, developed by Land (1975)
n = number of analytical samples
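For illustration, the H-UCL computation can be sketched in Python as follows. This is a minimal sketch, not part of the Work Plan: the H value depends on n and the log-scale standard deviation and must be looked up in Land's (1975) tables, so it is passed in as a parameter here, and the example data and H value are hypothetical.

```python
import math

def h_ucl(data, h):
    """Land's H-UCL on the arithmetic mean for lognormally distributed data.

    `data` holds untransformed concentrations; `h` is Land's H-statistic,
    which depends on n and the log-scale standard deviation and must be
    taken from published tables (Land 1975).
    """
    n = len(data)
    logs = [math.log(x) for x in data]
    xbar = sum(logs) / n                                # mean of log-transformed data
    s2 = sum((v - xbar) ** 2 for v in logs) / (n - 1)   # variance of logs
    s = math.sqrt(s2)                                   # std deviation of logs
    return math.exp(xbar + 0.5 * s2 + s * h / math.sqrt(n - 1))

def epc(data, h):
    """EPA's substitution rule: use the lesser of the 95% UCL and the
    maximum observed concentration as the exposure point concentration."""
    return min(h_ucl(data, h), max(data))
```

Because the exponent grows with the log-scale standard deviation, a skewed dataset with a few high detects can drive the H-UCL above the maximum observed concentration, triggering the maximum-value substitution discussed below.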
Frequently, the H-statistic produces extraordinarily high 95% UCL values. In some cases, this
inflated estimate of the mean reflects the uncertainty associated with having a limited number of
samples from a skewed data population. This uncertainty is compounded when, as is the case
with much of the Housatonic River investigation, sampling efforts are concentrated in areas that
are suspected of having elevated levels of PCBs. Because the sampling of areas that are not
likely to be contaminated is much more limited, despite the fact that those areas are equally
important components of the total exposure area, the datasets for most parcels are likely to be
highly biased.
Use of the H-statistic also does not produce a reliable and representative 95% UCL when data
are not clearly lognormally distributed. Because the distributions of contaminant concentrations
at sites are often positively skewed, they are frequently assumed to be lognormally distributed
for statistical evaluations. In fact, EPA guidance (EPA 1992) promotes this approach by
suggesting that it is reasonable to assume that data are lognormally distributed even if the data
do not fit a normal distribution after log-transformation. However, this assumption is often
incorrect. As noted in a recent EPA technical support document prepared by EPA contractors
Singh et al. (EPA 1997a), "[t]his apparent skewness may be due to biased sampling, multiple
populations, or outliers and not necessarily due to lognormally distributed data." Small
deviations between the assumed lognormal distribution of the data and the actual population
distribution can greatly influence the statistical results and yield a gross misrepresentation of the
true mean and associated confidence bound. Thus, when sampling data that do not fit a
lognormal distribution are inappropriately assumed to be lognormally distributed and are
evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not
at all representative of site conditions.
Even when the data are lognormally distributed, the H-statistic can produce concentrations that
are far higher than the true mean. In their recent study, Singh et al. (EPA 1997a) evaluated and
compared UCLs calculated through different statistical estimation methods using a variety of
sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently
larger and less accurate than the UCLs calculated using other statistical approaches. They noted
that "it is observed that the H-UCL becomes order of magnitudes higher even when the data
were obtained from a lognormal population and can lead to incorrect conclusions. This is
especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical
merit of the H-UCL in environmental applications is questionable as it becomes orders of
magnitude higher than the largest concentration observed when the [standard deviation] of the
log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL
approach can yield unusually high false positives, which would result in an unnecessary
cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in
an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may
overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous
waste sites.
Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic
mean is not correctly estimated within the calculation. This difficulty occurs when an
exponential estimator of the mean is not equal to the typical calculation of the sum of all samples
divided by the number of samples.
Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To
control this, EPA recommended that when the statistic is greater than the maximum observed
concentration at the site, the maximum value should instead be used as the EPC. This approach
leads to the unsupportable assumption that the average concentration within an exposure area is
best represented by the highest concentration encountered there, and does not account in any way
for the spatial distribution of contamination. As EPA itself has recognized in responding to
comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological,
are exposed to average contaminant concentrations, not maximum concentrations, where
exposure is equally likely throughout a given area (EPA 1999).
In view of these considerations, EPA contractors and consultants have recommended that the
H-statistic should not be used, particularly in cases where a lognormal distribution of the data is
not certain. For example, based on comparison of the UCLs calculated using the H-statistic
versus those calculated using other approaches, the EPA technical support document by Singh et
al. (EPA 1997a) recommended that in environmental applications "the use of the H-UCL to obtain
an estimate of the upper confidence limit of the mean should be avoided." These authors
concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev
inequality method provides better input to the risk assessors and may result in a significant
reduction in remediation costs. This is especially true when the number of samples is thirty or
less. . . . [U]pper confidence limits based on any of the other estimation procedures appear to be
more stable and reliable than those based on the H-statistic." Similar findings were reported by
Schulz and Griffin (1999).
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area, and exposures are likely to occur
randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean
is most representative of the long-term EPC. However, if sampling is intentionally biased
towards more contaminated areas, the sampling database no longer accurately represents the
areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for
the Lower Housatonic River. Contaminant delineation is an important goal of the sampling
effort on the lower Housatonic River; consequently, sampling is intentionally focused on areas
(e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the
samples in these datasets will be taken from high-concentration areas, with limited samples taken
from uncontaminated areas, calculating a mean using typical statistical procedures gives undue
weight to the locations where high concentrations exist. A dataset obtained in this way cannot be
used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be
representative of the overall exposure area.
This problem can be substantially mitigated by area-weighting the sample results so that the
importance of each sample in the calculation of the EPC is directly proportional to the fraction of
the exposure area that it represents. Area-weighted averaging may be conducted in a number of
ways with differing levels of complexity. In some cases, polygonal areas are constructed by
drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen
polygons; Clifford et al. 1995); in more complex approaches, unbiased estimates of how
concentration and variance change with distance are developed, and the results are used to
construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava 1989).

Because kriging is data intensive and unlikely to be feasible for many of the exposure areas in
the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more
appropriate approach. The process is to draw perpendicular lines that are exactly equidistant
between contiguous samples until each sample is surrounded by a polygon. Polygons at the
periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary).
One reason that this technique is easy to apply is that the construction of Thiessen polygons is a
hardwired operation in most commercially available Geographic Information System (GIS)
applications.
The area inside each polygon is then used to weight the samples in statistical calculations, which
is in essence saying that the probability of encountering the concentration represented by a
certain polygon is equal to the proportion of the total exposure area represented by the area in
that polygon. The area-weighted concentration is calculated using the following formula (Isaaks
and Srivastava 1989):

x̄ = Σᵢ₌₁ⁿ pᵢCᵢ

Where:

x̄ = area-weighted mean concentration (mg/kg)
Cᵢ = the concentration representing the condition within polygon i,
where there are i = 1 through n polygons; and
pᵢ = the proportion of the total area that is incorporated in polygon i (unitless)
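As a numerical sketch (the concentrations and polygon areas below are hypothetical), the formula reduces to a weighted sum:

```python
def area_weighted_mean(concs, areas):
    """Area-weighted mean: each sample's concentration is weighted by the
    fraction of the total exposure area covered by its Thiessen polygon."""
    total = sum(areas)
    return sum(c * a / total for c, a in zip(concs, areas))

# A hot-spot sample of 10 ppm covering 10% of the area and a 1 ppm sample
# covering 90% yield 1.9 ppm, whereas a simple average would return 5.5 ppm.
result = area_weighted_mean([10.0, 1.0], [100.0, 900.0])
```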
This approach includes sampling data from all portions of the exposure area and corrects for the
biases in the sampling data by taking into consideration the spacing of the samples within the
exposure area. This is the approach that was used by GE and approved by EPA for calculating
EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach
specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which
include a number of floodplain properties along the River). Hence, use of this approach for the
Lower Housatonic River would lead to a consistent approach for the different portions of the
overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at
other Superfund sites. For example, such an approach was used at the Koppers Company
Superfund site in Charleston, SC (EPA 1995), where biased sampling was also conducted. To
offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling
data in clusters by gridding the site and calculating an average concentration for each grid using
the sampling data available within it. Then the average concentrations for each grid were
averaged to yield an average concentration for the entire exposure area. In addition, at the
Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the
average goal across the exposure unit (EPA 1997b).
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board
(SAB) (EPA 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human
Health Evaluation Manual. In that review, the SAB stated that the

[e]stimate of the RME cannot ignore the distribution of contamination at the site and the
distribution of individual behaviors which lead to exposure. Rather, the spatial distribution
of the concentration over the site must be considered along with a distribution reflecting the
relative frequency with which people are likely to visit different parts of the site. For this
reason, any summary measure of concentration (such as the average proposed in the RAGS
document) that does not take into account the spatial distribution of the underlying samples is
likely to be inadequate. Therefore, the Committee believes that the Agency should give
strong consideration to incorporating methods such as kriging or triangulation that take
into account the spatial distribution of contamination to characterize exposure. (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is
fully protective of human health and the environment. Indeed, in responding to comments on the
Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach
so long as sufficient data are available (EPA 1999). As EPA recognized, use of this approach
requires the availability of adequate data to support the calculation of representative spatial
averages over the exposure area. However, the need for adequate data exists regardless of the
averaging method used to derive the EPCs, and is particularly marked when one attempts to use
the H-statistic approach. Under the phased approach that EPA has outlined in the Human
Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the
95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the
conduct of additional sampling is warranted in order to avoid the use of extreme and
unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate
additional samples as necessary for individual exposure areas to ensure that reasonable and
representative estimates of the average concentration can be calculated for the risk assessment of
each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most
representative means of calculating EPCs. However, if EPA rejects this approach, or for areas
where the sampling data are not considered sufficient to support use of such spatial averaging,
there are alternative methods of calculating the 95% UCL that would result in more reliable
and representative EPCs than use of the H-statistic. As discussed previously and demonstrated
by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), the H-statistic is likely to
substantially overestimate the 95% UCL, particularly where datasets are small. These EPA
contractors and consultants have evaluated and suggested alternative and superior approaches for
calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows
calculation of a probability distribution of EPCs and is able to offset some of the statistical
limitations associated with small datasets. It provides a method for calculating a more reliable
estimator of the EPC, particularly when datasets are small and data do not exactly fit either a
normal or lognormal distribution. This technique is described below and is recommended as an
alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for
exposure areas in the Lower Housatonic River risk assessment.
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the
overall dataset of analytical values is selected at random and a mean is calculated. This mean is
archived, and the statistical sampling procedure is repeated with replacement of the data. The
bootstrapped means are collected until their distribution stabilizes (this often requires many
iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95%
UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of
statistics, which states that if a population is repeatedly sampled for purposes of calculating an
arithmetic mean, the distribution of the means will ultimately take the form of a normal
distribution. This Theorem applies to any population regardless of its statistical distribution.
Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples,
bootstrapping models select repeatedly from a single dataset.
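The resampling procedure described above can be sketched as follows, using only the Python standard library. The dataset, iteration count, and seed are illustrative assumptions, not values prescribed by the Work Plan.

```python
import random
import statistics

def bootstrap_ucl(data, iterations=5000, percentile=95, seed=1):
    """Percentile-bootstrap estimate of the 95% UCL on the arithmetic mean.

    Each iteration draws len(data) values with replacement and archives the
    mean; the 95th percentile of the archived means approximates the UCL.
    """
    rng = random.Random(seed)
    n = len(data)
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(iterations)
    )
    # Nearest-rank percentile of the stabilized distribution of means.
    return means[int(iterations * percentile / 100) - 1]
```

Unlike the H-statistic, this estimate is bounded by the data themselves: no resampled mean can exceed the maximum observed concentration, so the UCL cannot balloon past the range of the measurements.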
This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL and
is more technically justifiable. The reduction obtained from this procedure varies according to
the amount of data available for sampling and its variance, but the result is consistently lower
than the 95% H-UCL (EPA 1997a; Schulz and Griffin 1999). The difference between these two
measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping
method is particularly well-suited to exposure areas, like many along the Lower Housatonic
River, where sampling is limited.
When sample sizes are small, the distribution for the bootstrap run may not appear normal. This
is because the number of samples required to be included in each subsample in order to approach
normality is larger than is available from the site data. This can be mitigated by further
sampling. It should be noted, however, that as the sample size increases, the variance becomes
tighter; as such, more samples will decrease the estimate of the 95% UCL. Thus, the 95%
UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95%
UCL that might be obtained from a larger dataset.
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the
H-statistic method, it still does not take into consideration the spatial distribution of the sampling
locations across an exposure area and the biases associated with it. To reflect this spatial
distribution, a modified bootstrapping approach can be used that reflects the locations and
densities of samples collected on each property. This modified bootstrapping approach (spatial
bootstrapping), which incorporates area-weighting into the bootstrapping method, can be used to
calculate more reliable 95% UCL estimates of the EPC for each exposure area.
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with
the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to
each sample is determined by calculating the proportion of the total exposure area that is
represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with
respect to its size and are weighted accordingly in the sampling database. In other words, if the
area of one polygon is five times larger than the smallest polygon, it is given a weight that is five
times greater in the sampling database from which values are selected for the bootstrap. The
normalized areas are rounded off to the closest integer. An expanded dataset is then created
from the original dataset that reflects the ratios of each polygon area to the smallest polygon area.
This expansion is accomplished by multiplying the rounded, normalized areas by a selected
value. The resulting values represent an area-weighted number of replicates of the sample
results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area
is 7, and the dataset is expanded using a factor of five, then sample X is represented in the
expanded dataset as 35 (7 × 5) entries.
Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using
between 250 and 500 of the results (depending upon the size of the expanded dataset) for each
iteration of the bootstrap. This calculation is then repeated a total of 5000 times to yield a
cumulative distribution of estimated means. After sufficient iterations, this cumulative
distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal
to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of
this distribution of means is the 95% UCL.
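The three steps above can be sketched as follows. The concentrations and polygon areas are hypothetical; the expansion factor of five and the 250-result subsample follow the illustrative values in the text.

```python
import random
import statistics

def expand_dataset(concs, areas, factor=5):
    """Steps 1-2: normalize each polygon area to the smallest polygon, round
    to the nearest integer, and replicate each sample result in the expanded
    dataset in proportion to its polygon's share of the exposure area."""
    smallest = min(areas)
    expanded = []
    for c, a in zip(concs, areas):
        expanded.extend([c] * (round(a / smallest) * factor))
    return expanded

def spatial_bootstrap_ucl(concs, areas, iterations=5000, k=250, seed=1):
    """Step 3: resample means (with replacement) from the area-weighted
    expanded dataset; the 95th percentile of the means is the 95% UCL."""
    rng = random.Random(seed)
    pool = expand_dataset(concs, areas)
    means = sorted(
        statistics.fmean(rng.choices(pool, k=k)) for _ in range(iterations)
    )
    return means[int(iterations * 0.95) - 1]
```

Because each sample enters the resampling pool in proportion to its polygon area, the mean of the expanded dataset equals the area-weighted mean, and the resulting UCL reflects the spatial layout of the sampling rather than its bias toward suspected hot spots.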
This approach is both consistent with EPA guidance and preferable to the approach currently
proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used
as the EPC in order to correct for the uncertainties and limitations of the sampling database, the
guidance does not dictate which method must be used to calculate the 95% UCL. While EPA
has in the past often relied on the H-statistic, that method need not be used, and recent reports
by EPA contractors and consultants have recommended against use of the H-statistic to avoid
systematic overestimates of the 95% UCL (EPA 1997a; Schulz and Griffin 1999).
The spatial bootstrapping approach is better than the H-statistic approach for several reasons.
First, as has been shown by Singh et al. (EPA 1997a) and Schulz and Griffin (1999),
bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial
bootstrapping allows for consideration of the location and density of the samples for each
exposure area and corrects for the relative contribution of each in the database. Third, this
approach mitigates the statistical impact of a small database. Finally, it can be used regardless of
the distribution of the sampling data. Thus, for all sampling datasets, including those that do not
fit either a normal or lognormal distribution, this approach can be used successfully.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) the EPA approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.
As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas; thus, they are weighted more heavily in approaches that include area-weighting.
Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
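The arithmetic and area-weighted means discussed above can be reproduced directly from the Table 1 data (the bootstrapped UCLs in Table 2 additionally require the resampling procedure described earlier). This sketch pairs each sample's polygon area with its concentration in the order listed in Table 1:

```python
# Table 1 data: (polygon area in ft2, total PCB in ppm) for 30 samples.
samples = [
    (138219, 0.022), (105217, 0.028), (110214, 1.07), (31021, 1.63),
    (11399, 1.67), (7004, 2.01), (14268, 2.19), (12121, 3.25),
    (3697, 3.77), (13297, 7.58), (33087, 7.61), (65541, 7.62),
    (57494, 7.71), (9014, 7.99), (9994, 8.72), (2123, 9.69),
    (2341, 9.72), (7446, 12.7), (2613, 14.3), (44231, 14.3),
    (2362, 15.4), (8745, 16.3), (5008, 18.7), (18762, 19.1),
    (24296, 20.2), (51356, 30.8), (12996, 32.9), (18019, 38.9),
    (119814, 40.6), (132012, 42.6),
]

areas = [a for a, _ in samples]
concs = [c for _, c in samples]

# Simple arithmetic mean: every sample counts equally (13.3 ppm).
arith_mean = sum(concs) / len(concs)

# Area-weighted mean: each sample weighted by its polygon area; somewhat
# higher than the arithmetic mean because the highest-concentration
# samples (29 and 30) have the largest polygons.
aw_mean = sum(a * c for a, c in samples) / sum(areas)
```

Note that the polygon areas total roughly 1,070,000 square feet, consistent with the assumed 1,000-by-1,000-foot property.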
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few
showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration by a factor of more than six, with the result that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above and demonstrated by EPA contractors and consultants (EPA 1997a; Schulz and Griffin 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data
necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA 1989, 1993) and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.
However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA 1997a; Schulz and Griffin 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund. Volume I: Human Health Evaluation Manual (Part A). Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.
EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant), Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision, Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H. and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River. Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft²)    Total PCB (ppm)
      1               138,219               0.022
      2               105,217               0.028
      3               110,214               1.07
      4                31,021               1.63
      5                11,399               1.67
      6                 7,004               2.01
      7                14,268               2.19
      8                12,121               3.25
      9                 3,697               3.77
     10                13,297               7.58
     11                33,087               7.61
     12                65,541               7.62
     13                57,494               7.71
     14                 9,014               7.99
     15                 9,994               8.72
     16                 2,123               9.69
     17                 2,341               9.72
     18                 7,446              12.7
     19                 2,613              14.3
     20                44,231              14.3
     21                 2,362              15.4
     22                 8,745              16.3
     23                 5,008              18.7
     24                18,762              19.1
     25                24,296              20.2
     26                51,356              30.8
     27                12,996              32.9
     28                18,019              38.9
     29               119,814              40.6
     30               132,012              42.6
Table 2. Calculated Mean and 95% UCL Exposure Point Concentrations for the Hypothetical Floodplain Property Using the Four Approaches (EPA arithmetic mean/H-statistic, area-weighted average, bootstrap, and spatial bootstrap)
b. Exposed skin surface areas for residential and recreational scenarios

The excessive conservatism in the exposure frequencies is exacerbated by the exposed skin surface areas assumed by EPA in deriving the SRBCs for the residential and recreational scenarios. For these scenarios, EPA has assumed that the entire head, hands, forearms, lower legs, and feet will be in contact with soil or sediment every time that the individuals are present in the affected areas (Tables 6.2-3 and 6.2-4). This assumption does not take account of the fact that the exposed skin surface areas during the spring and fall months are likely to be less, because individuals will be wearing heavier clothing (i.e., shoes, socks, long pants, and long-sleeved shirts). While the level of exposure assumed by EPA may occur during the summer months, it would not occur every time the individual visits the affected area. Hence, EPA should reduce the exposed skin surface areas to take account of the lesser areas exposed during the spring and fall, and thus develop time-weighted averages for the exposed skin surface areas.
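The time-weighting GE proposes could take a form like the following. This is a sketch only: the seasonal surface areas and season lengths below are hypothetical placeholders, not values from the Work Plan or GE's comments.

```python
# Hypothetical exposed skin surface areas (cm2) by season: full
# head/hands/forearms/lower legs/feet during warm months, and a smaller
# area when heavier clothing is worn in spring and fall. Both area
# values and month counts are assumptions for illustration.
seasonal_exposure = {
    "summer":      {"months": 3, "area_cm2": 5700},
    "spring_fall": {"months": 4, "area_cm2": 2000},
}

def time_weighted_area(seasons):
    """Average exposed skin surface area weighted by months of exposure."""
    total_months = sum(s["months"] for s in seasons.values())
    weighted_sum = sum(s["months"] * s["area_cm2"] for s in seasons.values())
    return weighted_sum / total_months

# Falls between the clothed and unclothed extremes.
tw_area = time_weighted_area(seasonal_exposure)
```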
c. Soil ingestion rates for recreational scenario

For recreational exposures (both high-contact and low-contact), EPA has used daily soil ingestion rates of 200 mg/day for small children and 100 mg/day for adults (Table 6.2-2). GE does not agree with these ingestion rates, for the reasons given in its June 3, 1999 comments (pp. 33-34). However, if these daily soil ingestion rates are to be used, EPA should at least apply some factor to account for the fraction of the daily soil ingestion that is derived from the affected recreational area, as provided in the Agency's Risk Assessment Guidance for Superfund (EPA 1989). Under real-life conditions, total daily soil ingestion will occur over the entire day and will include soil and/or sediments from a variety of different locations, some of which will not be contaminated. However, EPA has assumed that total daily soil ingestion will come from a single contaminated area and has not included a factor for the fraction of soil ingested from the site. While it is presumed that this omission reflects EPA's concern that some individuals may occasionally obtain all of their ingested soil from a single contaminated location, such an assumption is not reasonable when coupled with the exposure frequencies that are assumed. In addition, some highly accessible areas may be very limited in size, as is the case on some floodplain properties. Because of this, even if all of the soil is obtained from an exposure area, only a portion of that soil will be contaminated.
Moreover, EPA's assumption is excessively conservative when one considers that the SRBCs for soil and sediment are separately derived, and that each SRBC assumes that 100 percent of the soil or sediment ingested daily comes from the site. In reality, individuals who are engaged in river-based activities will ingest a combination of soil and sediment during the course of their activity. Given these considerations, GE believes that a factor of 50 percent should be used to represent the fraction of ingested soil or sediment that is derived from contaminated recreational areas during recreational activities. This will correct for the fact that daily ingestion exposure occurs with a combination of soil and sediment. It will also correct for the fact that only limited portions of the day will be spent in recreational areas. In addition, such a factor will more accurately reflect exposures at those areas where the area of contamination is very limited. This approach is consistent with EPA's approach in its development of the cleanup goals under the October 7, 1999 Consent Decree for recreational areas outside the River (EPA 1999a).
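As a sketch, the 50 percent fractional-intake factor GE proposes would enter the standard ingestion dose equation as a simple multiplier. The ingestion rate below is the Work Plan's 200 mg/day child rate; the body weight and soil concentration are illustrative assumptions, and the averaging-time and exposure-frequency terms of the full RAGS equation are omitted for simplicity.

```python
def ingested_soil_dose(conc_mg_per_kg, ir_mg_per_day,
                       fraction_from_site, body_weight_kg):
    """Daily soil/sediment ingestion dose (mg/kg-day) with a fractional
    intake (FI) term: dose = C * IR * FI / BW. The 1e-6 factor converts
    the ingested soil mass from mg to kg."""
    return (conc_mg_per_kg * ir_mg_per_day * 1e-6
            * fraction_from_site / body_weight_kg)

# Child recreational ingestion at 200 mg/day, with and without the
# proposed 50 percent site fraction (15 kg body weight and 10 ppm PCBs
# are assumed for illustration).
dose_full = ingested_soil_dose(10.0, 200, 1.0, 15.0)
dose_half = ingested_soil_dose(10.0, 200, 0.5, 15.0)  # exactly half
```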
d. Soil ingestion rate for utility worker scenario

EPA has used a soil ingestion rate of 200 mg/day to develop the SRBC for the utility worker scenario because of likely heavy exposure to soils (p. 6-17). While an enhanced soil ingestion rate may be appropriate for utility workers, who will have direct contact with soil, the ingestion rate of 200 mg/day is too high. GE previously proposed use of an enhanced soil ingestion rate of 137 mg/day for farmers at the Housatonic River site, who are likewise expected to have heavy contact with soil (GE 1997), and EPA approved that proposal (EPA and MDEP 1998). EPA itself used that soil ingestion rate in deriving a cleanup level for the utility worker scenario for Areas Outside the River under the Consent Decree (EPA 1999a). This ingestion rate was based on an assumed upper-bound soil adherence rate of 1 mg/cm² for the hands. In recent draft dermal risk assessment guidance, however, EPA (1999b) recommends soil adherence factors for utility workers ranging from a central-tendency estimate of 0.2 mg/cm² to an upper-bound estimate of 0.8 mg/cm². Use of that upper-bound adherence factor of 0.8 mg/cm² would result in an enhanced soil ingestion rate for utility workers of 110 mg/day. GE submits that this enhanced soil ingestion rate should be used in developing a revised SRBC for the utility worker scenario.
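The arithmetic behind the proposed 110 mg/day rate is a proportional scaling of the previously approved 137 mg/day rate by the ratio of the adherence factors (all values as given in the text):

```python
approved_ir = 137    # mg/day, previously approved rate (GE 1997)
old_adherence = 1.0  # mg/cm2, upper-bound assumed for the 137 mg/day rate
new_adherence = 0.8  # mg/cm2, upper-bound from EPA (1999b) draft guidance

# Scale the ingestion rate by the ratio of adherence factors:
# 137 * 0.8 / 1.0 = 109.6, which rounds to the 110 mg/day cited above.
revised_ir = approved_ir * new_adherence / old_adherence
```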
B. Comments on Phase 2 - Baseline HHRA Approach

For those properties or other exposure points that remain after the Phase 1 screening, the Work Plan outlines the approach that EPA plans to use in conducting a baseline HHRA. GE has a number of comments on that proposed approach.
1. Selection of non-PCB contaminants of potential concern

The Work Plan outlines a two-step approach for selecting non-PCB contaminants of potential concern (COPCs) to be included in the baseline HHRA (Section 6.3.2.3). In the first step, for each area that has PCB concentrations above the applicable SRBC, the available sampling data on other constituents will be compared with several criteria to determine the need for additional sampling in that area. These criteria include: (a) background concentrations established for the area; (b) EPA's Preliminary Remediation Goal (PRG) for dioxins/furans in residential areas, which is a 1 ppb Toxicity Equivalent (TEQ) concentration; and (c) the Massachusetts Department of Environmental Protection's (MDEP's) Method 1 soil standards set out in the Massachusetts Contingency Plan (MCP) (see Work Plan, p. 6-30). If these or other (unspecified) criteria are exceeded, EPA will conduct additional characterization sampling (or other evaluation) in the area. If they are not exceeded, the Work Plan states that the baseline HHRA will be performed using the available analytical data (p. 6-31). The latter statement appears to mean that, even if the concentrations of non-PCB constituents do not exceed the criteria for further sampling (i.e., are below background concentrations, the dioxin/furan PRG, or the Method 1 soil standards), those constituents will nevertheless be carried through to the second step in the selection of COPCs.
In the second step, all constituents detected will be screened for the selection of COPCs by comparing the maximum detected concentrations with contaminant-specific and medium-specific risk-based concentrations. The Work Plan states that the principal criterion for selection of a COPC will be an exceedance of the medium-specific PRG by the maximum concentration of the constituent (p. 6-31). Although the Work Plan does not give the source of the PRGs to be used, it states that a target cancer risk of 1 × 10⁻⁶ and a target non-cancer hazard quotient (HQ) of 0.1 will be used. Any constituent for which the maximum concentration exceeds the PRG will be carried forward for the risk assessment. In addition, the Work Plan states that constituents which
are known to bioaccumulate in biological tissues but do not exceed the PRGs will be further evaluated for possible inclusion in the risk assessment.
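The second-step screen described above can be sketched as a simple comparison of maximum detected concentrations against PRGs, with bioaccumulative constituents flagged for further evaluation. All constituent names, concentrations, and PRG values below are hypothetical placeholders, not values from the Work Plan.

```python
# Hypothetical screening inputs: maximum detected concentration and
# medium-specific PRG for each constituent (mg/kg), plus a flag for
# known bioaccumulation potential.
constituents = {
    "constituent_A": {"max_conc": 12.0, "prg": 23.0, "bioaccumulates": False},
    "constituent_B": {"max_conc": 0.9,  "prg": 0.4,  "bioaccumulates": False},
    "constituent_C": {"max_conc": 0.05, "prg": 1.1,  "bioaccumulates": True},
}

def select_copcs(data):
    """Retain a constituent as a COPC if its maximum concentration exceeds
    the PRG; bioaccumulative constituents below the PRG are flagged for
    further evaluation rather than selected outright."""
    copcs, further_evaluation = [], []
    for name, d in data.items():
        if d["max_conc"] > d["prg"]:
            copcs.append(name)
        elif d["bioaccumulates"]:
            further_evaluation.append(name)
    return copcs, further_evaluation

copcs, flagged = select_copcs(constituents)
```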
GE has several concerns about this proposed two-step approach.
First, GE believes that if the concentration of non-PCB constituents in a given area does not exceed the criteria for further sampling (as listed above), those constituents should be dropped from further consideration and not carried through to the second step in selecting COPCs. As EPA has recognized in the Statement of Work for Removal Actions Outside the River (SOW), which is Appendix E to the Consent Decree, concentrations of non-PCB constituents that are consistent with local background levels, are less than the 1 ppb TEQ concentration for dioxins/furans, or are below the applicable Method 1 soil standards are considered to pose no significant risk to human health. Hence, such constituents need not and should not be considered further in the selection of COPCs.
Second, for those constituents that are carried forward to the second step in the COPC selection process, EPA's proposed primary approach for selecting COPCs (i.e., comparison of maximum concentrations to PRGs) is overly restrictive. In recent discussions with GE, EPA explained that it is currently planning to use the EPA Region 9 PRGs for this purpose, but to reduce the non-cancer-based PRGs by 10 times, due to the fact that those PRGs were based on a non-cancer HQ of 1.0, not 0.1. This approach will likely lead to the selection of a long list of COPCs, many or most of which in fact make no significant contribution to actual risks. This would undercut the purpose of the screening approach and will greatly and needlessly complicate the risk assessment without any corresponding benefit in terms of achieving additional health protection. Instead, GE recommends that, in selecting the COPCs for the Rest of the River, EPA should use an approach similar to the approach approved by EPA and specified in the SOW for Areas Outside the River, including the non-bank portions of certain floodplain properties. This approach utilizes not only
¹ The Work Plan also states that, in addition to these specified criteria, EPA will use other applicable criteria in the first step of this process, but it does not say what those additional criteria are. EPA should specify those criteria.
the Region 9 PRGs but also the other criteria listed on page 6-30 of the Work Plan, as discussed below.
For dioxins and furans, EPA should use the PRGs for dioxin/furan TEQs that are prescribed in recent guidance from EPA headquarters for use in addressing these constituents in soil at CERCLA and RCRA sites (EPA 1998). Those PRGs are 1 ppb for dioxin/furan TEQs in residential areas (also applicable to recreational areas) and 5-20 ppb for dioxin/furan TEQs in commercial/industrial areas. The SOW provides specifically for use of these PRGs in screening out dioxin/furan compounds in soil in Areas Outside the River. EPA should likewise use these PRGs in selecting COPCs for the Rest of the River. (Indeed, as discussed below, GE believes that these PRGs should be used not only as screening criteria but also as remedial goals, which is consistent with the approach embodied in the SOW for Areas Outside the River. See Section B.2 below.)
For other constituents, EPA should follow a step-wise approach that begins with comparisons to the Region 9 PRGs. Since these PRGs are highly conservative and are specified in the SOW for use in screening such constituents without further reduction, they should not be reduced by an additional 10 times. For constituents for which such PRGs do not exist, a similar screening step should be employed using PRGs for surrogate compounds.²
EPA should then apply the other criteria listed on page 6-30 of the Work Plan (i.e., comparison to background levels and MCP Method 1 standards). Comparison of concentrations of non-PCB constituents to site-specific background levels (e.g., concentrations upstream of the GE facility) is appropriate to ensure that only those constituents attributable to releases from the GE facility are included in the risk assessment. This step is allowed in EPA's Risk Assessment Guidance for
² Generally, when toxicity criteria have not been derived for particular constituents, it is because there are no reliable toxicity data on which to base them. If such constituents are not screened but are simply carried through to the risk assessment as COPCs, that would unnecessarily complicate the risk assessment and would result in largely meaningless or highly uncertain estimates, given the lack of reliable toxicity data and criteria. Hence, EPA should screen these compounds using PRGs for surrogate compounds, as described in GE's June 3, 1999 comments (p. 6).
Superfund (EPA 1989) and has been accepted by EPA as a basis for selecting non-PCB COPCs for both soils and sediments in the Upper ½ Mile Reach of the Housatonic River (as stated in the Upper ½ Mile Reach Work Plan) and for Areas Outside the River (as provided in the SOW). It is also consistent with the MCP, which provides that if the concentration of a constituent is at or below background levels, then that constituent will be considered to pose No Significant Risk (310 CMR 40.0902(3)). Similarly, comparison to the MCP Method 1 soil standards is appropriate because those standards have been developed as conservative risk-based standards (using a target cancer risk of 1 × 10⁻⁶) that will ensure protection of human health and the environment. Again, EPA has already accepted comparisons to those standards as a basis for selecting non-PCB COPCs in soil for the Upper ½ Mile Reach of the River and Areas Outside the River. Moreover, in its Engineering Evaluation/Cost Analysis (EE/CA) for the 1½ Mile Reach of the River, EPA itself has used the MCP Method 1 soil standards, as well as background levels, as a basis for evaluating non-PCB constituents in bank soils (Weston 2000b). Thus, use of comparisons to background levels and MCP Method 1 standards in selecting COPCs for the Rest of the River is justified by considerable precedent and will ensure that a consistent approach is employed to address such constituents throughout the GE-Pittsfield/Housatonic River Site. EPA has presented no sound basis for departing from that approach for the Rest of the River portion of the Site.
Moreover, use solely of the maximum detected concentrations in selecting COPCs from these constituents is overly conservative and inappropriate. For example, comparisons to site-specific background levels should be made using an appropriate statistical technique or summary statistics, as described in MDEP (1995) guidance, while comparisons to the MCP Method 1 standards are generally to be made based on average exposure point concentrations (310 CMR 40.0926(3)). Again, these approaches have been agreed upon by EPA as fully protective for Areas Outside the River, as set forth in the SOW, and should likewise be used for the Rest of the River to eliminate constituents that are not likely to contribute to Site-related risks.
Finally, the Work Plan does not explain how the potential for bioaccumulation (discussed on p. 6-31) will be considered or what criteria will be used to examine it. Unless the approach used
to identify and select such COPCs is clearly defined, it could result in the unnecessary inclusion of certain constituents. EPA should provide supplemental information that outlines the specific approach and criteria that will be used to identify and select these constituents for inclusion.
2. Risk assessment of dioxins/furans

The Work Plan states that EPA intends to evaluate the potential risks associated with dioxins and furans by converting them to TEQs (toxic equivalents of 2,3,7,8-TCDD) using Toxicity Equivalency Factors (TEFs) and then applying the provisional cancer slope factor (CSF) of 150,000 (mg/kg-day)⁻¹ listed on HEAST for 2,3,7,8-TCDD (Section 6.3.3.2.4). As discussed in detail in GE's June 3, 1999 comments (pp. 19-22), the TEF approach has substantial limitations, and the provisional CSF of 150,000 (mg/kg-day)⁻¹ is uncertain and unsupportable. A wide range of CSFs, spanning from 9,000 to 156,000 (mg/kg-day)⁻¹, have been proposed for TCDD under the linear nonthreshold cancer model, with differences resulting from the selection of the tumor classification scheme, the interspecies scaling factor used, and the model used for low-dose extrapolation. Thus, there is enormous uncertainty associated with any selected CSF.
In these circumstances, GE believes that, to the extent possible in this risk assessment, EPA should evaluate dioxins and furans through application of the dioxin/furan TEQ PRGs set forth in EPA (1998) guidance and discussed above, using those PRGs as remedial goals as well as screening levels, rather than conducting forward risk calculations for these compounds. This approach would avoid the uncertainties associated with selecting a particular CSF for TCDD and is consistent with the approach adopted by EPA for Areas Outside the River, as described in the SOW.

However, to the extent that EPA includes dioxins and furans in forward risk calculations, GE believes that EPA should use a CSF of 30,000 (mg/kg-day)⁻¹. As shown in our June 3, 1999 comments, based on current information, that CSF is more supportable than the CSF of 150,000 (mg/kg-day)⁻¹ specified in the Work Plan.
3. Use of TEF/TEQ approach for dioxin-like PCBs

In the Work Plan, EPA also proposes to use the TEF/TEQ approach for estimating risks of dioxin-like PCB congeners (Section 6.3.3.2.5). Under this approach, the concentrations of these PCB congeners are converted to toxic equivalents (TEQs) of 2,3,7,8-TCDD using TEFs, and the provisional CSF of 150,000 (mg/kg-day)⁻¹ for 2,3,7,8-TCDD is then applied to estimate the risks. In its June 3, 1999 comments, GE presented several reasons why this approach is not appropriate or scientifically defensible. In discussions with GE in November 1999, EPA stated that, since it will make risk management decisions on the basis of total PCB concentrations rather than congener-specific data, it intends to use this TEF/TEQ approach for PCBs only if it finds a good correlation between the concentrations of the dioxin-like congeners and concentrations of total PCBs. There is no indication in the Work Plan, however, that EPA has established a good correlation between the congener-specific data and the total PCB data. Indeed, in more recent discussions, EPA has indicated that it has not to date evaluated this issue. If and when EPA makes such an analysis, GE requests the opportunity to review the analysis and discuss it with EPA. The absence of such a correlation would itself make the TEQ data irrelevant to risk management decisions.
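The TEF/TEQ conversion EPA proposes works as follows. This is a sketch: the congener concentrations are hypothetical, and the TEF values shown are illustrative of the WHO-type relative-potency factors rather than the specific set EPA would apply.

```python
# Illustrative TEFs for dioxin-like PCB congeners (dimensionless potency
# relative to 2,3,7,8-TCDD); example values only, not EPA's chosen set.
tefs = {"PCB-126": 0.1, "PCB-169": 0.01, "PCB-77": 0.0001}

# Hypothetical congener concentrations (mg/kg).
congener_concs = {"PCB-126": 0.002, "PCB-169": 0.010, "PCB-77": 0.500}

# TEQ: sum of concentration x TEF over the dioxin-like congeners. Cancer
# risk would then be estimated by applying the TCDD slope factor to the
# TEQ-based dose; as discussed in the text, risk is double-counted if the
# total-PCB slope factor is also applied to the same material.
teq = sum(congener_concs[c] * tefs[c] for c in tefs)
```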
In any event, GE continues to believe, for the same reasons given in its June 3, 1999 comments (pp. 23-25), that application of the TEF/TEQ approach to PCBs is not scientifically supportable. These reasons include: (a) the lack of sufficient congener-specific data to provide reliable estimates of true congener concentrations; (b) the double counting of carcinogenic risks that would result from use of both the TCDD CSF and the PCB CSF in the same assessment; and (c) the uncertainties and unsupportability of the TEFs for PCB congeners and of the CSF (for 2,3,7,8-TCDD) to be used to evaluate the resulting TEQs.
4. Failure to consider alternative CSFs and RfDs for PCBs

The Work Plan states that EPA intends to use the upper-bound CSF for PCBs of 2 (mg/kg-day)⁻¹ and the Reference Dose (RfD) for Aroclor 1254 of 2 × 10⁻⁵ mg/kg-day, as published in its Integrated Risk Information System (IRIS) (Sections 6.3.3.2.3 and 6.3.3.3.2). In its June 3, 1999 comments (pp. 16-19 and Appendix A), GE presented substantial reasons why these toxicity values
overestimate the carcinogenic potential and non-carcinogenic impacts of PCBs, and it recommended the development of alternative toxicity values for PCBs. EPA has made no reference to this presentation in its Work Plan.
This failure to consider the alternative toxicological information presented by GE is unlawful. Under a settlement agreement with GE (Settlement Agreement in General Electric Company v. Browner, No. 93-1251, D.C. Cir., October 25, 1993) and a subsequent EPA guidance memorandum (EPA 1993), if an outside party questions the use of IRIS values during the course of an EPA risk assessment and presents alternative toxicological information that may be used in place of the IRIS values, EPA has an obligation to consider all credible and relevant evidence before it. There is no indication that EPA has considered the alternative toxicological information presented by GE. Rather, it has ignored GE's showing and chosen simply to defer to the current IRIS values (which is precisely the course of conduct that the settlement and guidance prohibit). EPA should rectify this deficiency by now giving careful consideration to the alternative toxicological information presented by GE.
5. Use of subchronic RfD for PCBs

The Work Plan states that EPA will consider using the subchronic RfD for PCBs only if it determines that exposure is intermittent enough to result in an exposure period that is substantially reduced (p. 6-42). Later, on page 6-43, the Work Plan states that this will occur when the total exposure period is substantially below the 7-year period that is considered by EPA to represent chronic exposure. There is no further discussion of this issue, so it is not clear what the Agency will use as a benchmark to identify intermittent exposure.
Many of the proposed exposure scenarios are likely to be extremely intermittent in nature. For example, in developing the SRBC for utility workers, EPA assumes that exposure occurs 5 days/year for a period of 25 years. Clearly, 5 days/year is intermittent exposure. In fact, even when considered over the entire 25-year exposure period, the total exposure consists of 125 days, which amounts to about 1/3 of a year, or less than 5 percent of the 7-year chronic exposure period. This exposure would certainly be considered intermittent, despite the fact that the total exposure period lasts for 25 years.
As a second example, in developing its SRBCs for recreational exposures, EPA has assumed that children aged 1-6 are exposed for six years during recreational activities. Even using the highly conservative exposure frequency for high contact areas of 3 days/week for a period of 7 months per year (a total of 84 days/year, or 504 days for the entire exposure period), the number of days exposed is equivalent to 1.4 years, or less than 20 percent of the available days during the 7-year chronic exposure period. Nevertheless, as stated on page 6-42, EPA does not intend to use the subchronic RfD to evaluate noncancer hazards to young children.
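The intermittency arithmetic in the two scenarios above can be spot-checked directly; a minimal sketch (the day counts and the 7-year chronic benchmark are taken from the scenarios as described, not from any EPA calculation):

```python
# Utility worker SRBC scenario: 5 days/year for 25 years
utility_days = 5 * 25                 # 125 total exposure days
utility_years = utility_days / 365.0  # ~1/3 of a year of actual exposure
utility_fraction = utility_years / 7.0
print(f"utility worker: {utility_years:.2f} yr, "
      f"{utility_fraction:.1%} of the 7-yr chronic period")

# Child recreational scenario: 84 days/year (3 days/week, 7 months) for 6 years
child_days = 84 * 6                   # 504 total exposure days
child_years = child_days / 365.0      # ~1.4 years of actual exposure
child_fraction = child_years / 7.0
print(f"child recreation: {child_years:.2f} yr, "
      f"{child_fraction:.1%} of the 7-yr chronic period")
```

Both fractions fall well under the thresholds the comments cite (5 percent and 20 percent, respectively).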
In short, the Work Plan states that EPA will use the subchronic RfD when exposures during a given scenario are sufficiently intermittent to result in a total exposure that is far less than the 7-year chronic exposure period identified by EPA. It is clear, however, that even for scenarios where this intermittent exposure is likely, EPA does not intend to incorporate the subchronic RfD. EPA should reconsider its position on the use of the subchronic RfD and incorporate it into those scenarios where it is relevant.
6. Inappropriate use of a single exposure scenario for all direct-contact recreational activities
The Work Plan provides that, while consumption of fish and game will be evaluated separately, all direct-contact recreational activities will be evaluated using a single exposure scenario and a single set of exposure assumptions (pp. 6-52 - 6-53). These activities include hiking, walking, canoeing, picnicking, swimming, wading, and dirt biking, as well as fishing and hunting (but not consumption of fish or game). In its June 3, 1999 comments (pp. 11-12, 29-30), GE explained that use of a single exposure scenario and set of assumptions for such diverse activities is inappropriate because it will force EPA either to use the most conservative assumptions for any of these activities (which would clearly result in grossly overestimated exposure estimates) or to pick compromise parameters that will not be truly representative of any of these activities (which would be subject to considerable uncertainty). Based on discussions with EPA in the fall of 1999, GE had understood that the Agency recognized this problem and intended to consider further whether and how to divide this single direct-contact recreational scenario into a number of more specific scenarios (e.g., hiking/walking, canoeing, picnicking, swimming/wading). However, the Work Plan does not do that. GE again urges EPA to develop such specific scenarios so as to take
account of the significant differences in exposure parameters among these types of activities. In doing so, actual recreational exposures will be more closely approximated, and more realistic and appropriate cleanup goals can be developed.
7. Lack of information about specific exposure assumptions to be used
As in the prior draft, the Work Plan does not present the specific exposure parameter values and equations to be used in EPA's risk assessment, but simply states that they will be developed based on EPA and MDEP guidance, along with site-specific data (Section 6.3.4.5). GE requested such information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has advised GE that the Agency's risk assessment staff will hold a series of technical meetings with GE representatives and consultants, and at such meetings will be prepared to discuss the most appropriate exposure assumptions and parameter values to be used. GE reiterates its request for such meetings and looks forward to discussing these issues with EPA.
8. Use of probabilistic risk analysis
In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that probabilistic approaches, such as Monte Carlo analysis, will be considered for those COPCs and exposure pathways that result in significant risk estimates. However, it does not mention probabilistic risk assessment techniques in connection with any of the main risk assessments. In prior discussions with GE, EPA indicated that one of the principal reasons why it collected a large amount of fish tissue PCB data is for use in a probabilistic risk assessment of fish consumption. As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic exposure model in the main risk analysis of the fish consumption pathway. However, the Work Plan presents no discussion of, or plans for conducting, such an assessment. In fact, EPA does not even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk assessment guidance documents and information sources that EPA intends to use in conducting the HHRA.
During the last few years, EPA's policies and guidelines have focused on improving risk management by presenting decision-makers with the entire range of possible risks, rather than a
single point estimate (EPA 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that numerical risk assessments should be accompanied by a full characterization of the uncertainties, limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D) probabilistic analysis, such as Microexposure analysis, can more effectively characterize the impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed population by considering time-dependent changes.
Microexposure analysis has been used by EPA and by independent researchers to simulate duration of residential exposure (EPA 1992c; Johnson and Capel 1992; Sielken 1994). It also has been used to evaluate childhood exposures to lead (Goodrum et al. 1994), exposure to contaminants in tap water (Harrington et al. 1995), and exposure to dioxins from the consumption of freshwater fish (Keenan et al. 1993a,b; 1995; 1996a; 1997a,b). Microexposure analysis was employed in the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon Group 1994; Keenan et al. 1996b). In addition, Microexposure analysis has been described in EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A (External Review Draft) as a viable alternative for modeling time-dependent variability in concentrations, daily activity patterns, and other behavioral exposure factors (EPA 1999c).
Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of probabilistic risk analysis through application to the fish ingestion pathway at the Sangamo-Weston/Lake Hartwell Superfund site (Simon 1999). Simon (1999) found that the Microexposure event model gave risk estimates that were approximately an order of magnitude lower than those derived using deterministic and other probabilistic approaches. He attributed this decrease to the power of the method to incorporate changes over time.
GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model, to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in Appendix E of its draft guidance for conducting probabilistic risk assessments (EPA 1999c), some types of exposures, including fish consumption, can be expected to vary over time. Anglers may change their activity levels and their fish consumption behaviors as they age. In addition, different
species and sizes of fish may be consumed from a variety of locations on the river over time, and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates, and will provide information that will allow better informed risk management decisions to be made (EPA 1999c).
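To illustrate the kind of simulation at issue, the following is a minimal microexposure-style sketch. All distributions and parameter values here are hypothetical placeholders, not values from the Work Plan or from GE's proposal; the point is only the structure: each simulated angler's meal frequency and the fish tissue concentration change year by year, and dose accumulates over the exposure duration.

```python
import random

def simulate_angler(rng, years=30):
    """Accumulate a lifetime PCB intake (mg) for one simulated angler.

    Meal frequency and tissue concentration vary by year, which is the
    time-dependence that microexposure-event models are built to capture.
    """
    meals_per_year = rng.randint(5, 50)   # angler-specific baseline (hypothetical)
    dose_mg = 0.0
    for year in range(years):
        conc = 2.0 * (0.97 ** year)       # hypothetical declining tissue conc (mg/kg)
        # Activity changes with age: frequency drifts a little each year
        meals = max(1, int(meals_per_year * rng.uniform(0.8, 1.2)))
        meal_size_kg = 0.227              # ~8 oz fillet (assumed)
        dose_mg += meals * meal_size_kg * conc
    return dose_mg

rng = random.Random(42)
doses = sorted(simulate_angler(rng) for _ in range(1000))
median, p95 = doses[500], doses[949]
print(f"median dose: {median:.1f} mg, 95th percentile: {p95:.1f} mg")
```

A two-dimensional analysis would wrap a second loop around this one, resampling the uncertain parameters (here fixed constants) to separate variability from uncertainty.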
9. Use of 95% UCLs or maximum values as exposure point concentrations
The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% UCL of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).
GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment, and demonstrated by EPA contractors and consultants (EPA 1997c; Schulz and Griffin 1999), use of the H-statistic to derive a 95% UCL can, and frequently does, produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.
Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½ Mile Reach and for Areas Outside the River: a spatial averaging approach. That approach will (a) allow calculation of EPCs that are much more truly representative of actual exposures than use of EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper ½ Mile Reach and Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling, if necessary, to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.
However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A, and demonstrated by Singh et al. (EPA 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.
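As a concrete illustration of the bootstrap alternative discussed above, the following is a minimal sketch of a (non-spatial) percentile-bootstrap 95% UCL of the arithmetic mean; the sample concentrations are hypothetical:

```python
import random
import statistics

def bootstrap_ucl95(data, n_boot=5000, seed=1):
    """95% UCL of the arithmetic mean via the percentile bootstrap."""
    rng = random.Random(seed)
    n = len(data)
    boot_means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)
    )
    # 95th percentile of the bootstrap distribution of the sample mean
    return boot_means[int(0.95 * n_boot) - 1]

# Hypothetical, positively skewed PCB concentrations (mg/kg)
pcb = [0.2, 0.3, 0.5, 0.8, 1.1, 1.4, 2.0, 3.5, 6.2, 18.0]
ucl = bootstrap_ucl95(pcb)
print(f"mean = {statistics.fmean(pcb):.2f}, bootstrap 95% UCL = {ucl:.2f}")
```

Unlike the H-UCL, this estimate cannot exceed the largest observed concentration, since every resampled mean is bounded by the range of the data; a spatial bootstrap would additionally weight each draw by the area the sample represents.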
References
EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A), Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1990. National Contingency Plan. US Environmental Protection Agency, Washington, DC.

EPA. 1992a. Final Guidelines for Exposure Assessment; Notice. US Environmental Protection Agency, Washington, DC. 57 Federal Register 22888-22938. May 29.

EPA. 1992b. Guidance on Risk Characterization for Risk Managers and Risk Assessors. Memorandum from F. Henry Habicht, Deputy Administrator, to Assistant Administrators and Regional Administrators. Office of the Administrator, Washington, DC.
EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General US Population. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

EPA. 1993. Memo from W.H. Farland and H.L. Longest to Regional Directors, Re: Use of IRIS values in Superfund risk assessments. US Environmental Protection Agency, Washington, DC. OSWER Directive 9285.7-16. December 21.
EPA. 1995a. Policy for Risk Characterization at the US Environmental Protection Agency. US Environmental Protection Agency, Office of the Administrator, Washington, DC. March.

EPA. 1995b. Guidance for Risk Characterization. US Environmental Protection Agency, Science Policy Council, Washington, DC. February.

EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the US Environmental Protection Agency. US Environmental Protection Agency, Office of Research and Development, Washington, DC. May 15.

EPA. 1997b. Guiding Principles for Monte Carlo Analysis. US Environmental Protection Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 p.

EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A.K. Singh, and M. Engelhardt. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1998. Memo from Timothy Fields, Acting Administrator, to Regional Directors, Re: Approach for addressing dioxin in soil at CERCLA and RCRA sites. US Environmental Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.
EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, US EPA, to Jane Magee, General Electric Company, Re: Pittsfield 1-0147, EPA Area 6, Housatonic River Human Health Risk Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader, Re: Protectiveness of Cleanup Levels for Removal Actions Outside the River - Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, US District Court for the District of Massachusetts). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual, Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2. Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3 (Part A: Process for Conducting Probabilistic Risk Assessment), Draft, Revision 5. US Environmental Protection Agency, Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. Prepared by EPA Region I. October.
GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the US Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition, December 4-7, Baltimore, MD.
Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA4203. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.

Johnson, T., and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General US Population. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA, Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds, Proceedings, Dioxin '96: 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen 30: 61-65.
Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC. October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meetings Proceedings. Abstract.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition. Abstract.

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan, Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19. November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD. December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Human and Ecological Risk Assessment 5(4): 823-843.
Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 11.
ATTACHMENT A
A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS
Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co Inc
Introduction
The US Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets where no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and on how well they represent the actual levels of exposure that are likely to occur under various scenarios.
There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that, when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and
the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.
GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will produce more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic, namely the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.
Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL
In 1992, EPA issued a guidance document (EPA 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.
The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region 1 (EPA 1994). In fact, EPA Region 1 has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.
For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA 1992):

    UCL = exp(x̄ + 0.5s² + sH/√(n−1))

Where:

    UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean
    x̄ = the mean of the log-transformed data
    s² = the variance of the log-transformed sample distribution
    s = the standard deviation of the log-transformed sample distribution
    H = a statistic accounting for interaction of the distribution, developed by Land (1975)
    n = number of analytical samples
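A minimal sketch of this calculation follows. The H value used here is an illustrative placeholder only; in practice, H is read from Land's (1975) tables as a function of s and n, and the concentrations are hypothetical:

```python
import math
import statistics

def h_ucl(data, H):
    """Land's H-UCL for lognormal data: exp(xbar + 0.5*s^2 + s*H/sqrt(n-1)).

    `data` are untransformed concentrations; H must come from Land's tables
    for the observed s and n (the value used below is only a placeholder).
    """
    logs = [math.log(x) for x in data]
    n = len(logs)
    xbar = statistics.fmean(logs)          # mean of log-transformed data
    s = statistics.stdev(logs)             # std dev of log-transformed data
    return math.exp(xbar + 0.5 * s * s + s * H / math.sqrt(n - 1))

# Hypothetical skewed concentrations (mg/kg); H = 3.0 is a placeholder
sample = [0.4, 0.7, 1.2, 2.5, 4.0, 9.0, 30.0]
print(f"arithmetic mean = {statistics.fmean(sample):.1f}, "
      f"H-UCL = {h_ucl(sample, H=3.0):.1f}")
```

Even with this small illustrative dataset, the H-UCL exceeds the largest observed concentration, which is the behavior criticized in the text below.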
Frequently, the H-statistic produces extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.
Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers," and not necessarily due to lognormally distributed data. Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.
Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may
overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.
Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when an exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.
Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA 1999).
In view of these considerations, EPA contractors and consultants have recommended that the H-statistic not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA 1997a) recommended that in environmental applications the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided. These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. ... [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the lower Housatonic River. Consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.
This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al. 1995); in more complex approaches, unbiased estimates of concentration and variance change with distance, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava 1989).
Because kriging is data intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.
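The polygon-area weights can also be approximated without a GIS. The sketch below (hypothetical sampling coordinates; Monte Carlo nearest-neighbor assignment used as a stand-in for exact polygon construction) estimates each sample's Thiessen-polygon share of a rectangular exposure area:

```python
import random

def thiessen_weights(xy, width, height, n_points=20000, seed=0):
    """Estimate each sample's Thiessen-polygon share of a rectangular
    exposure area: scatter random points over the area, assign each
    point to its nearest sample, and use the hit fractions as weights."""
    rng = random.Random(seed)
    hits = [0] * len(xy)
    for _ in range(n_points):
        px, py = rng.uniform(0, width), rng.uniform(0, height)
        # the nearest sample location claims the random point
        nearest = min(range(len(xy)),
                      key=lambda i: (xy[i][0] - px) ** 2 + (xy[i][1] - py) ** 2)
        hits[nearest] += 1
    return [h / n_points for h in hits]

# hypothetical sampling locations on a 1,000 ft x 1,000 ft property
locs = [(100, 100), (500, 500), (900, 200), (300, 800)]
weights = thiessen_weights(locs, 1000, 1000)
assert abs(sum(weights) - 1.0) < 1e-9  # weights are proportions of total area
```

Because every random point is claimed by its nearest sample, the hit fractions converge on the polygons' true area proportions as the number of points grows.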
The area inside each polygon is then used to weight the samples in statistical calculations, which is in essence saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in
that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava 1989):

    X̄sc = Σ (pi × Ci), summed over i = 1 through n

Where:

    X̄sc = area-weighted mean concentration (mg/kg);
    Ci = the concentration representing the condition within polygon i, where there are i = 1 through n polygons; and
    pi = the proportion of the total area that is incorporated in polygon i (unitless).
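The area-weighted calculation reduces to a few lines of code. In this sketch the two polygon areas and concentrations are hypothetical, chosen to show how a small hot spot is down-weighted:

```python
def area_weighted_mean(concs, areas):
    """Area-weighted mean: sum of p_i * C_i, where p_i is polygon i's
    fraction of the total exposure area."""
    total = sum(areas)
    return sum(c * a / total for c, a in zip(concs, areas))

# hypothetical polygons: one large clean polygon and one small hot spot
concs = [2.0, 50.0]          # mg/kg
areas = [90000.0, 10000.0]   # ft^2
print(area_weighted_mean(concs, areas))  # 6.8: the hot spot carries only 10% weight
```

The arithmetic mean of the same two results would be 26 mg/kg, which illustrates how unweighted averaging of biased samples inflates the EPC.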
This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE, and approved by EPA, for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then the average concentrations for each grid were
averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA 1997b).
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the

"[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure." (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and it is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of the extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples as necessary for individual exposure areas to ensure that reasonable and
representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and as demonstrated by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process where a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping samples repeatedly from a single dataset.
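The procedure described above can be sketched in a few lines (the dataset is hypothetical, and a fixed seed is assumed so the run is reproducible):

```python
import random
import statistics

def bootstrap_ucl95(data, n_iter=5000, seed=1):
    """95% UCL of the mean by bootstrap: resample the dataset with
    replacement, archive the mean of each resample, and take the 95th
    percentile of the resulting distribution of means."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(data, k=len(data)))
        for _ in range(n_iter)
    )
    return means[int(0.95 * n_iter)]

# hypothetical PCB results (ppm) for a small exposure area
data = [0.02, 1.1, 1.6, 2.2, 3.3, 7.6, 7.7, 8.7, 12.7, 42.6]
ucl = bootstrap_ucl95(data)
assert statistics.fmean(data) < ucl < max(data)
```

Unlike the H-statistic, the resulting 95% UCL can never exceed the maximum value in the dataset, since no resampled mean can exceed that maximum.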
This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the bootstrapped estimate is consistently lower than the 95% H-UCL (EPA 1997a; Schulz and Griffin 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.
When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
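The three steps can be sketched as follows. The concentrations and polygon areas are hypothetical; the expansion factor of 5 and the 250-result subsample follow the description above:

```python
import random
import statistics

def spatial_bootstrap_ucl95(concs, areas, factor=5, k=250, n_iter=5000, seed=2):
    """Spatial bootstrap: expand the dataset so each sample appears in
    proportion to its rounded polygon-area ratio, then bootstrap means
    from the expanded dataset and take the 95th percentile."""
    smallest = min(areas)
    expanded = []
    for c, a in zip(concs, areas):
        replicates = round(a / smallest) * factor  # area-weighted replicate count
        expanded.extend([c] * replicates)
    rng = random.Random(seed)
    means = sorted(statistics.fmean(rng.choices(expanded, k=k))
                   for _ in range(n_iter))
    return means[int(0.95 * n_iter)]

# hypothetical data: the low-concentration sample covers most of the area
concs = [0.5, 4.0, 20.0]           # ppm
areas = [60000.0, 30000.0, 10000.0]  # ft^2
print(spatial_bootstrap_ucl95(concs, areas))
```

With these inputs the area-weighted mean is 3.5 ppm, and the resulting 95% UCL sits modestly above it, rather than being dominated by the 20 ppm result as an unweighted calculation would be.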
This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA 1997a; Schulz and Griffin 1999).
The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, this approach can be used successfully for all sampling datasets, including those that do not fit either a normal or lognormal distribution.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different approaches: (1) the EPA approach (arithmetic mean and H-statistic); (2) an area-weighted average; (3) a bootstrap; and (4) a spatial bootstrap. Results of this analysis are provided in Table 2.
As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.
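As a cross-check, the arithmetic and area-weighted means can be computed directly from the Table 1 data (pairing the listed polygon areas and concentrations row by row):

```python
# Polygon areas (ft^2) and total PCB (ppm) for the 30 samples in Table 1
areas = [138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121, 3697, 13297,
         33087, 65541, 57494, 9014, 9994, 2123, 2341, 7446, 2613, 44231,
         2362, 8745, 5008, 18762, 24296, 51356, 12996, 18019, 119814, 132012]
concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
         7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
         15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6]

arith = sum(concs) / len(concs)
weighted = sum(c * a for c, a in zip(concs, areas)) / sum(areas)
print(round(arith, 1))     # 13.3, the arithmetic mean reported in the text
print(round(weighted, 1))  # area-weighted mean; higher than the arithmetic
                           # mean because samples 29 and 30 pair the highest
                           # concentrations with very large polygons
```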
Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration by a factor of roughly 2.5, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above, and as demonstrated by EPA contractors and consultants (EPA 1997a; Schulz and Griffin 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA 1989, 1993), and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.
However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA 1997a; Schulz and Griffin 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhardt. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River, Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft²)    Total PCB (ppm)
      1               138,219               0.022
      2               105,217               0.028
      3               110,214               1.07
      4                31,021               1.63
      5                11,399               1.67
      6                 7,004               2.01
      7                14,268               2.19
      8                12,121               3.25
      9                 3,697               3.77
     10                13,297               7.58
     11                33,087               7.61
     12                65,541               7.62
     13                57,494               7.71
     14                 9,014               7.99
     15                 9,994               8.72
     16                 2,123               9.69
     17                 2,341               9.72
     18                 7,446              12.7
     19                 2,613              14.3
     20                44,231              14.3
     21                 2,362              15.4
     22                 8,745              16.3
     23                 5,008              18.7
     24                18,762              19.1
     25                24,296              20.2
     26                51,356              30.8
     27                12,996              32.9
     28                18,019              38.9
     29               119,814              40.6
     30               132,012              42.6
Table 2. Mean and 95% UCL EPC Estimates for the Hypothetical Floodplain Property

Approach                                      Mean (ppm)    95% UCL (ppm)
EPA approach (arithmetic mean/H-statistic)       13.3           106
Area-weighted average                             --             --
Bootstrap                                        13.3           17.06
Spatial bootstrap                                 --            16.38

[Remaining table values are not legible in the source; the results are summarized in the accompanying text.]
Moreover, EPA's assumption is excessively conservative when one considers that the SRBCs for soil and sediment are separately derived, and that each SRBC assumes that 100 percent of the soil or sediment ingested daily comes from the site. In reality, individuals who are engaged in river-based activities will ingest a combination of soil and sediment during the course of their activity. Given these considerations, GE believes that a factor of 50 percent should be used to represent the fraction of ingested soil or sediment that is derived from contaminated recreational areas during recreational activities. This will correct for the fact that daily ingestion exposure occurs with a combination of soil and sediment. It will also correct for the fact that only limited portions of the day will be spent in recreational areas. In addition, such a factor will more accurately reflect exposures at those areas where the area of contamination is very limited. This approach is consistent with EPA's approach in its development of the cleanup goals under the October 7, 1999 Consent Decree for recreational areas outside the River (EPA 1999a).
d. Soil ingestion rate for utility worker scenario
EPA has used a soil ingestion rate of 200 mg/day to develop the SRBC for the utility worker scenario because of likely heavy exposure to soils (p. 6-17). While an enhanced soil ingestion rate may be appropriate for utility workers, who will have direct contact with soil, the ingestion rate of 200 mg/day is too high. GE previously proposed use of an enhanced soil ingestion rate of 137 mg/day for farmers at the Housatonic River site, who are likewise expected to have heavy contact with soil (GE 1997), and EPA approved that proposal (EPA and MDEP 1998). EPA itself used that soil ingestion rate in deriving a cleanup level for the utility worker scenario for Areas Outside the River under the Consent Decree (EPA 1999a). This ingestion rate was based on an assumed upper-bound soil adherence rate of 1 mg/cm² for the hands. In recent draft dermal risk assessment guidance, however, EPA (1999b) recommends soil adherence factors for utility workers ranging from a central-tendency estimate of 0.2 mg/cm² to an upper-bound estimate of 0.8 mg/cm². Use of that upper-bound adherence factor of 0.8 mg/cm² would result in an enhanced soil ingestion rate for utility workers of 110 mg/day. GE submits that this enhanced soil ingestion rate should be used in developing a revised SRBC for the utility worker scenario.
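The 110 mg/day figure follows from simple proportional scaling of the previously approved 137 mg/day rate by the ratio of the adherence factors:

```python
approved_rate = 137.0  # mg/day, previously approved rate based on 1 mg/cm2 adherence
old_adherence = 1.0    # mg/cm2, upper-bound adherence behind the 137 mg/day rate
new_adherence = 0.8    # mg/cm2, upper-bound value in EPA's draft dermal guidance

scaled_rate = approved_rate * new_adherence / old_adherence
print(round(scaled_rate))  # 110 mg/day
```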
B. Comments on Phase 2 - Baseline HHRA Approach
For those properties or other exposure points that remain after the Phase 1 screening, the Work Plan outlines the approach that EPA plans to use in conducting a baseline HHRA. GE has a number of comments on that proposed approach.
1. Selection of non-PCB contaminants of potential concern
The Work Plan outlines a two-step approach for selecting non-PCB contaminants of potential concern (COPCs) to be included in the baseline HHRA (Section 6.3.2.3). In the first step, for each area that has PCB concentrations above the applicable SRBC, the available sampling data on other constituents will be compared with several criteria to determine the need for additional sampling in that area. These criteria include: (a) background concentrations established for the area; (b) EPA's Preliminary Remediation Goal (PRG) for dioxins/furans in residential areas, which is a 1 ppb Toxicity Equivalent (TEQ) concentration; and (c) the Massachusetts Department of Environmental Protection's (MDEP's) Method 1 soil standards set out in the Massachusetts Contingency Plan (MCP) (see Work Plan, p. 6-30). If these or other (unspecified) criteria are exceeded, EPA will conduct additional characterization sampling (or other evaluation) in the area. If they are not exceeded, the Work Plan states that the baseline HHRA will be performed using the available analytical data (p. 6-31). The latter statement appears to mean that, even if the concentrations of non-PCB constituents do not exceed the criteria for further sampling (i.e., are below background concentrations, the dioxin/furan PRG, or the Method 1 soil standards), those constituents will nevertheless be carried through to the second step in the selection of COPCs.
In the second step, all constituents detected will be screened for the selection of COPCs by comparing the maximum detected concentrations with contaminant-specific and medium-specific risk-based concentrations. The Work Plan states that the principal criterion for selection of a COPC will be an exceedance of the medium-specific PRG by the maximum concentration of the constituent (p. 6-31). Although the Work Plan does not give the source of the PRGs to be used, it states that a target cancer risk of 1 × 10⁻⁶ and a target non-cancer hazard quotient (HQ) of 0.1 will be used. Any constituent for which the maximum concentration exceeds the PRG will be carried forward for the risk assessment. In addition, the Work Plan states that constituents which are known to bioaccumulate in biological tissues but do not exceed the PRGs will be further evaluated for possible inclusion in the risk assessment.
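The second-step screen described above can be sketched as follows. The constituent names and PRG values here are illustrative only, not taken from the Work Plan or the Region 9 tables:

```python
def select_copcs(max_concs, prgs, noncancer, reduce_noncancer=True):
    """Retain any constituent whose maximum detected concentration
    exceeds its medium-specific PRG; non-cancer PRGs are optionally
    divided by 10 to move from a target HQ of 1.0 to a target HQ of 0.1."""
    copcs = []
    for chem, cmax in max_concs.items():
        prg = prgs[chem]
        if noncancer.get(chem) and reduce_noncancer:
            prg /= 10.0  # HQ 1.0 -> HQ 0.1
        if cmax > prg:
            copcs.append(chem)
    return copcs

# hypothetical maximum detections and PRGs (mg/kg)
max_concs = {"benzo(a)pyrene": 0.5, "barium": 900.0}
prgs = {"benzo(a)pyrene": 0.062, "barium": 5400.0}
noncancer = {"benzo(a)pyrene": False, "barium": True}
print(select_copcs(max_concs, prgs, noncancer))  # ['benzo(a)pyrene', 'barium']
```

Run with `reduce_noncancer=False`, the hypothetical barium value screens out (900 < 5,400), which illustrates GE's point below that the 10-fold reduction lengthens the COPC list.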
GE has several concerns about this proposed two-step approach.

First, GE believes that if the concentration of non-PCB constituents in a given area does not exceed the criteria for further sampling (as listed above), those constituents should be dropped from further consideration and not carried through to the second step in selecting COPCs. As EPA has recognized in the Statement of Work for Removal Actions Outside the River (SOW), which is Appendix E to the Consent Decree, concentrations of non-PCB constituents that are consistent with local background levels, or are less than the 1 ppb TEQ concentration for dioxins/furans, or are below the applicable Method 1 soil standards, are considered to pose no significant risk to human health. Hence, such constituents need not and should not be considered further in the selection of COPCs.
Second, for those constituents that are carried forward to the second step in the COPC selection process, EPA's proposed primary approach for selecting COPCs (i.e., comparison of maximum concentrations to PRGs) is overly restrictive. In recent discussions with GE, EPA explained that it is currently planning to use the EPA Region 9 PRGs for this purpose, but to reduce the non-cancer-based PRGs by 10 times, due to the fact that those PRGs were based on a non-cancer HQ of 1.0, not 0.1. This approach will likely lead to the selection of a long list of COPCs, many or most of which in fact make no significant contribution to actual risks. This would undercut the purpose of the screening approach and would greatly and needlessly complicate the risk assessment without any corresponding benefit in terms of achieving additional health protection. Instead, GE recommends that, in selecting the COPCs for the Rest of the River, EPA should use an approach similar to the approach approved by EPA and specified in the SOW for Areas Outside the River, including the non-bank portions of certain floodplain properties. This approach utilizes not only

1 The Work Plan also states that, in addition to these specified criteria, EPA will also use other applicable criteria in the first step of this process, but it does not say what those additional criteria are. EPA should specify those criteria.
the Region 9 PRGs, but also the other criteria listed on page 6-30 of the Work Plan, as discussed below.
For dioxins and furans, EPA should use the PRGs for dioxin/furan TEQs that are prescribed in recent guidance from EPA headquarters for use in addressing these constituents in soil at CERCLA and RCRA sites (EPA 1998). Those PRGs are 1 ppb for dioxin/furan TEQs in residential areas (also applicable to recreational areas) and 5-20 ppb for dioxin/furan TEQs in commercial/industrial areas. The SOW provides specifically for use of these PRGs in screening out dioxin/furan compounds in soil in Areas Outside the River. EPA should likewise use these PRGs in selecting COPCs for the Rest of the River. (Indeed, as discussed below, GE believes that these PRGs should be used not only as screening criteria but also as remedial goals, which is consistent with the approach embodied in the SOW for Areas Outside the River. See Section B.2 below.)
For other constituents EPA should follow a step-wise approach that begins with comparisons to
the Region 9 PRGs. Since these PRGs are highly conservative and are specified in the SOW for
use in screening such constituents without further reduction, they should not be reduced by an
additional 10 times. For constituents for which such PRGs do not exist, a similar screening step
should be employed using PRGs for surrogate compounds.2
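The step-wise screen described here can be sketched in a few lines. This is a hypothetical illustration only: the PRG values, constituent names, and surrogate mapping below are invented placeholders, not values from the Work Plan, the SOW, or the Region 9 PRG tables.

```python
# Hypothetical step-wise COPC screen: compare each constituent's maximum
# detected concentration to its Region 9 PRG, falling back to a surrogate
# compound's PRG when no PRG exists. All values are illustrative.
REGION9_PRGS = {"benzene": 0.64, "arsenic": 0.39}    # mg/kg, placeholder values
SURROGATE_FOR = {"compound-X": "benzene"}            # assumed surrogate mapping

def screen_copc(constituent, max_conc):
    """Return True if the constituent is retained as a COPC (exceeds its PRG)."""
    prg = REGION9_PRGS.get(constituent)
    if prg is None:
        surrogate = SURROGATE_FOR.get(constituent)
        if surrogate is None:
            return True  # no PRG and no surrogate: retain conservatively
        prg = REGION9_PRGS[surrogate]
    return max_conc > prg
```

Under this sketch, a constituent screens out only when a PRG (direct or surrogate) exists and the maximum concentration falls below it, which mirrors the conservative default of retaining anything that cannot be screened.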
EPA should then apply the other criteria listed on page 6-30 of the Work Plan, i.e., comparison
to background levels and MCP Method 1 standards Comparison of concentrations of non-PCB
constituents to site-specific background levels (eg concentrations upstream of the GE facility)
is appropriate to ensure that only those constituents attributable to releases from the GE facility
are included in the risk assessment This step is allowed in EPAs Risk Assessment Guidance for
2. Generally when toxicity criteria have not been derived for particular constituents it is because there are no reliable toxicity data on which to base them. If such constituents are not screened but are simply carried through to the risk assessment as COPCs, that would unnecessarily complicate the risk assessment and would result in largely meaningless or highly uncertain estimates given the lack of reliable toxicity data and criteria. Hence EPA should screen these compounds using PRGs for surrogate compounds as described in GE's June 3, 1999 comments (p. 6).
Superfund (EPA 1989) and has been accepted by EPA as a basis for selecting non-PCB COPCs
for both soils and sediments in the Upper ½ Mile Reach of the Housatonic River (as stated in the
Upper ½ Mile Reach Work Plan) and for Areas Outside the River (as provided in the SOW) It
is also consistent with the MCP which provides that if the concentration of a constituent is at or
below background levels then that constituent will be considered to pose No Significant Risk (310
CMR 400902(3)) Similarly comparison to the MCP Method 1 soil standards is appropriate
because those standards have been developed as conservative risk-based standards (using a target
cancer risk of 1 x 10⁻⁶) that will ensure protection of human health and the environment Again
EPA has already accepted comparisons to those standards as a basis for selecting non-PCB COPCs
in soil for the Upper ½ Mile Reach of the River and Areas Outside the River Moreover in its
Engineering Evaluation/Cost Analysis (EE/CA) for the 1½ Mile Reach of the River EPA itself
has used the MCP Method 1 soil standards as well as background levels as a basis for evaluating
non-PCB constituents in bank soils (Weston 2000b) Thus use of comparisons to background
levels and MCP Method 1 standards in selecting COPCs for the Rest of the River is justified by
considerable precedent and will ensure that a consistent approach is employed to address such
constituents throughout the GE-PittsfieldHousatonic River Site EPA has presented no sound
basis for departing from that approach for the Rest of the River portion of the Site
Moreover use solely of the maximum detected concentrations in selecting COPCs from these
constituents is overly conservative and inappropriate For example comparisons to site-specific
background levels should be made using an appropriate statistical technique or summary statistics
as described in MDEP (1995) guidance while comparisons to the MCP Method 1 standards are
generally to be made based on average exposure point concentrations (310 CMR 400926(3))
Again these approaches have been agreed upon by EPA as fully protective for Areas Outside the
River as set forth in the SOW and should likewise be used for the Rest of the River to eliminate
constituents that are not likely to contribute to Site-related risks
Finally the Work Plan does not explain how the potential for bioaccumulation (discussed on p
6-31) will be considered or what criteria will be used to examine this Unless the approach used
to identify and select such COPCs is clearly defined it could result in the unnecessary inclusion
of certain constituents EPA should provide supplemental information that outlines the specific
approach and criteria that will be used to identify and select these constituents for inclusion
2 Risk assessment of dioxins/furans
The Work Plan states that EPA intends to evaluate the potential risks associated with dioxins and
furans by converting them to TEQs (toxic equivalents of 2378-TCDD) using Toxicity
Equivalency Factors (TEFs) and then applying the provisional cancer slope factor (CSF) of
150,000 (mg/kg-day)⁻¹ listed on HEAST for 2,3,7,8-TCDD (Section 6.3.3.2.4) As discussed in
detail in GE's June 3, 1999 comments (pp 19-22), the TEF approach has substantial limitations
and the provisional CSF of 150,000 (mg/kg-day)⁻¹ is uncertain and unsupportable A wide range
of CSFs, spanning from 9,000 to 156,000 (mg/kg-day)⁻¹, have been proposed for TCDD under
the linear nonthreshold cancer model, with differences resulting from the selection of the tumor
classification scheme, the interspecies scaling factor used, and the model used for low-dose
extrapolation Thus there is enormous uncertainty associated with any selected CSF
In these circumstances GE believes that to the extent possible in this risk assessment EPA
should evaluate dioxins and furans through application of the dioxin/furan TEQ PRGs set forth
in EPA (1998) guidance and discussed above using those PRGs as remedial goals as well as
screening levels rather than conducting forward risk calculations for these compounds This
approach would avoid the uncertainties associated with selecting a particular CSF for TCDD and
is consistent with the approach adopted by EPA for Areas Outside the River as described in the
SOW
However to the extent that EPA includes dioxins and furans in forward risk calculations GE
believes that EPA should use a CSF of 30,000 (mg/kg-day)⁻¹ As shown in our June 3, 1999
comments, based on current information, that CSF is more supportable than the CSF of 150,000
(mg/kg-day)⁻¹ specified in the Work Plan
3 Use of TEF/TEQ approach for dioxin-like PCBs
In the Work Plan EPA also proposes to use the TEF/TEQ approach for estimating risks of dioxin-
like PCB congeners (Section 6.3.3.2.5) Under this approach the concentrations of these PCB
congeners are converted to toxic equivalents (TEQs) of 2,3,7,8-TCDD using TEFs and the
provisional CSF of 150,000 (mg/kg-day)⁻¹ for 2,3,7,8-TCDD is then applied to estimate the risks
In its June 3 1999 comments GE presented several reasons why this approach is not appropriate
or scientifically defensible In discussions with GE in November 1999 EPA stated that since it
will make risk management decisions on the basis of total PCB concentrations rather than
congener-specific data it intends to use this TEF/TEQ approach for PCBs only if it finds a good
correlation between the concentrations of the dioxin-like congeners and concentrations of total
PCBs There is no indication in the Work Plan however that EPA has established a good
correlation between the congener-specific data and the total PCB data Indeed in more recent
discussions EPA has indicated that it has not to date evaluated this issue If and when EPA makes
such an analysis GE requests the opportunity to review the analysis and discuss it with EPA The
absence of such a correlation would itself make the TEQ data irrelevant to risk management
decisions
In any event GE continues to believe for the same reasons given in its June 3 1999 comments
(pp 23-25) that application of the TEF/TEQ approach to PCBs is not scientifically supportable
These reasons include (a) the lack of sufficient congener-specific data to provide reliable
estimates of true congener concentrations (b) the double counting of carcinogenic risks that would
result from use of both the TCDD CSF and the PCB CSF in the same assessment and (c) the
uncertainties and unsupportability of the TEFs for PCB congeners and of the CSF (for
2,3,7,8-TCDD) to be used to evaluate the resulting TEQs
4 Failure to consider alternative CSFs and RfDs for PCBs
The Work Plan states that EPA intends to use the upper-bound CSF for PCBs of 2 (mg/kg-day)⁻¹
and the Reference Dose (RfD) for Aroclor 1254 of 2 x 10⁻⁵ mg/kg-day as published in its
Integrated Risk Information System (IRIS) (Sections 6.3.3.2.3 and 6.3.3.3.2) In its June 3 1999
comments (pp 16-19 and Appendix A) GE presented substantial reasons why these toxicity values
overestimate the carcinogenic potential and non-carcinogenic impacts of PCBs and it
recommended the development of alternative toxicity values for PCBs EPA has made no
reference to this presentation in its Work Plan
This failure to consider the alternative toxicological information presented by GE is unlawful
Under a settlement agreement with GE (Settlement Agreement in General Electric Company v
Browner No 93-1251 DC Cir October 25 1993) and a subsequent EPA guidance
memorandum (EPA 1993) if an outside party questions the use of IRIS values during the course
of an EPA risk assessment and presents alternative toxicological information that may be used in
place of the IRIS values EPA has an obligation to consider all credible and relevant evidence
before it There is no indication that EPA has considered the alternative toxicological
information presented by GE Rather it has ignored GEs showing and chosen simply to defer
to the current IRIS values (which is precisely the course of conduct which the settlement and
guidance prohibit) EPA should rectify this deficiency by now giving careful consideration to the
alternative toxicological information presented by GE
5 Use of subchronic RfD for PCBs
The Work Plan states that EPA will consider using the subchronic RfD for PCBs only if it
determines that exposure is intermittent enough to result in an exposure period that is substantially
reduced (p 6-42) Later on page 6-43 the Work Plan states that this will occur when the total
exposure period is substantially below the 7-year period that is considered by EPA to represent
chronic exposure There is no further discussion of this issue so it is not clear what the Agency
will use as a benchmark to identify intermittent exposure
Many of the proposed exposure scenarios are likely to be extremely intermittent in nature For
example in developing the SRBC for utility workers EPA assumes that exposure occurs 5
daysyear for a period of 25 years Clearly 5 daysyear is intermittent exposure In fact even
when considered over the entire 25-year exposure period the total exposure consists of 125 days
which amounts to 1/3 of a year or less than 5 percent of the 7-year chronic exposure period This
exposure would certainly be considered intermittent despite the fact that the total exposure period
lasts for 25 years
As a second example in developing its SRBCs for recreational exposures EPA has assumed that
children aged 1-6 are exposed for six years during recreational activities Even using the highly
conservative exposure frequency for high contact areas of 3 daysweek for a period of 7 months
per year (a total of 84 daysyear or 504 days for the entire exposure period) the number of days
exposed is equivalent to 14 years or less than 20 percent of the available days during the 7-year
chronic exposure period Nevertheless as stated on page 6-42 EPA does not intend to use the
subchronic RfD to evaluate noncancer hazards to young children
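The arithmetic behind these two examples can be checked directly. The sketch below simply restates the exposure frequencies and durations quoted above from the Work Plan's SRBC assumptions; no new parameter values are introduced.

```python
CHRONIC_YEARS = 7  # EPA's benchmark for chronic exposure, as cited in the Work Plan

# Utility worker SRBC assumptions: 5 days/year for 25 years
utility_days = 5 * 25                         # 125 total exposure days
utility_years = utility_days / 365            # roughly 1/3 of a year of contact
assert utility_years / CHRONIC_YEARS < 0.05   # under 5% of the 7-year chronic period

# Recreational child: 3 days/week for 7 months/year (~84 days/year) over 6 years
child_days = 84 * 6                           # 504 total exposure days
child_years = child_days / 365                # about 1.4 years of contact
assert child_years / CHRONIC_YEARS < 0.20     # under 20% of the 7-year chronic period
```

Both scenarios thus fall well below the 7-year chronic benchmark even though their nominal exposure durations span 25 and 6 years respectively, which is the crux of GE's argument for applying the subchronic RfD.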
In short the Work Plan states that EPA will use the subchronic RfD when exposures during a
given scenario are sufficiently intermittent to result in a total exposure that is far less than the
7-year chronic exposure period identified by EPA It is clear however that even for scenarios
where this intermittent exposure is likely EPA does not intend to incorporate the subchronic RfD
EPA should reconsider its position on the use of the subchronic RfD and incorporate it into those
scenarios where it is relevant
6 Inappropriate use of a single exposure scenario for all direct-contact recreational activities
The Work Plan provides that while consumption of fish and game will be evaluated separately
all direct-contact recreational activities will be evaluated using a single exposure scenario and a
single set of exposure assumptions (pp 6-52 - 6-53) These activities include hiking walking
canoeing picnicking swimming wading and dirt biking as well as fishing and hunting (but not
consumption of fish or game) In its June 3 1999 comments (pp 11-12 29-30) GE explained
that use of a single exposure scenario and set of assumptions for such diverse activities is
inappropriate because it will force EPA either to use the most conservative assumptions for any
of these activities (which would clearly result in grossly overestimated exposure estimates) or to
pick compromise parameters that will not be truly representative of any of these activities (which
would be subject to considerable uncertainty) Based on discussions with EPA in the fall of 1999
GE had understood that the Agency recognized this problem and intended to consider further
whether and how to divide this single direct-contact recreational scenario into a number of more
specific scenarios (e.g., hiking/walking, canoeing, picnicking, swimming/wading) However the
Work Plan does not do that GE again urges EPA to develop such specific scenarios so as to take
account of the significant differences in exposure parameters among these types of activities In
doing so actual recreational exposures will be more closely approximated and more realistic and
appropriate cleanup goals can be developed
7 Lack of information about specific exposure assumptions to be used
As in the prior draft the Work Plan does not present the specific exposure parameter values and
equations to be used in EPAs risk assessment but simply states that they will be developed based
on EPA and MDEP guidance along with site-specific data (Section 6.3.4.5) GE requested such
information in its June 3 1999 comments and in prior meetings with EPA In response EPA has
advised GE that the Agencys risk assessment staff will hold a series of technical meetings with
GE representatives and consultants and at such meetings will be prepared to discuss the most
appropriate exposure assumptions and parameter values to be used GE reiterates its request for
such meetings and looks forward to discussing these issues with EPA
8 Use of probabilistic risk analysis
In the Uncertainty Analysis section of the Work Plan (Section 636) EPA states that
probabilistic approaches such as Monte Carlo analysis will be considered for those COPCs and
exposure pathways that result in significant risk estimates However it does not mention
probabilistic risk assessment techniques in connection with any of the main risk assessments In
prior discussions with GE EPA indicated that one of the principal reasons why it collected a large
amount of fish tissue PCB data is for use in a probabilistic risk assessment of fish consumption
As discussed in our June 3 1999 comments (pp 38-40) GE supports the use of a probabilistic
exposure model in the main risk analysis of the fish consumption pathway However the Work
Plan presents no discussion of or plans for conducting such an assessment In fact EPA does not
even cite any of its Monte Carlo guidance documents in Table 6.1-1 which lists the risk
assessment guidance documents and information sources that EPA intends to use in conducting the
HHRA
During the last few years EPAs policies and guidelines have focused on improving risk
management by presenting decision-makers with the entire range of possible risks rather than a
single point estimate (EPA 1992ab 1995ab 1997ab 1999c) The new policy states that
numerical risk assessments should be accompanied by a full characterization of the uncertainties
limitations and assumptions in the risk assessment The use of two-dimensional (2-D)
probabilistic analysis such as Microexposure analysis can more effectively characterize the
impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed
population by considering time-dependent changes
Microexposure analysis has been used by EPA and by independent researchers to simulate duration
of residential exposure (EPA 1992c Johnson and Capel 1992 Sielken 1994) It also has been
used to evaluate childhood exposures to lead (Goodrum et al 1994) exposure to contaminants
in tap water (Harrington et al 1995) and exposure to dioxins from the consumption of freshwater
fish (Keenan et al 1993ab 1995 1996a 1997ab) Microexposure analysis was employed in
the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon
Group 1994 Keenan et al 1996b) In addition Microexposure analysis has been described in
EPA's Risk Assessment Guidance for Superfund Volume 3 Part A (External Review Draft) as a
viable alternative for modeling time-dependent variability in concentrations daily activity patterns
and other behavioral exposure factors (EPA 1999c)
Recently Dr Ted Simon of EPA published a systematic comparison of the various methods of
probabilistic risk analysis through application to the fish ingestion pathway at the
SangamoWestonLake Hartwell Superfund site (Simon 1999) Simon (1999) found that the
Microexposure event model gave risk estimates that were approximately an order of magnitude
lower than those derived using deterministic and other probabilistic approaches He attributed this
decrease to the power of the method to incorporate changes over time
GE again urges EPA to use a probabilistic exposure model specifically the Microexposure model
to estimate exposures via the fish consumption pathway Such an approach will allow EPA to
evaluate the variations in exposures over time and among anglers taking into consideration site-
specific factors as well as the uncertainties associated with the input parameters As outlined in
Appendix E of its draft guidance for conducting probabilistic risk assessments (EPA 1999c) some
types of exposures including fish consumption can be expected to vary over time Anglers may
change their activity levels and their fish consumption behaviors as they age In addition different
species and sizes of fish may be consumed from a variety of locations on the river over time and
may be prepared using different methods Thus fish tissue concentrations will not be constant
among anglers or throughout the exposure period for a single angler A 2-D Microexposure
analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can
be developed using point estimates and will provide information that will allow better informed
risk management decisions to be made (EPA 1999c)
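The microexposure-event idea discussed in this section can be sketched in a simplified form. The toy simulation below is purely illustrative: every distribution and parameter value (meal counts, fillet concentrations, cooking loss, meal size, body weight) is invented for the sketch and is not a proposed input to the HHRA.

```python
import random

def simulate_angler_dose(years=30, seed=None):
    """Toy microexposure sketch: accumulate an angler's PCB intake meal by
    meal, letting concentration and meal frequency vary over time instead of
    using a single fixed point estimate. All parameters are placeholders."""
    rng = random.Random(seed)
    total_intake_mg = 0.0
    for _ in range(years):
        meals = rng.randint(0, 30)                 # meals/year varies over time
        for _ in range(meals):
            conc = rng.lognormvariate(0.0, 1.0)    # mg/kg in fillet (toy skewed dist.)
            cooking_loss = rng.uniform(0.2, 0.6)   # fraction of PCBs lost in cooking
            meal_kg = 0.227                        # ~8 oz meal, illustrative
            total_intake_mg += conc * (1 - cooking_loss) * meal_kg
    # lifetime-average daily dose for an assumed 70-kg adult, mg/kg-day
    return total_intake_mg / (years * 365) / 70.0

# Simulating many anglers yields a distribution of doses rather than one point
# estimate; a second (uncertainty) dimension would rerun this outer loop under
# alternative parameter distributions, giving the 2-D analysis described above.
doses = [simulate_angler_dose(seed=i) for i in range(1000)]
```

The spread of `doses` across simulated anglers is what distinguishes this approach from a deterministic calculation, which would collapse all of that variability into a single value.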
9 Use of 95% UCLs or maximum values as exposure point concentrations
The Work Plan states that the exposure point concentration (EPC) for each property or other
exposure area will be based on the 95% UCL of the site data or the maximum detected
concentration, whichever is lower (Section 6.3.4.4) The Work Plan also describes the statistical
techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for
lognormally distributed data (p 6-32)
GEs comments on this approach are set forth in detail in Attachment A which was recently
provided to EPA As discussed in that attachment and demonstrated by EPA contractors and
consultants (EPA 1997c; Schulz and Griffin 1999), use of the H-statistic to derive a 95% UCL
can and frequently does produce large overestimates of the true mean, both when there are slight
deviations from the assumed lognormal distribution of the data and even when the data are
obtained from a lognormal population Hence that approach should not be used
Instead GE urges EPA to use for the Rest of the River the same PCB averaging technique that
EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½
Mile Reach and for Areas Outside the River: a spatial averaging approach That approach will
(a) allow calculation of EPCs that are much more truly representative of actual exposures than use
of EPAs currently proposed approach (b) allow evaluation of the Rest of the River in a manner
that is consistent with the evaluation of the Upper ½ Mile Reach and Areas Outside the River and
(c) likely result in less stringent (but still conservative) cleanup goals and thus minimize the extent
of unnecessary removal or other remediation that will be required for the Rest of the River GE
urges EPA to conduct additional sampling if necessary to ensure that sufficient data will be
available to calculate reliable spatial averages for each exposure area
However to the extent that EPA rejects that approach and insists on estimating a 95% UCL it
should still not use the H-statistic approach proposed in the Work Plan As discussed in
Attachment A and demonstrated by Singh et al (EPA 1997c) and Schulz and Griffin (1999) there
are other methods of calculating a 95% UCL of the arithmetic mean including bootstrapping that
are more reliable and representative To the extent that EPA is unwilling to adopt the spatial
averaging approach for the Rest of the River risk assessment GE recommends that EPA employ
a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to
estimate the EPC for each exposure area This approach which is discussed in detail in
Attachment A is consistent with EPA guidance provides more reliable estimates of the 95%
UCLs considers the location and density of each sample in each exposure area mitigates the
statistical impact of a small database and can be used regardless of the distribution of the sampling
data
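A percentile-bootstrap 95% UCL of the arithmetic mean, the basic building block of the approach referred to here, can be sketched as follows. The concentrations are made-up illustrative values; a spatial bootstrap would additionally resample by location or area-weight each draw, which this simplified sketch omits.

```python
import random
import statistics

def bootstrap_ucl95(data, n_boot=5000, seed=7):
    """Percentile-bootstrap 95% UCL of the arithmetic mean: resample the data
    with replacement many times, then take the 95th percentile of the
    resulting distribution of resample means."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in data]
        means.append(statistics.fmean(resample))
    means.sort()
    return means[int(0.95 * n_boot) - 1]

# Illustrative right-skewed concentrations (mg/kg), not site data
concs = [0.2, 0.3, 0.5, 0.8, 1.1, 1.6, 2.4, 4.0, 9.5, 31.0]
ucl = bootstrap_ucl95(concs)
```

Unlike the H-statistic, this estimate makes no distributional assumption and can never exceed the maximum observed value, since every bootstrap mean is an average of observed concentrations.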
References
EPA 1989 Risk Assessment Guidance for Superfund Volume I Human Health Evaluation
Manual (Part A) Interim Final US Environmental Protection Agency Office of
Emergency and Remedial Response Washington DC EPA/540/1-89/002 July
EPA 1990 National Contingency Plan US Environmental Protection Agency
Washington DC
EPA 1992a Final Guidelines for Exposure Assessment Notice US Environmental Protection
Agency Washington DC 57 Federal Register 14 22888-22938 May 29
EPA 1992b Guidance on Risk Characterization for Risk Managers and Risk Assessors
Memorandum from F Henry Habicht III Deputy Administrator to Assistant Administrators
and Regional Administrators Office of the Administrator Washington DC
EPA 1992c A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its
Application to the General US Population US Environmental Protection Agency Office of
Air Quality Planning and Standards Research Triangle Park NC EPA-4503-92-011
August
EPA 1993 Memo from WH Farland and HL Longest to Regional Directors Re Use of
IRIS values in Superfund risk assessments US Environmental Protection Agency
Washington DC OSWER Directive 92857-16 December 21
EPA 1995a Policy for Risk Characterization at the US Environmental Protection Agency
US Environmental Protection Agency Office of the Administrator Washington DC
March
EPA 1995b Guidance for Risk Characterization US Environmental Protection Agency
Science Policy Council Washington DC February
EPA 1997a Policy for Use of Probabilistic Analysis in Risk Assessment at the US
Environmental Protection Agency US Environmental Protection Agency Office of Research
and Development Washington DC May 15
EPA 1997b Guiding Principles for Monte Carlo Analysis US Environmental Protection
Agency Risk Assessment Forum Washington DC EPA/630/R-97/001 March 35 pp
EPA 1997c The Lognormal Distribution in Environmental Applications Prepared for EPA
by A Singh A Singh and M Engelhard US Environmental Protection Agency Office of
Research and Development EPA/600/R-97/006 December
EPA 1998 Memo from Timothy Fields Acting Administrator to Regional Directors Re
Approach for addressing dioxin in soil at CERCLA and RCRA sites US Environmental
Protection Agency Washington DC OSWER Directive 92004-26 April 13
EPA and MDEP 1998 Letter from Anna Symington Acting Section Chief Massachusetts
DEP and Bryan Olson Project Manager US EPA to Jane Magee General Electric
Company Re Pittsfield 1-0147 EPA Area 6 Housatonic River Human Health Risk
Assessment January 29
EPA 1999a Memorandum from Ann-Marie Burke Toxicologist to Richard Cavagnero GE
Project Leader Re Protectiveness of Cleanup Levels for Removal Actions Outside the River
- Protection of Human Health Attachment A to Appendix D to Consent Decree lodged in
United States et al v General Electric Company (No 99-30225-MAP US Dist Court for
Dist of Mass) August 4
EPA 1999b Risk Assessment Guidance for Superfund Volume I - Human Health Evaluation
Manual Supplemental Guidance Dermal Risk Assessment Interim Guidance US
Environmental Protection Agency Office of Emergency and Remedial Response Washington
DC March 2 Draft
EPA 1999c Risk Assessment Guidance for Superfund Volume 3 (Part A Process for
Conducting Probabilistic Risk Assessment) Draft Revision 5 United States Environmental
Protection Agency Solid Waste and Emergency Response December
EPA 1999d Responsiveness Summary for Allendale School Removal Action ½ Mile
Removal Action and Consolidation Prepared by EPA Region I October
GE 1997 Proposal for Alternative Input Values for the Human Health Risk Assessment
Memorandum from General Electric to the Massachusetts Department of Environmental
Protection and the US Environmental Protection Agency September 11
Goodrum PE JM Hassett DL Johnson and ME Dakins 1994 Applications of
microexposure Monte Carlo modeling to human health risk assessments A case study of
modeling childhood lead exposure Society for Risk Analysis Annual conference and
Exposition December 4-7 Baltimore MD
Harrington NW Curry CL and PS Price 1995 The MicroExposure Event Modeling
Approach to Probabilistic Exposure Assessment Paper No 95-TA4203 Proceedings of the
88th Annual Meeting of the Air and Waste Management Association San Antonio Texas
USA June
Johnson T and J Capel 1992 A Monte Carlo Approach to Simulating Residential
Occupancy Periods and Its Application to the General US Population US Environmental
Protection Agency Office of Air Quality Planning and Standards Research Triangle Park
NC EPA-4503-92-011 August
Keenan RE MH Henning PE Goodrum MN Gray RA Sherer and PS Price
1993a Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to
evaluate the need for fish advisories Dioxin 93 13th International Symposium on
Chlorinated Dioxins and Related Compounds Vienna Austria
Keenan RE PS Price MH Henning PE Goodrum MN Gray RA Sherer and
WL Porter 1993b A Monte Carlo risk assessment for dioxin in Maine fish Using a
microexposure approach to evaluate the need for fish advisories TAPPI Proceedings 1993
Environmental Conference Boston MA
Keenan RE PS Price CL Curry JI McCrodden and JG Haggard 1995 Using a
microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion
of fish from the upper Hudson River In Society for Risk Analysis and the Japan Section of
SRA Annual Meeting and Exposition Waikiki HI (Abstract)
Keenan RE PS Price J McCrodden and ES Ebert 1996a Using a microexposure
event analysis to model potential exposures to PCBs through ingestion of fish from the Upper
Hudson River In Organohalogen Compounds Proceedings Dioxin 96-16th International
Symposium on Chlorinated Dioxins and Related Compounds Amsterdam The Netherlands
Organohalogen 3061-65
Keenan RE NW Harrington PS Price and RO Richter 1996b Applying a
microexposure event analysis for a superfund site risk assessment Proceedings Superfund XVII
Conference Proceedings Washington DC October 15-17
Keenan RE JD Avantaggio and PS Price 1997a Should Maines rivers have fish
advisories for dioxin Using an integrated Microexposure Event and Toxicokinetic Model to
evaluate this question In SETAC North Atlantic Chapter Annual Meetings Proceedings
Abstract 1
Keenan RE JD Avantaggio and PS Price 1997b Using a combined Microexposure
Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden
dosimetric In Society for Risk Analysis Proceedings Annual Meeting and Exposition
Abstract
MDEP 1995 Guidance for Disposal Site Risk Characterization - In Support of the
Massachusetts Contingency Plan -Interim Final Policy Massachusetts Department of
Environmental Protection Bureau of Waste Site Cleanup and Office of Research and
Standards Boston MA WSCORS-95-141 July
Pyrite Canyon Group 1994 Workplanfor the Health Risk Assessment of the Stringfellow
CERCLA Site in Riverside County California January
Schulz TW and S Griffin 1999 Estimating risk assessment exposure point concentrations
when data are not normal or lognormal Risk Analysis Vol 19 November
Sielken RL 1994 More realistic exposure durations for more realistic people Society for
Risk Analysis Annual Conference and Exposition Baltimore MD December 4-7
Simon TW 1999 Two-dimensional Monte Carlo simulation and beyond A comparison of
several probabilistic risk assessment methods applied to a Superfund site Hum Ecol Risk
Assess 5 (4) 823-843
Weston 2000a Supplemental Investigation Work Plan for the Lower Housatonic River
Prepared for US Army Corps of Engineers by Roy F Weston Inc February 22
Weston 2000b Engineering EvaluationCost Analysis for the Upper Reach of the Housatonic
River Prepared for US Army Corps of Engineers by Roy F Weston Inc February 11
ATTACHMENT A
A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS
Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co Inc
Introduction
The US Environmental Protection Agency's (EPA's) final Human Health Risk Assessment Work
Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the
Lower Housatonic River (Weston 2000), states that the exposure point concentration (EPC) for
each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site
data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work
Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally
distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No
plan is presented for estimating the mean from datasets where no parametric distribution can be
defined. The way in which the EPCs are calculated can have a substantial impact on the values
derived and their representation of the actual levels of exposures that are likely to occur under
various scenarios.
There are serious limitations associated with using EPAs proposed approach for calculating
EPCs While the Student's t-statistic may be appropriate when the sampling data are normally
distributed, environmental samples seldom resemble a strict normal distribution Consequently
if EPA were to assume instead that the data for each exposure area are lognormally distributed
then use of the H-statistic to calculate the EPCs for these areas may substantially impact the
results of the risk assessment Use of the H-statistic to calculate the 95% UCL frequently results
in the calculation of extraordinarily high 95% UCL values This tendency is reflected in EPA's
policy stating that when the 95% UCL is greater than the maximum observed concentration the
maximum value should instead be used as the EPC However both the inflated 95% UCL and
the maximum value are poor estimates of actual levels of exposure that will occur within a given
exposure area
GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk
assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods,
approved by EPA scientists and other experts, for calculating EPCs that more accurately
represent the actual exposure levels that may be experienced by individuals who come into
contact with contaminated soil or sediment. This paper provides an analysis of the limitations
associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it
presents an evaluation of alternative approaches for calculating the EPCs that will produce more
representative results. In GE's view, the most appropriate approach is the use of the area-
weighting or spatial averaging technique that has been approved and specified by EPA for the
banks of the Upper ½-Mile Reach of the Housatonic River and for the Areas Outside the River
under the October 1999 Consent Decree. However, if and where that approach is not used, this
paper describes other alternative techniques for deriving a 95% UCL that is much more
representative of actual exposure point concentrations than use of the H-statistic: namely, the
bootstrapping technique and a modification of that approach known as spatial bootstrapping.
Of these, the latter is preferable because it takes into account the spatial distribution of the
sampling data. This paper illustrates the EPCs that would be calculated for an example property
using the various methods discussed herein.
Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL
In 1992, EPA issued a guidance document (EPA 1992) indicating that the appropriate
concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic
mean of the observed concentrations within an exposure area. This position is based on the
concept that chronic direct-contact exposure to soils most likely occurs as a result of short,
repeated, random contact events throughout the identified exposure area; thus, the mean
concentration is reflective of the long-term exposure concentration. Because it is this nature of
the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of
the EPC regardless of the statistical distribution of the observed sample concentrations.
The guidance document goes on to note that, because there is uncertainty in the overall soil
concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the
arithmetic mean should be used to account for this uncertainty. This approach was adopted by
EPA Region I (EPA 1994). In fact, EPA Region I has taken the position that the 95% UCL is
appropriate for both the central tendency and reasonable maximum exposure computations.
For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the
EPCs. In this method, the 95% UCL is calculated using the following equation (EPA 1992):
    UCL = exp(x̄ + 0.5·s² + s·H/√(n − 1))

where:

    UCL = a specified limit (i.e., the 95% UCL) on the estimate of the arithmetic mean
    x̄ = the mean of the log-transformed data
    s² = the variance of the log-transformed sample distribution
    s = the standard deviation of the log-transformed sample distribution
    H = a statistic accounting for the interaction of the distribution, developed by Land (1975)
    n = the number of analytical samples
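The H-UCL computation can be sketched in a few lines of code. The following is a minimal illustration, not the Work Plan's implementation; the value of H must be looked up in Land's (1975) tables for the given sample size and log-standard deviation, so any H value supplied to this function is an external input.

```python
import math

def h_ucl(data, H):
    """95% UCL on the arithmetic mean under a lognormal assumption
    (Land's H-UCL, as described in EPA 1992).

    H must come from Land's (1975) tables for the given n and s;
    it is a caller-supplied input here, not computed."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    xbar = sum(logs) / n                                # mean of log-transformed data
    s2 = sum((v - xbar) ** 2 for v in logs) / (n - 1)   # variance of log-transformed data
    s = math.sqrt(s2)                                   # std. dev. of log-transformed data
    return math.exp(xbar + 0.5 * s2 + s * H / math.sqrt(n - 1))
```

Because H grows with the log-standard deviation, a skewed dataset drives the exponent, and hence the resulting UCL, upward rapidly.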
Frequently, the H-statistic produces extraordinarily high 95% UCL values. In some cases, this
inflated estimate of the mean reflects the uncertainty associated with having a limited number of
samples from a skewed data population. This uncertainty is compounded when, as is the case
with much of the Housatonic River investigation, sampling efforts are concentrated in areas that
are suspected of having elevated levels of PCBs. Because the sampling of areas that are not
likely to be contaminated is much more limited, despite the fact that those areas are equally
important components of the total exposure area, the datasets for most parcels are likely to be
highly biased.
Use of the H-statistic also does not produce a reliable and representative 95% UCL when data
are not clearly lognormally distributed. Because the distributions of contaminant concentrations
at sites are often positively skewed, they are frequently assumed to be lognormally distributed
for statistical evaluations. In fact, EPA guidance (EPA 1992) promotes this approach by
suggesting that it is reasonable to assume that data are lognormally distributed even if the data
do not fit a normal distribution after log-transformation. However, this assumption is often
incorrect. As noted in a recent EPA technical support document prepared by EPA contractors
Singh et al. (EPA 1997a), "[t]his apparent skewness may be due to biased sampling, multiple
populations, or outliers, and not necessarily due to lognormally distributed data." Small
deviations between the assumed lognormal distribution of the data and the actual population
distribution can greatly influence the statistical results and yield a gross misrepresentation of the
true mean and its associated confidence bound. Thus, when sampling data that do not fit a
lognormal distribution are inappropriately assumed to be lognormally distributed and are
evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not
at all representative of site conditions.
Even when the data are lognormally distributed, the H-statistic can produce concentrations that
are far higher than the true mean. In their recent study, Singh et al. (EPA 1997a) evaluated and
compared UCLs calculated through different statistical estimation methods using a variety of
sampling datasets and reported that the UCLs obtained using the H-statistic were consistently
larger and less accurate than the UCLs calculated using other statistical approaches. They noted
that "it is observed that the H-UCL becomes order[s] of magnitude higher even when the data
were obtained from a lognormal population and can lead to incorrect conclusions. This is
especially true for samples of smaller sizes (e.g., n < 30)." They stated further that "[t]he practical
merit of the H-UCL in environmental applications is questionable as it becomes orders of
magnitude higher than the largest concentration observed when the [standard deviation] of the
log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL
approach can yield unusually high false positives, which would result in unnecessary
cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in
an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may
overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous
waste sites.
Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic
mean is not correctly estimated within the calculation. This difficulty occurs when the
exponential estimator of the mean is not equal to the typical calculation of the sum of all samples
divided by the number of samples.
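One way to run the consistency check just described is to compare the lognormal ("exponential") estimator of the arithmetic mean, exp(x̄ + s²/2) computed from the log-transformed data, against the ordinary sample mean. The sketch below is illustrative only and is not from the Work Plan; the relative tolerance is an arbitrary assumption.

```python
import math

def lognormal_mean_estimate(data):
    """Exponential estimator of the arithmetic mean under a lognormal model."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    xbar = sum(logs) / n
    s2 = sum((v - xbar) ** 2 for v in logs) / (n - 1)
    return math.exp(xbar + 0.5 * s2)

def means_disagree(data, rel_tol=0.25):
    """Flag datasets where the lognormal-model mean and the simple
    arithmetic mean diverge by more than rel_tol (arbitrary threshold),
    a warning sign that an H-UCL computed from the data may be unreliable."""
    simple = sum(data) / len(data)
    return abs(lognormal_mean_estimate(data) - simple) > rel_tol * simple
```

For data drawn from a true lognormal population the two estimators agree closely; for mixed populations or biased samples they can diverge sharply.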
Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To
control this, EPA recommended that when the statistic is greater than the maximum observed
concentration at the site, the maximum value should instead be used as the EPC. This approach
leads to the unsupportable assumption that the average concentration within an exposure area is
best represented by the highest concentration encountered there, and it does not account in any
way for the spatial distribution of contamination. As EPA itself has recognized in responding to
comments on the Upper ½-Mile Reach Removal Action, receptors, both human and ecological,
are exposed to average contaminant concentrations, not maximum concentrations, where
exposure is equally likely throughout a given area (EPA 1999).
In view of these considerations, EPA contractors and consultants have recommended that the H-
statistic not be used, particularly in cases where a lognormal distribution of the data is not
certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus
those calculated using other approaches, the EPA technical support document by Singh et al.
(EPA 1997a) recommended that "in environmental applications, the use of the H-UCL to obtain
an estimate of the upper confidence limit of the mean should be avoided." These authors
concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev
inequality method provides better input to the risk assessors and may result in a significant
reduction in remediation costs. This is especially true when the number of samples is thirty or
less. ... [U]pper confidence limits based on any of the other estimation procedures appear to be
more stable and reliable than those based on the H-statistic." Similar findings were reported by
Schulz and Griffin (1999).
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area, and exposures are likely to occur
randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean
is most representative of the long-term EPC. However, if sampling is intentionally biased
toward more contaminated areas, the sampling database no longer accurately represents the
areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for
the Lower Housatonic River. Contaminant delineation is an important goal of the sampling
effort on the Lower Housatonic River; consequently, sampling is intentionally focused on areas
(e.g., floodplain soils) that are known or suspected to be contaminated. Because many of the
samples in these datasets will be taken from high-concentration areas, with limited samples taken
from uncontaminated areas, calculating a mean using typical statistical procedures gives undue
weight to the locations where high concentrations exist. A dataset obtained in this way cannot be
used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be
representative of the overall exposure area.
This problem can be substantially mitigated by area-weighting the sample results so that the
importance of each sample in the calculation of the EPC is directly proportional to the fraction of
the exposure area that it represents. Area-weighted averaging may be conducted in a number of
ways, with differing levels of complexity. In some cases, polygonal areas are constructed by
drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen
polygons; Clifford et al. 1995); in more complex approaches, unbiased estimates of how
concentration and variance change with distance are used to construct a spatial grid of estimated
concentrations (ordinary kriging; Isaaks and Srivastava 1989).
Because kriging is data intensive and unlikely to be feasible for many of the exposure areas in
the Rest of the River investigation, area-weighting using Thiessen polygons is a more
appropriate approach. The process is to draw perpendicular lines that are exactly equidistant
between contiguous samples until each sample is surrounded by a polygon. Polygons at the
periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary).
One reason that this technique is so facile is that the construction of Thiessen polygons is a
hardwired operation in most commercially available Geographic Information System (GIS)
applications.
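As a rough stand-in for the GIS operation described above, the Thiessen area proportions can also be approximated numerically: discretize the property into small cells and assign each cell to its nearest sampling location. This sketch is purely illustrative (a real application would use GIS-constructed polygons); the grid step and coordinates are assumptions.

```python
def thiessen_weights(samples, width, height, step=10.0):
    """Approximate Thiessen-polygon area proportions by assigning each
    grid cell (step x step feet) to its nearest sampling location.
    `samples` is a list of (x, y) coordinates inside the property.
    Returns one proportion per sample; the proportions sum to 1."""
    counts = [0] * len(samples)
    y = step / 2.0
    while y < height:
        x = step / 2.0
        while x < width:
            # assign this cell to the nearest sample (squared distance)
            d2 = [(x - sx) ** 2 + (y - sy) ** 2 for sx, sy in samples]
            counts[d2.index(min(d2))] += 1
            x += step
        y += step
    total = float(sum(counts))
    return [c / total for c in counts]
```

A cell-assignment grid of this kind also truncates the peripheral polygons at the property boundary automatically, because only cells inside the property are counted.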
The area inside each polygon is then used to weight the samples in statistical calculations, which
is, in essence, saying that the probability of encountering the concentration represented by a
certain polygon is equal to the proportion of the total exposure area represented by the area in
that polygon. The area-weighted concentration is calculated using the following formula (Isaaks
and Srivastava 1989):

    x̄_aw = Σ (i = 1 to n) pᵢ·Cᵢ

where:

    x̄_aw = the area-weighted mean concentration (mg/kg)
    Cᵢ = the concentration representing the condition within polygon i,
         where there are i = 1 through n polygons, and
    pᵢ = the proportion of the total exposure area that is incorporated in polygon i (unitless)
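The area-weighted mean reduces to a few lines of code; this sketch assumes the polygon areas have already been computed.

```python
def area_weighted_mean(concs, areas):
    """Area-weighted (spatial) average: sum of p_i * C_i, where p_i is
    each polygon's share of the total exposure area."""
    total = float(sum(areas))
    return sum(c * (a / total) for c, a in zip(concs, areas))
```

For example, two samples at 1.0 and 3.0 ppm whose polygons cover 100 and 300 square feet give (0.25)(1.0) + (0.75)(3.0) = 2.5 ppm.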
This approach includes sampling data from all portions of the exposure area and corrects for the
biases in the sampling data by taking into consideration the spacing of the samples within the
exposure area. This is the approach that was used by GE and approved by EPA for calculating
EPCs for the banks of the Upper ½-Mile Reach of the Housatonic River, and it is the approach
specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which
include a number of floodplain properties along the River). Hence, use of this approach for the
Lower Housatonic River would lead to a consistent approach for the different portions of the
overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at
other Superfund sites. For example, such an approach was used at the Koppers Company
Superfund site in Charleston, SC (EPA 1995), where biased sampling was also conducted. To
offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling
data in clusters by gridding the site and calculating an average concentration for each grid using
the sampling data available within it. Then the average concentrations for each grid were
averaged to yield an average concentration for the entire exposure area. In addition, at the
Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the
average goal across the exposure unit (EPA 1997b).
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board
(SAB) (EPA 1993) in its review of the draft Risk Assessment Guidance for Superfund: Human
Health Evaluation Manual. In that review, the SAB stated that the

    [e]stimate of the RME cannot ignore the distribution of contamination at the site and the
    distribution of individual behaviors which lead to exposure. Rather, the spatial distribution
    of the concentration over the site must be considered along with a distribution reflecting the
    relative frequency with which people are likely to visit different parts of the site. For this
    reason, any summary measure of concentration (such as the average proposed in the RAGS
    document) that does not take into account the spatial distribution of the underlying samples is
    likely to be inadequate. Therefore, the Committee believes that the Agency should give
    strong consideration to incorporating methods such as kriging or triangulation that take
    into account the spatial distribution of contamination to characterize exposure. (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is
fully protective of human health and the environment. Indeed, in responding to comments on the
Upper ½-Mile Reach Removal Action, EPA specifically justified this spatial averaging approach
so long as sufficient data are available (EPA 1999). As EPA recognized, use of this approach
requires the availability of adequate data to support the calculation of representative spatial
averages over the exposure area. However, the need for adequate data exists regardless of the
averaging method used to derive the EPCs, and it is particularly marked when one attempts to use
the H-statistic approach. Under the phased approach that EPA has outlined in the Human
Health Risk Assessment Work Plan, such sampling would be necessary only in areas where the
95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the
conduct of additional sampling is warranted in order to avoid the use of extreme and
unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate
additional samples, as necessary, for individual exposure areas to ensure that reasonable and
representative estimates of the average concentration can be calculated for the risk assessment of
each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most
representative means of calculating EPCs. However, if EPA rejects this approach, or for areas
where the sampling data are not considered sufficient to support use of such spatial averaging,
there are alternative methods of calculating the 95% UCL that would result in more reliable
and representative EPCs than use of the H-statistic. As discussed previously and demonstrated
by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), the H-statistic is likely to
substantially overestimate the 95% UCL, particularly where datasets are small. These EPA
contractors and consultants have evaluated and suggested alternative and superior approaches for
calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows
calculation of a probability distribution of EPCs and is able to offset some of the statistical
limitations associated with small datasets. It provides a method for calculating a more reliable
estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a
normal or lognormal distribution. This technique is described below and is recommended as an
alternative and preferable method to the H-statistic for calculating the 95% UCL for
exposure areas in the Lower Housatonic River risk assessment.
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall
dataset of analytical values is selected at random and a mean is calculated. This mean is
archived, and the statistical sampling procedure is repeated, with replacement of the data. The
bootstrapped means are collected until their distribution stabilizes (this often requires many
iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95%
UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of
statistics, which states that if a population is repeatedly sampled for purposes of calculating an
arithmetic mean, the distribution of the means will ultimately take the form of a normal
distribution. This theorem applies to any population regardless of its statistical distribution.
Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples,
bootstrapping models sample repeatedly from a single dataset.
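The procedure just described can be sketched as follows. This is an illustrative implementation, not the Work Plan's; the resample size (equal to the original n), the iteration count, and the fixed seed are assumptions chosen for reproducibility.

```python
import random

def bootstrap_ucl(data, n_iter=5000, percentile=0.95, seed=1):
    """95% UCL of the mean via the percentile bootstrap: resample the
    dataset with replacement, record each resample's mean, and take the
    95th percentile of the resulting distribution of means."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(sum(rng.choice(data) for _ in range(n)) / n
                   for _ in range(n_iter))
    return means[int(percentile * n_iter) - 1]
```

Because every bootstrapped mean is an average of observed values, the resulting UCL can never exceed the maximum observed concentration, unlike the H-UCL.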
This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL and
is more technically justifiable. The reduction obtained from this procedure varies according to
the amount of data available for sampling and its variance, but the result is consistently lower than
the 95% H-UCL (EPA 1997a; Schulz and Griffin 1999). The difference between these two
measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping
method is particularly well-suited to exposure areas, like many along the Lower Housatonic
River, where sampling is limited.
When sample sizes are small, the distribution for the bootstrap run may not appear normal. This
is because the number of samples required to be included in each subsample in order to approach
normality is larger than is available from the site data. This can be mitigated by further
sampling. It should be noted, however, that as the sample size increases, the variance becomes
tighter; as such, more samples will decrease the estimate of the 95% UCL. Thus, the 95%
UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95%
UCL that might be obtained from a larger dataset.
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-
statistic method, it still does not take into consideration the spatial distribution of the sampling
locations across an exposure area and the biases associated with it. To reflect this spatial
distribution, a modified bootstrapping approach can be used that reflects the locations and densities
of the samples collected on each property. This modified bootstrapping approach (spatial
bootstrapping), which incorporates area-weighting into the bootstrapping method, can be used to
calculate more reliable 95% UCL estimates of the EPC for each exposure area.
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with
the bootstrap method discussed above. It is conducted in a series of three steps.
First, each sample is area-weighted using the Thiessen polygon method. The weight given to
each sample is determined by calculating the proportion of the total exposure area that is
represented by the sample's polygon.
Second, the smallest polygon area is identified, and all other polygon areas are normalized with
respect to its size and are weighted accordingly in the sampling database. In other words, if the
area of one polygon is five times larger than the smallest polygon, it is given a weight that is five
times greater in the sampling database from which values are selected for the bootstrap. The
normalized areas are rounded off to the closest integer. An expanded dataset is then created
from the original dataset to reflect the ratios of each polygon area to the smallest polygon area.
This expansion is accomplished by multiplying the rounded normalized areas by a selected
value; the resulting values represent the area-weighted number of replicates of each sample
result in the expanded dataset. For example, if the ratio of the sample X polygon area to the
smallest area is 7 and the dataset is expanded using a factor of five, then sample X is represented
in the expanded dataset as 35 (7 × 5) entries.
Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using
between 250 and 500 of the results (depending upon the size of the expanded dataset) for each
iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a
cumulative distribution of estimated means. After sufficient iterations, this cumulative
distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal
to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of
this distribution of means is the 95% UCL.
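The three steps above can be combined into a single routine. The sketch below follows the expansion scheme described (normalize polygon areas to the smallest polygon, round, multiply by an expansion factor, then bootstrap means from the expanded dataset); the default parameter values mirror the text, but the implementation itself is illustrative rather than the one used for the Site.

```python
import random

def spatial_bootstrap_ucl(concs, areas, expand=5, subsample=250,
                          n_iter=5000, seed=1):
    """Spatial (area-weighted) bootstrap:
    1. weight each sample by its Thiessen-polygon area;
    2. build an expanded dataset with replicate counts equal to
       round(area / smallest area) times an expansion factor;
    3. bootstrap means from the expanded dataset and take the
       95th percentile as the 95% UCL."""
    smallest = min(areas)
    expanded = []
    for c, a in zip(concs, areas):
        expanded.extend([c] * (int(round(a / smallest)) * expand))
    rng = random.Random(seed)
    k = min(subsample, len(expanded))   # resample size per iteration
    means = sorted(sum(rng.choice(expanded) for _ in range(k)) / k
                   for _ in range(n_iter))
    return means[int(0.95 * n_iter) - 1]
```

Because samples with large polygons appear more often in the expanded dataset, each bootstrapped mean is implicitly area-weighted, and the distribution of means centers on the area-weighted average rather than the simple arithmetic mean.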
This approach is both consistent with EPA guidance and preferable to the approach currently
proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used
as the EPC in order to correct for the uncertainties and limitations of the sampling database, the
guidance does not dictate which method must be used to calculate the 95% UCL. While EPA
has in the past often relied on the H-statistic, that method need not be used, and recent reports
by EPA contractors and consultants have recommended against use of the H-statistic to avoid
systematic overestimates of the 95% UCL (EPA 1997a; Schulz and Griffin 1999).
The spatial bootstrapping approach is better than the H-statistic approach for several reasons.
First, as has been shown by Singh et al. (EPA 1997a) and Schulz and Griffin (1999),
bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial
bootstrapping allows for consideration of the location and density of the samples in each
exposure area and corrects for the relative contribution of each in the database. Third, this
approach mitigates the statistical impact of a small database. Finally, it can be used regardless of
the distribution of the sampling data; thus, it can be applied successfully to all sampling datasets,
including those that fit neither a normal nor a lognormal distribution.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the
calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted
sample calculations for a hypothetical property. For this purpose, Ogden developed a
hypothetical set of sampling data that are similar to the types of data that might be collected from
a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples
have been collected from a property that is roughly 1,000 by 1,000 feet in size. The
concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic
mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219
square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different
approaches: (1) EPA's approach (arithmetic mean and H-statistic); (2) an area-weighted average;
(3) a bootstrap; and (4) a spatial bootstrap. Results of this analysis are provided in Table 2.
As can be seen in Table 2, the estimated mean values are similar for all four approaches. The
arithmetic mean (13.3 ppm) is identical to the mean obtained through straight bootstrapping; this
is consistent with the Central Limit Theorem (EPA 1997a). The area-weighted average is
identical to the mean calculated in spatial bootstrapping but is somewhat higher than the
arithmetic and bootstrapped means. This is because some of the sampling locations with high
concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas; thus, they are
weighted more heavily in approaches that include area-weighting.
Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping,
the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL
calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the
bootstrapped and spatially bootstrapped values by a factor of more than six. In fact, the H-UCL
exceeds the maximum value (42.6 ppm) in the sampling database, which means that the
maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that
would need to be used for this exposure area would be the maximum detected concentration of
42.6 ppm. This concentration is clearly not representative of the sampling dataset for this
hypothetical property.
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and
its result is consistent with the findings of Singh et al. (EPA 1997a) and Schulz and Griffin
(1999). These limitations occur despite the fact that the sampling dataset for this hypothetical
property is reasonably large (containing 30 samples) and the range of concentrations is fairly
typical of the concentrations likely to be found on floodplain properties along the Lower
Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly
typical concentration gradient, with a few samples showing very low concentrations, a few
showing fairly high concentrations, and the remainder fairly evenly distributed in between. In
addition, some of the largest polygon areas are associated with low-concentration areas, while
others are associated with the highest concentrations. There is nothing about this dataset to
indicate that the samples reported are not representative of the range of concentrations that is
likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured
concentration by a factor of about two and a half, with the result that the 95% UCL cannot be
used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result
is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly
even distribution of sampling results. These problems will be even more pronounced for certain
real properties along the Lower Housatonic River, where the sampling results may be highly
skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more
limited. Given the tendency of the H-statistic to overestimate the 95% UCL, it is likely that
maximum concentration values will need to be used for many of those properties. Such an end
result is unsupportable and will result in the overestimation of site risks. It could also result in
the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above and demonstrated by EPA contractors and consultants (EPA 1997a; Schulz
and Griffin 1999), use of the H-statistic approach to calculate the 95% UCL in environmental
applications can, and frequently does, produce substantial overestimates of the true mean that are
in no way representative of the concentrations actually contacted by people in an exposure area;
hence, that approach should be avoided. Moreover, this problem cannot be solved by simply
using the maximum detected concentration in the area if the calculated 95% UCL exceeds that
maximum. These problems can be avoided through the use of area-weighted averaging, which
takes into consideration the fraction of the exposure area that is represented by each sample
taken. This approach allows for a more realistic and representative estimate of the level of
exposure likely to be experienced by individuals in the area over time. Thus, GE urges EPA to
adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the
data necessary to support that approach. Use of spatial average concentrations as EPCs is
consistent with EPA policies, procedures, and guidelines (EPA 1989, 1993) and would be
consistent with the approach approved by EPA and specified in the Consent Decree for the
riverbank soils in the Upper ½-Mile Reach of the River and for the Removal Actions Outside the
River, including certain floodplain properties along the River.
However, if EPA rejects this approach (or for areas where EPA determines that the data are
insufficient to support that approach and decides not to collect additional data) and insists on
estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial
bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more
representative EPCs than the H-statistic method (EPA 1997a; Schulz and Griffin 1999), offsets
many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and
M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of
exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation
Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency
and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER
Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the
Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund:
Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-
93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum
exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2.
August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant),
Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA
Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by
A. Singh, A.K. Singh, and M. Engelhardt. US Environmental Protection Agency, Office of
Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. US Environmental Protection
Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal
Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New
York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and
variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations
when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River - Final.
Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number   Polygon Area (ft2)   Total PCB (ppm)
     1               138219               0.022
     2               105217               0.028
     3               110214               1.07
     4                31021               1.63
     5                11399               1.67
     6                 7004               2.01
     7                14268               2.19
     8                12121               3.25
     9                 3697               3.77
    10                13297               7.58
    11                33087               7.61
    12                65541               7.62
    13                57494               7.71
    14                 9014               7.99
    15                 9994               8.72
    16                 2123               9.69
    17                 2341               9.72
    18                 7446               12.7
    19                 2613               14.3
    20                44231               14.3
    21                 2362               15.4
    22                 8745               16.3
    23                 5008               18.7
    24                18762               19.1
    25                24296               20.2
    26                51356               30.8
    27                12996               32.9
    28                18019               38.9
    29               119814               40.6
    30               132012               42.6
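The spatial-averaging calculation discussed in the accompanying text can be illustrated directly with the Table 1 data: each sample's concentration is weighted by the area of its polygon, so that samples representing large portions of the property count for more than samples from small polygons. The following is a minimal sketch (not part of GE's or EPA's submissions), using the areas and concentrations transcribed from Table 1:

```python
# Area-weighted (spatial) average PCB concentration for the hypothetical
# floodplain property in Table 1: (polygon area in ft2, total PCB in ppm).
samples = [
    (138219, 0.022), (105217, 0.028), (110214, 1.07), (31021, 1.63),
    (11399, 1.67), (7004, 2.01), (14268, 2.19), (12121, 3.25),
    (3697, 3.77), (13297, 7.58), (33087, 7.61), (65541, 7.62),
    (57494, 7.71), (9014, 7.99), (9994, 8.72), (2123, 9.69),
    (2341, 9.72), (7446, 12.7), (2613, 14.3), (44231, 14.3),
    (2362, 15.4), (8745, 16.3), (5008, 18.7), (18762, 19.1),
    (24296, 20.2), (51356, 30.8), (12996, 32.9), (18019, 38.9),
    (119814, 40.6), (132012, 42.6),
]

total_area = sum(area for area, _ in samples)
# Spatial mean: each concentration weighted by its polygon's share of the site.
spatial_mean = sum(area * conc for area, conc in samples) / total_area
# Simple unweighted mean, for comparison.
arithmetic_mean = sum(conc for _, conc in samples) / len(samples)

print(f"Spatial (area-weighted) mean: {spatial_mean:.2f} ppm")
print(f"Unweighted arithmetic mean:   {arithmetic_mean:.2f} ppm")
```

For this data set the two statistics differ because the largest and smallest polygons do not share the same concentrations, which is the point of weighting by area.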
[Table 2 - illegible in source]
B. Comments on Phase 2 - Baseline HHRA Approach

For those properties or other exposure points that remain after the Phase 1 screening, the Work
Plan outlines the approach that EPA plans to use in conducting a baseline HHRA. GE has a
number of comments on that proposed approach.
1. Selection of non-PCB contaminants of potential concern

The Work Plan outlines a two-step approach for selecting non-PCB contaminants of potential
concern (COPCs) to be included in the baseline HHRA (Section 6.3.2.3). In the first step, for
each area that has PCB concentrations above the applicable SRBC, the available sampling data on
other constituents will be compared with several criteria to determine the need for additional
sampling in that area. These criteria include: (a) background concentrations established for the
area; (b) EPA's Preliminary Remediation Goal (PRG) for dioxins/furans in residential areas,
which is a 1 ppb Toxicity Equivalent (TEQ) concentration; and (c) the Massachusetts Department
of Environmental Protection's (MDEP's) Method 1 soil standards set out in the Massachusetts
Contingency Plan (MCP) (see Work Plan, p. 6-30). If these or other (unspecified) criteria are
exceeded, EPA will conduct additional characterization sampling (or other evaluation) in the area.
If they are not exceeded, the Work Plan states that the baseline HHRA will be performed using
the available analytical data (p. 6-31). The latter statement appears to mean that, even if the
concentrations of non-PCB constituents do not exceed the criteria for further sampling (i.e., are
below background concentrations, the dioxin/furan PRG, or the Method 1 soil standards), those
constituents will nevertheless be carried through to the second step in the selection of COPCs.
In the second step, all constituents detected will be screened for the selection of COPCs by
comparing the maximum detected concentrations with contaminant-specific and medium-specific
risk-based concentrations. The Work Plan states that the principal criterion for selection of a
COPC will be an exceedance of the medium-specific PRG by the maximum concentration of the
constituent (p. 6-31). Although the Work Plan does not give the source of the PRGs to be used,
it states that a target cancer risk of 1 x 10-6 and a target non-cancer hazard quotient (HQ) of 0.1
will be used. Any constituent for which the maximum concentration exceeds the PRG will be
carried forward for the risk assessment. In addition, the Work Plan states that constituents which
are known to bioaccumulate in biological tissues but do not exceed the PRGs will be further
evaluated for possible inclusion in the risk assessment.
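The second-step comparison that the Work Plan describes, i.e., flagging as a COPC any constituent whose maximum detected concentration exceeds its medium-specific PRG, can be sketched as follows. The constituent names and concentration values below are hypothetical placeholders, not values taken from the Work Plan or the Region 9 tables:

```python
# Sketch of the Work Plan's second screening step: a constituent becomes a
# COPC if its maximum detected concentration exceeds the medium-specific
# risk-based PRG. All names and numbers below are hypothetical examples.

def select_copcs(max_concs, prgs):
    """Return constituents whose maximum concentration exceeds the PRG."""
    return [c for c, conc in max_concs.items()
            if c in prgs and conc > prgs[c]]

# Hypothetical maximum detected soil concentrations (mg/kg).
max_concs = {"benzo(a)pyrene": 0.9, "lead": 150.0, "arsenic": 0.4}
# Hypothetical risk-based soil PRGs (mg/kg).
prgs = {"benzo(a)pyrene": 0.062, "lead": 400.0, "arsenic": 0.39}

print(select_copcs(max_concs, prgs))  # ['benzo(a)pyrene', 'arsenic']
```

Because the comparison uses the maximum rather than an average concentration, a single elevated detection is enough to carry a constituent forward, which is part of GE's objection below.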
GE has several concerns about this proposed two-step approach.

First, GE believes that if the concentration of non-PCB constituents in a given area does not
exceed the criteria for further sampling (as listed above), those constituents should be dropped
from further consideration and not carried through to the second step in selecting COPCs. As
EPA has recognized in the Statement of Work for Removal Actions Outside the River (SOW),
which is Appendix E to the Consent Decree, concentrations of non-PCB constituents that are
consistent with local background levels, or are less than the 1 ppb TEQ concentration for
dioxins/furans, or are below the applicable Method 1 soil standards are considered to pose no
significant risk to human health. Hence, such constituents need not and should not be considered
further in the selection of COPCs.
Second, for those constituents that are carried forward to the second step in the COPC selection
process, EPA's proposed primary approach for selecting COPCs (i.e., comparison of maximum
concentrations to PRGs) is overly restrictive. In recent discussions with GE, EPA explained that
it is currently planning to use the EPA Region 9 PRGs for this purpose, but to reduce the
non-cancer-based PRGs by 10 times, due to the fact that those PRGs were based on a non-cancer HQ
of 1.0, not 0.1. This approach will likely lead to the selection of a long list of COPCs, many or
most of which in fact make no significant contribution to actual risks. This would undercut the
purpose of the screening approach and would greatly and needlessly complicate the risk assessment
without any corresponding benefit in terms of achieving additional health protection. Instead, GE
recommends that, in selecting the COPCs for the Rest of the River, EPA should use an approach
similar to the approach approved by EPA and specified in the SOW for Areas Outside the River,
including the non-bank portions of certain floodplain properties. This approach utilizes not only
the Region 9 PRGs but also the other criteria listed on page 6-30 of the Work Plan, as discussed
below.

[1] The Work Plan also states that, in addition to these specified criteria, EPA will also use other applicable
criteria in the first step of this process, but it does not say what those additional criteria are. EPA should specify
those criteria.
For dioxins and furans, EPA should use the PRGs for dioxin/furan TEQs that are prescribed in
recent guidance from EPA headquarters for use in addressing these constituents in soil at
CERCLA and RCRA sites (EPA 1998). Those PRGs are 1 ppb for dioxin/furan TEQs in
residential areas (also applicable to recreational areas) and 5-20 ppb for dioxin/furan TEQs in
commercial/industrial areas. The SOW provides specifically for use of these PRGs in screening
out dioxin/furan compounds in soil in Areas Outside the River. EPA should likewise use these
PRGs in selecting COPCs for the Rest of the River. (Indeed, as discussed below, GE believes that
these PRGs should be used not only as screening criteria but also as remedial goals, which is
consistent with the approach embodied in the SOW for Areas Outside the River. See Section B.2
below.)
For other constituents, EPA should follow a step-wise approach that begins with comparisons to
the Region 9 PRGs. Since these PRGs are highly conservative and are specified in the SOW for
use in screening such constituents without further reduction, they should not be reduced by an
additional 10 times. For constituents for which such PRGs do not exist, a similar screening step
should be employed using PRGs for surrogate compounds.[2]
EPA should then apply the other criteria listed on page 6-30 of the Work Plan, i.e., comparison
to background levels and MCP Method 1 standards. Comparison of concentrations of non-PCB
constituents to site-specific background levels (e.g., concentrations upstream of the GE facility)
is appropriate to ensure that only those constituents attributable to releases from the GE facility
are included in the risk assessment. This step is allowed in EPA's Risk Assessment Guidance for
Superfund (EPA 1989) and has been accepted by EPA as a basis for selecting non-PCB COPCs
for both soils and sediments in the Upper ½ Mile Reach of the Housatonic River (as stated in the
Upper ½ Mile Reach Work Plan) and for Areas Outside the River (as provided in the SOW). It
is also consistent with the MCP, which provides that if the concentration of a constituent is at or
below background levels, then that constituent will be considered to pose No Significant Risk (310
CMR 40.0902(3)). Similarly, comparison to the MCP Method 1 soil standards is appropriate
because those standards have been developed as conservative risk-based standards (using a target
cancer risk of 1 x 10-6) that will ensure protection of human health and the environment. Again,
EPA has already accepted comparisons to those standards as a basis for selecting non-PCB COPCs
in soil for the Upper ½ Mile Reach of the River and Areas Outside the River. Moreover, in its
Engineering Evaluation/Cost Analysis (EE/CA) for the 1½ Mile Reach of the River, EPA itself
has used the MCP Method 1 soil standards, as well as background levels, as a basis for evaluating
non-PCB constituents in bank soils (Weston 2000b). Thus, use of comparisons to background
levels and MCP Method 1 standards in selecting COPCs for the Rest of the River is justified by
considerable precedent and will ensure that a consistent approach is employed to address such
constituents throughout the GE-Pittsfield/Housatonic River Site. EPA has presented no sound
basis for departing from that approach for the Rest of the River portion of the Site.

[2] Generally, when toxicity criteria have not been derived for particular constituents, it is because there are no
reliable toxicity data on which to base them. If such constituents are not screened but are simply carried through
to the risk assessment as COPCs, that would unnecessarily complicate the risk assessment and would result in
largely meaningless or highly uncertain estimates, given the lack of reliable toxicity data and criteria. Hence, EPA
should screen these compounds using PRGs for surrogate compounds, as described in GE's June 3, 1999 comments
(p. 6).
Moreover, use solely of the maximum detected concentrations in selecting COPCs from these
constituents is overly conservative and inappropriate. For example, comparisons to site-specific
background levels should be made using an appropriate statistical technique or summary statistics,
as described in MDEP (1995) guidance, while comparisons to the MCP Method 1 standards are
generally to be made based on average exposure point concentrations (310 CMR 40.0926(3)).
Again, these approaches have been agreed upon by EPA as fully protective for Areas Outside the
River, as set forth in the SOW, and should likewise be used for the Rest of the River to eliminate
constituents that are not likely to contribute to Site-related risks.
Finally, the Work Plan does not explain how the potential for bioaccumulation (discussed on p.
6-31) will be considered or what criteria will be used to examine this. Unless the approach used
to identify and select such COPCs is clearly defined, it could result in the unnecessary inclusion
of certain constituents. EPA should provide supplemental information that outlines the specific
approach and criteria that will be used to identify and select these constituents for inclusion.
2. Risk assessment of dioxins/furans

The Work Plan states that EPA intends to evaluate the potential risks associated with dioxins and
furans by converting them to TEQs (toxic equivalents of 2,3,7,8-TCDD) using Toxicity
Equivalency Factors (TEFs) and then applying the provisional cancer slope factor (CSF) of
150,000 (mg/kg-day)-1 listed on HEAST for 2,3,7,8-TCDD (Section 6.3.3.2.4). As discussed in
detail in GE's June 3, 1999 comments (pp. 19-22), the TEF approach has substantial limitations,
and the provisional CSF of 150,000 (mg/kg-day)-1 is uncertain and unsupportable. A wide range
of CSFs, spanning from 9,000 to 156,000 (mg/kg-day)-1, have been proposed for TCDD under
the linear non-threshold cancer model, with differences resulting from the selection of the tumor
classification scheme, the interspecies scaling factor used, and the model used for low-dose
extrapolation. Thus, there is enormous uncertainty associated with any selected CSF.
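The TEF/TEQ mechanics described above reduce to a weighted sum: each congener concentration is multiplied by its TEF and the products are summed into a single 2,3,7,8-TCDD-equivalent concentration, to which the selected CSF is then applied. The sketch below illustrates this; the congener concentrations are invented for illustration, and the TEF values shown are illustrative of the kind published by EPA and WHO rather than a definitive set:

```python
# TEQ calculation: concentration of each congener times its Toxicity
# Equivalency Factor (TEF), summed into one 2,3,7,8-TCDD equivalent.
# Concentrations (ppt) are hypothetical; TEFs are illustrative values.

tefs = {
    "2,3,7,8-TCDD": 1.0,
    "1,2,3,7,8-PeCDD": 1.0,
    "2,3,7,8-TCDF": 0.1,
    "OCDD": 0.0001,
}

concs_ppt = {
    "2,3,7,8-TCDD": 0.5,
    "1,2,3,7,8-PeCDD": 1.2,
    "2,3,7,8-TCDF": 4.0,
    "OCDD": 250.0,
}

teq_ppt = sum(concs_ppt[c] * tefs[c] for c in concs_ppt)
print(f"TEQ: {teq_ppt:.4f} ppt")  # 0.5 + 1.2 + 0.4 + 0.025 = 2.1250 ppt
```

The arithmetic itself is simple; GE's objection is to the uncertainty embedded in the TEFs and in whichever CSF is then multiplied against the resulting TEQ.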
In these circumstances, GE believes that, to the extent possible in this risk assessment, EPA
should evaluate dioxins and furans through application of the dioxin/furan TEQ PRGs set forth
in EPA (1998) guidance and discussed above, using those PRGs as remedial goals as well as
screening levels, rather than conducting forward risk calculations for these compounds. This
approach would avoid the uncertainties associated with selecting a particular CSF for TCDD and
is consistent with the approach adopted by EPA for Areas Outside the River, as described in the
SOW.
However, to the extent that EPA includes dioxins and furans in forward risk calculations, GE
believes that EPA should use a CSF of 30,000 (mg/kg-day)-1. As shown in our June 3, 1999
comments, based on current information, that CSF is more supportable than the CSF of 150,000
(mg/kg-day)-1 specified in the Work Plan.
3. Use of TEF/TEQ approach for dioxin-like PCBs

In the Work Plan, EPA also proposes to use the TEF/TEQ approach for estimating risks of dioxin-like
PCB congeners (Section 6.3.3.2.5). Under this approach, the concentrations of these PCB
congeners are converted to toxic equivalents (TEQs) of 2,3,7,8-TCDD using TEFs, and the
provisional CSF of 150,000 (mg/kg-day)-1 for 2,3,7,8-TCDD is then applied to estimate the risks.
In its June 3, 1999 comments, GE presented several reasons why this approach is not appropriate
or scientifically defensible. In discussions with GE in November 1999, EPA stated that, since it
will make risk management decisions on the basis of total PCB concentrations rather than
congener-specific data, it intends to use this TEF/TEQ approach for PCBs only if it finds a good
correlation between the concentrations of the dioxin-like congeners and concentrations of total
PCBs. There is no indication in the Work Plan, however, that EPA has established a good
correlation between the congener-specific data and the total PCB data. Indeed, in more recent
discussions, EPA has indicated that it has not to date evaluated this issue. If and when EPA makes
such an analysis, GE requests the opportunity to review the analysis and discuss it with EPA. The
absence of such a correlation would itself make the TEQ data irrelevant to risk management
decisions.
In any event, GE continues to believe, for the same reasons given in its June 3, 1999 comments
(pp. 23-25), that application of the TEF/TEQ approach to PCBs is not scientifically supportable.
These reasons include: (a) the lack of sufficient congener-specific data to provide reliable
estimates of true congener concentrations; (b) the double counting of carcinogenic risks that would
result from use of both the TCDD CSF and the PCB CSF in the same assessment; and (c) the
uncertainties and unsupportability of the TEFs for PCB congeners and of the CSF (for
2,3,7,8-TCDD) to be used to evaluate the resulting TEQs.
4. Failure to consider alternative CSFs and RfDs for PCBs

The Work Plan states that EPA intends to use the upper-bound CSF for PCBs of 2 (mg/kg-day)-1
and the Reference Dose (RfD) for Aroclor 1254 of 2 x 10-5 mg/kg-day, as published in its
Integrated Risk Information System (IRIS) (Sections 6.3.3.2.3 and 6.3.3.3.2). In its June 3, 1999
comments (pp. 16-19 and Appendix A), GE presented substantial reasons why these toxicity values
overestimate the carcinogenic potential and non-carcinogenic impacts of PCBs, and it
recommended the development of alternative toxicity values for PCBs. EPA has made no
reference to this presentation in its Work Plan.
This failure to consider the alternative toxicological information presented by GE is unlawful.
Under a settlement agreement with GE (Settlement Agreement in General Electric Company v.
Browner, No. 93-1251, D.C. Cir., October 25, 1993) and a subsequent EPA guidance
memorandum (EPA 1993), if an outside party questions the use of IRIS values during the course
of an EPA risk assessment and presents alternative toxicological information that may be used in
place of the IRIS values, EPA has an obligation to consider all credible and relevant evidence
before it. There is no indication that EPA has considered the alternative toxicological
information presented by GE. Rather, it has ignored GE's showing and chosen simply to defer
to the current IRIS values (which is precisely the course of conduct that the settlement and
guidance prohibit). EPA should rectify this deficiency by now giving careful consideration to the
alternative toxicological information presented by GE.
5. Use of subchronic RfD for PCBs

The Work Plan states that EPA will consider using the subchronic RfD for PCBs only if it
determines that exposure is intermittent enough to result in an exposure period that is substantially
reduced (p. 6-42). Later, on page 6-43, the Work Plan states that this will occur when the total
exposure period is substantially below the 7-year period that is considered by EPA to represent
chronic exposure. There is no further discussion of this issue, so it is not clear what the Agency
will use as a benchmark to identify intermittent exposure.
Many of the proposed exposure scenarios are likely to be extremely intermittent in nature. For
example, in developing the SRBC for utility workers, EPA assumes that exposure occurs 5
days/year for a period of 25 years. Clearly, 5 days/year is intermittent exposure. In fact, even
when considered over the entire 25-year exposure period, the total exposure consists of 125 days,
which amounts to 1/3 of a year, or less than 5 percent of the 7-year chronic exposure period. This
exposure would certainly be considered intermittent despite the fact that the total exposure period
lasts for 25 years.
As a second example, in developing its SRBCs for recreational exposures, EPA has assumed that
children aged 1-6 are exposed for six years during recreational activities. Even using the highly
conservative exposure frequency for high-contact areas of 3 days/week for a period of 7 months
per year (a total of 84 days/year, or 504 days for the entire exposure period), the number of days
exposed is equivalent to 1.4 years, or less than 20 percent of the available days during the 7-year
chronic exposure period. Nevertheless, as stated on page 6-42, EPA does not intend to use the
subchronic RfD to evaluate noncancer hazards to young children.
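The arithmetic behind these two examples is straightforward and can be verified directly from the exposure assumptions quoted above from the Work Plan:

```python
# Intermittency check for the two scenarios discussed above, using the
# exposure assumptions cited from the Work Plan.
CHRONIC_YEARS = 7  # EPA's chronic exposure benchmark

# Utility worker: 5 days/year for 25 years.
utility_days = 5 * 25                   # 125 days total
utility_years = utility_days / 365      # about one-third of a year
utility_fraction = utility_years / CHRONIC_YEARS

# Young child recreation: 3 days/week for 7 months/year (~84 days/year),
# over a 6-year exposure period.
child_days = 84 * 6                     # 504 days total
child_years = child_days / 365          # about 1.4 years
child_fraction = child_years / CHRONIC_YEARS

print(f"Utility worker:   {utility_fraction:.1%} of the chronic period")
print(f"Child recreation: {child_fraction:.1%} of the chronic period")
```

Both fractions fall well below the 7-year chronic benchmark (under 5 percent and under 20 percent, respectively), which is the basis for the comment that these scenarios are intermittent.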
In short, the Work Plan states that EPA will use the subchronic RfD when exposures during a
given scenario are sufficiently intermittent to result in a total exposure that is far less than the
7-year chronic exposure period identified by EPA. It is clear, however, that even for scenarios
where this intermittent exposure is likely, EPA does not intend to incorporate the subchronic RfD.
EPA should reconsider its position on the use of the subchronic RfD and incorporate it into those
scenarios where it is relevant.
6. Inappropriate use of a single exposure scenario for all direct-contact recreational activities

The Work Plan provides that, while consumption of fish and game will be evaluated separately,
all direct-contact recreational activities will be evaluated using a single exposure scenario and a
single set of exposure assumptions (pp. 6-52 - 6-53). These activities include hiking, walking,
canoeing, picnicking, swimming, wading, and dirt biking, as well as fishing and hunting (but not
consumption of fish or game). In its June 3, 1999 comments (pp. 11-12, 29-30), GE explained
that use of a single exposure scenario and set of assumptions for such diverse activities is
inappropriate because it will force EPA either to use the most conservative assumptions for any
of these activities (which would clearly result in grossly overestimated exposure estimates) or to
pick compromise parameters that will not be truly representative of any of these activities (which
would be subject to considerable uncertainty). Based on discussions with EPA in the fall of 1999,
GE had understood that the Agency recognized this problem and intended to consider further
whether and how to divide this single direct-contact recreational scenario into a number of more
specific scenarios (e.g., hiking/walking, canoeing, picnicking, swimming/wading). However, the
Work Plan does not do that. GE again urges EPA to develop such specific scenarios so as to take
account of the significant differences in exposure parameters among these types of activities. In
doing so, actual recreational exposures will be more closely approximated, and more realistic and
appropriate cleanup goals can be developed.
7. Lack of information about specific exposure assumptions to be used

As in the prior draft, the Work Plan does not present the specific exposure parameter values and
equations to be used in EPA's risk assessment, but simply states that they will be developed based
on EPA and MDEP guidance along with site-specific data (Section 6.3.4.5). GE requested such
information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has
advised GE that the Agency's risk assessment staff will hold a series of technical meetings with
GE representatives and consultants, and at such meetings will be prepared to discuss the most
appropriate exposure assumptions and parameter values to be used. GE reiterates its request for
such meetings and looks forward to discussing these issues with EPA.
8. Use of probabilistic risk analysis

In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that
probabilistic approaches, such as Monte Carlo analysis, will be considered for those COPCs and
exposure pathways that result in significant risk estimates. However, it does not mention
probabilistic risk assessment techniques in connection with any of the main risk assessments. In
prior discussions with GE, EPA indicated that one of the principal reasons why it collected a large
amount of fish tissue PCB data is for use in a probabilistic risk assessment of fish consumption.
As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic
exposure model in the main risk analysis of the fish consumption pathway. However, the Work
Plan presents no discussion of, or plans for conducting, such an assessment. In fact, EPA does not
even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk
assessment guidance documents and information sources that EPA intends to use in conducting the
HHRA.
During the last few years, EPA's policies and guidelines have focused on improving risk
management by presenting decision-makers with the entire range of possible risks, rather than a
single point estimate (EPA 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that
numerical risk assessments should be accompanied by a full characterization of the uncertainties,
limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D)
probabilistic analysis, such as Microexposure analysis, can more effectively characterize the
impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed
population by considering time-dependent changes.
Microexposure analysis has been used by EPA and by independent researchers to simulate duration
of residential exposure (EPA 1992c; Johnson and Capel 1992; Sielken 1994). It also has been
used to evaluate childhood exposures to lead (Goodrum et al. 1994), exposure to contaminants
in tap water (Harrington et al. 1995), and exposure to dioxins from the consumption of freshwater
fish (Keenan et al. 1993a,b; 1995; 1996a; 1997a,b). Microexposure analysis was employed in
the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon
Group 1994; Keenan et al. 1996b). In addition, Microexposure analysis has been described in
EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A - External Review Draft as a
viable alternative for modeling time-dependent variability in concentrations, daily activity patterns,
and other behavioral exposure factors (EPA 1999c).
Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of
probabilistic risk analysis through application to the fish ingestion pathway at the
Sangamo-Weston/Lake Hartwell Superfund site (Simon 1999). Simon (1999) found that the
Microexposure event model gave risk estimates that were approximately an order of magnitude
lower than those derived using deterministic and other probabilistic approaches. He attributed this
decrease to the power of the method to incorporate changes over time.
GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model,
to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to
evaluate the variations in exposures over time and among anglers, taking into consideration site-specific
factors as well as the uncertainties associated with the input parameters. As outlined in
Appendix E of its draft guidance for conducting probabilistic risk assessments (EPA 1999c), some
types of exposures, including fish consumption, can be expected to vary over time. Anglers may
change their activity levels and their fish consumption behaviors as they age. In addition, different
species and sizes of fish may be consumed from a variety of locations on the river over time, and
may be prepared using different methods. Thus, fish tissue concentrations will not be constant
among anglers or throughout the exposure period for a single angler. A 2-D Microexposure
analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can
be developed using point estimates and will provide information that will allow better-informed
risk management decisions to be made (EPA 1999c).
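To illustrate the kind of time-dependent simulation at issue, the sketch below implements a toy microexposure-style Monte Carlo for the fish consumption pathway, in which each simulated angler's consumption behavior and the fish tissue concentration vary year by year over an angling career. All distributions and parameter values here are invented for illustration; they are not EPA's or GE's inputs:

```python
import random

# Toy microexposure-style Monte Carlo for a fish-ingestion dose rate.
# Each simulated angler gets an angling "career" during which meal
# frequency, meal size, and fish tissue concentration vary year by year.
# All parameter values and distributions are invented for illustration.

random.seed(1)
BODY_WEIGHT_KG = 70.0
AVERAGING_DAYS = 70 * 365  # lifetime averaging period for cancer risk

def simulate_angler():
    duration = random.randint(1, 30)            # years of angling
    total_intake_mg = 0.0
    for _ in range(duration):
        meals_per_year = random.randint(0, 24)  # varies year to year
        meal_size_g = random.uniform(100, 250)
        conc_mg_per_kg = random.lognormvariate(0, 0.5)  # fish tissue PCB
        total_intake_mg += meals_per_year * (meal_size_g / 1000) * conc_mg_per_kg
    # Lifetime average daily dose (mg/kg-day).
    return total_intake_mg / BODY_WEIGHT_KG / AVERAGING_DAYS

doses = sorted(simulate_angler() for _ in range(10_000))
median = doses[len(doses) // 2]
p95 = doses[int(0.95 * len(doses))]
print(f"median dose: {median:.2e} mg/kg-day; 95th percentile: {p95:.2e}")
```

The output is a distribution of dose rates rather than a single point estimate, which is the feature of the 2-D/microexposure approach that the text above argues for.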
9. Use of 95% UCLs or maximum values as exposure point concentrations

The Work Plan states that the exposure point concentration (EPC) for each property or other
exposure area will be based on the 95% UCL of the site data or the maximum detected
concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical
techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for
lognormally distributed data (p. 6-32).
GE's comments on this approach are set forth in detail in Attachment A, which was recently
provided to EPA. As discussed in that attachment, and as demonstrated by EPA contractors and
consultants (EPA 1997c; Schulz and Griffin 1999), use of the H-statistic to derive a 95% UCL
can, and frequently does, produce large overestimates of the true mean, both when there are slight
deviations from the assumed lognormal distribution of the data and even when the data are
obtained from a lognormal population. Hence, that approach should not be used.
Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that
EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½
Mile Reach and for Areas Outside the River - a spatial averaging approach. That approach will:
(a) allow calculation of EPCs that are much more truly representative of actual exposures than use
of EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner
that is consistent with the evaluation of the Upper ½ Mile Reach and Areas Outside the River; and
(c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent
of unnecessary removal or other remediation that will be required for the Rest of the River. GE
urges EPA to conduct additional sampling, if necessary, to ensure that sufficient data will be
available to calculate reliable spatial averages for each exposure area.
However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it
should still not use the H-statistic approach proposed in the Work Plan. As discussed in
Attachment A and demonstrated by Singh et al. (EPA 1997c) and Schulz and Griffin (1999), there
are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that
are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial
averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ
a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to
estimate the EPC for each exposure area. This approach, which is discussed in detail in
Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95%
UCLs, considers the location and density of each sample in each exposure area, mitigates the
statistical impact of a small database, and can be used regardless of the distribution of the sampling
data.
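One simple variant of the spatial bootstrap described above can be sketched as follows: resample the samples with replacement, recompute the area-weighted mean for each resample, and take the 95th percentile of the resampled means as the UCL. The data pairs below are hypothetical placeholders, and this sketch ignores the location-aware refinements a full spatial bootstrap would include:

```python
import random

# Simple bootstrap 95% UCL of an area-weighted mean: resample the
# (polygon area, concentration) pairs with replacement, compute the
# area-weighted mean of each resample, and take the 95th percentile
# of the bootstrap means. Data values are hypothetical.

random.seed(7)

samples = [(1000, 0.5), (2500, 1.2), (400, 9.8), (1800, 3.1),
           (900, 0.9), (3200, 2.4), (600, 15.0), (1100, 4.7)]

def weighted_mean(pairs):
    total = sum(a for a, _ in pairs)
    return sum(a * c for a, c in pairs) / total

boot_means = sorted(
    weighted_mean([random.choice(samples) for _ in samples])
    for _ in range(5000)
)
ucl95 = boot_means[int(0.95 * len(boot_means))]

print(f"Area-weighted mean: {weighted_mean(samples):.2f}")
print(f"Bootstrap 95% UCL:  {ucl95:.2f}")
```

Because the bootstrap makes no distributional assumption, it avoids the sensitivity to lognormality that drives the H-statistic's overestimates.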
References
EPA 1989 Risk Assessment Guidance for Superfund Volume I Human Health Evaluation
Manual (Pan A) -Interim Final US Environmental Protection Agency Office of
Emergency and Remedial Response Washington DC EPA5401-89-002 July
EPA 1990 National Contingency Plan US Environmental Protection Agency
Washington DC
EPA 1992a Final Guidelines for Exposure Assessment Notice US Environmental Protection
Agency Washington DC 57 Federal Register 14 22888-22938 May 29
EPA 1992b Guidance on Risk Characterization for Risk Managers and Risk Assessors
Memorandum from F Henry Habicht III Deputy Administrator to Assistant Administrators
and Regional Administrators Office of the Administrator Washington DC
- 19 shy
EPA 1992c A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its
Application to the General US Population US Environmental Protection Agency Office of
Air Quality Planning and Standards Research Triangle Park NC EPA-4503-92-011
August
EPA 1993 Memo from WH Farland and HL Longest to Regional Directors Re Use of
IRIS values in Superfund risk assessments US Environmental Protection Agency
Washington DC OSWER Directive 92857-16 December 21
EPA 1995a Policy for Risk Characterization at the US Environmental Protection Agency
US Environmental Protection Agency Office of the Administrator Washington DC
March
EPA 1995b Guidance for Risk Characterization US Environmental Protection Agency
Science Policy Council Washington DC February
EPA 1997a Policy for Use of Probabilistic Analysis in Risk Assessment at the US
Environmental Protection Agency US Environmental Protection Agency Office of Research
and Development Washington DC May 15
EPA 1997b Guiding Principles for Monte Carlo Analysis US Environmental Protection
Agency Risk Assessment Forum Washington DC EPA630R-97001 March 35 p
EPA 1997c The Lognormal Distribution in Environmental Applications Prepared for EPA
by A Singh A Singh and M Engelhard US Environmental Protection Agency Office of
Research and Development EPA600R-97006 December
EPA 1998 Memo from Timothy Fields Acting Administrator to Regional Directors Re
Approach for addressing dioxin in soil at CERCLA and RCRA sites US Environmental
Protection Agency Washington DC OSWER Directive 92004-26 April 13
- 20 shy
EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, US EPA, to Jane Magee, General Electric Company, re: Pittsfield 1-0147, EPA Area 6 Housatonic River Human Health Risk Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader, re: protectiveness of cleanup levels for removal actions outside the river - protection of human health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, US Dist. Court for Dist. of Mass.). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I - Human Health Evaluation Manual, Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2. Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3, Part A: Process for Conducting Probabilistic Risk Assessment. Draft, Revision 5. United States Environmental Protection Agency, Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the US Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition, December 4-7, Baltimore, MD.
Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA4203. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.

Johnson, T., and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General US Population. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA, Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds, Proceedings, Dioxin '96 - 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen 30:61-65.
Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC, October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meeting, Proceedings. Abstract.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis, Proceedings, Annual Meeting and Exposition. Abstract.

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan. Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19, November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, December 4-7, Baltimore, MD.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4):823-843.
Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 11.
ATTACHMENT A
A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS
Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co Inc
Introduction
The US Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston 2000a), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets for which no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and on how well they represent the actual levels of exposure that are likely to occur under various scenarios.
There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that, when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.
GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating EPCs that will produce more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic: namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.
Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL
In 1992, EPA issued a guidance document (EPA 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area; thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.
The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region 1 (EPA 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.
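For normally distributed data, the t-based 95% UCL referenced in the Work Plan is simply the arithmetic mean plus t·s/√n. A minimal sketch in Python (the dataset is hypothetical, and the one-sided t value of 1.699 for 29 degrees of freedom is a standard table value, not site data):

```python
import math
import statistics

def t_ucl_95(data, t_crit):
    """One-sided 95% UCL on the arithmetic mean: x-bar + t * s / sqrt(n)."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation
    return x_bar + t_crit * s / math.sqrt(n)

# Hypothetical PCB results (ppm); 1.699 is the tabulated one-sided 95th
# percentile of Student's t for n - 1 = 29 degrees of freedom.
data = [0.5, 1.2, 3.4, 0.8, 2.1] * 6  # 30 illustrative values
ucl = t_ucl_95(data, t_crit=1.699)    # ~1.93 ppm, above the 1.6 ppm mean
```

Because the t-UCL is anchored to the arithmetic mean, it cannot exceed the mean by more than a modest margin for reasonably sized datasets, in contrast to the H-statistic discussed below.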
For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA 1992):

UCL = exp( x̄ + 0.5·s² + (s·H)/√(n−1) )

Where:

UCL = a specified limit (i.e., the 95% UCL) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic, developed by Land (1975), accounting for the interaction of the distribution
n = the number of analytical samples
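The formula above can be sketched directly in code. Note that H must be looked up in Land's (1975) tables as a function of s and n; the value of 3.0 used below is an assumed placeholder for illustration only, as is the four-sample dataset:

```python
import math
import statistics

def h_ucl(data, H):
    """Land's H-UCL: exp(y_bar + 0.5*s**2 + s*H/sqrt(n - 1)), where y_bar
    and s are the mean and standard deviation of the log-transformed data
    and H comes from Land's (1975) tables (supplied by the caller here)."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    y_bar = statistics.mean(logs)
    s = statistics.stdev(logs)
    return math.exp(y_bar + 0.5 * s ** 2 + s * H / math.sqrt(n - 1))

# Four skewed, hypothetical results (ppm) with an assumed H of 3.0:
# the H-UCL comes out near 20 ppm, well above the 8 ppm maximum --
# the kind of inflation discussed in the text.
sample = [1.0, 2.0, 4.0, 8.0]
ucl = h_ucl(sample, H=3.0)
```

Even this toy example shows how quickly the exponentiated variance term drives the H-UCL past every observed concentration when the dataset is small and skewed.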
Frequently, the H-statistic calculates extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.
Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and its associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.
Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling datasets and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.
Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when the exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.
Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and it does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA 1999).
In view of these considerations, EPA contractors and consultants have recommended that the H-statistic not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA 1997a) recommended that "in environmental applications, the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. ... [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased toward more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the lower Housatonic River; consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.
This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al. 1995); in more complex approaches, unbiased estimates of concentration and variance change with distance, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava 1989).
Because kriging is data intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.
The area inside each polygon is then used to weight the samples in statistical calculations, which is, in essence, saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava 1989):

x̄_sc = Σ (p_i × C_i), summed over i = 1 to n

Where:

x̄_sc = the area-weighted mean concentration (mg/kg)
C_i = the concentration representing the condition within polygon i, where there are i = 1 through n polygons, and
p_i = the proportion of the total area that is incorporated in polygon i (unitless)
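The area-weighted mean can be computed in a few lines. The polygon areas and concentrations below are hypothetical, chosen only to show how the weighting limits the influence of a small hot-spot polygon:

```python
def area_weighted_mean(areas, concs):
    """x_bar = sum(p_i * C_i), where p_i is each polygon's share of the
    total exposure area."""
    total = sum(areas)
    return sum((a / total) * c for a, c in zip(areas, concs))

# Hypothetical Thiessen polygon areas (sq ft) and PCB results (ppm).
# The 40 ppm sample sits in a small polygon, so it receives only a
# 10% weight rather than the 1/3 weight a simple mean would give it.
areas = [2000.0, 10000.0, 8000.0]
concs = [40.0, 1.0, 2.5]
epc = area_weighted_mean(areas, concs)  # 5.5 ppm vs. a 14.5 ppm simple mean
```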
This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then the average concentrations for each grid were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA 1997b).
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the

[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure. (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and it is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL that would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously and demonstrated by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
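The procedure just described can be sketched as follows. The dataset is hypothetical, and the iteration count and seed are arbitrary choices for reproducibility:

```python
import random
import statistics

def bootstrap_ucl_95(data, iterations=5000, seed=1):
    """Resample the dataset with replacement, archive each resample's mean,
    and take the 95th percentile of the archived means as the 95% UCL."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(statistics.mean(rng.choices(data, k=n))
                   for _ in range(iterations))
    return means[int(0.95 * iterations)]

# Hypothetical, positively skewed results (ppm).
data = [0.1, 0.2, 0.4, 0.5, 0.9, 1.1, 1.8, 2.5, 6.0, 14.0]
ucl = bootstrap_ucl_95(data)  # falls between the mean and the maximum
```

Because each resampled mean averages over the whole dataset, the resulting 95th percentile stays anchored near the arithmetic mean rather than being driven by the distribution's tail, which is the behavior the text contrasts with the H-UCL.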
This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA 1997a; Schulz and Griffin 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.
When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of samples collected on each property. This modified bootstrapping approach (spatial bootstrapping), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset to reflect the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded, normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
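The three steps might be sketched as follows, under simplifying assumptions: the data are hypothetical, and the expansion factor is omitted because replicating each result in proportion to its (normalized, rounded) polygon area already preserves the area weights; the factor only scales the expanded dataset's size:

```python
import random
import statistics

def spatial_bootstrap_ucl(concs, areas, iterations=5000, subsample=250, seed=1):
    """Steps 1-2: replicate each sample result in the expanded dataset in
    proportion to the ratio of its polygon area to the smallest polygon
    area (rounded to an integer). Step 3: bootstrap means of `subsample`
    draws from the expanded dataset and take the 95th percentile."""
    smallest = min(areas)
    expanded = []
    for c, a in zip(concs, areas):
        expanded.extend([c] * round(a / smallest))
    rng = random.Random(seed)
    means = sorted(statistics.mean(rng.choices(expanded, k=subsample))
                   for _ in range(iterations))
    return means[int(0.95 * iterations)]

# Hypothetical results (ppm) and Thiessen polygon areas (sq ft): the
# high 30 ppm result sits in a relatively small polygon, so it is
# down-weighted in the expanded dataset.
concs = [0.1, 0.4, 1.2, 2.5, 8.0, 30.0]
areas = [9000.0, 15000.0, 4000.0, 6000.0, 2500.0, 3500.0]
ucl = spatial_bootstrap_ucl(concs, areas)
```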
This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA 1997a; Schulz and Griffin 1999).
The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1000 by 1000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2123 square feet to 138,219 square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) EPA's approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.
As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas; thus, they are weighted more heavily in approaches that include area-weighting.
Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
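For reference, the H-UCL that produces this behavior is computed from the log-transformed data as UCL = exp(y-bar + 0.5*s^2 + s*H/sqrt(n-1)), where y-bar and s are the mean and standard deviation of the log-transformed concentrations and H is taken from Land's (1975) tables. The sketch below shows the calculation; the H value used in the example is purely illustrative, since the true value must be looked up as a function of s, n, and the confidence level.

```python
import math

def h_ucl(log_data, H):
    """Land's H-statistic 95% UCL for a lognormal mean:
    UCL = exp(y_bar + 0.5*s**2 + s*H / sqrt(n - 1)),
    where y_bar and s are the mean and sample standard deviation of
    the log-transformed data and H comes from Land's (1975) tables.
    """
    n = len(log_data)
    y_bar = sum(log_data) / n
    s = math.sqrt(sum((y - y_bar) ** 2 for y in log_data) / (n - 1))
    return math.exp(y_bar + 0.5 * s ** 2 + s * H / math.sqrt(n - 1))
```

The exp(0.5*s^2) term grows very rapidly with the log-scale variance, which is why a skewed dataset can push the H-UCL above the largest measured concentration, as happens in the sample calculation above.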
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL that exceeds the bootstrapped estimates by a factor of more than six and even exceeds the highest measured concentration, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above and demonstrated by EPA contractors and consultants (EPA 1997a; Schulz and Griffin 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA 1989, 1993) and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.
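The area-weighted average recommended above is straightforward to compute once each sample has been assigned a polygon area (e.g., by Thiessen polygons). A minimal sketch, using hypothetical numbers rather than any values from the Work Plan:

```python
def area_weighted_mean(conc, areas):
    """Area-weighted average concentration: each sample contributes in
    proportion to the fraction of the exposure area its polygon covers."""
    total = sum(areas)
    return sum(c * a for c, a in zip(conc, areas)) / total

# Hypothetical example: a high result confined to a small polygon
# barely moves the average.
print(area_weighted_mean([1.0, 2.0, 40.0], [4500.0, 5000.0, 500.0]))  # -> 3.45
```

By contrast, the unweighted arithmetic mean of the same three results (14.3 ppm) would be dominated by the single high value even though it represents only 5 percent of the area.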
However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach and decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA 1997a; Schulz and Griffin 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A), Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A.K. Singh, and M. Engelhardt. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River, Final. Roy F. Weston, Inc., Manchester, NH. DCN: GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number   Polygon Area (ft²)   Total PCB (ppm)
      1               138,219             0.022
      2               105,217             0.028
      3               110,214             1.07
      4                31,021             1.63
      5                11,399             1.67
      6                 7,004             2.01
      7                14,268             2.19
      8                12,121             3.25
      9                 3,697             3.77
     10                13,297             7.58
     11                33,087             7.61
     12                65,541             7.62
     13                57,494             7.71
     14                 9,014             7.99
     15                 9,994             8.72
     16                 2,123             9.69
     17                 2,341             9.72
     18                 7,446            12.7
     19                 2,613            14.3
     20                44,231            14.3
     21                 2,362            15.4
     22                 8,745            16.3
     23                 5,008            18.7
     24                18,762            19.1
     25                24,296            20.2
     26                51,356            30.8
     27                12,996            32.9
     28                18,019            38.9
     29               119,814            40.6
     30               132,012            42.6
[Table 2. Mean and 95% UCL Exposure Point Concentrations Calculated Using the Four Approaches (arithmetic mean/H-statistic, area-weighted average, bootstrap, and spatial bootstrap); the individual values are discussed in the text.]
are known to bioaccumulate in biological tissues but do not exceed the PRGs will be further evaluated for possible inclusion in the risk assessment.

GE has several concerns about this proposed two-step approach.
First, GE believes that if the concentration of non-PCB constituents in a given area does not exceed the criteria for further sampling (as listed above), those constituents should be dropped from further consideration and not carried through to the second step in selecting COPCs. As EPA has recognized in the Statement of Work for Removal Actions Outside the River (SOW), which is Appendix E to the Consent Decree, concentrations of non-PCB constituents that are consistent with local background levels, or are less than the 1 ppb TEQ concentration for dioxins/furans, or are below the applicable Method 1 soil standards, are considered to pose no significant risk to human health. Hence, such constituents need not, and should not, be considered further in the selection of COPCs.
Second, for those constituents that are carried forward to the second step in the COPC selection process, EPA's proposed primary approach for selecting COPCs (i.e., comparison of maximum concentrations to PRGs) is overly restrictive. In recent discussions with GE, EPA explained that it is currently planning to use the EPA Region 9 PRGs for this purpose, but to reduce the non-cancer-based PRGs by 10 times, due to the fact that those PRGs were based on a non-cancer HQ of 1.0, not 0.1. This approach will likely lead to the selection of a long list of COPCs, many or most of which in fact make no significant contribution to actual risks. This would undercut the purpose of the screening approach and will greatly and needlessly complicate the risk assessment without any corresponding benefit in terms of achieving additional health protection. Instead, GE recommends that, in selecting the COPCs for the Rest of the River, EPA should use an approach similar to the approach approved by EPA and specified in the SOW for Areas Outside the River, including the non-bank portions of certain floodplain properties. This approach utilizes not only
1 The Work Plan also states that, in addition to these specified criteria, EPA will also use other applicable criteria in the first step of this process, but it does not say what those additional criteria are. EPA should specify those criteria.
the Region 9 PRGs, but also the other criteria listed on page 6-30 of the Work Plan, as discussed below.
For dioxins and furans, EPA should use the PRGs for dioxin/furan TEQs that are prescribed in recent guidance from EPA headquarters for use in addressing these constituents in soil at CERCLA and RCRA sites (EPA 1998). Those PRGs are 1 ppb for dioxin/furan TEQs in residential areas (also applicable to recreational areas) and 5-20 ppb for dioxin/furan TEQs in commercial/industrial areas. The SOW provides specifically for use of these PRGs in screening out dioxin/furan compounds in soil in Areas Outside the River. EPA should likewise use these PRGs in selecting COPCs for the Rest of the River. (Indeed, as discussed below, GE believes that these PRGs should be used not only as screening criteria but also as remedial goals, which is consistent with the approach embodied in the SOW for Areas Outside the River. See Section B.2 below.)
For other constituents, EPA should follow a step-wise approach that begins with comparisons to the Region 9 PRGs. Since these PRGs are highly conservative and are specified in the SOW for use in screening such constituents without further reduction, they should not be reduced by an additional 10 times. For constituents for which such PRGs do not exist, a similar screening step should be employed using PRGs for surrogate compounds.2
EPA should then apply the other criteria listed on page 6-30 of the Work Plan, i.e., comparison to background levels and MCP Method 1 standards. Comparison of concentrations of non-PCB constituents to site-specific background levels (e.g., concentrations upstream of the GE facility) is appropriate to ensure that only those constituents attributable to releases from the GE facility are included in the risk assessment. This step is allowed in EPA's Risk Assessment Guidance for
2 Generally, when toxicity criteria have not been derived for particular constituents, it is because there are no reliable toxicity data on which to base them. If such constituents are not screened, but are simply carried through to the risk assessment as COPCs, that would unnecessarily complicate the risk assessment and would result in largely meaningless or highly uncertain estimates, given the lack of reliable toxicity data and criteria. Hence, EPA should screen these compounds using PRGs for surrogate compounds, as described in GE's June 3, 1999 comments (p. 6).
Superfund (EPA 1989) and has been accepted by EPA as a basis for selecting non-PCB COPCs for both soils and sediments in the Upper ½ Mile Reach of the Housatonic River (as stated in the Upper ½ Mile Reach Work Plan) and for Areas Outside the River (as provided in the SOW). It is also consistent with the MCP, which provides that if the concentration of a constituent is at or below background levels, then that constituent will be considered to pose No Significant Risk (310 CMR 40.0902(3)). Similarly, comparison to the MCP Method 1 soil standards is appropriate because those standards have been developed as conservative risk-based standards (using a target cancer risk of 1 x 10^-6) that will ensure protection of human health and the environment. Again, EPA has already accepted comparisons to those standards as a basis for selecting non-PCB COPCs in soil for the Upper ½ Mile Reach of the River and Areas Outside the River. Moreover, in its Engineering Evaluation/Cost Analysis (EE/CA) for the 1½ Mile Reach of the River, EPA itself has used the MCP Method 1 soil standards, as well as background levels, as a basis for evaluating non-PCB constituents in bank soils (Weston 2000b). Thus, use of comparisons to background levels and MCP Method 1 standards in selecting COPCs for the Rest of the River is justified by considerable precedent and will ensure that a consistent approach is employed to address such constituents throughout the GE-Pittsfield/Housatonic River Site. EPA has presented no sound basis for departing from that approach for the Rest of the River portion of the Site.
Moreover, use solely of the maximum detected concentrations in selecting COPCs from these constituents is overly conservative and inappropriate. For example, comparisons to site-specific background levels should be made using an appropriate statistical technique or summary statistics, as described in MDEP (1995) guidance, while comparisons to the MCP Method 1 standards are generally to be made based on average exposure point concentrations (310 CMR 40.0926(3)). Again, these approaches have been agreed upon by EPA as fully protective for Areas Outside the River, as set forth in the SOW, and should likewise be used for the Rest of the River to eliminate constituents that are not likely to contribute to Site-related risks.
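The step-wise screen described above (PRG comparison first, then background and MCP Method 1 comparisons) can be summarized as follows. This is a simplified sketch of the logic only: the constituent names, concentrations, and criteria are hypothetical placeholders, and the background comparison is reduced to a simple average-versus-background test rather than the full MDEP statistical procedure.

```python
def select_copcs(data, prgs, background, method1):
    """Step-wise COPC screen: retain a constituent only if it survives
    all three screens, i.e., its maximum exceeds the (unreduced) Region 9
    PRG, its average exceeds site-specific background, and its average
    exposure point concentration exceeds the MCP Method 1 standard."""
    copcs = []
    for chem, concs in data.items():
        max_conc = max(concs)
        avg_conc = sum(concs) / len(concs)
        if (max_conc > prgs[chem]
                and avg_conc > background[chem]
                and avg_conc > method1[chem]):
            copcs.append(chem)
    return copcs
```

A constituent dropped at any step is eliminated from further consideration, which is the point of GE's first comment above.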
Finally, the Work Plan does not explain how the potential for bioaccumulation (discussed on p. 6-31) will be considered or what criteria will be used to examine this. Unless the approach used to identify and select such COPCs is clearly defined, it could result in the unnecessary inclusion of certain constituents. EPA should provide supplemental information that outlines the specific approach and criteria that will be used to identify and select these constituents for inclusion.
2. Risk assessment of dioxins/furans
The Work Plan states that EPA intends to evaluate the potential risks associated with dioxins and furans by converting them to TEQs (toxic equivalents of 2,3,7,8-TCDD) using Toxicity Equivalency Factors (TEFs) and then applying the provisional cancer slope factor (CSF) of 150,000 (mg/kg-day)^-1 listed on HEAST for 2,3,7,8-TCDD (Section 6.3.3.2.4). As discussed in detail in GE's June 3, 1999 comments (pp. 19-22), the TEF approach has substantial limitations, and the provisional CSF of 150,000 (mg/kg-day)^-1 is uncertain and unsupportable. A wide range of CSFs, spanning from 9,000 to 156,000 (mg/kg-day)^-1, have been proposed for TCDD under the linear nonthreshold cancer model, with differences resulting from the selection of tumor classification scheme, the interspecies scaling factor used, and the model used for low-dose extrapolation. Thus, there is enormous uncertainty associated with any selected CSF.
In these circumstances, GE believes that, to the extent possible in this risk assessment, EPA should evaluate dioxins and furans through application of the dioxin/furan TEQ PRGs set forth in EPA (1998) guidance and discussed above, using those PRGs as remedial goals as well as screening levels, rather than conducting forward risk calculations for these compounds. This approach would avoid the uncertainties associated with selecting a particular CSF for TCDD and is consistent with the approach adopted by EPA for Areas Outside the River, as described in the SOW.
However, to the extent that EPA includes dioxins and furans in forward risk calculations, GE believes that EPA should use a CSF of 30,000 (mg/kg-day)^-1. As shown in our June 3, 1999 comments, based on current information, that CSF is more supportable than the CSF of 150,000 (mg/kg-day)^-1 specified in the Work Plan.
3. Use of TEF/TEQ approach for dioxin-like PCBs
In the Work Plan, EPA also proposes to use the TEF/TEQ approach for estimating risks of dioxin-like PCB congeners (Section 6.3.3.2.5). Under this approach, the concentrations of these PCB congeners are converted to toxic equivalents (TEQs) of 2,3,7,8-TCDD using TEFs, and the provisional CSF of 150,000 (mg/kg-day)^-1 for 2,3,7,8-TCDD is then applied to estimate the risks.
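Mechanically, the TEF/TEQ conversion sums congener concentrations weighted by their TEFs, and the TCDD CSF is then applied to the TEQ-based intake. The sketch below illustrates only that arithmetic; the two congener TEFs shown are the WHO-1998 values and are included purely as examples, not as any endorsement of the approach GE is questioning.

```python
# Illustrative TEFs for two dioxin-like PCB congeners (WHO-1998 values).
TEFS = {"PCB-126": 0.1, "PCB-118": 0.0001}

def teq(congener_conc):
    """Convert congener concentrations to 2,3,7,8-TCDD toxic
    equivalents: TEQ = sum(C_i * TEF_i)."""
    return sum(conc * TEFS[c] for c, conc in congener_conc.items())

def cancer_risk(teq_intake_mg_per_kg_day, csf=150000.0):
    """Risk = TEQ-based intake * CSF, using the provisional TCDD CSF
    of 150,000 (mg/kg-day)^-1 cited in the Work Plan."""
    return teq_intake_mg_per_kg_day * csf
```

Note how strongly the result depends on the chosen TEFs and CSF, which is the crux of GE's objection below.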
In its June 3, 1999 comments, GE presented several reasons why this approach is not appropriate or scientifically defensible. In discussions with GE in November 1999, EPA stated that, since it will make risk management decisions on the basis of total PCB concentrations rather than congener-specific data, it intends to use this TEF/TEQ approach for PCBs only if it finds a good correlation between the concentrations of the dioxin-like congeners and concentrations of total PCBs. There is no indication in the Work Plan, however, that EPA has established a good correlation between the congener-specific data and the total PCB data. Indeed, in more recent discussions, EPA has indicated that it has not to date evaluated this issue. If and when EPA makes such an analysis, GE requests the opportunity to review the analysis and discuss it with EPA. The absence of such a correlation would itself make the TEQ data irrelevant to risk management decisions.
In any event, GE continues to believe, for the same reasons given in its June 3, 1999 comments (pp. 23-25), that application of the TEF/TEQ approach to PCBs is not scientifically supportable. These reasons include: (a) the lack of sufficient congener-specific data to provide reliable estimates of true congener concentrations; (b) the double counting of carcinogenic risks that would result from use of both the TCDD CSF and the PCB CSF in the same assessment; and (c) the uncertainties and unsupportability of the TEFs for PCB congeners and of the CSF (for 2,3,7,8-TCDD) to be used to evaluate the resulting TEQs.
4. Failure to consider alternative CSFs and RfDs for PCBs
The Work Plan states that EPA intends to use the upper-bound CSF for PCBs of 2 (mg/kg-day)^-1 and the Reference Dose (RfD) for Aroclor 1254 of 2 x 10^-5 mg/kg-day, as published in its Integrated Risk Information System (IRIS) (Sections 6.3.3.2.3 and 6.3.3.3.2). In its June 3, 1999 comments (pp. 16-19 and Appendix A), GE presented substantial reasons why these toxicity values overestimate the carcinogenic potential and non-carcinogenic impacts of PCBs, and it recommended the development of alternative toxicity values for PCBs. EPA has made no reference to this presentation in its Work Plan.
This failure to consider the alternative toxicological information presented by GE is unlawful. Under a settlement agreement with GE (Settlement Agreement in General Electric Company v. Browner, No. 93-1251, D.C. Cir., October 25, 1993) and a subsequent EPA guidance memorandum (EPA 1993), if an outside party questions the use of IRIS values during the course of an EPA risk assessment and presents alternative toxicological information that may be used in place of the IRIS values, EPA has an obligation to consider all credible and relevant evidence before it. There is no indication that EPA has considered the alternative toxicological information presented by GE. Rather, it has ignored GE's showing and chosen simply to defer to the current IRIS values (which is precisely the course of conduct that the settlement and guidance prohibit). EPA should rectify this deficiency by now giving careful consideration to the alternative toxicological information presented by GE.
5. Use of subchronic RfD for PCBs
The Work Plan states that EPA will consider using the subchronic RfD for PCBs only if it determines that exposure is intermittent enough to result in an exposure period that is substantially reduced (p. 6-42). Later, on page 6-43, the Work Plan states that this will occur when the total exposure period is substantially below the 7-year period that is considered by EPA to represent chronic exposure. There is no further discussion of this issue, so it is not clear what the Agency will use as a benchmark to identify intermittent exposure.
Many of the proposed exposure scenarios are likely to be extremely intermittent in nature. For example, in developing the SRBC for utility workers, EPA assumes that exposure occurs 5 days/year for a period of 25 years. Clearly, 5 days/year is intermittent exposure. In fact, even when considered over the entire 25-year exposure period, the total exposure consists of 125 days, which amounts to 1/3 of a year, or less than 5 percent of the 7-year chronic exposure period. This exposure would certainly be considered intermittent, despite the fact that the total exposure period lasts for 25 years.
As a second example, in developing its SRBCs for recreational exposures, EPA has assumed that children aged 1-6 are exposed for six years during recreational activities. Even using the highly conservative exposure frequency for high-contact areas of 3 days/week for a period of 7 months per year (a total of 84 days/year, or 504 days for the entire exposure period), the number of days exposed is equivalent to 1.4 years, or less than 20 percent of the available days during the 7-year chronic exposure period. Nevertheless, as stated on page 6-42, EPA does not intend to use the subchronic RfD to evaluate noncancer hazards to young children.
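The arithmetic behind both examples is simple. A quick check, using the exposure frequencies cited above (the function name is ours, for illustration only):

```python
CHRONIC_PERIOD_YR = 7.0  # EPA's benchmark for chronic exposure

def fraction_of_chronic(days_per_year, years, days_in_year=365.0):
    """Express total exposure days as a fraction of the 7-year
    chronic exposure period."""
    total_years = days_per_year * years / days_in_year
    return total_years / CHRONIC_PERIOD_YR

# Utility worker: 5 days/yr for 25 yr -> 125 days, under 5% of 7 years.
print(round(fraction_of_chronic(5, 25), 3))   # -> 0.049
# Young-child recreation: 84 days/yr for 6 yr -> 504 days, under 20%.
print(round(fraction_of_chronic(84, 6), 3))   # -> 0.197
```

Both scenarios thus fall far below the 7-year chronic benchmark on a total-days basis.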
In short, the Work Plan states that EPA will use the subchronic RfD when exposures during a given scenario are sufficiently intermittent to result in a total exposure that is far less than the 7-year chronic exposure period identified by EPA. It is clear, however, that even for scenarios where this intermittent exposure is likely, EPA does not intend to incorporate the subchronic RfD. EPA should reconsider its position on the use of the subchronic RfD and incorporate it into those scenarios where it is relevant.
6. Inappropriate use of a single exposure scenario for all direct-contact recreational activities
The Work Plan provides that, while consumption of fish and game will be evaluated separately, all direct-contact recreational activities will be evaluated using a single exposure scenario and a single set of exposure assumptions (pp. 6-52 to 6-53). These activities include hiking, walking, canoeing, picnicking, swimming, wading, and dirt biking, as well as fishing and hunting (but not consumption of fish or game). In its June 3, 1999 comments (pp. 11-12, 29-30), GE explained that use of a single exposure scenario and set of assumptions for such diverse activities is inappropriate because it will force EPA either to use the most conservative assumptions for any of these activities (which would clearly result in grossly overestimated exposure estimates) or to pick compromise parameters that will not be truly representative of any of these activities (which would be subject to considerable uncertainty). Based on discussions with EPA in the fall of 1999, GE had understood that the Agency recognized this problem and intended to consider further whether and how to divide this single direct-contact recreational scenario into a number of more specific scenarios (e.g., hiking/walking, canoeing, picnicking, swimming/wading). However, the Work Plan does not do that. GE again urges EPA to develop such specific scenarios so as to take account of the significant differences in exposure parameters among these types of activities. In doing so, actual recreational exposures will be more closely approximated, and more realistic and appropriate cleanup goals can be developed.
7. Lack of information about specific exposure assumptions to be used
As in the prior draft, the Work Plan does not present the specific exposure parameter values and equations to be used in EPA's risk assessment, but simply states that they will be developed based on EPA and MDEP guidance, along with site-specific data (Section 6.3.4.5). GE requested such information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has advised GE that the Agency's risk assessment staff will hold a series of technical meetings with GE representatives and consultants and, at such meetings, will be prepared to discuss the most appropriate exposure assumptions and parameter values to be used. GE reiterates its request for such meetings and looks forward to discussing these issues with EPA.
8. Use of probabilistic risk analysis
In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that probabilistic approaches such as Monte Carlo analysis will be considered for those COPCs and exposure pathways that result in significant risk estimates. However, it does not mention probabilistic risk assessment techniques in connection with any of the main risk assessments. In prior discussions with GE, EPA indicated that one of the principal reasons why it collected a large amount of fish tissue PCB data is for use in a probabilistic risk assessment of fish consumption. As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic exposure model in the main risk analysis of the fish consumption pathway. However, the Work Plan presents no discussion of, or plans for conducting, such an assessment. In fact, EPA does not even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk assessment guidance documents and information sources that EPA intends to use in conducting the HHRA.
During the last few years, EPA's policies and guidelines have focused on improving risk management by presenting decision-makers with the entire range of possible risks, rather than a single point estimate (EPA 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that numerical risk assessments should be accompanied by a full characterization of the uncertainties, limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D) probabilistic analysis, such as Microexposure analysis, can more effectively characterize the impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed population by considering time-dependent changes.
Microexposure analysis has been used by EPA and by independent researchers to simulate duration of residential exposure (EPA 1992c; Johnson and Capel 1992; Sielken 1994). It also has been used to evaluate childhood exposures to lead (Goodrum et al. 1994), exposure to contaminants in tap water (Harrington et al. 1995), and exposure to dioxins from the consumption of freshwater fish (Keenan et al. 1993a,b, 1995, 1996a, 1997a,b). Microexposure analysis was employed in the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon Group 1994; Keenan et al. 1996b). In addition, Microexposure analysis has been described in EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A (External Review Draft) as a viable alternative for modeling time-dependent variability in concentrations, daily activity patterns, and other behavioral exposure factors (EPA 1999c).
Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of probabilistic risk analysis through application to the fish ingestion pathway at the Sangamo-Weston/Lake Hartwell Superfund site (Simon 1999). Simon (1999) found that the Microexposure event model gave risk estimates that were approximately an order of magnitude lower than those derived using deterministic and other probabilistic approaches. He attributed this decrease to the power of the method to incorporate changes over time.
GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model, to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in Appendix E of EPA's draft guidance for conducting probabilistic risk assessments (EPA 1999c), some types of exposures, including fish consumption, can be expected to vary over time. Anglers may change their activity levels and their fish consumption behaviors as they age. In addition, different species and sizes of fish may be consumed from a variety of locations on the river over time, and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates, and will provide information that will allow better-informed risk management decisions to be made (EPA 1999c).
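To illustrate the kind of time-dependent variability that a Microexposure event analysis is designed to capture, the sketch below simulates anglers whose meal frequency and fish tissue concentrations are re-drawn each year. All inputs (meal counts, meal size, the lognormal concentration distribution) are hypothetical placeholders, not values from the Work Plan or from GE's proposal.

```python
import random

random.seed(1)

def simulate_angler(years=30):
    """Average daily intake (mg/day) for one simulated angler.

    Each year the meal frequency and the fish tissue concentration are
    re-drawn, so exposure varies over the exposure period instead of
    being fixed at a single point estimate.
    """
    total_intake = 0.0
    for _ in range(years):
        meals_per_year = random.randint(0, 24)   # activity level changes over time
        conc = random.lognormvariate(0.0, 1.0)   # tissue concentration, mg/kg (hypothetical)
        meal_size_kg = 0.227                     # 8-oz meal, in kg
        total_intake += meals_per_year * meal_size_kg * conc
    return total_intake / (years * 365.0)

# Variability loop: the distribution of doses across 5,000 simulated anglers.
# (A full 2-D analysis would add an outer uncertainty loop over parameter choices.)
doses = sorted(simulate_angler() for _ in range(5000))
median = doses[len(doses) // 2]
p95 = doses[int(0.95 * len(doses))]
print(f"median daily intake: {median:.4f} mg/day, 95th percentile: {p95:.4f} mg/day")
```

Because each angler's dose is averaged over a simulated history rather than fixed at a single upper-bound value, the resulting distribution reflects the time-dependence the text describes.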
9. Use of 95% UCLs or maximum values as exposure point concentrations
The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% UCL of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).
GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment, and as demonstrated by EPA contractors and consultants (EPA 1997c; Schulz and Griffin 1999), use of the H-statistic to derive a 95% UCL can and frequently does produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.
Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½ Mile Reach and for Areas Outside the River: a spatial averaging approach. That approach will (a) allow calculation of EPCs that are much more truly representative of actual exposures than use of EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper ½ Mile Reach and Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling, if necessary, to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.
However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A, and as demonstrated by Singh et al. (EPA 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.
References
EPA 1989 Risk Assessment Guidance for Superfund Volume I Human Health Evaluation
Manual (Part A) - Interim Final US Environmental Protection Agency Office of
Emergency and Remedial Response Washington DC EPA/540/1-89/002 July
EPA 1990 National Contingency Plan US Environmental Protection Agency
Washington DC
EPA 1992a Final Guidelines for Exposure Assessment Notice US Environmental Protection
Agency Washington DC 57 Federal Register 14 22888-22938 May 29
EPA 1992b Guidance on Risk Characterization for Risk Managers and Risk Assessors
Memorandum from F Henry Habicht III Deputy Administrator to Assistant Administrators
and Regional Administrators Office of the Administrator Washington DC
EPA 1992c A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its
Application to the General US Population US Environmental Protection Agency Office of
Air Quality Planning and Standards Research Triangle Park NC EPA-450/3-92-011
August
EPA 1993 Memo from WH Farland and HL Longest to Regional Directors Re Use of
IRIS values in Superfund risk assessments US Environmental Protection Agency
Washington DC OSWER Directive 9285.7-16 December 21
EPA 1995a Policy for Risk Characterization at the US Environmental Protection Agency
US Environmental Protection Agency Office of the Administrator Washington DC
March
EPA 1995b Guidance for Risk Characterization US Environmental Protection Agency
Science Policy Council Washington DC February
EPA 1997a Policy for Use of Probabilistic Analysis in Risk Assessment at the US
Environmental Protection Agency US Environmental Protection Agency Office of Research
and Development Washington DC May 15
EPA 1997b Guiding Principles for Monte Carlo Analysis US Environmental Protection
Agency Risk Assessment Forum Washington DC EPA/630/R-97/001 March 35 pp
EPA 1997c The Lognormal Distribution in Environmental Applications Prepared for EPA
by A Singh A Singh and M Engelhard US Environmental Protection Agency Office of
Research and Development EPA/600/R-97/006 December
EPA 1998 Memo from Timothy Fields Acting Administrator to Regional Directors Re
Approach for addressing dioxin in soil at CERCLA and RCRA sites US Environmental
Protection Agency Washington DC OSWER Directive 9200.4-26 April 13
EPA and MDEP 1998 Letter from Anna Symington Acting Section Chief Massachusetts
DEP and Bryan Olson Project Manager US EPA to Jane Magee General Electric
Company Re Pittsfield 1-0147 EPA Area 6 Housatonic River Human Health Risk
Assessment January 29
EPA 1999a Memorandum from Ann-Marie Burke Toxicologist to Richard Cavagnero GE
Project Leader Re Protectiveness of Cleanup Levels for Removal Actions Outside the River
- Protection of Human Health Attachment A to Appendix D to Consent Decree lodged in
United States et al v General Electric Company (No 99-30225-MAP US Dist Court for
Dist of Mass) August 4
EPA 1999b Risk Assessment Guidance for Superfund Volume I - Human Health Evaluation
Manual Supplemental Guidance Dermal Risk Assessment Interim Guidance US
Environmental Protection Agency Office of Emergency and Remedial Response Washington
DC March 2 Draft
EPA 1999c Risk Assessment Guidance for Superfund Volume 3 (Part A Process for
Conducting Probabilistic Risk Assessment) Draft Revision 5 United States Environmental
Protection Agency Solid Waste and Emergency Response December
EPA 1999d Responsiveness Summary for Allendale School Removal Action ½ Mile
Removal Action and Consolidation Prepared by EPA Region I October
GE 1997 Proposal for Alternative Input Values for the Human Health Risk Assessment
Memorandum from General Electric to the Massachusetts Department of Environmental
Protection and the US Environmental Protection Agency September 11
Goodrum PE JM Hassett DL Johnson and ME Dakins 1994 Applications of
microexposure Monte Carlo modeling to human health risk assessments A case study of
modeling childhood lead exposure Society for Risk Analysis Annual conference and
Exposition December 4-7 Baltimore MD
Harrington NW Curry CL and PS Price 1995 The MicroExposure Event Modeling
Approach to Probabilistic Exposure Assessment Paper No 95-TA4203 Proceedings of the
88th Annual Meeting of the Air and Waste Management Association San Antonio Texas
USA June
Johnson T and J Capel 1992 A Monte Carlo Approach to Simulating Residential
Occupancy Periods and Its Application to the General US Population US Environmental
Protection Agency Office of Air Quality Planning and Standards Research Triangle Park
NC EPA-450/3-92-011 August
Keenan RE MH Henning PE Goodrum MN Gray RA Sherer and PS Price
1993a Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to
evaluate the need for fish advisories Dioxin 93 13th International Symposium on
Chlorinated Dioxins and Related Compounds Vienna Austria
Keenan RE PS Price MH Henning PE Goodrum MN Gray RA Sherer and
WL Porter 1993b A Monte Carlo risk assessment for dioxin in Maine fish Using a
microexposure approach to evaluate the need for fish advisories TAPPI Proceedings 1993
Environmental Conference Boston MA
Keenan RE PS Price CL Curry JI McCrodden and JG Haggard 1995 Using a
microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion
of fish from the upper Hudson River In Society for Risk Analysis and the Japan Section of
SRA Annual Meeting and Exposition Waikiki HI (Abstract)
Keenan RE PS Price J McCrodden and ES Ebert 1996a Using a microexposure
event analysis to model potential exposures to PCBs through ingestion of fish from the Upper
Hudson River In Organohalogen Compounds Proceedings Dioxin 96-16th International
Symposium on Chlorinated Dioxins and Related Compounds Amsterdam The Netherlands
Organohalogen 3061-65
Keenan RE NW Harrington PS Price and RO Richter 1996b Applying a
microexposure event analysis for a superfund site risk assessment Proceedings Superfund XVII
Conference Proceedings Washington DC October 15-17
Keenan RE JD Avantaggio and PS Price 1997a Should Maines rivers have fish
advisories for dioxin Using an integrated Microexposure Event and Toxicokinetic Model to
evaluate this question In SETAC North Atlantic Chapter Annual Meetings Proceedings
Abstract 1
Keenan RE JD Avantaggio and PS Price 1997b Using a combined Microexposure
Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden
dosimetric In Society for Risk Analysis Proceedings Annual Meeting and Exposition
Abstract
MDEP 1995 Guidance for Disposal Site Risk Characterization - In Support of the
Massachusetts Contingency Plan -Interim Final Policy Massachusetts Department of
Environmental Protection Bureau of Waste Site Cleanup and Office of Research and
Standards Boston MA WSCORS-95-141 July
Pyrite Canyon Group 1994 Workplan for the Health Risk Assessment of the Stringfellow
CERCLA Site in Riverside County California January
Schulz TW and S Griffin 1999 Estimating risk assessment exposure point concentrations
when data are not normal or lognormal Risk Analysis Vol 19 November
Sielken RL 1994 More realistic exposure durations for more realistic people Society for
Risk Analysis Annual Conference and Exposition Baltimore MD December 4-7
Simon TW 1999 Two-dimensional Monte Carlo simulation and beyond A comparison of
several probabilistic risk assessment methods applied to a Superfund site Hum Ecol Risk
Assess 5 (4) 823-843
Weston 2000a Supplemental Investigation Work Plan for the Lower Housatonic River
Prepared for US Army Corps of Engineers by Roy F Weston Inc February 22
Weston 2000b Engineering EvaluationCost Analysis for the Upper Reach of the Housatonic
River Prepared for US Army Corps of Engineers by Roy F Weston Inc February 11
ATTACHMENT A
A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS
Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co Inc
Introduction
The US Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets for which no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and on how well they represent the actual levels of exposure that are likely to occur under various scenarios.
There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that, when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.
GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will yield more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic: namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.
Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL
In 1992, EPA issued a guidance document (EPA 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.
The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region 1 (EPA 1994). In fact, EPA Region 1 has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.
For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA 1992):

UCL = exp(x̄ + 0.5s² + sH/√(n - 1))

Where:

UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic, developed by Land (1975), accounting for the interaction of the distribution
n = number of analytical samples
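As a numerical illustration of this equation, the sketch below computes an H-UCL for a small hypothetical dataset. The H value used is an arbitrary placeholder; in an actual calculation it must be taken from Land's (1975) tables for the given sample size and log-scale standard deviation.

```python
import math

def h_ucl(log_data, H):
    """Land's H-UCL computed from log-transformed data and a tabulated H value."""
    n = len(log_data)
    ybar = sum(log_data) / n                               # mean of log-transformed data
    s2 = sum((y - ybar) ** 2 for y in log_data) / (n - 1)  # variance of log-transformed data
    s = math.sqrt(s2)
    return math.exp(ybar + 0.5 * s2 + s * H / math.sqrt(n - 1))

# Hypothetical log-transformed PCB concentrations, ln(mg/kg)
logs = [0.1, 0.5, -0.3, 1.2, 0.8, -0.1, 0.4, 2.0, 0.6, 0.9]
ucl = h_ucl(logs, H=2.8)  # H = 2.8 is assumed for illustration only
arith_mean = sum(math.exp(y) for y in logs) / len(logs)
print(f"H-UCL = {ucl:.2f} mg/kg versus arithmetic mean = {arith_mean:.2f} mg/kg")
```

Even with this modest log-scale spread, the H-UCL lands well above the arithmetic mean of the untransformed data, which is the tendency discussed below.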
Frequently, the H-statistic calculates extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.
Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.
Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes orders of magnitude higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.
Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when the exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.
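This divergence is easy to demonstrate numerically. In the sketch below (the concentrations are hypothetical), the lognormal-theory estimator exp(ȳ + s²/2) embedded in the H-UCL calculation does not match the simple sum-divided-by-n mean of the same data:

```python
import math

data = [0.4, 0.7, 1.1, 2.5, 6.0, 18.0]   # hypothetical concentrations, mg/kg
sample_mean = sum(data) / len(data)       # the usual sum-of-samples / n estimate

logs = [math.log(x) for x in data]
ybar = sum(logs) / len(logs)
s2 = sum((y - ybar) ** 2 for y in logs) / (len(logs) - 1)
theory_mean = math.exp(ybar + 0.5 * s2)   # lognormal-theory estimate of the mean

print(f"sum/n mean = {sample_mean:.2f}, exp(ybar + s2/2) = {theory_mean:.2f}")
```

When the two estimates disagree materially, the lognormal model underlying the H-UCL is a questionable description of the data.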
Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and it does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA 1999).
In view of these considerations, EPA contractors and consultants have recommended that the H-statistic not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA 1997a) recommended that in environmental applications the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided. These authors concluded that the use of the jackknife method, the bootstrap method, or the Chebychev inequality method "provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less.... [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased toward more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the lower Housatonic River. Consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.
This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al. 1995); in more complex approaches, unbiased estimates of concentration and variance change with distance, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava 1989).

Because kriging is data intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.
The area inside each polygon is then used to weight the samples in statistical calculations, which is, in essence, saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava 1989):

x̄sc = Σ pi ci, summed over i = 1 through n polygons

Where:

x̄sc = area-weighted mean concentration (mg/kg)
ci = the concentration representing the condition within polygon i, where there are i = 1 through n polygons, and
pi = the proportion of the total area that is incorporated in polygon i (unitless)
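The formula above can be applied directly once polygon areas are known. In the sketch below (all areas and concentrations are hypothetical), the weighted mean down-weights a high-concentration sample whose polygon covers only a small fraction of the exposure area:

```python
# Hypothetical Thiessen polygon areas (m^2) and sample concentrations (mg/kg)
polygons = [
    (120.0, 8.0),    # small polygon around a biased, high-concentration sample
    (950.0, 0.5),
    (700.0, 1.2),
    (430.0, 0.9),
]

total_area = sum(area for area, _ in polygons)
# x_sc = sum over polygons of (p_i * c_i), with p_i = area_i / total_area
weighted_mean = sum((area / total_area) * conc for area, conc in polygons)
simple_mean = sum(conc for _, conc in polygons) / len(polygons)

print(f"area-weighted mean = {weighted_mean:.3f} mg/kg, simple mean = {simple_mean:.3f} mg/kg")
```

The simple mean treats all four samples equally; the area weighting restores the proportionality between a sample's influence on the EPC and the fraction of the exposure area it represents.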
This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE, and approved by EPA, for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA 1995), where biased sampling was also conducted. To offset that bias, a nearest neighbor approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then the average concentrations for each grid were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA 1997b).
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the

[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure. (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and it is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would be necessary only in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL that would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and as demonstrated by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or a lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL and
is more technically justifiable. The reduction obtained from this procedure varies according to
the amount of data available for sampling and its variance, but the result is consistently lower than the 95%
H-UCL (EPA 1997a; Schulz and Griffin 1999). The difference between these two measures is
more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is
particularly well-suited to exposure areas, like many along the Lower Housatonic River, where
sampling is limited.
When sample sizes are small, the distribution for the bootstrap run may not appear normal. This
is because the number of samples required in each subsample in order to approach
normality is larger than is available from the site data. This can be mitigated by further
sampling. It should be noted, however, that as the sample size increases, the variance becomes
tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95%
UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95%
UCL that might be obtained from a larger dataset.
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95% UCL than the H-statistic
method, it still does not take into consideration the spatial distribution of the sampling
locations across an exposure area and the biases associated with it. To reflect this spatial
distribution, a modified bootstrapping approach can be used that reflects the locations and densities
of samples collected on each property. This modified bootstrapping approach (spatial
bootstrapping), which incorporates area-weighting into the bootstrapping method, can be used to
calculate more reliable 95% UCL estimates of the EPC for each exposure area.
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with
the bootstrap method discussed above. It proceeds in a series of three steps.
First, each sample is area-weighted using the Thiessen polygon method. The weight given to
each sample is determined by calculating the proportion of the total exposure area that is
represented by the sample's polygon.
Second, the smallest polygon area is identified, and all other polygon areas are normalized with
respect to its size and weighted accordingly in the sampling database. In other words, if the
area of one polygon is five times larger than the smallest polygon, it is given a weight that is five
times greater in the sampling database from which values are selected for the bootstrap. The
normalized areas are rounded to the closest integer. An expanded dataset is then created
from the original dataset to reflect the ratios of each polygon area to the smallest polygon area.
This expansion is accomplished by multiplying the rounded, normalized areas by a selected
value; the resulting values represent an area-weighted number of replicates of the sample
results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area
is 7 and the dataset is expanded using a factor of five, then sample X is represented in the
expanded dataset as 35 (7 × 5) entries.
Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using
between 250 and 500 of the results (depending upon the size of the expanded dataset) for each
iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a
cumulative distribution of estimated means. After sufficient iterations, this cumulative
distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal
to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of
this distribution of means is the 95% UCL.
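The three steps above can be sketched as follows (a minimal illustration under stated assumptions, not Ogden's actual implementation; the four-sample dataset and polygon areas are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def spatial_bootstrap(conc, areas, expand_factor=5, n_per_iter=500, n_iter=5000):
    """Area-weighted (spatial) bootstrap per the three steps above."""
    conc = np.asarray(conc, dtype=float)
    # Steps 1-2: weight each sample by its Thiessen polygon area,
    # normalized to the smallest polygon and rounded to an integer.
    ratios = np.rint(np.asarray(areas, dtype=float) / min(areas)).astype(int)
    # Expanded dataset: each sample replicated (ratio x expand_factor) times.
    expanded = np.repeat(conc, ratios * expand_factor)
    # Step 3: draw means with replacement; the 95th percentile is the 95% UCL,
    # and the median approximates the area-weighted mean of the original data.
    means = np.array([rng.choice(expanded, size=n_per_iter, replace=True).mean()
                      for _ in range(n_iter)])
    return float(np.median(means)), float(np.percentile(means, 95))

# Hypothetical mini-example: 4 samples (ppm) with polygon areas (sq ft).
med, ucl = spatial_bootstrap([0.5, 2.0, 8.0, 20.0], [2000, 4100, 9800, 6000])
```

Here the median of the bootstrapped means converges on the area-weighted mean (9.5 ppm for these inputs), consistent with the 50th-percentile property described above.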
This approach is both consistent with EPA guidance and preferable to the approach currently
proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used
as the EPC in order to correct for the uncertainties and limitations of the sampling database, the
guidance does not dictate which method must be used to calculate the 95% UCL. While EPA
has in the past often relied on the H-statistic, that method need not be used, and recent reports
by EPA contractors and consultants have recommended against use of the H-statistic to avoid
systematic overestimates of the 95% UCL (EPA 1997a; Schulz and Griffin 1999).
The spatial bootstrapping approach is better than the H-statistic approach for several reasons.
First, as has been shown by Singh et al. (EPA 1997a) and Schulz and Griffin (1999),
bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial
bootstrapping allows for consideration of the location and density of the samples for each
exposure area and corrects for the relative contribution of each in the database. Third, this
approach mitigates the statistical impact of a small database. Finally, it can be used regardless of
the distribution of the sampling data. Thus, this approach can be used successfully for all
sampling datasets, including those that fit neither a normal nor a lognormal distribution.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the
calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted
sample calculations for a hypothetical property. For this purpose, Ogden developed a
hypothetical set of sampling data similar to the types of data that might be collected from
a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples
have been collected from a property that is roughly 1,000 by 1,000 feet in size. The
concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic
mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219
square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different
approaches: (1) the EPA approach (arithmetic mean and H-statistic); (2) an area-weighted average; (3)
a bootstrap; and (4) a spatial bootstrap. Results of this analysis are provided in Table 2.
As can be seen in Table 2, the estimated mean values are similar for all four approaches. The
arithmetic mean (13.3 ppm) is identical to the mean obtained from straight bootstrapping; this
is consistent with the Central Limit Theorem (EPA 1997a). The area-weighted average is
identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the
arithmetic and bootstrapped means. This is because some of the sampling locations with high
concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas; thus, they are
weighted more heavily in approaches that include area-weighting.
Estimates of the 95% UCL, by contrast, vary substantially. Using bootstrapping and spatial bootstrapping,
the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL
calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the
bootstrapped and spatially bootstrapped values by a factor of more than six. In fact, the H-UCL
exceeds the maximum value (42.6 ppm) in the sampling database, which means that the
maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that
would need to be used for this exposure area would be the maximum detected concentration of
42.6 ppm. This concentration is clearly not representative of the sampling dataset for this
hypothetical property.
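The pull that large polygons exert on the area-weighted mean can be checked directly on a small subset of values in the same ranges as Table 1 (an illustrative sketch assuming NumPy; not the full 30-sample calculation):

```python
import numpy as np

# Hypothetical subset of floodplain samples: PCB concentration (ppm)
# and Thiessen polygon area (sq ft). Values are illustrative only.
conc = np.array([0.022, 7.6, 15.4, 40.6, 42.6])
area = np.array([138219, 13297, 44231, 119814, 132012])

arith = conc.mean()                        # simple arithmetic mean
weighted = np.average(conc, weights=area)  # area-weighted average
```

Because the two highest-concentration samples here carry the largest polygons, the area-weighted average exceeds the arithmetic mean, the same pattern noted for samples 29 and 30 in Table 1.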
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and
its result is consistent with the findings of Singh et al. (EPA 1997a) and Schulz and Griffin
(1999). These limitations occur despite the fact that the sampling dataset for this hypothetical
property is reasonably large, containing 30 samples, and the range of concentrations is fairly
typical of the types of concentrations likely to be found in floodplain properties along the Lower
Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly
typical concentration gradient, with a few samples showing very low concentrations, a few
showing fairly high concentrations, and the remainder fairly evenly distributed in between. In
addition, some of the largest polygon areas are associated with low-concentration areas, while
others are associated with the highest concentrations. There is nothing about this dataset to
indicate that the samples reported are not representative of the range of concentrations
likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL that substantially exceeds the highest measured
concentration, with the result that the 95% UCL cannot be
used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result
is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly
even distribution of sampling results. These problems will be even more pronounced for certain
real properties along the Lower Housatonic River, where the sampling results may be highly
skewed, dense sampling occurs in areas of high concentration, and/or sample numbers are more
limited. Given the tendency of the H-statistic to overestimate the 95% UCL, it is likely that
maximum concentration values will need to be used for many of those properties. Such an end
result is unsupportable and will result in the overestimation of site risks. It could also result in
the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above, and as demonstrated by EPA contractors and consultants (EPA 1997a; Schulz
and Griffin 1999), use of the H-statistic approach to calculating the 95% UCL in environmental
applications can, and frequently does, produce substantial overestimates of the true mean that are
in no way representative of the concentrations actually contacted by people in an exposure area;
hence, that approach should be avoided. Moreover, this problem cannot be solved by simply
substituting the maximum detected concentration in the area when the calculated 95% UCL exceeds that
maximum. These problems can be avoided through the use of area-weighted averaging, which
takes into consideration the fraction of the exposure area represented by each sample
taken. This approach allows for a more realistic and representative estimate of the level of
exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the
area-weighted averaging approach for the Lower Housatonic River and to collect the data
necessary to support that approach. Use of spatial average concentrations as EPCs is consistent
with EPA policies, procedures, and guidelines (EPA 1989, 1993) and would be consistent with
the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the
Upper ½ Mile Reach of the River and for the Removal Actions Outside the River, including certain
floodplain properties along the River.
However, if EPA rejects this approach (or, for areas where EPA determines that the data are
insufficient to support that approach, decides not to collect additional data) and insists on
estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial
bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more
representative EPCs than the H-statistic method (EPA 1997a; Schulz and Griffin 1999), offsets
many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and
M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of
exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.
EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation
Manual (Part A), Interim Final. U.S. Environmental Protection Agency, Office of Emergency
and Remedial Response, Washington, DC. EPA/540/1-89/002. July.
EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER
Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.
EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the
Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund
Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.
EPA. 1994. A change in the approach for estimating the average and reasonable maximum
exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2.
August.
EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant),
Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc., EPA
Region IV, Contract 68-W9-0055.
EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by
A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of
Research and Development. EPA/600/R-97/006. December.
EPA. 1997b. Record of Decision, Fields Brook Superfund Site. U.S. Environmental Protection
Agency, Region V. EPA/541/R-97/070. August 15.
EPA. 1999. Responsiveness Summary for Allendale School Removal Action, Mile Removal
Action, and Consolidation. U.S. Environmental Protection Agency, Region I. October.
Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New
York.
Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and
variance. Selected Tables in Mathematical Statistics 3:365-410.
Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations
when the data are not normal or lognormal. Risk Analysis 19:577-584.
Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River, Final.
Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft²)    Total PCB (ppm)
 1               138,219                0.022
 2               105,217                0.028
 3               110,214                1.07
 4                31,021                1.63
 5                11,399                1.67
 6                 7,004                2.01
 7                14,268                2.19
 8                12,121                3.25
 9                 3,697                3.77
10                13,297                7.58
11                33,087                7.61
12                65,541                7.62
13                57,494                7.71
14                 9,014                7.99
15                 9,994                8.72
16                 2,123                9.69
17                 2,341                9.72
18                 7,446               12.7
19                 2,613               14.3
20                44,231               14.3
21                 2,362               15.4
22                 8,745               16.3
23                 5,008               18.7
24                18,762               19.1
25                24,296               20.2
26                51,356               30.8
27                12,996               32.9
28                18,019               38.9
29               119,814               40.6
30               132,012               42.6
Table 2. Comparison of Mean and 95% UCL EPC Estimates for the Hypothetical Floodplain Property, Calculated Using the Four Approaches (Arithmetic Mean/H-Statistic, Area-Weighted Average, Bootstrap, and Spatial Bootstrap)
the Region 9 PRGs, but also the other criteria listed on page 6-30 of the Work Plan, as discussed
below.

For dioxins and furans, EPA should use the PRGs for dioxin/furan TEQs that are prescribed in
recent guidance from EPA headquarters for use in addressing these constituents in soil at
CERCLA and RCRA sites (EPA 1998). Those PRGs are 1 ppb for dioxin/furan TEQs in
residential areas (also applicable to recreational areas) and 5-20 ppb for dioxin/furan TEQs in
commercial/industrial areas. The SOW provides specifically for use of these PRGs in screening
out dioxin/furan compounds in soil in Areas Outside the River. EPA should likewise use these
PRGs in selecting COPCs for the Rest of the River. (Indeed, as discussed below, GE believes that
these PRGs should be used not only as screening criteria but also as remedial goals, which is
consistent with the approach embodied in the SOW for Areas Outside the River. See Section B.2
below.)
For other constituents, EPA should follow a step-wise approach that begins with comparisons to
the Region 9 PRGs. Since these PRGs are highly conservative and are specified in the SOW for
use in screening such constituents without further reduction, they should not be reduced by an
additional 10 times. For constituents for which such PRGs do not exist, a similar screening step
should be employed using PRGs for surrogate compounds.²
EPA should then apply the other criteria listed on page 6-30 of the Work Plan, i.e., comparison
to background levels and MCP Method 1 standards. Comparison of concentrations of non-PCB
constituents to site-specific background levels (e.g., concentrations upstream of the GE facility)
is appropriate to ensure that only those constituents attributable to releases from the GE facility
are included in the risk assessment. This step is allowed in EPA's Risk Assessment Guidance for

² Generally, when toxicity criteria have not been derived for particular constituents, it is because there are no reliable toxicity data on which to base them. If such constituents are not screened but are simply carried through to the risk assessment as COPCs, that would unnecessarily complicate the risk assessment and would result in largely meaningless or highly uncertain estimates, given the lack of reliable toxicity data and criteria. Hence, EPA should screen these compounds using PRGs for surrogate compounds, as described in GE's June 3, 1999 comments (p. 6).
Superfund (EPA 1989) and has been accepted by EPA as a basis for selecting non-PCB COPCs
for both soils and sediments in the Upper ½ Mile Reach of the Housatonic River (as stated in the
Upper ½ Mile Reach Work Plan) and for Areas Outside the River (as provided in the SOW). It
is also consistent with the MCP, which provides that if the concentration of a constituent is at or
below background levels, then that constituent will be considered to pose No Significant Risk (310
CMR 40.0902(3)). Similarly, comparison to the MCP Method 1 soil standards is appropriate
because those standards have been developed as conservative risk-based standards (using a target
cancer risk of 1 × 10⁻⁶) that will ensure protection of human health and the environment. Again,
EPA has already accepted comparisons to those standards as a basis for selecting non-PCB COPCs
in soil for the Upper ½ Mile Reach of the River and Areas Outside the River. Moreover, in its
Engineering Evaluation/Cost Analysis (EE/CA) for the 1½ Mile Reach of the River, EPA itself
has used the MCP Method 1 soil standards, as well as background levels, as a basis for evaluating
non-PCB constituents in bank soils (Weston 2000b). Thus, use of comparisons to background
levels and MCP Method 1 standards in selecting COPCs for the Rest of the River is justified by
considerable precedent and will ensure that a consistent approach is employed to address such
constituents throughout the GE-Pittsfield/Housatonic River Site. EPA has presented no sound
basis for departing from that approach for the Rest of the River portion of the Site.
Moreover, use solely of the maximum detected concentrations in selecting COPCs from these
constituents is overly conservative and inappropriate. For example, comparisons to site-specific
background levels should be made using an appropriate statistical technique or summary statistics,
as described in MDEP (1995) guidance, while comparisons to the MCP Method 1 standards are
generally to be made based on average exposure point concentrations (310 CMR 40.0926(3)).
Again, these approaches have been agreed upon by EPA as fully protective for Areas Outside the
River, as set forth in the SOW, and should likewise be used for the Rest of the River to eliminate
constituents that are not likely to contribute to Site-related risks.
Finally, the Work Plan does not explain how the potential for bioaccumulation (discussed on p.
6-31) will be considered or what criteria will be used to examine this. Unless the approach used
to identify and select such COPCs is clearly defined, it could result in the unnecessary inclusion
of certain constituents. EPA should provide supplemental information that outlines the specific
approach and criteria that will be used to identify and select these constituents for inclusion.
2 Risk assessment of dioxins/furans
The Work Plan states that EPA intends to evaluate the potential risks associated with dioxins and
furans by converting them to TEQs (toxic equivalents of 2,3,7,8-TCDD) using Toxicity
Equivalency Factors (TEFs) and then applying the provisional cancer slope factor (CSF) of
150,000 (mg/kg-day)⁻¹ listed in HEAST for 2,3,7,8-TCDD (Section 6.3.3.2.4). As discussed in
detail in GE's June 3, 1999 comments (pp. 19-22), the TEF approach has substantial limitations,
and the provisional CSF of 150,000 (mg/kg-day)⁻¹ is uncertain and unsupportable. A wide range
of CSFs, spanning from 9,000 to 156,000 (mg/kg-day)⁻¹, have been proposed for TCDD under
the linear nonthreshold cancer model, with differences resulting from the selection of the tumor
classification scheme, the interspecies scaling factor used, and the model used for low-dose
extrapolation. Thus, there is enormous uncertainty associated with any selected CSF.
In these circumstances, GE believes that, to the extent possible in this risk assessment, EPA
should evaluate dioxins and furans through application of the dioxin/furan TEQ PRGs set forth
in EPA (1998) guidance and discussed above, using those PRGs as remedial goals as well as
screening levels, rather than conducting forward risk calculations for these compounds. This
approach would avoid the uncertainties associated with selecting a particular CSF for TCDD and
is consistent with the approach adopted by EPA for Areas Outside the River, as described in the
SOW.
However, to the extent that EPA includes dioxins and furans in forward risk calculations, GE
believes that EPA should use a CSF of 30,000 (mg/kg-day)⁻¹. As shown in our June 3, 1999
comments, based on current information, that CSF is more supportable than the CSF of 150,000
(mg/kg-day)⁻¹ specified in the Work Plan.
3 Use of TEF/TEQ approach for dioxin-like PCBs
In the Work Plan, EPA also proposes to use the TEF/TEQ approach for estimating risks of dioxin-like
PCB congeners (Section 6.3.3.2.5). Under this approach, the concentrations of these PCB
congeners are converted to toxic equivalents (TEQs) of 2,3,7,8-TCDD using TEFs, and the
provisional CSF of 150,000 (mg/kg-day)⁻¹ for 2,3,7,8-TCDD is then applied to estimate the risks.
In its June 3, 1999 comments, GE presented several reasons why this approach is not appropriate
or scientifically defensible. In discussions with GE in November 1999, EPA stated that, since it
will make risk management decisions on the basis of total PCB concentrations rather than
congener-specific data, it intends to use this TEF/TEQ approach for PCBs only if it finds a good
correlation between the concentrations of the dioxin-like congeners and concentrations of total
PCBs. There is no indication in the Work Plan, however, that EPA has established a good
correlation between the congener-specific data and the total PCB data. Indeed, in more recent
discussions, EPA has indicated that it has not to date evaluated this issue. If and when EPA makes
such an analysis, GE requests the opportunity to review the analysis and discuss it with EPA. The
absence of such a correlation would itself make the TEQ data irrelevant to risk management
decisions.
In any event, GE continues to believe, for the same reasons given in its June 3, 1999 comments
(pp. 23-25), that application of the TEF/TEQ approach to PCBs is not scientifically supportable.
These reasons include: (a) the lack of sufficient congener-specific data to provide reliable
estimates of true congener concentrations; (b) the double counting of carcinogenic risks that would
result from use of both the TCDD CSF and the PCB CSF in the same assessment; and (c) the
uncertainties and unsupportability of the TEFs for PCB congeners and of the CSF (for 2,3,7,8-TCDD)
to be used to evaluate the resulting TEQs.
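The mechanics of the TEQ conversion at issue can be sketched briefly (congener concentrations are hypothetical; the TEFs shown are the WHO 1998 values for these congeners, cited for illustration only):

```python
# TEQ calculation sketch: each dioxin-like congener concentration is
# multiplied by its TEF and the products are summed into a single
# 2,3,7,8-TCDD toxic-equivalent concentration.
tef = {"PCB-126": 0.1, "PCB-169": 0.01, "PCB-118": 0.0001}   # WHO 1998 TEFs
conc_ppm = {"PCB-126": 0.002, "PCB-169": 0.005, "PCB-118": 1.2}  # hypothetical

teq_ppm = sum(conc_ppm[c] * tef[c] for c in tef)  # TCDD TEQ, ppm
```

The double-counting concern in point (b) follows directly from this arithmetic: if the TCDD CSF is applied to this TEQ while the PCB CSF is also applied to the total PCB concentration, the dioxin-like fraction of the mixture is counted in both risk estimates.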
4 Failure to consider alternative CSFs and RfDs for PCBs
The Work Plan states that EPA intends to use the upper-bound CSF for PCBs of 2 (mg/kg-day)⁻¹
and the Reference Dose (RfD) for Aroclor 1254 of 2 × 10⁻⁵ mg/kg-day, as published in its
Integrated Risk Information System (IRIS) (Sections 6.3.3.2.3 and 6.3.3.3.2). In its June 3, 1999
comments (pp. 16-19 and Appendix A), GE presented substantial reasons why these toxicity values
overestimate the carcinogenic potential and non-carcinogenic impacts of PCBs, and it
recommended the development of alternative toxicity values for PCBs. EPA has made no
reference to this presentation in its Work Plan.
This failure to consider the alternative toxicological information presented by GE is unlawful.
Under a settlement agreement with GE (Settlement Agreement in General Electric Company v.
Browner, No. 93-1251, D.C. Cir., October 25, 1993) and a subsequent EPA guidance
memorandum (EPA 1993), if an outside party questions the use of IRIS values during the course
of an EPA risk assessment and presents alternative toxicological information that may be used in
place of the IRIS values, EPA has an obligation to consider all credible and relevant evidence
before it. There is no indication that EPA has considered the alternative toxicological
information presented by GE. Rather, it has ignored GE's showing and chosen simply to defer
to the current IRIS values (which is precisely the course of conduct that the settlement and
guidance prohibit). EPA should rectify this deficiency by now giving careful consideration to the
alternative toxicological information presented by GE.
5 Use of subchronic RfD for PCBs
The Work Plan states that EPA will consider using the subchronic RfD for PCBs only if it
determines that exposure is intermittent enough to result in an exposure period that is substantially
reduced (p. 6-42). Later, on page 6-43, the Work Plan states that this will occur when the total
exposure period is substantially below the 7-year period that is considered by EPA to represent
chronic exposure. There is no further discussion of this issue, so it is not clear what the Agency
will use as a benchmark to identify intermittent exposure.
Many of the proposed exposure scenarios are likely to be extremely intermittent in nature. For
example, in developing the SRBC for utility workers, EPA assumes that exposure occurs 5
days/year for a period of 25 years. Clearly, 5 days/year is intermittent exposure. In fact, even
when considered over the entire 25-year exposure period, the total exposure consists of 125 days,
which amounts to about 1/3 of a year, or less than 5 percent of the 7-year chronic exposure period. This
exposure would certainly be considered intermittent, despite the fact that the total exposure period
lasts for 25 years.
As a second example, in developing its SRBCs for recreational exposures, EPA has assumed that
children aged 1-6 are exposed for six years during recreational activities. Even using the highly
conservative exposure frequency for high-contact areas of 3 days/week for a period of 7 months
per year (a total of 84 days/year, or 504 days for the entire exposure period), the number of days
exposed is equivalent to 1.4 years, or less than 20 percent of the available days during the 7-year
chronic exposure period. Nevertheless, as stated on page 6-42, EPA does not intend to use the
subchronic RfD to evaluate noncancer hazards to young children.
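The arithmetic behind both examples can be verified directly (a simple sketch; the day counts are those assumed in the Work Plan scenarios as described above):

```python
DAYS_PER_YEAR = 365.0
CHRONIC_YEARS = 7.0  # chronic-exposure benchmark cited in the Work Plan

def fraction_of_chronic(days_per_year, years):
    """Total exposure time, in years, as a fraction of the 7-year chronic period."""
    total_years = days_per_year * years / DAYS_PER_YEAR
    return total_years / CHRONIC_YEARS

utility_worker = fraction_of_chronic(5, 25)   # 125 days total, under 5% of 7 years
child_recreator = fraction_of_chronic(84, 6)  # 504 days total, under 20% of 7 years
```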
In short, the Work Plan states that EPA will use the subchronic RfD when exposures during a
given scenario are sufficiently intermittent to result in a total exposure that is far less than the
7-year chronic exposure period identified by EPA. It is clear, however, that even for scenarios
where this intermittent exposure is likely, EPA does not intend to incorporate the subchronic RfD.
EPA should reconsider its position on the use of the subchronic RfD and incorporate it into those
scenarios where it is relevant.
6 Inappropriate use of a single exposure scenario for all direct-contact recreational activities
The Work Plan provides that, while consumption of fish and game will be evaluated separately,
all direct-contact recreational activities will be evaluated using a single exposure scenario and a
single set of exposure assumptions (pp. 6-52 - 6-53). These activities include hiking, walking,
canoeing, picnicking, swimming, wading, and dirt biking, as well as fishing and hunting (but not
consumption of fish or game). In its June 3, 1999 comments (pp. 11-12, 29-30), GE explained
that use of a single exposure scenario and set of assumptions for such diverse activities is
inappropriate because it will force EPA either to use the most conservative assumptions for any
of these activities (which would clearly result in grossly overestimated exposure estimates) or to
pick compromise parameters that will not be truly representative of any of these activities (which
would be subject to considerable uncertainty). Based on discussions with EPA in the fall of 1999,
GE had understood that the Agency recognized this problem and intended to consider further
whether and how to divide this single direct-contact recreational scenario into a number of more
specific scenarios (e.g., hiking/walking, canoeing, picnicking, swimming/wading). However, the
Work Plan does not do that. GE again urges EPA to develop such specific scenarios so as to take
account of the significant differences in exposure parameters among these types of activities. In
doing so, actual recreational exposures will be more closely approximated, and more realistic and
appropriate cleanup goals can be developed.
7 Lack of information about specific exposure assumptions to be used
As in the prior draft, the Work Plan does not present the specific exposure parameter values and
equations to be used in EPA's risk assessment, but simply states that they will be developed based
on EPA and MDEP guidance along with site-specific data (Section 6.3.4.5). GE requested such
information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has
advised GE that the Agency's risk assessment staff will hold a series of technical meetings with
GE representatives and consultants and, at such meetings, will be prepared to discuss the most
appropriate exposure assumptions and parameter values to be used. GE reiterates its request for
such meetings and looks forward to discussing these issues with EPA.
8 Use of probabilistic risk analysis
In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that
probabilistic approaches such as Monte Carlo analysis will be considered for those COPCs and
exposure pathways that result in significant risk estimates. However, it does not mention
probabilistic risk assessment techniques in connection with any of the main risk assessments. In
prior discussions with GE, EPA indicated that one of the principal reasons why it collected a large
amount of fish tissue PCB data is for use in a probabilistic risk assessment of fish consumption.
As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic
exposure model in the main risk analysis of the fish consumption pathway. However, the Work
Plan presents no discussion of, or plans for conducting, such an assessment. In fact, EPA does not
even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk
assessment guidance documents and information sources that EPA intends to use in conducting the
HHRA.
During the last few years, EPA's policies and guidelines have focused on improving risk
management by presenting decision-makers with the entire range of possible risks rather than a
- 16 shy
single point estimate (EPA 1992ab 1995ab 1997ab 1999c) The new policy states that
numerical risk assessments should be accompanied by a full characterization of the uncertainties
limitations and assumptions in the risk assessment The use of two-dimensional (2-D)
probabilistic analysis such as Microexposure analysis can more effectively characterize the
impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed
population by considering time-dependent changes
Microexposure analysis has been used by EPA and by independent researchers to simulate duration of residential exposure (EPA 1992c; Johnson and Capel 1992; Sielken 1994). It also has been used to evaluate childhood exposures to lead (Goodrum et al. 1994), exposure to contaminants in tap water (Harrington et al. 1995), and exposure to dioxins from the consumption of freshwater fish (Keenan et al. 1993a,b, 1995, 1996a, 1997a,b). Microexposure analysis was employed in the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon Group 1994; Keenan et al. 1996b). In addition, Microexposure analysis has been described in EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A (External Review Draft) as a viable alternative for modeling time-dependent variability in concentrations, daily activity patterns, and other behavioral exposure factors (EPA 1999c).
Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of probabilistic risk analysis through application to the fish ingestion pathway at the Sangamo Weston/Lake Hartwell Superfund site (Simon 1999). Simon (1999) found that the Microexposure event model gave risk estimates that were approximately an order of magnitude lower than those derived using deterministic and other probabilistic approaches. He attributed this decrease to the power of the method to incorporate changes over time.
GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model, to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in Appendix E of its draft guidance for conducting probabilistic risk assessments (EPA 1999c), some types of exposures, including fish consumption, can be expected to vary over time. Anglers may change their activity levels and their fish consumption behaviors as they age. In addition, different species and sizes of fish may be consumed from a variety of locations on the river over time, and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates, and will provide information that will allow better informed risk management decisions to be made (EPA 1999c).
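To make the two-dimensional structure concrete, the sketch below nests a variability loop (simulated anglers whose meal rates and fish tissue concentrations are redrawn for each year of exposure) inside an uncertainty loop (an imprecisely known long-term mean concentration). This is an illustrative sketch only: every distribution and parameter value is a hypothetical placeholder, not a value proposed by GE, EPA, or the Work Plan.

```python
import numpy as np

rng = np.random.default_rng(7)

N_UNCERT = 100   # outer loop: uncertainty about population parameters
N_VAR = 500      # inner loop: variability among individual anglers

BW = 70.0          # body weight, kg (illustrative)
AT = 70.0 * 365.0  # lifetime averaging time, days (illustrative)

p95_dose = np.empty(N_UNCERT)
for i in range(N_UNCERT):
    # Uncertain parameter drawn once per outer iteration: the true
    # long-term mean fillet PCB concentration (mg/kg, hypothetical).
    mean_conc = max(rng.normal(1.0, 0.2), 0.01)

    dose = np.empty(N_VAR)
    for j in range(N_VAR):
        years = rng.integers(1, 31)  # exposure duration, years
        # Time-dependent inputs: meal rate and tissue concentration
        # are redrawn for every year of the angler's exposure period.
        meals = rng.lognormal(np.log(12.0), 0.5, size=years)       # meals/yr
        conc = rng.lognormal(np.log(mean_conc), 0.4, size=years)   # mg/kg
        intake_mg = (conc * meals * 0.227).sum()  # assumes 227 g per meal
        dose[j] = intake_mg / (BW * AT)           # lifetime avg, mg/kg-day

    # High-end (95th percentile) dose for this uncertainty realization
    p95_dose[i] = np.percentile(dose, 95)

# Distribution of the high-end dose across uncertainty realizations
print(np.percentile(p95_dose, [5, 50, 95]))
```

The output is a distribution of high-end doses rather than a single point estimate, which is the kind of characterization the EPA policies cited above call for.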
9. Use of 95% UCLs or maximum values as exposure point concentrations

The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% UCL of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).
GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment and demonstrated by EPA contractors and consultants (EPA 1997c; Schulz and Griffin 1999), use of the H-statistic to derive a 95% UCL can, and frequently does, produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.
Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½ Mile Reach and for Areas Outside the River: a spatial averaging approach. That approach will (a) allow calculation of EPCs that are much more truly representative of actual exposures than use of EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper ½ Mile Reach and Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling, if necessary, to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.
However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A and demonstrated by Singh et al. (EPA 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.
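One plausible formulation of such a spatial bootstrap is sketched below; the area-proportional weighting scheme and the example values are illustrative assumptions on our part, and the details of the approach GE proposes are set out in Attachment A.

```python
import numpy as np

def spatial_bootstrap_ucl95(concs, areas, n_boot=10_000, seed=0):
    """Illustrative spatial bootstrap: resample locations with
    probability proportional to the area each sampling location
    represents, then report the 95th percentile of the resampled
    means as the UCL of the spatially weighted mean."""
    rng = np.random.default_rng(seed)
    concs = np.asarray(concs, dtype=float)
    weights = np.asarray(areas, dtype=float)
    p = weights / weights.sum()
    idx = rng.choice(concs.size, size=(n_boot, concs.size), replace=True, p=p)
    return float(np.percentile(concs[idx].mean(axis=1), 95))

# Hypothetical data: high concentrations clustered in a small area
concs = [50.0, 40.0, 60.0, 2.0, 1.5]          # mg/kg
areas = [100.0, 150.0, 50.0, 900.0, 800.0]    # area each sample represents
ucl = spatial_bootstrap_ucl95(concs, areas)
```

Because each resample is weighted by location, samples clustered in a small contaminated area do not dominate the UCL the way they would under equal weighting.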
References
EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A). Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1990. National Contingency Plan. US Environmental Protection Agency, Washington, DC.

EPA. 1992a. Final Guidelines for Exposure Assessment; Notice. US Environmental Protection Agency, Washington, DC. 57 Federal Register 14: 22888-22938. May 29.

EPA. 1992b. Guidance on Risk Characterization for Risk Managers and Risk Assessors. Memorandum from F. Henry Habicht II, Deputy Administrator, to Assistant Administrators and Regional Administrators. Office of the Administrator, Washington, DC.
EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General US Population. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

EPA. 1993. Memo from W.H. Farland and H.L. Longest to Regional Directors. Re: Use of IRIS values in Superfund risk assessments. US Environmental Protection Agency, Washington, DC. OSWER Directive 9285.7-16. December 21.
EPA. 1995a. Policy for Risk Characterization at the US Environmental Protection Agency. US Environmental Protection Agency, Office of the Administrator, Washington, DC. March.

EPA. 1995b. Guidance for Risk Characterization. US Environmental Protection Agency, Science Policy Council, Washington, DC. February.

EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the US Environmental Protection Agency. US Environmental Protection Agency, Office of Research and Development, Washington, DC. May 15.

EPA. 1997b. Guiding Principles for Monte Carlo Analysis. US Environmental Protection Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 p.

EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1998. Memo from Timothy Fields, Acting Administrator, to Regional Directors. Re: Approach for addressing dioxin in soil at CERCLA and RCRA sites. US Environmental Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.
EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, US EPA, to Jane Magee, General Electric Company. Re: Pittsfield 1-0147, EPA Area 6, Housatonic River Human Health Risk Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader. Re: Protectiveness of Cleanup Levels for Removal Actions Outside the River - Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, US Dist. Court for Dist. of Mass.). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual. Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2. Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3, Part A: Process for Conducting Probabilistic Risk Assessment. Draft, Revision 5. United States Environmental Protection Agency, Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the US Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition. December 4-7, Baltimore, MD.
Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA4203. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.

Johnson, T. and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General US Population. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA, Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds, Proceedings Dioxin '96: 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen 30:61-65.
Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC. October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meetings Proceedings. Abstract.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition. Abstract.

MDEP. 1995. Guidance for Disposal Site Risk Characterization in Support of the Massachusetts Contingency Plan. Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19. November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD. December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4):823-843.
Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 11.
ATTACHMENT A
A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS
Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co., Inc.
Introduction
The US Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets where no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and on their representation of the actual levels of exposures that are likely to occur under various scenarios.
There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of actual levels of exposure that will occur within a given exposure area.
GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will result in more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic, namely the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.
Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL
In 1992, EPA issued a guidance document (EPA 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.
The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region 1 (EPA 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.
For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA 1992):

UCL = exp(x̄ + 0.5s² + sH/√(n−1))

Where:

UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic accounting for interaction of the distribution, developed by Land (1975)
n = number of analytical samples
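Evaluated numerically, the H-UCL calculation can be sketched as follows. The H value must be drawn from Land's (1975) tables for the sample size and log-scale standard deviation at hand; the H value and the concentrations below are placeholders chosen for illustration.

```python
import math

def h_ucl(data, H):
    """Land's H-UCL on the arithmetic mean for lognormal data.

    data : untransformed sample concentrations
    H    : H-statistic from Land's (1975) tables for this sample
           size and log-scale standard deviation (caller-supplied;
           the tables themselves are not reproduced here)
    """
    logs = [math.log(x) for x in data]
    n = len(logs)
    ybar = sum(logs) / n                              # mean of log data
    s2 = sum((y - ybar) ** 2 for y in logs) / (n - 1) # variance of log data
    return math.exp(ybar + 0.5 * s2 + math.sqrt(s2) * H / math.sqrt(n - 1))

# Hypothetical skewed PCB dataset (mg/kg); H = 3.0 is a placeholder,
# not a value taken from Land's tables.
data = [0.2, 0.5, 0.8, 1.1, 2.4, 3.9, 7.5, 21.0]
ucl = h_ucl(data, H=3.0)
mean = sum(data) / len(data)
```

For this skewed sample the log-scale standard deviation is about 1.5, and the computed H-UCL (roughly 32 mg/kg) exceeds not only the arithmetic mean (about 4.7 mg/kg) but also the maximum observation of 21 mg/kg, illustrating the inflation described in the text.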
Frequently, the H-statistic calculates extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.
Use of the H-statistic also does not calculate a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers" and not necessarily due to lognormally distributed data. Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.
Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed" when the standard deviation of the log-transformed data starts exceeding 1.0. These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.
Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when an exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.
Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA 1999).
In view of these considerations, EPA contractors and consultants have recommended that the H-statistic should not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA 1997a) recommended that in environmental applications the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided. These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. ... [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the lower Housatonic River. Consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL, because it is unlikely to be representative of the overall exposure area.
This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al. 1995); in more complex approaches, unbiased estimates of concentration and variance change with distance, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava 1989).
Because kriging is data intensive and unlikely to be feasible for many of the exposure areas in the Rest of the River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.
The area inside each polygon is then used to weight the samples in statistical calculations, which is, in essence, saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava 1989):

x̄_sc = Σᵢ pᵢCᵢ

Where:

x̄_sc = area-weighted mean concentration (mg/kg)
Cᵢ = the concentration representing the condition within polygon i, where there are i = 1 through n polygons, and
pᵢ = the proportion of the total area that is incorporated in polygon i (unitless)
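As a minimal sketch of the calculation (with hypothetical concentrations and polygon areas), the area-weighted mean is simply each concentration weighted by its polygon's share of the total area:

```python
def area_weighted_mean(samples):
    """samples: (concentration mg/kg, polygon area) pairs.
    Each concentration is weighted by its polygon's share of the
    total exposure area (the p_i in the formula above)."""
    total_area = sum(area for _, area in samples)
    return sum(conc * area / total_area for conc, area in samples)

# Hypothetical exposure area: three of four samples were taken in a
# small contaminated corner, biasing the simple arithmetic mean high.
samples = [(50.0, 100.0), (40.0, 150.0), (60.0, 50.0), (2.0, 1700.0)]
simple_mean = sum(c for c, _ in samples) / len(samples)   # 38.0 mg/kg
spatial_mean = area_weighted_mean(samples)                # 8.7 mg/kg
```

Here the simple mean is driven by the clustered samples, while the area-weighted mean reflects how much of the exposure area each sample actually represents.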
This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then the average concentrations for each grid were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA 1997b).
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the:

[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure. (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of the extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and demonstrated by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process whereby a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
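The resampling procedure described above can be sketched in a few lines. This is an illustrative implementation only, not code from the Work Plan, and the dataset shown is hypothetical:

```python
import random

def bootstrap_ucl95(data, n_iter=5000, seed=0):
    """Percentile-bootstrap 95% UCL of the mean: resample the dataset with
    replacement many times, archive each resampled mean, and take the 95th
    percentile of the resulting distribution of means."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_iter):
        resample = [rng.choice(data) for _ in data]  # sampling with replacement
        means.append(sum(resample) / len(resample))
    means.sort()
    return means[int(0.95 * n_iter) - 1]

# Illustrative, right-skewed dataset (hypothetical concentrations in ppm)
data = [0.5, 1.2, 2.0, 3.3, 4.1, 7.8, 9.5, 12.7, 18.4, 30.2]
ucl = bootstrap_ucl95(data)
print(round(ucl, 2))  # falls above the arithmetic mean but below the maximum
```

Because the resampled means are averages, their distribution tightens around the true mean as the dataset grows, which is why the resulting UCL stays well below the maximum observed value.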
This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA 1997a; Schulz and Griffin 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of samples collected on each property. This modified bootstrapping approach (spatial bootstrapping), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7 and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.
Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
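The three steps above can be sketched as follows. The polygon areas are assumed to have been delineated already (the Thiessen construction itself is omitted), and all numeric values shown are hypothetical:

```python
import random

def spatial_bootstrap_ucl95(areas, concs, expansion=5, n_draw=250,
                            n_iter=5000, seed=0):
    """Area-weighted (spatial) bootstrap of the 95% UCL, following the three
    steps described in the text. `areas` are Thiessen-polygon areas (assumed
    already delineated) and `concs` are the matching sample concentrations."""
    # Steps 1-2: normalize each polygon area to the smallest one, round to the
    # closest integer, and expand the dataset by a selected factor so that
    # each sample result appears (normalized area x expansion) times.
    smallest = min(areas)
    expanded = []
    for area, conc in zip(areas, concs):
        expanded.extend([conc] * (round(area / smallest) * expansion))
    # Step 3: draw n_draw values with replacement per iteration, archive the
    # mean, repeat n_iter times, and take the 95th percentile of the means.
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choice(expanded) for _ in range(n_draw)) / n_draw
        for _ in range(n_iter)
    )
    return means[int(0.95 * n_iter) - 1]

# Hypothetical example: the 14,000 ft2 polygon is 7x the smallest, so its
# sample appears 35 (7 x 5) times in the expanded dataset, as in the text.
areas = [2000.0, 6100.0, 14000.0, 3900.0]   # illustrative polygon areas, ft2
concs = [0.5, 4.2, 9.8, 2.1]                # illustrative concentrations, ppm
print(round(spatial_bootstrap_ucl95(areas, concs), 2))
```

The median of the archived means converges on the area-weighted average of the original dataset, which is what ties this procedure back to the spatial averaging approach GE advocates.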
This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA 1997a; Schulz and Griffin 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) EPA's approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.
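The first two of these approaches reduce to a few lines of arithmetic on the Table 1 data. The sketch below assumes the polygon areas and concentrations are paired in sample-number order:

```python
# Polygon areas (ft2) and total PCB (ppm) for the 30 hypothetical samples,
# paired in sample-number order as in Table 1.
areas = [138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121, 3697,
         13297, 33087, 65541, 57494, 9014, 9994, 2123, 2341, 7446, 2613,
         44231, 2362, 8745, 5008, 18762, 24296, 51356, 12996, 18019,
         119814, 132012]
concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
         7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
         15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6]

arith_mean = sum(concs) / len(concs)  # 13.3 ppm, as reported in the text
# Each concentration weighted by its polygon's share of the total area
area_weighted = sum(a * c for a, c in zip(areas, concs)) / sum(areas)

print(round(arith_mean, 1), round(area_weighted, 1))
```

The area-weighted mean comes out somewhat higher than the arithmetic mean because samples 29 and 30 pair high concentrations with large polygons, consistent with the discussion of Table 2 below.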
As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.
Estimates of the 95% UCL, however, vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL that exceeds the bootstrapped estimates by a factor of more than six and exceeds the highest measured concentration, with the result that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above, and demonstrated by EPA contractors and consultants (EPA 1997a; Schulz and Griffin 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA 1989, 1993) and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.
However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA 1997a; Schulz and Griffin 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant), Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision, Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H. and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River, Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
      1               138,219               0.022
      2               105,217               0.028
      3               110,214               1.07
      4                31,021               1.63
      5                11,399               1.67
      6                 7,004               2.01
      7                14,268               2.19
      8                12,121               3.25
      9                 3,697               3.77
     10                13,297               7.58
     11                33,087               7.61
     12                65,541               7.62
     13                57,494               7.71
     14                 9,014               7.99
     15                 9,994               8.72
     16                 2,123               9.69
     17                 2,341               9.72
     18                 7,446              12.7
     19                 2,613              14.3
     20                44,231              14.3
     21                 2,362              15.4
     22                 8,745              16.3
     23                 5,008              18.7
     24                18,762              19.1
     25                24,296              20.2
     26                51,356              30.8
     27                12,996              32.9
     28                18,019              38.9
     29               119,814              40.6
     30               132,012              42.6
[Table 2. Mean and 95% UCL EPC estimates for the hypothetical floodplain property, calculated using the four approaches (table contents not recoverable from this copy)]
Superfund (EPA 1989) and has been accepted by EPA as a basis for selecting non-PCB COPCs for both soils and sediments in the Upper ½ Mile Reach of the Housatonic River (as stated in the Upper ½ Mile Reach Work Plan) and for Areas Outside the River (as provided in the SOW). It is also consistent with the MCP, which provides that if the concentration of a constituent is at or below background levels, then that constituent will be considered to pose No Significant Risk (310 CMR 40.0902(3)). Similarly, comparison to the MCP Method 1 soil standards is appropriate because those standards have been developed as conservative risk-based standards (using a target cancer risk of 1 x 10-6) that will ensure protection of human health and the environment. Again, EPA has already accepted comparisons to those standards as a basis for selecting non-PCB COPCs in soil for the Upper ½ Mile Reach of the River and Areas Outside the River. Moreover, in its Engineering Evaluation/Cost Analysis (EE/CA) for the 1½ Mile Reach of the River, EPA itself has used the MCP Method 1 soil standards, as well as background levels, as a basis for evaluating non-PCB constituents in bank soils (Weston 2000b). Thus, use of comparisons to background levels and MCP Method 1 standards in selecting COPCs for the Rest of the River is justified by considerable precedent and will ensure that a consistent approach is employed to address such constituents throughout the GE-Pittsfield/Housatonic River Site. EPA has presented no sound basis for departing from that approach for the Rest of the River portion of the Site.
Moreover, use solely of the maximum detected concentrations in selecting COPCs from these constituents is overly conservative and inappropriate. For example, comparisons to site-specific background levels should be made using an appropriate statistical technique or summary statistics, as described in MDEP (1995) guidance, while comparisons to the MCP Method 1 standards are generally to be made based on average exposure point concentrations (310 CMR 40.0926(3)). Again, these approaches have been agreed upon by EPA as fully protective for Areas Outside the River, as set forth in the SOW, and should likewise be used for the Rest of the River to eliminate constituents that are not likely to contribute to Site-related risks.
Finally, the Work Plan does not explain how the potential for bioaccumulation (discussed on p. 6-31) will be considered or what criteria will be used to examine this. Unless the approach used to identify and select such COPCs is clearly defined, it could result in the unnecessary inclusion of certain constituents. EPA should provide supplemental information that outlines the specific approach and criteria that will be used to identify and select these constituents for inclusion.
2. Risk assessment of dioxins/furans

The Work Plan states that EPA intends to evaluate the potential risks associated with dioxins and furans by converting them to TEQs (toxic equivalents of 2,3,7,8-TCDD) using Toxicity Equivalency Factors (TEFs), and then applying the provisional cancer slope factor (CSF) of 150,000 (mg/kg-day)-1 listed on HEAST for 2,3,7,8-TCDD (Section 6.3.3.2.4). As discussed in detail in GE's June 3, 1999 comments (pp. 19-22), the TEF approach has substantial limitations, and the provisional CSF of 150,000 (mg/kg-day)-1 is uncertain and unsupportable. A wide range of CSFs, spanning from 9,000 to 156,000 (mg/kg-day)-1, have been proposed for TCDD under the linear nonthreshold cancer model, with differences resulting from the selection of the tumor classification scheme, the interspecies scaling factor used, and the model used for low-dose extrapolation. Thus, there is enormous uncertainty associated with any selected CSF.
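The TEF/TEQ conversion itself is simple arithmetic, which makes the CSF the dominant source of uncertainty. The sketch below uses hypothetical congener concentrations and illustrative TEF values; only the 2,3,7,8-TCDD TEF of 1.0 is fixed by definition:

```python
# Hypothetical congener concentrations (ppm) paired with illustrative TEFs;
# the applicable TEFs would come from the adopted equivalency scheme.
congeners = {
    "2,3,7,8-TCDD": (1.0e-5, 1.0),
    "congener_A":   (4.0e-5, 0.1),    # hypothetical congener and TEF
    "congener_B":   (2.0e-4, 0.01),   # hypothetical congener and TEF
}

# TEQ: each concentration is scaled by its TEF and the products are summed
teq_ppm = sum(conc * tef for conc, tef in congeners.values())

# The choice of CSF then scales the resulting risk estimate directly: the
# 150,000 and 30,000 (mg/kg-day)-1 values discussed in the text differ by a
# factor of five, and so would the cancer risks computed from the same TEQ.
csf_ratio = 150000.0 / 30000.0
print(teq_ppm, csf_ratio)
```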
In these circumstances, GE believes that, to the extent possible in this risk assessment, EPA should evaluate dioxins and furans through application of the dioxin/furan TEQ PRGs set forth in EPA (1998) guidance and discussed above, using those PRGs as remedial goals as well as screening levels, rather than conducting forward risk calculations for these compounds. This approach would avoid the uncertainties associated with selecting a particular CSF for TCDD and is consistent with the approach adopted by EPA for Areas Outside the River, as described in the SOW.

However, to the extent that EPA includes dioxins and furans in forward risk calculations, GE believes that EPA should use a CSF of 30,000 (mg/kg-day)-1. As shown in our June 3, 1999 comments, based on current information, that CSF is more supportable than the CSF of 150,000 (mg/kg-day)-1 specified in the Work Plan.
3. Use of TEF/TEQ approach for dioxin-like PCBs

In the Work Plan, EPA also proposes to use the TEF/TEQ approach for estimating risks of dioxin-like PCB congeners (Section 6.3.3.2.5). Under this approach, the concentrations of these PCB congeners are converted to toxic equivalents (TEQs) of 2,3,7,8-TCDD using TEFs, and the provisional CSF of 150,000 (mg/kg-day)-1 for 2,3,7,8-TCDD is then applied to estimate the risks. In its June 3, 1999 comments, GE presented several reasons why this approach is not appropriate or scientifically defensible. In discussions with GE in November 1999, EPA stated that, since it will make risk management decisions on the basis of total PCB concentrations rather than congener-specific data, it intends to use this TEF/TEQ approach for PCBs only if it finds a good correlation between the concentrations of the dioxin-like congeners and concentrations of total PCBs. There is no indication in the Work Plan, however, that EPA has established a good correlation between the congener-specific data and the total PCB data. Indeed, in more recent discussions, EPA has indicated that it has not to date evaluated this issue. If and when EPA makes such an analysis, GE requests the opportunity to review the analysis and discuss it with EPA. The absence of such a correlation would itself make the TEQ data irrelevant to risk management decisions.

In any event, GE continues to believe, for the same reasons given in its June 3, 1999 comments (pp. 23-25), that application of the TEF/TEQ approach to PCBs is not scientifically supportable. These reasons include: (a) the lack of sufficient congener-specific data to provide reliable estimates of true congener concentrations; (b) the double counting of carcinogenic risks that would result from use of both the TCDD CSF and the PCB CSF in the same assessment; and (c) the uncertainties and unsupportability of the TEFs for PCB congeners and of the CSF (for 2,3,7,8-TCDD) to be used to evaluate the resulting TEQs.
4. Failure to consider alternative CSFs and RfDs for PCBs

The Work Plan states that EPA intends to use the upper-bound CSF for PCBs of 2 (mg/kg-day)-1 and the Reference Dose (RfD) for Aroclor 1254 of 2 x 10-5 mg/kg-day, as published in its Integrated Risk Information System (IRIS) (Sections 6.3.3.2.3 and 6.3.3.3.2). In its June 3, 1999 comments (pp. 16-19 and Appendix A), GE presented substantial reasons why these toxicity values overestimate the carcinogenic potential and non-carcinogenic impacts of PCBs, and it recommended the development of alternative toxicity values for PCBs. EPA has made no reference to this presentation in its Work Plan.

This failure to consider the alternative toxicological information presented by GE is unlawful. Under a settlement agreement with GE (Settlement Agreement in General Electric Company v. Browner, No. 93-1251, D.C. Cir., October 25, 1993) and a subsequent EPA guidance memorandum (EPA 1993), if an outside party questions the use of IRIS values during the course of an EPA risk assessment and presents alternative toxicological information that may be used in place of the IRIS values, EPA has an obligation to consider all credible and relevant evidence before it. There is no indication that EPA has considered the alternative toxicological information presented by GE. Rather, it has ignored GE's showing and chosen simply to defer to the current IRIS values (which is precisely the course of conduct which the settlement and guidance prohibit). EPA should rectify this deficiency by now giving careful consideration to the alternative toxicological information presented by GE.
5. Use of subchronic RfD for PCBs

The Work Plan states that EPA will consider using the subchronic RfD for PCBs only if it determines that exposure is intermittent enough to result in an exposure period that is substantially reduced (p. 6-42). Later, on page 6-43, the Work Plan states that this will occur when the total exposure period is substantially below the 7-year period that is considered by EPA to represent chronic exposure. There is no further discussion of this issue, so it is not clear what the Agency will use as a benchmark to identify intermittent exposure.

Many of the proposed exposure scenarios are likely to be extremely intermittent in nature. For example, in developing the SRBC for utility workers, EPA assumes that exposure occurs 5 days/year for a period of 25 years. Clearly, 5 days/year is intermittent exposure. In fact, even when considered over the entire 25-year exposure period, the total exposure consists of 125 days, which amounts to about 1/3 of a year, or less than 5 percent of the 7-year chronic exposure period. This exposure would certainly be considered intermittent, despite the fact that the total exposure period lasts for 25 years.
As a second example, in developing its SRBCs for recreational exposures, EPA has assumed that children aged 1-6 are exposed for six years during recreational activities. Even using the highly conservative exposure frequency for high contact areas of 3 days/week for a period of 7 months per year (a total of 84 days/year, or 504 days for the entire exposure period), the number of days exposed is equivalent to 1.4 years, or less than 20 percent of the available days during the 7-year chronic exposure period. Nevertheless, as stated on page 6-42, EPA does not intend to use the subchronic RfD to evaluate noncancer hazards to young children.
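The arithmetic behind both examples can be checked directly; the sketch below simply restates the scenario parameters given in the text:

```python
CHRONIC_YEARS = 7.0  # EPA's benchmark chronic exposure period per the Work Plan

def fraction_of_chronic(days_per_year, years):
    """Total exposure expressed as a fraction of the 7-year chronic period."""
    return (days_per_year * years / 365.0) / CHRONIC_YEARS

# Utility worker: 5 days/year for 25 years = 125 days in total
utility = fraction_of_chronic(5, 25)
# Child recreator: 3 days/week for 7 months (~84 days/year) over 6 years
child = fraction_of_chronic(84, 6)

print(round(utility * 100, 1), round(child * 100, 1))  # 4.9 19.7 (percent)
```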
In short, the Work Plan states that EPA will use the subchronic RfD when exposures during a given scenario are sufficiently intermittent to result in a total exposure that is far less than the 7-year chronic exposure period identified by EPA. It is clear, however, that even for scenarios where this intermittent exposure is likely, EPA does not intend to incorporate the subchronic RfD. EPA should reconsider its position on the use of the subchronic RfD and incorporate it into those scenarios where it is relevant.
6. Inappropriate use of a single exposure scenario for all direct-contact recreational activities

The Work Plan provides that, while consumption of fish and game will be evaluated separately, all direct-contact recreational activities will be evaluated using a single exposure scenario and a single set of exposure assumptions (pp. 6-52 - 6-53). These activities include hiking, walking, canoeing, picnicking, swimming, wading, and dirt biking, as well as fishing and hunting (but not consumption of fish or game). In its June 3, 1999 comments (pp. 11-12, 29-30), GE explained that use of a single exposure scenario and set of assumptions for such diverse activities is inappropriate because it will force EPA either to use the most conservative assumptions for any of these activities (which would clearly result in grossly overestimated exposure estimates) or to pick compromise parameters that will not be truly representative of any of these activities (which would be subject to considerable uncertainty). Based on discussions with EPA in the fall of 1999, GE had understood that the Agency recognized this problem and intended to consider further whether and how to divide this single direct-contact recreational scenario into a number of more specific scenarios (e.g., hiking/walking, canoeing, picnicking, swimming/wading). However, the Work Plan does not do that. GE again urges EPA to develop such specific scenarios so as to take account of the significant differences in exposure parameters among these types of activities. In doing so, actual recreational exposures will be more closely approximated, and more realistic and appropriate cleanup goals can be developed.
7. Lack of information about specific exposure assumptions to be used

As in the prior draft, the Work Plan does not present the specific exposure parameter values and equations to be used in EPA's risk assessment, but simply states that they will be developed based on EPA and MDEP guidance along with site-specific data (Section 6.3.4.5). GE requested such information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has advised GE that the Agency's risk assessment staff will hold a series of technical meetings with GE representatives and consultants, and at such meetings will be prepared to discuss the most appropriate exposure assumptions and parameter values to be used. GE reiterates its request for such meetings and looks forward to discussing these issues with EPA.
8. Use of probabilistic risk analysis

In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that probabilistic approaches such as Monte Carlo analysis will be considered for those COPCs and exposure pathways that result in significant risk estimates. However, it does not mention probabilistic risk assessment techniques in connection with any of the main risk assessments. In prior discussions with GE, EPA indicated that one of the principal reasons why it collected a large amount of fish tissue PCB data is for use in a probabilistic risk assessment of fish consumption. As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic exposure model in the main risk analysis of the fish consumption pathway. However, the Work Plan presents no discussion of, or plans for conducting, such an assessment. In fact, EPA does not even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk assessment guidance documents and information sources that EPA intends to use in conducting the HHRA.
During the last few years, EPA's policies and guidelines have focused on improving risk management by presenting decision-makers with the entire range of possible risks rather than a single point estimate (EPA 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that numerical risk assessments should be accompanied by a full characterization of the uncertainties, limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D) probabilistic analysis, such as Microexposure analysis, can more effectively characterize the impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed population by considering time-dependent changes.
Microexposure analysis has been used by EPA and by independent researchers to simulate duration of residential exposure (EPA 1992c; Johnson and Capel 1992; Sielken 1994). It has also been used to evaluate childhood exposures to lead (Goodrum et al. 1994), exposure to contaminants in tap water (Harrington et al. 1995), and exposure to dioxins from the consumption of freshwater fish (Keenan et al. 1993a,b, 1995, 1996a, 1997a,b). Microexposure analysis was employed in the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon Group 1994; Keenan et al. 1996b). In addition, Microexposure analysis has been described in EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A (External Review Draft) as a viable alternative for modeling time-dependent variability in concentrations, daily activity patterns, and other behavioral exposure factors (EPA 1999c).
Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of probabilistic risk analysis through application to the fish ingestion pathway at the Sangamo Weston/Lake Hartwell Superfund site (Simon 1999). Simon (1999) found that the Microexposure event model gave risk estimates that were approximately an order of magnitude lower than those derived using deterministic and other probabilistic approaches. He attributed this decrease to the power of the method to incorporate changes over time.
GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model, to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in Appendix E of EPA's draft guidance for conducting probabilistic risk assessments (EPA 1999c), some types of exposures, including fish consumption, can be expected to vary over time. Anglers may change their activity levels and their fish consumption behaviors as they age. In addition, different species and sizes of fish may be consumed from a variety of locations on the river over time, and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates, and will provide information that will allow better informed risk management decisions to be made (EPA 1999c).
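To make the microexposure-event concept concrete, the following is a minimal sketch of such a simulation for the fish consumption pathway. It is illustrative only: the function name, the parameter ranges (meals per year, meal size, tissue concentrations, body weight), and the use of uniform distributions are assumptions invented for this example, not values proposed for the HHRA, and it models only variability among and within anglers (a full 2-D analysis would add an outer loop over uncertain parameters).

```python
import random

def simulate_angler(years=30, meals_per_year=(5, 40), meal_size_kg=0.227,
                    conc_mg_per_kg=(0.5, 5.0), body_weight_kg=70.0, seed=None):
    """Average daily dose (mg/kg-day) for one angler, built up from discrete
    fish-meal events whose frequency and tissue concentration vary over time."""
    rng = random.Random(seed)
    intake_mg = 0.0
    for _ in range(years):
        meals = rng.randint(*meals_per_year)      # activity level changes year to year
        for _ in range(meals):
            conc = rng.uniform(*conc_mg_per_kg)   # species/location/preparation vary by meal
            intake_mg += conc * meal_size_kg
    return intake_mg / (body_weight_kg * years * 365.0)

# Simulating many anglers yields a distribution of doses rather than a point estimate
doses = sorted(simulate_angler(seed=i) for i in range(2000))
print("median dose (mg/kg-day):", doses[len(doses) // 2])
print("95th percentile dose (mg/kg-day):", doses[int(0.95 * len(doses))])
```

The output is a full distribution of average daily doses, from which a risk manager can read any percentile of interest rather than a single point estimate.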
9. Use of 95% UCLs or maximum values as exposure point concentrations
The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% UCL of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).
GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment, and as demonstrated by EPA contractors and consultants (EPA 1997c; Schulz and Griffin 1999), use of the H-statistic to derive a 95% UCL can and frequently does produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.
Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½ Mile Reach and for the Areas Outside the River: a spatial averaging approach. That approach will (a) allow calculation of EPCs that are much more truly representative of actual exposures than EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper ½ Mile Reach and the Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling if necessary to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.
However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A and demonstrated by Singh et al. (EPA 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.
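The mechanics of the recommended technique can be illustrated with a short sketch. The code below is a simplified illustration of a (spatially weighted) bootstrap of the 95% UCL of the arithmetic mean, not the specific procedure detailed in Attachment A; the PCB concentrations, the polygon areas used as resampling weights, and the percentile method of extracting the UCL are all assumptions invented for the example.

```python
import random

def bootstrap_ucl95(values, weights=None, n_boot=5000, seed=1):
    """95% UCL of the (optionally area-weighted) arithmetic mean, estimated as
    the 95th percentile of bootstrap resample means. Equal weights reduce this
    to the ordinary bootstrap; area weights give a simple spatial bootstrap."""
    rng = random.Random(seed)
    n = len(values)
    if weights is None:
        weights = [1.0] * n
    means = []
    for _ in range(n_boot):
        picks = rng.choices(range(n), weights=weights, k=n)  # resample with replacement
        num = sum(weights[i] * values[i] for i in picks)
        den = sum(weights[i] for i in picks)
        means.append(num / den)
    means.sort()
    return means[int(0.95 * n_boot)]

pcb = [0.2, 0.4, 0.5, 0.8, 1.1, 1.6, 2.3, 9.7]    # hypothetical PCB results (mg/kg)
areas = [3.0, 2.5, 2.0, 1.5, 1.0, 0.5, 0.3, 0.2]  # polygon areas: hot spots oversampled
print("spatial bootstrap 95% UCL:", bootstrap_ucl95(pcb, areas))
```

Unlike the H-UCL, an estimate obtained this way can never exceed the maximum observed concentration, because every bootstrap mean is a weighted average of observed values.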
References
EPA 1989 Risk Assessment Guidance for Superfund Volume I Human Health Evaluation
Manual (Part A) - Interim Final US Environmental Protection Agency Office of
Emergency and Remedial Response Washington DC EPA/540/1-89/002 July
EPA 1990 National Contingency Plan US Environmental Protection Agency
Washington DC
EPA 1992a Final Guidelines for Exposure Assessment Notice US Environmental Protection
Agency Washington DC 57 Federal Register 14 22888-22938 May 29
EPA 1992b Guidance on Risk Characterization for Risk Managers and Risk Assessors
Memorandum from F Henry Habicht III Deputy Administrator to Assistant Administrators
and Regional Administrators Office of the Administrator Washington DC
EPA 1992c A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its
Application to the General US Population US Environmental Protection Agency Office of
Air Quality Planning and Standards Research Triangle Park NC EPA-450/3-92-011
August
EPA 1993 Memo from WH Farland and HL Longest to Regional Directors Re Use of
IRIS values in Superfund risk assessments US Environmental Protection Agency
Washington DC OSWER Directive 92857-16 December 21
EPA 1995a Policy for Risk Characterization at the US Environmental Protection Agency
US Environmental Protection Agency Office of the Administrator Washington DC
March
EPA 1995b Guidance for Risk Characterization US Environmental Protection Agency
Science Policy Council Washington DC February
EPA 1997a Policy for Use of Probabilistic Analysis in Risk Assessment at the US
Environmental Protection Agency US Environmental Protection Agency Office of Research
and Development Washington DC May 15
EPA 1997b Guiding Principles for Monte Carlo Analysis US Environmental Protection
Agency Risk Assessment Forum Washington DC EPA/630/R-97/001 March 35 p
EPA 1997c The Lognormal Distribution in Environmental Applications Prepared for EPA
by A Singh A Singh and M Engelhard US Environmental Protection Agency Office of
Research and Development EPA/600/R-97/006 December
EPA 1998 Memo from Timothy Fields Acting Administrator to Regional Directors Re
Approach for addressing dioxin in soil at CERCLA and RCRA sites US Environmental
Protection Agency Washington DC OSWER Directive 92004-26 April 13
EPA and MDEP 1998 Letter from Anna Symington Acting Section Chief Massachusetts
DEP and Bryan Olson Project Manager US EPA to Jane Magee General Electric
Company Re Pittsfield 1-0147 EPA Area 6 Housatonic River Human Health Risk
Assessment January 29
EPA 1999a Memorandum from Ann-Marie Burke Toxicologist to Richard Cavagnero GE
Project Leader Re Protectiveness of Cleanup Levels for Removal Actions Outside the River
- Protection of Human Health Attachment A to Appendix D to Consent Decree lodged in
United States et al v General Electric Company (No 99-30225-MAP US Dist Court for
Dist of Mass) August 4
EPA 1999b Risk Assessment Guidance for Superfund Volume I - Human Health Evaluation
Manual Supplemental Guidance Dermal Risk Assessment Interim Guidance US
Environmental Protection Agency Office of Emergency and Remedial Response Washington
DC March 2 Draft
EPA 1999c Risk Assessment Guidance for Superfund Volume 3 (Part A Process for
Conducting Probabilistic Risk Assessment) Draft Revision 5 United States Environmental
Protection Agency Solid Waste and Emergency Response December
EPA 1999d Responsiveness Summary for Allendale School Removal Action ½ Mile
Removal Action and Consolidation Prepared by EPA Region I October
GE 1997 Proposal for Alternative Input Values for the Human Health Risk Assessment
Memorandum from General Electric to the Massachusetts Department of Environmental
Protection and the US Environmental Protection Agency September 11
Goodrum PE JM Hassett DL Johnson and ME Dakins 1994 Applications of
microexposure Monte Carlo modeling to human health risk assessments A case study of
modeling childhood lead exposure Society for Risk Analysis Annual conference and
Exposition December 4-7 Baltimore MD
Harrington NW Curry CL and PS Price 1995 The MicroExposure Event Modeling
Approach to Probabilistic Exposure Assessment Paper No 95-TA4203 Proceedings of the
88th Annual Meeting of the Air and Waste Management Association San Antonio Texas
USA June
Johnson T and J Capel 1992 A Monte Carlo Approach to Simulating Residential
Occupancy Periods and Its Application to the General US Population US Environmental
Protection Agency Office of Air Quality Planning and Standards Research Triangle Park
NC EPA-450/3-92-011 August
Keenan RE MH Henning PE Goodrum MN Gray RA Sherer and PS Price
1993a Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to
evaluate the need for fish advisories Dioxin 93 13th International Symposium on
Chlorinated Dioxins and Related Compounds Vienna Austria
Keenan RE PS Price MH Henning PE Goodrum MN Gray RA Sherer and
WL Porter 1993b A Monte Carlo risk assessment for dioxin in Maine fish Using a
microexposure approach to evaluate the need for fish advisories TAPPI Proceedings 1993
Environmental Conference Boston MA
Keenan RE PS Price CL Curry JI McCrodden and JG Haggard 1995 Using a
microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion
of fish from the upper Hudson River In Society for Risk Analysis and the Japan Section of
SRA Annual Meeting and Exposition Waikiki HI (Abstract)
Keenan RE PS Price J McCrodden and ES Ebert 1996a Using a microexposure
event analysis to model potential exposures to PCBs through ingestion of fish from the Upper
Hudson River In Organohalogen Compounds Proceedings Dioxin 96-16th International
Symposium on Chlorinated Dioxins and Related Compounds Amsterdam The Netherlands
Organohalogen 3061-65
Keenan RE NW Harrington PS Price and RO Richter 1996b Applying a
microexposure event analysis for a superfund site risk assessment Proceedings Superfund XVII
Conference Proceedings Washington DC October 15-17
Keenan RE JD Avantaggio and PS Price 1997a Should Maines rivers have fish
advisories for dioxin Using an integrated Microexposure Event and Toxicokinetic Model to
evaluate this question In SETAC North Atlantic Chapter Annual Meetings Proceedings
Abstract 1
Keenan RE JD Avantaggio and PS Price 1997b Using a combined Microexposure
Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden
dosimetric In Society for Risk Analysis Proceedings Annual Meeting and Exposition
Abstract
MDEP 1995 Guidance for Disposal Site Risk Characterization - In Support of the
Massachusetts Contingency Plan -Interim Final Policy Massachusetts Department of
Environmental Protection Bureau of Waste Site Cleanup and Office of Research and
Standards Boston MA WSCORS-95-141 July
Pyrite Canyon Group 1994 Workplan for the Health Risk Assessment of the Stringfellow
CERCLA Site in Riverside County California January
Schulz TW and S Griffin 1999 Estimating risk assessment exposure point concentrations
when data are not normal or lognormal Risk Analysis Vol 19 November
Sielken RL 1994 More realistic exposure durations for more realistic people Society for
Risk Analysis Annual Conference and Exposition Baltimore MD December 4-7
Simon TW 1999 Two-dimensional Monte Carlo simulation and beyond A comparison of
several probabilistic risk assessment methods applied to a Superfund site Hum Ecol Risk
Assess 5 (4) 823-843
Weston 2000a Supplemental Investigation Work Plan for the Lower Housatonic River
Prepared for US Army Corps of Engineers by Roy F Weston Inc February 22
Weston 2000b Engineering EvaluationCost Analysis for the Upper Reach of the Housatonic
River Prepared for US Army Corps of Engineers by Roy F Weston Inc February 11
ATTACHMENT A
A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS
Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co Inc
Introduction
The US Environmental Protection Agency's final Human Health Risk Assessment Work Plan
(Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the
Lower Housatonic River (Weston 2000), states that the exposure point concentration (EPC) for
each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site
data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work
Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally
distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No
plan is presented for estimating the mean from datasets for which no parametric distribution can
be defined. The way in which the EPCs are calculated can have a substantial impact on the values
derived and on how well they represent the actual levels of exposure that are likely to occur under
various scenarios.
There are serious limitations associated with using EPA's proposed approach for calculating
EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally
distributed, environmental samples seldom resemble a strict normal distribution. Consequently,
if EPA were to assume instead that the data for each exposure area are lognormally distributed,
then use of the H-statistic to calculate the EPCs for these areas may substantially impact the
results of the risk assessment. Use of the H-statistic frequently results in the calculation of
extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that,
when the 95% UCL is greater than the maximum observed concentration, the maximum value
should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value
are poor estimates of the actual levels of exposure that will occur within a given exposure area.
GE believes that EPAs proposed approach should not be used to calculate EPCs for the risk
assessment of the Lower Housatonic River Rather GE urges EPA to use alternative methods
approved by EPA scientists and other experts for calculating EPCs that more accurately
represent the actual exposure levels that may be experienced by individuals who come into
contact with contaminated soil or sediment This paper provides an analysis of the limitations
associated with using the H-statistic to calculate EPCs for each exposure area In addition it
presents an evaluation of alternative approaches for calculating the EPCs that will result in more
representative results In GEs view the most appropriate approach is the use of the area-
weighting or spatial averaging technique that has been approved and specified by EPA for the
banks of the Upper 12 Mile Reach of the Housatonic River and for the Areas Outside the River
under the October 1999 Consent Decree However if and where that approach is not used this
paper describes other alternative techniques for deriving a 95 UCL that is much more
representative of actual exposure point concentrations than use of the H-statisticmdashnamely the
bootstrapping technique and a modification of that approach known as spatial bootstrapping
Of these the latter is preferable because it takes into account the spatial distribution of the
sampling data This paper illustrates the EPCs that would be calculated for an example property
using the various methods discussed herein
Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL
In 1992, EPA issued a guidance document (EPA 1992) indicating that the appropriate
concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic
mean of the observed concentrations within an exposure area. This position is based on the
concept that chronic direct-contact exposure to soils most likely occurs as a result of short,
repeated, random contact events throughout the identified exposure area. Thus, the mean
concentration is reflective of the long-term exposure concentration. Because it is this nature of
the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of
the EPC regardless of the statistical distribution of observed sample concentrations.
The guidance document goes on to note that, because there is uncertainty in the overall soil
concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the
arithmetic mean should be used to account for this uncertainty. This approach was adopted by
EPA Region 1 (EPA 1994). In fact, EPA Region I has taken the position that the 95% UCL is
appropriate for both the central tendency and reasonable maximum exposure computations.
For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the
EPCs. In this method, the 95% UCL is calculated using the following equation (EPA 1992):

UCL = exp(x̄ + 0.5s² + sH/√(n-1))

Where:

UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic accounting for interaction of the distribution, developed by Land (1975)
n = number of analytical samples
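As a purely numerical sketch of how this equation behaves, the following code implements the H-UCL formula; note that the H value must be looked up in Land's (1975) tables for the applicable sample size and log-scale standard deviation, so the H = 3.0 and the concentration data used here are hypothetical values chosen for illustration only.

```python
import math
import statistics

def h_ucl(data, H):
    """Land's H-UCL: exp(xbar + 0.5*s**2 + s*H/sqrt(n - 1)), where xbar and s
    are the mean and standard deviation of the log-transformed data. H must be
    taken from Land's (1975) tables for the given n and s; H=3.0 below is a
    placeholder, not a tabulated value."""
    logs = [math.log(x) for x in data]
    xbar = statistics.fmean(logs)
    s = statistics.stdev(logs)
    return math.exp(xbar + 0.5 * s ** 2 + s * H / math.sqrt(len(logs) - 1))

data = [0.3, 0.5, 0.9, 1.4, 2.2, 6.0, 18.0]  # hypothetical skewed PCB results (mg/kg)
print("arithmetic mean:", statistics.fmean(data))
print("maximum observed:", max(data))
print("H-UCL (illustrative H=3.0):", h_ucl(data, H=3.0))
```

With this small, skewed dataset the computed H-UCL comes out above the maximum observed concentration of 18 mg/kg, which is precisely the behavior discussed in the following paragraphs.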
Frequently, the H-statistic calculates extraordinarily high 95% UCL values. In some cases, this
inflated estimate of the mean reflects the uncertainty associated with having a limited number of
samples from a skewed data population. This uncertainty is compounded when, as is the case
with much of the Housatonic River investigation, sampling efforts are concentrated in areas that
are suspected of having elevated levels of PCBs. Because the sampling of areas that are not
likely to be contaminated is much more limited, despite the fact that those areas are equally
important components of the total exposure area, the datasets for most parcels are likely to be
highly biased.
Use of the H-statistic also does not produce a reliable and representative 95% UCL when data
are not clearly lognormally distributed. Because the distributions of contaminant concentrations
at sites are often positively skewed, they are frequently assumed to be lognormally distributed
for statistical evaluations. In fact, EPA guidance (EPA 1992) promotes this approach by
suggesting that it is reasonable to assume that data are lognormally distributed even if the data
do not fit a normal distribution after log-transformation. However, this assumption is often
incorrect. As noted in a recent EPA technical support document prepared by EPA contractors
Singh et al. (EPA 1997a), "[t]his apparent skewness may be due to biased sampling, multiple
populations, or outliers, and not necessarily due to lognormally distributed data." Small
deviations between the assumed lognormal distribution of the data and the actual population
distribution can greatly influence the statistical results and yield a gross misrepresentation of the
true mean and associated confidence bound. Thus, when sampling data that do not fit a
lognormal distribution are inappropriately assumed to be lognormally distributed and are
evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not
at all representative of site conditions.
Even when the data are lognormally distributed, the H-statistic can produce concentrations that
are far higher than the true mean. In their recent study, Singh et al. (EPA 1997a) evaluated and
compared UCLs calculated through different statistical estimation methods using a variety of
sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently
larger and less accurate than the UCLs calculated using other statistical approaches. They noted
that "it is observed that the H-UCL becomes order of magnitudes higher even when the data
were obtained from a lognormal population and can lead to incorrect conclusions. This is
especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical
merit of the H-UCL in environmental applications is questionable as it becomes orders of
magnitude higher than the largest concentration observed when the [standard deviation] of the
log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL
approach can yield unusually high false positives, which would result in an unnecessary
cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in
an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may
overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous
waste sites.
Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic
mean is not correctly estimated within the calculation. This difficulty occurs when the
exponential estimator of the mean is not equal to the typical calculation of the sum of all samples
divided by the number of samples.
Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To
control this, EPA recommended that, when the statistic is greater than the maximum observed
concentration at the site, the maximum value should instead be used as the EPC. This approach
leads to the unsupportable assumption that the average concentration within an exposure area is
best represented by the highest concentration encountered there, and does not account in any way
for the spatial distribution of contamination. As EPA itself has recognized in responding to
comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological,
are exposed to average contaminant concentrations, not maximum concentrations, where
exposure is equally likely throughout a given area (EPA 1999).
In view of these considerations, EPA contractors and consultants have recommended that the H-
statistic not be used, particularly in cases where a lognormal distribution of the data is not
certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus
those calculated using other approaches, the EPA technical support document by Singh et al.
(EPA 1997a) recommended that in environmental applications the use of the H-UCL to obtain
an estimate of the upper confidence limit of the mean should be avoided. These authors
concluded that use of the jackknife method, the bootstrap method, or the Chebychev inequality
method "provides better input to the risk assessors and may result in a significant reduction in
remediation costs. This is especially true when the number of samples is thirty or less....
[U]pper confidence limits based on any of the other estimation procedures appear to be more
stable and reliable than those based on the H-statistic." Similar findings were reported by
Schulz and Griffin (1999).
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area, and exposures are likely to occur
randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean
is most representative of the long-term EPC. However, if sampling is intentionally biased
toward more contaminated areas, the sampling database no longer accurately represents the
areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for
the Lower Housatonic River. Contaminant delineation is an important goal of the sampling
effort on the lower Housatonic River. Consequently, sampling is intentionally focused on areas
(e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the
samples in these datasets will be taken from high-concentration areas, with limited samples taken
from uncontaminated areas, calculating a mean using typical statistical procedures gives undue
weight to the locations where high concentrations exist. A dataset obtained in this way cannot be
used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be
representative of the overall exposure area.
This problem can be substantially mitigated by area-weighting the sample results so that the
importance of each sample in the calculation of the EPC is directly proportional to the fraction of
the exposure area that it represents. Area-weighted averaging may be conducted in a number of
ways, with differing levels of complexity. In some cases, polygonal areas are constructed by
drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen
polygons; Clifford et al. 1995); in more complex approaches, unbiased estimates of how
concentration and variance change with distance are developed, and the results are used to
construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava 1989).

Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in
the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more
appropriate approach. The process is to draw perpendicular lines that are exactly equidistant
between contiguous samples until each sample is surrounded by a polygon. Polygons at the
periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary).
One reason that this technique is so facile is that the construction of Thiessen polygons is a
hard-wired operation in most commercially available Geographic Information System (GIS)
applications.
The area inside each polygon is then used to weight the samples in statistical calculations, which
is in essence saying that the probability of encountering the concentration represented by a
certain polygon is equal to the proportion of the total exposure area represented by the area in
that polygon. The area-weighted concentration is calculated using the following formula (Isaaks
and Srivastava 1989):

x̄sc = Σᵢ pᵢCᵢ

Where:

x̄sc = area-weighted mean concentration (mg/kg)
Cᵢ = the concentration representing the condition within polygon i, where there are i = 1 through n polygons, and
pᵢ = the proportion of the total area that is incorporated in polygon i (unitless)
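The calculation itself is straightforward; the sketch below applies the area-weighting formula to a hypothetical four-polygon property (all concentrations and areas are invented for illustration).

```python
def area_weighted_mean(concs, areas):
    """Spatially weighted mean: each sample's concentration is weighted by the
    fraction of the total exposure area that its Thiessen polygon occupies."""
    total = sum(areas)
    return sum(c * a / total for c, a in zip(concs, areas))

# Hypothetical property with four polygons; the small hot-spot polygon
# contributes to the spatial average only in proportion to its area
concs = [12.0, 1.5, 0.8, 0.4]      # mg/kg PCBs
areas = [0.05, 0.30, 0.40, 0.25]   # acres
print(area_weighted_mean(concs, areas))   # 1.47 mg/kg vs. unweighted mean of 3.675
```

Here the hot-spot sample dominates the unweighted mean but, because it represents only 5% of the area, has a proportionately small effect on the spatial average.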
This approach includes sampling data from all portions of the exposure area and corrects for the
biases in the sampling data by taking into consideration the spacing of the samples within the
exposure area. This is the approach that was used by GE and approved by EPA for calculating
EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach
specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which
include a number of floodplain properties along the River). Hence, use of this approach for the
Lower Housatonic River would lead to a consistent approach for the different portions of the
overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at
other Superfund sites. For example, such an approach was used at the Koppers Company
Superfund site in Charleston, SC (EPA 1995), where biased sampling was also conducted. To
offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling
data in clusters by gridding the site and calculating an average concentration for each grid cell
using the sampling data available within it. Then the average concentrations for the grid cells were
averaged to yield an average concentration for the entire exposure area. In addition, at the
Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the
average goal across the exposure unit (EPA 1997b).
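The "nearest neighbor" gridding procedure described for the Koppers site can likewise be sketched in a few lines; the grid cell size and the sample coordinates and concentrations below are hypothetical.

```python
from collections import defaultdict

def grid_average(samples, cell_size):
    """Average the samples within each grid cell, then average the cell means,
    so that a densely sampled hot spot counts only once in the site-wide mean."""
    cells = defaultdict(list)
    for x, y, conc in samples:
        cells[(int(x // cell_size), int(y // cell_size))].append(conc)
    cell_means = [sum(v) / len(v) for v in cells.values()]
    return sum(cell_means) / len(cell_means)

# Hypothetical data: five clustered hot-spot samples plus two clean-area samples
samples = [(1, 1, 10.0), (2, 1, 12.0), (1, 2, 9.0), (2, 2, 11.0), (3, 3, 8.0),
           (40, 40, 0.5), (80, 80, 0.3)]
print(grid_average(samples, cell_size=10))  # hot-spot cluster falls in one cell
```

With these invented data the five hot-spot samples all fall in a single cell, so the grid average is far lower than the simple mean of all seven results, offsetting the sampling bias.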
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board
(SAB) (EPA 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human
Health Evaluation Manual. In that review, the SAB stated that the:

[e]stimate of the RME cannot ignore the distribution of contamination at the site and the
distribution of individual behaviors which lead to exposure. Rather, the spatial distribution
of the concentration over the site must be considered, along with a distribution reflecting the
relative frequency with which people are likely to visit different parts of the site. For this
reason, any summary measure of concentration (such as the average proposed in the RAGS
document) that does not take into account the spatial distribution of the underlying samples is
likely to be inadequate. Therefore, the Committee believes that the Agency should give
strong consideration to incorporating methods such as kriging or triangulation that take
into account the spatial distribution of contamination to characterize exposure. (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is
fully protective of human health and the environment. Indeed, in responding to comments on the
Upper 2 Mile Reach Removal Action, EPA specifically justified this spatial averaging approach
so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach
requires the availability of adequate data to support the calculation of representative spatial
averages over the exposure area. However, the need for adequate data exists regardless of the
averaging method used to derive the EPCs, and is particularly marked when one attempts to use
the H-statistic approach. Under the phased approach that EPA has outlined in the Human
Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the
95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the
conduct of additional sampling is warranted in order to avoid the use of the extreme and
unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate
additional samples as necessary for individual exposure areas to ensure that reasonable and
representative estimates of the average concentration can be calculated for the risk assessment of
each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most
representative means of calculating EPCs. However, if EPA rejects this approach, or for areas
where the sampling data are not considered sufficient to support use of such spatial averaging,
there are alternative methods of calculating the 95% UCL which would result in more reliable
and representative EPCs than use of the H-statistic. As discussed previously and demonstrated
by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to
substantially overestimate the 95% UCL, particularly where datasets are small. These EPA
contractors and consultants have evaluated and suggested alternative and superior approaches for
calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows
calculation of a probability distribution of EPCs and is able to offset some of the statistical
limitations associated with small datasets. It provides a method for calculating a more reliable
estimator of the EPC, particularly when datasets are small and data do not exactly fit either a
normal or lognormal distribution. This technique is described below and is recommended as an
alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for
exposure areas in the Lower Housatonic River risk assessment.
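For reference, the H-statistic (H-UCL) approach at issue computes a one-sided upper confidence limit from the log-transformed data; as commonly presented in EPA guidance (EPA, 1992) and in Land (1975), it takes the form:

```latex
% One-sided 95% UCL on a lognormal mean (Land's H-statistic), where
% \bar{y} and s_y are the mean and standard deviation of the log-transformed
% concentrations, n is the sample size, and H_{0.95} is taken from Land's tables.
\mathrm{UCL}_{95} = \exp\!\left( \bar{y} + \frac{s_y^{2}}{2} + \frac{s_y \, H_{0.95}}{\sqrt{n-1}} \right)
```

Because H increases with the log-scale standard deviation, the exponential term can exceed every measured value when the log-scale variance is large and n is small, which is the overestimation behavior discussed above.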
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process where a subpopulation of the overall
dataset of analytical values is selected at random and a mean is calculated. This mean is
archived, and the statistical sampling procedure is repeated with replacement of the data. The
bootstrapped means are collected until their distribution stabilizes (this often requires many
iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95%
UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of
statistics, which states that if a population is repeatedly sampled for purposes of calculating an
arithmetic mean, the distribution of the means will ultimately take the form of a normal
distribution. This Theorem applies to any population, regardless of its statistical distribution.
Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples,
bootstrapping models select repeatedly from a single dataset.
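As a sketch of the procedure just described, the percentile bootstrap can be implemented in a few lines; the dataset, seed, and iteration count below are hypothetical illustrations, not values from the Work Plan:

```python
import numpy as np

def bootstrap_ucl95(data, n_boot=5000, seed=1):
    """Percentile-bootstrap 95% UCL on the arithmetic mean."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Draw n_boot resamples (with replacement), each the size of the original
    # dataset, and archive the mean of each resample.
    boot_means = rng.choice(data, size=(n_boot, data.size), replace=True).mean(axis=1)
    # The 95th percentile of the distribution of means is taken as the 95% UCL.
    return float(np.percentile(boot_means, 95))

# Hypothetical PCB concentrations (ppm), for illustration only
pcb = [0.25, 0.8, 1.1, 2.4, 3.9, 5.6, 7.2, 9.8, 14.0, 21.5]
print(bootstrap_ucl95(pcb))
```

The returned value falls between the arithmetic mean and the maximum detection, in contrast to the H-UCL behavior discussed below.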
This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL and
is more technically justifiable. The reduction obtained from this procedure varies according to
the amount of data available for sampling and its variance, but the result is consistently lower than the 95%
H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is
more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is
particularly well-suited to exposure areas, like many along the Lower Housatonic River, where
sampling is limited.
When sample sizes are small, the distribution for the bootstrap run may not appear normal. This
is because the number of samples required to be included in each subsample in order to approach
normality is larger than is available from the site data. This can be mitigated by further
sampling. It should be noted, however, that as the sample size increases, the variance becomes
tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95%
UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95%
UCL that might be obtained from a larger dataset.
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-
statistic method, it still does not take into consideration the spatial distribution of the sampling
locations across an exposure area and the biases associated with it. To reflect this spatial
distribution, a modified bootstrapping approach can be used to reflect the locations and densities
of samples collected on each property. This modified bootstrapping approach (spatial
bootstrapping), which incorporates area-weighting into the bootstrapping method, can be used to
calculate more reliable 95% UCL estimates of the EPC for each exposure area.
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with
the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to
each sample is determined by calculating the proportion of the total exposure area that is
represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with
respect to its size and are weighted accordingly in the sampling database. In other words, if the
area of one polygon is five times larger than the smallest polygon, it is given a weight that is five
times greater in the sampling database from which values are selected for the bootstrap. The
normalized areas are rounded off to the closest integer. An expanded dataset is then created
from the original dataset that reflects the ratios of each polygon area to the smallest polygon area.
This expansion is accomplished by multiplying the rounded normalized areas by a selected
value. The resulting values represent an area-weighted number of replicates of the sample
results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area
is 7 and the dataset is expanded using a factor of five, then sample X is represented in the
expanded dataset as 35 (7 × 5) entries.
Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using
between 250 and 500 of the results (depending upon the size of the expanded dataset) for each
iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a
cumulative distribution of estimated means. After sufficient iterations, this cumulative
distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal
to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of
this distribution of means is the 95% UCL.
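The three steps above can be sketched as follows; the function name is ours, and the default expansion factor of 5 mirrors the sample X example in the text:

```python
import numpy as np

def spatial_bootstrap_ucl95(conc, areas, factor=5, n_boot=5000, draws=250, seed=1):
    """Area-weighted (spatial) bootstrap 95% UCL, following the three steps above."""
    conc = np.asarray(conc, dtype=float)
    areas = np.asarray(areas, dtype=float)
    # Steps 1-2: weight each sample by its Thiessen polygon area, normalize to
    # the smallest polygon, round to the nearest integer, and expand the dataset
    # so each sample appears (rounded ratio x factor) times.
    replicates = np.rint(areas / areas.min()).astype(int) * factor
    expanded = np.repeat(conc, replicates)
    # Step 3: repeatedly draw means (with replacement) from the expanded dataset.
    rng = np.random.default_rng(seed)
    boot_means = rng.choice(expanded, size=(n_boot, draws), replace=True).mean(axis=1)
    # The 50th percentile approximates the area-weighted mean; the 95th is the UCL.
    return float(np.percentile(boot_means, 95))
```

Because each sample is replicated in proportion to its polygon area, samples representing large portions of the exposure area are drawn more often, which is the area-weighting the text describes.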
This approach is both consistent with EPA guidance and preferable to the approach currently
proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used
as the EPC in order to correct for the uncertainties and limitations of the sampling database, the
guidance does not dictate which method must be used to calculate the 95% UCL. While EPA
has in the past often relied on the H-statistic, that method need not be used, and recent reports
by EPA contractors and consultants have recommended against use of the H-statistic to avoid
systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).
The spatial bootstrapping approach is better than the H-statistic approach for several reasons.
First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999),
bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial
bootstrapping allows for consideration of the location and density of the samples for each
exposure area and corrects for the relative contribution of each in the database. Third, this
approach mitigates the statistical impact of a small database. Finally, it can be used regardless of
the distribution of the sampling data. Thus, for all sampling datasets, including those that do not
fit either a normal or lognormal distribution, this approach can be used successfully.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the
calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted
sample calculations for a hypothetical property. For this purpose, Ogden developed a
hypothetical set of sampling data that are similar to the types of data that might be collected from
a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples
have been collected from a property that is roughly 1,000 by 1,000 feet in size. The
concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic
mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219
square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different
approaches: 1) the EPA approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3)
a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The
arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This
is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is
identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the
arithmetic and bootstrapped means. This is because some of the sampling locations with high
concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are
weighted more heavily in approaches that include area-weighting.
Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping,
the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL
calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the
bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL
exceeds the maximum value (42.6 ppm) in the sampling database, which means that the
maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that
would need to be used for this exposure area would be the maximum detected concentration of
42.6 ppm. This concentration is clearly not representative of the sampling dataset for this
hypothetical property.
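Assuming Table 1 pairs the samples in the order shown, the non-H-statistic results in Table 2 can be checked directly (the bootstrap value varies slightly with the random seed):

```python
import numpy as np

# Table 1 concentrations (ppm) and Thiessen polygon areas (ft2), samples 1-30
conc = np.array([0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
                 7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
                 15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6])
area = np.array([138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121,
                 3697, 13297, 33087, 65541, 57494, 9014, 9994, 2123, 2341,
                 7446, 2613, 44231, 2362, 8745, 5008, 18762, 24296, 51356,
                 12996, 18019, 119814, 132012], dtype=float)

print(round(conc.mean(), 1))                     # arithmetic mean -> 13.3 ppm
print(round(np.average(conc, weights=area), 1))  # area-weighted mean -> 15.7 ppm

rng = np.random.default_rng(0)
boot = rng.choice(conc, size=(5000, conc.size), replace=True).mean(axis=1)
print(round(float(np.percentile(boot, 95)), 1))  # bootstrap 95% UCL, roughly 17 ppm
```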
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and
its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin
(1999). These limitations occur despite the fact that the sampling dataset for this hypothetical
property is reasonably large, containing 30 samples, and the range of concentrations is fairly
typical of the types of concentrations likely to be found in floodplain properties along the Lower
Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly
typical concentration gradient, with a few samples showing very low concentrations, a few
showing fairly high concentrations, and the remainder fairly evenly distributed in between. In
addition, some of the largest polygon areas are associated with low concentration areas, while
others are associated with the highest concentrations. There is nothing about this dataset to
indicate that the samples reported are not representative of the range of concentrations that is
likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured
concentration by a factor of more than six, with the result being that the 95% UCL cannot be
used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result
is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly
even distribution of sampling results. These problems will be even more pronounced for certain
real properties along the Lower Housatonic River, where the sampling results may be highly
skewed, dense sampling occurs in areas of high concentration, and/or sample numbers are more
limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that
maximum concentration values will need to be used for many of those properties. Such an end
result is unsupportable and will result in the overestimation of site risks. It could also result in
the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above, and as demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz
and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental
applications can, and frequently does, produce substantial overestimates of the true mean that are
in no way representative of the concentrations actually contacted by people in an exposure area,
and hence that approach should be avoided. Moreover, this problem cannot be solved by simply
using the maximum detected concentration in the area if the calculated 95% UCL exceeds that
maximum. These problems can be avoided through the use of area-weighted averaging, which
takes into consideration the fraction of the exposure area that is represented by each sample
taken. This approach allows for a more realistic and representative estimate of the level of
exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the
area-weighted averaging approach for the Lower Housatonic River and to collect the data
necessary to support that approach. Use of spatial average concentrations as EPCs is consistent
with EPA policies, procedures, and guidelines (EPA, 1989, 1993) and would be consistent with
the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the
Upper 2 Mile Reach of the River and the Removal Actions Outside the River, including certain
floodplain properties along the River.
However, if EPA rejects this approach (or for areas where EPA determines that the data are
insufficient to support that approach and decides not to collect additional data) and insists on
estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial
bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more
representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets
many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and
M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of
exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation
Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency
and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER
Bulletin Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the
Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund
Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.
EPA. 1994. A change in the approach for estimating the average and reasonable maximum
exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2.
August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant).
Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA
Region IV. Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by
A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of
Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. US Environmental Protection
Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, Mile Removal
Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H. and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New
York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and
variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations
when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River - Final.
Roy F. Weston, Inc., Manchester, NH. DCN: GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
1                138,219               0.022
2                105,217               0.028
3                110,214               1.07
4                31,021                1.63
5                11,399                1.67
6                7,004                 2.01
7                14,268                2.19
8                12,121                3.25
9                3,697                 3.77
10               13,297                7.58
11               33,087                7.61
12               65,541                7.62
13               57,494                7.71
14               9,014                 7.99
15               9,994                 8.72
16               2,123                 9.69
17               2,341                 9.72
18               7,446                 12.7
19               2,613                 14.3
20               44,231                14.3
21               2,362                 15.4
22               8,745                 16.3
23               5,008                 18.7
24               18,762                19.1
25               24,296                20.2
26               51,356                30.8
27               12,996                32.9
28               18,019                38.9
29               119,814               40.6
30               132,012               42.6
Table 2. Mean and 95% UCL EPC Estimates (ppm) for the Hypothetical Floodplain Property

Approach                                      Mean    95% UCL
EPA approach (arithmetic mean/H-statistic)    13.3    106
Area-weighted average                         15.7    --
Bootstrap                                     13.3    17.06
Spatial bootstrap                             15.7    16.38

Note: Because the H-statistic 95% UCL (106 ppm) exceeds the maximum detected
concentration, the maximum (42.6 ppm) would be substituted as the EPC.
to identify and select such COPCs is clearly defined, it could result in the unnecessary inclusion
of certain constituents. EPA should provide supplemental information that outlines the specific
approach and criteria that will be used to identify and select these constituents for inclusion.
2. Risk assessment of dioxins/furans
The Work Plan states that EPA intends to evaluate the potential risks associated with dioxins and
furans by converting them to TEQs (toxic equivalents of 2,3,7,8-TCDD) using Toxicity
Equivalency Factors (TEFs), and then applying the provisional cancer slope factor (CSF) of
150,000 (mg/kg-day)^-1 listed on HEAST for 2,3,7,8-TCDD (Section 6.3.3.2.4). As discussed in
detail in GE's June 3, 1999 comments (pp. 19-22), the TEF approach has substantial limitations,
and the provisional CSF of 150,000 (mg/kg-day)^-1 is uncertain and unsupportable. A wide range
of CSFs, spanning from 9,000 to 156,000 (mg/kg-day)^-1, have been proposed for TCDD under
the linear nonthreshold cancer model, with differences resulting from the selection of tumor
classification scheme, the interspecies scaling factor used, and the model used for low-dose
extrapolation. Thus, there is enormous uncertainty associated with any selected CSF.
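The TEF/TEQ mechanics at issue reduce to a weighted sum followed by a linear risk calculation. The sketch below uses hypothetical congener concentrations, TEFs, and intake assumptions solely to illustrate the calculation EPA proposes:

```python
# Provisional HEAST CSF for 2,3,7,8-TCDD cited in the Work Plan
CSF_TCDD = 1.5e5  # (mg/kg-day)^-1

# Hypothetical congener concentrations (mg/kg soil) and TEFs, for illustration only
congeners = {
    "congener_A": (2.0e-6, 1.0),
    "congener_B": (5.0e-6, 0.1),
    "congener_C": (1.0e-5, 0.01),
}

# Step 1: convert each congener to 2,3,7,8-TCDD toxic equivalents and sum
teq = sum(c * tef for c, tef in congeners.values())  # mg TEQ/kg soil

# Step 2: estimate a dose and apply the TCDD CSF (hypothetical intake assumptions:
# 100 mg soil/day ingestion, 70-kg adult)
soil_ingestion = 1.0e-4  # kg soil/day
body_weight = 70.0       # kg
dose = teq * soil_ingestion / body_weight  # mg TEQ/kg-day
risk = dose * CSF_TCDD
print(teq, risk)
```

The linearity of this calculation is why the choice of CSF propagates directly into the risk estimate: multiplying the CSF by five multiplies the resulting risk by five.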
In these circumstances, GE believes that, to the extent possible in this risk assessment, EPA
should evaluate dioxins and furans through application of the dioxin/furan TEQ PRGs set forth
in EPA (1998) guidance and discussed above, using those PRGs as remedial goals as well as
screening levels, rather than conducting forward risk calculations for these compounds. This
approach would avoid the uncertainties associated with selecting a particular CSF for TCDD and
is consistent with the approach adopted by EPA for Areas Outside the River, as described in the
SOW.
However, to the extent that EPA includes dioxins and furans in forward risk calculations, GE
believes that EPA should use a CSF of 30,000 (mg/kg-day)^-1. As shown in our June 3, 1999
comments, based on current information, that CSF is more supportable than the CSF of 150,000
(mg/kg-day)^-1 specified in the Work Plan.
3. Use of TEF/TEQ approach for dioxin-like PCBs
In the Work Plan, EPA also proposes to use the TEF/TEQ approach for estimating risks of dioxin-
like PCB congeners (Section 6.3.3.2.5). Under this approach, the concentrations of these PCB
congeners are converted to toxic equivalents (TEQs) of 2,3,7,8-TCDD using TEFs, and the
provisional CSF of 150,000 (mg/kg-day)^-1 for 2,3,7,8-TCDD is then applied to estimate the risks.
In its June 3, 1999 comments, GE presented several reasons why this approach is not appropriate
or scientifically defensible. In discussions with GE in November 1999, EPA stated that, since it
will make risk management decisions on the basis of total PCB concentrations rather than
congener-specific data, it intends to use this TEF/TEQ approach for PCBs only if it finds a good
correlation between the concentrations of the dioxin-like congeners and concentrations of total
PCBs. There is no indication in the Work Plan, however, that EPA has established a good
correlation between the congener-specific data and the total PCB data. Indeed, in more recent
discussions, EPA has indicated that it has not to date evaluated this issue. If and when EPA makes
such an analysis, GE requests the opportunity to review the analysis and discuss it with EPA. The
absence of such a correlation would itself make the TEQ data irrelevant to risk management
decisions.
In any event, GE continues to believe, for the same reasons given in its June 3, 1999 comments
(pp. 23-25), that application of the TEF/TEQ approach to PCBs is not scientifically supportable.
These reasons include: (a) the lack of sufficient congener-specific data to provide reliable
estimates of true congener concentrations; (b) the double counting of carcinogenic risks that would
result from use of both the TCDD CSF and the PCB CSF in the same assessment; and (c) the
uncertainties and unsupportability of the TEFs for PCB congeners and of the CSF (for
2,3,7,8-TCDD) to be used to evaluate the resulting TEQs.
4. Failure to consider alternative CSFs and RfDs for PCBs
The Work Plan states that EPA intends to use the upper-bound CSF for PCBs of 2 (mg/kg-day)^-1
and the Reference Dose (RfD) for Aroclor 1254 of 2 x 10^-5 mg/kg-day, as published in its
Integrated Risk Information System (IRIS) (Sections 6.3.3.2.3 and 6.3.3.3.2). In its June 3, 1999
comments (pp. 16-19 and Appendix A), GE presented substantial reasons why these toxicity values
overestimate the carcinogenic potential and non-carcinogenic impacts of PCBs, and it
recommended the development of alternative toxicity values for PCBs. EPA has made no
reference to this presentation in its Work Plan.
This failure to consider the alternative toxicological information presented by GE is unlawful.
Under a settlement agreement with GE (Settlement Agreement in General Electric Company v.
Browner, No. 93-1251, D.C. Cir., October 25, 1993) and a subsequent EPA guidance
memorandum (EPA, 1993), if an outside party questions the use of IRIS values during the course
of an EPA risk assessment and presents alternative toxicological information that may be used in
place of the IRIS values, EPA has an obligation to consider all credible and relevant evidence
before it. There is no indication that EPA has considered the alternative toxicological
information presented by GE. Rather, it has ignored GE's showing and chosen simply to defer
to the current IRIS values (which is precisely the course of conduct which the settlement and
guidance prohibit). EPA should rectify this deficiency by now giving careful consideration to the
alternative toxicological information presented by GE.
5. Use of subchronic RfD for PCBs
The Work Plan states that EPA will consider using the subchronic RfD for PCBs only if it
determines that exposure is intermittent enough to result in an exposure period that is substantially
reduced (p. 6-42). Later, on page 6-43, the Work Plan states that this will occur when the total
exposure period is substantially below the 7-year period that is considered by EPA to represent
chronic exposure. There is no further discussion of this issue, so it is not clear what the Agency
will use as a benchmark to identify intermittent exposure.
Many of the proposed exposure scenarios are likely to be extremely intermittent in nature. For
example, in developing the SRBC for utility workers, EPA assumes that exposure occurs 5
days/year for a period of 25 years. Clearly, 5 days/year is intermittent exposure. In fact, even
when considered over the entire 25-year exposure period, the total exposure consists of 125 days,
which amounts to 1/3 of a year, or less than 5 percent of the 7-year chronic exposure period. This
exposure would certainly be considered intermittent, despite the fact that the total exposure period
lasts for 25 years.
As a second example, in developing its SRBCs for recreational exposures, EPA has assumed that
children aged 1-6 are exposed for six years during recreational activities. Even using the highly
conservative exposure frequency for high contact areas of 3 days/week for a period of 7 months
per year (a total of 84 days/year, or 504 days for the entire exposure period), the number of days
exposed is equivalent to 1.4 years, or less than 20 percent of the available days during the 7-year
chronic exposure period. Nevertheless, as stated on page 6-42, EPA does not intend to use the
subchronic RfD to evaluate noncancer hazards to young children.
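The arithmetic behind these two examples is straightforward:

```python
# EPA's benchmark: 7 years of exposure is treated as chronic
CHRONIC_DAYS = 7 * 365

# Utility worker SRBC assumptions: 5 days/year for 25 years
utility_days = 5 * 25
print(utility_days, utility_days / 365, utility_days / CHRONIC_DAYS)
# 125 total days: roughly 1/3 of a year, under 5 percent of the chronic period

# Child recreation: 3 days/week for 7 months/year (about 84 days/year) for 6 years
child_days = 84 * 6
print(child_days, child_days / 365, child_days / CHRONIC_DAYS)
# 504 total days: about 1.4 years, under 20 percent of the days in 7 years
```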
In short, the Work Plan states that EPA will use the subchronic RfD when exposures during a
given scenario are sufficiently intermittent to result in a total exposure that is far less than the
7-year chronic exposure period identified by EPA. It is clear, however, that even for scenarios
where this intermittent exposure is likely, EPA does not intend to incorporate the subchronic RfD.
EPA should reconsider its position on the use of the subchronic RfD and incorporate it into those
scenarios where it is relevant.
6. Inappropriate use of a single exposure scenario for all direct-contact recreational activities
The Work Plan provides that, while consumption of fish and game will be evaluated separately,
all direct-contact recreational activities will be evaluated using a single exposure scenario and a
single set of exposure assumptions (pp. 6-52 - 6-53). These activities include hiking, walking,
canoeing, picnicking, swimming, wading, and dirt biking, as well as fishing and hunting (but not
consumption of fish or game). In its June 3, 1999 comments (pp. 11-12, 29-30), GE explained
that use of a single exposure scenario and set of assumptions for such diverse activities is
inappropriate because it will force EPA either to use the most conservative assumptions for any
of these activities (which would clearly result in grossly overestimated exposure estimates) or to
pick compromise parameters that will not be truly representative of any of these activities (which
would be subject to considerable uncertainty). Based on discussions with EPA in the fall of 1999,
GE had understood that the Agency recognized this problem and intended to consider further
whether and how to divide this single direct-contact recreational scenario into a number of more
specific scenarios (e.g., hiking/walking, canoeing, picnicking, swimming/wading). However, the
Work Plan does not do that. GE again urges EPA to develop such specific scenarios so as to take
account of the significant differences in exposure parameters among these types of activities. In
doing so, actual recreational exposures will be more closely approximated, and more realistic and
appropriate cleanup goals can be developed.
7. Lack of information about specific exposure assumptions to be used
As in the prior draft, the Work Plan does not present the specific exposure parameter values and
equations to be used in EPA's risk assessment, but simply states that they will be developed based
on EPA and MDEP guidance, along with site-specific data (Section 6.3.4.5). GE requested such
information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has
advised GE that the Agency's risk assessment staff will hold a series of technical meetings with
GE representatives and consultants, and at such meetings will be prepared to discuss the most
appropriate exposure assumptions and parameter values to be used. GE reiterates its request for
such meetings and looks forward to discussing these issues with EPA.
8. Use of probabilistic risk analysis
In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that
probabilistic approaches, such as Monte Carlo analysis, will be considered for those COPCs and
exposure pathways that result in significant risk estimates. However, it does not mention
probabilistic risk assessment techniques in connection with any of the main risk assessments. In
prior discussions with GE, EPA indicated that one of the principal reasons why it collected a large
amount of fish tissue PCB data is for use in a probabilistic risk assessment of fish consumption.
As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic
exposure model in the main risk analysis of the fish consumption pathway. However, the Work
Plan presents no discussion of, or plans for conducting, such an assessment. In fact, EPA does not
even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk
assessment guidance documents and information sources that EPA intends to use in conducting the
HHRA.

During the last few years, EPA's policies and guidelines have focused on improving risk
management by presenting decision-makers with the entire range of possible risks rather than a
single point estimate (EPA, 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that
numerical risk assessments should be accompanied by a full characterization of the uncertainties,
limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D)
probabilistic analysis, such as Microexposure analysis, can more effectively characterize the
impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed
population by considering time-dependent changes.
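A minimal sketch of such a 2-D analysis, with an outer loop over parameter uncertainty and an inner loop over inter-individual variability (all distributions below are hypothetical placeholders, not values from the Work Plan), is:

```python
import numpy as np

rng = np.random.default_rng(2)
N_UNCERTAINTY, N_VARIABILITY = 200, 1000

p95_doses = []
for _ in range(N_UNCERTAINTY):
    # Outer loop (uncertainty): the true mean fish-tissue concentration is
    # imperfectly known; sample it from a hypothetical distribution.
    mean_conc = rng.lognormal(mean=0.0, sigma=0.2)  # mg/kg, median 1.0
    # Inner loop (variability): ingestion rates differ among anglers.
    intake_g = rng.lognormal(mean=np.log(10.0), sigma=0.5, size=N_VARIABILITY)
    dose = mean_conc * intake_g * 1e-3 / 70.0  # mg/kg-day for a 70-kg adult
    p95_doses.append(float(np.percentile(dose, 95)))

# The spread of the 95th-percentile (RME-like) dose across outer iterations
# shows how parameter uncertainty propagates into the exposure estimate.
print(np.percentile(p95_doses, [5, 50, 95]))
```

A full Microexposure analysis additionally simulates each individual's exposure history through time (changing species, locations, and consumption behaviors), which this static sketch does not attempt.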
Microexposure analysis has been used by EPA and by independent researchers to simulate duration
of residential exposure (EPA, 1992c; Johnson and Capel, 1992; Sielken, 1994). It also has been
used to evaluate childhood exposures to lead (Goodrum et al., 1994), exposure to contaminants
in tap water (Harrington et al., 1995), and exposure to dioxins from the consumption of freshwater
fish (Keenan et al., 1993a,b; 1995; 1996a; 1997a,b). Microexposure analysis was employed in
the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon
Group, 1994; Keenan et al., 1996b). In addition, Microexposure analysis has been described in
EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A - External Review Draft as a
viable alternative for modeling time-dependent variability in concentrations, daily activity patterns,
and other behavioral exposure factors (EPA, 1999c).
Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of probabilistic risk analysis through application to the fish ingestion pathway at the Sangamo Weston/Lake Hartwell Superfund site (Simon 1999). Simon (1999) found that the Microexposure event model gave risk estimates that were approximately an order of magnitude lower than those derived using deterministic and other probabilistic approaches. He attributed this decrease to the power of the method to incorporate changes over time.
GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model, to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in Appendix E of its draft guidance for conducting probabilistic risk assessments (EPA 1999c), some types of exposures, including fish consumption, can be expected to vary over time. Anglers may change their activity levels and their fish consumption behaviors as they age. In addition, different
species and sizes of fish may be consumed from a variety of locations on the river over time, and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates, and will provide information that will allow better-informed risk management decisions to be made (EPA 1999c).
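The kind of time-dependent simulation described above can be sketched in a few lines. The sketch below is illustrative only: the meal frequency, meal size, body weight, tissue concentration distribution, and aging adjustment are hypothetical placeholders, not proposed assessment inputs.

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_angler(n_years=30):
    """One angler's lifetime average daily dose under time-varying behavior.

    All parameter values are hypothetical and for illustration only.
    """
    # Per-angler baseline meal frequency (meals/year); hypothetical distribution
    meals_per_year = rng.normal(25, 8)
    total_intake = 0.0
    for year in range(n_years):
        # Behavior changes over time: consumption assumed to decline with age
        meals = max(0.0, meals_per_year * (1.0 - 0.01 * year))
        # Tissue PCB concentration (mg/kg) varies by species, size, and location
        conc = rng.lognormal(mean=0.0, sigma=0.8)
        meal_size_kg = 0.227  # roughly an 8-oz fillet (hypothetical)
        total_intake += meals * meal_size_kg * conc  # mg ingested this year
    # Lifetime average daily dose, mg/(kg-day), assuming a 70 kg body weight
    return total_intake / (70.0 * n_years * 365.0)

doses = np.array([simulate_angler() for _ in range(5000)])
print(np.percentile(doses, [50, 95]))  # median vs. upper-tail exposure
```

The point of the sketch is the shape of the output: a full distribution of doses across a simulated angler population, from which a median, a 95th percentile, or any other risk descriptor can be read, rather than a single point estimate.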
9. Use of 95% UCLs or maximum values as exposure point concentrations
The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% UCL of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).
GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment and demonstrated by EPA contractors and consultants (EPA 1997c; Schulz and Griffin 1999), use of the H-statistic to derive a 95% UCL can, and frequently does, produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.
Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½ Mile Reach and for Areas Outside the River: a spatial averaging approach. That approach will (a) allow calculation of EPCs that are much more representative of actual exposures than those calculated under EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper ½ Mile Reach and Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling, if necessary, to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.
However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A and demonstrated by Singh et al. (EPA 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.
References
EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A), Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1990. National Contingency Plan. US Environmental Protection Agency, Washington, DC.

EPA. 1992a. Final Guidelines for Exposure Assessment; Notice. US Environmental Protection Agency, Washington, DC. 57 Federal Register 22888-22938. May 29.

EPA. 1992b. Guidance on Risk Characterization for Risk Managers and Risk Assessors. Memorandum from F. Henry Habicht II, Deputy Administrator, to Assistant Administrators and Regional Administrators. Office of the Administrator, Washington, DC.
EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General US Population. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

EPA. 1993. Memo from W.H. Farland and H.L. Longest to Regional Directors. Re: Use of IRIS values in Superfund risk assessments. US Environmental Protection Agency, Washington, DC. OSWER Directive 9285.7-16. December 21.

EPA. 1995a. Policy for Risk Characterization at the US Environmental Protection Agency. US Environmental Protection Agency, Office of the Administrator, Washington, DC. March.

EPA. 1995b. Guidance for Risk Characterization. US Environmental Protection Agency, Science Policy Council, Washington, DC. February.

EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the US Environmental Protection Agency. US Environmental Protection Agency, Office of Research and Development, Washington, DC. May 15.

EPA. 1997b. Guiding Principles for Monte Carlo Analysis. US Environmental Protection Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 pp.

EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1998. Memo from Timothy Fields, Acting Administrator, to Regional Directors. Re: Approach for addressing dioxin in soil at CERCLA and RCRA sites. US Environmental Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.
EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, US EPA, to Jane Magee, General Electric Company. Re: Pittsfield 1-0147, EPA Area 6 Housatonic River Human Health Risk Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader. Re: Protectiveness of Cleanup Levels for Removal Actions Outside the River - Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, US Dist. Court for Dist. of Mass.). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual, Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2. Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3, Part A: Process for Conducting Probabilistic Risk Assessment. Draft, Revision 5. US Environmental Protection Agency, Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the US Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition. December 4-7, Baltimore, MD.
Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA4203. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.

Johnson, T. and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General US Population. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA, Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds, Proceedings, Dioxin '96: 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen 30:61-65.
Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC. October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter, Annual Meeting Proceedings. (Abstract)

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis, Annual Meeting and Exposition Proceedings. (Abstract)

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan, Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19. November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD. December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4):823-843.
Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 11.
ATTACHMENT A
A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS
Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co Inc
Introduction
The US Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets where no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and on how well they represent the actual levels of exposure that are likely to occur under various scenarios.
There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and
the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.
GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will produce more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic: namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.
Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL
In 1992, EPA issued a guidance document (EPA 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.
The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region 1 (EPA 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.
For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA 1992):

UCL = exp( x̄ + 0.5·s² + s·H / √(n − 1) )

Where:

UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic accounting for the interaction of the distribution, developed by Land (1975)
n = number of analytical samples
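The calculation can be sketched as follows. Note that H must be interpolated from Land's (1975) tables as a function of s and n; the value hard-coded below, like the example dataset, is purely hypothetical and chosen only to illustrate how the formula behaves on a small, skewed sample.

```python
import numpy as np

def h_ucl(data, H):
    """95% UCL of the arithmetic mean via Land's H-statistic method.

    `H` must be looked up in Land's tables for the given s and n;
    the value passed below is illustrative only.
    """
    logs = np.log(data)
    x_bar = logs.mean()                 # mean of log-transformed data
    s = logs.std(ddof=1)                # std dev of log-transformed data
    n = len(data)
    return np.exp(x_bar + 0.5 * s**2 + s * H / np.sqrt(n - 1))

# Small, skewed dataset (mg/kg) -- illustrative values, not site data
concentrations = np.array([0.5, 0.8, 1.1, 1.6, 2.4, 3.9, 7.2, 21.0])
ucl = h_ucl(concentrations, H=4.0)      # hypothetical H
print(ucl, concentrations.max())        # here the H-UCL exceeds the maximum observed value
```

With a small n and a large s, the exponentiated s·H/√(n − 1) term dominates, which is the mechanism behind the inflated UCLs discussed in the text below.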
Frequently, the H-statistic calculates extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.
Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.
Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may
overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.
Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when an exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.
Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA 1999).
In view of these considerations, EPA contractors and consultants have recommended that the H-statistic not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA 1997a) recommended that "in environmental applications, the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less.... [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the lower Housatonic River. Consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.
This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al. 1995); in more complex approaches, unbiased estimates of concentration and variance change with distance, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava 1989).
Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a
hardwired operation in most commercially available Geographic Information System (GIS) applications.
The area inside each polygon is then used to weight the samples in statistical calculations, which is in essence saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted mean concentration is calculated using the following formula (Isaaks and Srivastava 1989):

x̄sc = Σ (ci × pi), summed over polygons i = 1 through n

Where:

x̄sc = area-weighted mean concentration (mg/kg)
ci = the concentration representing the condition within polygon i, where there are i = 1 through n polygons, and
pi = the proportion of the total area that is incorporated in polygon i (unitless)
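The formula reduces to a simple weighted sum once the polygon area fractions are in hand (in practice they would come from a GIS). The values below are illustrative only, not site data; they are chosen to show how area-weighting corrects a hotspot-biased sampling scheme.

```python
# Hypothetical sample results and Thiessen-polygon area fractions
# for one exposure area (illustrative values, not site data).
samples = [
    {"conc_mg_kg": 12.0, "area_fraction": 0.05},  # hotspot, small polygon
    {"conc_mg_kg":  4.5, "area_fraction": 0.10},
    {"conc_mg_kg":  0.9, "area_fraction": 0.35},
    {"conc_mg_kg":  0.4, "area_fraction": 0.50},  # clean area, large polygon
]

# Polygon area fractions must partition the exposure area
assert abs(sum(s["area_fraction"] for s in samples) - 1.0) < 1e-9

# Area-weighted mean: each result counts in proportion to the polygon it represents
x_bar_sc = sum(s["conc_mg_kg"] * s["area_fraction"] for s in samples)

# Unweighted mean, for comparison: biased high because sampling favored the hotspot
x_bar = sum(s["conc_mg_kg"] for s in samples) / len(samples)

print(x_bar_sc, x_bar)
```

Because the hotspot sample represents only a small fraction of the exposure area, the area-weighted mean falls well below the unweighted mean of the same four results.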
This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then the average concentrations for each grid were
averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA 1997b).
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the:

[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods, such as kriging or triangulation, that take into account the spatial distribution of contamination to characterize exposure. (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate
representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most
representative means of calculating EPCs. However, if EPA rejects this approach, or for areas
where the sampling data are not considered sufficient to support use of such spatial averaging,
there are alternative methods of calculating the 95% UCL which would result in more reliable
and representative EPCs than use of the H-statistic. As discussed previously, and demonstrated
by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), the H-statistic is likely to
substantially overestimate the 95% UCL, particularly where datasets are small. These EPA
contractors and consultants have evaluated and suggested alternative and superior approaches for
calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows
calculation of a probability distribution of EPCs and is able to offset some of the statistical
limitations associated with small datasets. It provides a method for calculating a more reliable
estimator of the EPC, particularly when datasets are small and data do not exactly fit either a
normal or lognormal distribution. This technique is described below and is recommended as an
alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for
exposure areas in the Lower Housatonic River risk assessment.
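For reference, the H-statistic procedure being critiqued computes a one-sided 95% UCL from the log-transformed data, in the standard form given in the EPA guidance cited in the references (Land 1975; EPA 1992):

```latex
y_i = \ln x_i,\qquad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i,\qquad
s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2,
\qquad
\mathrm{UCL}_{95} = \exp\!\left(\bar{y} + \frac{s_y^2}{2} + \frac{s_y\,H_{0.95}}{\sqrt{n-1}}\right)
```

where H₀.₉₅ is taken from Land's tables. Because the final term grows with the log-scale standard deviation s_y and shrinks only as the square root of n-1, a small, highly variable dataset can drive this UCL far above every measured value, which is the behavior criticized above.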
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process where a subpopulation of the overall
dataset of analytical values is selected at random and a mean is calculated. This mean is
archived, and the statistical sampling procedure is repeated with replacement of the data. The
bootstrapped means are collected until their distribution stabilizes (this often requires many
iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95%
UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of
statistics, which states that if a population is repeatedly sampled for purposes of calculating an
arithmetic mean, the distribution of the means will ultimately take the form of a normal
distribution. This Theorem applies to any population, regardless of its statistical distribution.
Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples,
bootstrapping models select repeatedly from a single dataset.
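The procedure described above can be sketched as follows (a minimal illustration of the technique, not the exact implementation Ogden used; the dataset is hypothetical):

```python
import random
import statistics

def bootstrap_ucl95(data, n_iterations=5000, seed=1):
    """Estimate the 95% UCL of the arithmetic mean by bootstrap resampling."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_iterations):
        # Draw a resample of equal size, with replacement, and archive its mean
        resample = rng.choices(data, k=len(data))
        means.append(statistics.fmean(resample))
    means.sort()
    # The 95th percentile of the archived means serves as the 95% UCL
    return means[int(0.95 * n_iterations) - 1]

# Hypothetical skewed sampling results (ppm), for illustration only
pcb = [0.02, 0.5, 1.1, 2.3, 4.7, 6.0, 9.8, 14.2, 21.5, 40.1]
ucl = bootstrap_ucl95(pcb)
```

Note that the resulting UCL sits above the sample mean but, unlike the H-statistic result discussed below, does not balloon past the largest measured value.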
This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and
is more technically justifiable. The reduction obtained from this procedure varies according to
the amount of data available for sampling and its variance, but the result is consistently lower than the 95%
H-UCL (EPA 1997a; Schulz and Griffin 1999). The difference between these two measures is
more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is
particularly well-suited to exposure areas, like many along the Lower Housatonic River, where
sampling is limited.
When sample sizes are small, the distribution for the bootstrap run may not appear normal. This
is because the number of samples required to be included in each subsample in order to approach
normality is larger than is available from the site data. This can be mitigated by further
sampling. It should be noted, however, that as the sample size increases, the variance becomes
tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95%
UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95%
UCL that might be obtained from a larger dataset.
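The sample-size effect can be demonstrated directly: tripling a hypothetical dataset (a stand-in for collecting three times as many samples of similar character) tightens the distribution of bootstrapped means and pulls the 95% UCL toward the mean. All values below are invented for illustration:

```python
import random
import statistics

def bootstrap_ucl95(data, n_iter=5000, seed=7):
    """Bootstrap 95% UCL of the mean (95th percentile of resampled means)."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(data, k=len(data)))
        for _ in range(n_iter)
    )
    return means[int(0.95 * n_iter) - 1]

# Hypothetical skewed dataset (ppm)
small = [0.05, 0.9, 2.1, 3.4, 5.2, 8.8, 12.6, 19.3, 27.0, 41.2]
larger = small * 3   # same mean and spread, three times the sample size

ucl_small = bootstrap_ucl95(small)
ucl_larger = bootstrap_ucl95(larger)
# With more samples of similar character the UCL moves toward the mean, so
# the small-sample UCL is the more conservative (higher) estimate.
```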
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the
H-statistic method, it still does not take into consideration the spatial distribution of the sampling
locations across an exposure area and the biases associated with it. To reflect this spatial
distribution, a modified bootstrapping approach can be used to reflect the locations and densities
of samples collected on each property. This modified bootstrapping approach ("spatial
bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to
calculate more reliable 95% UCL estimates of the EPC for each exposure area.
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with
the bootstrap method discussed above. It is conducted in a series of three steps.
First, each sample is area-weighted using the Thiessen polygon method. The weight given to
each sample is determined by calculating the proportion of the total exposure area that is
represented by the sample's polygon.
Second, the smallest polygon area is identified, and all other polygon areas are normalized with
respect to its size and are weighted accordingly in the sampling database. In other words, if the
area of one polygon is five times larger than the smallest polygon, it is given a weight that is five
times greater in the sampling database from which values are selected for the bootstrap. The
normalized areas are rounded off to the closest integer. An expanded dataset is then created
from the original dataset that reflects the ratios of each polygon area to the smallest polygon area.
This expansion is accomplished by multiplying the rounded normalized areas by a selected
value. The resulting values represent an area-weighted number of replicates of the sample
results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area
is 7, and the dataset is expanded using a factor of five, then sample X is represented in the
expanded dataset as 35 (7 × 5) entries.
Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using
between 250 and 500 of the results (depending upon the size of the expanded dataset) for each
iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a
cumulative distribution of estimated means. After sufficient iterations, this cumulative
distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal
to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of
this distribution of means is the 95% UCL.
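The three steps can be sketched as follows (a simplified illustration of the procedure described above; the polygon areas and concentrations are hypothetical):

```python
import random
import statistics

def spatial_bootstrap_ucl95(areas, concs, expansion=5,
                            n_iter=5000, subsample=250, seed=1):
    """Area-weighted (spatial) bootstrap 95% UCL sketch.

    Step 1: each sample's weight is its Thiessen polygon's share of the area.
    Step 2: build an expanded dataset in which each result is replicated in
            proportion to its polygon's size relative to the smallest polygon,
            times a fixed expansion factor.
    Step 3: repeatedly draw subsample means (with replacement) from the
            expanded dataset; the 95th percentile of the means is the UCL.
    """
    smallest = min(areas)
    expanded = []
    for area, conc in zip(areas, concs):
        replicates = round(area / smallest) * expansion
        expanded.extend([conc] * replicates)
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(expanded, k=subsample))
        for _ in range(n_iter)
    )
    return means[int(0.95 * n_iter) - 1]

# Hypothetical polygon areas (ft2) and PCB results (ppm), illustration only
areas = [2100, 4200, 8400, 16800, 33600]
concs = [9.7, 1.6, 15.4, 0.03, 7.6]
ucl = spatial_bootstrap_ucl95(areas, concs)
```

The 50th percentile of the archived means reproduces the area-weighted average, and the returned 95th percentile is the spatial-bootstrap UCL.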
This approach is both consistent with EPA guidance and preferable to the approach currently
proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used
as the EPC in order to correct for the uncertainties and limitations of the sampling database, the
guidance does not dictate which method must be used to calculate the 95% UCL. While EPA
has in the past often relied on the H-statistic, that method need not be used, and recent reports
by EPA contractors and consultants have recommended against use of the H-statistic to avoid
systematic overestimates of the 95% UCL (EPA 1997a; Schulz and Griffin 1999).
The spatial bootstrapping approach is better than the H-statistic approach for several reasons.
First, as has been shown by Singh et al. (EPA 1997a) and Schulz and Griffin (1999),
bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial
bootstrapping allows for consideration of the location and density of the samples for each
exposure area and corrects for the relative contribution of each in the database. Third, this
approach mitigates the statistical impact of a small database. Finally, it can be used regardless of
the distribution of the sampling data. Thus, for all sampling datasets, including those that do not
fit either a normal or lognormal distribution, this approach can be used successfully.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the
calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted
sample calculations for a hypothetical property. For this purpose, Ogden developed a
hypothetical set of sampling data that are similar to the types of data that might be collected from
a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples
have been collected from a property that is roughly 1,000 by 1,000 feet in size. The
concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic
mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219
square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different
approaches: 1) EPA's approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3)
a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.
As can be seen in Table 2, the estimated mean values are similar for all four approaches. The
arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This
is consistent with the Central Limit Theorem (EPA 1997a). The area-weighted average is
identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the
arithmetic and bootstrapped means. This is because some of the sampling locations with high
concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are
weighted more heavily in approaches that include area-weighting.
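Using the sampling data in Table 1, the two mean estimates can be reproduced directly (the area-weighted mean uses each sample's polygon share of the total area as its weight):

```python
# Table 1 data: polygon areas (ft2) and total PCB results (ppm), in sample order
areas = [138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121,
         3697, 13297, 33087, 65541, 57494, 9014, 9994, 2123, 2341,
         7446, 2613, 44231, 2362, 8745, 5008, 18762, 24296, 51356,
         12996, 18019, 119814, 132012]
concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77,
         7.58, 7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7,
         14.3, 14.3, 15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9,
         38.9, 40.6, 42.6]

arithmetic_mean = sum(concs) / len(concs)                  # about 13.3 ppm
area_weighted_mean = (
    sum(a * c for a, c in zip(areas, concs)) / sum(areas)  # somewhat higher
)
```

The area-weighted result exceeds the arithmetic mean because samples 29 and 30, with the two highest concentrations, carry two of the largest polygon weights.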
Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping,
the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL
calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the
bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL
exceeds the maximum value (42.6 ppm) in the sampling database, which means that the
maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that
would need to be used for this exposure area would be the maximum detected concentration of
42.6 ppm. This concentration is clearly not representative of the sampling dataset for this
hypothetical property.
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and
its result is consistent with the findings of Singh et al. (EPA 1997a) and Schulz and Griffin
(1999). These limitations occur despite the fact that the sampling dataset for this hypothetical
property is reasonably large, containing 30 samples, and the range of concentrations is fairly
typical of the types of concentrations likely to be found in floodplain properties along the Lower
Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly
typical concentration gradient, with a few samples showing very low concentrations, a few
showing fairly high concentrations, and the remainder fairly evenly distributed in between. In
addition, some of the largest polygon areas are associated with low-concentration areas, while
others are associated with the highest concentrations. There is nothing about this dataset to
indicate that the samples reported are not representative of the range of concentrations that is
likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured
concentration by a factor of about 2.5, with the result being that the 95% UCL cannot be
used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result
is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly
even distribution of sampling results. These problems will be even more pronounced for certain
real properties along the Lower Housatonic River, where the sampling results may be highly
skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more
limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that
maximum concentration values will need to be used for many of those properties. Such an end
result is unsupportable and will result in the overestimation of site risks. It could also result in
the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above, and demonstrated by EPA contractors and consultants (EPA 1997a; Schulz
and Griffin 1999), use of the H-statistic approach to calculating the 95% UCL in environmental
applications can, and frequently does, produce substantial overestimates of the true mean that are
in no way representative of the concentrations actually contacted by people in an exposure area;
hence, that approach should be avoided. Moreover, this problem cannot be solved by simply
using the maximum detected concentration in the area if the calculated 95% UCL exceeds that
maximum. These problems can be avoided through the use of area-weighted averaging, which
takes into consideration the fraction of the exposure area that is represented by each sample
taken. This approach allows for a more realistic and representative estimate of the level of
exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the
area-weighted averaging approach for the Lower Housatonic River and to collect the data
necessary to support that approach. Use of spatial average concentrations as EPCs is consistent
with EPA policies, procedures, and guidelines (EPA 1989, 1993) and would be consistent with
the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the
Upper ½ Mile Reach of the River and the Removal Actions Outside the River, including certain
floodplain properties along the River.
However, if EPA rejects this approach (or, for areas where EPA determines that the data are
insufficient to support that approach, decides not to collect additional data) and insists on
estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial
bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more
representative EPCs than the H-statistic method (EPA 1997a; Schulz and Griffin 1999), offsets
many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and
M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of
exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation
Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency
and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER
Bulletin Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the
Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund
Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007.
February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum
exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2.
August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant),
Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA
Region IV Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by
A. Singh, A.K. Singh, and M. Engelhardt. US Environmental Protection Agency, Office of
Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. US Environmental Protection
Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, Mile Removal
Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New
York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and
variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations
when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River - Final.
Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
      1               138,219               0.022
      2               105,217               0.028
      3               110,214               1.07
      4                31,021               1.63
      5                11,399               1.67
      6                 7,004               2.01
      7                14,268               2.19
      8                12,121               3.25
      9                 3,697               3.77
     10                13,297               7.58
     11                33,087               7.61
     12                65,541               7.62
     13                57,494               7.71
     14                 9,014               7.99
     15                 9,994               8.72
     16                 2,123               9.69
     17                 2,341               9.72
     18                 7,446               12.7
     19                 2,613               14.3
     20                44,231               14.3
     21                 2,362               15.4
     22                 8,745               16.3
     23                 5,008               18.7
     24                18,762               19.1
     25                24,296               20.2
     26                51,356               30.8
     27                12,996               32.9
     28                18,019               38.9
     29               119,814               40.6
     30               132,012               42.6
[Table 2. Comparison of Mean and 95% UCL Exposure Point Concentrations for the
Hypothetical Floodplain Property, calculated using: 1) the EPA approach (arithmetic mean and
H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. The body of
this table is not legible in the source copy; the key values (means of approximately 13.3 ppm,
bootstrap and spatial bootstrap 95% UCLs of 17.06 and 16.38 ppm, respectively, and an
H-statistic 95% UCL of 106 ppm) are discussed in the accompanying text.]
3. Use of TEF/TEQ approach for dioxin-like PCBs

In the Work Plan, EPA also proposes to use the TEF/TEQ approach for estimating risks of dioxin-like
PCB congeners (Section 6.3.3.2.5). Under this approach, the concentrations of these PCB
congeners are converted to toxic equivalents (TEQs) of 2,3,7,8-TCDD using TEFs, and the
provisional CSF of 150,000 (mg/kg-day)⁻¹ for 2,3,7,8-TCDD is then applied to estimate the risks.
In its June 3, 1999 comments, GE presented several reasons why this approach is not appropriate
or scientifically defensible. In discussions with GE in November 1999, EPA stated that, since it
will make risk management decisions on the basis of total PCB concentrations rather than
congener-specific data, it intends to use this TEF/TEQ approach for PCBs only if it finds a good
correlation between the concentrations of the dioxin-like congeners and concentrations of total
PCBs. There is no indication in the Work Plan, however, that EPA has established a good
correlation between the congener-specific data and the total PCB data. Indeed, in more recent
discussions, EPA has indicated that it has not to date evaluated this issue. If and when EPA makes
such an analysis, GE requests the opportunity to review the analysis and discuss it with EPA. The
absence of such a correlation would itself make the TEQ data irrelevant to risk management
decisions.
In any event, GE continues to believe, for the same reasons given in its June 3, 1999 comments
(pp. 23-25), that application of the TEF/TEQ approach to PCBs is not scientifically supportable.
These reasons include: (a) the lack of sufficient congener-specific data to provide reliable
estimates of true congener concentrations; (b) the double counting of carcinogenic risks that would
result from use of both the TCDD CSF and the PCB CSF in the same assessment; and (c) the
uncertainties and unsupportability of the TEFs for PCB congeners and of the CSF (for
2,3,7,8-TCDD) to be used to evaluate the resulting TEQs.
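For context, the TEF/TEQ conversion that the Work Plan contemplates has this general form. The congener concentrations below are hypothetical, and the TEFs shown are illustrative WHO-style values assumed for this sketch; the Work Plan would supply the actual inputs:

```python
# Hypothetical dioxin-like PCB congener concentrations (mg/kg)
congener_conc = {"PCB-77": 0.004, "PCB-126": 0.0002, "PCB-169": 0.001}

# Illustrative TEFs relative to 2,3,7,8-TCDD (assumed values for this sketch)
tef = {"PCB-77": 0.0001, "PCB-126": 0.1, "PCB-169": 0.01}

# Convert each congener to its 2,3,7,8-TCDD toxic equivalent and sum
teq = sum(congener_conc[c] * tef[c] for c in congener_conc)

# The summed TEQ would then be carried through the intake calculation and
# multiplied by the provisional 2,3,7,8-TCDD CSF of 150,000 (mg/kg-day)^-1
csf_tcdd = 1.5e5
```

The double-counting concern in point (b) arises because the total-PCB concentration, which already includes these congeners, would separately be multiplied by the PCB CSF in the same assessment.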
4. Failure to consider alternative CSFs and RfDs for PCBs

The Work Plan states that EPA intends to use the upper-bound CSF for PCBs of 2 (mg/kg-day)⁻¹
and the Reference Dose (RfD) for Aroclor 1254 of 2 x 10⁻⁵ mg/kg-day, as published in its
Integrated Risk Information System (IRIS) (Sections 6.3.3.2.3 and 6.3.3.3.2). In its June 3, 1999
comments (pp. 16-19 and Appendix A), GE presented substantial reasons why these toxicity values
overestimate the carcinogenic potential and non-carcinogenic impacts of PCBs, and it
recommended the development of alternative toxicity values for PCBs. EPA has made no
reference to this presentation in its Work Plan.
This failure to consider the alternative toxicological information presented by GE is unlawful.
Under a settlement agreement with GE (Settlement Agreement in General Electric Company v.
Browner, No. 93-1251, D.C. Cir., October 25, 1993) and a subsequent EPA guidance
memorandum (EPA 1993), if an outside party questions the use of IRIS values during the course
of an EPA risk assessment and presents alternative toxicological information that may be used in
place of the IRIS values, EPA has an obligation to consider all credible and relevant evidence
before it. There is no indication that EPA has considered the alternative toxicological
information presented by GE. Rather, it has ignored GE's showing and chosen simply to defer
to the current IRIS values (which is precisely the course of conduct which the settlement and
guidance prohibit). EPA should rectify this deficiency by now giving careful consideration to the
alternative toxicological information presented by GE.
5. Use of subchronic RfD for PCBs

The Work Plan states that EPA will consider using the subchronic RfD for PCBs only if it
determines that exposure is intermittent enough to result in an exposure period that is substantially
reduced (p. 6-42). Later, on page 6-43, the Work Plan states that this will occur when the total
exposure period is substantially below the 7-year period that is considered by EPA to represent
chronic exposure. There is no further discussion of this issue, so it is not clear what the Agency
will use as a benchmark to identify intermittent exposure.

Many of the proposed exposure scenarios are likely to be extremely intermittent in nature. For
example, in developing the SRBC for utility workers, EPA assumes that exposure occurs 5
days/year for a period of 25 years. Clearly, 5 days/year is intermittent exposure. In fact, even
when considered over the entire 25-year exposure period, the total exposure consists of 125 days,
which amounts to about 1/3 of a year, or less than 5 percent of the 7-year chronic exposure period. This
exposure would certainly be considered intermittent, despite the fact that the total exposure period
lasts for 25 years.
As a second example, in developing its SRBCs for recreational exposures, EPA has assumed that
children aged 1-6 are exposed for six years during recreational activities. Even using the highly
conservative exposure frequency for high contact areas of 3 days/week for a period of 7 months
per year (a total of 84 days/year, or 504 days for the entire exposure period), the number of days
exposed is equivalent to 1.4 years, or less than 20 percent of the available days during the 7-year
chronic exposure period. Nevertheless, as stated on page 6-42, EPA does not intend to use the
subchronic RfD to evaluate noncancer hazards to young children.
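The arithmetic behind both examples is straightforward and can be checked directly:

```python
CHRONIC_YEARS = 7  # EPA's benchmark period for chronic exposure

# Utility worker SRBC assumptions: 5 days/year for 25 years
utility_days = 5 * 25                      # 125 days total
utility_fraction = utility_days / (CHRONIC_YEARS * 365)   # under 5 percent

# Child recreational assumptions: 3 days/week, 7 months/year, for 6 years
child_days = 84 * 6                        # 504 days total (84 days/year)
child_years = child_days / 365             # about 1.4 years
child_fraction = child_days / (CHRONIC_YEARS * 365)       # under 20 percent
```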
In short, the Work Plan states that EPA will use the subchronic RfD when exposures during a
given scenario are sufficiently intermittent to result in a total exposure that is far less than the
7-year chronic exposure period identified by EPA. It is clear, however, that even for scenarios
where this intermittent exposure is likely, EPA does not intend to incorporate the subchronic RfD.
EPA should reconsider its position on the use of the subchronic RfD and incorporate it into those
scenarios where it is relevant.
6. Inappropriate use of a single exposure scenario for all direct-contact recreational activities

The Work Plan provides that, while consumption of fish and game will be evaluated separately,
all direct-contact recreational activities will be evaluated using a single exposure scenario and a
single set of exposure assumptions (pp. 6-52 - 6-53). These activities include hiking, walking,
canoeing, picnicking, swimming, wading, and dirt biking, as well as fishing and hunting (but not
consumption of fish or game). In its June 3, 1999 comments (pp. 11-12, 29-30), GE explained
that use of a single exposure scenario and set of assumptions for such diverse activities is
inappropriate because it will force EPA either to use the most conservative assumptions for any
of these activities (which would clearly result in grossly overestimated exposure estimates) or to
pick compromise parameters that will not be truly representative of any of these activities (which
would be subject to considerable uncertainty). Based on discussions with EPA in the fall of 1999,
GE had understood that the Agency recognized this problem and intended to consider further
whether and how to divide this single direct-contact recreational scenario into a number of more
specific scenarios (e.g., hiking/walking, canoeing, picnicking, swimming/wading). However, the
Work Plan does not do that. GE again urges EPA to develop such specific scenarios so as to take
account of the significant differences in exposure parameters among these types of activities. In
doing so, actual recreational exposures will be more closely approximated, and more realistic and
appropriate cleanup goals can be developed.
7. Lack of information about specific exposure assumptions to be used

As in the prior draft, the Work Plan does not present the specific exposure parameter values and
equations to be used in EPA's risk assessment, but simply states that they will be developed based
on EPA and MDEP guidance, along with site-specific data (Section 6.3.4.5). GE requested such
information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has
advised GE that the Agency's risk assessment staff will hold a series of technical meetings with
GE representatives and consultants, and at such meetings will be prepared to discuss the most
appropriate exposure assumptions and parameter values to be used. GE reiterates its request for
such meetings and looks forward to discussing these issues with EPA.
8. Use of probabilistic risk analysis

In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that
probabilistic approaches, such as Monte Carlo analysis, will be considered for those COPCs and
exposure pathways that result in significant risk estimates. However, it does not mention
probabilistic risk assessment techniques in connection with any of the main risk assessments. In
prior discussions with GE, EPA indicated that one of the principal reasons why it collected a large
amount of fish tissue PCB data is for use in a probabilistic risk assessment of fish consumption.
As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic
exposure model in the main risk analysis of the fish consumption pathway. However, the Work
Plan presents no discussion of, or plans for conducting, such an assessment. In fact, EPA does not
even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk
assessment guidance documents and information sources that EPA intends to use in conducting the
HHRA.

During the last few years, EPA's policies and guidelines have focused on improving risk
management by presenting decision-makers with the entire range of possible risks rather than a
single point estimate (EPA 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that
numerical risk assessments should be accompanied by a full characterization of the uncertainties,
limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D)
probabilistic analysis, such as Microexposure analysis, can more effectively characterize the
impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed
population by considering time-dependent changes.
Microexposure analysis has been used by EPA and by independent researchers to simulate duration
of residential exposure (EPA 1992c; Johnson and Capel 1992; Sielken 1994). It also has been
used to evaluate childhood exposures to lead (Goodrum et al. 1994), exposure to contaminants
in tap water (Harrington et al. 1995), and exposure to dioxins from the consumption of freshwater
fish (Keenan et al. 1993a,b; 1995; 1996a; 1997a,b). Microexposure analysis was employed in
the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon
Group 1994; Keenan et al. 1996b). In addition, Microexposure analysis has been described in
EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A - External Review Draft as a
viable alternative for modeling time-dependent variability in concentrations, daily activity patterns,
and other behavioral exposure factors (EPA 1999c).

Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of
probabilistic risk analysis through application to the fish ingestion pathway at the
Sangamo-Weston/Lake Hartwell Superfund site (Simon 1999). Simon (1999) found that the
Microexposure event model gave risk estimates that were approximately an order of magnitude
lower than those derived using deterministic and other probabilistic approaches. He attributed this
decrease to the power of the method to incorporate changes over time.
GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model,
to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to
evaluate the variations in exposures over time and among anglers, taking into consideration site-
specific factors as well as the uncertainties associated with the input parameters. As outlined in
Appendix E of its draft guidance for conducting probabilistic risk assessments (EPA 1999c), some
types of exposures, including fish consumption, can be expected to vary over time. Anglers may
change their activity levels and their fish consumption behaviors as they age. In addition, different
species and sizes of fish may be consumed from a variety of locations on the river over time, and
may be prepared using different methods. Thus, fish tissue concentrations will not be constant
among anglers or throughout the exposure period for a single angler. A 2-D Microexposure
analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can
be developed using point estimates, and will provide information that will allow better informed
risk management decisions to be made (EPA 1999c).
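To illustrate the kind of time-stepping simulation being advocated, the sketch below steps each simulated angler through years of variable meal frequency, fish concentration, and portion size, then reports the distribution of lifetime average daily doses. It is a greatly simplified illustration, not EPA's or GE's model; every distribution and parameter value here is invented:

```python
import random

def simulate_angler(rng, years=30, body_weight=70.0):
    """One Microexposure-style trajectory: PCB intake from fish meals
    accumulated year by year, then averaged over the exposure period."""
    total_intake = 0.0  # mg of PCBs ingested
    for _ in range(years):
        meals = rng.randint(0, 52)               # meals this year (varies)
        for _ in range(meals):
            conc = rng.lognormvariate(0.0, 1.0)  # fish PCB conc (mg/kg)
            portion = rng.uniform(0.1, 0.3)      # meal size (kg)
            total_intake += conc * portion
    days = years * 365
    return total_intake / (body_weight * days)   # mg/kg-day average dose

rng = random.Random(42)
doses = sorted(simulate_angler(rng) for _ in range(1000))
median_dose = doses[len(doses) // 2]
dose_95th = doses[int(0.95 * len(doses))]
# The full distribution of doses, not a single point estimate, is what a
# probabilistic assessment would hand to risk managers.
```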
9. Use of 95% UCLs or maximum values as exposure point concentrations

The Work Plan states that the exposure point concentration (EPC) for each property or other
exposure area will be based on the 95% UCL of the site data, or the maximum detected
concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical
techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for
lognormally distributed data (p. 6-32).

GE's comments on this approach are set forth in detail in Attachment A, which was recently
provided to EPA. As discussed in that attachment, and demonstrated by EPA contractors and
consultants (EPA 1997c; Schulz and Griffin 1999), use of the H-statistic to derive a 95% UCL
can, and frequently does, produce large overestimates of the true mean, both when there are slight
deviations from the assumed lognormal distribution of the data and even when the data are
obtained from a lognormal population. Hence, that approach should not be used.
Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that
EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½
Mile Reach and for Areas Outside the River: a spatial averaging approach. That approach will
(a) allow calculation of EPCs that are much more truly representative of actual exposures than use
of EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner
that is consistent with the evaluation of the Upper ½ Mile Reach and Areas Outside the River; and
(c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent
of unnecessary removal or other remediation that will be required for the Rest of the River. GE
urges EPA to conduct additional sampling, if necessary, to ensure that sufficient data will be
available to calculate reliable spatial averages for each exposure area.
However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A and demonstrated by Singh et al. (EPA 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.
References
EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A), Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.
EPA 1990 National Contingency Plan US Environmental Protection Agency
Washington DC
EPA. 1992a. Final Guidelines for Exposure Assessment; Notice. US Environmental Protection Agency, Washington, DC. 57 Federal Register 22888-22938. May 29.
EPA 1992b Guidance on Risk Characterization for Risk Managers and Risk Assessors
Memorandum from F Henry Habicht III Deputy Administrator to Assistant Administrators
and Regional Administrators Office of the Administrator Washington DC
EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General US Population. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.
EPA 1993 Memo from WH Farland and HL Longest to Regional Directors Re Use of
IRIS values in Superfund risk assessments US Environmental Protection Agency
Washington DC OSWER Directive 92857-16 December 21
EPA 1995a Policy for Risk Characterization at the US Environmental Protection Agency
US Environmental Protection Agency Office of the Administrator Washington DC
March
EPA 1995b Guidance for Risk Characterization US Environmental Protection Agency
Science Policy Council Washington DC February
EPA 1997a Policy for Use of Probabilistic Analysis in Risk Assessment at the US
Environmental Protection Agency US Environmental Protection Agency Office of Research
and Development Washington DC May 15
EPA. 1997b. Guiding Principles for Monte Carlo Analysis. US Environmental Protection Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 pp.
EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A.K. Singh, and M. Engelhardt. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.
EPA 1998 Memo from Timothy Fields Acting Administrator to Regional Directors Re
Approach for addressing dioxin in soil at CERCLA and RCRA sites US Environmental
Protection Agency Washington DC OSWER Directive 92004-26 April 13
EPA and MDEP 1998 Letter from Anna Symington Acting Section Chief Massachusetts
DEP and Bryan Olson Project Manager US EPA to Jane Magee General Electric
Company Re Pittsfield 1-0147 EPA Area 6 Housatonic River Human Health Risk
Assessment January 29
EPA 1999a Memorandum from Ann-Marie Burke Toxicologist to Richard Cavagnero GE
Project Leader Re Protectiveness of Cleanup Levels for Removal Actions Outside the River
- Protection of Human Health Attachment A to Appendix D to Consent Decree lodged in
United States et al v General Electric Company (No 99-30225-MAP US Dist Court for
Dist of Mass) August 4
EPA 1999b Risk Assessment Guidance for Superfund Volume I - Human Health Evaluation
Manual Supplemental Guidance Dermal Risk Assessment Interim Guidance US
Environmental Protection Agency Office of Emergency and Remedial Response Washington
DC March 2 Draft
EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3, Part A: Process for Conducting Probabilistic Risk Assessment. Draft, Revision 5. US Environmental Protection Agency, Office of Solid Waste and Emergency Response. December.
EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. Prepared by EPA Region I. October.
GE 1997 Proposal for Alternative Input Values for the Human Health Risk Assessment
Memorandum from General Electric to the Massachusetts Department of Environmental
Protection and the US Environmental Protection Agency September 11
Goodrum PE JM Hassett DL Johnson and ME Dakins 1994 Applications of
microexposure Monte Carlo modeling to human health risk assessments A case study of
modeling childhood lead exposure Society for Risk Analysis Annual conference and
Exposition December 4-7 Baltimore MD
Harrington NW Curry CL and PS Price 1995 The MicroExposure Event Modeling
Approach to Probabilistic Exposure Assessment Paper No 95-TA4203 Proceedings of the
88th Annual Meeting of the Air and Waste Management Association San Antonio Texas
USA June
Johnson T and J Capel 1992 A Monte Carlo Approach to Simulating Residential
Occupancy Periods and Its Application to the General US Population US Environmental
Protection Agency Office of Air Quality Planning and Standards Research Triangle Park
NC EPA-4503-92-011 August
Keenan RE MH Henning PE Goodrum MN Gray RA Sherer and PS Price
1993a Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to
evaluate the need for fish advisories Dioxin 93 13th International Symposium on
Chlorinated Dioxins and Related Compounds Vienna Austria
Keenan RE PS Price MH Henning PE Goodrum MN Gray RA Sherer and
WL Porter 1993b A Monte Carlo risk assessment for dioxin in Maine fish Using a
microexposure approach to evaluate the need for fish advisories TAPPI Proceedings 1993
Environmental Conference Boston MA
Keenan RE PS Price CL Curry JI McCrodden and JG Haggard 1995 Using a
microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion
of fish from the upper Hudson River In Society for Risk Analysis and the Japan Section of
SRA Annual Meeting and Exposition Waikiki HI (Abstract)
Keenan RE PS Price J McCrodden and ES Ebert 1996a Using a microexposure
event analysis to model potential exposures to PCBs through ingestion of fish from the Upper
Hudson River In Organohalogen Compounds Proceedings Dioxin 96-16th International
Symposium on Chlorinated Dioxins and Related Compounds Amsterdam The Netherlands
Organohalogen 3061-65
Keenan RE NW Harrington PS Price and RO Richter 1996b Applying a
microexposure event analysis for a superfund site risk assessment Proceedings Superfund XVII
Conference Proceedings Washington DC October 15-17
Keenan RE JD Avantaggio and PS Price 1997a Should Maines rivers have fish
advisories for dioxin Using an integrated Microexposure Event and Toxicokinetic Model to
evaluate this question In SETAC North Atlantic Chapter Annual Meetings Proceedings
Abstract 1
Keenan RE JD Avantaggio and PS Price 1997b Using a combined Microexposure
Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden
dosimetric In Society for Risk Analysis Proceedings Annual Meeting and Exposition
Abstract
MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan. Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.
Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.
Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19. November.
Sielken RL 1994 More realistic exposure durations for more realistic people Society for
Risk Analysis Annual Conference and Exposition Baltimore MD December 4-7
Simon TW 1999 Two-dimensional Monte Carlo simulation and beyond A comparison of
several probabilistic risk assessment methods applied to a Superfund site Hum Ecol Risk
Assess 5 (4) 823-843
Weston 2000a Supplemental Investigation Work Plan for the Lower Housatonic River
Prepared for US Army Corps of Engineers by Roy F Weston Inc February 22
Weston 2000b Engineering EvaluationCost Analysis for the Upper Reach of the Housatonic
River Prepared for US Army Corps of Engineers by Roy F Weston Inc February 11
ATTACHMENT A
A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS
Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co Inc
Introduction
The US Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets for which no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and on how well they represent the actual levels of exposure that are likely to occur under various scenarios.
There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that, when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.
GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will produce more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic: namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.
Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL
In 1992, EPA issued a guidance document (EPA 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area; thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.
The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA 1994); in fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA 1992):
UCL = exp( x̄ + 0.5·s² + s·H / √(n − 1) )

Where:

UCL = a specified limit (i.e., the 95% UCL) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic, developed by Land (1975), accounting for the shape of the distribution
n = number of analytical samples
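To make the formula concrete, the H-UCL computation can be sketched in Python. This is an illustrative helper, not part of the Work Plan: the H value itself must be looked up in Land's (1975) published tables for the given sample size and standard deviation, so it is passed in as an argument here.

```python
import math

def h_ucl(data, h):
    """Land's H-UCL of the arithmetic mean for lognormally distributed
    data. `h` is Land's H-statistic, obtained from Land's (1975)
    tables for the given n and s (supplied by the caller)."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    xbar = sum(logs) / n                               # mean of log-transformed data
    s2 = sum((v - xbar) ** 2 for v in logs) / (n - 1)  # variance of log-transformed data
    s = math.sqrt(s2)
    return math.exp(xbar + 0.5 * s2 + s * h / math.sqrt(n - 1))
```

Because both s² and the s·H term enter through the exponent, a large standard deviation of the log-transformed data inflates the result multiplicatively, which is precisely the behavior criticized below.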
Frequently, the H-statistic produces extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.
Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.
Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.
Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when the exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.
Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and it does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA 1999).
In view of these considerations, EPA contractors and consultants have recommended that the H-statistic not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA 1997a) recommended that in environmental applications the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided. These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. ... [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the lower Housatonic River; consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL, because it is unlikely to be representative of the overall exposure area.
This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al. 1995); in more complex approaches, unbiased estimates of concentration and variance change with distance, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava 1989).
Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.
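As a rough numerical illustration of the idea (not the GIS workflow itself), the Thiessen-polygon area fraction of each sample over a rectangular exposure area can be approximated by rasterizing the area into a fine grid and assigning each cell to its nearest sample; the function name and grid resolution below are illustrative assumptions.

```python
import numpy as np

def thiessen_weights(sample_xy, xmin, xmax, ymin, ymax, res=200):
    """Approximate each sample's Thiessen-polygon area fraction by
    assigning every cell of a res-by-res grid over the exposure area
    to its nearest sample (a numerical stand-in for polygon drawing)."""
    xs = np.linspace(xmin, xmax, res)
    ys = np.linspace(ymin, ymax, res)
    gx, gy = np.meshgrid(xs, ys)
    pts = np.asarray(sample_xy, dtype=float)      # (n, 2) sample coordinates
    # squared distance from every grid cell to every sample
    d2 = (gx[..., None] - pts[:, 0]) ** 2 + (gy[..., None] - pts[:, 1]) ** 2
    nearest = d2.argmin(axis=-1)                  # closest sample for each cell
    counts = np.bincount(nearest.ravel(), minlength=len(pts))
    return counts / counts.sum()                  # area fraction per sample
```

Cells are grouped by nearest sample exactly as Thiessen polygons group area, so the returned fractions converge to the polygon area proportions as the grid is refined.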
The area inside each polygon is then used to weight the samples in statistical calculations, which is, in essence, saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava 1989):

x̄_sw = Σ p_i · C_i (summed over i = 1 through n)

Where:

x̄_sw = the area-weighted mean concentration (mg/kg)
C_i = the concentration representing the condition within polygon i, where there are i = 1 through n polygons, and
p_i = the proportion of the total area that is incorporated in polygon i (unitless)
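Given the polygon areas, the area-weighted mean defined above reduces to a few lines; this is a minimal sketch with illustrative variable names.

```python
def spatial_mean(concs, areas):
    """Area-weighted (spatial) mean: each concentration C_i is weighted
    by p_i, its polygon's fraction of the total exposure area."""
    total = float(sum(areas))
    return sum(c * (a / total) for c, a in zip(concs, areas))
```

For example, a 10 mg/kg hot-spot sample whose polygon covers one quarter of the area and a 2 mg/kg sample covering the rest give (10)(0.25) + (2)(0.75) = 4.0 mg/kg, versus a simple unweighted mean of 6.0 mg/kg.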
This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid cell using the sampling data available within it. Then the average concentrations for the grid cells were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA 1997b).
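The "nearest neighbor" gridding described for the Koppers site can be sketched as follows; the cell-assignment function is a hypothetical placeholder for the actual site grid.

```python
import statistics

def grid_declustered_mean(samples, cell_of):
    """Average the samples within each grid cell, then average the cell
    means, so that a densely sampled hot spot no longer dominates.
    `samples` is a list of (location, concentration) pairs and
    `cell_of` maps a location to its grid-cell id (site-specific)."""
    cells = {}
    for loc, conc in samples:
        cells.setdefault(cell_of(loc), []).append(conc)
    return statistics.mean(statistics.mean(v) for v in cells.values())
```

With three co-located 10 mg/kg samples in one cell and a single 2 mg/kg sample in another, the declustered mean is 6.0 mg/kg rather than the biased simple mean of 8.0 mg/kg.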
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the

[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure. (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL that would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and demonstrated by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
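The procedure just described can be sketched in a few lines of Python; the iteration count and random seed are illustrative choices.

```python
import random
import statistics

def bootstrap_ucl(data, n_boot=5000, seed=1):
    """95% UCL of the arithmetic mean by bootstrap: resample the dataset
    with replacement, archive each resample's mean, and take the 95th
    percentile of the archived means."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(statistics.mean(rng.choices(data, k=n))
                   for _ in range(n_boot))
    return means[int(0.95 * n_boot)]
```

Unlike the H-UCL, the result can never exceed the maximum observed concentration, because every bootstrapped mean is an average of observed values.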
This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and it is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA 1997a; Schulz and Griffin 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.
When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter; as such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95 UCL than is the H-
statistic method it still does not take into consideration the spatial distribution of the sampling
locations across an exposure area and the biases associated with it To reflect this spatial
distribution a modified bootstrapping approach can be used to reflect the locations and densities
of samples collected on each property This modified bootstrapping approach (spatial
bootstrapping) which incorporates area-weighting into the bootstrapping method can be used to
calculate more reliable 95 UCL estimates of the EPC for each exposure area
Spatial bootstrapping combines area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.
Second, the smallest polygon area is identified, and all other polygon areas are normalized to its size and weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratio of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded, normalized areas by a selected value; the resulting values give the area-weighted number of replicates of each sample result in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7 and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.
Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
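The three steps can be sketched in code as follows. This is a minimal illustration of the expansion-and-resampling logic described above; the function name is ours, and the expansion factor of five and draw size of 250 are illustrative choices within the ranges stated in the text:

```python
import random
import statistics

def spatial_bootstrap_ucl(concs, areas, expand_factor=5,
                          draw_size=250, n_iter=5000, seed=1):
    """Area-weighted (spatial) bootstrap 95% UCL of the mean.

    concs -- sample concentrations for one exposure area (ppm)
    areas -- Thiessen polygon area represented by each sample (sq ft)
    """
    # Steps 1-2: normalize every polygon area to the smallest one,
    # round to the closest integer, and expand the dataset so each
    # result appears in proportion to the area its polygon represents.
    smallest = min(areas)
    expanded = []
    for conc, area in zip(concs, areas):
        replicates = round(area / smallest) * expand_factor
        expanded.extend([conc] * replicates)

    # Step 3: draw means (with replacement) from the expanded dataset
    # and take the 95th percentile of the distribution of means.
    rng = random.Random(seed)
    means = sorted(statistics.fmean(rng.choices(expanded, k=draw_size))
                   for _ in range(n_iter))
    return means[int(0.95 * n_iter) - 1]
```

Because selection probabilities are proportional to polygon areas, the 50th percentile of the resample means converges on the area-weighted mean, so the resulting UCL reflects sample locations and densities rather than treating every sample as equally representative.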
This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).
The spatial bootstrapping approach is preferable to the H-statistic approach for several reasons. First, as shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping considers the location and density of the samples in each exposure area and corrects for the relative contribution of each in the database. Third, the approach mitigates the statistical impact of a small database. Finally, it can be applied regardless of the distribution of the sampling data; it can therefore be used successfully for all sampling datasets, including those that fit neither a normal nor a lognormal distribution.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) the EPA approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.
As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained from straight bootstrapping; this is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas and thus are weighted more heavily in approaches that include area-weighting.
Estimates of the 95% UCL, by contrast, vary substantially. The 95% UCL values from bootstrapping and spatial bootstrapping are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatially bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC used for this exposure area would be the maximum detected concentration of 42.6 ppm, a concentration that is clearly not representative of the sampling dataset for this hypothetical property.
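The summary statistics quoted above can be checked directly from the Table 1 values. The short script below reproduces them; note that the decimal points in the concentrations, lost in this copy, have been restored consistent with the stated range of 0.022-42.6 ppm and mean of 13.3 ppm:

```python
import statistics

# Table 1: concentrations (ppm) and Thiessen polygon areas (sq ft), in sample order
concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
         7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
         15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6]
areas = [138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121, 3697,
         13297, 33087, 65541, 57494, 9014, 9994, 2123, 2341, 7446, 2613,
         44231, 2362, 8745, 5008, 18762, 24296, 51356, 12996, 18019,
         119814, 132012]

mean = statistics.fmean(concs)
print(round(mean, 1))   # → 13.3, matching the arithmetic mean quoted in the text

# Area-weighted mean: each result weighted by the fraction of the exposure
# area its polygon represents. It comes out somewhat higher, as the text
# notes, because samples 29 and 30 pair high concentrations with large polygons.
awm = sum(c * a for c, a in zip(concs, areas)) / sum(areas)
print(round(awm, 1))
```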
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large (30 samples) and the range of concentrations is fairly typical of those likely to be found on floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations likely to be present on the property.
Despite this, the H-statistic approach yields a 95% UCL that is more than six times the bootstrap-based estimates and roughly two and a half times the highest measured concentration, with the result that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. This result is troubling precisely because it occurs for a property with a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, sampling may be densest in high-concentration areas, and/or sample numbers may be more limited. Given the tendency of the H-statistic to overestimate the 95% UCL, it is likely that maximum concentration values would need to be used for many of those properties. Such an end result is unsupportable and would result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above and demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculate the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply substituting the maximum detected concentration in the area whenever the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area represented by each sample taken. This approach allows a more realistic and representative estimate of the level of exposure likely to be experienced by individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993) and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½-Mile Reach of the River and for the Removal Actions Outside the River, including certain floodplain properties along the River.
However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support it, declines to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A), Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant), Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, 1½ Mile Removal Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H. and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River, Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
      1               138,219               0.022
      2               105,217               0.028
      3               110,214               1.07
      4                31,021               1.63
      5                11,399               1.67
      6                 7,004               2.01
      7                14,268               2.19
      8                12,121               3.25
      9                 3,697               3.77
     10                13,297               7.58
     11                33,087               7.61
     12                65,541               7.62
     13                57,494               7.71
     14                 9,014               7.99
     15                 9,994               8.72
     16                 2,123               9.69
     17                 2,341               9.72
     18                 7,446              12.7
     19                 2,613              14.3
     20                44,231              14.3
     21                 2,362              15.4
     22                 8,745              16.3
     23                 5,008              18.7
     24                18,762              19.1
     25                24,296              20.2
     26                51,356              30.8
     27                12,996              32.9
     28                18,019              38.9
     29               119,814              40.6
     30               132,012              42.6
[Table 2, which compares the mean and 95% UCL EPC estimates produced by the four approaches, is illegible in this copy; the key values are given in the accompanying text.]
overestimate the carcinogenic potential and non-carcinogenic impacts of PCBs, and it recommended the development of alternative toxicity values for PCBs. EPA has made no reference to this presentation in its Work Plan.
This failure to consider the alternative toxicological information presented by GE is unlawful. Under a settlement agreement with GE (Settlement Agreement in General Electric Company v. Browner, No. 93-1251, D.C. Cir., October 25, 1993) and a subsequent EPA guidance memorandum (EPA, 1993), if an outside party questions the use of IRIS values during the course of an EPA risk assessment and presents alternative toxicological information that may be used in place of the IRIS values, EPA has an obligation to consider all credible and relevant evidence before it. There is no indication that EPA has considered the alternative toxicological information presented by GE. Rather, it has ignored GE's showing and chosen simply to defer to the current IRIS values, which is precisely the course of conduct that the settlement and guidance prohibit. EPA should rectify this deficiency by now giving careful consideration to the alternative toxicological information presented by GE.
5. Use of subchronic RfD for PCBs

The Work Plan states that EPA will consider using the subchronic RfD for PCBs only if it determines that exposure is intermittent enough to result in an exposure period that is substantially reduced (p. 6-42). Later, on page 6-43, the Work Plan states that this will occur when the total exposure period is substantially below the 7-year period considered by EPA to represent chronic exposure. There is no further discussion of this issue, so it is not clear what benchmark the Agency will use to identify intermittent exposure.
Many of the proposed exposure scenarios are likely to be extremely intermittent in nature. For example, in developing the SRBC for utility workers, EPA assumes that exposure occurs 5 days/year for a period of 25 years. Clearly, 5 days/year is intermittent exposure. In fact, even when considered over the entire 25-year exposure period, the total exposure consists of 125 days, which amounts to about 1/3 of a year, or less than 5 percent of the 7-year chronic exposure period. This exposure would certainly be considered intermittent despite the fact that the total exposure period lasts for 25 years.
As a second example, in developing its SRBCs for recreational exposures, EPA has assumed that children aged 1-6 are exposed for six years during recreational activities. Even using the highly conservative exposure frequency for high-contact areas of 3 days/week for a period of 7 months per year (a total of 84 days/year, or 504 days for the entire exposure period), the number of days exposed is equivalent to 1.4 years, or less than 20 percent of the available days during the 7-year chronic exposure period. Nevertheless, as stated on page 6-42, EPA does not intend to use the subchronic RfD to evaluate noncancer hazards to young children.
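The day-counting behind both examples is simple arithmetic; a quick check, using the values stated in the text above:

```python
CHRONIC_DAYS = 7 * 365            # EPA's 7-year chronic exposure period, in days

# Utility worker: 5 days/year for 25 years
utility_days = 5 * 25
print(utility_days,                                 # 125 days in total
      round(100 * utility_days / CHRONIC_DAYS, 1))  # under 5% of the chronic period

# Child recreator (ages 1-6): ~3 days/week for 7 months/year, over 6 years
child_days = 84 * 6
print(child_days,                                   # 504 days in total
      round(100 * child_days / CHRONIC_DAYS, 1))    # under 20% of the chronic period
```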
In short, the Work Plan states that EPA will use the subchronic RfD when exposures during a given scenario are sufficiently intermittent to result in a total exposure that is far less than the 7-year chronic exposure period identified by EPA. It is clear, however, that even for scenarios where this intermittent exposure is likely, EPA does not intend to incorporate the subchronic RfD. EPA should reconsider its position on the use of the subchronic RfD and incorporate it into those scenarios where it is relevant.
6. Inappropriate use of a single exposure scenario for all direct-contact recreational activities
The Work Plan provides that, while consumption of fish and game will be evaluated separately, all direct-contact recreational activities will be evaluated using a single exposure scenario and a single set of exposure assumptions (pp. 6-52 - 6-53). These activities include hiking, walking, canoeing, picnicking, swimming, wading, and dirt biking, as well as fishing and hunting (but not consumption of fish or game). In its June 3, 1999 comments (pp. 11-12, 29-30), GE explained that use of a single exposure scenario and set of assumptions for such diverse activities is inappropriate because it will force EPA either to use the most conservative assumptions for any of these activities (which would clearly result in grossly overestimated exposure estimates) or to pick compromise parameters that will not be truly representative of any of these activities (which would be subject to considerable uncertainty). Based on discussions with EPA in the fall of 1999, GE had understood that the Agency recognized this problem and intended to consider further whether and how to divide this single direct-contact recreational scenario into a number of more specific scenarios (e.g., hiking/walking, canoeing, picnicking, swimming/wading). However, the Work Plan does not do so. GE again urges EPA to develop such specific scenarios so as to take account of the significant differences in exposure parameters among these types of activities. In doing so, actual recreational exposures will be more closely approximated, and more realistic and appropriate cleanup goals can be developed.
7. Lack of information about specific exposure assumptions to be used
As in the prior draft, the Work Plan does not present the specific exposure parameter values and equations to be used in EPA's risk assessment, but simply states that they will be developed based on EPA and MDEP guidance, along with site-specific data (Section 6.3.4.5). GE requested such information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has advised GE that the Agency's risk assessment staff will hold a series of technical meetings with GE representatives and consultants and will be prepared at those meetings to discuss the most appropriate exposure assumptions and parameter values to be used. GE reiterates its request for such meetings and looks forward to discussing these issues with EPA.
8. Use of probabilistic risk analysis
In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that probabilistic approaches such as Monte Carlo analysis will be considered for those COPCs and exposure pathways that result in significant risk estimates. However, it does not mention probabilistic risk assessment techniques in connection with any of the main risk assessments. In prior discussions with GE, EPA indicated that one of the principal reasons it collected a large amount of fish tissue PCB data was for use in a probabilistic risk assessment of fish consumption. As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic exposure model in the main risk analysis of the fish consumption pathway. However, the Work Plan presents no discussion of, or plans for conducting, such an assessment. In fact, EPA does not even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk assessment guidance documents and information sources that EPA intends to use in conducting the HHRA.
During the last few years, EPA's policies and guidelines have focused on improving risk management by presenting decision-makers with the entire range of possible risks, rather than a single point estimate (EPA, 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that numerical risk assessments should be accompanied by a full characterization of the uncertainties, limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D) probabilistic analysis, such as Microexposure analysis, can more effectively characterize the impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed population by considering time-dependent changes.
Microexposure analysis has been used by EPA and by independent researchers to simulate the duration of residential exposure (EPA, 1992c; Johnson and Capel, 1992; Sielken, 1994). It has also been used to evaluate childhood exposures to lead (Goodrum et al., 1994), exposure to contaminants in tap water (Harrington et al., 1995), and exposure to dioxins from the consumption of freshwater fish (Keenan et al., 1993a,b, 1995, 1996a, 1997a,b). Microexposure analysis was employed in the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon Group, 1994; Keenan et al., 1996b). In addition, Microexposure analysis has been described in EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A (External Review Draft) as a viable alternative for modeling time-dependent variability in concentrations, daily activity patterns, and other behavioral exposure factors (EPA, 1999c).
Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of probabilistic risk analysis through application to the fish ingestion pathway at the Sangamo Weston/Lake Hartwell Superfund site (Simon, 1999). Simon found that the Microexposure event model gave risk estimates approximately an order of magnitude lower than those derived using deterministic and other probabilistic approaches, and he attributed this decrease to the method's ability to incorporate changes over time.
GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model, to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in Appendix E of EPA's draft guidance for conducting probabilistic risk assessments (EPA, 1999c), some types of exposures, including fish consumption, can be expected to vary over time. Anglers may change their activity levels and their fish consumption behaviors as they age. In addition, different species and sizes of fish may be consumed from a variety of locations on the river over time and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates and will provide information that supports better-informed risk management decisions (EPA, 1999c).
9. Use of 95% UCLs or maximum values as exposure point concentrations
The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% UCL of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).
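For context, the H-statistic approach computes the UCL from the log-transformed data. In its standard form (Land, 1975; EPA, 1997c), with $\bar{y}$ and $s_y$ the mean and standard deviation of the log-transformed concentrations, $n$ the sample size, and $H_{0.95}$ the tabulated H-statistic:

```latex
\mathrm{UCL}_{95} = \exp\left( \bar{y} + 0.5\,s_y^{2} + \frac{s_y\,H_{0.95}}{\sqrt{n-1}} \right)
```

Because $s_y$ enters through the exponent, modest increases in log-scale variance inflate the UCL multiplicatively, which is the mechanism behind the overestimates discussed below.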
GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment and demonstrated by EPA contractors and consultants (EPA, 1997c; Schulz and Griffin, 1999), use of the H-statistic to derive a 95% UCL can, and frequently does, produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.
Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½-Mile Reach and for Areas Outside the River: a spatial averaging approach. That approach will (a) allow calculation of EPCs that are far more representative of actual exposures than EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner consistent with the evaluation of the Upper ½-Mile Reach and Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, thus minimizing the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling, if necessary, to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.
However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A and demonstrated by Singh et al. (EPA, 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCL, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.
References
EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A), Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1990. National Contingency Plan. US Environmental Protection Agency, Washington, DC.

EPA. 1992a. Final Guidelines for Exposure Assessment; Notice. US Environmental Protection Agency, Washington, DC. 57 Federal Register 22888-22938. May 29.

EPA. 1992b. Guidance on Risk Characterization for Risk Managers and Risk Assessors. Memorandum from F. Henry Habicht III, Deputy Administrator, to Assistant Administrators and Regional Administrators. Office of the Administrator, Washington, DC.

EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General US Population. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

EPA. 1993. Memo from W.H. Farland and H.L. Longest to Regional Directors, Re: Use of IRIS values in Superfund risk assessments. US Environmental Protection Agency, Washington, DC. OSWER Directive 9285.7-16. December 21.

EPA. 1995a. Policy for Risk Characterization at the US Environmental Protection Agency. US Environmental Protection Agency, Office of the Administrator, Washington, DC. March.

EPA. 1995b. Guidance for Risk Characterization. US Environmental Protection Agency, Science Policy Council, Washington, DC. February.

EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the US Environmental Protection Agency. US Environmental Protection Agency, Office of Research and Development, Washington, DC. May 15.

EPA. 1997b. Guiding Principles for Monte Carlo Analysis. US Environmental Protection Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 p.

EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1998. Memo from Timothy Fields, Acting Administrator, to Regional Directors, Re: Approach for addressing dioxin in soil at CERCLA and RCRA sites. US Environmental Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.
EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, US EPA, to Jane Magee, General Electric Company, Re: Pittsfield 1-0147, EPA Area 6 Housatonic River Human Health Risk Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader, Re: Protectiveness of Cleanup Levels for Removal Actions Outside the River - Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, US District Court for the District of Massachusetts). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual, Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2. Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3 (Part A: Process for Conducting Probabilistic Risk Assessment). Draft, Revision 5. United States Environmental Protection Agency, Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, 1½ Mile Removal Action, and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the US Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition, December 4-7, Baltimore, MD.
Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA4203. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.
Johnson, T., and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General US Population. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.
Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.
Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.
Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA, Annual Meeting and Exposition, Waikiki, HI. (Abstract)
Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds, Proceedings, Dioxin '96: 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen 30:61-65.
Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC. October 15-17.
Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meetings, Proceedings. (Abstract)
Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis, Proceedings, Annual Meeting and Exposition. (Abstract)
MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan. Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.
Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.
Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19. November.
Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD. December 4-7.
Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4):823-843.
Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 22.
Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 11.
ATTACHMENT A
A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS
Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co Inc
Introduction
The US Environmental Protection Agency's (EPA's) final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets for which no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and on how well they represent the actual levels of exposure that are likely to occur under various scenarios.
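As a concrete illustration of the first of these computations, the following is a minimal Python sketch of a one-sided 95% UCL on the arithmetic mean using Student's t-statistic. The concentrations are hypothetical, and the critical t value (about 2.132 for 4 degrees of freedom) is supplied by hand from a standard t table.

```python
import math
import statistics

def t_ucl_95(data, t_crit):
    """One-sided 95% UCL on the arithmetic mean via Student's t:
    UCL = mean + t * s / sqrt(n), where t_crit is the one-sided
    95th-percentile t value for n - 1 degrees of freedom."""
    n = len(data)
    s = statistics.stdev(data)  # sample standard deviation
    return statistics.fmean(data) + t_crit * s / math.sqrt(n)

# Hypothetical concentrations (ppm); t(0.95, df=4) is roughly 2.132
print(round(t_ucl_95([0.5, 1.2, 2.0, 3.1, 4.4], 2.132), 2))
```

As the Work Plan notes, this estimator is only defensible when the data are plausibly normal.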
There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.
GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will produce more representative results. In GE's view, the most appropriate approach is the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½-Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic: namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.
Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL
In 1992, EPA issued a guidance document (EPA 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is the nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of the observed sample concentrations.
The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.
For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA 1992):

UCL = exp[ x(bar) + 0.5s² + sH/√(n-1) ]

Where:
UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean
x(bar) = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic, developed by Land (1975), that accounts for the skewness of the distribution
n = number of analytical samples
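A minimal numerical sketch of this equation follows. The dataset is hypothetical, and the H value, which in practice must be looked up in Land's (1975) tables for the observed s and n, is supplied by hand for illustration.

```python
import math

def h_ucl(data, H):
    """Land's H-UCL: UCL = exp(xbar + 0.5*s^2 + s*H / sqrt(n - 1)),
    where xbar and s are the mean and standard deviation of the
    log-transformed data and H comes from Land's (1975) tables."""
    n = len(data)
    logs = [math.log(x) for x in data]
    xbar = sum(logs) / n
    s = math.sqrt(sum((v - xbar) ** 2 for v in logs) / (n - 1))
    return math.exp(xbar + 0.5 * s * s + s * H / math.sqrt(n - 1))

# With widely spread data, even a modest H value drives the UCL far
# above every observed concentration -- the behavior criticized below.
print(round(h_ucl([1.0, 10.0, 100.0], 2.0), 1))
```

Note how the standard deviation of the log-transformed data enters the exponent twice, which is why the H-UCL grows explosively as that standard deviation increases.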
The H-statistic frequently yields extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.
Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers" and not necessarily due to lognormally distributed data. Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.
Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling datasets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.
Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when an exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.
Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and it does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½-Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA 1999).
In view of these considerations, EPA contractors and consultants have recommended that the H-statistic not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA 1997a) recommended that in environmental applications "the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. ... [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased toward more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the Lower Housatonic River. Consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.
This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al. 1995); in more complex approaches, unbiased estimates of how concentration and variance change with distance are developed, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging; Isaaks and Srivastava 1989).
Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest of the River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.
The area inside each polygon is then used to weight the samples in the statistical calculations, which is in essence saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava 1989):

x(bar) = Σ (ci × pi)

Where:
x(bar) = area-weighted mean concentration (mg/kg)
ci = the concentration representing the condition within polygon i, where there are i = 1 through n polygons, and
pi = the proportion of the total area that is incorporated in polygon i (unitless)
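This calculation can be sketched in a few lines of Python; the concentrations and polygon areas below are hypothetical.

```python
def area_weighted_mean(concs, areas):
    """Area-weighted mean: each concentration is weighted by the
    proportion of the total exposure area in its Thiessen polygon."""
    total = sum(areas)
    return sum(c * (a / total) for c, a in zip(concs, areas))

# A high result confined to a small polygon contributes little,
# correcting for sampling that is biased toward contaminated spots.
print(area_weighted_mean([40.0, 2.0, 1.0], [100.0, 400.0, 500.0]))
```

Here the unweighted arithmetic mean would be 14.3 ppm, while the area-weighted mean is 5.3 ppm, because the 40 ppm sample represents only a tenth of the area.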
This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½-Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then the average concentrations for each grid were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA 1997b).
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the

[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure. (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½-Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and it is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL that would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and as demonstrated by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative, superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative, preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
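The procedure just described can be sketched as follows. The dataset is hypothetical, and a fixed seed is used so the run is repeatable.

```python
import random
import statistics

def bootstrap_ucl_95(data, n_boot=5000, seed=1):
    """Bootstrap 95% UCL of the mean: resample the dataset with
    replacement, archive each resample's mean, and take the 95th
    percentile of the resulting distribution of means."""
    rng = random.Random(seed)
    means = sorted(statistics.fmean(rng.choices(data, k=len(data)))
                   for _ in range(n_boot))
    return means[int(0.95 * n_boot)]

# Positively skewed hypothetical dataset (ppm)
data = [0.1, 0.3, 0.5, 0.8, 1.2, 2.0, 3.5, 6.0, 15.0, 42.0]
print(round(bootstrap_ucl_95(data), 2))
```

Unlike the H-UCL, the result of this resampling can never exceed the maximum observed value, since every bootstrapped mean is an average of observed values.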
This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA 1997a; Schulz and Griffin 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.
When sample sizes are small, the distribution for the bootstrap run may not appear normal. This occurs because the number of samples required in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95% UCL than the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of samples collected on each property. This modified bootstrapping approach (spatial bootstrapping), which incorporates area-weighting into the bootstrap method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
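These three steps can be sketched as follows. The concentrations and polygon areas are hypothetical, and for simplicity the expansion factor is taken as 1, so each sample's replicate count is just its rounded area ratio.

```python
import random
import statistics

def spatial_bootstrap_ucl_95(concs, areas, n_iter=5000, subsample=250, seed=1):
    """Spatial bootstrap: replicate each sample in proportion to its
    Thiessen-polygon area (normalized to the smallest polygon), then
    bootstrap means from the expanded dataset and take the 95th
    percentile of the archived means."""
    smallest = min(areas)
    expanded = []
    for c, a in zip(concs, areas):
        expanded.extend([c] * round(a / smallest))  # area-weighted replicates
    rng = random.Random(seed)
    means = sorted(statistics.fmean(rng.choices(expanded, k=subsample))
                   for _ in range(n_iter))
    return means[int(0.95 * n_iter)]

# Hypothetical samples: the high result sits in the smallest polygon
concs = [40.0, 2.0, 1.0, 0.5]
areas = [2000.0, 9000.0, 14000.0, 5000.0]
print(round(spatial_bootstrap_ucl_95(concs, areas), 2))
```

The distribution of means from such a run centers on the area-weighted mean of the original dataset, and its 95th percentile serves as the 95% UCL.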
This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA 1997a; Schulz and Griffin 1999).
The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different approaches: (1) the EPA approach (arithmetic mean and H-statistic); (2) an area-weighted average; (3) a bootstrap; and (4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained through straight bootstrapping. This is consistent with the Central Limit Theorem (EPA 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.
Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found on floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration (42.6 ppm) by a factor of about 2.5, with the result that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in areas of high concentration, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above and demonstrated by EPA contractors and consultants (EPA 1997a; Schulz and Griffin 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data
necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA 1989, 1993) and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper 1/2 Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.
However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA 1997a; Schulz and Griffin 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
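One plausible way to implement the spatial bootstrap recommended here is to resample sample locations with replacement with probability proportional to polygon area, so that each resampled mean is itself spatially weighted. The Python sketch below illustrates that idea under stated assumptions; it is not the specific algorithm used in the analysis above.

```python
import random

def spatial_bootstrap_ucl95(areas, concs, n_boot=5000, seed=3):
    """95% UCL of the spatial (area-weighted) mean: resample sample
    locations with replacement, weighting each location by its polygon
    area, then take the 95th percentile of the resampled means."""
    rng = random.Random(seed)
    n = len(concs)
    means = sorted(
        sum(rng.choices(concs, weights=areas, k=n)) / n
        for _ in range(n_boot)
    )
    return means[int(0.95 * n_boot)]

# Locations with large, lightly sampled polygons are drawn more often
# than small, densely sampled hot spots, mitigating sampling-density bias.
```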
References
Clifford PA DE Barchers DF Ludwig RL Sielken JS Klinginsmith RV Graham and MI Banton 1995 Hazard/risk assessments An approach to quantifying spatial components of exposure for ecological risk assessment Environmental Toxicology and Chemistry 14(5) 895-906
EPA 1989 Risk Assessment Guidance for Superfund Volume I Human Health Evaluation
Manual (Part A) - Interim Final US Environmental Protection Agency Office of Emergency
and Remedial Response Washington DC EPA 540/1-89/002 July
EPA 1992 Supplemental Guidance to RAGS Calculating the Concentration Term OSWER
Bulletin Volume 1 Number 1 Publication 9285.7-08 May
EPA 1993 An SAB Report Superfund Site Health Risk Assessment Guidelines Review of the
Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee EPA-SAB-EHC-93-007 February
EPA 1994 A change in the approach for estimating the average and reasonable maximum
exposure scenario for human health risk assessments EPA Region I Risk Updates Number 2
August
EPA 1995 Final Baseline Risk Assessment for Koppers Company Inc (Charleston Plant)
Remedial InvestigationFeasibility Study Oversight Black and Veatch Waste Science Inc EPA
Region IV Contract 68-W9-0055
EPA 1997a The Lognormal Distribution in Environmental Applications Prepared for EPA by
A Singh A Singh and M Engelhard US Environmental Protection Agency Office of
Research and Development EPA 600/R-97/006 December
EPA 1997b Record of Decision Fields Brook Superfund Site US Environmental Protection
Agency Region V EPA 541/R-97/070 August 15
EPA 1999 Responsiveness Summary for Allendale School Removal Action 1/2 Mile Removal Action and Consolidation US Environmental Protection Agency Region I October
Isaaks EH and RM Srivastava 1989 Applied Geostatistics Oxford University Press New
York
Land CE 1975 Tables of confidence limits for linear functions of the normal mean and
variance Selected Tables in Mathematical Statistics 3:365-410
Schulz TW and S Griffin 1999 Estimating risk assessment exposure point concentrations
when the data are not normal or lognormal Risk Analysis 19:577-584
Weston 2000 Supplemental Investigation Work Plan for the Lower Housatonic River Final
Roy F Weston Inc Manchester NH DCN GEP2-02-000-AAME February 2000
Table 1 Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number   Polygon Area (ft2)   Total PCB (ppm)
 1   138219    0.022
 2   105217    0.028
 3   110214    1.07
 4    31021    1.63
 5    11399    1.67
 6     7004    2.01
 7    14268    2.19
 8    12121    3.25
 9     3697    3.77
10    13297    7.58
11    33087    7.61
12    65541    7.62
13    57494    7.71
14     9014    7.99
15     9994    8.72
16     2123    9.69
17     2341    9.72
18     7446   12.7
19     2613   14.3
20    44231   14.3
21     2362   15.4
22     8745   16.3
23     5008   18.7
24    18762   19.1
25    24296   20.2
26    51356   30.8
27    12996   32.9
28    18019   38.9
29   119814   40.6
30   132012   42.6
Table 2 Comparison of Mean and 95% UCL EPC Estimates for the Hypothetical Floodplain Property Calculated Using Four Approaches (arithmetic mean/H-statistic, area-weighted average, bootstrap, and spatial bootstrap) [table values not recoverable from this copy; the estimates are cited in the text]
As a second example, in developing its SRBCs for recreational exposures, EPA has assumed that children aged 1-6 are exposed for six years during recreational activities. Even using the highly conservative exposure frequency for high-contact areas of 3 days/week for a period of 7 months per year (a total of 84 days/year, or 504 days for the entire exposure period), the number of days exposed is equivalent to 1.4 years, or less than 20 percent of the available days during the 7-year chronic exposure period. Nevertheless, as stated on page 6-42, EPA does not intend to use the subchronic RfD to evaluate noncancer hazards to young children.
In short, the Work Plan states that EPA will use the subchronic RfD when exposures during a given scenario are sufficiently intermittent to result in a total exposure that is far less than the 7-year chronic exposure period identified by EPA. It is clear, however, that even for scenarios where this intermittent exposure is likely, EPA does not intend to incorporate the subchronic RfD. EPA should reconsider its position on the use of the subchronic RfD and incorporate it into those scenarios where it is relevant.
6 Inappropriate use of a single exposure scenario for all direct-contact recreational activities
The Work Plan provides that, while consumption of fish and game will be evaluated separately, all direct-contact recreational activities will be evaluated using a single exposure scenario and a single set of exposure assumptions (pp. 6-52 - 6-53). These activities include hiking, walking, canoeing, picnicking, swimming, wading, and dirt biking, as well as fishing and hunting (but not consumption of fish or game). In its June 3, 1999 comments (pp. 11-12, 29-30), GE explained that use of a single exposure scenario and set of assumptions for such diverse activities is inappropriate because it will force EPA either to use the most conservative assumptions for any of these activities (which would clearly result in grossly overestimated exposure estimates) or to pick compromise parameters that will not be truly representative of any of these activities (which would be subject to considerable uncertainty). Based on discussions with EPA in the fall of 1999, GE had understood that the Agency recognized this problem and intended to consider further whether and how to divide this single direct-contact recreational scenario into a number of more specific scenarios (e.g., hiking/walking, canoeing, picnicking, swimming/wading). However, the Work Plan does not do that. GE again urges EPA to develop such specific scenarios so as to take
account of the significant differences in exposure parameters among these types of activities. In doing so, actual recreational exposures will be more closely approximated, and more realistic and appropriate cleanup goals can be developed.
7 Lack of information about specific exposure assumptions to be used
As in the prior draft, the Work Plan does not present the specific exposure parameter values and equations to be used in EPA's risk assessment, but simply states that they will be developed based on EPA and MDEP guidance along with site-specific data (Section 6.3.4.5). GE requested such information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has advised GE that the Agency's risk assessment staff will hold a series of technical meetings with GE representatives and consultants, and at such meetings will be prepared to discuss the most appropriate exposure assumptions and parameter values to be used. GE reiterates its request for such meetings and looks forward to discussing these issues with EPA.
8 Use of probabilistic risk analysis
In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that probabilistic approaches such as Monte Carlo analysis will be considered for those COPCs and exposure pathways that result in significant risk estimates. However, it does not mention probabilistic risk assessment techniques in connection with any of the main risk assessments. In prior discussions with GE, EPA indicated that one of the principal reasons why it collected a large amount of fish tissue PCB data is for use in a probabilistic risk assessment of fish consumption. As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic exposure model in the main risk analysis of the fish consumption pathway. However, the Work Plan presents no discussion of, or plans for conducting, such an assessment. In fact, EPA does not even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk assessment guidance documents and information sources that EPA intends to use in conducting the HHRA.
During the last few years, EPA's policies and guidelines have focused on improving risk management by presenting decision-makers with the entire range of possible risks rather than a
single point estimate (EPA 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that numerical risk assessments should be accompanied by a full characterization of the uncertainties, limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D) probabilistic analysis, such as Microexposure analysis, can more effectively characterize the impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed population by considering time-dependent changes.
Microexposure analysis has been used by EPA and by independent researchers to simulate duration of residential exposure (EPA 1992c; Johnson and Capel 1992; Sielken 1994). It also has been used to evaluate childhood exposures to lead (Goodrum et al. 1994), exposure to contaminants in tap water (Harrington et al. 1995), and exposure to dioxins from the consumption of freshwater fish (Keenan et al. 1993a,b, 1995, 1996a, 1997a,b). Microexposure analysis was employed in the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon Group 1994; Keenan et al. 1996b). In addition, Microexposure analysis has been described in EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A (External Review Draft) as a viable alternative for modeling time-dependent variability in concentrations, daily activity patterns, and other behavioral exposure factors (EPA 1999c).
Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of probabilistic risk analysis through application to the fish ingestion pathway at the Sangamo Weston/Lake Hartwell Superfund site (Simon 1999). Simon (1999) found that the Microexposure event model gave risk estimates that were approximately an order of magnitude lower than those derived using deterministic and other probabilistic approaches. He attributed this decrease to the power of the method to incorporate changes over time.
GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model, to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in Appendix E of its draft guidance for conducting probabilistic risk assessments (EPA 1999c), some types of exposures, including fish consumption, can be expected to vary over time. Anglers may change their activity levels and their fish consumption behaviors as they age. In addition, different
species and sizes of fish may be consumed from a variety of locations on the river over time and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates, and will provide information that will allow better informed risk management decisions to be made (EPA 1999c).
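The separation of variability and uncertainty that a 2-D analysis provides can be illustrated with a toy simulation. All distributions and parameter values below are invented for illustration; they do not represent Housatonic data or the Microexposure model itself.

```python
import random
import statistics

def two_d_fish_dose(n_uncertainty=200, n_variability=500, seed=7):
    """Toy two-dimensional Monte Carlo: the outer loop samples an
    *uncertain* quantity (here, the true mean fish-tissue concentration);
    the inner loop samples *inter-individual variability* (ingestion
    rate). Returns one mean dose per outer iteration; the spread of those
    means reflects uncertainty, while each inner distribution reflects
    variability across anglers."""
    rng = random.Random(seed)
    outer_means = []
    for _ in range(n_uncertainty):
        conc = rng.lognormvariate(1.0, 0.3)            # ppm, uncertain
        doses = [conc * rng.lognormvariate(-1.0, 0.5)  # intake, variable
                 for _ in range(n_variability)]
        outer_means.append(statistics.fmean(doses))
    return outer_means

results = two_d_fish_dose()
# The 5th-95th percentile spread of `results` characterizes uncertainty
# in the population-average dose rather than a single point estimate.
```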
9 Use of 95% UCLs or maximum values as exposure point concentrations
The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% UCL of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).
GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment and demonstrated by EPA contractors and consultants (EPA 1997c; Schulz and Griffin 1999), use of the H-statistic to derive a 95% UCL can, and frequently does, produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.

Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper 1/2 Mile Reach and for Areas Outside the River: a spatial averaging approach. That approach will (a) allow calculation of EPCs that are much more truly representative of actual exposures than use of EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper 1/2 Mile Reach and Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling, if necessary, to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.
However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A and demonstrated by Singh et al. (EPA 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.
References
EPA 1989 Risk Assessment Guidance for Superfund Volume I Human Health Evaluation Manual (Part A) - Interim Final US Environmental Protection Agency Office of Emergency and Remedial Response Washington DC EPA 540/1-89/002 July
EPA 1990 National Contingency Plan US Environmental Protection Agency
Washington DC
EPA 1992a Final Guidelines for Exposure Assessment Notice US Environmental Protection
Agency Washington DC 57 Federal Register 14 22888-22938 May 29
EPA 1992b Guidance on Risk Characterization for Risk Managers and Risk Assessors
Memorandum from F Henry Habicht III Deputy Administrator to Assistant Administrators
and Regional Administrators Office of the Administrator Washington DC
- 19 shy
EPA 1992c A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its
Application to the General US Population US Environmental Protection Agency Office of
Air Quality Planning and Standards Research Triangle Park NC EPA-450/3-92-011
August
EPA 1993 Memo from WH Farland and HL Longest to Regional Directors Re Use of
IRIS values in Superfund risk assessments US Environmental Protection Agency
Washington DC OSWER Directive 92857-16 December 21
EPA 1995a Policy for Risk Characterization at the US Environmental Protection Agency
US Environmental Protection Agency Office of the Administrator Washington DC
March
EPA 1995b Guidance for Risk Characterization US Environmental Protection Agency
Science Policy Council Washington DC February
EPA 1997a Policy for Use of Probabilistic Analysis in Risk Assessment at the US
Environmental Protection Agency US Environmental Protection Agency Office of Research
and Development Washington DC May 15
EPA 1997b Guiding Principles for Monte Carlo Analysis US Environmental Protection
Agency Risk Assessment Forum Washington DC EPA 630/R-97/001 March 35 pp
EPA 1997c The Lognormal Distribution in Environmental Applications Prepared for EPA
by A Singh A Singh and M Engelhard US Environmental Protection Agency Office of
Research and Development EPA 600/R-97/006 December
EPA 1998 Memo from Timothy Fields Acting Administrator to Regional Directors Re
Approach for addressing dioxin in soil at CERCLA and RCRA sites US Environmental
Protection Agency Washington DC OSWER Directive 92004-26 April 13
- 20 shy
EPA and MDEP 1998 Letter from Anna Symington Acting Section Chief Massachusetts
DEP and Bryan Olson Project Manager US EPA to Jane Magee General Electric
Company Re Pittsfield 1-0147 EPA Area 6 Housatonic River Human Health Risk
Assessment January 29
EPA 1999a Memorandum from Ann-Marie Burke Toxicologist to Richard Cavagnero GE
Project Leader Re Protectiveness of Cleanup Levels for Removal Actions Outside the River
- Protection of Human Health Attachment A to Appendix D to Consent Decree lodged in
United States et al v General Electric Company (No 99-30225-MAP US Dist Court for
Dist of Mass) August 4
EPA 1999b Risk Assessment Guidance for Superfund Volume I - Human Health Evaluation
Manual Supplemental Guidance Dermal Risk Assessment Interim Guidance US
Environmental Protection Agency Office of Emergency and Remedial Response Washington
DC March 2 Draft
EPA 1999c Risk Assessment Guidance for Superfund Volume 3 (Part A Process for
Conducting Probabilistic Risk Assessment) Draft Revision 5 United States Environmental
Protection Agency Solid Waste and Emergency Response December
EPA 1999d Responsiveness Summary for Allendale School Removal Action 1/2 Mile
Removal Action and Consolidation Prepared by EPA Region I October
GE 1997 Proposal for Alternative Input Values for the Human Health Risk Assessment
Memorandum from General Electric to the Massachusetts Department of Environmental
Protection and the US Environmental Protection Agency September 11
Goodrum PE JM Hassett DL Johnson and ME Dakins 1994 Applications of
microexposure Monte Carlo modeling to human health risk assessments A case study of
modeling childhood lead exposure Society for Risk Analysis Annual conference and
Exposition December 4-7 Baltimore MD
- 21 shy
Harrington NW Curry CL and PS Price 1995 The MicroExposure Event Modeling
Approach to Probabilistic Exposure Assessment Paper No 95-TA4203 Proceedings of the
88th Annual Meeting of the Air and Waste Management Association San Antonio Texas
USA June
Johnson T and J Capel 1992 A Monte Carlo Approach to Simulating Residential
Occupancy Periods and Its Application to the General US Population US Environmental
Protection Agency Office of Air Quality Planning and Standards Research Triangle Park
NC EPA-450/3-92-011 August
Keenan RE MH Henning PE Goodrum MN Gray RA Sherer and PS Price
1993a Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to
evaluate the need for fish advisories Dioxin 93 13th International Symposium on
Chlorinated Dioxins and Related Compounds Vienna Austria
Keenan RE PS Price MH Henning PE Goodrum MN Gray RA Sherer and
WL Porter 1993b A Monte Carlo risk assessment for dioxin in Maine fish Using a
microexposure approach to evaluate the need for fish advisories TAPPI Proceedings 1993
Environmental Conference Boston MA
Keenan RE PS Price CL Curry JI McCrodden and JG Haggard 1995 Using a
microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion
of fish from the upper Hudson River In Society for Risk Analysis and the Japan Section of
SRA Annual Meeting and Exposition Waikiki HI (Abstract)
Keenan RE PS Price J McCrodden and ES Ebert 1996a Using a microexposure
event analysis to model potential exposures to PCBs through ingestion of fish from the Upper
Hudson River In Organohalogen Compounds Proceedings Dioxin 96-16th International
Symposium on Chlorinated Dioxins and Related Compounds Amsterdam The Netherlands
Organohalogen Compounds 30:61-65
- 22 shy
Keenan RE NW Harrington PS Price and RO Richter 1996b Applying a
microexposure event analysis for a superfund site risk assessment Proceedings Superfund XVII
Conference Proceedings Washington DC October 15-17
Keenan RE JD Avantaggio and PS Price 1997a Should Maines rivers have fish
advisories for dioxin Using an integrated Microexposure Event and Toxicokinetic Model to
evaluate this question In SETAC North Atlantic Chapter Annual Meetings Proceedings
Abstract 1
Keenan RE JD Avantaggio and PS Price 1997b Using a combined Microexposure
Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden
dosimetric In Society for Risk Analysis Proceedings Annual Meeting and Exposition
Abstract
MDEP 1995 Guidance for Disposal Site Risk Characterization - In Support of the
Massachusetts Contingency Plan -Interim Final Policy Massachusetts Department of
Environmental Protection Bureau of Waste Site Cleanup and Office of Research and
Standards Boston MA WSCORS-95-141 July
Pyrite Canyon Group 1994 Workplanfor the Health Risk Assessment of the Stringfellow
CERCLA Site in Riverside County California January
Schulz TW and S Griffin 1999 Estimating risk assessment exposure point concentrations
when data are not normal or lognormal Risk Analysis Vol 19 November
Sielken RL 1994 More realistic exposure durations for more realistic people Society for
Risk Analysis Annual Conference and Exposition Baltimore MD December 4-7
Simon TW 1999 Two-dimensional Monte Carlo simulation and beyond A comparison of
several probabilistic risk assessment methods applied to a Superfund site Hum Ecol Risk
Assess 5 (4) 823-843
- 23 shy
Weston 2000a Supplemental Investigation Work Plan for the Lower Housatonic River
Prepared for US Army Corps of Engineers by Roy F Weston Inc February 22
Weston 2000b Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic
River Prepared for US Army Corps of Engineers by Roy F Weston Inc February 11
ATTACHMENT A
A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS
Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co Inc
Introduction
The US Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student-t statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets for which no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and their representation of the actual levels of exposure that are likely to occur under various scenarios.
There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student-t statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and
the maximum value are poor estimates of actual levels of exposure that will occur within a given
exposure area
GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will produce more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper 1/2 Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic: namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.
Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL
In 1992, EPA issued a guidance document (EPA 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.
The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA 1992):
UCL = exp( x(bar) + 0.5*s^2 + s*H / sqrt(n - 1) )

Where:
UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean
x(bar) = the mean of the log-transformed data
s^2 = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic accounting for interaction of the distribution, developed by Land (1975)
n = number of analytical samples
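As a sketch, this equation can be implemented directly once a value of Land's H has been looked up for the given sample size and log-standard deviation. The Python below is illustrative only; it takes H as an input rather than reproducing Land's tables.

```python
import math

def h_ucl(data, H):
    """Land's H-UCL: exp(xbar + 0.5*s**2 + s*H/sqrt(n - 1)), where xbar
    and s are the mean and standard deviation of the log-transformed
    data, and H comes from Land's (1975) tables for the given n and s."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    xbar = sum(logs) / n
    s2 = sum((y - xbar) ** 2 for y in logs) / (n - 1)  # sample variance
    s = math.sqrt(s2)
    return math.exp(xbar + 0.5 * s2 + s * H / math.sqrt(n - 1))
```

Because s appears in the exponent both directly and through the 0.5*s^2 term, a large log-variance inflates the H-UCL multiplicatively, which is the behavior criticized in the text.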
Frequently, the H-statistic produces extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.
Use of the H-statistic also does not calculate a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.
Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.
Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when an exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.
Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA 1999).
In view of these considerations, EPA contractors and consultants have recommended that the H-statistic should not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA 1997a) recommended that, in environmental applications, "the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. . . . [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the Lower Housatonic River; consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.
This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al. 1995); in more complex approaches, unbiased estimates of concentration and variance change with distance, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava 1989).
Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.
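The same polygon construction can also be sketched outside of GIS software. The following illustrative Python function (our own construction, not something the Work Plan specifies) uses SciPy's Voronoi routine and assumes a rectangular exposure area; it mirrors the sample points across each boundary edge so that every Thiessen polygon is closed and truncated exactly at the boundary, then computes each polygon's share of the total area:

```python
import numpy as np
from scipy.spatial import Voronoi

def thiessen_weights(points, xmin, ymin, xmax, ymax):
    """Fraction of a rectangular exposure area assigned to each
    sample's Thiessen (Voronoi) polygon."""
    pts = np.asarray(points, dtype=float)
    # Mirror the samples across all four edges so every original
    # cell is bounded and clipped at the exposure-area boundary.
    mirrors = [
        np.column_stack([2 * xmin - pts[:, 0], pts[:, 1]]),
        np.column_stack([2 * xmax - pts[:, 0], pts[:, 1]]),
        np.column_stack([pts[:, 0], 2 * ymin - pts[:, 1]]),
        np.column_stack([pts[:, 0], 2 * ymax - pts[:, 1]]),
    ]
    vor = Voronoi(np.vstack([pts] + mirrors))
    areas = np.empty(len(pts))
    for i in range(len(pts)):
        verts = vor.vertices[vor.regions[vor.point_region[i]]]
        # Sort the convex cell's vertices counterclockwise, then
        # apply the shoelace formula for the polygon area.
        c = verts.mean(axis=0)
        order = np.argsort(np.arctan2(verts[:, 1] - c[1], verts[:, 0] - c[0]))
        x, y = verts[order, 0], verts[order, 1]
        areas[i] = 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))
    return areas / areas.sum()
```

Irregular (non-rectangular) property boundaries would require clipping each polygon against the boundary instead, which is what GIS packages automate.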
The area inside each polygon is then used to weight the samples in statistical calculations, which is, in essence, saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava 1989):

    x̄_aw = Σ (p_i × c_i), summed over i = 1 through n

Where:

    x̄_aw = the area-weighted mean concentration (mg/kg)
    c_i = the concentration representing the condition within polygon i, where there are i = 1 through n polygons, and
    p_i = the proportion of the total area that is incorporated in polygon i (unitless)
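In code, the formula reduces to a weighted sum; a minimal sketch (the function name is ours):

```python
import numpy as np

def area_weighted_mean(concs, areas):
    """Area-weighted mean concentration: sum of p_i * c_i, where
    p_i is the fraction of the exposure area in sample i's polygon."""
    concs = np.asarray(concs, dtype=float)
    p = np.asarray(areas, dtype=float)
    p = p / p.sum()  # convert polygon areas to proportions of the total
    return float(np.sum(p * concs))
```

With equal polygon areas this collapses to the simple arithmetic mean, which is why the two estimators agree for unbiased, evenly spaced sampling.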
This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then the average concentrations for each grid were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA 1997b).
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the "[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods, such as kriging or triangulation, that take into account the spatial distribution of contamination to characterize exposure." (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples as necessary for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously and demonstrated by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process where a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that, instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
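The resampling procedure just described can be sketched in a few lines; the iteration count, percentile, and random seed below are assumed parameters, not values from the Work Plan:

```python
import numpy as np

def bootstrap_ucl(data, n_boot=5000, level=0.95, seed=0):
    """95% UCL on the arithmetic mean via straight bootstrapping.

    Draws n_boot resamples (with replacement) of the same size as
    the original dataset, archives the mean of each, and returns
    the 95th percentile of the distribution of means.
    """
    rng = np.random.default_rng(seed)  # fixed seed for reproducibility
    data = np.asarray(data, dtype=float)
    resamples = rng.choice(data, size=(n_boot, data.size), replace=True)
    means = resamples.mean(axis=1)
    return float(np.quantile(means, level))
```

Because every bootstrapped mean is bounded by the observed minimum and maximum, this estimator can never exceed the highest measured concentration, unlike the H-UCL.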
This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA 1997a; Schulz and Griffin 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.
When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter; as such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used to reflect the locations and densities of samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset and reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded, normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7 and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.
Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
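The three steps can be condensed into a short sketch. Rather than physically expanding the dataset as described above, the simplified version below draws each sample with probability proportional to its polygon area, which is statistically equivalent to replicating entries (and avoids the rounding step); the iteration count and seed are assumed parameters:

```python
import numpy as np

def spatial_bootstrap_ucl(concs, areas, n_boot=5000, seed=0):
    """Spatial (area-weighted) bootstrap of the 95% UCL.

    Each draw selects samples with probability proportional to
    polygon area -- equivalent to the expanded-dataset procedure
    described in the text. Returns (95% UCL, median of the
    bootstrapped means); the median approximates the area-weighted
    mean of the original dataset.
    """
    rng = np.random.default_rng(seed)
    concs = np.asarray(concs, dtype=float)
    p = np.asarray(areas, dtype=float) / np.sum(areas)  # polygon-area weights
    draws = rng.choice(concs, size=(n_boot, concs.size), replace=True, p=p)
    means = draws.mean(axis=1)
    return float(np.quantile(means, 0.95)), float(np.median(means))
```

With equal polygon areas the procedure reduces to the straight bootstrap, so the spatial version only departs from it to the extent that sampling density is uneven.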
This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has, in the past, often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA 1997a; Schulz and Griffin 1999).
The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data; thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) the EPA approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.
As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping; this is consistent with the Central Limit Theorem (EPA 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30; Table 1) have very large polygon areas; thus, they are weighted more heavily in approaches that include area-weighting.
Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration by a factor of more than two, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in high-concentration areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above, and as demonstrated by EPA contractors and consultants (EPA 1997a; Schulz and Griffin 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA 1989, 1993), and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.
However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA 1997a; Schulz and Griffin 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant), Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV. Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhardt. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H. and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River, Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

    Sample Number    Polygon Area (ft2)    Total PCB (ppm)
    1                138,219               0.022
    2                105,217               0.028
    3                110,214               1.07
    4                31,021                1.63
    5                11,399                1.67
    6                7,004                 2.01
    7                14,268                2.19
    8                12,121                3.25
    9                3,697                 3.77
    10               13,297                7.58
    11               33,087                7.61
    12               65,541                7.62
    13               57,494                7.71
    14               9,014                 7.99
    15               9,994                 8.72
    16               2,123                 9.69
    17               2,341                 9.72
    18               7,446                 12.7
    19               2,613                 14.3
    20               44,231                14.3
    21               2,362                 15.4
    22               8,745                 16.3
    23               5,008                 18.7
    24               18,762                19.1
    25               24,296                20.2
    26               51,356                30.8
    27               12,996                32.9
    28               18,019                38.9
    29               119,814               40.6
    30               132,012               42.6
Table 2. Comparison of Mean and 95% UCL EPCs for the Hypothetical Floodplain Property, Calculated Using Four Statistical Approaches

[Table 2 is not legible in this copy. As reported in the accompanying text, the arithmetic and straight-bootstrap means are both 13.3 ppm; the area-weighted and spatial-bootstrap means are identical and somewhat higher; and the 95% UCLs are 106 ppm for the H-statistic (exceeding the 42.6 ppm maximum, which would therefore be substituted as the EPC), 17.06 ppm for the bootstrap, and 16.38 ppm for the spatial bootstrap.]
account of the significant differences in exposure parameters among these types of activities. In doing so, actual recreational exposures will be more closely approximated, and more realistic and appropriate cleanup goals can be developed.
7. Lack of information about specific exposure assumptions to be used
As in the prior draft, the Work Plan does not present the specific exposure parameter values and equations to be used in EPA's risk assessment, but simply states that they will be developed based on EPA and MDEP guidance along with site-specific data (Section 6.3.4.5). GE requested such information in its June 3, 1999 comments and in prior meetings with EPA. In response, EPA has advised GE that the Agency's risk assessment staff will hold a series of technical meetings with GE representatives and consultants, and at such meetings will be prepared to discuss the most appropriate exposure assumptions and parameter values to be used. GE reiterates its request for such meetings and looks forward to discussing these issues with EPA.
8. Use of probabilistic risk analysis

In the Uncertainty Analysis section of the Work Plan (Section 6.3.6), EPA states that probabilistic approaches such as Monte Carlo analysis will be considered for those COPCs and exposure pathways that result in significant risk estimates. However, it does not mention probabilistic risk assessment techniques in connection with any of the main risk assessments. In prior discussions with GE, EPA indicated that one of the principal reasons why it collected a large amount of fish tissue PCB data is for use in a probabilistic risk assessment of fish consumption. As discussed in our June 3, 1999 comments (pp. 38-40), GE supports the use of a probabilistic exposure model in the main risk analysis of the fish consumption pathway. However, the Work Plan presents no discussion of, or plans for conducting, such an assessment. In fact, EPA does not even cite any of its Monte Carlo guidance documents in Table 6.1-1, which lists the risk assessment guidance documents and information sources that EPA intends to use in conducting the HHRA.
During the last few years, EPA's policies and guidelines have focused on improving risk management by presenting decision-makers with the entire range of possible risks rather than a single point estimate (EPA 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that numerical risk assessments should be accompanied by a full characterization of the uncertainties, limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D) probabilistic analysis, such as Microexposure analysis, can more effectively characterize the impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed population by considering time-dependent changes.
Microexposure analysis has been used by EPA and by independent researchers to simulate duration of residential exposure (EPA 1992c; Johnson and Capel 1992; Sielken 1994). It also has been used to evaluate childhood exposures to lead (Goodrum et al. 1994), exposure to contaminants in tap water (Harrington et al. 1995), and exposure to dioxins from the consumption of freshwater fish (Keenan et al. 1993a,b, 1995, 1996a, 1997a,b). Microexposure analysis was employed in the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon Group 1994; Keenan et al. 1996b). In addition, Microexposure analysis has been described in EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A (External Review Draft) as a viable alternative for modeling time-dependent variability in concentrations, daily activity patterns, and other behavioral exposure factors (EPA 1999c).
Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of probabilistic risk analysis through application to the fish ingestion pathway at the Sangamo Weston/Lake Hartwell Superfund site (Simon 1999). Simon (1999) found that the Microexposure event model gave risk estimates that were approximately an order of magnitude lower than those derived using deterministic and other probabilistic approaches. He attributed this decrease to the power of the method to incorporate changes over time.
GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model, to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in Appendix E of EPA's draft guidance for conducting probabilistic risk assessments (EPA 1999c), some types of exposures, including fish consumption, can be expected to vary over time. Anglers may change their activity levels and their fish consumption behaviors as they age. In addition, different species and sizes of fish may be consumed from a variety of locations on the river over time and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates, and will provide information that will allow better-informed risk management decisions to be made (EPA 1999c).
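The microexposure-event idea can be sketched in a few lines of code. This is a minimal illustration only: the distributions, meal frequencies, cooking loss, and concentration values below are hypothetical placeholders, not parameters proposed by GE or EPA, and a full 2-D analysis would add a second (uncertainty) simulation loop around this one.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical inputs -- all values are illustrative placeholders.
N_ANGLERS = 5000          # simulated individuals (variability dimension)
YEARS = 30                # assumed exposure duration
BW_KG = 70.0              # assumed adult body weight, kg

def simulate_lifetime_dose(rng):
    """Simulate one angler's average daily PCB dose (mg/kg-day),
    letting behavior and fish tissue concentration vary year by year
    and meal by meal (the 'microexposure event' idea: the dose is a
    sum over discrete, individually varying events)."""
    total_intake_mg = 0.0
    for year in range(YEARS):
        meals = rng.poisson(20)  # number of fish meals this year
        # meal size (kg) and tissue concentration (mg/kg) vary per event
        meal_kg = rng.lognormal(mean=np.log(0.2), sigma=0.3, size=meals)
        conc = rng.lognormal(mean=np.log(1.0), sigma=0.8, size=meals)
        cook_loss = 0.7          # assumed fraction remaining after cooking
        total_intake_mg += np.sum(meal_kg * conc * cook_loss)
    return total_intake_mg / (BW_KG * YEARS * 365.0)

doses = np.array([simulate_lifetime_dose(rng) for _ in range(N_ANGLERS)])
print(f"median dose: {np.median(doses):.2e} mg/kg-day")
print(f"95th percentile: {np.percentile(doses, 95):.2e} mg/kg-day")
```

Because each angler's dose is built from many event-level draws, the simulated population spread reflects time-dependent changes in behavior and concentration rather than a single fixed intake rate.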
9. Use of 95% UCLs or maximum values as exposure point concentrations

The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% UCL of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).
GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment, and as demonstrated by EPA contractors and consultants (EPA 1997c; Schulz and Griffin 1999), use of the H-statistic to derive a 95% UCL can and frequently does produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.
Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½-Mile Reach and for Areas Outside the River: a spatial averaging approach. That approach will (a) allow calculation of EPCs that are much more truly representative of actual exposures than EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper ½-Mile Reach and Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, thus minimizing the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling, if necessary, to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.
However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A, and as demonstrated by Singh et al. (EPA 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.
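The mechanics of a spatial bootstrap can be sketched as follows. The concentrations and polygon area fractions below are hypothetical; the point is only to show resampling locations in proportion to the area each sample represents, so that the resampled means estimate the area-weighted mean rather than the unweighted mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical exposure area: PCB results (ppm) and the fraction of the
# area covered by each sample's Thiessen polygon (fractions sum to 1).
conc = np.array([0.5, 0.8, 1.2, 2.0, 3.5, 9.0, 25.0])
area_frac = np.array([0.30, 0.22, 0.18, 0.12, 0.10, 0.05, 0.03])

def spatial_bootstrap_ucl(conc, weights, n_boot=5000, level=0.95):
    """95% UCL of the area-weighted mean: resample sample locations with
    probability proportional to polygon area, then take the upper
    percentile of the resampled means."""
    n = len(conc)
    means = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.choice(n, size=n, p=weights)  # area-weighted resample
        means[b] = conc[idx].mean()
    return float(np.quantile(means, level))

weighted_mean = float(np.dot(conc, area_frac))
ucl = spatial_bootstrap_ucl(conc, area_frac)
print(f"area-weighted mean: {weighted_mean:.2f} ppm, "
      f"spatial bootstrap 95% UCL: {ucl:.2f} ppm")
```

Unlike the H-statistic, this estimate cannot exceed the maximum observed concentration and requires no distributional assumption.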
References
EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1990. National Contingency Plan. US Environmental Protection Agency, Washington, DC.

EPA. 1992a. Final Guidelines for Exposure Assessment; Notice. US Environmental Protection Agency, Washington, DC. 57 Federal Register 104: 22888-22938. May 29.

EPA. 1992b. Guidance on Risk Characterization for Risk Managers and Risk Assessors. Memorandum from F. Henry Habicht III, Deputy Administrator, to Assistant Administrators and Regional Administrators. Office of the Administrator, Washington, DC.
EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General US Population. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

EPA. 1993. Memo from W.H. Farland and H.L. Longest to Regional Directors. Re: Use of IRIS values in Superfund risk assessments. US Environmental Protection Agency, Washington, DC. OSWER Directive 9285.7-16. December 21.

EPA. 1995a. Policy for Risk Characterization at the US Environmental Protection Agency. US Environmental Protection Agency, Office of the Administrator, Washington, DC. March.

EPA. 1995b. Guidance for Risk Characterization. US Environmental Protection Agency, Science Policy Council, Washington, DC. February.

EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the US Environmental Protection Agency. US Environmental Protection Agency, Office of Research and Development, Washington, DC. May 15.

EPA. 1997b. Guiding Principles for Monte Carlo Analysis. US Environmental Protection Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 p.

EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1998. Memo from Timothy Fields, Acting Administrator, to Regional Directors. Re: Approach for addressing dioxin in soil at CERCLA and RCRA sites. US Environmental Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.
EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, US EPA, to Jane Magee, General Electric Company. Re: Pittsfield 1-0147, EPA Area 6 Housatonic River Human Health Risk Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader. Re: Protectiveness of Cleanup Levels for Removal Actions Outside the River - Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, US Dist. Court for Dist. of Mass.). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I - Human Health Evaluation Manual, Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2. Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3 (Part A: Process for Conducting Probabilistic Risk Assessment). Draft Revision 5. United States Environmental Protection Agency, Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the US Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition. December 4-7, Baltimore, MD.
Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA42.03. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.

Johnson, T. and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General US Population. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA, Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds, Proceedings Dioxin '96 - 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen 30: 61-65.
Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC. October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meetings Proceedings. Abstract.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition. Abstract.

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan - Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19. November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD. December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4): 823-843.
Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 11.
ATTACHMENT A
A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS
Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co Inc
Introduction
The US Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets for which no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and on how well they represent the actual levels of exposure that are likely to occur under various scenarios.
There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom follow a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially affect the results of the risk assessment. Use of the H-statistic frequently results in extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and
the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.

GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating EPCs that will yield more representative results. In GE's view, the most appropriate approach is the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½-Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic: namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.
Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL
In 1992, EPA issued a guidance document (EPA 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.
The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region 1 (EPA 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.
For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA 1992):

UCL = exp[ x̄ + 0.5s² + sH/√(n - 1) ]

where:
UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean;
x̄ = the mean of the log-transformed data;
s² = the variance of the log-transformed sample distribution;
s = the standard deviation of the log-transformed sample distribution;
H = a statistic accounting for interaction of the distribution, developed by Land (1975); and
n = the number of analytical samples.
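For illustration, the equation can be implemented directly. Note that the H value must be looked up in Land's (1975) tables for the given sample size and log-standard deviation; the value used below is a placeholder, and the dataset is hypothetical. Even so, the example shows the tendency discussed in this paper: the H-UCL can exceed every observed concentration.

```python
import numpy as np

def h_ucl(data, H):
    """95% H-UCL per Land's equation. `H` must come from Land's (1975)
    tables for the given n and s; the value passed below is an
    illustrative placeholder only."""
    logs = np.log(data)
    n = len(logs)
    xbar = logs.mean()
    s = logs.std(ddof=1)          # std dev of log-transformed data
    return float(np.exp(xbar + 0.5 * s**2 + s * H / np.sqrt(n - 1)))

# Hypothetical positively skewed sample (ppm)
sample = np.array([0.1, 0.2, 0.4, 0.5, 1.1, 2.3, 4.9, 18.0, 45.0, 120.0])
ucl = h_ucl(sample, H=3.0)        # H=3.0 is a placeholder, not from the tables
print(f"H-UCL: {ucl:.0f} ppm; arithmetic mean: {sample.mean():.1f} ppm; "
      f"maximum: {sample.max():.0f} ppm")
```

With a log-standard deviation above 2, the exponentiated variance term dominates and the computed UCL lands far above both the mean and the highest observed value, which is precisely the behavior criticized below.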
Frequently, the H-statistic produces extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.
Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.
Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may
overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.

Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when the exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.
Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and it does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½-Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA 1999).
In view of these considerations, EPA contractors and consultants have recommended that the H-statistic not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA 1997a) recommended that in environmental applications the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided. These authors concluded that the use of the jackknife method, the bootstrap method, or the Chebychev inequality method "provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. ... [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).
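The alternative estimators named in that recommendation can be sketched side by side on a hypothetical skewed dataset. The data values are illustrative only; the t quantile for 9 degrees of freedom is hardcoded to keep the sketch self-contained.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical positively skewed dataset (ppm); values are illustrative.
data = np.array([0.3, 0.5, 0.7, 1.0, 1.4, 2.2, 3.1, 6.0, 14.0, 38.0])
n = len(data)
mean = data.mean()
se = data.std(ddof=1) / np.sqrt(n)     # standard error of the mean

# Student's-t 95% UCL (one-sided t quantile for df = 9 is about 1.833)
t_ucl = mean + 1.833 * se

# Chebyshev 95% UCL: mean + sqrt(1/alpha - 1) * se, with alpha = 0.05
cheb_ucl = mean + np.sqrt(1.0 / 0.05 - 1.0) * se

# Percentile bootstrap 95% UCL of the arithmetic mean
boot_means = np.array([rng.choice(data, size=n).mean() for _ in range(5000)])
boot_ucl = float(np.quantile(boot_means, 0.95))

print(f"t-UCL {t_ucl:.1f}, Chebyshev UCL {cheb_ucl:.1f}, "
      f"bootstrap UCL {boot_ucl:.1f} (mean {mean:.1f}) ppm")
```

None of these estimators can explode the way the exponentiated H-UCL does; the Chebyshev bound is the most conservative of the three but remains tied to the observed mean and standard error.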
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the lower Housatonic River; consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.
This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al. 1995); in more complex approaches, unbiased estimates of how concentration and variance change with distance are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava 1989).
Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.
The area inside each polygon is then used to weight the samples in statistical calculations, which in essence says that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava 1989):

x̄ = Σᵢ (pᵢ × cᵢ)

where:
x̄ = the area-weighted mean concentration (mg/kg);
cᵢ = the concentration representing the condition within polygon i, where there are i = 1 through n polygons; and
pᵢ = the proportion of the total area that is incorporated in polygon i (unitless).
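The calculation can be sketched as follows. Rather than constructing exact Thiessen polygons (as a GIS would), this illustration approximates the polygon areas by rastering a hypothetical exposure area and assigning each grid cell to its nearest sample, which converges to the same weights as the grid becomes fine. All coordinates and concentrations are hypothetical.

```python
import numpy as np

# Hypothetical sample locations (m) and PCB results (ppm) in a
# 100 m x 100 m exposure area; the three high results are clustered,
# mimicking sampling biased toward a suspected hot spot.
pts = np.array([[10, 10], [15, 20], [20, 15], [80, 75], [60, 90]], float)
conc = np.array([12.0, 9.0, 15.0, 0.4, 0.7])

# Approximate Thiessen-polygon area fractions: raster the area on a
# 1-m grid and assign each cell to its nearest sample location.
xs, ys = np.meshgrid(np.arange(0.5, 100, 1.0), np.arange(0.5, 100, 1.0))
cells = np.column_stack([xs.ravel(), ys.ravel()])
d2 = ((cells[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)
nearest = d2.argmin(axis=1)

# p_i = fraction of the exposure area in sample i's polygon
p = np.bincount(nearest, minlength=len(pts)) / len(cells)
area_weighted_mean = float(np.dot(p, conc))   # x-bar = sum of p_i * c_i
simple_mean = float(conc.mean())
print(f"simple mean {simple_mean:.2f} ppm vs "
      f"area-weighted mean {area_weighted_mean:.2f} ppm")
```

Because the clustered high-concentration samples represent only small polygons, the area-weighted mean falls below the simple mean, illustrating how the weighting corrects for sampling that is biased toward contaminated locations.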
This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½-Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA 1995), where biased sampling was also conducted. To offset that bias, a nearest-neighbor approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid cell using the sampling data available within it. Then the average concentrations for each grid cell were
averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA 1997b).
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the

[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure. (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples as necessary for individual exposure areas to ensure that reasonable and
representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and demonstrated by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process where a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA 1997a; Schulz and Griffin 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.
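The percentile-bootstrap procedure described above can be sketched as follows. The dataset, seed, and iteration count here are illustrative assumptions for demonstration, not inputs specified in the Work Plan.

```python
import numpy as np

def bootstrap_ucl95(data, n_boot=5000, seed=1):
    """95% UCL of the mean via the percentile bootstrap: resample the
    dataset with replacement, record the mean of each resample, and take
    the 95th percentile of the resulting distribution of means."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = data.size
    means = np.empty(n_boot)
    for i in range(n_boot):
        means[i] = rng.choice(data, size=n, replace=True).mean()
    return float(np.percentile(means, 95))
```

Because the statistic is a percentile of observed resample means, the estimate cannot run away from the data the way a distribution-based formula can when the assumed distribution is a poor fit.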
When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used to reflect the locations and densities of samples collected on each property. This modified bootstrapping approach (spatial bootstrapping), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset and reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7 and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
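The three steps above can be sketched as follows. This simplified illustration assumes the Thiessen polygon areas have already been computed; the expansion factor, draw size, and iteration count follow the values described in the text but are otherwise arbitrary.

```python
import numpy as np

def spatial_bootstrap_ucl95(conc, areas, expand=5, n_boot=5000,
                            draw=250, seed=1):
    """Spatial (area-weighted) bootstrap sketch of the three steps above:
    normalize each polygon area by the smallest polygon, round to an
    integer, replicate each sample in an expanded dataset (replicates =
    rounded ratio x expansion factor), then bootstrap means from it."""
    conc = np.asarray(conc, dtype=float)
    areas = np.asarray(areas, dtype=float)
    ratios = np.rint(areas / areas.min()).astype(int)   # step 2: normalize and round
    expanded = np.repeat(conc, ratios * expand)         # area-weighted replicates
    rng = np.random.default_rng(seed)
    means = np.empty(n_boot)
    for i in range(n_boot):                             # step 3: resample with replacement
        means[i] = rng.choice(expanded, size=draw, replace=True).mean()
    return float(np.percentile(means, 95))
```

Because each sample appears in the expanded dataset in proportion to its polygon area (e.g., a 7:1 area ratio with a factor of 5 yields 35 entries), the median of the bootstrap means approximates the area-weighted mean, and the 95th percentile serves as the 95% UCL.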
This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA 1997a; Schulz and Griffin 1999).
The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1000 by 1000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) the EPA approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.
As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.
Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few
showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration by a factor of approximately 2.5, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above, and demonstrated by EPA contractors and consultants (EPA 1997a; Schulz and Griffin 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area, and hence that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data
necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA 1989, 1993) and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.
However, if EPA rejects this approach (or for areas where EPA determines that the data are insufficient to support that approach and decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA 1997a; Schulz and Griffin 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A.K. Singh, and M. Engelhardt. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H. and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River. Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
 1                  138,219               0.022
 2                  105,217               0.028
 3                  110,214               1.07
 4                   31,021               1.63
 5                   11,399               1.67
 6                    7,004               2.01
 7                   14,268               2.19
 8                   12,121               3.25
 9                    3,697               3.77
10                   13,297               7.58
11                   33,087               7.61
12                   65,541               7.62
13                   57,494               7.71
14                    9,014               7.99
15                    9,994               8.72
16                    2,123               9.69
17                    2,341               9.72
18                    7,446              12.7
19                    2,613              14.3
20                   44,231              14.3
21                    2,362              15.4
22                    8,745              16.3
23                    5,008              18.7
24                   18,762              19.1
25                   24,296              20.2
26                   51,356              30.8
27                   12,996              32.9
28                   18,019              38.9
29                  119,814              40.6
30                  132,012              42.6
Table 2. Mean and 95% UCL EPCs for the Hypothetical Dataset Calculated Using Four Approaches

[Table 2 is reconstructed here from the values given in the accompanying text; the area-weighted and spatial bootstrap means, which are identical to each other, are not recoverable from this copy.]

Approach                                         Mean (ppm)    95% UCL (ppm)
1) EPA approach (arithmetic mean / H-statistic)     13.3           106
2) Area-weighted average                             --             --
3) Bootstrap                                        13.3            17.06
4) Spatial bootstrap                                 --             16.38
single point estimate (EPA 1992a,b; 1995a,b; 1997a,b; 1999c). The new policy states that numerical risk assessments should be accompanied by a full characterization of the uncertainties, limitations, and assumptions in the risk assessment. The use of two-dimensional (2-D) probabilistic analysis, such as Microexposure analysis, can more effectively characterize the impact of variability or uncertainty in input parameters on the estimates of dose rates in an exposed population by considering time-dependent changes.
Microexposure analysis has been used by EPA and by independent researchers to simulate duration of residential exposure (EPA 1992c; Johnson and Capel 1992; Sielken 1994). It also has been used to evaluate childhood exposures to lead (Goodrum et al. 1994), exposure to contaminants in tap water (Harrington et al. 1995), and exposure to dioxins from the consumption of freshwater fish (Keenan et al. 1993a,b, 1995, 1996a, 1997a,b). Microexposure analysis was employed in the supplemental risk assessment for the Stringfellow Superfund site in California (Pyrite Canyon Group 1994; Keenan et al. 1996b). In addition, Microexposure analysis has been described in EPA's Risk Assessment Guidance for Superfund, Volume 3, Part A (External Review Draft) as a viable alternative for modeling time-dependent variability in concentrations, daily activity patterns, and other behavioral exposure factors (EPA 1999c).
Recently, Dr. Ted Simon of EPA published a systematic comparison of the various methods of probabilistic risk analysis through application to the fish ingestion pathway at the Sangamo-Weston/Lake Hartwell Superfund site (Simon 1999). Simon (1999) found that the Microexposure event model gave risk estimates that were approximately an order of magnitude lower than those derived using deterministic and other probabilistic approaches. He attributed this decrease to the power of the method to incorporate changes over time.
GE again urges EPA to use a probabilistic exposure model, specifically the Microexposure model, to estimate exposures via the fish consumption pathway. Such an approach will allow EPA to evaluate the variations in exposures over time and among anglers, taking into consideration site-specific factors as well as the uncertainties associated with the input parameters. As outlined in Appendix E of its draft guidance for conducting probabilistic risk assessments (EPA 1999c), some types of exposures, including fish consumption, can be expected to vary over time. Anglers may change their activity levels and their fish consumption behaviors as they age. In addition, different
species and sizes of fish may be consumed from a variety of locations on the river over time, and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates, and will provide information that will allow better informed risk management decisions to be made (EPA 1999c).
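The structure of such a 2-D Microexposure simulation can be sketched as follows. Every distribution and parameter in this sketch (tissue concentration, meal size, meals per year, exposure duration) is an illustrative assumption for demonstration only, not a proposed input or an EPA-specified model.

```python
import numpy as np

def simulate_angler(rng, mu, years=30, body_wt=70.0):
    """One simulated angler: meal-by-meal ('microexposure event') intake,
    with tissue concentration and behavior allowed to change year to year.
    All distributions here are illustrative assumptions."""
    total = 0.0
    meals_per_yr = int(rng.integers(5, 40))           # angler-specific activity level
    for _ in range(years):
        conc = rng.lognormal(mean=mu, sigma=0.5)      # mg/kg fish tissue; varies over time
        meal = max(rng.normal(0.227, 0.05), 0.0)      # kg of fish per meal
        total += meals_per_yr * meal * conc           # mg ingested this year
    return total / (body_wt * years * 365.0)          # average daily dose, mg/kg-day

def two_d_microexposure(n_outer=20, n_inner=200, seed=1):
    """2-D simulation: the outer loop samples an uncertain parameter
    (here, the mean log tissue concentration), and the inner loop samples
    variability among anglers, so each outer draw yields a full dose
    distribution rather than a single point estimate."""
    rng = np.random.default_rng(seed)
    p95s = []
    for _ in range(n_outer):                          # uncertainty loop
        mu = rng.normal(0.0, 0.25)                    # uncertain distribution parameter
        doses = [simulate_angler(rng, mu) for _ in range(n_inner)]  # variability loop
        p95s.append(float(np.percentile(doses, 95)))
    return p95s
```

Separating the two loops is what allows uncertainty in the inputs to be reported separately from person-to-person variability, which is the feature of the 2-D approach emphasized above.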
9. Use of 95% UCLs or maximum values as exposure point concentrations
The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% UCL of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).
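For reference, the one-sided 95% UCL computed from Land's H-statistic for lognormally distributed data takes the following standard form, where ȳ and s_y are the mean and standard deviation of the log-transformed data, n is the sample size, and H_0.95 is Land's tabulated factor (Land 1975):

```latex
\mathrm{UCL}_{95} \;=\; \exp\!\left( \bar{y} \;+\; \frac{s_y^{2}}{2} \;+\; \frac{s_y \, H_{0.95}}{\sqrt{\,n-1\,}} \right)
```

Because H_0.95 grows rapidly as s_y increases and n decreases, the H-UCL can exceed the maximum detected value for small or variable datasets, which is the behavior discussed in Attachment A.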
GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment, and demonstrated by EPA contractors and consultants (EPA 1997c; Schulz and Griffin 1999), use of the H-statistic to derive a 95% UCL can, and frequently does, produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.
Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½ Mile Reach and for Areas Outside the River: a spatial averaging approach. That approach will (a) allow calculation of EPCs that are much more truly representative of actual exposures than use of EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper ½ Mile Reach and Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling, if necessary, to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.
However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A, and demonstrated by Singh et al. (EPA 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.
References
EPA 1989 Risk Assessment Guidance for Superfund Volume I Human Health Evaluation
Manual (Pan A) -Interim Final US Environmental Protection Agency Office of
Emergency and Remedial Response Washington DC EPA5401-89-002 July
EPA 1990 National Contingency Plan US Environmental Protection Agency
Washington DC
EPA 1992a Final Guidelines for Exposure Assessment Notice US Environmental Protection
Agency Washington DC 57 Federal Register 14 22888-22938 May 29
EPA 1992b Guidance on Risk Characterization for Risk Managers and Risk Assessors
Memorandum from F Henry Habicht III Deputy Administrator to Assistant Administrators
and Regional Administrators Office of the Administrator Washington DC
- 19 shy
EPA 1992c A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its
Application to the General US Population US Environmental Protection Agency Office of
Air Quality Planning and Standards Research Triangle Park NC EPA-4503-92-011
August
EPA 1993 Memo from WH Farland and HL Longest to Regional Directors Re Use of
IRIS values in Superfund risk assessments US Environmental Protection Agency
Washington DC OSWER Directive 92857-16 December 21
EPA 1995a Policy for Risk Characterization at the US Environmental Protection Agency
US Environmental Protection Agency Office of the Administrator Washington DC
March
EPA 1995b Guidance for Risk Characterization US Environmental Protection Agency
Science Policy Council Washington DC February
EPA 1997a Policy for Use of Probabilistic Analysis in Risk Assessment at the US
Environmental Protection Agency US Environmental Protection Agency Office of Research
and Development Washington DC May 15
EPA 1997b Guiding Principles for Monte Carlo Analysis US Environmental Protection
Agency Risk Assessment Forum Washington DC EPA630R-97001 March 35 p
EPA 1997c The Lognormal Distribution in Environmental Applications Prepared for EPA
by A Singh A Singh and M Engelhard US Environmental Protection Agency Office of
Research and Development EPA600R-97006 December
EPA 1998 Memo from Timothy Fields Acting Administrator to Regional Directors Re
Approach for addressing dioxin in soil at CERCLA and RCRA sites US Environmental
Protection Agency Washington DC OSWER Directive 92004-26 April 13
- 20 shy
EPA and MDEP 1998 Letter from Anna Symington Acting Section Chief Massachusetts
DEP and Bryan Olson Project Manager US EPA to Jane Magee General Electric
Company Re Pittsfield 1-0147 EPA Area 6 Housatonic River Human Health Risk
Assessment January 29
EPA 1999a Memorandum from Ann-Marie Burke Toxicologist to Richard Cavagnero GE
Project Leader Re Protectiveness of Cleanup Levels for Removal Actions Outside the River
- Protection of Human Health Attachment A to Appendix D to Consent Decree lodged in
United States et al v General Electric Company (No 99-30225-MAP US Dist Court for
Dist of Mass) August 4
EPA 1999b Risk Assessment Guidance for Superfund Volume I - Human Health Evaluation
Manual Supplemental Guidance Dermal Risk Assessment Interim Guidance US
Environmental Protection Agency Office of Emergency and Remedial Response Washington
DC March 2 Draft
EPA 1999c Risk Assessment Guidance for Superfund Volume 3- (Pan A Process for
Conducting Probabilistic Risk Assessment) Draft Revision 5 United States Environmental
Protection Agency Solid Waste and Emergency Response December
EPA 1999d Responsiveness Summary for Allendale School Removal Action A Mile
Removal Action and Consolidation Prepared by EPA Region I October
GE 1997 Proposal for Alternative Input Values for the Human Health Risk Assessment
Memorandum from General Electric to the Massachusetts Department of Environmental
Protection and the US Environmental Protection Agency September 11
Goodrum PE JM Hassett DL Johnson and ME Dakins 1994 Applications of
microexposure Monte Carlo modeling to human health risk assessments A case study of
modeling childhood lead exposure Society for Risk Analysis Annual conference and
Exposition December 4-7 Baltimore MD
- 21 shy
Harrington NW Curry CL and PS Price 1995 The MicroExposure Event Modeling
Approach to Probabilistic Exposure Assessment Paper No 95-TA4203 Proceedings of the
88th Annual Meeting of the Air and Waste Management Association San Antonio Texas
USA June
Johnson T and J Capel 1992 A Monte Carlo Approach to Simulating Residential
Occupancy Periods and Its Application to the General US Population US Environmental
Protection Agency Office of Air Quality Planning and Standards Research Triangle Park
NC EPA-4503-92-011 August
Keenan RE MH Henning PE Goodrum MN Gray RA Sherer and PS Price
1993a Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to
evaluate the need for fish advisories Dioxin 93 13th International Symposium on
Chlorinated Dioxins and Related Compounds Vienna Austria
Keenan RE PS Price MH Henning PE Goodrum MN Gray RA Sherer and
WL Porter 1993b A Monte Carlo risk assessment for dioxin in Maine fish Using a
microexposure approach to evaluate the need for fish advisories TAPPI Proceedings 1993
Environmental Conference Boston MA
Keenan RE PS Price CL Curry JI McCrodden and JG Haggard 1995 Using a
microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion
of fish from the upper Hudson River In Society for Risk Analysis and the Japan Section of
SRA Annual Meeting and Exposition Waikiki HI (Abstract)
Keenan RE PS Price J McCrodden and ES Ebert 1996a Using a microexposure
event analysis to model potential exposures to PCBs through ingestion of fish from the Upper
Hudson River In Organohalogen Compounds Proceedings Dioxin 96-16th International
Symposium on Chlorinated Dioxins and Related Compounds Amsterdam The Netherlands
Organohalogen 3061-65
- 22 shy
Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC, October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meetings, Proceedings. (Abstract)

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition. (Abstract)

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan - Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSCORS-95-141, July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis, Vol. 19, November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD, December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4): 823-843.
Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 11.
ATTACHMENT A
A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS
Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co Inc
Introduction
The U.S. Environmental Protection Agency's (EPA's) final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95 UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6344). The Work Plan states that the 95 UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets for which no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and on how well they represent the actual levels of exposure that are likely to occur under various scenarios.
There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom follow a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95 UCL frequently results in extraordinarily high 95 UCL values. This tendency is reflected in EPA's policy stating that, when the 95 UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95 UCL and
the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.
GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating EPCs that will produce more representative results. In GE's view, the most appropriate approach is the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95 UCL that is much more representative of actual exposure point concentrations than use of the H-statistic: the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.
Problems Associated with the Use of the H-Statistic to Calculate the 95 UCL
In 1992, EPA issued a guidance document (EPA 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of the observed sample concentrations.
The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95 UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA 1994). In fact, EPA Region I has taken the position that the 95 UCL is appropriate for both the central tendency and reasonable maximum exposure computations.
For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95 UCL is calculated using the following equation (EPA 1992):

    UCL = exp( x̄ + 0.5·s² + s·H / √(n − 1) )

Where:

    UCL = a specified limit (i.e., the 95 UCL) on the estimate of the arithmetic mean
    x̄ = the mean of the log-transformed data
    s² = the variance of the log-transformed sample distribution
    s = the standard deviation of the log-transformed sample distribution
    H = a statistic accounting for interaction of the distribution, developed by Land (1975)
    n = the number of analytical samples
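The mechanics of the equation above can be sketched in a short Python function. This is an illustrative sketch, not part of the Work Plan: the helper name `h_ucl` is ours, and the H value must be supplied from Land's published tables for the given sample size and standard deviation.

```python
import math

def h_ucl(log_values, H):
    """Land's H-UCL for lognormal data (illustrative sketch).

    log_values: natural-log-transformed sample concentrations
    H: Land's H-statistic, looked up from published tables for n and s
    """
    n = len(log_values)
    xbar = sum(log_values) / n                                # mean of log data
    s2 = sum((v - xbar) ** 2 for v in log_values) / (n - 1)   # variance of log data
    s = math.sqrt(s2)                                         # std. deviation of log data
    return math.exp(xbar + 0.5 * s2 + s * H / math.sqrt(n - 1))
```

Because H grows with the standard deviation of the log-transformed data, the exponent, and thus the UCL, can exceed the largest observed concentration, which is precisely the behavior criticized below.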
Frequently, the H-statistic produces extraordinarily high 95 UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.
Use of the H-statistic also does not produce a reliable and representative 95 UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95 UCL that are not at all representative of site conditions.
Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling datasets and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes orders of magnitude higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may
overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.
Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when the exponential estimator of the mean is not equal to the typical calculation of the sum of all sample results divided by the number of samples.
Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95 UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and it does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA 1999).
In view of these considerations, EPA contractors and consultants have recommended that the H-statistic not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA 1997a) recommended that "in environmental applications, the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. ... [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased toward more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the lower Housatonic River. Consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected to be contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95 UCL because it is unlikely to be representative of the overall exposure area.
This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al. 1995); in more complex approaches, unbiased estimates of how concentration and variance change with distance are developed, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava 1989).

Because kriging is data intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so convenient is that the construction of Thiessen polygons is a
built-in operation in most commercially available Geographic Information System (GIS) applications.
The area inside each polygon is then used to weight the samples in statistical calculations, which is in essence saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava 1989):

    x̄_sc = Σ (p_i · c_i), summed over i = 1 through n polygons

Where:

    x̄_sc = the area-weighted mean concentration (mg/kg)
    c_i = the concentration representing the condition within polygon i, where there are i = 1 through n polygons, and
    p_i = the proportion of the total area that is incorporated in polygon i (unitless)
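The formula is simple enough to sketch in a few lines of Python. This is an illustration only; the function name and argument layout are ours, not EPA's or Ogden's.

```python
def area_weighted_mean(concs, areas):
    """Area-weighted mean: each concentration is weighted by the fraction
    of the total exposure area covered by its Thiessen polygon."""
    total = float(sum(areas))
    return sum(c * (a / total) for c, a in zip(concs, areas))
```

For example, a 10 mg/kg sample whose polygon covers one tenth of the exposure area contributes 1 mg/kg to the mean, no matter how many other samples cluster elsewhere on the property.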
This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then, the average concentrations for each grid were
averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA 1997b).
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the

    [e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure. (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and it is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95 UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95 UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary for individual exposure areas, to ensure that reasonable and
representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95 UCL that would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously and demonstrated by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95 UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95 UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95 UCL for exposure areas in the Lower Housatonic River risk assessment.
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process whereby a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95 UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal
distribution. This theorem applies to any population regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
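The procedure can be sketched as follows, using only the Python standard library. This is a simplified illustration; the iteration count, seed handling, and percentile indexing are our choices, not a prescription from the cited reports.

```python
import random
import statistics

def bootstrap_ucl(data, n_iter=5000, seed=1):
    """Percentile-bootstrap 95 UCL on the arithmetic mean: resample the
    dataset with replacement, record each resample's mean, and take the
    95th percentile of the resulting distribution of means."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(data, k=len(data)))
        for _ in range(n_iter)
    )
    return means[int(0.95 * n_iter) - 1]
```

Because every resampled mean is bounded by the observed minimum and maximum, the bootstrap 95 UCL can never exceed the largest measured concentration, in contrast to the H-UCL.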
This procedure frequently results in a lower estimate of the 95 UCL than the 95 H-UCL and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95 H-UCL (EPA 1997a; Schulz and Griffin 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.
When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95 UCL. Thus, the 95 UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95 UCL that would be obtained from a larger dataset.
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95 UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that accounts for the locations and densities of the samples collected on each property. This modified bootstrapping approach (spatial bootstrapping), which incorporates area-weighting into the bootstrap method, can be used to calculate more reliable 95 UCL estimates of the EPC for each exposure area.
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset to reflect the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded, normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7 and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.
Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95 UCL.
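As a compact stand-in for the expanded-dataset procedure described above, the same effect can be sketched with weighted resampling, where each sample's selection probability is proportional to its polygon area. This is an illustration under our own simplifying assumptions (names, defaults, and the weighted-draw shortcut are ours), not the exact procedure in the text.

```python
import random
import statistics

def spatial_bootstrap_ucl(concs, areas, n_iter=5000, seed=1):
    """Spatial bootstrap 95 UCL: draw samples with probability
    proportional to Thiessen-polygon area, so each resampled mean
    estimates the area-weighted mean rather than the simple mean."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(concs, weights=areas, k=len(concs)))
        for _ in range(n_iter)
    )
    return means[int(0.95 * n_iter) - 1]
```

With enough iterations, the 50th percentile of the simulated means approaches the area-weighted mean of the original dataset, and the 95th percentile serves as the 95 UCL.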
This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95 UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95 UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95 UCL (EPA 1997a; Schulz and Griffin 1999).
The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95 UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95 UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1000 by 1000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.
These data have been used to calculate mean and 95 UCL EPCs using four different approaches: 1) the EPA approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.
As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained from straight bootstrapping. This is consistent with the Central Limit Theorem (EPA 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but it is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.
Estimates of the 95 UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95 UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95 UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the concentrations likely to be found on floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few
showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.
Despite this, the H-statistic approach results in a 95 UCL that substantially exceeds the highest measured concentration, with the result that the 95 UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95 UCL, that maximum concentration values would need to be used for many of those properties. Such an end result is unsupportable and would result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above and demonstrated by EPA contractors and consultants (EPA 1997a; Schulz and Griffin 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data
necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA 1989, 1993) and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.
However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach and decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA 1997a; Schulz and Griffin 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.
EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I. Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant) Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV. Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision, Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H. and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River, Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
1                138219                0.022
2                105217                0.028
3                110214                1.07
4                31021                 1.63
5                11399                 1.67
6                7004                  2.01
7                14268                 2.19
8                12121                 3.25
9                3697                  3.77
10               13297                 7.58
11               33087                 7.61
12               65541                 7.62
13               57494                 7.71
14               9014                  7.99
15               9994                  8.72
16               2123                  9.69
17               2341                  9.72
18               7446                  12.7
19               2613                  14.3
20               44231                 14.3
21               2362                  15.4
22               8745                  16.3
23               5008                  18.7
24               18762                 19.1
25               24296                 20.2
26               51356                 30.8
27               12996                 32.9
28               18019                 38.9
29               119814                40.6
30               132012                42.6
species and sizes of fish may be consumed from a variety of locations on the river over time and may be prepared using different methods. Thus, fish tissue concentrations will not be constant among anglers or throughout the exposure period for a single angler. A 2-D Microexposure analysis will allow EPA to derive more complete characterizations of risk and uncertainty than can be developed using point estimates, and will provide information that will allow better informed risk management decisions to be made (EPA 1999c).
9. Use of 95% UCLs or maximum values as exposure point concentrations
The Work Plan states that the exposure point concentration (EPC) for each property or other exposure area will be based on the 95% UCL of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan also describes the statistical techniques to be used to calculate the 95% UCL, including use of Land's H-statistic for lognormally distributed data (p. 6-32).
GE's comments on this approach are set forth in detail in Attachment A, which was recently provided to EPA. As discussed in that attachment and demonstrated by EPA contractors and consultants (EPA 1997c; Schulz and Griffin 1999), use of the H-statistic to derive a 95% UCL can, and frequently does, produce large overestimates of the true mean, both when there are slight deviations from the assumed lognormal distribution of the data and even when the data are obtained from a lognormal population. Hence, that approach should not be used.
Instead, GE urges EPA to use for the Rest of the River the same PCB averaging technique that EPA approved and specified in the October 1999 Consent Decree for the banks of the Upper ½ Mile Reach and for Areas Outside the River - a spatial averaging approach. That approach will (a) allow calculation of EPCs that are much more truly representative of actual exposures than use of EPA's currently proposed approach; (b) allow evaluation of the Rest of the River in a manner that is consistent with the evaluation of the Upper ½ Mile Reach and Areas Outside the River; and (c) likely result in less stringent (but still conservative) cleanup goals, and thus minimize the extent of unnecessary removal or other remediation that will be required for the Rest of the River. GE urges EPA to conduct additional sampling, if necessary, to ensure that sufficient data will be available to calculate reliable spatial averages for each exposure area.
However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A and demonstrated by Singh et al. (EPA 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.
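A minimal sketch of the spatial bootstrap idea, assuming the Thiessen-polygon data for the hypothetical property in Attachment A (Table 1): each iteration resamples the (area, concentration) pairs with replacement and recomputes the area-weighted mean, and the 95th percentile of those bootstrapped means serves as the UCL. The resample count and seed are arbitrary illustration choices.

```python
import random

# (polygon area in ft^2, total PCB in ppm) pairs from Attachment A, Table 1
POLYGONS = [
    (138219, 0.022), (105217, 0.028), (110214, 1.07), (31021, 1.63),
    (11399, 1.67), (7004, 2.01), (14268, 2.19), (12121, 3.25),
    (3697, 3.77), (13297, 7.58), (33087, 7.61), (65541, 7.62),
    (57494, 7.71), (9014, 7.99), (9994, 8.72), (2123, 9.69),
    (2341, 9.72), (7446, 12.7), (2613, 14.3), (44231, 14.3),
    (2362, 15.4), (8745, 16.3), (5008, 18.7), (18762, 19.1),
    (24296, 20.2), (51356, 30.8), (12996, 32.9), (18019, 38.9),
    (119814, 40.6), (132012, 42.6),
]

def spatial_bootstrap_ucl(polygons, n_boot=5000, quantile=0.95, seed=1):
    """95% UCL taken from bootstrapped area-weighted means of resampled polygons."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        sample = rng.choices(polygons, k=len(polygons))
        total_area = sum(area for area, _ in sample)
        means.append(sum(area * conc for area, conc in sample) / total_area)
    means.sort()
    return means[int(quantile * n_boot)]

print(f"spatial bootstrap 95% UCL = {spatial_bootstrap_ucl(POLYGONS):.1f} ppm")
```

Because each bootstrapped mean is area-weighted, the resulting UCL reflects the spatial coverage of the samples and remains far below the 106 ppm H-UCL reported in Attachment A.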
References
EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1990. National Contingency Plan. US Environmental Protection Agency, Washington, DC.

EPA. 1992a. Final Guidelines for Exposure Assessment; Notice. US Environmental Protection Agency, Washington, DC. 57 Federal Register 22888-22938. May 29.

EPA. 1992b. Guidance on Risk Characterization for Risk Managers and Risk Assessors. Memorandum from F. Henry Habicht II, Deputy Administrator, to Assistant Administrators and Regional Administrators. Office of the Administrator, Washington, DC.
EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General US Population. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

EPA. 1993. Memo from W.H. Farland and H.L. Longest to Regional Directors, Re: Use of IRIS values in Superfund risk assessments. US Environmental Protection Agency, Washington, DC. OSWER Directive 9285.7-16. December 21.

EPA. 1995a. Policy for Risk Characterization at the US Environmental Protection Agency. US Environmental Protection Agency, Office of the Administrator, Washington, DC. March.

EPA. 1995b. Guidance for Risk Characterization. US Environmental Protection Agency, Science Policy Council, Washington, DC. February.

EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the US Environmental Protection Agency. US Environmental Protection Agency, Office of Research and Development, Washington, DC. May 15.

EPA. 1997b. Guiding Principles for Monte Carlo Analysis. US Environmental Protection Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 p.

EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1998. Memo from Timothy Fields, Acting Administrator, to Regional Directors, Re: Approach for addressing dioxin in soil at CERCLA and RCRA sites. US Environmental Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.
EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, US EPA, to Jane Magee, General Electric Company, Re: Pittsfield 1-0147, EPA Area 6 Housatonic River Human Health Risk Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader, Re: Protectiveness of Cleanup Levels for Removal Actions Outside the River - Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, US Dist. Court for Dist. of Mass.). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I - Human Health Evaluation Manual, Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2. Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3 (Part A: Process for Conducting Probabilistic Risk Assessment). Draft Revision 5. United States Environmental Protection Agency, Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the US Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition, December 4-7, Baltimore, MD.
Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA4203. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.

Johnson, T. and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General US Population. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds, Proceedings Dioxin '96 - 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen 30:61-65.
Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings Superfund XVII Conference, Washington, DC. October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meetings Proceedings. Abstract.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition. Abstract.

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan - Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis 19:577-584. November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD. December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4):823-843.
Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 11.
ATTACHMENT A
A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS
Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co Inc
Introduction
The US Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets where no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and their representation of the actual levels of exposures that are likely to occur under various scenarios.
There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and
the maximum value are poor estimates of actual levels of exposure that will occur within a given exposure area.
GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will result in more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic - namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.
Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL
In 1992, EPA issued a guidance document (EPA 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.
The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.
For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA 1992):

    UCL = exp(x̄ + 0.5s² + sH/√(n-1))

Where:

    UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean
    x̄ = the mean of the log-transformed data
    s² = the variance of the log-transformed sample distribution
    s = the standard deviation of the log-transformed sample distribution
    H = a statistic accounting for interaction of the distribution, developed by Land (1975)
    n = number of analytical samples
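As a concrete illustration, the sketch below applies this equation to the 30 sample concentrations from Table 1. The H value used here (3.5) is only an assumed stand-in roughly interpolated for this sample size and log-standard deviation, not an exact lookup from Land's tables; even so, it reproduces the qualitative problem: an H-UCL far above every measured concentration.

```python
import math

# Total PCB concentrations (ppm) for the 30 samples in Table 1
CONCS = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
         7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
         15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6]

def h_ucl(data, h):
    """Land's H-UCL: exp(xbar + 0.5*s^2 + s*h/sqrt(n-1)) on log-transformed data."""
    n = len(data)
    logs = [math.log(x) for x in data]
    xbar = sum(logs) / n
    s2 = sum((y - xbar) ** 2 for y in logs) / (n - 1)
    return math.exp(xbar + 0.5 * s2 + math.sqrt(s2) * h / math.sqrt(n - 1))

if __name__ == "__main__":
    H_095 = 3.5  # assumed, approximate H value for illustration only
    print(f"H-UCL = {h_ucl(CONCS, H_095):.0f} ppm vs. maximum observation {max(CONCS)} ppm")
```

With this assumed H, the computed UCL comes out on the order of 100 ppm, more than twice the highest measured concentration (42.6 ppm), because the large log-scale variance is exponentiated back to concentration units.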
Frequently, the H-statistic calculates extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.
Use of the H-statistic also does not calculate a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.
Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may
overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.
Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when an exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.
Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA 1999).
In view of these considerations, EPA contractors and consultants have recommended that the H-statistic should not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA 1997a) recommended that in environmental applications, the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided. These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less...[U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).
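Of the alternatives named by Singh et al., the Chebychev inequality method is the simplest to illustrate. The sketch below, an illustration assuming the 30 Table 1 concentrations, computes the one-sided Chebychev 95% UCL of the mean, x̄ + √(1/α − 1)·s/√n with α = 0.05; unlike the H-UCL, it remains well below the maximum observation.

```python
import math

# Total PCB concentrations (ppm) for the 30 samples in Table 1
CONCS = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
         7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
         15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6]

def chebychev_ucl(data, alpha=0.05):
    """One-sided Chebychev (1 - alpha) UCL of the mean: xbar + sqrt(1/alpha - 1)*s/sqrt(n)."""
    n = len(data)
    xbar = sum(data) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))
    return xbar + math.sqrt(1.0 / alpha - 1.0) * s / math.sqrt(n)

print(f"Chebychev 95% UCL = {chebychev_ucl(CONCS):.1f} ppm")
```

The Chebychev bound is conservative (it assumes no particular distribution), yet for this dataset it still sits far below both the H-UCL and the 42.6 ppm maximum.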
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the lower Housatonic River. Consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.
This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al. 1995); in more complex approaches, unbiased estimates of concentration and variance change with distance, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava 1989).

Because kriging is data intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a
hardwired operation in most commercially-available Geographic Information System (GIS) applications.
The area inside each polygon is then used to weight the samples in statistical calculations, which is in essence saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava 1989):

    X-bar(AW) = sum of (p_i x C_i), summed over i = 1 through n

Where:

    X-bar(AW) = area-weighted mean concentration (mg/kg)
    C_i = the concentration representing the condition within polygon i, where there are i = 1 through n polygons, and
    p_i = the proportion of the total exposure area that is incorporated in polygon i (unitless)
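The formula can be illustrated with a minimal numerical sketch (the polygon areas and concentrations below are hypothetical, not Site data, and the polygons are assumed to tile the entire exposure area):

```python
# Area-weighted mean: X-bar = sum(p_i * C_i), where p_i is the fraction of the
# exposure area falling in polygon i and C_i is its sample concentration.
areas_ft2 = [40000.0, 35000.0, 25000.0]   # hypothetical Thiessen polygon areas, ft^2
concs_mgkg = [1.2, 8.4, 20.5]             # hypothetical PCB concentrations, mg/kg

total_area = sum(areas_ft2)
p = [a / total_area for a in areas_ft2]   # p_i: unitless area fractions (sum to 1)
x_bar = sum(pi * ci for pi, ci in zip(p, concs_mgkg))  # area-weighted mean, mg/kg
```

Because the weights sum to one, a sample sitting in a large polygon (here the 1.2 mg/kg sample, with 40% of the area) pulls the mean down, while densely clustered samples each carry proportionately less weight.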
This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper 2 Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid cell using the sampling data available within it. Then the average concentrations for the grid cells were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA 1997b).
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the

    "[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods, such as kriging or triangulation, that take into account the spatial distribution of contamination to characterize exposure." (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper 2 Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously and demonstrated by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.
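For reference, the H-statistic method at issue computes the lognormal 95% UCL from the log-transformed data (Land 1975). Writing y-bar and s_y for the mean and standard deviation of the log-transformed concentrations and n for the sample size, the H-UCL is

```latex
\mathrm{UCL}_{95} = \exp\!\left( \bar{y} + \frac{s_y^{2}}{2} + \frac{s_y \, H_{0.95}}{\sqrt{n-1}} \right)
```

where H_{0.95} is Land's tabulated H-statistic. Because s_y enters through the exponent, a small or highly variable dataset can drive the H-UCL far above every measured value, which is the behavior criticized in this section.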
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process whereby a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
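The resampling procedure described above can be sketched as follows (an illustrative implementation; the function name, seed, and iteration count are our assumptions, not a prescribed EPA procedure):

```python
import random
import statistics

def bootstrap_95_ucl(data, n_iter=5000, seed=1):
    """Percentile-bootstrap 95% UCL of the mean: repeatedly resample the
    dataset with replacement, archive each resample's mean, and take the
    95th percentile of the resulting distribution of means."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(statistics.fmean(rng.choices(data, k=n)) for _ in range(n_iter))
    return means[int(0.95 * n_iter) - 1]   # 95th percentile of bootstrapped means

# Right-skewed hypothetical dataset (concentrations, ppm)
data = [0.1, 0.4, 0.9, 1.6, 2.2, 3.5, 5.1, 8.0, 14.0, 41.0]
ucl = bootstrap_95_ucl(data)
```

Note that every bootstrapped mean is bounded by the sample maximum, so, unlike the H-UCL, this estimate can never exceed the highest measured concentration.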
This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA 1997a; Schulz and Griffin 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.
When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance of the bootstrapped means becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.
First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.
Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7 and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 x 5) entries.
Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
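The three steps can be sketched as follows (an illustrative implementation with hypothetical inputs; the parameter names and defaults are our assumptions):

```python
import random
import statistics

def spatial_bootstrap_95_ucl(areas, concs, expand_factor=5, n_draws=300,
                             n_iter=5000, seed=1):
    # Steps 1-2: normalize each polygon area to the smallest polygon, round to
    # the nearest integer, multiply by the expansion factor, and replicate each
    # sample result that many times to build the area-weighted expanded dataset.
    smallest = min(areas)
    expanded = []
    for area, conc in zip(areas, concs):
        expanded.extend([conc] * (round(area / smallest) * expand_factor))
    # Step 3: repeatedly draw (with replacement) from the expanded dataset,
    # archive each mean, and read the 95th percentile off the distribution.
    rng = random.Random(seed)
    means = sorted(statistics.fmean(rng.choices(expanded, k=n_draws))
                   for _ in range(n_iter))
    return means[int(0.95 * n_iter) - 1]

# Hypothetical 3-sample exposure area: the 700-ft^2 polygon gets 7 x 5 = 35
# replicates (as in the text's example), the 200-ft^2 polygon 10, the smallest 5.
ucl = spatial_bootstrap_95_ucl([100.0, 200.0, 700.0], [1.0, 2.0, 10.0])
```

The expanded dataset's simple mean equals the area-weighted mean, so the 50th percentile of the bootstrapped means converges to it, as the text describes.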
This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA 1997a; Schulz and Griffin 1999).
The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) the EPA approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.
As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30; Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.
Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
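As a check on the central estimates above, the Table 1 data (decimal points restored from the stated range of 0.022-42.6 ppm and mean of 13.3 ppm) reproduce the arithmetic and area-weighted means; this sketch assumes, as the discussion implies, that the polygon areas and concentrations pair up in row order:

```python
import statistics

# Table 1: polygon areas (ft^2) and total PCB (ppm) for the 30 hypothetical samples
areas = [138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121, 3697, 13297,
         33087, 65541, 57494, 9014, 9994, 2123, 2341, 7446, 2613, 44231,
         2362, 8745, 5008, 18762, 24296, 51356, 12996, 18019, 119814, 132012]
concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
         7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
         15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6]

arith_mean = statistics.fmean(concs)                 # EPA approach central estimate
aw_mean = (sum(a * c for a, c in zip(areas, concs))  # area-weighted mean:
           / sum(areas))                             #   sum(p_i * C_i)
```

With this pairing, the arithmetic mean comes out to 13.3 ppm, matching the text, and the area-weighted mean is somewhat higher, consistent with the heavy weights carried by samples 29 and 30.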
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration by a factor of more than two, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above, and demonstrated by EPA contractors and consultants (EPA 1997a; Schulz and Griffin 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can and frequently does produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area, and hence that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA 1989, 1993) and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper 2 Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.
However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA 1997a; Schulz and Griffin 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.
EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant), Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, Mile Removal Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H. and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River. Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
 1                  138,219               0.022
 2                  105,217               0.028
 3                  110,214               1.07
 4                   31,021               1.63
 5                   11,399               1.67
 6                    7,004               2.01
 7                   14,268               2.19
 8                   12,121               3.25
 9                    3,697               3.77
10                   13,297               7.58
11                   33,087               7.61
12                   65,541               7.62
13                   57,494               7.71
14                    9,014               7.99
15                    9,994               8.72
16                    2,123               9.69
17                    2,341               9.72
18                    7,446              12.7
19                    2,613              14.3
20                   44,231              14.3
21                    2,362              15.4
22                    8,745              16.3
23                    5,008              18.7
24                   18,762              19.1
25                   24,296              20.2
26                   51,356              30.8
27                   12,996              32.9
28                   18,019              38.9
29                  119,814              40.6
30                  132,012              42.6
Table 2. Mean and 95% UCL EPC Estimates for the Hypothetical Property, by Calculation Approach

[The body of Table 2 is illegible in the source. As summarized in the text, the arithmetic mean and straight-bootstrap mean are both 13.3 ppm; the area-weighted average and spatial-bootstrap mean are identical to one another and somewhat higher; and the 95% UCLs are 17.06 ppm (bootstrap), 16.38 ppm (spatial bootstrap), and 106 ppm (H-statistic), the last exceeding the maximum detected concentration of 42.6 ppm.]
However, to the extent that EPA rejects that approach and insists on estimating a 95% UCL, it should still not use the H-statistic approach proposed in the Work Plan. As discussed in Attachment A, and demonstrated by Singh et al. (EPA 1997c) and Schulz and Griffin (1999), there are other methods of calculating a 95% UCL of the arithmetic mean, including bootstrapping, that are more reliable and representative. To the extent that EPA is unwilling to adopt the spatial averaging approach for the Rest of the River risk assessment, GE recommends that EPA employ a spatial bootstrapping approach and use the 95% UCL of the spatial bootstrap of the data to estimate the EPC for each exposure area. This approach, which is discussed in detail in Attachment A, is consistent with EPA guidance, provides more reliable estimates of the 95% UCLs, considers the location and density of each sample in each exposure area, mitigates the statistical impact of a small database, and can be used regardless of the distribution of the sampling data.
References
EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.
EPA 1990 National Contingency Plan US Environmental Protection Agency
Washington DC
EPA 1992a Final Guidelines for Exposure Assessment Notice US Environmental Protection
Agency Washington DC 57 Federal Register 14 22888-22938 May 29
EPA 1992b Guidance on Risk Characterization for Risk Managers and Risk Assessors
Memorandum from F Henry Habicht III Deputy Administrator to Assistant Administrators
and Regional Administrators Office of the Administrator Washington DC
EPA 1992c A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its
Application to the General US Population US Environmental Protection Agency Office of
Air Quality Planning and Standards Research Triangle Park NC EPA-4503-92-011
August
EPA 1993 Memo from WH Farland and HL Longest to Regional Directors Re Use of
IRIS values in Superfund risk assessments US Environmental Protection Agency
Washington DC OSWER Directive 92857-16 December 21
EPA 1995a Policy for Risk Characterization at the US Environmental Protection Agency
US Environmental Protection Agency Office of the Administrator Washington DC
March
EPA 1995b Guidance for Risk Characterization US Environmental Protection Agency
Science Policy Council Washington DC February
EPA 1997a Policy for Use of Probabilistic Analysis in Risk Assessment at the US
Environmental Protection Agency US Environmental Protection Agency Office of Research
and Development Washington DC May 15
EPA 1997b Guiding Principles for Monte Carlo Analysis US Environmental Protection
Agency Risk Assessment Forum Washington DC EPA630R-97001 March 35 p
EPA 1997c The Lognormal Distribution in Environmental Applications Prepared for EPA
by A Singh A Singh and M Engelhard US Environmental Protection Agency Office of
Research and Development EPA600R-97006 December
EPA 1998 Memo from Timothy Fields Acting Administrator to Regional Directors Re
Approach for addressing dioxin in soil at CERCLA and RCRA sites US Environmental
Protection Agency Washington DC OSWER Directive 92004-26 April 13
EPA and MDEP 1998 Letter from Anna Symington Acting Section Chief Massachusetts
DEP and Bryan Olson Project Manager US EPA to Jane Magee General Electric
Company Re Pittsfield 1-0147 EPA Area 6 Housatonic River Human Health Risk
Assessment January 29
EPA 1999a Memorandum from Ann-Marie Burke Toxicologist to Richard Cavagnero GE
Project Leader Re Protectiveness of Cleanup Levels for Removal Actions Outside the River
- Protection of Human Health Attachment A to Appendix D to Consent Decree lodged in
United States et al v General Electric Company (No 99-30225-MAP US Dist Court for
Dist of Mass) August 4
EPA 1999b Risk Assessment Guidance for Superfund Volume I - Human Health Evaluation
Manual Supplemental Guidance Dermal Risk Assessment Interim Guidance US
Environmental Protection Agency Office of Emergency and Remedial Response Washington
DC March 2 Draft
EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3 (Part A: Process for Conducting Probabilistic Risk Assessment). Draft, Revision 5. United States Environmental Protection Agency, Solid Waste and Emergency Response. December.
EPA 1999d Responsiveness Summary for Allendale School Removal Action A Mile
Removal Action and Consolidation Prepared by EPA Region I October
GE 1997 Proposal for Alternative Input Values for the Human Health Risk Assessment
Memorandum from General Electric to the Massachusetts Department of Environmental
Protection and the US Environmental Protection Agency September 11
Goodrum PE JM Hassett DL Johnson and ME Dakins 1994 Applications of
microexposure Monte Carlo modeling to human health risk assessments A case study of
modeling childhood lead exposure Society for Risk Analysis Annual conference and
Exposition December 4-7 Baltimore MD
Harrington NW Curry CL and PS Price 1995 The MicroExposure Event Modeling
Approach to Probabilistic Exposure Assessment Paper No 95-TA4203 Proceedings of the
88th Annual Meeting of the Air and Waste Management Association San Antonio Texas
USA June
Johnson T and J Capel 1992 A Monte Carlo Approach to Simulating Residential
Occupancy Periods and Its Application to the General US Population US Environmental
Protection Agency Office of Air Quality Planning and Standards Research Triangle Park
NC EPA-4503-92-011 August
Keenan RE MH Henning PE Goodrum MN Gray RA Sherer and PS Price
1993a Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to
evaluate the need for fish advisories Dioxin 93 13th International Symposium on
Chlorinated Dioxins and Related Compounds Vienna Austria
Keenan RE PS Price MH Henning PE Goodrum MN Gray RA Sherer and
WL Porter 1993b A Monte Carlo risk assessment for dioxin in Maine fish Using a
microexposure approach to evaluate the need for fish advisories TAPPI Proceedings 1993
Environmental Conference Boston MA
Keenan RE PS Price CL Curry JI McCrodden and JG Haggard 1995 Using a
microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion
of fish from the upper Hudson River In Society for Risk Analysis and the Japan Section of
SRA Annual Meeting and Exposition Waikiki HI (Abstract)
Keenan RE PS Price J McCrodden and ES Ebert 1996a Using a microexposure
event analysis to model potential exposures to PCBs through ingestion of fish from the Upper
Hudson River In Organohalogen Compounds Proceedings Dioxin 96-16th International
Symposium on Chlorinated Dioxins and Related Compounds Amsterdam The Netherlands
Organohalogen 3061-65
Keenan RE NW Harrington PS Price and RO Richter 1996b Applying a
microexposure event analysis for a superfund site risk assessment Proceedings Superfund XVII
Conference Proceedings Washington DC October 15-17
Keenan RE JD Avantaggio and PS Price 1997a Should Maines rivers have fish
advisories for dioxin Using an integrated Microexposure Event and Toxicokinetic Model to
evaluate this question In SETAC North Atlantic Chapter Annual Meetings Proceedings
Abstract 1
Keenan RE JD Avantaggio and PS Price 1997b Using a combined Microexposure
Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden
dosimetric In Society for Risk Analysis Proceedings Annual Meeting and Exposition
Abstract
MDEP 1995 Guidance for Disposal Site Risk Characterization - In Support of the
Massachusetts Contingency Plan -Interim Final Policy Massachusetts Department of
Environmental Protection Bureau of Waste Site Cleanup and Office of Research and
Standards Boston MA WSCORS-95-141 July
Pyrite Canyon Group 1994 Workplanfor the Health Risk Assessment of the Stringfellow
CERCLA Site in Riverside County California January
Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584. November.
Sielken RL 1994 More realistic exposure durations for more realistic people Society for
Risk Analysis Annual Conference and Exposition Baltimore MD December 4-7
Simon TW 1999 Two-dimensional Monte Carlo simulation and beyond A comparison of
several probabilistic risk assessment methods applied to a Superfund site Hum Ecol Risk
Assess 5 (4) 823-843
Weston 2000a Supplemental Investigation Work Plan for the Lower Housatonic River
Prepared for US Army Corps of Engineers by Roy F Weston Inc February 22
Weston 2000b Engineering EvaluationCost Analysis for the Upper Reach of the Housatonic
River Prepared for US Army Corps of Engineers by Roy F Weston Inc February 11
ATTACHMENT A
A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS
Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co Inc
Introduction
US Environmental Protection Agencys final Human Health Risk Assessment Work Plan
(Work Plan) which is Chapter 6 of EPAs final Supplemental Investigation Work Plan for the
Lower Housatonic River (Weston 2000) states that the exposure point concentration (EPC) for
each exposure area will be based on the 95 percent upper confidence limit (95 UCL) of the site
data or the maximum detected concentration whichever is lower (Section 6344) The Work
Plan states that the 95 UCL will be calculated using the Student- statistic for normally
distributed data and Lands H-statistic (H-UCL) for lognormally distributed data (p 6-32) No
plan is presented for estimating the mean from datasets where no parametric distribution can be
defined The way in which the EPCs are calculated can have a substantial impact on the values
derived and their representation of the actual levels of exposures that are likely to occur under
various scenarios
There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.
GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating EPCs that will yield more representative results. In GE's view, the most appropriate approach is the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½-Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic: namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.
Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL
In 1992, EPA issued a guidance document (EPA 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area; thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of the observed sample concentrations.
The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA 1992):
UCL = exp(x̄ + 0.5s² + sH/√(n-1))
Where:

UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic accounting for interaction of the distribution, developed by Land (1975)
n = the number of analytical samples
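For reference, the H-UCL calculation above can be sketched in a few lines of code (an illustrative sketch only, not part of the Work Plan; the function name is ours, and the H value must be looked up in Land's (1975) tables for the given sample size and log-standard deviation):

```python
import math

def h_ucl(data, H):
    """Land's H-UCL per the EPA (1992) equation:
    exp(ybar + 0.5*s^2 + s*H/sqrt(n-1)), computed on log-transformed data.
    H is taken from Land's tables; any value used here is illustrative."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    ybar = sum(logs) / n                                # mean of logs
    s2 = sum((y - ybar) ** 2 for y in logs) / (n - 1)   # variance of logs
    s = math.sqrt(s2)
    return math.exp(ybar + 0.5 * s2 + s * H / math.sqrt(n - 1))
```

Note that the UCL grows exponentially with the log-scale variance s², which is the mechanism behind the inflated values discussed below.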
Frequently, the H-statistic calculates extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.
Use of the H-statistic also does not yield a reliable and representative 95% UCL when the data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.
Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order[s] of magnitude higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.
Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when the exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.
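This check can be illustrated with a short sketch (the function names and example values are ours, for illustration only): when the data are not truly lognormal, the exponential estimator of the mean, exp(ȳ + s²/2), can diverge sharply from the simple sample mean.

```python
import math

def lognormal_mean_estimate(data):
    # exp(ybar + s^2/2): the mean implied by an assumed lognormal model
    logs = [math.log(x) for x in data]
    n = len(logs)
    ybar = sum(logs) / n
    s2 = sum((y - ybar) ** 2 for y in logs) / (n - 1)
    return math.exp(ybar + 0.5 * s2)

def arithmetic_mean(data):
    return sum(data) / len(data)

# A mixed, non-lognormal dataset (a few very low values inflate the
# log-scale variance): the two estimators disagree by orders of
# magnitude, a warning sign against applying the H-statistic.
data = [0.01, 0.02, 10.0, 12.0, 15.0, 20.0]
```

Here the arithmetic mean is about 9.5, while the lognormal-model estimate exceeds 800, signaling that the lognormal assumption has broken down.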
Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and it does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½-Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA 1999).
In view of these considerations, EPA contractors and consultants have recommended that the H-statistic not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA 1997a) recommended that "in environmental applications, the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. ... [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased toward more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the Lower Housatonic River; consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.
This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al. 1995); in more complex approaches, unbiased estimates of concentration and variance change with distance, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava 1989).

Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.
The area inside each polygon is then used to weight the samples in the statistical calculations, which is, in essence, saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava 1989):

x̄ = Σᵢ₌₁ⁿ pᵢcᵢ

Where:

x̄ = the area-weighted mean concentration (mg/kg)
cᵢ = the concentration representing the condition within polygon i, where there are i = 1 through n polygons, and
pᵢ = the proportion of the total area that is incorporated in polygon i (unitless)
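The calculation itself is straightforward once the polygon areas are in hand; the following is a minimal sketch (illustrative only; it assumes the Thiessen polygon areas have already been computed in a GIS):

```python
def area_weighted_mean(concentrations, areas):
    """Area-weighted mean: sum of p_i * c_i, where p_i is each
    polygon's share of the total exposure area."""
    total = sum(areas)
    return sum(c * (a / total) for c, a in zip(concentrations, areas))

# Hypothetical example: three polygons. The large, low-concentration
# polygon dominates, pulling the mean well below the simple average
# (4.17 mg/kg) of the three sample results.
conc = [10.0, 2.0, 0.5]         # sample concentrations (mg/kg)
area = [500.0, 2000.0, 7500.0]  # polygon areas (square feet)
```

With these values the area-weighted mean is 1.275 mg/kg, illustrating how area-weighting corrects for sampling that is clustered in contaminated spots.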
This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½-Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then the average concentrations for each grid were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA 1997b).
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the

[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods, such as kriging or triangulation, that take into account the spatial distribution of contamination to characterize exposure. (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½-Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and it is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would be necessary only in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary for individual exposure areas, to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL that would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously and demonstrated by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.

This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA 1997a; Schulz and Griffin 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter; as such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.
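The procedure described above can be sketched compactly (an illustrative sketch; the function name, iteration count, and example data are ours, not from the Work Plan):

```python
import random
import statistics

def bootstrap_ucl(data, iterations=5000, percentile=95, seed=1):
    """Bootstrap UCL of the mean: repeatedly resample the dataset with
    replacement, record each resample's mean, and take the chosen
    percentile of the resulting distribution of means."""
    rng = random.Random(seed)  # fixed seed for a reproducible sketch
    n = len(data)
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(iterations)
    )
    return means[int(iterations * percentile / 100) - 1]
```

Because each bootstrapped mean is an average of observed values, the resulting UCL can never exceed the maximum observed concentration, unlike the H-UCL.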
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of the samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded, normalized areas by a selected value; the resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
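The three steps above can be sketched as follows (a simplified illustration under our own assumptions: the expansion factor is omitted, since replicating each sample in proportion to its rounded area ratio is sufficient for the resampling weights; function name and example data are hypothetical):

```python
import random
import statistics

def spatial_bootstrap_ucl(concs, areas, iterations=5000, subsample=250, seed=1):
    """Spatial bootstrap sketch: replicate each sample in proportion to
    its Thiessen-polygon area, then bootstrap subsample means from the
    expanded dataset and take the 95th percentile as the 95% UCL."""
    # Steps 1-2: normalize polygon areas to the smallest and round,
    # so each sample appears in the expanded dataset area-weighted.
    smallest = min(areas)
    replicates = [round(a / smallest) for a in areas]
    expanded = [c for c, r in zip(concs, replicates) for _ in range(r)]
    # Step 3: repeatedly draw subsample means with replacement.
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(expanded, k=subsample))
        for _ in range(iterations)
    )
    return means[int(iterations * 0.95) - 1]
```

With the three-polygon example used earlier (concentrations 10, 2, and 0.5 mg/kg with areas 500, 2,000, and 7,500 square feet), the distribution of means centers on the area-weighted mean of 1.275 mg/kg, and the 95th percentile sits modestly above it.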
This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA 1997a; Schulz and Griffin 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data; thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different approaches: (1) the EPA approach (arithmetic mean and H-statistic); (2) an area-weighted average; (3) a bootstrap; and (4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping; this is consistent with the Central Limit Theorem (EPA 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas; thus, they are weighted more heavily in the approaches that include area-weighting.
Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL that substantially exceeds the highest measured concentration, with the result that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above and demonstrated by EPA contractors and consultants (EPA 1997a; Schulz and Griffin 1999), use of the H-statistic approach to calculate the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area when the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA 1989, 1993) and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½-Mile Reach of the River and for the Removal Actions Outside the River, including certain floodplain properties along the River.
However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA 1997a; Schulz and Griffin 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and
M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of
exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-
906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation
Manual (Part A), Interim Final. US Environmental Protection Agency, Office of Emergency
and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER
Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the
Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund
Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-
93-007. February.
EPA. 1994. A change in the approach for estimating the average and reasonable maximum
exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2.
August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant),
Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA
Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by
A. Singh, A. Singh, and M. Engelhardt. US Environmental Protection Agency, Office of
Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision, Fields Brook Superfund Site. US Environmental Protection
Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal
Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New
York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and
variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations
when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River, Final.
Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft²)    Total PCB (ppm)
      1               138,219               0.022
      2               105,217               0.028
      3               110,214               1.07
      4                31,021               1.63
      5                11,399               1.67
      6                 7,004               2.01
      7                14,268               2.19
      8                12,121               3.25
      9                 3,697               3.77
     10                13,297               7.58
     11                33,087               7.61
     12                65,541               7.62
     13                57,494               7.71
     14                 9,014               7.99
     15                 9,994               8.72
     16                 2,123               9.69
     17                 2,341               9.72
     18                 7,446              12.7
     19                 2,613              14.3
     20                44,231              14.3
     21                 2,362              15.4
     22                 8,745              16.3
     23                 5,008              18.7
     24                18,762              19.1
     25                24,296              20.2
     26                51,356              30.8
     27                12,996              32.9
     28                18,019              38.9
     29               119,814              40.6
     30               132,012              42.6
[Table 2 could not be recovered from the source scan.]
EPA. 1992c. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its
Application to the General US Population. US Environmental Protection Agency, Office of
Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011.
August.

EPA. 1993. Memo from W.H. Farland and H.L. Longest to Regional Directors. Re: Use of
IRIS values in Superfund risk assessments. US Environmental Protection Agency,
Washington, DC. OSWER Directive 9285.7-16. December 21.

EPA. 1995a. Policy for Risk Characterization at the US Environmental Protection Agency.
US Environmental Protection Agency, Office of the Administrator, Washington, DC.
March.

EPA. 1995b. Guidance for Risk Characterization. US Environmental Protection Agency,
Science Policy Council, Washington, DC. February.

EPA. 1997a. Policy for Use of Probabilistic Analysis in Risk Assessment at the US
Environmental Protection Agency. US Environmental Protection Agency, Office of Research
and Development, Washington, DC. May 15.

EPA. 1997b. Guiding Principles for Monte Carlo Analysis. US Environmental Protection
Agency, Risk Assessment Forum, Washington, DC. EPA/630/R-97/001. March. 35 p.

EPA. 1997c. The Lognormal Distribution in Environmental Applications. Prepared for EPA
by A. Singh, A. Singh, and M. Engelhardt. US Environmental Protection Agency, Office of
Research and Development. EPA/600/R-97/006. December.

EPA. 1998. Memo from Timothy Fields, Acting Administrator, to Regional Directors. Re:
Approach for addressing dioxin in soil at CERCLA and RCRA sites. US Environmental
Protection Agency, Washington, DC. OSWER Directive 9200.4-26. April 13.
EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts
DEP, and Bryan Olson, Project Manager, US EPA, to Jane Magee, General Electric
Company. Re: Pittsfield 1-0147, EPA Area 6, Housatonic River Human Health Risk
Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE
Project Leader. Re: Protectiveness of Cleanup Levels for Removal Actions Outside the River
- Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in
United States et al. v. General Electric Company (No. 99-30225-MAP, US District Court for
District of Massachusetts). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation
Manual, Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. US
Environmental Protection Agency, Office of Emergency and Remedial Response, Washington,
DC. March 2. Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3 (Part A: Process for
Conducting Probabilistic Risk Assessment). Draft, Revision 5. United States Environmental
Protection Agency, Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½ Mile
Removal Action, and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment.
Memorandum from General Electric to the Massachusetts Department of Environmental
Protection and the US Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of
microexposure Monte Carlo modeling to human health risk assessments: A case study of
modeling childhood lead exposure. Society for Risk Analysis Annual Conference and
Exposition, December 4-7, Baltimore, MD.
Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling
Approach to Probabilistic Exposure Assessment. Paper No. 95-TA4203. Proceedings of the
88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas,
USA. June.

Johnson, T., and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential
Occupancy Periods and Its Application to the General US Population. US Environmental
Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park,
NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price.
1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to
evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on
Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and
W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a
microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993
Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a
microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion
of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of
SRA, Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure
event analysis to model potential exposures to PCBs through ingestion of fish from the Upper
Hudson River. In: Organohalogen Compounds, Proceedings Dioxin '96: 16th International
Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands.
Organohalogen 30:61-65.
Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a
microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII
Conference, Washington, DC, October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish
advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to
evaluate this question. In: SETAC North Atlantic Chapter, Annual Meeting Proceedings.
Abstract 1.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure
Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden
dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition.
Abstract.

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the
Massachusetts Contingency Plan, Interim Final Policy. Massachusetts Department of
Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and
Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow
CERCLA Site in Riverside County, California. January.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations
when data are not normal or lognormal. Risk Analysis 19:577-584.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for
Risk Analysis Annual Conference and Exposition, Baltimore, MD, December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of
several probabilistic risk assessment methods applied to a Superfund site. Human and
Ecological Risk Assessment 5(4):823-843.
Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River.
Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic
River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 11.
ATTACHMENT A
A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS
Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co Inc
Introduction
US Environmental Protection Agencys final Human Health Risk Assessment Work Plan
(Work Plan) which is Chapter 6 of EPAs final Supplemental Investigation Work Plan for the
Lower Housatonic River (Weston 2000) states that the exposure point concentration (EPC) for
each exposure area will be based on the 95 percent upper confidence limit (95 UCL) of the site
data or the maximum detected concentration whichever is lower (Section 6344) The Work
Plan states that the 95 UCL will be calculated using the Student- statistic for normally
distributed data and Lands H-statistic (H-UCL) for lognormally distributed data (p 6-32) No
plan is presented for estimating the mean from datasets where no parametric distribution can be
defined The way in which the EPCs are calculated can have a substantial impact on the values
derived and their representation of the actual levels of exposures that are likely to occur under
various scenarios
There are serious limitations associated with using EPA's proposed approach for calculating
EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally
distributed, environmental samples seldom resemble a strict normal distribution. Consequently,
if EPA were to assume instead that the data for each exposure area are lognormally distributed,
then use of the H-statistic to calculate the EPCs for these areas may substantially impact the
results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results
in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's
policy stating that when the 95% UCL is greater than the maximum observed concentration, the
maximum value should instead be used as the EPC. However, both the inflated 95% UCL and
the maximum value are poor estimates of the actual levels of exposure that will occur within a
given exposure area.
GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk
assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods,
approved by EPA scientists and other experts, for calculating EPCs that more accurately
represent the actual exposure levels that may be experienced by individuals who come into
contact with contaminated soil or sediment. This paper provides an analysis of the limitations
associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it
presents an evaluation of alternative approaches for calculating the EPCs that will produce more
representative results. In GE's view, the most appropriate approach is the use of the area-
weighting or spatial averaging technique that has been approved and specified by EPA for the
banks of the Upper ½-Mile Reach of the Housatonic River and for the Areas Outside the River
under the October 1999 Consent Decree. However, if and where that approach is not used, this
paper describes other alternative techniques for deriving a 95% UCL that is much more
representative of actual exposure point concentrations than use of the H-statistic: namely, the
bootstrapping technique and a modification of that approach known as spatial bootstrapping.
Of these, the latter is preferable because it takes into account the spatial distribution of the
sampling data. This paper illustrates the EPCs that would be calculated for an example property
using the various methods discussed herein.
Problems Associated with the Use of the H-Statistic to Calculate the 95 UCL
In 1992, EPA issued a guidance document (EPA 1992) indicating that the appropriate
concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic
mean of the observed concentrations within an exposure area. This position is based on the
concept that chronic direct-contact exposure to soils most likely occurs as a result of short,
repeated, random contact events throughout the identified exposure area. Thus, the mean
concentration is reflective of the long-term exposure concentration. Because it is this nature of
the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of
the EPC regardless of the statistical distribution of observed sample concentrations.
The guidance document goes on to note that, because there is uncertainty in the overall soil
concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the
arithmetic mean should be used to account for this uncertainty. This approach was adopted by
EPA Region I (EPA 1994). In fact, EPA Region I has taken the position that the 95% UCL is
appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the
EPCs. In this method, the 95% UCL is calculated using the following equation (EPA 1992):
    UCL = exp(x̄ + 0.5s² + sH/√(n−1))

Where:

UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic accounting for interaction of the distribution, developed by Land (1975)
n = number of analytical samples
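As a concrete illustration of this equation, the following sketch computes an H-UCL in Python. It is not part of the Work Plan; the dataset and the H value are hypothetical, since in practice H must be looked up in Land's (1975) tables for the actual sample size and log-space standard deviation.

```python
import math

def h_ucl(data, H):
    """95% UCL on the arithmetic mean under an assumed lognormal model,
    per the equation above. H is Land's (1975) tabulated statistic for
    the given n and log-space standard deviation (caller-supplied)."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    xbar = sum(logs) / n
    s2 = sum((v - xbar) ** 2 for v in logs) / (n - 1)  # sample variance of the logs
    return math.exp(xbar + 0.5 * s2 + math.sqrt(s2) * H / math.sqrt(n - 1))

# Hypothetical, widely spread dataset (ppm) and an illustrative H value:
print(h_ucl([1.0, 10.0, 100.0], H=3.0))  # far exceeds the maximum observation of 100
```

Even in this three-sample illustration, the computed H-UCL greatly exceeds the largest observed concentration, which is the inflation behavior at issue in this paper.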
Frequently, the H-statistic calculates extraordinarily high 95% UCL values. In some cases, this
inflated estimate of the mean reflects the uncertainty associated with having a limited number of
samples from a skewed data population. This uncertainty is compounded when, as is the case
with much of the Housatonic River investigation, sampling efforts are concentrated in areas that
are suspected of having elevated levels of PCBs. Because the sampling of areas that are not
likely to be contaminated is much more limited, despite the fact that those areas are equally
important components of the total exposure area, the datasets for most parcels are likely to be
highly biased.
Use of the H-statistic also does not produce a reliable and representative 95% UCL when data
are not clearly lognormally distributed. Because the distributions of contaminant concentrations
at sites are often positively skewed, they are frequently assumed to be lognormally distributed
for statistical evaluations. In fact, EPA guidance (EPA 1992) promotes this approach by
suggesting that it is reasonable to assume that data are lognormally distributed even if the data
do not fit a normal distribution after log-transformation. However, this assumption is often
incorrect. As noted in a recent EPA technical support document prepared by EPA contractors
Singh et al. (EPA 1997a), "[t]his apparent skewness may be due to biased sampling, multiple
populations, or outliers and not necessarily due to lognormally distributed data." Small
deviations between the assumed lognormal distribution of the data and the actual population
distribution can greatly influence the statistical results and yield a gross misrepresentation of the
true mean and associated confidence bound. Thus, when sampling data that do not fit a
lognormal distribution are inappropriately assumed to be lognormally distributed and are
evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not
at all representative of site conditions.
Even when the data are lognormally distributed, the H-statistic can produce concentrations that
are far higher than the true mean. In their recent study, Singh et al. (EPA 1997a) evaluated and
compared UCLs calculated through different statistical estimation methods using a variety of
sampling data sets and reported that the UCLs obtained using the H-statistic were consistently
larger and less accurate than the UCLs calculated using other statistical approaches. They noted
that "it is observed that the H-UCL becomes order of magnitudes higher even when the data
were obtained from a lognormal population and can lead to incorrect conclusions. This is
especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical
merit of the H-UCL in environmental applications is questionable as it becomes orders of
magnitude higher than the largest concentration observed when the [standard deviation] of the
log transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL
approach can yield unusually high false positives, which would result in an unnecessary
cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in
an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may
overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous
waste sites.
Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic
mean is not correctly estimated within the calculation. This difficulty occurs when an
exponential estimator of the mean is not equal to the typical calculation of the sum of all samples
divided by the number of samples.
Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To
control this, EPA recommended that when the statistic is greater than the maximum observed
concentration at the site, the maximum value should instead be used as the EPC. This approach
leads to the unsupportable assumption that the average concentration within an exposure area is
best represented by the highest concentration encountered there, and does not account in any way
for the spatial distribution of contamination. As EPA itself has recognized in responding to
comments on the Upper ½-Mile Reach Removal Action, receptors, both human and ecological,
are exposed to average contaminant concentrations, not maximum concentrations, where
exposure is equally likely throughout a given area (EPA 1999).
In view of these considerations, EPA contractors and consultants have recommended that the H-
statistic should not be used, particularly in cases where a lognormal distribution of the data is not
certain. For example, based on comparison of the UCLs calculated using the H-statistic versus
those calculated using other approaches, the EPA technical support document by Singh et al.
(EPA 1997a) recommended that in environmental applications the use of the H-UCL to obtain
an estimate of the upper confidence limit of the mean should be avoided. These authors
concluded that the use of the jackknife method, the bootstrap method, or the Chebychev
inequality method "provides better input to the risk assessors and may result in a significant
reduction in remediation costs. This is especially true when the number of samples is thirty or
less....[U]pper confidence limits based on any of the other estimation procedures appear to be
more stable and reliable than those based on the H-statistic." Similar findings were reported by
Schulz and Griffin (1999).
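Of the alternatives named by Singh et al., the Chebyshev inequality method is simple enough to sketch here. This illustration uses the standard one-sided Chebyshev form, x̄ + √(1/α − 1)·s/√n, with a hypothetical dataset; it is offered only to make the method concrete, not as the specific implementation those authors used.

```python
import math

def chebyshev_ucl(data, alpha=0.05):
    """Distribution-free (1 - alpha) UCL on the arithmetic mean based on
    the Chebyshev inequality; for alpha = 0.05 the multiplier is
    sqrt(1/0.05 - 1) = sqrt(19), about 4.36."""
    n = len(data)
    xbar = sum(data) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))  # sample std. dev.
    return xbar + math.sqrt(1.0 / alpha - 1.0) * s / math.sqrt(n)

print(chebyshev_ucl([0.02, 0.03, 1.1, 7.6, 14.3]))  # hypothetical PCB data (ppm)
```

Because it makes no distributional assumption at all, this bound is conservative, yet it remains tied to the observed data rather than exploding with the log-space variance as the H-UCL does.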
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area, and exposures are likely to occur
randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean
is most representative of the long-term EPC. However, if sampling is intentionally biased
towards more contaminated areas, the sampling database no longer accurately represents the
areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for
the Lower Housatonic River. Contaminant delineation is an important goal of the sampling
effort on the lower Housatonic River. Consequently, sampling is intentionally focused on areas
(e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the
samples in these datasets will be taken from high concentration areas, with limited samples taken
from uncontaminated areas, calculating a mean using typical statistical procedures gives undue
weight to the locations where high concentrations exist. A dataset obtained in this way cannot be
used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be
representative of the overall exposure area.
This problem can be substantially mitigated by area-weighting the sample results so that the
importance of each sample in the calculation of the EPC is directly proportional to the fraction of
the exposure area that it represents. Area-weighted averaging may be conducted in a number of
ways with differing levels of complexity. In some cases, polygonal areas are constructed by
drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen
polygons; Clifford et al. 1995); in more complex approaches, unbiased estimates of how
concentration and variance change with distance are developed, and the results are used to
construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava 1989).

Because kriging is data intensive and unlikely to be feasible for many of the exposure areas in
the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more
appropriate approach. The process is to draw perpendicular lines that are exactly equidistant
between contiguous samples until each sample is surrounded by a polygon. Polygons at the
periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary).
One reason that this technique is so facile is that the construction of Thiessen polygons is a
hardwired operation in most commercially available Geographic Information System (GIS)
applications.
The area inside each polygon is then used to weight the samples in statistical calculations, which
is in essence saying that the probability of encountering the concentration represented by a
certain polygon is equal to the proportion of the total exposure area represented by the area in
that polygon. The area-weighted concentration is calculated using the following formula (Isaaks
and Srivastava 1989):

    x̄_sc = Σ p_i c_i  (summed over polygons i = 1 through n)

Where:

x̄_sc = the area-weighted mean concentration (mg/kg)
c_i = the concentration representing the condition within polygon i,
where there are i = 1 through n polygons, and
p_i = the proportion of the total area that is incorporated in polygon i (unitless)
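The formula above can be sketched in code. The following illustration (hypothetical sample coordinates and concentrations, not data from the Site) approximates the Thiessen partition by rasterizing the exposure area into a fine grid and assigning each cell to its nearest sample, which reproduces the polygon areas, and then applies the weights p_i:

```python
import math

# Hypothetical sample locations (x, y) and PCB concentrations (ppm) in a
# 100 ft x 100 ft exposure area; these values are illustrative only.
samples = [((25.0, 50.0), 2.0), ((75.0, 50.0), 6.0)]

def area_weighted_mean(samples, width, height, cells=100):
    """Approximate Thiessen-polygon area weighting: divide the exposure
    area into a fine grid, assign each cell to its nearest sample (exactly
    the Thiessen partition), and weight each concentration by the fraction
    of cells it claims (p_i)."""
    counts = [0] * len(samples)
    for i in range(cells):
        for j in range(cells):
            x = (i + 0.5) * width / cells   # center of grid cell (i, j)
            y = (j + 0.5) * height / cells
            k = min(range(len(samples)),
                    key=lambda s: math.dist((x, y), samples[s][0]))
            counts[k] += 1
    total = cells * cells
    return sum(counts[k] / total * samples[k][1] for k in range(len(samples)))

print(area_weighted_mean(samples, 100.0, 100.0))  # -> 4.0 (each sample claims half the area)
```

With real data, the polygon proportions would come from a GIS Thiessen (Voronoi) overlay rather than a raster approximation, but the weighting arithmetic is identical.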
This approach includes sampling data from all portions of the exposure area and corrects for the
biases in the sampling data by taking into consideration the spacing of the samples within the
exposure area. This is the approach that was used by GE and approved by EPA for calculating
EPCs for the banks of the Upper ½-Mile Reach of the Housatonic River, and it is the approach
specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which
include a number of floodplain properties along the River). Hence, use of this approach for the
Lower Housatonic River would lead to a consistent approach for the different portions of the
overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at
other Superfund sites. For example, such an approach was used at the Koppers Company
Superfund site in Charleston, SC (EPA 1995), where biased sampling was also conducted. To
offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling
data in clusters by gridding the site and calculating an average concentration for each grid using
the sampling data available within it. Then, the average concentrations for each grid were
averaged to yield an average concentration for the entire exposure area. In addition, at the
Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the
average goal across the exposure unit (EPA 1997b).
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board
(SAB) (EPA 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human
Health Evaluation Manual. In that review, the SAB stated that the

[e]stimate of the RME cannot ignore the distribution of contamination at the site and the
distribution of individual behaviors which lead to exposure. Rather, the spatial distribution
of the concentration over the site must be considered along with a distribution reflecting the
relative frequency with which people are likely to visit different parts of the site. For this
reason, any summary measure of concentration (such as the average proposed in the RAGS
document) that does not take into account the spatial distribution of the underlying samples is
likely to be inadequate. Therefore, the Committee believes that the Agency should give
strong consideration to incorporating methods such as kriging or triangulation that take
into account the spatial distribution of contamination to characterize exposure. (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is
fully protective of human health and the environment. Indeed, in responding to comments on the
Upper ½-Mile Reach Removal Action, EPA specifically justified this spatial averaging approach
so long as sufficient data are available (EPA 1999). As EPA recognized, use of this approach
requires the availability of adequate data to support the calculation of representative spatial
averages over the exposure area. However, the need for adequate data exists regardless of the
averaging method used to derive the EPCs, and is particularly marked when one attempts to use
the H-statistic approach. Under the phased approach that EPA has outlined in the Human
Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the
95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the
conduct of additional sampling is warranted in order to avoid the use of extreme and
unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate
additional samples as necessary for individual exposure areas to ensure that reasonable and
representative estimates of the average concentration can be calculated for the risk assessment of
each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most
representative means of calculating EPCs. However, if EPA rejects this approach, or for areas
where the sampling data are not considered sufficient to support use of such spatial averaging,
there are alternative methods of calculating the 95% UCL which would result in more reliable
and representative EPCs than use of the H-statistic. As discussed previously and demonstrated
by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), the H-statistic is likely to
substantially overestimate the 95% UCL, particularly where datasets are small. These EPA
contractors and consultants have evaluated and suggested alternative and superior approaches for
calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows
calculation of a probability distribution of EPCs and is able to offset some of the statistical
limitations associated with small datasets. It provides a method for calculating a more reliable
estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a
normal or lognormal distribution. This technique is described below and is recommended as an
alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for
exposure areas in the Lower Housatonic River risk assessment.
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process where a subpopulation of the overall
dataset of analytical values is selected at random and a mean is calculated. This mean is
archived, and the statistical sampling procedure is repeated with replacement of the data. The
bootstrapped means are collected until their distribution stabilizes (this often requires many
iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95%
UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of
statistics, which states that if a population is repeatedly sampled for purposes of calculating an
arithmetic mean, the distribution of the means will ultimately take the form of a normal
c -nocsjRB GE2oooEPcPraquoplaquo an 9 Ogden Environmental and Energy Services
distribution This Theorem applies to any population regardless of its statistical distribution
Bootstrapping relies on the same theory except that instead of repeatedly taking new samples
bootstrapping models select repeatedly from a single dataset
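The resampling loop described above can be sketched in a few lines of Python. This is an illustrative sketch only; the dataset, iteration count, and seed below are hypothetical choices, not values from the Work Plan:

```python
import random

def bootstrap_ucl(data, n_iterations=5000, percentile=0.95, seed=1):
    """Bootstrap the UCL of the arithmetic mean: resample the dataset
    with replacement, archive each resampled mean, and read the chosen
    percentile off the resulting distribution of means."""
    rng = random.Random(seed)
    n = len(data)
    means = []
    for _ in range(n_iterations):
        resample = [rng.choice(data) for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    return means[int(percentile * n_iterations) - 1]

# Hypothetical PCB concentrations (ppm) for a small exposure area.
concentrations = [0.022, 1.07, 2.19, 7.62, 9.69, 14.3, 18.7, 30.8, 42.6]
ucl95 = bootstrap_ucl(concentrations)
```

For this illustrative dataset, the bootstrapped 95% UCL falls between the arithmetic mean and the maximum detected value, rather than overshooting the maximum as the H-UCL can.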
This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the bootstrapped estimate is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.
When sample sizes are small, the distribution of the bootstrapped means may not appear normal. This is because the number of samples required in each subsample to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance of the distribution of means becomes tighter; as a result, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that accounts for the locations and densities of the samples collected on each property. This modified approach (spatial bootstrapping), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It proceeds in three steps.
First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.
Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded to the closest integer. An expanded dataset is then created from the original dataset to reflect the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded, normalized areas by a selected value; the resulting values represent the area-weighted number of replicates of each sample result in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7 and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.
Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
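The three steps above can be combined into a single sketch. The polygon areas are assumed to be precomputed (step 1); the code below illustrates the dataset expansion (step 2) and the repeated subsampling (step 3), using a hypothetical three-sample dataset rather than a full 30-sample property:

```python
import random

def spatial_bootstrap_ucl(values, areas, factor=5, n_iter=5000,
                          subsample=250, seed=1):
    """Area-weighted (spatial) bootstrap of the 95% UCL.

    Each value is replicated in proportion to the ratio of its
    Thiessen polygon area to the smallest polygon area (rounded to an
    integer and multiplied by an expansion factor); means of random
    subsamples drawn with replacement are then accumulated, and the
    95th percentile of their distribution is returned.
    """
    smallest = min(areas)
    expanded = []
    for value, area in zip(values, areas):
        expanded.extend([value] * (round(area / smallest) * factor))
    rng = random.Random(seed)
    means = []
    for _ in range(n_iter):
        draw = [rng.choice(expanded) for _ in range(subsample)]
        means.append(sum(draw) / subsample)
    means.sort()
    return means[int(0.95 * n_iter) - 1]

# Hypothetical concentrations (ppm) and polygon areas (sq ft).
ucl = spatial_bootstrap_ucl([0.022, 7.62, 42.6],
                            [138219.0, 65541.0, 132012.0])
```

Because each value is drawn in proportion to its polygon area, the center of the resulting distribution of means sits at the area-weighted average rather than the simple arithmetic mean.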
This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).
The spatial bootstrapping approach is preferable to the H-statistic approach for several reasons. First, as shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples in each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data; thus, it can be applied successfully to all sampling datasets, including those that fit neither a normal nor a lognormal distribution.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different approaches: (1) the EPA approach (arithmetic mean and H-statistic); (2) an area-weighted average; (3) a bootstrap; and (4) a spatial bootstrap. Results of this analysis are provided in Table 2.
As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained from straight bootstrapping; this is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas; thus, they are weighted more heavily in approaches that include area-weighting.
Estimates of the 95% UCL, however, vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration by a factor of roughly 2.5, with the result that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values would need to be used for many of those properties. Such an end result is unsupportable and would result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above, and as demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area when the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993) and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½-Mile Reach of the River and for the Removal Actions Outside the River, including certain floodplain properties along the River.
However, if EPA rejects this approach (or for areas where EPA determines that the data are insufficient to support that approach and decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A), Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.
EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant), Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhardt. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision, Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River, Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft²)    Total PCB (ppm)
 1               138,219               0.022
 2               105,217               0.028
 3               110,214               1.07
 4                31,021               1.63
 5                11,399               1.67
 6                 7,004               2.01
 7                14,268               2.19
 8                12,121               3.25
 9                 3,697               3.77
10                13,297               7.58
11                33,087               7.61
12                65,541               7.62
13                57,494               7.71
14                 9,014               7.99
15                 9,994               8.72
16                 2,123               9.69
17                 2,341               9.72
18                 7,446               12.7
19                 2,613               14.3
20                44,231               14.3
21                 2,362               15.4
22                 8,745               16.3
23                 5,008               18.7
24                18,762               19.1
25                24,296               20.2
26                51,356               30.8
27                12,996               32.9
28                18,019               38.9
29               119,814               40.6
30               132,012               42.6
Table 2. Mean and 95% UCL EPCs for the Hypothetical Property Calculated Using Four Approaches

[The body of this table is illegible in the available copy. As recoverable from the accompanying text: the arithmetic and bootstrapped means are both 13.3 ppm; the area-weighted and spatial bootstrapped means are identical to one another and somewhat higher; and the 95% UCLs are 17.06 ppm (bootstrap), 16.38 ppm (spatial bootstrap), and 106 ppm (H-statistic), the last exceeding the maximum detected concentration of 42.6 ppm.]
EPA and MDEP. 1998. Letter from Anna Symington, Acting Section Chief, Massachusetts DEP, and Bryan Olson, Project Manager, US EPA, to Jane Magee, General Electric Company, Re: Pittsfield 1-0147, EPA Area 6 Housatonic River Human Health Risk Assessment. January 29.

EPA. 1999a. Memorandum from Ann-Marie Burke, Toxicologist, to Richard Cavagnero, GE Project Leader, Re: Protectiveness of Cleanup Levels for Removal Actions Outside the River - Protection of Human Health. Attachment A to Appendix D to Consent Decree lodged in United States et al. v. General Electric Company (No. 99-30225-MAP, US Dist. Court for Dist. of Mass.). August 4.

EPA. 1999b. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual, Supplemental Guidance: Dermal Risk Assessment, Interim Guidance. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. March 2 Draft.

EPA. 1999c. Risk Assessment Guidance for Superfund, Volume 3 (Part A: Process for Conducting Probabilistic Risk Assessment), Draft Revision 5. US Environmental Protection Agency, Solid Waste and Emergency Response. December.

EPA. 1999d. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. Prepared by EPA Region I. October.

GE. 1997. Proposal for Alternative Input Values for the Human Health Risk Assessment. Memorandum from General Electric to the Massachusetts Department of Environmental Protection and the US Environmental Protection Agency. September 11.

Goodrum, P.E., J.M. Hassett, D.L. Johnson, and M.E. Dakins. 1994. Applications of microexposure Monte Carlo modeling to human health risk assessments: A case study of modeling childhood lead exposure. Society for Risk Analysis Annual Conference and Exposition, December 4-7, Baltimore, MD.
Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling Approach to Probabilistic Exposure Assessment. Paper No. 95-TA42.03. Proceedings of the 88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas, USA. June.

Johnson, T., and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential Occupancy Periods and Its Application to the General US Population. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price. 1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993 Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of SRA Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure event analysis to model potential exposures to PCBs through ingestion of fish from the Upper Hudson River. In: Organohalogen Compounds, Proceedings Dioxin '96: 16th International Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands. Organohalogen Compounds 30:61-65.
Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC, October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meeting Proceedings. (Abstract)

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition. (Abstract)

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan, Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis 19:577-584.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD, December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4):823-843.
Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 11.
ATTACHMENT A
A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS
Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co., Inc.
Introduction
The US Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston, 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets for which no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and on how well they represent the actual levels of exposure that are likely to occur under various scenarios.
There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that, when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.
GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating EPCs that will produce more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½-Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic, namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.
Problems Associated with the Use of the H-Statistic to Calculate the 95 UCL
In 1992, EPA issued a guidance document (EPA, 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area; thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of the observed sample concentrations.
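The premise that many short, random contact events average out to the arithmetic mean can be checked with a simple simulation. The values below are hypothetical, and each location is assumed equally likely to be contacted:

```python
import random

def average_contacted_concentration(concentrations, n_events=100000, seed=7):
    """Simulate repeated random contact events across an exposure area
    and return the average concentration contacted over the long term."""
    rng = random.Random(seed)
    total = sum(rng.choice(concentrations) for _ in range(n_events))
    return total / n_events

# Hypothetical soil concentrations (ppm) at three equally likely locations.
long_term = average_contacted_concentration([1.0, 5.0, 12.0])
```

With enough contact events, the simulated long-term average converges on the arithmetic mean (6.0 ppm for these values), regardless of the distribution of the individual concentrations.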
The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA, 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA, 1992):
UCL = exp(x̄ + 0.5s² + sH/√(n−1))

Where:

UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic accounting for interaction of the distribution, developed by Land (1975)
n = number of analytical samples
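The equation can be implemented directly. The sketch below is illustrative only; in particular, the H value shown is an assumed placeholder, since real applications must take H from Land's (1975) tables for the given sample size and log-scale standard deviation:

```python
import math

def h_ucl(data, h_statistic):
    """Compute the H-UCL from the equation above; h_statistic must be
    looked up in Land's (1975) tables for the given n and s."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    mean_log = sum(logs) / n
    var_log = sum((y - mean_log) ** 2 for y in logs) / (n - 1)
    s = math.sqrt(var_log)
    return math.exp(mean_log + 0.5 * var_log + s * h_statistic / math.sqrt(n - 1))

# H = 3.0 is an assumed, illustrative value, not a tabulated one.
ucl = h_ucl([1.0, 2.0, 4.0, 8.0], h_statistic=3.0)
```

Even for this small, exactly geometric dataset, the computed UCL exceeds the maximum observed value of 8.0, illustrating how quickly the exponential form can inflate the estimate when n is small.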
Frequently, the H-statistic produces extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.
Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.
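One simple screen for the lognormal assumption, consistent with the caution above, is to examine the skewness of the log-transformed data: truly lognormal data should look roughly symmetric after log-transformation. The following is a minimal illustrative sketch (a formal goodness-of-fit test would be preferable in practice):

```python
import math

def log_skewness(data):
    """Sample skewness of the log-transformed data; values far from
    zero suggest the lognormal assumption is doubtful."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    m = sum(logs) / n
    s = math.sqrt(sum((y - m) ** 2 for y in logs) / (n - 1))
    return (n / ((n - 1) * (n - 2))) * sum(((y - m) / s) ** 3 for y in logs)
```

Data that are symmetric on the log scale yield a skewness near zero, while data skewed even after log-transformation yield values well away from zero, a warning that an H-statistic based on lognormality may misrepresent the true mean.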
Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA, 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling datasets and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.
Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when the exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.
Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and it does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½-Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA, 1999).
In view of these considerations, EPA contractors and consultants have recommended that the H-statistic not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA, 1997a) recommended that "in environmental applications, the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less...[U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area and exposures are likely to occur
randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean
is most representative of the long-term EPC. However, if sampling is intentionally biased
towards more contaminated areas, the sampling database no longer accurately represents the
areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for
the Lower Housatonic River. Contaminant delineation is an important goal of the sampling
effort on the lower Housatonic River. Consequently, sampling is intentionally focused on areas
(e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the
samples in these datasets will be taken from high-concentration areas, with limited samples taken
from uncontaminated areas, calculating a mean using typical statistical procedures gives undue
weight to the locations where high concentrations exist. A dataset obtained in this way cannot be
used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be
representative of the overall exposure area.
This problem can be substantially mitigated by area-weighting the sample results so that the
importance of each sample in the calculation of the EPC is directly proportional to the fraction of
the exposure area that it represents. Area-weighted averaging may be conducted in a number of
ways with differing levels of complexity. In some cases, polygonal areas are constructed by
drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen
polygons; Clifford et al. 1995); in more complex approaches, unbiased estimates of how
concentration and variance change with distance are used to construct a spatial grid of estimated
concentrations (ordinary kriging; Isaaks and Srivastava 1989).
Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in
the Rest of the River investigation, area-weighting using Thiessen polygons is a more
appropriate approach. The process is to draw perpendicular lines that are exactly equidistant
between contiguous samples until each sample is surrounded by a polygon. Polygons at the
periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary).
One reason that this technique is so facile is that the construction of Thiessen polygons is a
hardwired operation in most commercially available Geographic Information System (GIS)
applications.
The area inside each polygon is then used to weight the samples in statistical calculations, which
is, in essence, saying that the probability of encountering the concentration represented by a
certain polygon is equal to the proportion of the total exposure area represented by the area in
that polygon. The area-weighted concentration is calculated using the following formula (Isaaks
and Srivastava 1989):
X̄aw = Σ (pi × Ci), summed over polygons i = 1 through n

Where:
X̄aw = area-weighted mean concentration (mg/kg);
Ci = the concentration representing the condition within polygon i, where there are i = 1 through n polygons; and
pi = the proportion of the total area that is incorporated in polygon i (unitless).
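The formula above amounts to a weighted sum, which can be sketched in a few lines of code (an illustrative sketch; the function name and the two-sample example values are hypothetical, not taken from the Work Plan):

```python
def area_weighted_mean(concs, areas):
    """Area-weighted mean: sum of p_i * C_i, where p_i is the fraction of
    the total exposure area contained in sample i's Thiessen polygon."""
    total = sum(areas)
    return sum(c * (a / total) for c, a in zip(concs, areas))

# Two samples: 1.0 ppm representing 90% of the area, 10.0 ppm representing 10%.
print(round(area_weighted_mean([1.0, 10.0], [900.0, 100.0]), 3))  # 1.9
```

Note how the high-concentration sample contributes only in proportion to the 10% of the exposure area it represents, rather than half the weight it would receive in a simple arithmetic mean (which here would be 5.5 ppm).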
This approach includes sampling data from all portions of the exposure area and corrects for the
biases in the sampling data by taking into consideration the spacing of the samples within the
exposure area. This is the approach that was used by GE and approved by EPA for calculating
EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach
specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which
include a number of floodplain properties along the River). Hence, use of this approach for the
Lower Housatonic River would lead to a consistent approach for the different portions of the
overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at
other Superfund sites. For example, such an approach was used at the Koppers Company
Superfund site in Charleston, SC (EPA 1995), where biased sampling was also conducted. To
offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling
data in clusters by gridding the site and calculating an average concentration for each grid using
the sampling data available within it. Then the average concentrations for each grid were
averaged to yield an average concentration for the entire exposure area. In addition, at the
Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the
average goal across the exposure unit (EPA 1997b).
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board
(SAB) (EPA 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human
Health Evaluation Manual. In that review, the SAB stated that the

"[e]stimate of the RME cannot ignore the distribution of contamination at the site and the
distribution of individual behaviors which lead to exposure. Rather, the spatial distribution
of the concentration over the site must be considered along with a distribution reflecting the
relative frequency with which people are likely to visit different parts of the site. For this
reason, any summary measure of concentration (such as the average proposed in the RAGS
document) that does not take into account the spatial distribution of the underlying samples is
likely to be inadequate. Therefore, the Committee believes that the Agency should give
strong consideration to incorporating methods such as kriging or triangulation that take
into account the spatial distribution of contamination to characterize exposure." (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is
fully protective of human health and the environment. Indeed, in responding to comments on the
Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach
"so long as sufficient data are available" (EPA 1999). As EPA recognized, use of this approach
requires the availability of adequate data to support the calculation of representative spatial
averages over the exposure area. However, the need for adequate data exists regardless of the
averaging method used to derive the EPCs, and is particularly marked when one attempts to use
the H-statistic approach. Under the phased approach that EPA has outlined in the Human
Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the
95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the
conduct of additional sampling is warranted in order to avoid the use of extreme and
unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate
additional samples as necessary for individual exposure areas to ensure that reasonable and
representative estimates of the average concentration can be calculated for the risk assessment of
each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most
representative means of calculating EPCs. However, if EPA rejects this approach, or for areas
where the sampling data are not considered sufficient to support use of such spatial averaging,
there are alternative methods of calculating the 95% UCL which would result in more reliable
and representative EPCs than use of the H-statistic. As discussed previously and demonstrated
by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), the H-statistic is likely to
substantially overestimate the 95% UCL, particularly where datasets are small. These EPA
contractors and consultants have evaluated and suggested alternative and superior approaches for
calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows
calculation of a probability distribution of EPCs and is able to offset some of the statistical
limitations associated with small datasets. It provides a method for calculating a more reliable
estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a
normal or lognormal distribution. This technique is described below and is recommended as an
alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for
exposure areas in the Lower Housatonic River risk assessment.
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall
dataset of analytical values is selected at random and a mean is calculated. This mean is
archived, and the statistical sampling procedure is repeated with replacement of the data. The
bootstrapped means are collected until their distribution stabilizes (this often requires many
iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95%
UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of
statistics, which states that if a population is repeatedly sampled for purposes of calculating an
arithmetic mean, the distribution of the means will ultimately take the form of a normal
distribution. This Theorem applies to any population regardless of its statistical distribution.
Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples,
bootstrapping models select repeatedly from a single dataset.
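The procedure just described can be sketched in a few lines (a simplified sketch; the function name, dataset, and iteration count are hypothetical illustrations, not values prescribed by any guidance):

```python
import random
import statistics

def bootstrap_ucl95(data, n_iter=5000, seed=1):
    """95% UCL of the mean via bootstrapping: resample the dataset with
    replacement, archive each resample's mean, and take the 95th
    percentile of the resulting distribution of means."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(
        statistics.mean(rng.choices(data, k=n)) for _ in range(n_iter)
    )
    return means[int(0.95 * n_iter) - 1]

# Hypothetical PCB results (ppm); the bootstrap UCL falls above the
# arithmetic mean but below the maximum observed value.
data = [0.02, 0.5, 1.6, 3.3, 7.6, 9.7, 14.3, 18.7, 30.8, 42.6]
ucl = bootstrap_ucl95(data)
```

A fixed seed is used only so the sketch is reproducible; in practice the distribution of resampled means stabilizes over the iterations regardless of the seed.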
This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL and
is more technically justifiable. The reduction obtained from this procedure varies according to
the amount of data available for sampling and its variance, but the result is consistently lower than the 95%
H-UCL (EPA 1997a; Schulz and Griffin 1999). The difference between these two measures is
more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is
particularly well-suited to exposure areas, like many along the Lower Housatonic River, where
sampling is limited.
When sample sizes are small, the distribution for the bootstrap run may not appear normal. This
is because the number of samples required to be included in each subsample in order to approach
normality is larger than is available from the site data. This can be mitigated by further
sampling. It should be noted, however, that as the sample size increases, the variance becomes
tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95%
UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95%
UCL that might be obtained from a larger dataset.
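The point about sample size can be checked directly: the standard error of the mean shrinks roughly as 1/√n, so the upper tail of the distribution of resampled means (and hence the 95% UCL) moves toward the mean as samples are added. A minimal sketch, using hypothetical values and the simplifying assumption that the added samples simply replicate the existing ones:

```python
import statistics

# Standard error of the mean falls as 1/sqrt(n); quadrupling the number
# of samples (same underlying values) roughly halves the spread of the
# distribution of means, pulling its 95th percentile toward the mean.
data = [0.5, 2.0, 8.0, 20.0]            # hypothetical PCB results (ppm)
se_small = statistics.stdev(data) / len(data) ** 0.5
se_large = statistics.stdev(data * 4) / (4 * len(data)) ** 0.5
print(se_large < se_small)  # True
```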
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-
statistic method, it still does not take into consideration the spatial distribution of the sampling
locations across an exposure area and the biases associated with it. To reflect this spatial
distribution, a modified bootstrapping approach can be used that reflects the locations and densities
of samples collected on each property. This modified bootstrapping approach ("spatial
bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to
calculate more reliable 95% UCL estimates of the EPC for each exposure area.
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with
the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to
each sample is determined by calculating the proportion of the total exposure area that is
represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with
respect to its size and are weighted accordingly in the sampling database. In other words, if the
area of one polygon is five times larger than the smallest polygon, it is given a weight that is five
times greater in the sampling database from which values are selected for the bootstrap. The
normalized areas are rounded off to the closest integer. An expanded dataset is then created
from the original dataset that reflects the ratios of each polygon area to the smallest polygon area.
This expansion is accomplished by multiplying the rounded normalized areas by a selected
value. The resulting values represent an area-weighted number of replicates of the sample
results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area
is 7 and the dataset is expanded using a factor of five, then sample X is represented in the
expanded dataset as 35 (7 × 5) entries.
Finally, mean concentrations are calculated (with replacement) from the expanded dataset using
between 250 and 500 of the results (depending upon the size of the expanded dataset) for each
iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a
cumulative distribution of estimated means. After sufficient iterations, this cumulative
distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal
to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of
this distribution of means is the 95% UCL.
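The three steps above can be sketched together as follows. This is an illustrative sketch under simplifying assumptions: the sample values are hypothetical, and the dataset expansion is done by replicating each result round(area/smallest area) times rather than by a separate expansion factor:

```python
import random
import statistics

def spatial_bootstrap_ucl95(concs, areas, n_draws=250, n_iter=5000, seed=1):
    """Spatial bootstrap sketch: expand the dataset so each sample result
    appears in proportion to its Thiessen polygon area (step 2), then
    bootstrap means from the expanded dataset (step 3)."""
    smallest = min(areas)                 # normalize against the smallest polygon
    expanded = []
    for c, a in zip(concs, areas):
        expanded.extend([c] * round(a / smallest))
    rng = random.Random(seed)
    means = sorted(
        statistics.mean(rng.choices(expanded, k=n_draws))
        for _ in range(n_iter)
    )
    return means[int(0.95 * n_iter) - 1]  # 95th percentile of the means

# Two polygons: 1.0 ppm over 900 ft2 and 10.0 ppm over 100 ft2. The
# area-weighted mean is 1.9 ppm; the 95% UCL sits modestly above it and
# well below the maximum of 10.0 ppm.
ucl = spatial_bootstrap_ucl95([1.0, 10.0], [900.0, 100.0])
```

Because each resampled mean is drawn from the area-weighted expanded dataset, the 50th percentile of the resulting distribution converges on the area-weighted mean, as described above.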
This approach is both consistent with EPA guidance and preferable to the approach currently
proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used
as the EPC in order to correct for the uncertainties and limitations of the sampling database, the
guidance does not dictate which method must be used to calculate the 95% UCL. While EPA
has in the past often relied on the H-statistic, that method need not be used, and recent reports
by EPA contractors and consultants have recommended against use of the H-statistic to avoid
systematic overestimates of the 95% UCL (EPA 1997a; Schulz and Griffin 1999).
The spatial bootstrapping approach is better than the H-statistic approach for several reasons.
First, as has been shown by Singh et al. (EPA 1997a) and Schulz and Griffin (1999),
bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial
bootstrapping allows for consideration of the location and density of the samples for each
exposure area and corrects for the relative contribution of each in the database. Third, this
approach mitigates the statistical impact of a small database. Finally, it can be used regardless of
the distribution of the sampling data. Thus, for all sampling datasets, including those that do not
fit either a normal or lognormal distribution, this approach can be used successfully.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the
calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted
sample calculations for a hypothetical property. For this purpose, Ogden developed a
hypothetical set of sampling data that are similar to the types of data that might be collected from
a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples
have been collected from a property that is roughly 1,000 by 1,000 feet in size. The
concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic
mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219
square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different
approaches: (1) the EPA approach (arithmetic mean and H-statistic); (2) an area-weighted average; (3)
a bootstrap; and (4) a spatial bootstrap. Results of this analysis are provided in Table 2.
As can be seen in Table 2, the estimated mean values are similar for all four approaches. The
arithmetic mean (13.3 ppm) is identical to the mean obtained with straight bootstrapping. This
is consistent with the Central Limit Theorem (EPA 1997a). The area-weighted average is
identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the
arithmetic and bootstrapped means. This is because some of the sampling locations with high
concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are
weighted more heavily in approaches that include area-weighting.
Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping,
the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL
calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the
bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL
exceeds the maximum value (42.6 ppm) in the sampling database, which means that the
maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that
would need to be used for this exposure area would be the maximum detected concentration of
42.6 ppm. This concentration is clearly not representative of the sampling dataset for this
hypothetical property.
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and
its result is consistent with the findings of Singh et al. (EPA 1997a) and Schulz and Griffin
(1999). These limitations occur despite the fact that the sampling dataset for this hypothetical
property is reasonably large, containing 30 samples, and the range of concentrations is fairly
typical of the types of concentrations likely to be found in floodplain properties along the Lower
Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly
typical concentration gradient, with a few samples showing very low concentrations, a few
showing fairly high concentrations, and the remainder fairly evenly distributed in between. In
addition, some of the largest polygon areas are associated with low-concentration areas, while
others are associated with the highest concentrations. There is nothing about this dataset to
indicate that the samples reported are not representative of the range of concentrations that is
likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured
concentration by a factor of about 2.5, with the result being that the 95% UCL cannot be
used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result
is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly
even distribution of sampling results. These problems will be even more pronounced for certain
real properties along the Lower Housatonic River, where the sampling results may be highly
skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more
limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that
maximum concentration values will need to be used for many of those properties. Such an end
result is unsupportable and will result in the overestimation of site risks. It could also result in
the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above and demonstrated by EPA contractors and consultants (EPA 1997a; Schulz
and Griffin 1999), use of the H-statistic approach to calculating the 95% UCL in environmental
applications can, and frequently does, produce substantial overestimates of the true mean that are
in no way representative of the concentrations actually contacted by people in an exposure area;
hence, that approach should be avoided. Moreover, this problem cannot be solved by simply
using the maximum detected concentration in the area if the calculated 95% UCL exceeds that
maximum. These problems can be avoided through the use of area-weighted averaging, which
takes into consideration the fraction of the exposure area that is represented by each sample
taken. This approach allows for a more realistic and representative estimate of the level of
exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the
area-weighted averaging approach for the Lower Housatonic River and to collect the data
necessary to support that approach. Use of spatial average concentrations as EPCs is consistent
with EPA policies, procedures, and guidelines (EPA 1989, 1993) and would be consistent with
the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the
Upper ½ Mile Reach of the River and for the Removal Actions Outside the River, including certain
floodplain properties along the River.
However, if EPA rejects this approach (or for areas where EPA determines that the data are
insufficient to support that approach and decides not to collect additional data) and insists on
estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial
bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more
representative EPCs than the H-statistic method (EPA 1997a; Schulz and Griffin 1999), offsets
many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and
M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of
exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation
Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency
and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER
Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the
Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund
Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-
93-007. February.
EPA. 1994. A change in the approach for estimating the average and reasonable maximum
exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2.
August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant).
Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA
Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by
A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of
Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. US Environmental Protection
Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal
Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New
York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and
variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations
when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River. Final.
Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
1                138,219               0.022
2                105,217               0.028
3                110,214               1.07
4                31,021                1.63
5                11,399                1.67
6                7,004                 2.01
7                14,268                2.19
8                12,121                3.25
9                3,697                 3.77
10               13,297                7.58
11               33,087                7.61
12               65,541                7.62
13               57,494                7.71
14               9,014                 7.99
15               9,994                 8.72
16               2,123                 9.69
17               2,341                 9.72
18               7,446                 12.7
19               2,613                 14.3
20               44,231                14.3
21               2,362                 15.4
22               8,745                 16.3
23               5,008                 18.7
24               18,762                19.1
25               24,296                20.2
26               51,356                30.8
27               12,996                32.9
28               18,019                38.9
29               119,814               40.6
30               132,012               42.6
Table 2. Mean and 95% UCL EPCs for the Hypothetical Property Calculated Using Four Approaches (EPA Approach, Area-Weighted Average, Bootstrap, and Spatial Bootstrap)
Harrington, N.W., C.L. Curry, and P.S. Price. 1995. The MicroExposure Event Modeling
Approach to Probabilistic Exposure Assessment. Paper No. 95-TA4203. Proceedings of the
88th Annual Meeting of the Air and Waste Management Association, San Antonio, Texas,
USA. June.

Johnson, T., and J. Capel. 1992. A Monte Carlo Approach to Simulating Residential
Occupancy Periods and Its Application to the General US Population. US Environmental
Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park,
NC. EPA-450/3-92-011. August.

Keenan, R.E., M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and P.S. Price.
1993a. Using a microexposure Monte Carlo risk assessment for dioxin in Maine (USA) fish to
evaluate the need for fish advisories. Dioxin '93: 13th International Symposium on
Chlorinated Dioxins and Related Compounds, Vienna, Austria.

Keenan, R.E., P.S. Price, M.H. Henning, P.E. Goodrum, M.N. Gray, R.A. Sherer, and
W.L. Porter. 1993b. A Monte Carlo risk assessment for dioxin in Maine fish: Using a
microexposure approach to evaluate the need for fish advisories. TAPPI Proceedings, 1993
Environmental Conference, Boston, MA.

Keenan, R.E., P.S. Price, C.L. Curry, J.I. McCrodden, and J.G. Haggard. 1995. Using a
microexposure Monte Carlo analysis to model potential exposures to PCBs through ingestion
of fish from the upper Hudson River. In: Society for Risk Analysis and the Japan Section of
SRA Annual Meeting and Exposition, Waikiki, HI. (Abstract)

Keenan, R.E., P.S. Price, J. McCrodden, and E.S. Ebert. 1996a. Using a microexposure
event analysis to model potential exposures to PCBs through ingestion of fish from the Upper
Hudson River. In: Organohalogen Compounds, Proceedings Dioxin '96 - 16th International
Symposium on Chlorinated Dioxins and Related Compounds, Amsterdam, The Netherlands.
Organohalogen 30:61-65.
Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a
microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII
Conference, Washington, DC. October 15-17.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish
advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to
evaluate this question. In: SETAC North Atlantic Chapter Annual Meetings Proceedings.
Abstract.

Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure
Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden
dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition.
Abstract.

MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the
Massachusetts Contingency Plan - Interim Final Policy. Massachusetts Department of
Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and
Standards, Boston, MA. WSC/ORS-95-141. July.

Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow
CERCLA Site in Riverside County, California. January.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations
when data are not normal or lognormal. Risk Analysis, Vol. 19. November.

Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for
Risk Analysis Annual Conference and Exposition, Baltimore, MD. December 4-7.

Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of
several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk
Assess. 5(4):823-843.
Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River.
Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic
River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 11.
ATTACHMENT A
A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS
Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co Inc
Introduction
The US Environmental Protection Agency's final Human Health Risk Assessment Work Plan
(Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the
Lower Housatonic River (Weston 2000), states that the exposure point concentration (EPC) for
each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site
data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work
Plan states that the 95% UCL will be calculated using the Student-t statistic for normally
distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No
plan is presented for estimating the mean from datasets for which no parametric distribution can be
defined. The way in which the EPCs are calculated can have a substantial impact on the values
derived and their representation of the actual levels of exposures that are likely to occur under
various scenarios.
There are serious limitations associated with using EPA's proposed approach for calculating
EPCs. While the Student-t statistic may be appropriate when the sampling data are normally
distributed, environmental samples seldom resemble a strict normal distribution. Consequently,
if EPA were to assume instead that the data for each exposure area are lognormally distributed,
then use of the H-statistic to calculate the EPCs for these areas may substantially impact the
results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results
in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's
policy stating that when the 95% UCL is greater than the maximum observed concentration, the
maximum value should instead be used as the EPC. However, both the inflated 95% UCL and
the maximum value are poor estimates of the actual levels of exposure that will occur within a given
exposure area.
GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk
assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods,
approved by EPA scientists and other experts, for calculating EPCs that more accurately
represent the actual exposure levels that may be experienced by individuals who come into
contact with contaminated soil or sediment. This paper provides an analysis of the limitations
associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it
presents an evaluation of alternative approaches for calculating the EPCs that will result in more
representative results. In GE's view, the most appropriate approach is the use of the area-
weighting or spatial averaging technique that has been approved and specified by EPA for the
banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River
under the October 1999 Consent Decree. However, if and where that approach is not used, this
paper describes other alternative techniques for deriving a 95 UCL that is much more
representative of actual exposure point concentrations than use of the H-statistic: namely, the
bootstrapping technique and a modification of that approach known as spatial bootstrapping.
Of these, the latter is preferable because it takes into account the spatial distribution of the
sampling data. This paper illustrates the EPCs that would be calculated for an example property
using the various methods discussed herein.
Problems Associated with the Use of the H-Statistic to Calculate the 95 UCL
In 1992, EPA issued a guidance document (EPA 1992) indicating that the appropriate
concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic
mean of the observed concentrations within an exposure area. This position is based on the
concept that chronic direct-contact exposure to soils most likely occurs as a result of short,
repeated, random contact events throughout the identified exposure area. Thus, the mean
concentration is reflective of the long-term exposure concentration. Because it is this nature of
the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of
the EPC regardless of the statistical distribution of observed sample concentrations.
The guidance document goes on to note that, because there is uncertainty in the overall soil
concentration at a given site (due to limited coverage by soil sampling), the 95 UCL on the
arithmetic mean should be used to account for this uncertainty. This approach was adopted by
EPA Region I (EPA 1994). In fact, EPA Region I has taken the position that the 95 UCL is
appropriate for both the central tendency and reasonable maximum exposure computations.
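For normally distributed data, this 95 UCL on the arithmetic mean reduces to the familiar one-sided Student's t interval. A minimal sketch follows; the dataset and the tabulated t value are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical PCB results (ppm) for one exposure area -- illustrative only
data = np.array([0.5, 1.2, 2.4, 3.1, 4.8, 6.0, 7.5, 9.1, 10.4, 12.2])

n = len(data)
mean = data.mean()
sd = data.std(ddof=1)            # sample standard deviation

# One-sided 95% UCL on the mean: mean + t(0.95, n-1) * sd / sqrt(n)
t_crit = 1.833                   # t(0.95, df=9) from standard tables
ucl95 = mean + t_crit * sd / np.sqrt(n)
```

For well-behaved data such as these, the t-based UCL stays modestly above the sample mean and well below the maximum observed value.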
For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the
EPCs. In this method, the 95 UCL is calculated using the following equation (EPA 1992):

UCL = exp(x̄ + 0.5s² + sH/√(n−1))

Where:
UCL = a specified limit (i.e., the 95 UCL) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic accounting for the skew of the distribution, developed by Land (1975)
n = number of analytical samples
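For illustration, the H-UCL calculation can be sketched as follows. The log-scale statistics and the H value, which in practice would be read from Land's (1975) tables for the observed s and n, are hypothetical:

```python
import numpy as np

# Hypothetical log-scale summary statistics -- illustrative values only
xbar = 1.0   # mean of the log-transformed data
s = 1.5      # standard deviation of the log-transformed data
n = 20       # number of analytical samples
H = 3.0      # Land's H-statistic (hypothetical table value for this s and n)

# H-UCL: exp(xbar + 0.5*s**2 + s*H / sqrt(n - 1))
h_ucl = np.exp(xbar + 0.5 * s**2 + s * H / np.sqrt(n - 1))
```

Note how the H term enters the exponent: even a moderate s pushes the UCL multiplicatively above the lognormal mean estimate, which is the mechanism behind the inflated values discussed below.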
Frequently, the H-statistic produces extraordinarily high 95 UCL values. In some cases, this
inflated estimate of the mean reflects the uncertainty associated with having a limited number of
samples from a skewed data population. This uncertainty is compounded when, as is the case
with much of the Housatonic River investigation, sampling efforts are concentrated in areas that
are suspected of having elevated levels of PCBs. Because the sampling of areas that are not
likely to be contaminated is much more limited, despite the fact that those areas are equally
important components of the total exposure area, the datasets for most parcels are likely to be
highly biased.
Use of the H-statistic also does not produce a reliable and representative 95 UCL when data
are not clearly lognormally distributed. Because the distributions of contaminant concentrations
at sites are often positively skewed, they are frequently assumed to be lognormally distributed
for statistical evaluations. In fact, EPA guidance (EPA 1992) promotes this approach by
suggesting that it is reasonable to assume that data are lognormally distributed even if the data
do not fit a normal distribution after log-transformation. However, this assumption is often
incorrect. As noted in a recent EPA technical support document prepared by EPA contractors
Singh et al. (EPA 1997a), "[t]his apparent skewness may be due to biased sampling, multiple
populations, or outliers, and not necessarily due to lognormally distributed data." Small
deviations between the assumed lognormal distribution of the data and the actual population
distribution can greatly influence the statistical results and yield a gross misrepresentation of the
true mean and associated confidence bound. Thus, when sampling data that do not fit a
lognormal distribution are inappropriately assumed to be lognormally distributed and are
evaluated using the H-statistic, the result can be estimates of the mean and 95 UCL that are not
at all representative of site conditions.
Even when the data are lognormally distributed, the H-statistic can produce concentrations that
are far higher than the true mean. In their recent study, Singh et al. (EPA 1997a) evaluated and
compared UCLs calculated through different statistical estimation methods using a variety of
sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently
larger and less accurate than the UCLs calculated using other statistical approaches. They noted
that "it is observed that the H-UCL becomes order of magnitudes higher even when the data
were obtained from a lognormal population and can lead to incorrect conclusions. This is
especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical
merit of the H-UCL in environmental applications is questionable as it becomes orders of
magnitude higher than the largest concentration observed when the [standard deviation] of the
log transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL
approach can yield unusually high false positives, which would result in an unnecessary
cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in
an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may
overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous
waste sites.
Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic
mean is not correctly estimated within the calculation. This difficulty occurs when the
exponential estimator of the mean is not equal to the typical calculation of the sum of all samples
divided by the number of samples.
Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95 UCL. To
control this, EPA recommended that when the statistic is greater than the maximum observed
concentration at the site, the maximum value should instead be used as the EPC. This approach
leads to the unsupportable assumption that the average concentration within an exposure area is
best represented by the highest concentration encountered there, and it does not account in any way
for the spatial distribution of contamination. As EPA itself has recognized in responding to
comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological,
are exposed to average contaminant concentrations, not maximum concentrations, where
exposure is equally likely throughout a given area (EPA 1999).
In view of these considerations, EPA contractors and consultants have recommended that the H-
statistic should not be used, particularly in cases where a lognormal distribution of the data is not
certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus
those calculated using other approaches, the EPA technical support document by Singh et al.
(EPA 1997a) recommended that "in environmental applications the use of the H-UCL to obtain
an estimate of the upper confidence limit of the mean should be avoided." These authors
concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev
inequality method provides better input to the risk assessors and may result in a significant
reduction in remediation costs. This is especially true when the number of samples is thirty or
less... [U]pper confidence limits based on any of the other estimation procedures appear to be
more stable and reliable than those based on the H-statistic." Similar findings were reported by
Schulz and Griffin (1999).
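As an illustration of one of these alternatives, a one-sided 95 UCL based on the Chebyshev inequality, which holds regardless of the shape of the distribution, can be sketched as follows (the dataset is hypothetical):

```python
import numpy as np

# Hypothetical positively skewed sampling results (ppm) -- illustrative only
data = np.array([0.1, 0.3, 0.5, 1.2, 2.0, 3.5, 6.0, 11.0, 18.0, 42.0])

n = len(data)
mean = data.mean()
se = data.std(ddof=1) / np.sqrt(n)   # standard error of the mean

# One-sided Chebyshev 95% UCL: mean + sqrt(1/0.05 - 1) * SE
cheb_ucl95 = mean + np.sqrt(1.0 / 0.05 - 1.0) * se
```

Because no distributional assumption is made, the Chebyshev bound is conservative, but it remains below the maximum observed value here, unlike the H-UCL behavior described above.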
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area, and exposures are likely to occur
randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean
is most representative of the long-term EPC. However, if sampling is intentionally biased
towards more contaminated areas, the sampling database no longer accurately represents the
areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for
the Lower Housatonic River. Contaminant delineation is an important goal of the sampling
effort on the lower Housatonic River; consequently, sampling is intentionally focused on areas
(e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the
samples in these datasets will be taken from high-concentration areas, with limited samples taken
from uncontaminated areas, calculating a mean using typical statistical procedures gives undue
weight to the locations where high concentrations exist. A dataset obtained in this way cannot be
used to reliably calculate an arithmetic mean or its 95 UCL because it is unlikely to be
representative of the overall exposure area.
This problem can be substantially mitigated by area-weighting the sample results so that the
importance of each sample in the calculation of the EPC is directly proportional to the fraction of
the exposure area that it represents. Area-weighted averaging may be conducted in a number of
ways, with differing levels of complexity. In some cases, polygonal areas are constructed by
drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen
polygons; Clifford et al. 1995); in more complex approaches, unbiased estimates of
concentration and variance change with distance, and the results are used to construct a spatial
grid of estimated concentrations (ordinary kriging; Isaaks and Srivastava 1989).

Because kriging is data intensive and unlikely to be feasible for many of the exposure areas in
the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more
appropriate approach. The process is to draw perpendicular lines that are exactly equidistant
between contiguous samples until each sample is surrounded by a polygon. Polygons at the
periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary).
One reason that this technique is so easy to apply is that the construction of Thiessen polygons is
a built-in operation in most commercially available Geographic Information System (GIS)
applications.
The area inside each polygon is then used to weight the samples in statistical calculations, which
is in essence saying that the probability of encountering the concentration represented by a
certain polygon is equal to the proportion of the total exposure area represented by the area in
that polygon. The area-weighted concentration is calculated using the following formula (Isaaks
and Srivastava 1989):

x̄_sc = Σ (p_i × c_i), summed over i = 1 through n polygons

Where:
x̄_sc = area-weighted mean concentration (mg/kg)
c_i = the concentration representing the condition within polygon i,
where there are i = 1 through n polygons, and
p_i = the proportion of the total area that is incorporated in polygon i (unitless)
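A minimal sketch of this area-weighted calculation, using hypothetical polygon areas and sample results:

```python
import numpy as np

# Hypothetical Thiessen polygon areas (ft^2) and PCB results (ppm)
areas = np.array([138219.0, 105217.0, 31021.0, 11399.0, 7004.0])
concs = np.array([0.02, 0.15, 1.3, 4.2, 42.6])

# p_i: fraction of the total exposure area covered by each polygon
p = areas / areas.sum()

# Area-weighted mean: sum over polygons of p_i * c_i
x_sc = np.sum(p * concs)
```

In this illustration the highest result sits in the smallest polygon, so the area-weighted mean (about 1.4 ppm) falls well below the unweighted arithmetic mean (about 9.7 ppm), showing how area-weighting corrects for sampling biased toward contaminated spots.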
This approach includes sampling data from all portions of the exposure area and corrects for the
biases in the sampling data by taking into consideration the spacing of the samples within the
exposure area. This is the approach that was used by GE and approved by EPA for calculating
EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach
specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which
include a number of floodplain properties along the River). Hence, use of this approach for the
Lower Housatonic River would lead to a consistent approach for the different portions of the
overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at
other Superfund sites. For example, such an approach was used at the Koppers Company
Superfund site in Charleston, SC (EPA 1995), where biased sampling was also conducted. To
offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling
data in clusters by gridding the site and calculating an average concentration for each grid using
the sampling data available within it. Then the average concentrations for each grid were
averaged to yield an average concentration for the entire exposure area. In addition, at the
Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the
average goal across the exposure unit (EPA 1997b).
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board
(SAB) (EPA 1993) in its review of the Draft Risk Assessment Guidance for Superfund: Human
Health Evaluation Manual. In that review, the SAB stated that the
[e]stimate of the RME cannot ignore the distribution of contamination at the site and the
distribution of individual behaviors which lead to exposure. Rather, the spatial distribution
of the concentration over the site must be considered, along with a distribution reflecting the
relative frequency with which people are likely to visit different parts of the site. For this
reason, any summary measure of concentration (such as the average proposed in the RAGS
document) that does not take into account the spatial distribution of the underlying samples is
likely to be inadequate. Therefore, the Committee believes that the Agency should give
strong consideration to incorporating methods, such as kriging or triangulation, that take
into account the spatial distribution of contamination to characterize exposure. (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is
fully protective of human health and the environment. Indeed, in responding to comments on the
Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach
so long as sufficient data are available (EPA 1999). As EPA recognized, use of this approach
requires the availability of adequate data to support the calculation of representative spatial
averages over the exposure area. However, the need for adequate data exists regardless of the
averaging method used to derive the EPCs, and it is particularly marked when one attempts to use
the H-statistic approach. Under the phased approach that EPA has outlined in the Human
Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the
95 UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the
conduct of additional sampling is warranted in order to avoid the use of extreme and
unrepresentative 95 UCL values as the EPCs. Thus, GE urges EPA to collect adequate
additional samples, as necessary, for individual exposure areas to ensure that reasonable and
representative estimates of the average concentration can be calculated for the risk assessment of
each exposure area
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most
representative means of calculating EPCs. However, if EPA rejects this approach, or for areas
where the sampling data are not considered sufficient to support use of such spatial averaging,
there are alternative methods of calculating the 95 UCL which would result in more reliable
and representative EPCs than use of the H-statistic. As discussed previously and demonstrated
by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), the H-statistic is likely to
substantially overestimate the 95 UCL, particularly where datasets are small. These EPA
contractors and consultants have evaluated and suggested alternative and superior approaches for
calculating the 95 UCL. Of these, the most straightforward is bootstrapping, which allows
calculation of a probability distribution of EPCs and is able to offset some of the statistical
limitations associated with small datasets. It provides a method for calculating a more reliable
estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a
normal or lognormal distribution. This technique is described below and is recommended as an
alternative and preferable method to the use of the H-statistic for calculating the 95 UCL for
exposure areas in the Lower Housatonic River risk assessment.
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall
dataset of analytical values is selected at random and a mean is calculated. This mean is
archived, and the statistical sampling procedure is repeated with replacement of the data. The
bootstrapped means are collected until their distribution stabilizes (this often requires many
iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95
UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of
statistics, which states that if a population is repeatedly sampled for purposes of calculating an
arithmetic mean, the distribution of the means will ultimately take the form of a normal
distribution. This Theorem applies to any population regardless of its statistical distribution.
Bootstrapping relies on the same theory, except that instead of repeatedly collecting new field
samples, bootstrapping draws repeatedly from a single dataset.
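The procedure described above can be sketched as follows, using a hypothetical dataset and a fixed random seed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sampling results (ppm) -- illustrative only
data = np.array([0.1, 0.4, 0.9, 1.5, 2.2, 3.0, 4.8, 7.5, 12.0, 20.0])

# Resample the dataset with replacement many times, archiving each mean
boot_means = np.array([
    rng.choice(data, size=len(data), replace=True).mean()
    for _ in range(5000)
])

# The 95th percentile of the bootstrapped means serves as the 95% UCL
boot_ucl95 = np.percentile(boot_means, 95)
```

Because each bootstrapped statistic is a mean of the original values, the resulting UCL can never exceed the maximum observed concentration, in contrast to the H-UCL.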
This procedure frequently results in a lower estimate of the 95 UCL than the 95 H-UCL, and
it is more technically justifiable. The reduction obtained from this procedure varies according to
the amount of data available for sampling and its variance, but the result is consistently lower than
the 95 H-UCL (EPA 1997a; Schulz and Griffin 1999). The difference between these two measures
is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is
particularly well-suited to exposure areas, like many along the Lower Housatonic River, where
sampling is limited.
When sample sizes are small, the distribution for the bootstrap run may not appear normal. This
is because the number of samples required to be included in each subsample in order to approach
normality is larger than is available from the site data. This can be mitigated by further
sampling. It should be noted, however, that as the sample size increases, the variance becomes
tighter. As such, more samples will decrease the estimate of the 95 UCL. Thus, the 95
UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95
UCL that might be obtained from a larger dataset.
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95 UCL than is the H-
statistic method, it still does not take into consideration the spatial distribution of the sampling
locations across an exposure area and the biases associated with it. To reflect this spatial
distribution, a modified bootstrapping approach can be used that accounts for the locations and
densities of samples collected on each property. This modified bootstrapping approach (spatial
bootstrapping), which incorporates area-weighting into the bootstrapping method, can be used to
calculate more reliable 95 UCL estimates of the EPC for each exposure area.
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with
the bootstrap method discussed above. It is conducted in a series of three steps.
First, each sample is area-weighted using the Thiessen polygon method. The weight given to
each sample is determined by calculating the proportion of the total exposure area that is
represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with
respect to its size and are weighted accordingly in the sampling database. In other words, if the
area of one polygon is five times larger than the smallest polygon, it is given a weight that is five
times greater in the sampling database from which values are selected for the bootstrap. The
normalized areas are rounded off to the closest integer. An expanded dataset is then created
from the original dataset to reflect the ratios of each polygon area to the smallest polygon area.
This expansion is accomplished by multiplying the rounded, normalized areas by a selected
value. The resulting values represent an area-weighted number of replicates of the sample
results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area
is 7, and the dataset is expanded using a factor of five, then sample X is represented in the
expanded dataset as 35 (7 × 5) entries.
Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using
between 250 and 500 of the results (depending upon the size of the expanded dataset) for each
iteration of the bootstrap. This calculation is then repeated a total of 5000 times to yield a
cumulative distribution of estimated means. After sufficient iterations, this cumulative
distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal
to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of
this distribution of means is the 95 UCL.
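The three steps can be sketched as follows; the polygon areas and concentrations are hypothetical, and the expansion factor of five follows the example above:

```python
import numpy as np

rng = np.random.default_rng(1)

# Step 1: hypothetical Thiessen polygon areas (ft^2) and results (ppm)
areas = np.array([2123.0, 7446.0, 33087.0, 105217.0, 138219.0])
concs = np.array([42.6, 8.8, 3.1, 0.4, 0.02])

# Step 2: normalize areas to the smallest polygon, round to integers,
# and replicate each result in proportion (expansion factor of 5)
replicates = np.rint(areas / areas.min()).astype(int) * 5
expanded = np.repeat(concs, replicates)

# Step 3: draw 250-sample means (with replacement) from the expanded
# dataset 5000 times; the 95th percentile of the means is the 95% UCL
boot_means = np.array([
    rng.choice(expanded, size=250, replace=True).mean()
    for _ in range(5000)
])
spatial_ucl95 = np.percentile(boot_means, 95)
```

Because the high result occupies the smallest polygon, it contributes few replicates to the expanded dataset, so the spatial UCL tracks the area-weighted mean rather than being dominated by the biased high sample.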
This approach is both consistent with EPA guidance and preferable to the approach currently
proposed by EPA. Although EPA guidance recommends that the 95 UCL of the mean be used
as the EPC in order to correct for the uncertainties and limitations of the sampling database, the
guidance does not dictate which method must be used to calculate the 95 UCL. While EPA
has in the past often relied on the H-statistic, that method need not be used, and recent reports
by EPA contractors and consultants have recommended against use of the H-statistic to avoid
systematic overestimates of the 95 UCL (EPA 1997a; Schulz and Griffin 1999).
The spatial bootstrapping approach is better than the H-statistic approach for several reasons.
First, as has been shown by Singh et al. (EPA 1997a) and Schulz and Griffin (1999),
bootstrapping provides more reliable estimates of the 95 UCL concentration. Second, spatial
bootstrapping allows for consideration of the location and density of the samples for each
exposure area and corrects for the relative contribution of each in the database. Third, this
approach mitigates the statistical impact of a small database. Finally, it can be used regardless of
the distribution of the sampling data. Thus, this approach can be used successfully for all
sampling datasets, including those that do not fit either a normal or lognormal distribution.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the
calculation of the arithmetic mean and 95 UCL of a single dataset, Ogden has conducted
sample calculations for a hypothetical property. For this purpose, Ogden developed a
hypothetical set of sampling data that are similar to the types of data that might be collected from
a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples
have been collected from a property that is roughly 1000 by 1000 feet in size. The
concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic
mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219
square feet.
These data have been used to calculate mean and 95 UCL EPCs using four different
approaches: 1) the EPA approach (arithmetic mean and H-statistic); 2) an area-weighted average;
3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.
As can be seen in Table 2, the estimated mean values are similar for all four approaches. The
arithmetic mean (13.3 ppm) is identical to the mean obtained through straight bootstrapping. This
is consistent with the Central Limit Theorem (EPA 1997a). The area-weighted average is
identical to the mean calculated in spatial bootstrapping but is somewhat higher than the
arithmetic and bootstrapped means. This is because some of the sampling locations with high
concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are
weighted more heavily in approaches that include area-weighting.

Estimates of the 95 UCL vary substantially. Using bootstrapping and spatial bootstrapping,
the 95 UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95 UCL
calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the
bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL
exceeds the maximum value (42.6 ppm) in the sampling database, which means that the
maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that
would need to be used for this exposure area would be the maximum detected concentration of
42.6 ppm. This concentration is clearly not representative of the sampling dataset for this
hypothetical property.
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach,
and its result is consistent with the findings of Singh et al. (EPA 1997a) and Schulz and Griffin
(1999). These limitations occur despite the fact that the sampling dataset for this hypothetical
property is reasonably large, containing 30 samples, and the range of concentrations is fairly
typical of the types of concentrations likely to be found in floodplain properties along the Lower
Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly
typical concentration gradient, with a few samples showing very low concentrations, a few
showing fairly high concentrations, and the remainder fairly evenly distributed in between. In
addition, some of the largest polygon areas are associated with low-concentration areas, while
others are associated with the highest concentrations. There is nothing about this dataset to
indicate that the samples reported are not representative of the range of concentrations that is
likely to be present on the property.
Despite this, the H-statistic approach results in a 95 UCL that exceeds the highest measured
concentration by a factor of more than six, with the result being that the 95 UCL cannot be
used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result
is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly
even distribution of sampling results. These problems will be even more pronounced for certain
real properties along the Lower Housatonic River, where the sampling results may be highly
skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more
limited. It is likely, given the tendency of the H-statistic to overestimate the 95 UCL, that
maximum concentration values will need to be used for many of those properties. Such an end
result is unsupportable and will result in the overestimation of site risks. It could also result in
the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above and demonstrated by EPA contractors and consultants (EPA 1997a; Schulz
and Griffin 1999), use of the H-statistic approach to calculating the 95 UCL in environmental
applications can, and frequently does, produce substantial overestimates of the true mean that are
in no way representative of the concentrations actually contacted by people in an exposure area;
hence, that approach should be avoided. Moreover, this problem cannot be solved by simply
using the maximum detected concentration in the area if the calculated 95 UCL exceeds that
maximum. These problems can be avoided through the use of area-weighted averaging, which
takes into consideration the fraction of the exposure area that is represented by each sample
taken. This approach allows for a more realistic and representative estimate of the level of
exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the
area-weighted averaging approach for the Lower Housatonic River and to collect the data
necessary to support that approach. Use of spatial average concentrations as EPCs is consistent
with EPA policies, procedures, and guidelines (EPA 1989, 1993) and would be consistent with
the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the
Upper ½ Mile Reach of the River and for the Removal Actions Outside the River, including certain
floodplain properties along the River.
However, if EPA rejects this approach (or for areas where EPA determines that the data are
insufficient to support that approach and decides not to collect additional data) and insists on
estimating a 95 UCL of the mean for use as the EPC, then GE recommends that spatial
bootstrapping be used to calculate the 95 UCL. This approach is more reliable and yields more
representative EPCs than the H-statistic method (EPA 1997a; Schulz and Griffin 1999), offsets
many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and
M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of
exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.
EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation
Manual (Part A), Interim Final. US Environmental Protection Agency, Office of Emergency
and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER
Bulletin Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the
Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund
Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-
93-007. February.
EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.
EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant) Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV Contract 68-W9-0055.
EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A.K. Singh, and M. Engelhardt. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.
EPA. 1997b. Record of Decision, Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.
EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. US Environmental Protection Agency, Region I. October.
Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.
Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.
Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.
Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River, Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
1                138,219               0.022
2                105,217               0.028
3                110,214               1.07
4                 31,021               1.63
5                 11,399               1.67
6                  7,004               2.01
7                 14,268               2.19
8                 12,121               3.25
9                  3,697               3.77
10                13,297               7.58
11                33,087               7.61
12                65,541               7.62
13                57,494               7.71
14                 9,014               7.99
15                 9,994               8.72
16                 2,123               9.69
17                 2,341               9.72
18                 7,446               12.7
19                 2,613               14.3
20                44,231               14.3
21                 2,362               15.4
22                 8,745               16.3
23                 5,008               18.7
24                18,762               19.1
25                24,296               20.2
26                51,356               30.8
27                12,996               32.9
28                18,019               38.9
29               119,814               40.6
30               132,012               42.6
Table 2. Comparison of Mean and 95% UCL Exposure Point Concentrations for the Hypothetical Floodplain Property, Calculated Using (1) the EPA Approach (Arithmetic Mean and H-Statistic), (2) Area-Weighted Averaging, (3) Bootstrapping, and (4) Spatial Bootstrapping

[The body of Table 2 is not legible in the source document; the values it contains are discussed in the Sample Calculation text.]
Keenan, R.E., N.W. Harrington, P.S. Price, and R.O. Richter. 1996b. Applying a microexposure event analysis for a Superfund site risk assessment. Proceedings, Superfund XVII Conference, Washington, DC. October 15-17.
Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997a. Should Maine's rivers have fish advisories for dioxin? Using an integrated Microexposure Event and Toxicokinetic Model to evaluate this question. In: SETAC North Atlantic Chapter Annual Meeting Proceedings. Abstract.
Keenan, R.E., J.D. Avantaggio, and P.S. Price. 1997b. Using a combined Microexposure Event and Toxicokinetic Model to evaluate the need for fish advisories based on a body burden dosimetric. In: Society for Risk Analysis Proceedings, Annual Meeting and Exposition. Abstract.
MDEP. 1995. Guidance for Disposal Site Risk Characterization - In Support of the Massachusetts Contingency Plan - Interim Final Policy. Massachusetts Department of Environmental Protection, Bureau of Waste Site Cleanup and Office of Research and Standards, Boston, MA. WSC/ORS-95-141. July.
Pyrite Canyon Group. 1994. Workplan for the Health Risk Assessment of the Stringfellow CERCLA Site in Riverside County, California. January.
Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when data are not normal or lognormal. Risk Analysis 19:577-584.
Sielken, R.L. 1994. More realistic exposure durations for more realistic people. Society for Risk Analysis Annual Conference and Exposition, Baltimore, MD. December 4-7.
Simon, T.W. 1999. Two-dimensional Monte Carlo simulation and beyond: A comparison of several probabilistic risk assessment methods applied to a Superfund site. Hum. Ecol. Risk Assess. 5(4):823-843.
Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 11.
ATTACHMENT A
A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS
Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co., Inc.
Introduction
The US Environmental Protection Agency's final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets where no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and their representation of the actual levels of exposures that are likely to occur under various scenarios.
There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of actual levels of exposure that will occur within a given exposure area.
GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will produce more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic, namely the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.
Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL
In 1992, EPA issued a guidance document (EPA 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.
The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA 1992):
UCL = exp(x̄ + 0.5·s² + s·H / √(n − 1))

where:

UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic accounting for interaction of the distribution, developed by Land (1975)
n = the number of analytical samples
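For illustration of the mechanics of this equation (not its merits), the H-UCL can be computed as follows. This is a minimal sketch: the function name and all numeric inputs, including the H value itself, are hypothetical, since a genuine application requires looking H up in Land's (1975) tables for the given sample size and log-scale standard deviation.

```python
import math

def h_ucl(log_mean, log_sd, n, H):
    """Land's H-UCL (EPA 1992): exp(x-bar + 0.5*s^2 + s*H/sqrt(n-1)).

    log_mean and log_sd are the mean and standard deviation of the
    log-transformed sample data; H must come from Land's (1975) tables.
    """
    return math.exp(log_mean + 0.5 * log_sd ** 2 + log_sd * H / math.sqrt(n - 1))

# Hypothetical inputs: x-bar = 1.0, s = 0.5, n = 10, placeholder H = 2.0.
ucl = h_ucl(log_mean=1.0, log_sd=0.5, n=10, H=2.0)
```

Because the tabulated H value grows rapidly as n shrinks and s grows, the exponent, and hence the UCL, inflates quickly for small, highly variable datasets, which is the behavior criticized below.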
Frequently, the H-statistic calculation produces extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.
Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.
Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in an unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.
Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when an exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.
Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA 1999).
In view of these considerations, EPA contractors and consultants have recommended that the H-statistic not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA 1997a) recommended that "in environmental applications, the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less....[U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the lower Housatonic River. Consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.
This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al. 1995); in more complex approaches, unbiased estimates of concentration and variance change with distance, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava 1989).

Because kriging is data intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so readily applied is that the construction of Thiessen polygons is a built-in operation in most commercially available Geographic Information System (GIS) applications.
The area inside each polygon is then used to weight the samples in statistical calculations, which is in essence saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted mean concentration is calculated using the following formula (Isaaks and Srivastava 1989):

x̄sc = Σ (pi × Ci), summed over i = 1 through n polygons

where:

x̄sc = the area-weighted mean concentration (mg/kg)
Ci = the concentration representing the condition within polygon i, where there are i = 1 through n polygons
pi = the proportion of the total area that is incorporated in polygon i (unitless)
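Applied to the hypothetical dataset in Table 1, this formula reduces to a few lines of code. The sketch below (in Python, for illustration) assumes the polygon areas and concentrations are paired by sample number as listed in that table.

```python
# Polygon areas (ft2) and total PCB concentrations (ppm), samples 1-30 of Table 1.
areas = [138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121, 3697, 13297,
         33087, 65541, 57494, 9014, 9994, 2123, 2341, 7446, 2613, 44231,
         2362, 8745, 5008, 18762, 24296, 51356, 12996, 18019, 119814, 132012]
concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
         7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
         15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6]

total_area = sum(areas)
# p_i: fraction of the exposure area represented by each sample's polygon.
p = [a / total_area for a in areas]

arithmetic_mean = sum(concs) / len(concs)                        # ~13.3 ppm
area_weighted_mean = sum(pi * ci for pi, ci in zip(p, concs))    # ~15.7 ppm
```

Because the two largest polygons happen to carry the two highest concentrations, the area-weighted mean here comes out somewhat above the arithmetic mean, consistent with the behavior discussed in the Sample Calculation section.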
This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then the average concentrations for each grid were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA 1997b).
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the

[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure. (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples as necessary for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously and demonstrated by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA 1997a; Schulz and Griffin 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.
When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.
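The procedure just described can be sketched in a few lines of Python. This is an illustrative implementation of a percentile bootstrap of the mean; the iteration count and seed are arbitrary choices, and the dataset is the hypothetical Table 1 property's concentrations.

```python
import random

concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
         7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
         15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6]

def bootstrap_ucl(data, n_iter=5000, seed=1):
    """Percentile-bootstrap 95% UCL of the arithmetic mean.

    Each iteration resamples the dataset with replacement and archives
    the resampled mean; the 95th percentile of the archived means is
    taken as the 95% UCL.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_iter):
        resample = rng.choices(data, k=len(data))
        means.append(sum(resample) / len(resample))
    means.sort()
    return means[int(0.95 * n_iter) - 1]

ucl = bootstrap_ucl(concs)
```

With this dataset the bootstrapped 95% UCL lands on the order of 17 ppm, well below both the maximum observed value and the kind of inflated H-UCL discussed above.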
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of samples collected on each property. This modified bootstrapping approach (spatial bootstrapping), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.
First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.
Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.
Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
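The three steps above can be sketched as follows. Rather than physically expanding the dataset by integer area ratios, this illustrative version draws each value with probability proportional to its polygon area, which is statistically equivalent, and uses 300 draws per iteration (within the 250 to 500 range described above). The data are those of the hypothetical Table 1 property; the draw count and seed are arbitrary choices.

```python
import random

areas = [138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121, 3697, 13297,
         33087, 65541, 57494, 9014, 9994, 2123, 2341, 7446, 2613, 44231,
         2362, 8745, 5008, 18762, 24296, 51356, 12996, 18019, 119814, 132012]
concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
         7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
         15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6]

def spatial_bootstrap_ucl(concs, areas, draws=300, n_iter=5000, seed=1):
    """Area-weighted (spatial) bootstrap 95% UCL of the mean.

    Each draw selects a sample with probability proportional to its
    Thiessen polygon area, so large polygons are replicated in the
    resamples just as they would be in an expanded dataset.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_iter):
        resample = rng.choices(concs, weights=areas, k=draws)
        means.append(sum(resample) / draws)
    means.sort()
    return means[int(0.95 * n_iter) - 1]

ucl = spatial_bootstrap_ucl(concs, areas)
```

As described in the text, the 50th percentile of the archived means converges on the area-weighted mean of the original dataset, and the 95th percentile serves as the UCL.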
This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA 1997a; Schulz and Griffin 1999).
The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1000 by 1000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.
Ogden Environmental and Energy Services
These data have been used to calculate mean and 95% UCL EPCs using four different
approaches: 1) EPA's approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3)
a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The
arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This
is consistent with the Central Limit Theorem (EPA 1997a). The area-weighted average is
identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the
arithmetic and bootstrapped means. This is because some of the sampling locations with high
concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are
weighted more heavily in approaches that include area-weighting.
Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping,
the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL
calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the
bootstrapped and spatially bootstrapped values by a factor of more than six. In fact, the H-UCL
exceeds the maximum value (42.6 ppm) in the sampling database, which means that the
maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that
would need to be used for this exposure area would be the maximum detected concentration of
42.6 ppm. This concentration is clearly not representative of the sampling dataset for this
hypothetical property.
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and
its result is consistent with the findings of Singh et al. (EPA 1997a) and Schulz and Griffin
(1999). These limitations occur despite the fact that the sampling dataset for this hypothetical
property is reasonably large, containing 30 samples, and the range of concentrations is fairly
typical of the types of concentrations likely to be found in floodplain properties along the Lower
Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly
typical concentration gradient, with a few samples showing very low concentrations, a few
showing fairly high concentrations, and the remainder fairly evenly distributed in between. In
addition, some of the largest polygon areas are associated with low concentration areas, while
others are associated with the highest concentrations. There is nothing about this dataset to
indicate that the samples reported are not representative of the range of concentrations that is
likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured
concentration by a factor of more than six, with the result being that the 95% UCL cannot be
used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result
is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly
even distribution of sampling results. These problems will be even more pronounced for certain
real properties along the Lower Housatonic River, where the sampling results may be highly
skewed, dense sampling occurs in high-concentration areas, and/or sample numbers are more
limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that
maximum concentration values will need to be used for many of those properties. Such an end
result is unsupportable and will result in the overestimation of site risks. It could also result in
the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above and demonstrated by EPA contractors and consultants (EPA 1997a; Schulz
and Griffin 1999), use of the H-statistic approach to calculating the 95% UCL in environmental
applications can, and frequently does, produce substantial overestimates of the true mean that are
in no way representative of the concentrations actually contacted by people in an exposure area,
and hence that approach should be avoided. Moreover, this problem cannot be solved by simply
using the maximum detected concentration in the area if the calculated 95% UCL exceeds that
maximum. These problems can be avoided through the use of area-weighted averaging, which
takes into consideration the fraction of the exposure area that is represented by each sample
taken. This approach allows for a more realistic and representative estimate of the level of
exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the
area-weighted averaging approach for the Lower Housatonic River and to collect the data
necessary to support that approach. Use of spatial average concentrations as EPCs is consistent
with EPA policies, procedures, and guidelines (EPA 1989, 1993), and would be consistent with
the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the
Upper ½ Mile Reach of the River and the Removal Actions Outside the River, including certain
floodplain properties along the River.
However, if EPA rejects this approach (or for areas where EPA determines that the data are
insufficient to support that approach and decides not to collect additional data) and insists on
estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial
bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more
representative EPCs than the H-statistic method (EPA 1997a; Schulz and Griffin 1999), offsets
many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and
M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of
exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-
906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation
Manual (Part A), Interim Final. US Environmental Protection Agency, Office of Emergency
and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER
Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the
Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund
Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-
93-007. February.
EPA. 1994. A change in the approach for estimating the average and reasonable maximum
exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2.
August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant),
Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA
Region IV Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by
A. Singh, A. Singh, and M. Engelhardt. US Environmental Protection Agency, Office of
Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision, Fields Brook Superfund Site. US Environmental Protection
Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal
Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New
York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and
variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations
when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River, Final.
Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
      1               138,219               0.022
      2               105,217               0.028
      3               110,214               1.07
      4                31,021               1.63
      5                11,399               1.67
      6                 7,004               2.01
      7                14,268               2.19
      8                12,121               3.25
      9                 3,697               3.77
     10                13,297               7.58
     11                33,087               7.61
     12                65,541               7.62
     13                57,494               7.71
     14                 9,014               7.99
     15                 9,994               8.72
     16                 2,123               9.69
     17                 2,341               9.72
     18                 7,446               12.7
     19                 2,613               14.3
     20                44,231               14.3
     21                 2,362               15.4
     22                 8,745               16.3
     23                 5,008               18.7
     24                18,762               19.1
     25                24,296               20.2
     26                51,356               30.8
     27                12,996               32.9
     28                18,019               38.9
     29               119,814               40.6
     30               132,012               42.6
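The summary statistics quoted above for this dataset can be checked directly; the following is a minimal Python sketch using only the concentration values transcribed from Table 1:

```python
# Hypothetical Table 1 PCB concentrations (ppm) for the 30 surface soil samples
concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
         7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
         15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6]

mean = sum(concs) / len(concs)
# range 0.022 to 42.6 ppm, arithmetic mean 13.3 ppm
print(min(concs), max(concs), round(mean, 1))
```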
Table 2. Mean and 95% UCL Exposure Point Concentrations (ppm) for the Hypothetical
Property, Calculated Using: 1) EPA's Approach (Arithmetic Mean and H-Statistic); 2) Area-
Weighted Average; 3) Bootstrap; and 4) Spatial Bootstrap
Weston. 2000a. Supplemental Investigation Work Plan for the Lower Housatonic River.
Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 22.

Weston. 2000b. Engineering Evaluation/Cost Analysis for the Upper Reach of the Housatonic
River. Prepared for US Army Corps of Engineers by Roy F. Weston, Inc. February 11.
ATTACHMENT A
A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS
Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co Inc
Introduction
The US Environmental Protection Agency's final Human Health Risk Assessment Work Plan
(Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the
Lower Housatonic River (Weston 2000), states that the exposure point concentration (EPC) for
each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site
data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work
Plan states that the 95% UCL will be calculated using the Student-t statistic for normally
distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No
plan is presented for estimating the mean from datasets where no parametric distribution can be
defined. The way in which the EPCs are calculated can have a substantial impact on the values
derived and their representation of the actual levels of exposures that are likely to occur under
various scenarios.
There are serious limitations associated with using EPA's proposed approach for calculating
EPCs. While the Student-t statistic may be appropriate when the sampling data are normally
distributed, environmental samples seldom resemble a strict normal distribution. Consequently,
if EPA were to assume instead that the data for each exposure area are lognormally distributed,
then use of the H-statistic to calculate the EPCs for these areas may substantially impact the
results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results
in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's
policy stating that, when the 95% UCL is greater than the maximum observed concentration, the
maximum value should instead be used as the EPC. However, both the inflated 95% UCL and
the maximum value are poor estimates of actual levels of exposure that will occur within a given
exposure area.
GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk
assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods,
approved by EPA scientists and other experts, for calculating EPCs that more accurately
represent the actual exposure levels that may be experienced by individuals who come into
contact with contaminated soil or sediment. This paper provides an analysis of the limitations
associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it
presents an evaluation of alternative approaches for calculating the EPCs that will result in more
representative results. In GE's view, the most appropriate approach is the use of the area-
weighting or spatial averaging technique that has been approved and specified by EPA for the
banks of the Upper ½ Mile Reach of the Housatonic River and for the Areas Outside the River
under the October 1999 Consent Decree. However, if and where that approach is not used, this
paper describes other alternative techniques for deriving a 95% UCL that is much more
representative of actual exposure point concentrations than use of the H-statistic: namely, the
bootstrapping technique and a modification of that approach known as "spatial bootstrapping."
Of these, the latter is preferable because it takes into account the spatial distribution of the
sampling data. This paper illustrates the EPCs that would be calculated for an example property
using the various methods discussed herein.
Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL
In 1992, EPA issued a guidance document (EPA 1992) indicating that the appropriate
concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic
mean of the observed concentrations within an exposure area. This position is based on the
concept that chronic direct-contact exposure to soils most likely occurs as a result of short,
repeated, random contact events throughout the identified exposure area. Thus, the mean
concentration is reflective of the long-term exposure concentration. Because it is this nature of
the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of
the EPC regardless of the statistical distribution of observed sample concentrations.
The guidance document goes on to note that, because there is uncertainty in the overall soil
concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the
arithmetic mean should be used to account for this uncertainty. This approach was adopted by
EPA Region I (EPA 1994). In fact, EPA Region I has taken the position that the 95% UCL is
appropriate for both the central tendency and reasonable maximum exposure computations.
For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the
EPCs. In this method, the 95% UCL is calculated using the following equation (EPA 1992):

UCL = exp(x̄ + 0.5s² + sH/√(n-1))

Where:
UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic accounting for interaction of the distribution, developed by Land (1975)
n = number of analytical samples
Frequently, the H-statistic calculates extraordinarily high 95% UCL values. In some cases, this
inflated estimate of the mean reflects the uncertainty associated with having a limited number of
samples from a skewed data population. This uncertainty is compounded when, as is the case
with much of the Housatonic River investigation, sampling efforts are concentrated in areas that
are suspected of having elevated levels of PCBs. Because the sampling of areas that are not
likely to be contaminated is much more limited, despite the fact that those areas are equally
important components of the total exposure area, the datasets for most parcels are likely to be
highly biased.
Use of the H-statistic also does not produce a reliable and representative 95% UCL when data
are not clearly lognormally distributed. Because the distributions of contaminant concentrations
at sites are often positively skewed, they are frequently assumed to be lognormally distributed
for statistical evaluations. In fact, EPA guidance (EPA 1992) promotes this approach by
suggesting that it is reasonable to assume that data are lognormally distributed even if the data
do not fit a normal distribution after log-transformation. However, this assumption is often
incorrect. As noted in a recent EPA technical support document prepared by EPA contractors
Singh et al. (EPA 1997a), "[t]his apparent skewness may be due to biased sampling, multiple
populations, or outliers, and not necessarily due to lognormally distributed data." Small
deviations between the assumed lognormal distribution of the data and the actual population
distribution can greatly influence the statistical results and yield a gross misrepresentation of the
true mean and associated confidence bound. Thus, when sampling data that do not fit a
lognormal distribution are inappropriately assumed to be lognormally distributed and are
evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not
at all representative of site conditions.
Even when the data are lognormally distributed, the H-statistic can produce concentrations that
are far higher than the true mean. In their recent study, Singh et al. (EPA 1997a) evaluated and
compared UCLs calculated through different statistical estimation methods using a variety of
sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently
larger and less accurate than the UCLs calculated using other statistical approaches. They noted
that it is observed that the H-UCL becomes orders of magnitude higher even when the data
were obtained from a lognormal population, and can lead to incorrect conclusions. This is
especially true for samples of smaller sizes (e.g., < 30). They stated further that "[t]he practical
merit of the H-UCL in environmental applications is questionable as it becomes orders of
magnitude higher than the largest concentration observed when the [standard deviation] of the
log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL
approach can yield unusually high false positives, which would result in an unnecessary
cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in
an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may
overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous
waste sites.
Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic
mean is not correctly estimated within the calculation. This difficulty occurs when an
exponential estimator of the mean is not equal to the typical calculation of the sum of all samples
divided by the number of samples.
Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To
control this, EPA recommended that, when the statistic is greater than the maximum observed
concentration at the site, the maximum value should instead be used as the EPC. This approach
leads to the unsupportable assumption that the average concentration within an exposure area is
best represented by the highest concentration encountered there, and does not account in any way
for the spatial distribution of contamination. As EPA itself has recognized in responding to
comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological,
are exposed to average contaminant concentrations, not maximum concentrations, where
exposure is equally likely throughout a given area (EPA 1999).
In view of these considerations, EPA contractors and consultants have recommended that the H-
statistic not be used, particularly in cases where a lognormal distribution of the data is not
certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus
those calculated using other approaches, the EPA technical support document by Singh et al.
(EPA 1997a) recommended that, in environmental applications, the use of the H-UCL to obtain
an estimate of the upper confidence limit of the mean should be avoided. These authors
concluded that the use of the jackknife method, the bootstrap method, or the Chebychev
inequality method provides better input to the risk assessors and may result in a significant
reduction in remediation costs; this is especially true when the number of samples is thirty or
less. "[U]pper confidence limits based on any of the other estimation procedures appear to be
more stable and reliable than those based on the H-statistic." Similar findings were reported by
Schulz and Griffin (1999).
c 0ocsjRBGEooopoundpcpipcr2an 5 Ogden Environmental and Energy Services
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area, and exposures are likely to occur
randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean
is most representative of the long-term EPC. However, if sampling is intentionally biased
towards more contaminated areas, the sampling database no longer accurately represents the
areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for
the Lower Housatonic River. Contaminant delineation is an important goal of the sampling
effort on the lower Housatonic River. Consequently, sampling is intentionally focused on areas
(e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the
samples in these datasets will be taken from high-concentration areas, with limited samples taken
from uncontaminated areas, calculating a mean using typical statistical procedures gives undue
weight to the locations where high concentrations exist. A dataset obtained in this way cannot be
used to reliably calculate an arithmetic mean or its 95% UCL, because it is unlikely to be
representative of the overall exposure area.
This problem can be substantially mitigated by area-weighting the sample results so that the
importance of each sample in the calculation of the EPC is directly proportional to the fraction of
the exposure area that it represents. Area-weighted averaging may be conducted in a number of
ways, with differing levels of complexity. In some cases, polygonal areas are constructed by
drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen
polygons; Clifford et al. 1995); in more complex approaches, unbiased estimates of how
concentration and variance change with distance are used to construct a spatial grid of estimated
concentrations (ordinary kriging) (Isaaks and Srivastava 1989).

Because kriging is data intensive and unlikely to be feasible for many of the exposure areas in
the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more
appropriate approach. The process is to draw perpendicular lines that are exactly equidistant
between contiguous samples until each sample is surrounded by a polygon. Polygons at the
periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary).
One reason that this technique is so facile is that the construction of Thiessen polygons is a
hardwired operation in most commercially available Geographic Information System (GIS)
applications.
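In practice, the polygon areas would come directly from a GIS. For illustration only, essentially the same area weights can be approximated without GIS software by rasterizing the exposure area and assigning each grid cell to its nearest sample, since the cells assigned to a sample trace out its Thiessen polygon. The following Python/NumPy sketch uses hypothetical sample coordinates and an assumed grid resolution:

```python
import numpy as np

def thiessen_weights(samples, x_extent, y_extent, n=200):
    """Approximate Thiessen-polygon area weights: rasterize the exposure
    area into an n-by-n grid and assign each cell to its nearest sample."""
    xs = np.linspace(0.0, x_extent, n)
    ys = np.linspace(0.0, y_extent, n)
    gx, gy = np.meshgrid(xs, ys)
    cells = np.column_stack([gx.ravel(), gy.ravel()])
    # squared distance from every grid cell to every sample location
    d2 = ((cells[:, None, :] - samples[None, :, :]) ** 2).sum(axis=2)
    counts = np.bincount(d2.argmin(axis=1), minlength=len(samples))
    return counts / counts.sum()  # fraction of the exposure area per polygon

# two hypothetical samples in a 1000 ft by 1000 ft property
samples = np.array([[250.0, 500.0], [750.0, 500.0]])
print(thiessen_weights(samples, 1000.0, 1000.0))  # ~[0.5, 0.5] by symmetry
```

The grid resolution trades accuracy for speed; a true Thiessen construction (e.g., a Voronoi routine in a GIS package) gives exact polygon areas.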
The area inside each polygon is then used to weight the samples in statistical calculations, which
is, in essence, saying that the probability of encountering the concentration represented by a
certain polygon is equal to the proportion of the total exposure area represented by the area in
that polygon. The area-weighted concentration is calculated using the following formula (Isaaks
and Srivastava 1989):

x̄sc = Σ pici (summed over i = 1 through n)

Where:
x̄sc = area-weighted mean concentration (mg/kg)
ci = the concentration representing the condition within polygon i,
where there are i = 1 through n polygons, and
pi = the proportion of the total area that is incorporated in polygon i (unitless)
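Applied to the hypothetical dataset in Table 1, the formula reduces to a sum of polygon-area fractions times concentrations. A short Python sketch, with the areas and concentrations transcribed from Table 1:

```python
# Polygon areas (ft2) and PCB concentrations (ppm) from hypothetical Table 1
areas = [138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121, 3697,
         13297, 33087, 65541, 57494, 9014, 9994, 2123, 2341, 7446, 2613,
         44231, 2362, 8745, 5008, 18762, 24296, 51356, 12996, 18019,
         119814, 132012]
concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
         7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
         15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6]

total = sum(areas)
p = [a / total for a in areas]                     # p_i: area fractions
x_bar = sum(pi * ci for pi, ci in zip(p, concs))   # area-weighted mean

print(round(x_bar, 1))  # ~15.7 ppm, vs. the unweighted mean of 13.3 ppm
```

The weighted mean lands above the unweighted mean here because the two highest-concentration samples sit in two of the largest polygons.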
This approach includes sampling data from all portions of the exposure area and corrects for the
biases in the sampling data by taking into consideration the spacing of the samples within the
exposure area. This is the approach that was used by GE and approved by EPA for calculating
EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach
specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which
include a number of floodplain properties along the River). Hence, use of this approach for the
Lower Housatonic River would lead to a consistent approach for the different portions of the
overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at
other Superfund sites. For example, such an approach was used at the Koppers Company
Superfund site in Charleston, SC (EPA 1995), where biased sampling was also conducted. To
offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling
data in clusters by gridding the site and calculating an average concentration for each grid using
the sampling data available within it. Then the average concentrations for each grid were
averaged to yield an average concentration for the entire exposure area. In addition, at the
Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the
average goal across the exposure unit (EPA 1997b).
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board
(SAB) (EPA 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human
Health Evaluation Manual. In that review, the SAB stated that the

[e]stimate of the RME cannot ignore the distribution of contamination at the site and the
distribution of individual behaviors which lead to exposure. Rather, the spatial distribution
of the concentration over the site must be considered, along with a distribution reflecting the
relative frequency with which people are likely to visit different parts of the site. For this
reason, any summary measure of concentration (such as the average proposed in the RAGS
document) that does not take into account the spatial distribution of the underlying samples is
likely to be inadequate. Therefore, the Committee believes that the Agency should give
strong consideration to incorporating methods such as kriging or triangulation that take
into account the spatial distribution of contamination to characterize exposure. (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is
fully protective of human health and the environment. Indeed, in responding to comments on the
Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach
so long as sufficient data are available (EPA 1999). As EPA recognized, use of this approach
requires the availability of adequate data to support the calculation of representative spatial
averages over the exposure area. However, the need for adequate data exists regardless of the
averaging method used to derive the EPCs, and is particularly marked when one attempts to use
the H-statistic approach. Under the phased approach that EPA has outlined in the Human
Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the
95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the
conduct of additional sampling is warranted in order to avoid the use of the extreme and
unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate
additional samples as necessary for individual exposure areas to ensure that reasonable and
representative estimates of the average concentration can be calculated for the risk assessment of
each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most
representative means of calculating EPCs. However, if EPA rejects this approach, or for areas
where the sampling data are not considered sufficient to support use of such spatial averaging,
there are alternative methods of calculating the 95% UCL which would result in more reliable
and representative EPCs than use of the H-statistic. As discussed previously and demonstrated
by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), the H-statistic is likely to
substantially overestimate the 95% UCL, particularly where datasets are small. These EPA
contractors and consultants have evaluated and suggested alternative and superior approaches for
calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows
calculation of a probability distribution of EPCs and is able to offset some of the statistical
limitations associated with small datasets. It provides a method for calculating a more reliable
estimator of the EPC, particularly when datasets are small and data do not exactly fit either a
normal or lognormal distribution. This technique is described below and is recommended as an
alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for
exposure areas in the Lower Housatonic River risk assessment.
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process where a subpopulation of the overall
dataset of analytical values is selected at random and a mean is calculated. This mean is
archived, and the statistical sampling procedure is repeated, with replacement of the data. The
bootstrapped means are collected until their distribution stabilizes (this often requires many
iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95%
UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of
statistics, which states that if a population is repeatedly sampled for purposes of calculating an
arithmetic mean, the distribution of the means will ultimately take the form of a normal
distribution. This Theorem applies to any population, regardless of its statistical distribution.
Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples,
bootstrapping models select repeatedly from a single dataset.
This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and
is more technically justifiable. The reduction obtained from this procedure varies according to
the amount of data available for sampling and its variance, but is consistently lower than the 95%
H-UCL (EPA 1997a; Schulz and Griffin 1999). The difference between these two measures is
more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is
particularly well-suited to exposure areas, like many along the Lower Housatonic River, where
sampling is limited.
When sample sizes are small, the distribution for the bootstrap run may not appear normal. This
is because the number of samples required to be included in each subsample in order to approach
normality is larger than is available from the site data. This can be mitigated by further
sampling. It should be noted, however, that as the sample size increases, the variance becomes
tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95%
UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95%
UCL that might be obtained from a larger dataset.
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95% UCL than the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area or the biases associated with it. To address this, a modified bootstrapping approach (spatial bootstrapping), which incorporates area-weighting into the bootstrapping method to reflect the locations and densities of the samples collected on each property, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.
Spatial bootstrapping combines area-weighted averaging of the sample data with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by that sample's polygon.
Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded to the closest integer. An expanded dataset is then created from the original dataset to reflect the ratio of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value; the resulting values represent the area-weighted number of replicates of each sample result in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7 and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 x 5) entries.
Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
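The three steps can be sketched as follows. This is an illustrative sketch only: step 1 (constructing the Thiessen polygons) is assumed to have been completed already, so the polygon areas are passed in directly, and all sample values are hypothetical:

```python
import numpy as np

def spatial_bootstrap_ucl95(concs, areas, expansion=5, n_draws=250,
                            n_iter=5000, seed=0):
    """Area-weighted (spatial) bootstrap of the 95% UCL.

    concs : sample concentrations, one per Thiessen polygon
    areas : the corresponding polygon areas (step 1 assumed done)
    """
    rng = np.random.default_rng(seed)
    concs = np.asarray(concs, dtype=float)
    areas = np.asarray(areas, dtype=float)

    # Step 2: normalize polygon areas to the smallest one, round to the
    # nearest integer, and expand the dataset so each sample appears
    # (rounded ratio) x (expansion factor) times.
    ratios = np.rint(areas / areas.min()).astype(int)
    expanded = np.repeat(concs, ratios * expansion)

    # Step 3: repeatedly compute means of draws (with replacement) from
    # the expanded dataset; the 95th percentile of those means is the UCL.
    means = [rng.choice(expanded, size=n_draws).mean() for _ in range(n_iter)]
    return float(np.percentile(means, 95))

# Hypothetical samples: concentrations (ppm) and polygon areas (ft^2)
concs = [0.5, 42.0, 3.1, 8.8]
areas = [138219.0, 2123.0, 44231.0, 9014.0]
print(round(spatial_bootstrap_ucl95(concs, areas), 2))
```

Note how the one high concentration (42.0 ppm) carries little weight here because its polygon is small; a conventional bootstrap would weight all four samples equally.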
This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA 1997a; Schulz and Griffin 1999).
The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples in each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data; thus, it can be applied successfully to all sampling datasets, including those that fit neither a normal nor a lognormal distribution.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different approaches: (1) the EPA approach (arithmetic mean and H-statistic); (2) an area-weighted average; (3) a bootstrap; and (4) a spatial bootstrap. Results of this analysis are provided in Table 2.
As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained from straight bootstrapping; this is consistent with the Central Limit Theorem (EPA 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas; thus, they are weighted more heavily in approaches that include area-weighting.
Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatially bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
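The first three of these comparisons can be reproduced with a short script. This is a sketch using the Table 1 values (decimal points restored from the garbled scan); the H-UCL itself is omitted here because it requires the H values tabulated by Land (1975), and Ogden's actual calculations may differ in detail:

```python
import numpy as np

# Table 1 values (decimal points restored from the scan)
areas = np.array([138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121,
                  3697, 13297, 33087, 65541, 57494, 9014, 9994, 2123, 2341,
                  7446, 2613, 44231, 2362, 8745, 5008, 18762, 24296, 51356,
                  12996, 18019, 119814, 132012], dtype=float)
concs = np.array([0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77,
                  7.58, 7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7,
                  14.3, 14.3, 15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9,
                  38.9, 40.6, 42.6])

arith_mean = concs.mean()                         # ~13.3 ppm
weighted_mean = np.average(concs, weights=areas)  # area-weighted average

# Straight bootstrap 95% UCL of the arithmetic mean
rng = np.random.default_rng(0)
means = [rng.choice(concs, size=concs.size).mean() for _ in range(5000)]
boot_ucl = float(np.percentile(means, 95))

print(f"arithmetic mean    = {arith_mean:.2f} ppm")
print(f"area-weighted mean = {weighted_mean:.2f} ppm")
print(f"bootstrap 95% UCL  = {boot_ucl:.2f} ppm")
```

Consistent with the discussion above, the area-weighted mean comes out somewhat higher than the arithmetic mean (samples 29 and 30 have large polygons), and the bootstrapped 95% UCL falls near 17 ppm, far below the 106 ppm H-UCL reported in Table 2.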
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA 1997a) and Schulz and Griffin (1999). These limitations arise despite the fact that the sampling dataset for this hypothetical property is reasonably large (30 samples) and that the range of concentrations is fairly typical of those likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL well in excess of the highest measured concentration, with the result that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. This result is clearly disturbing, in that it occurs for a property with a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. Given the tendency of the H-statistic to overestimate the 95% UCL, it is likely that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable: it will result in the overestimation of site risks, and it could also result in the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above and demonstrated by EPA contractors and consultants (EPA 1997a; Schulz and Griffin 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can and frequently does produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area when the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA 1989, 1993), and it would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½-Mile Reach of the River and for the Removal Actions Outside the River, including certain floodplain properties along the River.
However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA 1997a; Schulz and Griffin 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.
EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant), Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision, Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. US Environmental Protection Agency, Region I. October.
Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River: Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
1                138219                0.022
2                105217                0.028
3                110214                1.07
4                31021                 1.63
5                11399                 1.67
6                7004                  2.01
7                14268                 2.19
8                12121                 3.25
9                3697                  3.77
10               13297                 7.58
11               33087                 7.61
12               65541                 7.62
13               57494                 7.71
14               9014                  7.99
15               9994                  8.72
16               2123                  9.69
17               2341                  9.72
18               7446                  12.7
19               2613                  14.3
20               44231                 14.3
21               2362                  15.4
22               8745                  16.3
23               5008                  18.7
24               18762                 19.1
25               24296                 20.2
26               51356                 30.8
27               12996                 32.9
28               18019                 38.9
29               119814                40.6
30               132012                42.6
Table 2. Mean and 95% UCL EPCs for the Hypothetical Property Calculated Using Four Approaches

[The body of this table is not legible in the scanned copy. As discussed in the text, the arithmetic and bootstrapped means are each 13.3 ppm; the area-weighted and spatial bootstrap means are identical to each other and somewhat higher; and the 95% UCLs are 106 ppm (H-statistic), 17.06 ppm (bootstrap), and 16.38 ppm (spatial bootstrap).]
ATTACHMENT A
A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS
Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co Inc
Introduction
The US Environmental Protection Agency's (EPA's) final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student's t-statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets for which no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and on how well they represent the actual levels of exposure that are likely to occur under various scenarios.
There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student's t-statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic frequently results in extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.
GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating EPCs that will produce more representative results. In GE's view, the most appropriate approach is the use of the area-weighting, or spatial averaging, technique that has been approved and specified by EPA for the banks of the Upper ½-Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic, namely the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper also illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.
Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL
In 1992, EPA issued a guidance document (EPA 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area; thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.
The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA 1992):
UCL = exp(x̄ + 0.5s² + sH/√(n-1))

Where:

UCL = the specified limit (i.e., the 95% UCL) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic accounting for the interaction of the distribution, developed by Land (1975)
n = the number of analytical samples
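The H-UCL calculation itself is a one-liner once the H value is known; H must be looked up in Land's (1975) tables for the observed standard deviation and sample size. The numeric inputs below are purely illustrative (they are not taken from those tables), and serve only to show how rapidly the exponential inflates the estimate:

```python
import math

def h_ucl(log_mean, log_sd, n, H):
    """95% H-UCL: exp(x_bar + 0.5*s**2 + s*H / sqrt(n - 1)),
    where x_bar and s are the mean and standard deviation of the
    log-transformed data and H comes from Land's (1975) tables."""
    return math.exp(log_mean + 0.5 * log_sd ** 2
                    + log_sd * H / math.sqrt(n - 1))

# Illustrative inputs only; in practice H must come from Land's tables.
print(round(h_ucl(log_mean=1.5, log_sd=1.8, n=20, H=3.2), 1))
```

Because both the 0.5s² term and the sH/√(n-1) term sit inside the exponential, a large standard deviation of the log-transformed data drives the H-UCL up multiplicatively, which is the inflation behavior discussed below.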
Frequently, the H-statistic calculation produces extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.
Use of the H-statistic also does not produce a reliable and representative 95% UCL when the data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and its associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.
Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling datasets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.
Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when the exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.
Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and it does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½-Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA 1999).
In view of these considerations, EPA contractors and consultants have recommended that the H-statistic not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA 1997a) recommended that "in environmental applications the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. ... [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased toward more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the Lower Housatonic River; consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected to be contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL, because it is unlikely to be representative of the overall exposure area.
This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al. 1995); in more complex approaches, unbiased estimates of how concentration and variance change with distance are developed, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging; Isaaks and Srivastava 1989).
Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so easy to apply is that the construction of Thiessen polygons is a built-in operation in most commercially available Geographic Information System (GIS) applications.
The area inside each polygon is then used to weight the samples in statistical calculations. This is in essence saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area of that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava 1989):

x̄_sc = Σ (p_i × C_i), summed over polygons i = 1 through n

Where:

x̄_sc = the area-weighted mean concentration (mg/kg)
C_i = the concentration representing the conditions within polygon i, where there are i = 1 through n polygons, and
p_i = the proportion of the total area that is incorporated in polygon i (unitless)
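A simple rasterized sketch of this calculation follows: assigning each cell of a fine grid to its nearest sample approximates the Thiessen polygon areas, which supplies the p_i weights. All coordinates and concentrations below are hypothetical, and a GIS would normally compute the polygons exactly rather than by gridding:

```python
import numpy as np

def thiessen_weighted_mean(xs, ys, concs, x_range, y_range, cells=200):
    """Approximate x_bar_sc = sum(p_i * C_i) by rasterizing the exposure
    area: each grid cell is assigned to its nearest sample, so the share
    of cells per sample approximates its Thiessen polygon proportion p_i."""
    gx, gy = np.meshgrid(np.linspace(*x_range, cells),
                         np.linspace(*y_range, cells))
    grid_pts = np.column_stack([gx.ravel(), gy.ravel()])
    samples = np.column_stack([xs, ys])
    # squared distance from every grid cell to every sample location
    d2 = ((grid_pts[:, None, :] - samples[None, :, :]) ** 2).sum(axis=2)
    counts = np.bincount(d2.argmin(axis=1), minlength=len(concs))
    p = counts / counts.sum()            # polygon area proportions p_i
    return float(np.dot(p, np.asarray(concs, dtype=float)))

# Four hypothetical samples at the corners of a square exposure area
m = thiessen_weighted_mean(xs=[0, 1, 0, 1], ys=[0, 0, 1, 1],
                           concs=[1.0, 2.0, 3.0, 4.0],
                           x_range=(0.0, 1.0), y_range=(0.0, 1.0))
print(round(m, 2))
```

For this symmetric layout each sample claims a quarter of the area, so the area-weighted mean coincides with the simple average; with unequal polygons the two diverge, which is the point of the weighting.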
This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½-Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid cell using the sampling data available within it. The average concentrations for the grid cells were then averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA 1997b).
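The gridding approach described for the Koppers site can be sketched as follows. The grid size, coordinates, and concentrations are hypothetical, and the actual Koppers analysis may have differed in detail:

```python
import numpy as np

def grid_cell_average(xs, ys, concs, n_cells=4, extent=1000.0):
    """Grid-averaging sketch: overlay an n x n grid on the exposure area,
    average the samples that fall in each occupied cell, then average the
    cell means so that densely sampled hot spots are not over-weighted."""
    edges = np.linspace(0.0, extent, n_cells + 1)
    ix = np.clip(np.searchsorted(edges, xs, side="right") - 1, 0, n_cells - 1)
    iy = np.clip(np.searchsorted(edges, ys, side="right") - 1, 0, n_cells - 1)
    by_cell = {}
    for cx, cy, c in zip(ix, iy, concs):
        by_cell.setdefault((cx, cy), []).append(c)
    return float(np.mean([np.mean(v) for v in by_cell.values()]))

# Three clustered hot-spot samples plus one clean sample far away
epc = grid_cell_average(xs=[10, 20, 30, 900], ys=[15, 25, 35, 910],
                        concs=[30.0, 32.0, 34.0, 1.0])
print(epc)
```

Here the three clustered samples collapse into a single cell mean before the final average, so the clean area contributes on equal footing with the hot spot rather than being outvoted three to one.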
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the

[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure. (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples as necessary for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously and demonstrated by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process whereby a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
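The procedure described above can be sketched in a few lines of Python. This is a minimal illustration only, not part of the Work Plan; the function name, dataset, seed, and iteration count are hypothetical choices for the sketch.

```python
import random
import statistics

def bootstrap_95_ucl(data, n_iterations=5000, seed=42):
    """Bootstrap estimate of the 95% UCL of the arithmetic mean.

    Each iteration draws len(data) values from the dataset at random
    with replacement and archives the mean of that resample; the 95th
    percentile of the resulting distribution of means is taken as the
    95% UCL of the mean.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = sorted(
        statistics.fmean(rng.choices(data, k=len(data)))
        for _ in range(n_iterations)
    )
    # 95th percentile of the bootstrapped distribution of means
    return means[int(0.95 * n_iterations) - 1]

# Hypothetical PCB concentrations (ppm) for a small exposure area
concentrations = [0.25, 0.8, 1.1, 2.3, 3.9, 4.4, 6.8, 9.5, 14.2, 31.0]
ucl = bootstrap_95_ucl(concentrations)
```

Because each resample is drawn only from the observed data, the resulting UCL cannot exceed the maximum observed concentration, in contrast to the behavior of the H-UCL discussed elsewhere in these comments.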
This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA 1997a; Schulz and Griffin 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.
When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that accounts for the locations and densities of the samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset to reflect the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7 and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.
Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
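The three steps above can be combined into a short sketch. The code and parameter names are hypothetical; in practice the polygon areas would come from a GIS, and the expansion factor and subsample size would be chosen as described in the text.

```python
import random
import statistics

def spatial_bootstrap_95_ucl(concs, areas, expansion_factor=5,
                             subsample_size=250, n_iterations=5000, seed=42):
    """Area-weighted (spatial) bootstrap of the 95% UCL.

    Steps 1-2: normalize each Thiessen polygon area to the smallest
    polygon, round to the nearest integer, and multiply by an expansion
    factor to build an expanded dataset in which each sample result
    appears in proportion to the area it represents.
    Step 3: repeatedly average random subsamples (drawn with
    replacement) and take the 95th percentile of the archived means.
    """
    smallest = min(areas)
    expanded = []
    for conc, area in zip(concs, areas):
        replicates = round(area / smallest) * expansion_factor
        expanded.extend([conc] * replicates)
    rng = random.Random(seed)  # fixed seed for reproducibility
    means = sorted(
        statistics.fmean(rng.choices(expanded, k=subsample_size))
        for _ in range(n_iterations)
    )
    return means[int(0.95 * n_iterations) - 1]
```

For the sample X example in the text, a polygon seven times the size of the smallest polygon, expanded by a factor of five, contributes 35 replicates of that sample's result to the dataset from which the bootstrap draws.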
This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA 1997a; Schulz and Griffin 1999).
The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, this approach can be used successfully for all sampling datasets, including those that do not fit either a normal or lognormal distribution.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) the EPA approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.
As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained from straight bootstrapping. This is consistent with the Central Limit Theorem (EPA 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.
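The mean comparison can be reproduced directly from the Table 1 data. This is an illustrative sketch using the values as reconstructed in Table 1; it is not part of the original analysis.

```python
import statistics

# Table 1 data: (polygon area in ft2, total PCB in ppm) for 30 samples
samples = [
    (138219, 0.022), (105217, 0.028), (110214, 1.07), (31021, 1.63),
    (11399, 1.67), (7004, 2.01), (14268, 2.19), (12121, 3.25),
    (3697, 3.77), (13297, 7.58), (33087, 7.61), (65541, 7.62),
    (57494, 7.71), (9014, 7.99), (9994, 8.72), (2123, 9.69),
    (2341, 9.72), (7446, 12.7), (2613, 14.3), (44231, 14.3),
    (2362, 15.4), (8745, 16.3), (5008, 18.7), (18762, 19.1),
    (24296, 20.2), (51356, 30.8), (12996, 32.9), (18019, 38.9),
    (119814, 40.6), (132012, 42.6),
]
areas = [a for a, _ in samples]
concs = [c for _, c in samples]

arithmetic_mean = statistics.fmean(concs)  # 13.3 ppm
# Each sample weighted by the fraction of the exposure area its polygon covers
area_weighted_mean = sum(a * c for a, c in samples) / sum(areas)
```

The area-weighted mean comes out somewhat higher than the arithmetic mean because the two highest concentrations (samples 29 and 30) sit in two of the largest polygons.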
Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL (106 ppm) that exceeds the highest measured concentration (42.6 ppm) by a factor of approximately 2.5, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in areas of high concentration, and/or sample numbers are more limited. Given the tendency of the H-statistic to overestimate the 95% UCL, it is likely that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above and demonstrated by EPA contractors and consultants (EPA 1997a; Schulz and Griffin 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data
necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA 1989, 1993) and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and for the Removal Actions Outside the River, including certain floodplain properties along the River.
However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA 1997a; Schulz and Griffin 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant), Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River. Final. Roy F. Weston, Inc., Manchester, NH. DCN: GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
 1                  138,219                0.022
 2                  105,217                0.028
 3                  110,214                1.07
 4                   31,021                1.63
 5                   11,399                1.67
 6                    7,004                2.01
 7                   14,268                2.19
 8                   12,121                3.25
 9                    3,697                3.77
10                   13,297                7.58
11                   33,087                7.61
12                   65,541                7.62
13                   57,494                7.71
14                    9,014                7.99
15                    9,994                8.72
16                    2,123                9.69
17                    2,341                9.72
18                    7,446               12.7
19                    2,613               14.3
20                   44,231               14.3
21                    2,362               15.4
22                    8,745               16.3
23                    5,008               18.7
24                   18,762               19.1
25                   24,296               20.2
26                   51,356               30.8
27                   12,996               32.9
28                   18,019               38.9
29                  119,814               40.6
30                  132,012               42.6
Table 2. Mean and 95% UCL EPC Estimates for the Hypothetical Property, Calculated Using Four Approaches (EPA approach [arithmetic mean/H-statistic]; area-weighted average; bootstrap; spatial bootstrap)

[Values discussed in the text: arithmetic and bootstrapped means, 13.3 ppm; bootstrapped and spatial bootstrapped 95% UCLs, 17.06 and 16.38 ppm, respectively; H-statistic 95% UCL, 106 ppm, which exceeds the maximum detected concentration of 42.6 ppm.]
A COMPARISON OF ALTERNATIVE METHODS
FOR CALCULATING EXPOSURE POINT CONCENTRATIONS
Prepared on Behalf of the General Electric Company
by Ogden Environmental and Energy Services Co Inc
Introduction
The US Environmental Protection Agency's (EPA's) final Human Health Risk Assessment Work Plan (Work Plan), which is Chapter 6 of EPA's final Supplemental Investigation Work Plan for the Lower Housatonic River (Weston 2000), states that the exposure point concentration (EPC) for each exposure area will be based on the 95 percent upper confidence limit (95% UCL) of the site data or the maximum detected concentration, whichever is lower (Section 6.3.4.4). The Work Plan states that the 95% UCL will be calculated using the Student-t statistic for normally distributed data and Land's H-statistic (H-UCL) for lognormally distributed data (p. 6-32). No plan is presented for estimating the mean from datasets where no parametric distribution can be defined. The way in which the EPCs are calculated can have a substantial impact on the values derived and their representation of the actual levels of exposure that are likely to occur under various scenarios.
There are serious limitations associated with using EPA's proposed approach for calculating EPCs. While the Student-t statistic may be appropriate when the sampling data are normally distributed, environmental samples seldom resemble a strict normal distribution. Consequently, if EPA were to assume instead that the data for each exposure area are lognormally distributed, then use of the H-statistic to calculate the EPCs for these areas may substantially impact the results of the risk assessment. Use of the H-statistic to calculate the 95% UCL frequently results in the calculation of extraordinarily high 95% UCL values. This tendency is reflected in EPA's policy stating that when the 95% UCL is greater than the maximum observed concentration, the maximum value should instead be used as the EPC. However, both the inflated 95% UCL and the maximum value are poor estimates of the actual levels of exposure that will occur within a given exposure area.
GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will result in more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½ Mile Reach of the Housatonic River and for the Removal Actions Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic: namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.
Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL
In 1992, EPA issued a guidance document (EPA 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area. Thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.
The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA 1992):
UCL = exp( x̄ + 0.5·s² + s·H / √(n − 1) )

Where:

UCL = the specified limit (i.e., 95%) on the estimate of the arithmetic mean
x̄ = the mean of the log-transformed data
s² = the variance of the log-transformed sample distribution
s = the standard deviation of the log-transformed sample distribution
H = a statistic accounting for the characteristics of the distribution, developed by Land (1975)
n = number of analytical samples
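In code, the computation is straightforward once H is known; the sketch below is illustrative only, and the H value must be looked up in Land's (1975) tables for the given sample size and log-standard deviation (there is no closed form), so it is supplied here by the caller.

```python
import math
import statistics

def h_ucl(data, h):
    """95% H-UCL per the equation above.

    `h` is Land's H-statistic, which depends on the number of samples
    and on the standard deviation of the log-transformed data, and must
    be obtained from Land's (1975) tables.
    """
    logs = [math.log(x) for x in data]
    n = len(logs)
    x_bar = statistics.fmean(logs)   # mean of the log-transformed data
    s = statistics.stdev(logs)       # std. dev. of the log-transformed data
    return math.exp(x_bar + 0.5 * s ** 2 + s * h / math.sqrt(n - 1))
```

Note the exp(0.5·s²) term: as the spread of the log-transformed data grows, the computed UCL grows exponentially, which is the mechanism behind the extreme values discussed below.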
Frequently, the H-statistic produces extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.
Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.
Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling data sets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order of magnitudes higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable as it becomes orders of magnitude higher than the largest concentration observed when the [standard deviation] of the log-transformed data starts exceeding 1.0." These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.
Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when the exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.
Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA 1999).
In view of these considerations, EPA contractors and consultants have recommended that the H-statistic not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on a comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA 1997a) recommended that "in environmental applications, the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less....[U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the Lower Housatonic River; consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.
This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, unbiased estimates of how concentration and variance change with distance are used to construct a spatial grid of estimated concentrations (ordinary kriging; Isaaks and Srivastava, 1989).
Because kriging is data intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular bisectors that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so easy to apply is that the construction of Thiessen polygons is a
hard-wired operation in most commercially available Geographic Information System (GIS) applications.
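Where a GIS is not at hand, Thiessen area weights can also be approximated numerically by discretizing the exposure area into a fine grid and assigning each cell to its nearest sample point; the fraction of cells assigned to a sample converges to the fraction of the area covered by its polygon. A minimal sketch in Python, using hypothetical coordinates and a rectangular exposure area (this grid approximation is our illustration, not a method from the Work Plan):

```python
def thiessen_weights(samples, width, height, step=1.0):
    """Approximate Thiessen-polygon area weights by nearest-neighbor
    assignment of grid-cell centers; samples is a list of (x, y) points."""
    counts = [0] * len(samples)
    y = step / 2.0
    while y < height:
        x = step / 2.0
        while x < width:
            # assign this grid cell to the closest sampling location
            nearest = min(
                range(len(samples)),
                key=lambda i: (samples[i][0] - x) ** 2 + (samples[i][1] - y) ** 2,
            )
            counts[nearest] += 1
            x += step
        y += step
    total = sum(counts)
    return [c / total for c in counts]

# two samples placed symmetrically split a 10 x 10 area evenly
weights = thiessen_weights([(2.5, 5.0), (7.5, 5.0)], 10.0, 10.0)
```

A finer grid step gives a closer approximation to the true polygon areas at the cost of more computation.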
The area inside each polygon is then used to weight the samples in statistical calculations, which is in essence saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava, 1989):
x̄_sc = Σ (C_i × p_i), summed over i = 1 through n

where:

x̄_sc = the area-weighted mean concentration (mg/kg);
C_i = the concentration representing the condition within polygon i, where there are i = 1 through n polygons; and
p_i = the proportion of the total area that is incorporated in polygon i (unitless).
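In code, the formula reduces to a weighted sum over the polygon proportions. A minimal sketch (the values are illustrative, not site data):

```python
def area_weighted_mean(concs, areas):
    """Area-weighted mean concentration: sum of C_i * p_i, where p_i is
    the proportion of the total exposure area inside polygon i."""
    total = float(sum(areas))
    return sum(c * (a / total) for c, a in zip(concs, areas))

# a 10 ppm hot spot covering only 10% of the area contributes 1 ppm:
x_bar = area_weighted_mean([1.0, 10.0], [90.0, 10.0])  # approximately 1.9
```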
This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE, and approved by EPA, for calculating EPCs for the banks of the Upper ½-Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then the average concentrations for each grid were
averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average across the exposure unit (EPA, 1997b).
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the draft Risk Assessment Guidance for Superfund: Human Health Evaluation Manual. In that review, the SAB stated that the:

"[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure." (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½-Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and
representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and as demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative, superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal
distribution. This theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models this repeated sampling by selecting repeatedly from a single dataset.
This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.
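The procedure described above can be sketched directly with the Python standard library. The iteration count, the use of the full sample size for each resample, and the fixed seed are our illustrative choices, not prescriptions from this paper:

```python
import random

def bootstrap_ucl(data, iterations=5000, seed=1):
    """95% UCL of the mean, taken as the 95th percentile of the
    distribution of bootstrap resample means (resampling with replacement)."""
    rng = random.Random(seed)  # fixed seed for a repeatable illustration
    n = len(data)
    means = []
    for _ in range(iterations):
        resample = [rng.choice(data) for _ in range(n)]  # with replacement
        means.append(sum(resample) / n)
    means.sort()
    return means[int(0.95 * iterations)]
```

Note that no distributional assumption is made at any step; the percentile is read directly off the empirical distribution of resample means.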
When sample sizes are small, the distribution for the bootstrap run may not appear normal. This occurs because the number of samples required in each subsample in order to approach normality is larger than is available from the site data, and it can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance of the distribution of means becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of samples collected on each property. This modified bootstrapping approach (spatial bootstrapping), which incorporates area-weighting into the bootstrap method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.
Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded, normalized areas by a selected value; the resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.
Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
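The three steps above can be sketched as follows. The expansion factor of five, the resample size of 250, and the 5,000 iterations follow the description in the text; the function name and fixed seed are illustrative:

```python
import random

def spatial_bootstrap_ucl(concs, areas, expansion=5, resample_size=250,
                          iterations=5000, seed=1):
    """95% UCL from an area-weighted (spatial) bootstrap: each sample is
    replicated in proportion to its normalized Thiessen polygon area,
    then resample means are drawn with replacement from the expanded set."""
    rng = random.Random(seed)
    smallest = min(areas)
    expanded = []
    for c, a in zip(concs, areas):
        # round(a / smallest) is the rounded normalized area;
        # 'expansion' scales it to build the expanded dataset
        expanded.extend([c] * (round(a / smallest) * expansion))
    means = []
    for _ in range(iterations):
        resample = [rng.choice(expanded) for _ in range(resample_size)]
        means.append(sum(resample) / resample_size)
    means.sort()
    return means[int(0.95 * iterations)]
```

The 50th percentile of the archived means approximates the area-weighted mean, and the 95th percentile is taken as the UCL, as described above.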
This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).
The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data; thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.
c DOCS JRB GEY2oooEPCplaquofr2 in 12 Ogden Environmental and Energy Services
These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) EPA's approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.
As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping; this is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas; thus, they are weighted more heavily in approaches that include area-weighting.
Estimates of the 95% UCL, however, vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
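The arithmetic mean cited above can be checked directly from the Table 1 concentrations (reading the tabulated values with their implied decimal points):

```python
# Total PCB concentrations (ppm) for the 30 hypothetical samples in Table 1
concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
         7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
         15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6]

mean = sum(concs) / len(concs)
print(round(mean, 1))  # 13.3, matching the arithmetic mean cited in the text
```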
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large (containing 30 samples) and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few
showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL that exceeds the more reliable bootstrapped estimates by a factor of more than six and even exceeds the highest measured concentration, with the result that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in high-concentration areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above, and as demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data
necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993), and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½-Mile Reach of the River and for the Removal Actions Outside the River, including certain floodplain properties along the River.
However, if EPA rejects this approach (or for areas where EPA determines that the data are insufficient to support that approach and decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.
EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A). Interim Final. U.S. Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.
EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. U.S. Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. U.S. Environmental Protection Agency, Region I. October.
Isaaks, E.H. and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River: Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
      1               138,219               0.022
      2               105,217               0.028
      3               110,214               1.07
      4                31,021               1.63
      5                11,399               1.67
      6                 7,004               2.01
      7                14,268               2.19
      8                12,121               3.25
      9                 3,697               3.77
     10                13,297               7.58
     11                33,087               7.61
     12                65,541               7.62
     13                57,494               7.71
     14                 9,014               7.99
     15                 9,994               8.72
     16                 2,123               9.69
     17                 2,341               9.72
     18                 7,446              12.7
     19                 2,613              14.3
     20                44,231              14.3
     21                 2,362              15.4
     22                 8,745              16.3
     23                 5,008              18.7
     24                18,762              19.1
     25                24,296              20.2
     26                51,356              30.8
     27                12,996              32.9
     28                18,019              38.9
     29               119,814              40.6
     30               132,012              42.6
Table 2. Mean and 95% UCL EPCs for the Hypothetical Property Calculated Using Four Approaches

[The table graphic could not be recovered from the source. As discussed in the Sample Calculation text, the arithmetic and bootstrapped means are 13.3 ppm; the area-weighted and spatial bootstrap means are identical to each other and somewhat higher; the bootstrap and spatial bootstrap 95% UCLs are 17.06 and 16.38 ppm, respectively; and the H-statistic 95% UCL is 106 ppm, which exceeds the maximum detected concentration of 42.6 ppm.]
the maximum value are poor estimates of actual levels of exposure that will occur within a given exposure area.
GE believes that EPA's proposed approach should not be used to calculate EPCs for the risk assessment of the Lower Housatonic River. Rather, GE urges EPA to use alternative methods, approved by EPA scientists and other experts, for calculating EPCs that more accurately represent the actual exposure levels that may be experienced by individuals who come into contact with contaminated soil or sediment. This paper provides an analysis of the limitations associated with using the H-statistic to calculate EPCs for each exposure area. In addition, it presents an evaluation of alternative approaches for calculating the EPCs that will produce more representative results. In GE's view, the most appropriate approach is the use of the area-weighting or spatial averaging technique that has been approved and specified by EPA for the banks of the Upper ½-Mile Reach of the Housatonic River and for the Areas Outside the River under the October 1999 Consent Decree. However, if and where that approach is not used, this paper describes other alternative techniques for deriving a 95% UCL that is much more representative of actual exposure point concentrations than use of the H-statistic: namely, the bootstrapping technique and a modification of that approach known as spatial bootstrapping. Of these, the latter is preferable because it takes into account the spatial distribution of the sampling data. This paper also illustrates the EPCs that would be calculated for an example property using the various methods discussed herein.
Problems Associated with the Use of the H-Statistic to Calculate the 95% UCL
In 1992, EPA issued a guidance document (EPA, 1992) indicating that the appropriate concentration to use in estimating direct-contact exposures to compounds in soil is the arithmetic mean of the observed concentrations within an exposure area. This position is based on the concept that chronic direct-contact exposure to soils most likely occurs as a result of short, repeated, random contact events throughout the identified exposure area; thus, the mean concentration is reflective of the long-term exposure concentration. Because it is this nature of the exposure activity that determines the EPC, the arithmetic mean is the appropriate estimator of the EPC regardless of the statistical distribution of observed sample concentrations.
The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region I (EPA, 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.

For lognormally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA, 1992):
UCL = exp(x̄ + 0.5s² + sH/√(n-1))

where:

UCL = a specified limit (i.e., the 95% UCL) on the estimate of the arithmetic mean;
x̄ = the mean of the log-transformed data;
s² = the variance of the log-transformed sample distribution;
s = the standard deviation of the log-transformed sample distribution;
H = a statistic accounting for the interaction of the distribution, developed by Land (1975); and
n = the number of analytical samples.
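The formula can be written out in a few lines of Python; note that H itself is not computed in closed form but is looked up in Land's (1975) tables as a function of s and n, so the sketch below takes it as an argument (a purely illustrative interface):

```python
import math

def h_ucl(data, H):
    """95% H-UCL: exp(x_bar + 0.5*s^2 + s*H / sqrt(n - 1)), where x_bar,
    s^2, and s describe the log-transformed data and H is Land's statistic,
    supplied from Land's (1975) tables."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    x_bar = sum(logs) / n
    s2 = sum((v - x_bar) ** 2 for v in logs) / (n - 1)  # sample variance of logs
    s = math.sqrt(s2)
    return math.exp(x_bar + 0.5 * s2 + s * H / math.sqrt(n - 1))
```

Because s enters the exponent both as s² and multiplied by H, a large log-scale standard deviation inflates the result multiplicatively, which is the behavior criticized by Singh et al. (EPA, 1997a).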
Frequently, the H-statistic produces extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.
Use of the H-statistic also does not produce a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers," and not necessarily due to lognormally distributed data. Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.
Even when the data are lognormally distributed, the H-statistic can produce concentrations that are far higher than the true mean. In their recent study, Singh et al. (EPA, 1997a) evaluated and compared UCLs calculated through different statistical estimation methods using a variety of sampling datasets, and reported that the UCLs obtained using the H-statistic were consistently larger and less accurate than the UCLs calculated using other statistical approaches. They noted that "it is observed that the H-UCL becomes order[s] of magnitude higher even when the data were obtained from a lognormal population and can lead to incorrect conclusions. This is especially true for samples of smaller sizes (e.g., < 30)." They stated further that "[t]he practical merit of the H-UCL in environmental applications is questionable," as it becomes orders of magnitude higher than the largest concentration observed when the standard deviation of the log-transformed data starts exceeding 1.0. These contractors concluded that use of the H-UCL approach can yield unusually high false positives, which would result in unnecessary cleanup. The same conclusions were reached by EPA consultants Schulz and Griffin (1999) in an analysis of hazardous waste sites in EPA Region 8. They noted that the H-statistic may
overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous waste sites.
Finally, the use of the H-statistic is not appropriate when there are indications that the arithmetic mean is not correctly estimated within the calculation. This difficulty occurs when the exponential estimator of the mean is not equal to the typical calculation of the sum of all samples divided by the number of samples.
Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that, when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and it does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½-Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA, 1999).
In view of these considerations, EPA contractors and consultants have recommended that the H-statistic should not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA, 1997a) recommended that in environmental applications "the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. . . . [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the lower Housatonic River. Consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.
This problem can be substantially mitigated by area-weighting the sample results, so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, unbiased estimates of concentration and variance change with distance, and the results are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava, 1989).

Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so practical is that the construction of Thiessen polygons is a built-in operation in most commercially available Geographic Information System (GIS) applications.
The area inside each polygon is then used to weight the samples in statistical calculations. This is, in essence, saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava, 1989):

X(bar)_aw = sum over i of (p_i × C_i)

Where:

X(bar)_aw = area-weighted mean concentration (mg/kg);

C_i = the concentration representing the condition within polygon i, where there are i = 1 through n polygons; and

p_i = the proportion of the total area that is incorporated in polygon i (unitless).
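To illustrate, the construction can be sketched numerically. The snippet below approximates Thiessen polygon areas by assigning each cell of a fine grid to its nearest sample location (a simple stand-in for the exact GIS polygon construction described above), then applies the area-weighted mean formula. The sample coordinates and concentrations are hypothetical.

```python
import numpy as np

def thiessen_areas(xy, width, height, n=200):
    """Approximate Thiessen polygon areas by assigning each cell of an
    n-by-n grid to its nearest sample location."""
    xs = (np.arange(n) + 0.5) * (width / n)   # cell-center x coordinates
    ys = (np.arange(n) + 0.5) * (height / n)  # cell-center y coordinates
    gx, gy = np.meshgrid(xs, ys)
    cells = np.column_stack([gx.ravel(), gy.ravel()])
    # nearest sample for every grid cell
    owner = np.linalg.norm(cells[:, None, :] - xy[None, :, :], axis=2).argmin(axis=1)
    cell_area = (width / n) * (height / n)
    return np.bincount(owner, minlength=len(xy)) * cell_area

# Hypothetical samples on a 1,000 ft x 1,000 ft exposure area
xy = np.array([[250.0, 400.0], [700.0, 300.0], [500.0, 850.0]])
conc = np.array([2.0, 10.0, 40.0])             # PCB concentrations (ppm)

areas = thiessen_areas(xy, 1000.0, 1000.0)
p = areas / areas.sum()                        # proportion of area per polygon
x_aw = float(np.sum(p * conc))                 # area-weighted mean, sum(p_i * C_i)
```

Because every grid cell is assigned to exactly one sample, the approximated polygon areas sum to the full exposure area, and the weighted mean necessarily falls between the minimum and maximum sample concentrations.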
This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper 1/2 Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then the average concentrations for each grid were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the

"[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods, such as kriging or triangulation, that take into account the spatial distribution of contamination to characterize exposure." (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper 1/2 Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples as necessary for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process where a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
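The procedure described above can be sketched in a few lines. The iteration count and the hypothetical input data below are illustrative choices, not values taken from the Work Plan.

```python
import numpy as np

def bootstrap_ucl95(data, n_iter=5000, seed=0):
    """95% UCL of the mean as the 95th percentile of bootstrapped means.
    Each iteration draws len(data) values with replacement and records the mean."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    means = rng.choice(data, size=(n_iter, data.size), replace=True).mean(axis=1)
    return float(np.percentile(means, 95))

# A small, positively skewed hypothetical dataset (ppm)
data = [0.5, 1.2, 2.0, 3.1, 4.4, 7.9, 9.6, 14.3, 20.2, 42.6]
ucl = bootstrap_ucl95(data)
```

For skewed data such as this, the resulting UCL exceeds the sample mean but, unlike the H-UCL in the cases discussed above, remains below the maximum observed value.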
This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.

When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used to reflect the locations and densities of samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
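A minimal sketch of these steps follows. Rather than physically replicating sample results by their rounded area ratios, it passes each polygon's area proportion directly as a sampling weight, which has the same effect in the limit; all inputs shown are hypothetical.

```python
import numpy as np

def spatial_bootstrap_ucl95(conc, areas, n_iter=5000, seed=0):
    """Area-weighted (spatial) bootstrap 95% UCL: values are drawn with
    replacement, with probability proportional to their Thiessen polygon
    areas, and the 95th percentile of the resulting means is the UCL."""
    rng = np.random.default_rng(seed)
    conc = np.asarray(conc, dtype=float)
    p = np.asarray(areas, dtype=float) / np.sum(areas)   # polygon proportions
    draws = rng.choice(conc, size=(n_iter, conc.size), replace=True, p=p)
    return float(np.percentile(draws.mean(axis=1), 95))

# Hypothetical results: the high-concentration sample sits in a large polygon
conc = [1.0, 2.0, 3.0, 4.0, 30.0]
areas = [500, 700, 600, 800, 5000]
ucl = spatial_bootstrap_ucl95(conc, areas)
```

With equal polygon areas this reduces to the plain bootstrap of the previous section; when a high result carries a large polygon, as here, the area weighting pulls the UCL upward accordingly, while it can never exceed the maximum observed concentration.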
This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different approaches: (1) EPA's approach (arithmetic mean and H-statistic); (2) an area-weighted average; (3) a bootstrap; and (4) a spatial bootstrap. Results of this analysis are provided in Table 2.
As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.

Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration by a factor of approximately 2.5, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in high-concentration areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above, and demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area, and hence that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993) and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper 1/2 Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.
However, if EPA rejects this approach (or for areas where EPA determines that the data are insufficient to support that approach and decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A), Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant), Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, 1/2 Mile Removal Action and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River, Final. Roy F. Weston, Inc., Manchester, NH. DCN: GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
1                138,219               0.022
2                105,217               0.028
3                110,214               1.07
4                31,021                1.63
5                11,399                1.67
6                7,004                 2.01
7                14,268                2.19
8                12,121                3.25
9                3,697                 3.77
10               13,297                7.58
11               33,087                7.61
12               65,541                7.62
13               57,494                7.71
14               9,014                 7.99
15               9,994                 8.72
16               2,123                 9.69
17               2,341                 9.72
18               7,446                 12.7
19               2,613                 14.3
20               44,231                14.3
21               2,362                 15.4
22               8,745                 16.3
23               5,008                 18.7
24               18,762                19.1
25               24,296                20.2
26               51,356                30.8
27               12,996                32.9
28               18,019                38.9
29               119,814               40.6
30               132,012               42.6
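The summary statistics cited in the text can be reproduced directly from the Table 1 values; the brief sketch below assumes the decimal placement implied by the stated 0.022-42.6 ppm concentration range. The area-weighted mean comes out a few ppm above the arithmetic mean because samples 29 and 30 pair high concentrations with large polygons.

```python
import numpy as np

# Table 1: polygon areas (ft^2) and total PCB (ppm) for samples 1-30
areas = np.array([138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121,
                  3697, 13297, 33087, 65541, 57494, 9014, 9994, 2123, 2341,
                  7446, 2613, 44231, 2362, 8745, 5008, 18762, 24296, 51356,
                  12996, 18019, 119814, 132012], dtype=float)
pcb = np.array([0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
                7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
                15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6])

arith_mean = pcb.mean()                          # about 13.3 ppm, as stated
area_weighted = np.average(pcb, weights=areas)   # higher: big polygons hold high values
```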
Table 2. Mean and 95% UCL EPCs for the Hypothetical Floodplain Property Calculated Using Four Approaches (EPA Arithmetic Mean/H-Statistic, Area-Weighted Average, Bootstrap, and Spatial Bootstrap)
The guidance document goes on to note that, because there is uncertainty in the overall soil concentration at a given site (due to limited coverage by soil sampling), the 95% UCL on the arithmetic mean should be used to account for this uncertainty. This approach was adopted by EPA Region 1 (EPA, 1994). In fact, EPA Region I has taken the position that the 95% UCL is appropriate for both the central tendency and reasonable maximum exposure computations.
For log-normally distributed data, EPA uses the H-statistic method (H-UCL) to calculate the EPCs. In this method, the 95% UCL is calculated using the following equation (EPA, 1992):

UCL = exp(x(bar) + 0.5s^2 + sH/sqrt(n-1))

Where:

UCL = a specified limit (i.e., 95%) on the estimate of the arithmetic mean;
x(bar) = the mean of the log-transformed data;
s^2 = the variance of the log-transformed sample distribution;
s = the standard deviation of the log-transformed sample distribution;
H = a statistic accounting for interaction of the distribution, developed by Land (1975); and
n = number of analytical samples
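In code, the equation reads as below. Note that H is not computed from the data; it must be looked up in Land's (1975) tables for the given sample size and log-standard deviation, so the H value used in the example is an illustrative placeholder only.

```python
import math

def h_ucl(concentrations, H):
    """H-statistic (Land) UCL on the arithmetic mean of lognormal data.
    H must be taken from Land's (1975) tables for the given n and s."""
    logged = [math.log(c) for c in concentrations]
    n = len(logged)
    xbar = sum(logged) / n                                # mean of log-transformed data
    s2 = sum((v - xbar) ** 2 for v in logged) / (n - 1)   # variance of the logs
    s = math.sqrt(s2)
    return math.exp(xbar + 0.5 * s2 + s * H / math.sqrt(n - 1))

# Skewed hypothetical data: once s exceeds about 1.0, even a modest
# placeholder H drives the UCL far beyond the largest observation
data = [0.01, 0.1, 1.0, 10.0, 100.0]
ucl = h_ucl(data, H=3.0)   # greatly exceeds max(data)
```

This behavior, a UCL orders of magnitude above the highest measured concentration, is exactly the failure mode that Singh et al. (EPA, 1997a) document.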
Frequently, the H-statistic calculates extraordinarily high 95% UCL values. In some cases, this inflated estimate of the mean reflects the uncertainty associated with having a limited number of samples from a skewed data population. This uncertainty is compounded when, as is the case with much of the Housatonic River investigation, sampling efforts are concentrated in areas that are suspected of having elevated levels of PCBs. Because the sampling of areas that are not likely to be contaminated is much more limited, despite the fact that those areas are equally important components of the total exposure area, the datasets for most parcels are likely to be highly biased.
Use of the H-statistic also does not calculate a reliable and representative 95% UCL when data are not clearly lognormally distributed. Because the distributions of contaminant concentrations at sites are often positively skewed, they are frequently assumed to be lognormally distributed for statistical evaluations. In fact, EPA guidance (EPA, 1992) promotes this approach by suggesting that it is reasonable to assume that data are lognormally distributed even if the data do not fit a normal distribution after log-transformation. However, this assumption is often incorrect. As noted in a recent EPA technical support document prepared by EPA contractors Singh et al. (EPA, 1997a), "[t]his apparent skewness may be due to biased sampling, multiple populations, or outliers, and not necessarily due to lognormally distributed data." Small deviations between the assumed lognormal distribution of the data and the actual population distribution can greatly influence the statistical results and yield a gross misrepresentation of the true mean and associated confidence bound. Thus, when sampling data that do not fit a lognormal distribution are inappropriately assumed to be lognormally distributed and are evaluated using the H-statistic, the result can be estimates of the mean and 95% UCL that are not at all representative of site conditions.
Even when the data are lognormally distributed the H-statistic can produce concentrations that
are far higher than the true mean In their recent study Singh et al (EPA 1997a) evaluated and
compared UCLs calculated through different statistical estimation methods using a variety of
sampling data sets and reported that the UCLs obtained using the H-statistic were consistently
larger and less accurate than the UCLs calculated using other statistical approaches They noted
that it is observed that the H-UCL becomes order of magnitudes higher even when the data
were obtained from a lognormal population and can lead to incorrect conclusions This is
especially true for samples of smaller sizes (eg lt 30) They stated further that [t]he practical
merit of the H-UCL in environmental applications is questionable as it becomes orders of
magnitude higher than the largest concentration observed when the [standard deviation] of the
log transformed data starts exceeding 10 These contractors concluded that use of the H-UCL
approach can yield unusually high false positives which would result in an unnecessary
cleanup The same conclusions were reached by EPA consultants Schultz and Griffin (1999) in
an analysis of hazardous waste sites in EPA Region 8 They noted that the H-statistic may
c DocsjRBgtGEoooltpoundpcpiperin 4 Ogden Environmental and Energy Services
overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous
waste sites
Finally the use of the H-statistic is not appropriate when there are indications that the arithmetic
mean is not correctly estimated within the calculation This difficulty occurs when an
exponential estimator of the mean is not equal to the typical calculation of the sum of all samples
divided by the number of samples
Even in 1992, EPA recognized the potential for the H-statistic to overestimate the 95% UCL. To control this, EPA recommended that when the statistic is greater than the maximum observed concentration at the site, the maximum value should instead be used as the EPC. This approach leads to the unsupportable assumption that the average concentration within an exposure area is best represented by the highest concentration encountered there, and does not account in any way for the spatial distribution of contamination. As EPA itself has recognized in responding to comments on the Upper ½ Mile Reach Removal Action, receptors, both human and ecological, are exposed to average contaminant concentrations, not maximum concentrations, where exposure is equally likely throughout a given area (EPA, 1999).
In view of these considerations, EPA contractors and consultants have recommended that the H-statistic should not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA, 1997a) recommended that "in environmental applications, the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. . . . [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).
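Of the alternatives named above, the Chebyshev inequality UCL is the simplest to illustrate. The sketch below is ours, not code from the Work Plan; it implements the standard distribution-free bound, UCL = x̄ + sqrt(1/α − 1) · s/√n:

```python
import math

def chebyshev_ucl(data, confidence=0.95):
    """Distribution-free (Chebyshev) upper confidence limit on the mean.

    One of the alternatives to the H-statistic recommended by Singh et al.
    (EPA, 1997a): UCL = xbar + sqrt(1/alpha - 1) * s / sqrt(n).
    """
    n = len(data)
    xbar = sum(data) / n
    s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)  # sample variance
    alpha = 1.0 - confidence
    return xbar + math.sqrt((1.0 / alpha - 1.0) * s2 / n)
```

Because it makes no distributional assumption at all, this bound is conservative but stable, and it cannot explode the way the H-UCL does when the log-scale standard deviation is large.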
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the lower Housatonic River; consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high-concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL because it is unlikely to be representative of the overall exposure area.
This problem can be substantially mitigated by area-weighting the sample results so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al., 1995); in more complex approaches, unbiased estimates of how concentration and variance change with distance are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava, 1989).
Because kriging is data-intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so facile is that the construction of Thiessen polygons is a built-in operation in most commercially available Geographic Information System (GIS) applications.
The area inside each polygon is then used to weight the samples in statistical calculations, which is, in essence, saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava, 1989):
X̄sc = Σ pi Ci (summed over polygons i = 1 through n)

Where:

X̄sc = area-weighted mean concentration (mg/kg)

Ci = the concentration representing the condition within polygon i, where there are i = 1 through n polygons, and

pi = the proportion of the total area that is incorporated in polygon i (unitless)
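In code, this formula is a one-line weighted sum; a minimal sketch (the function name and inputs are ours), assuming a list of polygon areas and a matching list of sample concentrations:

```python
def area_weighted_mean(areas, concentrations):
    """Area-weighted mean concentration: xbar_sc = sum_i p_i * C_i,
    where p_i is polygon i's share of the total exposure area."""
    total = sum(areas)
    return sum((a / total) * c for a, c in zip(areas, concentrations))
```

For example, a sample whose polygon covers three quarters of the exposure area contributes three times the weight of one covering the remaining quarter.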
This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE, and approved by EPA, for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then the average concentrations for each grid were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the

[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure. (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and as demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative, superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative, preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process in which a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping samples repeatedly from a single dataset.
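The resampling procedure just described can be sketched in a few lines. This is an illustrative implementation of the standard percentile bootstrap, not code from the Work Plan:

```python
import random

def bootstrap_ucl(data, n_boot=5000, confidence=0.95, seed=0):
    """Bootstrap upper confidence limit on the mean.

    Resample the dataset with replacement (each resample the same size as
    the original), archive each resample's mean, and take the 95th
    percentile of the resulting distribution of means as the 95% UCL.
    """
    rng = random.Random(seed)
    n = len(data)
    means = sorted(
        sum(rng.choice(data) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    # value at the chosen percentile of the distribution of means
    return means[int(confidence * n_boot) - 1]
```

Because the statistic is built from observed values only, the resulting UCL can never exceed the maximum observed concentration, unlike the H-UCL.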
This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.
When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance of the bootstrapped means becomes tighter; as such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded, normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
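The three steps above can be sketched as follows. This is an illustration with names of our choosing; the expansion factor described in the text only enlarges the pool without changing the polygon-area ratios, so it is omitted here:

```python
import random

def spatial_bootstrap_ucl(areas, concentrations, n_boot=5000,
                          sample_size=250, confidence=0.95, seed=0):
    """Spatial (area-weighted) bootstrap sketch.

    Expand the dataset so that each sample appears in proportion to its
    Thiessen-polygon area (normalized to the smallest polygon and rounded
    to the nearest integer), then bootstrap means of `sample_size` draws
    from the expanded dataset and take the 95th percentile.
    """
    smallest = min(areas)
    expanded = []
    for a, c in zip(areas, concentrations):
        expanded.extend([c] * round(a / smallest))  # area-weighted replicates
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choice(expanded) for _ in range(sample_size)) / sample_size
        for _ in range(n_boot)
    )
    return means[int(confidence * n_boot) - 1]
```

Samples in large polygons are drawn proportionally more often, so the distribution of bootstrapped means centers on the area-weighted mean rather than the simple arithmetic mean.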
This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).
The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area, and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data; thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) the EPA approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained through straight bootstrapping. This is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30; Table 1) have very large polygon areas; thus, they are weighted more heavily in approaches that include area-weighting.
Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration by a factor of more than two, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values would need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above, and as demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River, and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993), and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and for the Removal Actions Outside the River, including certain floodplain properties along the River.
However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.
EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.
EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.
EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.
EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.
EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight, Black and Veatch Waste Science, Inc. EPA Region IV Contract 68-W9-0055.
EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.
EPA. 1997b. Record of Decision: Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.
EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. US Environmental Protection Agency, Region I. October.
Isaaks, E.H. and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.
Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.
Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.
Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River. Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
1                138219                0.022
2                105217                0.028
3                110214                1.07
4                31021                 1.63
5                11399                 1.67
6                7004                  2.01
7                14268                 2.19
8                12121                 3.25
9                3697                  3.77
10               13297                 7.58
11               33087                 7.61
12               65541                 7.62
13               57494                 7.71
14               9014                  7.99
15               9994                  8.72
16               2123                  9.69
17               2341                  9.72
18               7446                  12.7
19               2613                  14.3
20               44231                 14.3
21               2362                  15.4
22               8745                  16.3
23               5008                  18.7
24               18762                 19.1
25               24296                 20.2
26               51356                 30.8
27               12996                 32.9
28               18019                 38.9
29               119814                40.6
30               132012                42.6
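As a cross-check, the range and arithmetic mean quoted in the text can be reproduced directly from the Table 1 concentration column (values transcribed from the table):

```python
# Total PCB concentrations (ppm) transcribed from Table 1
pcb = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
       7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
       15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6]

mean = sum(pcb) / len(pcb)
print(min(pcb), max(pcb), round(mean, 1))  # → 0.022 42.6 13.3
```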
Table 2. Comparison of Mean and 95% UCL Concentrations (ppm) Calculated Using Four Approaches

Approach                                         Mean     95% UCL
1) EPA approach (arithmetic mean/H-statistic)    13.3     106
2) Area-weighted average                         15.72    --
3) Bootstrap                                     13.3     17.06
4) Spatial bootstrap                             15.72    16.38
at all representative of site conditions
Even when the data are lognormally distributed the H-statistic can produce concentrations that
are far higher than the true mean In their recent study Singh et al (EPA 1997a) evaluated and
compared UCLs calculated through different statistical estimation methods using a variety of
sampling data sets and reported that the UCLs obtained using the H-statistic were consistently
larger and less accurate than the UCLs calculated using other statistical approaches They noted
that it is observed that the H-UCL becomes order of magnitudes higher even when the data
were obtained from a lognormal population and can lead to incorrect conclusions This is
especially true for samples of smaller sizes (eg lt 30) They stated further that [t]he practical
merit of the H-UCL in environmental applications is questionable as it becomes orders of
magnitude higher than the largest concentration observed when the [standard deviation] of the
log transformed data starts exceeding 10 These contractors concluded that use of the H-UCL
approach can yield unusually high false positives which would result in an unnecessary
cleanup The same conclusions were reached by EPA consultants Schultz and Griffin (1999) in
an analysis of hazardous waste sites in EPA Region 8 They noted that the H-statistic may
c DocsjRBgtGEoooltpoundpcpiperin 4 Ogden Environmental and Energy Services
overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous
waste sites
Finally the use of the H-statistic is not appropriate when there are indications that the arithmetic
mean is not correctly estimated within the calculation This difficulty occurs when an
exponential estimator of the mean is not equal to the typical calculation of the sum of all samples
divided by the number of samples
Even in 1992 EPA recognized the potential for the H-statistic to overestimate the 95 UCL To
control this EPA recommended that when the statistic is greater than the maximum observed
concentration at the site the maximum value should instead be used as the EPC This approach
leads to the unsupportable assumption that the average concentration within an exposure area is
best represented by the highest concentration encountered there and does not account in any way
for the spatial distribution of contamination As EPA itself has recognized in responding to
comments on the Upper l2 Mile Reach Removal Action receptors both human and ecological
are exposed to average contaminant concentrations not maximum concentrations where
exposure is equally likely throughout a given area (EPA 1999)
In view of these considerations, EPA contractors and consultants have recommended that the H-statistic should not be used, particularly in cases where a lognormal distribution of the data is not certain. For example, based on comparison of the UCLs calculated using the H-statistic versus those calculated using other approaches, the EPA technical support document by Singh et al. (EPA 1997a) recommended that in environmental applications "the use of the H-UCL to obtain an estimate of the upper confidence limit of the mean should be avoided." These authors concluded that "the use of the jackknife method, the bootstrap method, or the Chebychev inequality method provides better input to the risk assessors and may result in a significant reduction in remediation costs. This is especially true when the number of samples is thirty or less. ... [U]pper confidence limits based on any of the other estimation procedures appear to be more stable and reliable than those based on the H-statistic." Similar findings were reported by Schulz and Griffin (1999).
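The Chebychev inequality method named in that passage can be sketched in a few lines. This is an illustrative sketch with hypothetical concentration values, not a calculation taken from the Work Plan; the formula UCL = mean + sqrt((1 - alpha)/alpha) * s / sqrt(n) makes no assumption about the distribution of the data:

```python
import math
import statistics

def chebyshev_ucl(data, alpha=0.05):
    """Upper confidence limit of the mean via the Chebychev inequality:
    UCL = mean + sqrt((1 - alpha) / alpha) * s / sqrt(n).
    No lognormality assumption is required, unlike the H-statistic."""
    n = len(data)
    mean = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation
    return mean + math.sqrt((1 - alpha) / alpha) * s / math.sqrt(n)

# Hypothetical PCB concentrations (ppm) for a small exposure area
data = [0.5, 1.2, 2.8, 4.1, 7.9, 12.4, 19.8, 33.0]
ucl = chebyshev_ucl(data)  # about 27.6 ppm for these values
```

Because the result scales with the observed spread of the data, it cannot be inflated by a poorly estimated lognormal variance term the way the H-UCL can.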
Ogden Environmental and Energy Services
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area, and exposures are likely to occur randomly and with equal likelihood in all parts of that exposure area, the simple arithmetic mean is most representative of the long-term EPC. However, if sampling is intentionally biased towards more contaminated areas, the sampling database no longer accurately represents the areas of potential exposure. This is the case with the sampling outlined in EPA's Work Plan for the Lower Housatonic River. Contaminant delineation is an important goal of the sampling effort on the lower Housatonic River. Consequently, sampling is intentionally focused on areas (e.g., floodplain soils) that are known or suspected of being contaminated. Because many of the samples in these datasets will be taken from high concentration areas, with limited samples taken from uncontaminated areas, calculating a mean using typical statistical procedures gives undue weight to the locations where high concentrations exist. A dataset obtained in this way cannot be used to reliably calculate an arithmetic mean or its 95% UCL, because it is unlikely to be representative of the overall exposure area.
This problem can be substantially mitigated by area-weighting the sample results, so that the importance of each sample in the calculation of the EPC is directly proportional to the fraction of the exposure area that it represents. Area-weighted averaging may be conducted in a number of ways, with differing levels of complexity. In some cases, polygonal areas are constructed by drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen polygons; Clifford et al. 1995); in more complex approaches, unbiased estimates of how concentration and variance change with distance are used to construct a spatial grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava 1989).
Because kriging is data intensive and unlikely to be feasible for many of the exposure areas in the Rest-of-the-River investigation, area-weighting using Thiessen polygons is a more appropriate approach. The process is to draw perpendicular lines that are exactly equidistant between contiguous samples until each sample is surrounded by a polygon. Polygons at the periphery of an identified exposure area are truncated at its boundary (e.g., a property boundary). One reason that this technique is so straightforward to apply is that the construction of Thiessen polygons is a hardwired operation in most commercially available Geographic Information System (GIS) applications.
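The polygon construction need not be done exactly to be useful: the same area fractions can be approximated numerically by assigning points of a fine grid over the exposure area to their nearest sample, which is the discrete equivalent of Thiessen polygons. A minimal sketch with hypothetical sample coordinates (the function name and grid resolution are illustrative assumptions):

```python
def thiessen_area_fractions(samples, width, height, nx=100, ny=100):
    """Approximate Thiessen-polygon area fractions by assigning each
    cell of an nx-by-ny grid to the nearest sample location."""
    counts = [0] * len(samples)
    for i in range(nx):
        x = (i + 0.5) * width / nx
        for j in range(ny):
            y = (j + 0.5) * height / ny
            nearest = min(range(len(samples)),
                          key=lambda k: (samples[k][0] - x) ** 2
                                        + (samples[k][1] - y) ** 2)
            counts[nearest] += 1
    return [c / (nx * ny) for c in counts]

# Four hypothetical sampling locations on a 1,000 ft by 1,000 ft property
locs = [(120, 800), (450, 420), (900, 150), (700, 880)]
fractions = thiessen_area_fractions(locs, 1000, 1000)
# fractions sum to 1.0; each entry is the share of the exposure area
# closest to the corresponding sample
```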
The area inside each polygon is then used to weight the samples in statistical calculations, which is, in essence, saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava 1989):
X̄sc = Σ (pi × Ci), summed over polygons i = 1 through n

Where:
X̄sc = area-weighted mean concentration (mg/kg)
Ci = the concentration representing the condition within polygon i, where there are i = 1 through n polygons, and
pi = the proportion of the total area that is incorporated in polygon i (unitless)
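In code the formula reduces to a single weighted sum. The sketch below uses three hypothetical polygons covering 50%, 30%, and 20% of an exposure area:

```python
def area_weighted_mean(concentrations, proportions):
    """X-bar = sum of p_i * C_i, where p_i is the fraction of the
    exposure area in polygon i and the p_i sum to 1."""
    assert abs(sum(proportions) - 1.0) < 1e-9
    return sum(p * c for c, p in zip(concentrations, proportions))

concs = [2.0, 10.0, 40.0]  # hypothetical concentrations (mg/kg)
props = [0.5, 0.3, 0.2]    # hypothetical area proportions
awm = area_weighted_mean(concs, props)  # 0.5*2 + 0.3*10 + 0.2*40 = 12.0
```

For comparison, the unweighted arithmetic mean of the same three samples is about 17.3 mg/kg; the weighted estimate is lower here because the highest concentration represents only a fifth of the area.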
This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE and approved by EPA for calculating EPCs for the banks of the Upper ½ Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then, the average concentrations for each grid were
averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA 1997b).
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the

"[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure." (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper ½ Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and
representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously and demonstrated by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process where a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal
distribution. This Theorem applies to any population regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA 1997a; Schulz and Griffin 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.
When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.
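The procedure just described can be sketched directly; the dataset, iteration count, and random seed below are illustrative assumptions:

```python
import random
import statistics

def bootstrap_95_ucl(data, n_iter=5000, seed=1):
    """Resample the dataset with replacement, archive each resample's
    mean, and take the 95th percentile of the archived means."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(statistics.fmean(rng.choices(data, k=n))
                   for _ in range(n_iter))
    return means[int(0.95 * n_iter)]

# Hypothetical small PCB dataset (ppm)
data = [0.1, 0.9, 2.3, 4.4, 6.8, 9.1, 15.2, 28.7]
mean = statistics.fmean(data)
ucl = bootstrap_95_ucl(data)
# ucl lies above the arithmetic mean but, unlike the H-UCL,
# can never exceed the maximum observed value
```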
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7 and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
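The three steps can be condensed by drawing each bootstrap value with probability proportional to its polygon area (Python's random.choices accepts weights), which has the same effect as the dataset expansion described above, apart from the rounding of the normalized areas. The data, weights, and resample size below are hypothetical:

```python
import random
import statistics

def spatial_bootstrap_95_ucl(concs, areas, n_iter=5000, k=300, seed=1):
    """Spatial bootstrap: resample with replacement, weighting each
    sample by its Thiessen-polygon area, then take the 95th percentile
    of the resulting distribution of means."""
    rng = random.Random(seed)
    means = sorted(statistics.fmean(rng.choices(concs, weights=areas, k=k))
                   for _ in range(n_iter))
    return means[int(0.95 * n_iter)]

concs = [0.5, 1.8, 4.2, 9.6, 22.0]          # hypothetical ppm values
areas = [40000, 25000, 15000, 12000, 8000]  # hypothetical polygon areas (ft2)
ucl = spatial_bootstrap_95_ucl(concs, areas)
# The center of the bootstrap distribution sits at the area-weighted
# mean (about 4.2 ppm here), not at the unweighted mean
```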
This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA 1997a; Schulz and Griffin 1999).
The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) the EPA approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.
As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
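The mean estimates in this example can be checked directly from the Table 1 data (transcribed below, with decimal points restored to match the stated range of 0.022 to 42.6 ppm):

```python
# Polygon areas (sq ft) and total PCB (ppm) for samples 1-30 of Table 1
areas = [138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121,
         3697, 13297, 33087, 65541, 57494, 9014, 9994, 2123, 2341,
         7446, 2613, 44231, 2362, 8745, 5008, 18762, 24296, 51356,
         12996, 18019, 119814, 132012]
concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77,
         7.58, 7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3,
         14.3, 15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6,
         42.6]

arith_mean = sum(concs) / len(concs)  # about 13.3 ppm, as stated
weighted_mean = sum(a * c for a, c in zip(areas, concs)) / sum(areas)
# weighted_mean exceeds arith_mean because samples 29 and 30 pair
# high concentrations with very large polygon areas
```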
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few
showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL that substantially exceeds the highest measured concentration, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above, and as demonstrated by EPA contractors and consultants (EPA 1997a; Schulz and Griffin 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data
necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA 1989, 1993), and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½ Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.
However, if EPA rejects this approach (or for areas where EPA determines that the data are insufficient to support that approach and decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA 1997a; Schulz and Griffin 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant), Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhardt. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision, Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H. and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River - Final. Roy F. Weston, Inc., Manchester, NH. DCN: GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number   Polygon Area (ft2)   Total PCB (ppm)
1               138219               0.022
2               105217               0.028
3               110214               1.07
4               31021                1.63
5               11399                1.67
6               7004                 2.01
7               14268                2.19
8               12121                3.25
9               3697                 3.77
10              13297                7.58
11              33087                7.61
12              65541                7.62
13              57494                7.71
14              9014                 7.99
15              9994                 8.72
16              2123                 9.69
17              2341                 9.72
18              7446                 12.7
19              2613                 14.3
20              44231                14.3
21              2362                 15.4
22              8745                 16.3
23              5008                 18.7
24              18762                19.1
25              24296                20.2
26              51356                30.8
27              12996                32.9
28              18019                38.9
29              119814               40.6
30              132012               42.6
[Table 2. Mean and 95% UCL EPCs for the hypothetical dataset, calculated using four approaches: EPA approach (arithmetic mean and H-statistic), area-weighted average, bootstrap, and spatial bootstrap. The table layout is not legible in the source; the values are discussed in the text.]
overestimate the exposure point concentration and may lead to unnecessary cleanup of hazardous
waste sites
Finally the use of the H-statistic is not appropriate when there are indications that the arithmetic
mean is not correctly estimated within the calculation This difficulty occurs when an
exponential estimator of the mean is not equal to the typical calculation of the sum of all samples
divided by the number of samples
Even in 1992 EPA recognized the potential for the H-statistic to overestimate the 95 UCL To
control this EPA recommended that when the statistic is greater than the maximum observed
concentration at the site the maximum value should instead be used as the EPC This approach
leads to the unsupportable assumption that the average concentration within an exposure area is
best represented by the highest concentration encountered there and does not account in any way
for the spatial distribution of contamination As EPA itself has recognized in responding to
comments on the Upper l2 Mile Reach Removal Action receptors both human and ecological
are exposed to average contaminant concentrations not maximum concentrations where
exposure is equally likely throughout a given area (EPA 1999)
In view of these considerations EPA contractors and consultants have recommended that the H-
statistic should not be used particularly in cases where a lognormal distribution of the data is not
certain For example based on comparison of the UCLs calculated using the H-statistic versus
those calculated using other approaches the EPA technical support document by Singh et al
(EPA 1997a) recommended that in environmental applications the use of the H-UCL to obtain
an estimate of the upper confidence limit of the mean should be avoided These authors
concluded that the use of the jackknife method the bootstrap method or the Chebychev
inequality method provides better input to the risk assessors and may result in a significant
reduction in remediation costs This is especially true when the number of samples is thirty or
less[U]pper confidence limits based on any of the other estimation procedures appear to be
more stable and reliable than those based on the H-statistic Similar findings were reported by
Schulz and Griffin (1999)
c 0ocsjRBGEooopoundpcpipcr2an 5 Ogden Environmental and Energy Services
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area and exposures are likely to occur
randomly and with equal likelihood in all parts of that exposure area the simple arithmetic mean
is most representative of the long-term EPC However if sampling is intentionally biased
towards more contaminated areas the sampling database no longer accurately represents the
areas of potential exposure This is the case with the sampling outlined in EPAs Work Plan for
the Lower Housatonic River Contaminant delineation is an important goal of the sampling
effort on the lower Housatonic River Consequently sampling is intentionally focused on areas
(eg floodplain soils) that are known or suspected of being contaminated Because many of the
samples in these datasets will be taken from high concentration areas with limited samples taken
from uncontaminated areas calculating a mean using typical statistical procedures gives undue
weight to the locations where high concentrations exist A dataset obtained in this way cannot be
used to reliably calculate an arithmetic mean or its 95 UCL because it is unlikely to be
representative of the overall exposure area
This problem can be substantially mitigated by area-weighting the sample results so that the
importance of each sample in the calculation of the EPC is directly proportional to the fraction of
the exposure area that it represents Area-weighted averaging may be conducted in a number of
ways with differing levels of complexity In some cases polygonal areas are constructed by
drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen
polygons Clifford et al 1995) in more complex approaches unbiased estimates of
concentration and variance change with distance and the results are used to construct a spatial
grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava 1989)
Because kriging is data intensive and unlikely to be feasible for many of the exposure areas in
the Rest-of-the River investigation area-weighting using Thiessen polygons is a more
appropriate approach The process is to draw perpendicular lines that are exactly equidistant
between contiguous samples until each sample is surrounded by a polygon Polygons at the
periphery of an identified exposure area are truncated at its boundary (eg a property boundary)
One reason that this technique is so facile is that the construction of Thiessen polygons is a
c DOCS JRB GE2oooEPCpapcr an 6 Ogden Environmental and Energy Services
hardwired operation in most commercially-available Geographic Information System (GIS)
applications
The area inside each polygon is then used to weight the samples in statistical calculations which
is in essence saying that the probability of encountering the concentration represented by a
certain polygon is equal to the proportion of the total exposure area represented by the area in
that polygon The area-weighted concentration is calculated using the following formula (Isaaks
and Srivastava 1989)
Where
X(bar)sc = area weighted mean concentration (mgkg)
C| = the concentration representing the condition within polygon
where there are = 1 through n polygons and
pi = the proportion of the total area that is incorporated in polygon (unitless)
This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE, and approved by EPA, for calculating EPCs for the banks of the Upper 2 Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid using the sampling data available within it. Then the average concentrations for each grid were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the:

    [e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods (such as kriging or triangulation) that take into account the spatial distribution of contamination to characterize exposure. (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper 2 Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary, for individual exposure areas to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative and superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative and preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process whereby a subpopulation of the overall dataset of analytical values is selected at random and a mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This Theorem applies to any population, regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping models select repeatedly from a single dataset.
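The procedure described above can be sketched in a few lines of standard-library code; the dataset below is illustrative, not site data, and the iteration count is an assumption for the sketch:

```python
# Hedged sketch of the bootstrap 95% UCL described in the text: resample
# the dataset with replacement many times, archive each mean, and read
# off the 95th percentile of the distribution of means.
import random
import statistics

def bootstrap_ucl95(data, iterations=5000, seed=1):
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = [statistics.fmean(rng.choices(data, k=len(data)))
             for _ in range(iterations)]
    means.sort()
    return means[int(0.95 * iterations)]  # 95th percentile of the means

# illustrative concentrations (ppm), not actual floodplain data
data = [0.022, 0.028, 1.07, 2.19, 3.25, 7.58, 9.69, 14.3, 20.2, 42.6]
ucl = bootstrap_ucl95(data)
print(round(statistics.fmean(data), 2), round(ucl, 2))
```

Because the 95% UCL comes directly from the empirical distribution of resampled means, no normal or lognormal assumption is needed.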
This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and is more technically justifiable. The reduction obtained from this procedure varies according to the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA, 1997a; Schulz and Griffin, 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.
When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required to be included in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area, and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of samples collected on each property. This modified bootstrapping approach ("spatial bootstrapping"), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
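The three steps above can be sketched as follows. The polygon areas and concentrations are hypothetical, while the expansion factor of 5, the subsample size of 250, and the 5,000 iterations follow the figures given in the text:

```python
# Illustrative sketch of the three-step spatial bootstrap. All sample
# values are hypothetical; parameters follow the procedure in the text.
import random
import statistics

areas          = [2123.0, 7446.0, 14850.0]   # hypothetical polygon areas (ft2)
concentrations = [42.6, 9.69, 0.022]         # hypothetical PCB results (ppm)

# Steps 1-2: normalize areas to the smallest polygon, round to integers,
# and expand each sample into area-weighted replicates (factor of 5).
smallest = min(areas)
replicates = [round(a / smallest) * 5 for a in areas]
expanded = [c for c, r in zip(concentrations, replicates) for _ in range(r)]

# Step 3: bootstrap means (with replacement) from the expanded dataset.
rng = random.Random(7)
means = sorted(statistics.fmean(rng.choices(expanded, k=250))
               for _ in range(5000))
ucl95 = means[int(0.95 * 5000)]

# For comparison: the exact area-weighted mean of the original dataset.
area_weighted_mean = sum(a / sum(areas) * c
                         for a, c in zip(areas, concentrations))
print(round(area_weighted_mean, 2), round(ucl95, 2))
```

As the text notes, the median of the bootstrapped distribution sits near the area-weighted mean (up to the integer rounding of the normalized areas), while the 95th percentile supplies the UCL.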
This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).

The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area, and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) the EPA approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas. Thus, they are weighted more heavily in approaches that include area-weighting.

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large (containing 30 samples) and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration by a factor of about two and a half, with the result being that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly contaminated areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above, and demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993), and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper 2 Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.
However, if EPA rejects this approach (or for areas where EPA determines that the data are insufficient to support that approach and decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant), Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, Mile Removal Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H. and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River - Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
      1               138,219               0.022
      2               105,217               0.028
      3               110,214               1.07
      4                31,021               1.63
      5                11,399               1.67
      6                 7,004               2.01
      7                14,268               2.19
      8                12,121               3.25
      9                 3,697               3.77
     10                13,297               7.58
     11                33,087               7.61
     12                65,541               7.62
     13                57,494               7.71
     14                 9,014               7.99
     15                 9,994               8.72
     16                 2,123               9.69
     17                 2,341               9.72
     18                 7,446              12.7
     19                 2,613              14.3
     20                44,231              14.3
     21                 2,362              15.4
     22                 8,745              16.3
     23                 5,008              18.7
     24                18,762              19.1
     25                24,296              20.2
     26                51,356              30.8
     27                12,996              32.9
     28                18,019              38.9
     29               119,814              40.6
     30               132,012              42.6
[Table 2. Mean and 95% UCL EPC Estimates for the Hypothetical Property, Calculated Using Four Approaches - the body of this table did not reproduce legibly; the entries cited in the text are: arithmetic mean and bootstrap mean = 13.3 ppm; bootstrap and spatial bootstrap 95% UCLs = 17.06 and 16.38 ppm, respectively; H-statistic 95% UCL = 106 ppm.]
Area-Weighted Averaging (Spatial Averaging)
When samples are collected randomly within an exposure area and exposures are likely to occur
randomly and with equal likelihood in all parts of that exposure area the simple arithmetic mean
is most representative of the long-term EPC However if sampling is intentionally biased
towards more contaminated areas the sampling database no longer accurately represents the
areas of potential exposure This is the case with the sampling outlined in EPAs Work Plan for
the Lower Housatonic River Contaminant delineation is an important goal of the sampling
effort on the lower Housatonic River Consequently sampling is intentionally focused on areas
(eg floodplain soils) that are known or suspected of being contaminated Because many of the
samples in these datasets will be taken from high concentration areas with limited samples taken
from uncontaminated areas calculating a mean using typical statistical procedures gives undue
weight to the locations where high concentrations exist A dataset obtained in this way cannot be
used to reliably calculate an arithmetic mean or its 95 UCL because it is unlikely to be
representative of the overall exposure area
This problem can be substantially mitigated by area-weighting the sample results so that the
importance of each sample in the calculation of the EPC is directly proportional to the fraction of
the exposure area that it represents Area-weighted averaging may be conducted in a number of
ways with differing levels of complexity In some cases polygonal areas are constructed by
drawing a series of lines that are equidistant between adjacent sampling locations (Thiessen
polygons Clifford et al 1995) in more complex approaches unbiased estimates of
concentration and variance change with distance and the results are used to construct a spatial
grid of estimated concentrations (ordinary kriging) (Isaaks and Srivastava 1989)
Because kriging is data intensive and unlikely to be feasible for many of the exposure areas in
the Rest-of-the River investigation area-weighting using Thiessen polygons is a more
appropriate approach The process is to draw perpendicular lines that are exactly equidistant
between contiguous samples until each sample is surrounded by a polygon Polygons at the
periphery of an identified exposure area are truncated at its boundary (eg a property boundary)
One reason that this technique is so facile is that the construction of Thiessen polygons is a
c DOCS JRB GE2oooEPCpapcr an 6 Ogden Environmental and Energy Services
hardwired operation in most commercially-available Geographic Information System (GIS)
applications
The area inside each polygon is then used to weight the samples in statistical calculations which
is in essence saying that the probability of encountering the concentration represented by a
certain polygon is equal to the proportion of the total exposure area represented by the area in
that polygon The area-weighted concentration is calculated using the following formula (Isaaks
and Srivastava 1989)
Where
X(bar)sc = area weighted mean concentration (mgkg)
C| = the concentration representing the condition within polygon
where there are = 1 through n polygons and
pi = the proportion of the total area that is incorporated in polygon (unitless)
This approach includes sampling data from all portions of the exposure area and corrects for the
biases in the sampling data by taking into consideration the spacing of the samples within the
exposure area This is the approach that was used by GE and approved by EPA for calculating
EPCs for the banks of the Upper 2 Mile Reach of the Housatonic River and it is the approach
specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which
include a number of floodplain properties along the River) Hence use of this approach for the
Lower Housatonic River would lead to a consistent approach for the different portions of the
overall Site subject to the Consent Decree Moreover spatial averaging has also been used at
other Superfund sites For example such an approach was used at the Koppers Company
Superfund site in Charleston SC (EPA 1995) where biased sampling was also conducted To
offset that bias a nearest neighbor approach was used This involved placing the sampling
data in clusters by gridding the site and calculating an average concentration for each grid using
the sampling data available within it Then the average concentrations for each grid were
c xxxTSjRB GE2oooEPCpraquoper2 laquon 7 Ogden Environmental and Energy Services
averaged to yield an average concentration for the entire exposure area In addition at the
Fields Brook site in Ashtabula Ohio EPA Region V defined the residential cleanup goal as the
average goal across the exposure unit (EPA 1997b)
Use of spatial averaging is also supported by the comments of EPAs Science Advisory Board
(SAB) (EPA 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human
Health Evaluation Manual In that review the SAB stated that the
[ejstimate of the RME cannot ignore the distribution of contamination at the site and the
distribution of individual behaviors which lead to exposure Rather the spatial distribution
of the concentration over the site must be considered along with a distribution reflecting the
relative frequency with which people are likely to visit different parts of the site For this
reason any summary measure of concentration (such as the average proposed in the RAGS
document) that does not take into account the spatial distribution of the underlying samples is
likely to be inadequate Therefore the Committee believes that the Agency should give
strong consideration to incorporating methods such as kriging or triangulation that take
into account the spatial distribution of contamination to characterize exposure (p 15)
In the Consent Decree for this Site EPA determined that use of the spatial averaging approach is
fully protective of human health and the environment Indeed in responding to comments on the
Upper 2 Mile Reach Removal Action EPA specifically justified this spatial averaging approach
so long as sufficient data are available (EPA 1999) As EPA recognized use of this approach
requires the availability of adequate data to support the calculation of representative spatial
averages over the exposure area However the need for adequate data exists regardless of the
averaging method used to derive the EPCs and is particularly marked when one attempts to use
the H-statistic approach Under the phased approach that EPA has outlined in the Human
Health Risk Assessment Work Plan such sampling would only be necessary in areas where the
95 UCL exceeds the applicable screening risk-based concentration (SRBC) In such areas the
conduct of additional sampling is warranted in order to avoid the use of the extreme and
unrepresentative 95 UCL values as the EPCs Thus GE urges EPA to collect adequate
additional samples as necessary for individual exposure areas to ensure that reasonable and
cvDocsjRBOE2oooEpcpj[rn 8 Ogden Environmental and Energy Services
representative estimates of the average concentration can be calculated for the risk assessment of
each exposure area
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously GE supports the use of area-weighted arithmetic averages as the most
representative means of calculating EPCs However if EPA rejects this approach or for areas
where the sampling data are not considered sufficient to support use of such spatial averaging
there are alternative methods of calculating the 95 UCL which would result in more reliable
and representative EPCs than use of the H-statistic As discussed previously and demonstrated
by Singh et al (EPA 1997a) and Schulz and Griffin (1999) the H-statistic is likely to
substantially overestimate the 95 UCL particularly where datasets are small These EPA
contractors and consultants have evaluated and suggested alternative and superior approaches for
calculating the 95 UCL Of these the most straightforward is bootstrapping which allows
calculation of a probability distribution of EPCs and is able to offset some of the statistical
limitations associated with small datasets It provides a method for calculating a more reliable
estimator of the EPC particularly when datasets are small and data do not exactly fit either a
normal or lognormal distribution This technique is described below and is recommended as an
alternative and preferable method to the use of the H-statistic for calculating the 95 UCL for
exposure areas in the Lower Housatonic River risk assessment
The Bootstrap Method
Similar to Monte Carlo sampling bootstrapping is a process where a subpopulation of the overall
dataset of analytical values is selected at random and a mean is calculated This mean is
archived and the statistical sampling procedure is repeated with replacement of the data The
bootstrapped means are collected until their distribution stabilizes (this often requires many
iterations of sampling) The 95 percentile of this distribution of means is equivalent to the 95
UCL of the mean Bootstrapping may be viewed as an imitation of the Central Limit Theorem of
statistics which states that if a population is repeatedly sampled for purposes of calculating an
arithmetic mean the distribution of the means will ultimately take the form of a normal
c -nocsjRB GE2oooEPcPraquoplaquo an 9 Ogden Environmental and Energy Services
distribution This Theorem applies to any population regardless of its statistical distribution
Bootstrapping relies on the same theory except that instead of repeatedly taking new samples
bootstrapping models select repeatedly from a single dataset
This procedure frequently results in a lower estimate of the 95 UCL than the 95 H-UCL and
is more technically justifiable The reduction obtained from this procedure varies according to
the amount of data available for sampling and its variance but is consistently lower than the 95
H-UCL (EPA 1997a Schulz and Griffin 1999) The difference between these two measures is
more dramatic for exposure areas with fewer samples Thus the bootstrapping method is
particularly well-suited to exposure areas like many along the Lower Housatonic River where
sampling is limited
When sample sizes are small, the distribution of the bootstrapped means may not appear normal. This is because the number of samples required in each subsample in order to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter. As such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95% UCL than is the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area and the biases associated with it. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of samples collected on each property. This modified bootstrapping approach (spatial bootstrapping), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.
First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.
Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7 and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.
Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
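The three steps above can be sketched as follows (a hypothetical Python/NumPy illustration; Thiessen polygon construction is assumed to have already produced the polygon areas, and the function name and example values are inventions of this sketch):

```python
import numpy as np

def spatial_bootstrap_95_ucl(concs, areas, expansion=5,
                             n_iterations=5000, subsample=250, seed=0):
    """Sketch of the area-weighted (spatial) bootstrap described above.

    Polygon areas are normalized to the smallest polygon, rounded to
    integers, and multiplied by an expansion factor to build an expanded
    dataset in which each result appears in proportion to its polygon
    area.  Means of random subsamples (drawn with replacement) are then
    collected, and the 95th percentile of their distribution is taken
    as the 95% UCL.
    """
    rng = np.random.default_rng(seed)
    concs = np.asarray(concs, dtype=float)
    areas = np.asarray(areas, dtype=float)
    weights = np.rint(areas / areas.min()).astype(int) * expansion
    expanded = np.repeat(concs, weights)   # area-weighted replicates
    means = [rng.choice(expanded, size=subsample, replace=True).mean()
             for _ in range(n_iterations)]
    return float(np.percentile(means, 95))

# Hypothetical three-polygon example: the samples are replicated
# 5, 10, and 35 times, so the large high-concentration polygon dominates
ucl = spatial_bootstrap_95_ucl([1.0, 10.0, 20.0], [100.0, 200.0, 700.0])
```

The median of the resulting distribution of means converges on the area-weighted mean of the original data, consistent with the description above.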
This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).
The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, this approach can be used successfully for all sampling datasets, including those that do not fit either a normal or lognormal distribution.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different approaches: (1) the EPA approach (arithmetic mean and H-statistic); (2) an area-weighted average; (3) a bootstrap; and (4) a spatial bootstrap. Results of this analysis are provided in Table 2.
As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained from straight bootstrapping. This is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas; thus, they are weighted more heavily in approaches that include area-weighting.
Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
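As a check on the figures quoted above, the arithmetic and area-weighted means can be recomputed directly from the Table 1 data (plain Python; the area-weighted value of about 15.7 ppm is recalculated here, since that cell of Table 2 is illegible in this copy, and is consistent with the text's statement that it is somewhat higher than the arithmetic mean):

```python
# Hypothetical Table 1 data: polygon areas (ft2) and total PCB (ppm)
areas = [138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121,
         3697, 13297, 33087, 65541, 57494, 9014, 9994, 2123, 2341,
         7446, 2613, 44231, 2362, 8745, 5008, 18762, 24296, 51356,
         12996, 18019, 119814, 132012]
concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77,
         7.58, 7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3,
         14.3, 15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6,
         42.6]

arithmetic_mean = sum(concs) / len(concs)          # ~13.3 ppm
total_area = sum(areas)                            # ~1,000 ft x 1,000 ft
area_weighted_mean = sum(a * c for a, c in zip(areas, concs)) / total_area
```

The large, high-concentration polygons of samples 29 and 30 pull the area-weighted mean above the arithmetic mean, as the text notes.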
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration by a factor of about 2.5, with the result that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. Clearly, this result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in high-concentration areas, and/or sample numbers are more limited. It is likely, given the tendency of the H-statistic to overestimate the 95% UCL, that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above, and demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculate the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area when the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993), and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper 2 Mile Reach of the River and for the Removal Actions Outside the River, including certain floodplain properties along the River.
However, if EPA rejects this approach (or for areas where EPA determines that the data are insufficient to support that approach and decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant), Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision, Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, Mile Removal Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H. and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W. and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River, Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
      1               138,219               0.022
      2               105,217               0.028
      3               110,214               1.07
      4                31,021               1.63
      5                11,399               1.67
      6                 7,004               2.01
      7                14,268               2.19
      8                12,121               3.25
      9                 3,697               3.77
     10                13,297               7.58
     11                33,087               7.61
     12                65,541               7.62
     13                57,494               7.71
     14                 9,014               7.99
     15                 9,994               8.72
     16                 2,123               9.69
     17                 2,341               9.72
     18                 7,446              12.7
     19                 2,613              14.3
     20                44,231              14.3
     21                 2,362              15.4
     22                 8,745              16.3
     23                 5,008              18.7
     24                18,762              19.1
     25                24,296              20.2
     26                51,356              30.8
     27                12,996              32.9
     28                18,019              38.9
     29               119,814              40.6
     30               132,012              42.6
Table 2. Mean and 95% UCL EPC Estimates for the Hypothetical Property (ppm)

Approach                                        Mean       95% UCL
EPA approach (arithmetic mean / H-statistic)    13.3       106
Area-weighted average                           15.7       --
Bootstrap                                       13.3       17.06
Spatial bootstrap                               15.7       16.38

(Note: the original Table 2 is illegible in this copy; the values shown are those quoted in the accompanying text, with the area-weighted and spatial bootstrap means recalculated from the Table 1 data.)
hardwired operation in most commercially-available Geographic Information System (GIS) applications.

The area inside each polygon is then used to weight the samples in statistical calculations, which is in essence saying that the probability of encountering the concentration represented by a certain polygon is equal to the proportion of the total exposure area represented by the area in that polygon. The area-weighted concentration is calculated using the following formula (Isaaks and Srivastava, 1989):
    X̄sc = Σ (pi × Ci), summed over i = 1 through n

where:

    X̄sc = area-weighted mean concentration (mg/kg)
    Ci  = the concentration representing the condition within polygon i, where there are i = 1 through n polygons, and
    pi  = the proportion of the total area that is incorporated in polygon i (unitless)
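In code, the formula reduces to a single weighted sum (a minimal sketch; the function name and the three polygon proportions and concentrations are hypothetical):

```python
def area_weighted_mean(proportions, concentrations):
    """X-bar = sum of p_i * C_i, where the polygon proportions sum to 1."""
    assert abs(sum(proportions) - 1.0) < 1e-9, "proportions must sum to 1"
    return sum(p * c for p, c in zip(proportions, concentrations))

# Three hypothetical polygons covering 50%, 30%, and 20% of the exposure area
xbar = area_weighted_mean([0.5, 0.3, 0.2], [2.0, 10.0, 40.0])  # ~12.0 mg/kg
```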
This approach includes sampling data from all portions of the exposure area and corrects for the biases in the sampling data by taking into consideration the spacing of the samples within the exposure area. This is the approach that was used by GE, and approved by EPA, for calculating EPCs for the banks of the Upper 2 Mile Reach of the Housatonic River, and it is the approach specified in the October 1999 Consent Decree for the Removal Actions Outside the River (which include a number of floodplain properties along the River). Hence, use of this approach for the Lower Housatonic River would lead to a consistent approach for the different portions of the overall Site subject to the Consent Decree. Moreover, spatial averaging has also been used at other Superfund sites. For example, such an approach was used at the Koppers Company Superfund site in Charleston, SC (EPA, 1995), where biased sampling was also conducted. To offset that bias, a "nearest neighbor" approach was used. This involved placing the sampling data in clusters by gridding the site and calculating an average concentration for each grid cell using the sampling data available within it. Then the average concentrations for each grid cell were averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as the average goal across the exposure unit (EPA, 1997b).
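The gridding procedure described for the Koppers site can be sketched as follows (hypothetical coordinates, concentrations, and cell size; this illustrates the "average of cell averages" idea, not the actual site calculation):

```python
from collections import defaultdict

def grid_average(samples, cell_size=100.0):
    """Average of per-cell averages over a square grid.

    samples: iterable of (x, y, concentration) tuples.  Each sample is
    assigned to a grid cell; concentrations are averaged within each
    cell, and the cell averages are then averaged, so that densely
    sampled areas do not dominate the exposure-area estimate.
    """
    cells = defaultdict(list)
    for x, y, conc in samples:
        cells[(int(x // cell_size), int(y // cell_size))].append(conc)
    cell_means = [sum(v) / len(v) for v in cells.values()]
    return sum(cell_means) / len(cell_means)

# A dense low-value cluster in one cell and a lone high value in another:
# the cluster collapses to one cell mean instead of dominating the average
samples = [(10, 10, 1.0), (20, 15, 2.0), (30, 30, 3.0), (250, 250, 40.0)]
```

Here the simple arithmetic mean would be 11.5, while the grid average weights the two occupied cells equally.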
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA, 1993) in its review of the Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual. In that review, the SAB stated that the:

"[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods, such as kriging or triangulation, that take into account the spatial distribution of contamination to characterize exposure." (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper 2 Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA, 1999). As EPA recognized, use of this approach requires the availability of adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would only be necessary in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, the conduct of additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary for individual exposure areas, to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support use of such spatial averaging, there are alternative methods of calculating the 95% UCL which would result in more reliable and representative EPCs than use of the H-statistic. As discussed previously, and demonstrated by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative, superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and is able to offset some of the statistical limitations associated with small datasets. It provides a method for calculating a more reliable estimator of the EPC, particularly when datasets are small and the data do not exactly fit either a normal or lognormal distribution. This technique is described above and is recommended as an alternative, preferable method to the use of the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.
and Remedial Response Washington DC EPA5401-89-002 July
EPA 1992 Supplemental Guidance to RAGS Calculating the Concentration Term OSWER
Bulletin Volume 1 Number 1 Publication 92857-08 May
EPA 1993 An SAB Report Superfund Site Health Risk Assessment Guidelines Review of the
Office of Solid Waste and Emergency Responses Draft Risk Assessment Guidance for Superfund
Human Health Evaluation Manual by the Environmental Health Committee EPA-SAB-EHCshy
93-007 February
c DOCS JRB GE2oooEPCpapcr2 m 15 Ogden Environmental and Energy Services
EPA 1994 A change in the approach for estimating the average and reasonable maximum
exposure scenario for human health risk assessments EPA Region I Risk Updates Number 2
August
EPA 1995 Final Baseline Risk Assessment for Koppers Company Inc (Charleston Plant)
Remedial InvestigationFeasibility Study Oversight Black and Veatch Waste Science Inc EPA
Region IV Contract 68-W9-0055
EPA 1997a The Lognormal Distribution in Environmental Applications Prepared for EPA by
A Singh A Singh and M Engelhard US Environmental Protection Agency Office of
Research and Development EPA600R-97006 December
EPA 1997b Record of Decision Fields Brook Superfund Site US Environmental Protection
Agency Region V EPA54lR-97070 August 15
EPA 1999 Responsiveness Summary for Allendale School Removal Action Mile Removal
Action and Consolidation US Environmental Protection Agency Region I October
Isaaks EH and RM Srivastava 1989 Applied Geostatistics Oxford University Press New
York
Land CE 1975 Tables of confidence limits for linear functions of the normal mean and
variance Selected Tables in Mathematical Statistics 3365-410
Schulz TW and S Griffin 1999 Estimating risk assessment exposure point concentrations
when the data are not normal or lognormal Risk Analysis 19577-584
Weston 2000 Supplemental Investigation Work Plan for the Lower Housatonic River Final
Roy F Weston Inc Manchester NH DCN GEP2-02-000-AAME February 2000
c DOCS jRBGE2ooooEPcPaper2 m 16 Ogden Environmental and Energy Services
5
10
15
20
25
30
Table 1 Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property
Sample Number
1
2 j
4
6
7
8
9
1 1
12
13
14
16
17
18
19
21
22
23
24
26
27
28
29
Polygon Area (ft2)
138219
105217
110214
31021
11399
7004
14268
12121
3697
13297
33087
65541
57494
9014
9994
2123
2341
7446
2613
44231
2362
8745
5008
18762
24296
51356
12996
18019
119814
132012
Total PCB (ppm)
0022
0028
107
163
167
201
2 19
325
377
758
761
762
771
799
872
969
972
127
143
143
154
163
187
191
202
308
329
389
406
426
Projects GE HousgtResl of RuerEPC MemoVEPC Fmraquol 05l600Tiblcslamp2 ill
R U
O C ^gt X R
M _C EL o 2 VI
o O 3)
is R c
J U 1
_
II ^
c R 4)
00 ri ^O
mdash
bdquo ^O IT)
c o o ra
R
B
C s s pound -5
W)e EL c RU en o O a
U U 3 V
bullI
C Ril
O t
p^
Hi
T3
X03 agt E pound u n pound1
pound0
o r-T
II u
O M R
4raquo gt
bullsi
0 0
IT ON
nshy
^
bull3
o ^ S
amp CJ) c
c z o laquo T3
UM laquo
O k
I I C CJC R C
is ltshy R o u
8 lt R 3
B SE c o x
R
3
^ R
2
bull| R
i X
C R
i
U
1
~o R
^laquo
^O
12
(N
O
c-i
H
T3 0gt
ra
Crt
S I 2
III iS mdash shybdquo 2 15 T3 C = u c o ltu ^ ogt C raquoshy r ^
pound 2 lt2 gt o u p o5 2 = pound 3 LO _ shy u ra
bullpound U X)
9 2 -5 = sect I J3 if shy
Q Q ra
V) W
ilaquo S ca ^ laquo u c mdash
a s e ta Q
averaged to yield an average concentration for the entire exposure area. In addition, at the Fields Brook site in Ashtabula, Ohio, EPA Region V defined the residential cleanup goal as an average across the exposure unit (EPA 1997b).
Use of spatial averaging is also supported by the comments of EPA's Science Advisory Board (SAB) (EPA 1993) in its review of the draft Risk Assessment Guidance for Superfund: Human Health Evaluation Manual. In that review, the SAB stated that the:

[e]stimate of the RME cannot ignore the distribution of contamination at the site and the distribution of individual behaviors which lead to exposure. Rather, the spatial distribution of the concentration over the site must be considered, along with a distribution reflecting the relative frequency with which people are likely to visit different parts of the site. For this reason, any summary measure of concentration (such as the average proposed in the RAGS document) that does not take into account the spatial distribution of the underlying samples is likely to be inadequate. Therefore, the Committee believes that the Agency should give strong consideration to incorporating methods such as kriging or triangulation that take into account the spatial distribution of contamination to characterize exposure. (p. 15)
In the Consent Decree for this Site, EPA determined that use of the spatial averaging approach is fully protective of human health and the environment. Indeed, in responding to comments on the Upper 2 Mile Reach Removal Action, EPA specifically justified this spatial averaging approach so long as sufficient data are available (EPA 1999). As EPA recognized, use of this approach requires adequate data to support the calculation of representative spatial averages over the exposure area. However, the need for adequate data exists regardless of the averaging method used to derive the EPCs, and is particularly marked when one attempts to use the H-statistic approach. Under the phased approach that EPA has outlined in the Human Health Risk Assessment Work Plan, such sampling would be necessary only in areas where the 95% UCL exceeds the applicable screening risk-based concentration (SRBC). In such areas, additional sampling is warranted in order to avoid the use of extreme and unrepresentative 95% UCL values as the EPCs. Thus, GE urges EPA to collect adequate additional samples, as necessary for individual exposure areas, to ensure that reasonable and representative estimates of the average concentration can be calculated for the risk assessment of each exposure area.
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously, GE supports the use of area-weighted arithmetic averages as the most representative means of calculating EPCs. However, if EPA rejects this approach, or for areas where the sampling data are not considered sufficient to support such spatial averaging, there are alternative methods of calculating the 95% UCL that would yield more reliable and representative EPCs than the H-statistic. As discussed previously and demonstrated by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), the H-statistic is likely to substantially overestimate the 95% UCL, particularly where datasets are small. These EPA contractors and consultants have evaluated and suggested alternative, superior approaches for calculating the 95% UCL. Of these, the most straightforward is bootstrapping, which allows calculation of a probability distribution of EPCs and offsets some of the statistical limitations associated with small datasets. It provides a more reliable estimator of the EPC, particularly when datasets are small and the data do not fit either a normal or lognormal distribution. This technique is described below and is recommended as an alternative, preferable method to the H-statistic for calculating the 95% UCL for exposure areas in the Lower Housatonic River risk assessment.
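For reference, the H-statistic (Land) UCL criticized here takes the following form for lognormally distributed data; the notation below is the standard one from EPA guidance rather than anything stated in the Work Plan (y-bar and s are the mean and standard deviation of the log-transformed concentrations, n is the sample size, and H is Land's tabulated statistic):

    95% H-UCL = exp( y-bar + 0.5·s² + s·H(0.95) / √(n − 1) )

Because s enters both through the 0.5·s² term and through the H term, small, highly variable datasets drive the H-UCL upward rapidly, which is the behavior at issue in these comments.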
The Bootstrap Method
Similar to Monte Carlo sampling, bootstrapping is a process in which a subsample of the overall dataset of analytical values is selected at random and its mean is calculated. This mean is archived, and the statistical sampling procedure is repeated, with replacement of the data. The bootstrapped means are collected until their distribution stabilizes (this often requires many iterations of sampling). The 95th percentile of this distribution of means is equivalent to the 95% UCL of the mean. Bootstrapping may be viewed as an imitation of the Central Limit Theorem of statistics, which states that if a population is repeatedly sampled for purposes of calculating an arithmetic mean, the distribution of the means will ultimately take the form of a normal distribution. This theorem applies to any population regardless of its statistical distribution. Bootstrapping relies on the same theory, except that instead of repeatedly taking new samples, bootstrapping selects repeatedly from a single dataset.
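The resampling procedure just described can be sketched in a few lines of Python. This is a minimal illustrative sketch, not code from the Work Plan or these comments; the function name, seed, and defaults are assumptions for the example.

```python
import random
import statistics

def bootstrap_ucl95(data, n_iterations=5000, seed=1):
    """Percentile-bootstrap estimate of the 95% UCL of the mean.

    Draws resamples of the dataset with replacement, records each
    resample's mean, and returns the 95th percentile of those means.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = []
    for _ in range(n_iterations):
        resample = [rng.choice(data) for _ in data]  # with replacement
        means.append(statistics.fmean(resample))
    means.sort()
    return means[int(0.95 * n_iterations) - 1]  # 95th percentile of the means
```

For a skewed dataset, this percentile typically falls above the sample mean but well below the maximum detected value, in contrast to the H-UCL behavior discussed in these comments.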
This procedure frequently results in a lower estimate of the 95% UCL than the 95% H-UCL, and is more technically justifiable. The reduction obtained from this procedure varies with the amount of data available for sampling and its variance, but the result is consistently lower than the 95% H-UCL (EPA 1997a; Schulz and Griffin 1999). The difference between these two measures is more dramatic for exposure areas with fewer samples. Thus, the bootstrapping method is particularly well-suited to exposure areas, like many along the Lower Housatonic River, where sampling is limited.
When sample sizes are small, the distribution for the bootstrap run may not appear normal. This is because the number of samples required in each subsample to approach normality is larger than is available from the site data. This can be mitigated by further sampling. It should be noted, however, that as the sample size increases, the variance becomes tighter; as such, more samples will decrease the estimate of the 95% UCL. Thus, the 95% UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95% UCL that might be obtained from a larger dataset.
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95% UCL than the H-statistic method, it still does not take into consideration the spatial distribution of the sampling locations across an exposure area, or the biases associated with that distribution. To reflect this spatial distribution, a modified bootstrapping approach can be used that reflects the locations and densities of the samples collected on each property. This modified approach (spatial bootstrapping), which incorporates area-weighting into the bootstrapping method, can be used to calculate more reliable 95% UCL estimates of the EPC for each exposure area.
Spatial bootstrapping combines area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.
Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded to the closest integer. An expanded dataset is then created from the original dataset, reflecting the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7, and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.
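This normalization step can be shown concretely. The sample labels and polygon areas below are hypothetical, chosen only so that sample X's polygon is exactly seven times the smallest, matching the 7 × 5 = 35 example above.

```python
smallest = 2123.0                     # smallest Thiessen polygon area, ft^2
areas = {"X": 14861.0, "Y": 2123.0}   # hypothetical polygon areas, ft^2
factor = 5                            # chosen expansion factor

# Number of times each sample result is replicated in the expanded dataset:
replicates = {s: round(a / smallest) * factor for s, a in areas.items()}
# Sample X's polygon is 7 times the smallest, so X appears 7 * 5 = 35 times;
# sample Y (the smallest polygon) appears 1 * 5 = 5 times.
```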
Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
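Putting the three steps together, a spatial bootstrap might be sketched as follows. This is an illustration consistent with the procedure described above (Thiessen weights expressed as replicate counts, 5,000 iterations, subsamples of 250 drawn per iteration, at the low end of the 250-500 range described); the function name and code organization are assumptions for the example.

```python
import random
import statistics

def spatial_bootstrap_ucl95(concs, areas, factor=5, n_iter=5000,
                            subsample=250, seed=1):
    """Area-weighted (spatial) bootstrap 95% UCL of the mean.

    concs and areas are parallel lists: each sample's concentration and
    the area of its Thiessen polygon. Each sample is replicated in
    proportion to its polygon's size relative to the smallest polygon;
    means of random subsamples drawn with replacement are then collected,
    and the 95th percentile of those means is returned as the 95% UCL.
    """
    smallest = min(areas)
    expanded = []
    for conc, area in zip(concs, areas):
        expanded.extend([conc] * (round(area / smallest) * factor))
    rng = random.Random(seed)  # fixed seed for reproducibility of the sketch
    means = []
    for _ in range(n_iter):
        draw = [rng.choice(expanded) for _ in range(subsample)]
        means.append(statistics.fmean(draw))
    means.sort()
    return means[int(0.95 * n_iter) - 1]
```

The 50th percentile of the collected means approximates the area-weighted average of the original dataset, while the 95th percentile serves as the upper-bound EPC.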
This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against its use to avoid systematic overestimates of the 95% UCL (EPA 1997a; Schulz and Griffin 1999).
The spatial bootstrapping approach is superior to the H-statistic approach for several reasons. First, as shown by Singh et al. (EPA 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples in each exposure area and corrects for the relative contribution of each in the database. Third, the approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data; thus, it can be applied successfully to all sampling datasets, including those that fit neither a normal nor a lognormal distribution.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1000 by 1000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2123 square feet to 138219 square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different approaches: (1) the EPA approach (arithmetic mean and H-statistic); (2) an area-weighted average; (3) a bootstrap; and (4) a spatial bootstrap. Results of this analysis are provided in Table 2.
As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained from straight bootstrapping, which is consistent with the Central Limit Theorem (EPA 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas; thus, they are weighted more heavily in approaches that include area-weighting.
Estimates of the 95% UCL, by contrast, vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatially bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
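The two summary statistics quoted for this hypothetical dataset can be checked directly from the Table 1 data. In the sketch below, the decimal points in the PCB column (lost in reproduction) are restored, and areas are paired with concentrations as read row by row from the table; both are assumptions of this illustration.

```python
# Table 1 data: Thiessen polygon areas (ft^2) and total PCB (ppm).
areas = [138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121, 3697,
         13297, 33087, 65541, 57494, 9014, 9994, 2123, 2341, 7446, 2613,
         44231, 2362, 8745, 5008, 18762, 24296, 51356, 12996, 18019,
         119814, 132012]
concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
         7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
         15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6]

arith_mean = sum(concs) / len(concs)          # 13.3 ppm, as quoted above
area_weighted_mean = (sum(a * c for a, c in zip(areas, concs))
                      / sum(areas))           # ~15.7 ppm, somewhat higher
```

The area-weighted mean comes out at roughly 15.7 ppm, illustrating the point that samples 29 and 30, which combine high concentrations with very large polygons, pull the weighted mean above the arithmetic mean.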
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of those likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL that substantially exceeds the highest measured concentration, with the result that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. This result is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in areas of high concentration, and/or sample numbers are more limited. Given the tendency of the H-statistic to overestimate the 95% UCL, it is likely that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable: it will result in the overestimation of site risks, and could also result in the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above and demonstrated by EPA contractors and consultants (EPA 1997a; Schulz and Griffin 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply substituting the maximum detected concentration in the area when the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA 1989, 1993), and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper 2 Mile Reach of the River and for the Removal Actions Outside the River, including certain floodplain properties along the River.
However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support it, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA 1997a; Schulz and Griffin 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A), Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.
EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhardt. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision, Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.
EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. US Environmental Protection Agency, Region I. October.
Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River, Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
 1                   138219                 0.022
 2                   105217                 0.028
 3                   110214                 1.07
 4                    31021                 1.63
 5                    11399                 1.67
 6                     7004                 2.01
 7                    14268                 2.19
 8                    12121                 3.25
 9                     3697                 3.77
10                    13297                 7.58
11                    33087                 7.61
12                    65541                 7.62
13                    57494                 7.71
14                     9014                 7.99
15                     9994                 8.72
16                     2123                 9.69
17                     2341                 9.72
18                     7446                12.7
19                     2613                14.3
20                    44231                14.3
21                     2362                15.4
22                     8745                16.3
23                     5008                18.7
24                    18762                19.1
25                    24296                20.2
26                    51356                30.8
27                    12996                32.9
28                    18019                38.9
29                   119814                40.6
30                   132012                42.6
Table 2. Comparison of Mean and 95% UCL EPC Estimates for the Hypothetical Floodplain Property

[Table 2 is not legible in this copy. As summarized in the Sample Calculation section, the arithmetic and bootstrapped means are both 13.3 ppm; the area-weighted and spatially bootstrapped means are identical and somewhat higher; the bootstrapped and spatially bootstrapped 95% UCLs are 17.06 and 16.38 ppm, respectively; and the H-statistic 95% UCL is 106 ppm.]
representative estimates of the average concentration can be calculated for the risk assessment of
each exposure area
Bootstrapping Approach to Calculating the Upper Bound EPC
As stated previously GE supports the use of area-weighted arithmetic averages as the most
representative means of calculating EPCs However if EPA rejects this approach or for areas
where the sampling data are not considered sufficient to support use of such spatial averaging
there are alternative methods of calculating the 95 UCL which would result in more reliable
and representative EPCs than use of the H-statistic As discussed previously and demonstrated
by Singh et al (EPA 1997a) and Schulz and Griffin (1999) the H-statistic is likely to
substantially overestimate the 95 UCL particularly where datasets are small These EPA
contractors and consultants have evaluated and suggested alternative and superior approaches for
calculating the 95 UCL Of these the most straightforward is bootstrapping which allows
calculation of a probability distribution of EPCs and is able to offset some of the statistical
limitations associated with small datasets It provides a method for calculating a more reliable
estimator of the EPC particularly when datasets are small and data do not exactly fit either a
normal or lognormal distribution This technique is described below and is recommended as an
alternative and preferable method to the use of the H-statistic for calculating the 95 UCL for
exposure areas in the Lower Housatonic River risk assessment
The Bootstrap Method
Similar to Monte Carlo sampling bootstrapping is a process where a subpopulation of the overall
dataset of analytical values is selected at random and a mean is calculated This mean is
archived and the statistical sampling procedure is repeated with replacement of the data The
bootstrapped means are collected until their distribution stabilizes (this often requires many
iterations of sampling) The 95 percentile of this distribution of means is equivalent to the 95
UCL of the mean Bootstrapping may be viewed as an imitation of the Central Limit Theorem of
statistics which states that if a population is repeatedly sampled for purposes of calculating an
arithmetic mean the distribution of the means will ultimately take the form of a normal
c -nocsjRB GE2oooEPcPraquoplaquo an 9 Ogden Environmental and Energy Services
distribution This Theorem applies to any population regardless of its statistical distribution
Bootstrapping relies on the same theory except that instead of repeatedly taking new samples
bootstrapping models select repeatedly from a single dataset
This procedure frequently results in a lower estimate of the 95 UCL than the 95 H-UCL and
is more technically justifiable The reduction obtained from this procedure varies according to
the amount of data available for sampling and its variance but is consistently lower than the 95
H-UCL (EPA 1997a Schulz and Griffin 1999) The difference between these two measures is
more dramatic for exposure areas with fewer samples Thus the bootstrapping method is
particularly well-suited to exposure areas like many along the Lower Housatonic River where
sampling is limited
When sample sizes are small the distribution for the bootstrap run may not appear normal This
is because the number of samples required to be included in each subsample in order to approach
normality is larger than is available from the site data This can be mitigated by further
sampling It should be noted however that as the sample size increases the variance becomes
tighter As such more samples will decrease the estimate of the 95 UCL Thus the 95
UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95
UCL that might be obtained from a larger dataset
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95 UCL than is the H-
statistic method it still does not take into consideration the spatial distribution of the sampling
locations across an exposure area and the biases associated with it To reflect this spatial
distribution a modified bootstrapping approach can be used to reflect the locations and densities
of samples collected on each property This modified bootstrapping approach (spatial
bootstrapping) which incorporates area-weighting into the bootstrapping method can be used to
calculate more reliable 95 UCL estimates of the EPC for each exposure area
c DOCS JRBGE^oooEPCpaplaquor2 an 10 Ogden Environmental and Energy Services
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with the bootstrap method discussed above. It is conducted in a series of three steps.

First, each sample is area-weighted using the Thiessen polygon method. The weight given to each sample is determined by calculating the proportion of the total exposure area that is represented by the sample's polygon.

Second, the smallest polygon area is identified, and all other polygon areas are normalized with respect to its size and are weighted accordingly in the sampling database. In other words, if the area of one polygon is five times larger than the smallest polygon, it is given a weight that is five times greater in the sampling database from which values are selected for the bootstrap. The normalized areas are rounded off to the closest integer. An expanded dataset is then created from the original dataset that reflects the ratios of each polygon area to the smallest polygon area. This expansion is accomplished by multiplying the rounded normalized areas by a selected value. The resulting values represent an area-weighted number of replicates of the sample results in the expanded dataset. For example, if the ratio of the sample X area to the smallest area is 7 and the dataset is expanded using a factor of five, then sample X is represented in the expanded dataset as 35 (7 × 5) entries.

Finally, mean concentrations are calculated (with replacement) from the expanded dataset, using between 250 and 500 of the results (depending upon the size of the expanded dataset) for each iteration of the bootstrap. This calculation is then repeated a total of 5,000 times to yield a cumulative distribution of estimated means. After sufficient iterations, this cumulative distribution will attain the symmetry of a normal distribution, with the 50th percentile result equal to the area-weighted mean calculated for the original dataset. The value at the 95th percentile of this distribution of means is the 95% UCL.
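The three steps above can be sketched in code. The following is a minimal illustration, not Ogden's actual implementation: the Thiessen polygon areas are assumed to be precomputed (step 1), and the choice of 250 draws per iteration is one arbitrary point within the 250-500 range described above.

```python
import random
import statistics

def spatial_bootstrap_95ucl(concs, areas, expansion=5, draws=250,
                            n_iter=5000, seed=42):
    """Area-weighted (spatial) bootstrap 95% UCL of the mean.

    concs -- sample concentrations (e.g., ppm)
    areas -- Thiessen polygon areas for the same samples (any consistent unit)
    """
    # Step 2: normalize each polygon area by the smallest, round to the
    # nearest integer, and build the expanded dataset of replicates.
    smallest = min(areas)
    weights = [round(a / smallest) for a in areas]
    expanded = []
    for conc, w in zip(concs, weights):
        expanded.extend([conc] * (w * expansion))

    # Step 3: repeatedly resample (with replacement) from the expanded
    # dataset, record each subsample mean, and take the 95th percentile
    # of the resulting distribution of means.
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(expanded, k=draws)) for _ in range(n_iter)
    )
    return means[int(0.95 * n_iter)]
```

The median of the sorted `means` list approximates the area-weighted mean of the original samples, while the returned 95th-percentile value is the spatial-bootstrap 95% UCL.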
This approach is both consistent with EPA guidance and preferable to the approach currently proposed by EPA. Although EPA guidance recommends that the 95% UCL of the mean be used as the EPC in order to correct for the uncertainties and limitations of the sampling database, the guidance does not dictate which method must be used to calculate the 95% UCL. While EPA has in the past often relied on the H-statistic, that method need not be used, and recent reports by EPA contractors and consultants have recommended against use of the H-statistic to avoid systematic overestimates of the 95% UCL (EPA, 1997a; Schulz and Griffin, 1999).
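The H-statistic referred to here is Land's procedure for the mean of a lognormal distribution. As a point of reference (this formulation is standard in the cited literature, not quoted from the Work Plan), the one-sided 95% H-UCL is:

```latex
% \bar{y}, s_y = mean and standard deviation of the log-transformed data,
% n = sample size, H_{0.95} = Land's tabulated statistic (grows with s_y)
\mathrm{H\text{-}UCL}_{95} = \exp\!\left( \bar{y} + \frac{s_y^{2}}{2}
    + \frac{s_y \, H_{0.95}}{\sqrt{n-1}} \right)
```

Because the UCL depends exponentially on the log-scale standard deviation, modest log-scale variance combined with a small sample size can push the H-UCL well above every measured value, which is the failure mode described in this section.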
The spatial bootstrapping approach is better than the H-statistic approach for several reasons. First, as has been shown by Singh et al. (EPA, 1997a) and Schulz and Griffin (1999), bootstrapping provides more reliable estimates of the 95% UCL concentration. Second, spatial bootstrapping allows for consideration of the location and density of the samples for each exposure area and corrects for the relative contribution of each in the database. Third, this approach mitigates the statistical impact of a small database. Finally, it can be used regardless of the distribution of the sampling data. Thus, for all sampling datasets, including those that do not fit either a normal or lognormal distribution, this approach can be used successfully.
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) the EPA approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.

As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas; thus, they are weighted more heavily in approaches that include area-weighting.

Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
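The mean comparison can be checked directly against the Table 1 data. In this sketch, the concentrations and polygon areas are taken from Table 1; the roughly 15.7 ppm area-weighted value is our own back-calculation, shown for illustration (Table 2 reports the authoritative figures):

```python
# Table 1 data: polygon areas (ft2) and total PCB (ppm) for samples 1-30
areas = [138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121, 3697,
         13297, 33087, 65541, 57494, 9014, 9994, 2123, 2341, 7446, 2613,
         44231, 2362, 8745, 5008, 18762, 24296, 51356, 12996, 18019,
         119814, 132012]
concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
         7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
         15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6]

# Simple arithmetic mean versus Thiessen-polygon area-weighted mean
arithmetic_mean = sum(concs) / len(concs)
area_weighted = sum(a * c for a, c in zip(areas, concs)) / sum(areas)

print(round(arithmetic_mean, 1))  # 13.3
print(round(area_weighted, 1))    # 15.7
```

The area-weighted mean comes out higher than the arithmetic mean because samples 29 and 30 carry both the highest concentrations and two of the largest polygons, consistent with the discussion above.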
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the types of concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations that is likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL that is more than double the highest measured concentration, with the result that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. This result is troubling in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly concentrated areas, and/or sample numbers are more limited. Given the tendency of the H-statistic to overestimate the 95% UCL, it is likely that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above and demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculating the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993) and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper ½-Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.
However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A) - Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, ½ Mile Removal Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River: Final. Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
1                138,219               0.022
2                105,217               0.028
3                110,214               1.07
4                31,021                1.63
5                11,399                1.67
6                7,004                 2.01
7                14,268                2.19
8                12,121                3.25
9                3,697                 3.77
10               13,297                7.58
11               33,087                7.61
12               65,541                7.62
13               57,494                7.71
14               9,014                 7.99
15               9,994                 8.72
16               2,123                 9.69
17               2,341                 9.72
18               7,446                 12.7
19               2,613                 14.3
20               44,231                14.3
21               2,362                 15.4
22               8,745                 16.3
23               5,008                 18.7
24               18,762                19.1
25               24,296                20.2
26               51,356                30.8
27               12,996                32.9
28               18,019                38.9
29               119,814               40.6
30               132,012               42.6
[Table 2: Mean and 95% UCL exposure point concentration estimates for the hypothetical property calculated using the four approaches (EPA arithmetic mean/H-statistic, area-weighted average, bootstrap, and spatial bootstrap); the values are discussed in the text.]
distribution This Theorem applies to any population regardless of its statistical distribution
Bootstrapping relies on the same theory except that instead of repeatedly taking new samples
bootstrapping models select repeatedly from a single dataset
This procedure frequently results in a lower estimate of the 95 UCL than the 95 H-UCL and
is more technically justifiable The reduction obtained from this procedure varies according to
the amount of data available for sampling and its variance but is consistently lower than the 95
H-UCL (EPA 1997a Schulz and Griffin 1999) The difference between these two measures is
more dramatic for exposure areas with fewer samples Thus the bootstrapping method is
particularly well-suited to exposure areas like many along the Lower Housatonic River where
sampling is limited
When sample sizes are small the distribution for the bootstrap run may not appear normal This
is because the number of samples required to be included in each subsample in order to approach
normality is larger than is available from the site data This can be mitigated by further
sampling It should be noted however that as the sample size increases the variance becomes
tighter As such more samples will decrease the estimate of the 95 UCL Thus the 95
UCL obtained from a limited sample set may be viewed as a conservative estimate of the 95
UCL that might be obtained from a larger dataset
Spatial Bootstrapping
While bootstrapping is a more appropriate approach for estimating the 95 UCL than is the H-
statistic method it still does not take into consideration the spatial distribution of the sampling
locations across an exposure area and the biases associated with it To reflect this spatial
distribution a modified bootstrapping approach can be used to reflect the locations and densities
of samples collected on each property This modified bootstrapping approach (spatial
bootstrapping) which incorporates area-weighting into the bootstrapping method can be used to
calculate more reliable 95 UCL estimates of the EPC for each exposure area
c DOCS JRBGE^oooEPCpaplaquor2 an 10 Ogden Environmental and Energy Services
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with
the bootstrap method discussed above It is conducted in a series of three steps
First each sample is area-weighted using the Thiessen polygon method The weight given to
each sample is determined by calculating the proportion of the total exposure area that is
represented by the samples polygon
Second the smallest polygon area is identified and all other polygon areas are normalized with
respect to its size and are weighted accordingly in the sampling database In other words if the
area of one polygon is five times larger than the smallest polygon it is given a weight that is five
times greater in the sampling database from which values are selected for the bootstrap The
normalized areas are rounded off to the closest integer An expanded dataset is then created
from the original dataset and reflects the ratios of each polygon area to the smallest polygon area
This expansion is accomplished by multiplying the rounded normalized areas by a selected
value The resulting values represent an area-weighted number of replicates of the sample
results in the expanded dataset For example if the ratio of the sample X area to the smallest area
is 7 and the dataset is expanded using a factor of five then sample X is represented in the
expanded dataset as 35 (7 times 5) entries
Finally mean concentrations are calculated (with replacement) from the expanded dataset using
between 250 and 500 of the results (depending upon the size of the expanded dataset) for each
iteration of the bootstrap This calculation is then repeated a total of 5000 times to yield a
cumulative distribution of estimated means After sufficient iterations this cumulative
distribution will attain the symmetry of a normal distribution with the 50th percentile result equal
to the area-weighted mean calculated for the original dataset The value at the 95th percentile of
this distribution of means is the 95 UCL
c jraquocsjRBGE2oooEpcpiper an 11 Ogden Environmental and Energy Services
This approach is both consistent with EPA guidance and preferable to the approach currently
proposed by EPA Although EPA guidance recommends that the 95 UCL of the mean be used
as the EPC in order to correct for the uncertainties and limitations of the sampling database the
guidance does not dictate which method must be used to calculate the 95 UCL While EPA
has in the past often relied on the H-statistic that method need not be used and recent reports
by EPA contractors and consultants have recommended against use of the H-statistic to avoid
systematic overestimates of the 95 UCL (EPA 1997a Schulz and Griffin 1999)
The spatial bootstrapping approach is better than the H-statistic approach for several reasons
First as has been shown by Singh et al (EPA 1997a) and Schulz and Griffin (1999)
bootstrapping provides more reliable estimates of the 95 UCL concentration Second spatial
bootstrapping allows for consideration of the location and density of the samples for each
exposure area and corrects for the relative contribution of each in the database Third this
approach mitigates the statistical impact of a small database Finally it can be used regardless of
the distribution of the sampling data Thus for all sampling datasets including those that do not
fit either a normal or lognormal distribution this approach can be used successfully
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the
calculation of the arithmetic mean and 95 UCL of a single dataset Ogden has conducted
sample calculations for a hypothetical property For this purpose Ogden developed a
hypothetical set of sampling data that are similar to the types of data that might be collected from
a single floodplain property As shown in Table 1 it is assumed that 30 surface soil samples
have been collected from a property that is roughly 1000 by 1000 feet in size The
concentrations for these hypothetical samples range from 0022 to 426 ppm with an arithmetic
mean of 133 ppm The assumed polygon areas range in size from 2123 square feet to 138219
square feet
c DOCS JRB GEY2oooEPCplaquofr2 in 12 Ogden Environmental and Energy Services
These data have been used to calculate mean and 95 UCL EPCs using four different
approaches 1) EPA approach (arithmetic mean and H-statistic) 2) an area-weighted average 3)
a bootstrap and 4) a spatial bootstrap Results of this analysis are provided in Table 2
As can be seen in Table 2 the estimated mean values are similar for all four approaches The
arithmetic mean (133 ppm) is identical to the mean obtained during straight bootstrapping This
is consistent with the Central Limit Theorem (EPA 1997a) The area-weighted average is
identical to the mean calculated in spatial bootstrapping but is somewhat higher than the
arithmetic and bootstrapped means This is because some of the sampling locations with high
concentrations (eg samples 29 and 30 Table 1) have very large polygon areas Thus they are
weighted more heavily in approaches that include area-weighting
Estimates of the 95 UCL vary substantially Using bootstrapping and spatial bootstrapping
the 95 UCL values are similar (1706 and 1638 ppm respectively) However the 95 UCL
calculated using the H-statistic (106 ppm) is considerably higher exceeding both the
bootstrapped and spatial bootstrapped values by a factor of more than six In fact the H-UCL
exceeds the maximum value (426 ppm) in the sampling database which means that the
maximum value would be substituted as the EPC Thus in this hypothetical case the EPC that
would need to be used for this exposure area would be the maximum detected concentration of
426 ppm This concentration is clearly not representative of the sampling dataset for this
hypothetical property
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach and
its result is consistent with the findings of Singh et al (EPA 1997a) and Schulz and Griffin
(1999) These limitations occur despite the fact that the sampling dataset for this hypothetical
property is reasonably large containing 30 samples and the range of concentrations is fairly
typical of the types of concentrations likely to be found in floodplain properties along the Lower
Housatonic There are no outliers to skew the results and the concentrations fall along a fairly
typical concentration gradient with a few samples showing very low concentrations a few
c noes jRBGE2oooEPCpapei an 13 Ogden Environmental and Energy Services
showing fairly high concentrations and the remainder fairly evenly distributed in between In
addition some of the largest polygon areas are associated with low concentration areas while
others are associated with the highest concentrations There is nothing about this dataset to
indicate that the samples reported are not representative of the range of concentrations that is
likely to be present on the property
Despite this the H-statistic approach results in a 95 UCL that exceeds the highest measured
concentration by a factor of more than six with the result being that the 95 UCL cannot be
used as a reliable estimate of the EPC in the risk assessment for this property Clearly this result
is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly
even distribution of sampling results These problems will be even more pronounced for certain
real properties along the Lower Housatonic River where the sampling results may be highly
skewed dense sampling occurs in high concentrated areas andor sample numbers are more
limited It is likely given the tendency of the H-statistic to overestimate the 95 UCL that
maximum concentration values will need to be used for many of those properties Such an end
result is unsupportable and will result in the overestimation of site risks It could also result in
the unnecessary remediation of a number of exposure areas
Conclusions
As discussed above and demonstrated by EPA contractors and consultants (EPA 1997a Schulz
and Griffin 1999) use of the H-statistic approach to calculating the 95 UCL in environmental
applications can and frequently does produce substantial overestimates of the true mean that are
in no way representative of the concentrations actually contacted by people in an exposure area
and hence that approach should be avoided Moreover this problem cannot be solved by simply
using the maximum detected concentration in the area if the calculated 95 UCL exceeds that
maximum These problems can be avoided through the use of area-weighted averaging which
takes into consideration the fraction of the exposure area that is represented by each sample
taken This approach allows for a more realistic and representative estimate of the level of
exposure likely to occur to individuals in the area over time Thus GE urges EPA to adopt the
area-weighted averaging approach for the Lower Housatonic River and to collect the data
c DOCS JRB GE2ooo poundpcpraquoplaquor2 tt 14 Ogden Environmental and Energy Services
necessary to support that approach Use of spatial average concentrations as EPCs is consistent
with EPA policies procedures and guidelines (EPA 1989 1993) and would be consistent with
the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the
Upper 2 Mile Reach of the River and the Removal Actions Outside the River including certain
floodplain properties along the River
However if EPA rejects this approach (or for areas where EPA determines that the data are
insufficient to support that approach and decides not to collect additional data) and insists on
estimating a 95 UCL of the mean for use as the EPC then GE recommends that spatial
bootstrapping be used to calculate the 95 UCL This approach is more reliable and yields more
representative EPCs than the H-statistic method (EPA 1997a Schulz and Griffin 1999) offsets
many of the limitations associated with the H-statistic and is consistent with EPA guidance
References
Clifford PA DE Barchers DF Ludwig RL Sielken JS Klinginsmith RV Graham and
MI Banton 1995 Hazardrisk assessments An approach to quantifying spatial components of
exposure for ecological risk assessment Environmental Toxicology and Chemistry 14(5)895shy
906
EPA 1989 Risk Assessment Guidance for Superfund Volume I Human Health Evaluation
Manual (Part A) - Interim Final US Environmental Protection Agency Office of Emergency
and Remedial Response Washington DC EPA5401-89-002 July
EPA 1992 Supplemental Guidance to RAGS Calculating the Concentration Term OSWER
Bulletin Volume 1 Number 1 Publication 92857-08 May
EPA 1993 An SAB Report Superfund Site Health Risk Assessment Guidelines Review of the
Office of Solid Waste and Emergency Responses Draft Risk Assessment Guidance for Superfund
Human Health Evaluation Manual by the Environmental Health Committee EPA-SAB-EHCshy
93-007 February
c DOCS JRB GE2oooEPCpapcr2 m 15 Ogden Environmental and Energy Services
EPA 1994 A change in the approach for estimating the average and reasonable maximum
exposure scenario for human health risk assessments EPA Region I Risk Updates Number 2
August
EPA 1995 Final Baseline Risk Assessment for Koppers Company Inc (Charleston Plant)
Remedial InvestigationFeasibility Study Oversight Black and Veatch Waste Science Inc EPA
Region IV Contract 68-W9-0055
EPA 1997a The Lognormal Distribution in Environmental Applications Prepared for EPA by
A Singh A Singh and M Engelhard US Environmental Protection Agency Office of
Research and Development EPA600R-97006 December
EPA 1997b Record of Decision Fields Brook Superfund Site US Environmental Protection
Agency Region V EPA54lR-97070 August 15
EPA 1999 Responsiveness Summary for Allendale School Removal Action Mile Removal
Action and Consolidation US Environmental Protection Agency Region I October
Isaaks EH and RM Srivastava 1989 Applied Geostatistics Oxford University Press New
York
Land CE 1975 Tables of confidence limits for linear functions of the normal mean and
variance Selected Tables in Mathematical Statistics 3365-410
Schulz TW and S Griffin 1999 Estimating risk assessment exposure point concentrations
when the data are not normal or lognormal Risk Analysis 19577-584
Weston 2000 Supplemental Investigation Work Plan for the Lower Housatonic River Final
Roy F Weston Inc Manchester NH DCN GEP2-02-000-AAME February 2000
c DOCS jRBGE2ooooEPcPaper2 m 16 Ogden Environmental and Energy Services
5
10
15
20
25
30
Table 1 Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property
Sample Number
1
2 j
4
6
7
8
9
1 1
12
13
14
16
17
18
19
21
22
23
24
26
27
28
29
Polygon Area (ft2)
138219
105217
110214
31021
11399
7004
14268
12121
3697
13297
33087
65541
57494
9014
9994
2123
2341
7446
2613
44231
2362
8745
5008
18762
24296
51356
12996
18019
119814
132012
Total PCB (ppm)
0022
0028
107
163
167
201
2 19
325
377
758
761
762
771
799
872
969
972
127
143
143
154
163
187
191
202
308
329
389
406
426
Projects GE HousgtResl of RuerEPC MemoVEPC Fmraquol 05l600Tiblcslamp2 ill
R U
O C ^gt X R
M _C EL o 2 VI
o O 3)
is R c
J U 1
_
II ^
c R 4)
00 ri ^O
mdash
bdquo ^O IT)
c o o ra
R
B
C s s pound -5
W)e EL c RU en o O a
U U 3 V
bullI
C Ril
O t
p^
Hi
T3
X03 agt E pound u n pound1
pound0
o r-T
II u
O M R
4raquo gt
bullsi
0 0
IT ON
nshy
^
bull3
o ^ S
amp CJ) c
c z o laquo T3
UM laquo
O k
I I C CJC R C
is ltshy R o u
8 lt R 3
B SE c o x
R
3
^ R
2
bull| R
i X
C R
i
U
1
~o R
^laquo
^O
12
(N
O
c-i
H
T3 0gt
ra
Crt
S I 2
III iS mdash shybdquo 2 15 T3 C = u c o ltu ^ ogt C raquoshy r ^
pound 2 lt2 gt o u p o5 2 = pound 3 LO _ shy u ra
bullpound U X)
9 2 -5 = sect I J3 if shy
Q Q ra
V) W
ilaquo S ca ^ laquo u c mdash
a s e ta Q
Spatial bootstrapping is conducted by combining area-weighted averaging of each sample with
the bootstrap method discussed above It is conducted in a series of three steps
First each sample is area-weighted using the Thiessen polygon method The weight given to
each sample is determined by calculating the proportion of the total exposure area that is
represented by the samples polygon
Second the smallest polygon area is identified and all other polygon areas are normalized with
respect to its size and are weighted accordingly in the sampling database In other words if the
area of one polygon is five times larger than the smallest polygon it is given a weight that is five
times greater in the sampling database from which values are selected for the bootstrap The
normalized areas are rounded off to the closest integer An expanded dataset is then created
from the original dataset and reflects the ratios of each polygon area to the smallest polygon area
This expansion is accomplished by multiplying the rounded normalized areas by a selected
value The resulting values represent an area-weighted number of replicates of the sample
results in the expanded dataset For example if the ratio of the sample X area to the smallest area
is 7 and the dataset is expanded using a factor of five then sample X is represented in the
expanded dataset as 35 (7 times 5) entries
Finally mean concentrations are calculated (with replacement) from the expanded dataset using
between 250 and 500 of the results (depending upon the size of the expanded dataset) for each
iteration of the bootstrap This calculation is then repeated a total of 5000 times to yield a
cumulative distribution of estimated means After sufficient iterations this cumulative
distribution will attain the symmetry of a normal distribution with the 50th percentile result equal
to the area-weighted mean calculated for the original dataset The value at the 95th percentile of
this distribution of means is the 95 UCL
c jraquocsjRBGE2oooEpcpiper an 11 Ogden Environmental and Energy Services
This approach is both consistent with EPA guidance and preferable to the approach currently
proposed by EPA Although EPA guidance recommends that the 95 UCL of the mean be used
as the EPC in order to correct for the uncertainties and limitations of the sampling database the
guidance does not dictate which method must be used to calculate the 95 UCL While EPA
has in the past often relied on the H-statistic that method need not be used and recent reports
by EPA contractors and consultants have recommended against use of the H-statistic to avoid
systematic overestimates of the 95 UCL (EPA 1997a Schulz and Griffin 1999)
The spatial bootstrapping approach is better than the H-statistic approach for several reasons
First as has been shown by Singh et al (EPA 1997a) and Schulz and Griffin (1999)
bootstrapping provides more reliable estimates of the 95 UCL concentration Second spatial
bootstrapping allows for consideration of the location and density of the samples for each
exposure area and corrects for the relative contribution of each in the database Third this
approach mitigates the statistical impact of a small database Finally it can be used regardless of
the distribution of the sampling data Thus for all sampling datasets including those that do not
fit either a normal or lognormal distribution this approach can be used successfully
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the calculation of the arithmetic mean and 95% UCL of a single dataset, Ogden has conducted sample calculations for a hypothetical property. For this purpose, Ogden developed a hypothetical set of sampling data that are similar to the types of data that might be collected from a single floodplain property. As shown in Table 1, it is assumed that 30 surface soil samples have been collected from a property that is roughly 1,000 by 1,000 feet in size. The concentrations for these hypothetical samples range from 0.022 to 42.6 ppm, with an arithmetic mean of 13.3 ppm. The assumed polygon areas range in size from 2,123 square feet to 138,219 square feet.
These data have been used to calculate mean and 95% UCL EPCs using four different approaches: 1) the EPA approach (arithmetic mean and H-statistic); 2) an area-weighted average; 3) a bootstrap; and 4) a spatial bootstrap. Results of this analysis are provided in Table 2.
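The first two of these estimates can be reproduced directly from the Table 1 data. The brief sketch below assumes, as the row order of Table 1 suggests, that each polygon area is paired with the concentration on the same row:

```python
# Table 1 data: polygon areas (ft^2) and total PCB concentrations (ppm),
# paired row-by-row as listed in the table.
areas = [138219, 105217, 110214, 31021, 11399, 7004, 14268, 12121, 3697, 13297,
         33087, 65541, 57494, 9014, 9994, 2123, 2341, 7446, 2613, 44231,
         2362, 8745, 5008, 18762, 24296, 51356, 12996, 18019, 119814, 132012]
concs = [0.022, 0.028, 1.07, 1.63, 1.67, 2.01, 2.19, 3.25, 3.77, 7.58,
         7.61, 7.62, 7.71, 7.99, 8.72, 9.69, 9.72, 12.7, 14.3, 14.3,
         15.4, 16.3, 18.7, 19.1, 20.2, 30.8, 32.9, 38.9, 40.6, 42.6]

# Arithmetic mean: every sample counts equally.
arith_mean = sum(concs) / len(concs)  # ~13.3 ppm

# Area-weighted average: each sample counts in proportion to the fraction
# of the exposure area that its polygon represents.
area_wtd_mean = sum(a * c for a, c in zip(areas, concs)) / sum(areas)  # ~15.7 ppm

print(round(arith_mean, 1), round(area_wtd_mean, 1))
```

The area-weighted value comes out above the arithmetic mean because the two highest concentrations (samples 29 and 30) sit on the two largest polygons, consistent with the discussion of Table 2 below.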
As can be seen in Table 2, the estimated mean values are similar for all four approaches. The arithmetic mean (13.3 ppm) is identical to the mean obtained during straight bootstrapping. This is consistent with the Central Limit Theorem (EPA, 1997a). The area-weighted average is identical to the mean calculated in spatial bootstrapping, but is somewhat higher than the arithmetic and bootstrapped means. This is because some of the sampling locations with high concentrations (e.g., samples 29 and 30, Table 1) have very large polygon areas; thus, they are weighted more heavily in approaches that include area-weighting.
Estimates of the 95% UCL vary substantially. Using bootstrapping and spatial bootstrapping, the 95% UCL values are similar (17.06 and 16.38 ppm, respectively). However, the 95% UCL calculated using the H-statistic (106 ppm) is considerably higher, exceeding both the bootstrapped and spatial bootstrapped values by a factor of more than six. In fact, the H-UCL exceeds the maximum value (42.6 ppm) in the sampling database, which means that the maximum value would be substituted as the EPC. Thus, in this hypothetical case, the EPC that would need to be used for this exposure area would be the maximum detected concentration of 42.6 ppm. This concentration is clearly not representative of the sampling dataset for this hypothetical property.
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach, and its result is consistent with the findings of Singh et al. (EPA, 1997a) and Schulz and Griffin (1999). These limitations occur despite the fact that the sampling dataset for this hypothetical property is reasonably large, containing 30 samples, and the range of concentrations is fairly typical of the concentrations likely to be found in floodplain properties along the Lower Housatonic. There are no outliers to skew the results, and the concentrations fall along a fairly typical concentration gradient, with a few samples showing very low concentrations, a few showing fairly high concentrations, and the remainder fairly evenly distributed in between. In addition, some of the largest polygon areas are associated with low-concentration areas, while others are associated with the highest concentrations. There is nothing about this dataset to indicate that the samples reported are not representative of the range of concentrations likely to be present on the property.
Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured concentration by a factor of more than six, with the result that the 95% UCL cannot be used as a reliable estimate of the EPC in the risk assessment for this property. This result is troubling in that it occurs for a property that has a reasonable number of samples and a fairly even distribution of sampling results. These problems will be even more pronounced for certain real properties along the Lower Housatonic River, where the sampling results may be highly skewed, dense sampling occurs in highly concentrated areas, and/or sample numbers are more limited. Given the tendency of the H-statistic to overestimate the 95% UCL, it is likely that maximum concentration values will need to be used for many of those properties. Such an end result is unsupportable and will result in the overestimation of site risks. It could also result in the unnecessary remediation of a number of exposure areas.
Conclusions
As discussed above and demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz and Griffin, 1999), use of the H-statistic approach to calculate the 95% UCL in environmental applications can, and frequently does, produce substantial overestimates of the true mean that are in no way representative of the concentrations actually contacted by people in an exposure area; hence, that approach should be avoided. Moreover, this problem cannot be solved by simply using the maximum detected concentration in the area if the calculated 95% UCL exceeds that maximum. These problems can be avoided through the use of area-weighted averaging, which takes into consideration the fraction of the exposure area that is represented by each sample taken. This approach allows for a more realistic and representative estimate of the level of exposure likely to occur to individuals in the area over time. Thus, GE urges EPA to adopt the area-weighted averaging approach for the Lower Housatonic River and to collect the data necessary to support that approach. Use of spatial average concentrations as EPCs is consistent with EPA policies, procedures, and guidelines (EPA, 1989, 1993) and would be consistent with the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the Upper 2-Mile Reach of the River and the Removal Actions Outside the River, including certain floodplain properties along the River.
However, if EPA rejects this approach (or, for areas where EPA determines that the data are insufficient to support that approach, decides not to collect additional data) and insists on estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
References
Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation Manual (Part A), Interim Final. US Environmental Protection Agency, Office of Emergency and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2. August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant). Remedial Investigation/Feasibility Study Oversight, Black and Veatch Waste Science, Inc. EPA Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by A. Singh, A. Singh, and M. Engelhard. US Environmental Protection Agency, Office of Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision: Fields Brook Superfund Site. US Environmental Protection Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, Mile Removal Action, and Consolidation. US Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River, Final. Roy F. Weston, Inc., Manchester, NH. DCN: GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number   Polygon Area (ft2)   Total PCB (ppm)
1                    138,219              0.022
2                    105,217              0.028
3                    110,214              1.07
4                     31,021              1.63
5                     11,399              1.67
6                      7,004              2.01
7                     14,268              2.19
8                     12,121              3.25
9                      3,697              3.77
10                    13,297              7.58
11                    33,087              7.61
12                    65,541              7.62
13                    57,494              7.71
14                     9,014              7.99
15                     9,994              8.72
16                     2,123              9.69
17                     2,341              9.72
18                     7,446              12.7
19                     2,613              14.3
20                    44,231              14.3
21                     2,362              15.4
22                     8,745              16.3
23                     5,008              18.7
24                    18,762              19.1
25                    24,296              20.2
26                    51,356              30.8
27                    12,996              32.9
28                    18,019              38.9
29                   119,814              40.6
30                   132,012              42.6
Table 2. Mean and 95% UCL EPC Estimates for the Hypothetical Floodplain Property Calculated Using the Four Approaches
[The body of this table is not legible in the source copy; the key values are given in the accompanying text.]
This approach is both consistent with EPA guidance and preferable to the approach currently
proposed by EPA Although EPA guidance recommends that the 95 UCL of the mean be used
as the EPC in order to correct for the uncertainties and limitations of the sampling database the
guidance does not dictate which method must be used to calculate the 95 UCL While EPA
has in the past often relied on the H-statistic that method need not be used and recent reports
by EPA contractors and consultants have recommended against use of the H-statistic to avoid
systematic overestimates of the 95 UCL (EPA 1997a Schulz and Griffin 1999)
The spatial bootstrapping approach is better than the H-statistic approach for several reasons
First as has been shown by Singh et al (EPA 1997a) and Schulz and Griffin (1999)
bootstrapping provides more reliable estimates of the 95 UCL concentration Second spatial
bootstrapping allows for consideration of the location and density of the samples for each
exposure area and corrects for the relative contribution of each in the database Third this
approach mitigates the statistical impact of a small database Finally it can be used regardless of
the distribution of the sampling data Thus for all sampling datasets including those that do not
fit either a normal or lognormal distribution this approach can be used successfully
Sample Calculation
To demonstrate the effect that the various statistical methods discussed above have on the
calculation of the arithmetic mean and 95 UCL of a single dataset Ogden has conducted
sample calculations for a hypothetical property For this purpose Ogden developed a
hypothetical set of sampling data that are similar to the types of data that might be collected from
a single floodplain property As shown in Table 1 it is assumed that 30 surface soil samples
have been collected from a property that is roughly 1000 by 1000 feet in size The
concentrations for these hypothetical samples range from 0022 to 426 ppm with an arithmetic
mean of 133 ppm The assumed polygon areas range in size from 2123 square feet to 138219
square feet
c DOCS JRB GEY2oooEPCplaquofr2 in 12 Ogden Environmental and Energy Services
These data have been used to calculate mean and 95 UCL EPCs using four different
approaches 1) EPA approach (arithmetic mean and H-statistic) 2) an area-weighted average 3)
a bootstrap and 4) a spatial bootstrap Results of this analysis are provided in Table 2
As can be seen in Table 2 the estimated mean values are similar for all four approaches The
arithmetic mean (133 ppm) is identical to the mean obtained during straight bootstrapping This
is consistent with the Central Limit Theorem (EPA 1997a) The area-weighted average is
identical to the mean calculated in spatial bootstrapping but is somewhat higher than the
arithmetic and bootstrapped means This is because some of the sampling locations with high
concentrations (eg samples 29 and 30 Table 1) have very large polygon areas Thus they are
weighted more heavily in approaches that include area-weighting
Estimates of the 95 UCL vary substantially Using bootstrapping and spatial bootstrapping
the 95 UCL values are similar (1706 and 1638 ppm respectively) However the 95 UCL
calculated using the H-statistic (106 ppm) is considerably higher exceeding both the
bootstrapped and spatial bootstrapped values by a factor of more than six In fact the H-UCL
exceeds the maximum value (426 ppm) in the sampling database which means that the
maximum value would be substituted as the EPC Thus in this hypothetical case the EPC that
would need to be used for this exposure area would be the maximum detected concentration of
426 ppm This concentration is clearly not representative of the sampling dataset for this
hypothetical property
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach and
its result is consistent with the findings of Singh et al (EPA 1997a) and Schulz and Griffin
(1999) These limitations occur despite the fact that the sampling dataset for this hypothetical
property is reasonably large containing 30 samples and the range of concentrations is fairly
typical of the types of concentrations likely to be found in floodplain properties along the Lower
Housatonic There are no outliers to skew the results and the concentrations fall along a fairly
typical concentration gradient with a few samples showing very low concentrations a few
c noes jRBGE2oooEPCpapei an 13 Ogden Environmental and Energy Services
showing fairly high concentrations and the remainder fairly evenly distributed in between In
addition some of the largest polygon areas are associated with low concentration areas while
others are associated with the highest concentrations There is nothing about this dataset to
indicate that the samples reported are not representative of the range of concentrations that is
likely to be present on the property
Despite this the H-statistic approach results in a 95 UCL that exceeds the highest measured
concentration by a factor of more than six with the result being that the 95 UCL cannot be
used as a reliable estimate of the EPC in the risk assessment for this property Clearly this result
is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly
even distribution of sampling results These problems will be even more pronounced for certain
real properties along the Lower Housatonic River where the sampling results may be highly
skewed dense sampling occurs in high concentrated areas andor sample numbers are more
limited It is likely given the tendency of the H-statistic to overestimate the 95 UCL that
maximum concentration values will need to be used for many of those properties Such an end
result is unsupportable and will result in the overestimation of site risks It could also result in
the unnecessary remediation of a number of exposure areas
Conclusions
As discussed above and demonstrated by EPA contractors and consultants (EPA 1997a Schulz
and Griffin 1999) use of the H-statistic approach to calculating the 95 UCL in environmental
applications can and frequently does produce substantial overestimates of the true mean that are
in no way representative of the concentrations actually contacted by people in an exposure area
and hence that approach should be avoided Moreover this problem cannot be solved by simply
using the maximum detected concentration in the area if the calculated 95 UCL exceeds that
maximum These problems can be avoided through the use of area-weighted averaging which
takes into consideration the fraction of the exposure area that is represented by each sample
taken This approach allows for a more realistic and representative estimate of the level of
exposure likely to occur to individuals in the area over time Thus GE urges EPA to adopt the
area-weighted averaging approach for the Lower Housatonic River and to collect the data
c DOCS JRB GE2ooo poundpcpraquoplaquor2 tt 14 Ogden Environmental and Energy Services
necessary to support that approach Use of spatial average concentrations as EPCs is consistent
with EPA policies procedures and guidelines (EPA 1989 1993) and would be consistent with
the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the
Upper 2 Mile Reach of the River and the Removal Actions Outside the River including certain
floodplain properties along the River
However if EPA rejects this approach (or for areas where EPA determines that the data are
insufficient to support that approach and decides not to collect additional data) and insists on
estimating a 95 UCL of the mean for use as the EPC then GE recommends that spatial
bootstrapping be used to calculate the 95 UCL This approach is more reliable and yields more
representative EPCs than the H-statistic method (EPA 1997a Schulz and Griffin 1999) offsets
many of the limitations associated with the H-statistic and is consistent with EPA guidance
References
Clifford PA DE Barchers DF Ludwig RL Sielken JS Klinginsmith RV Graham and
MI Banton 1995 Hazardrisk assessments An approach to quantifying spatial components of
exposure for ecological risk assessment Environmental Toxicology and Chemistry 14(5)895shy
906
EPA 1989 Risk Assessment Guidance for Superfund Volume I Human Health Evaluation
Manual (Part A) - Interim Final US Environmental Protection Agency Office of Emergency
and Remedial Response Washington DC EPA5401-89-002 July
EPA 1992 Supplemental Guidance to RAGS Calculating the Concentration Term OSWER
Bulletin Volume 1 Number 1 Publication 92857-08 May
EPA 1993 An SAB Report Superfund Site Health Risk Assessment Guidelines Review of the
Office of Solid Waste and Emergency Responses Draft Risk Assessment Guidance for Superfund
Human Health Evaluation Manual by the Environmental Health Committee EPA-SAB-EHCshy
93-007 February
c DOCS JRB GE2oooEPCpapcr2 m 15 Ogden Environmental and Energy Services
EPA 1994 A change in the approach for estimating the average and reasonable maximum
exposure scenario for human health risk assessments EPA Region I Risk Updates Number 2
August
EPA 1995 Final Baseline Risk Assessment for Koppers Company Inc (Charleston Plant)
Remedial InvestigationFeasibility Study Oversight Black and Veatch Waste Science Inc EPA
Region IV Contract 68-W9-0055
EPA 1997a The Lognormal Distribution in Environmental Applications Prepared for EPA by
A Singh A Singh and M Engelhard US Environmental Protection Agency Office of
Research and Development EPA600R-97006 December
EPA 1997b Record of Decision Fields Brook Superfund Site US Environmental Protection
Agency Region V EPA54lR-97070 August 15
EPA 1999 Responsiveness Summary for Allendale School Removal Action Mile Removal
Action and Consolidation US Environmental Protection Agency Region I October
Isaaks EH and RM Srivastava 1989 Applied Geostatistics Oxford University Press New
York
Land CE 1975 Tables of confidence limits for linear functions of the normal mean and
variance Selected Tables in Mathematical Statistics 3365-410
Schulz TW and S Griffin 1999 Estimating risk assessment exposure point concentrations
when the data are not normal or lognormal Risk Analysis 19577-584
Weston 2000 Supplemental Investigation Work Plan for the Lower Housatonic River Final
Roy F Weston Inc Manchester NH DCN GEP2-02-000-AAME February 2000
c DOCS jRBGE2ooooEPcPaper2 m 16 Ogden Environmental and Energy Services
5
10
15
20
25
30
Table 1 Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property
Sample Number
1
2 j
4
6
7
8
9
1 1
12
13
14
16
17
18
19
21
22
23
24
26
27
28
29
Polygon Area (ft2)
138219
105217
110214
31021
11399
7004
14268
12121
3697
13297
33087
65541
57494
9014
9994
2123
2341
7446
2613
44231
2362
8745
5008
18762
24296
51356
12996
18019
119814
132012
Total PCB (ppm)
0022
0028
107
163
167
201
2 19
325
377
758
761
762
771
799
872
969
972
127
143
143
154
163
187
191
202
308
329
389
406
426
Projects GE HousgtResl of RuerEPC MemoVEPC Fmraquol 05l600Tiblcslamp2 ill
R U
O C ^gt X R
M _C EL o 2 VI
o O 3)
is R c
J U 1
_
II ^
c R 4)
00 ri ^O
mdash
bdquo ^O IT)
c o o ra
R
B
C s s pound -5
W)e EL c RU en o O a
U U 3 V
bullI
C Ril
O t
p^
Hi
T3
X03 agt E pound u n pound1
pound0
o r-T
II u
O M R
4raquo gt
bullsi
0 0
IT ON
nshy
^
bull3
o ^ S
amp CJ) c
c z o laquo T3
UM laquo
O k
I I C CJC R C
is ltshy R o u
8 lt R 3
B SE c o x
R
3
^ R
2
bull| R
i X
C R
i
U
1
~o R
^laquo
^O
12
(N
O
c-i
H
T3 0gt
ra
Crt
S I 2
III iS mdash shybdquo 2 15 T3 C = u c o ltu ^ ogt C raquoshy r ^
pound 2 lt2 gt o u p o5 2 = pound 3 LO _ shy u ra
bullpound U X)
9 2 -5 = sect I J3 if shy
Q Q ra
V) W
ilaquo S ca ^ laquo u c mdash
a s e ta Q
These data have been used to calculate mean and 95 UCL EPCs using four different
approaches 1) EPA approach (arithmetic mean and H-statistic) 2) an area-weighted average 3)
a bootstrap and 4) a spatial bootstrap Results of this analysis are provided in Table 2
As can be seen in Table 2 the estimated mean values are similar for all four approaches The
arithmetic mean (133 ppm) is identical to the mean obtained during straight bootstrapping This
is consistent with the Central Limit Theorem (EPA 1997a) The area-weighted average is
identical to the mean calculated in spatial bootstrapping but is somewhat higher than the
arithmetic and bootstrapped means This is because some of the sampling locations with high
concentrations (eg samples 29 and 30 Table 1) have very large polygon areas Thus they are
weighted more heavily in approaches that include area-weighting
Estimates of the 95 UCL vary substantially Using bootstrapping and spatial bootstrapping
the 95 UCL values are similar (1706 and 1638 ppm respectively) However the 95 UCL
calculated using the H-statistic (106 ppm) is considerably higher exceeding both the
bootstrapped and spatial bootstrapped values by a factor of more than six In fact the H-UCL
exceeds the maximum value (426 ppm) in the sampling database which means that the
maximum value would be substituted as the EPC Thus in this hypothetical case the EPC that
would need to be used for this exposure area would be the maximum detected concentration of
426 ppm This concentration is clearly not representative of the sampling dataset for this
hypothetical property
Discussion
The example presented above clearly demonstrates the limitations of the H-statistic approach and
its result is consistent with the findings of Singh et al (EPA 1997a) and Schulz and Griffin
(1999) These limitations occur despite the fact that the sampling dataset for this hypothetical
property is reasonably large containing 30 samples and the range of concentrations is fairly
typical of the types of concentrations likely to be found in floodplain properties along the Lower
Housatonic There are no outliers to skew the results and the concentrations fall along a fairly
typical concentration gradient with a few samples showing very low concentrations a few
c noes jRBGE2oooEPCpapei an 13 Ogden Environmental and Energy Services
showing fairly high concentrations and the remainder fairly evenly distributed in between In
addition some of the largest polygon areas are associated with low concentration areas while
others are associated with the highest concentrations There is nothing about this dataset to
indicate that the samples reported are not representative of the range of concentrations that is
likely to be present on the property
Despite this the H-statistic approach results in a 95 UCL that exceeds the highest measured
concentration by a factor of more than six with the result being that the 95 UCL cannot be
used as a reliable estimate of the EPC in the risk assessment for this property Clearly this result
is disturbing in that it occurs for a property that has a reasonable number of samples and a fairly
even distribution of sampling results These problems will be even more pronounced for certain
real properties along the Lower Housatonic River where the sampling results may be highly
skewed dense sampling occurs in high concentrated areas andor sample numbers are more
limited It is likely given the tendency of the H-statistic to overestimate the 95 UCL that
maximum concentration values will need to be used for many of those properties Such an end
result is unsupportable and will result in the overestimation of site risks It could also result in
the unnecessary remediation of a number of exposure areas
Conclusions
As discussed above and demonstrated by EPA contractors and consultants (EPA 1997a Schulz
and Griffin 1999) use of the H-statistic approach to calculating the 95 UCL in environmental
applications can and frequently does produce substantial overestimates of the true mean that are
in no way representative of the concentrations actually contacted by people in an exposure area
and hence that approach should be avoided Moreover this problem cannot be solved by simply
using the maximum detected concentration in the area if the calculated 95 UCL exceeds that
maximum These problems can be avoided through the use of area-weighted averaging which
takes into consideration the fraction of the exposure area that is represented by each sample
taken This approach allows for a more realistic and representative estimate of the level of
exposure likely to occur to individuals in the area over time Thus GE urges EPA to adopt the
area-weighted averaging approach for the Lower Housatonic River and to collect the data
c DOCS JRB GE2ooo poundpcpraquoplaquor2 tt 14 Ogden Environmental and Energy Services
necessary to support that approach Use of spatial average concentrations as EPCs is consistent
with EPA policies procedures and guidelines (EPA 1989 1993) and would be consistent with
the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the
Upper 2 Mile Reach of the River and the Removal Actions Outside the River including certain
floodplain properties along the River
However if EPA rejects this approach (or for areas where EPA determines that the data are
insufficient to support that approach and decides not to collect additional data) and insists on
estimating a 95 UCL of the mean for use as the EPC then GE recommends that spatial
bootstrapping be used to calculate the 95 UCL This approach is more reliable and yields more
representative EPCs than the H-statistic method (EPA 1997a Schulz and Griffin 1999) offsets
many of the limitations associated with the H-statistic and is consistent with EPA guidance
References
Clifford PA DE Barchers DF Ludwig RL Sielken JS Klinginsmith RV Graham and
MI Banton 1995 Hazardrisk assessments An approach to quantifying spatial components of
exposure for ecological risk assessment Environmental Toxicology and Chemistry 14(5)895shy
906
EPA 1989 Risk Assessment Guidance for Superfund Volume I Human Health Evaluation
Manual (Part A) - Interim Final US Environmental Protection Agency Office of Emergency
and Remedial Response Washington DC EPA5401-89-002 July
EPA 1992 Supplemental Guidance to RAGS Calculating the Concentration Term OSWER
Bulletin Volume 1 Number 1 Publication 92857-08 May
EPA 1993 An SAB Report Superfund Site Health Risk Assessment Guidelines Review of the
Office of Solid Waste and Emergency Responses Draft Risk Assessment Guidance for Superfund
Human Health Evaluation Manual by the Environmental Health Committee EPA-SAB-EHCshy
93-007 February
c DOCS JRB GE2oooEPCpapcr2 m 15 Ogden Environmental and Energy Services
EPA 1994 A change in the approach for estimating the average and reasonable maximum
exposure scenario for human health risk assessments EPA Region I Risk Updates Number 2
August
EPA 1995 Final Baseline Risk Assessment for Koppers Company Inc (Charleston Plant)
Remedial InvestigationFeasibility Study Oversight Black and Veatch Waste Science Inc EPA
Region IV Contract 68-W9-0055
EPA 1997a The Lognormal Distribution in Environmental Applications Prepared for EPA by
A Singh A Singh and M Engelhard US Environmental Protection Agency Office of
Research and Development EPA600R-97006 December
EPA 1997b Record of Decision Fields Brook Superfund Site US Environmental Protection
Agency Region V EPA54lR-97070 August 15
EPA 1999 Responsiveness Summary for Allendale School Removal Action Mile Removal
Action and Consolidation US Environmental Protection Agency Region I October
Isaaks EH and RM Srivastava 1989 Applied Geostatistics Oxford University Press New
York
Land CE 1975 Tables of confidence limits for linear functions of the normal mean and
variance Selected Tables in Mathematical Statistics 3365-410
Schulz TW and S Griffin 1999 Estimating risk assessment exposure point concentrations
when the data are not normal or lognormal Risk Analysis 19577-584
Weston 2000 Supplemental Investigation Work Plan for the Lower Housatonic River Final
Roy F Weston Inc Manchester NH DCN GEP2-02-000-AAME February 2000
c DOCS jRBGE2ooooEPcPaper2 m 16 Ogden Environmental and Energy Services
5
10
15
20
25
30
Table 1 Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property
Sample Number
1
2 j
4
6
7
8
9
1 1
12
13
14
16
17
18
19
21
22
23
24
26
27
28
29
Polygon Area (ft2)
138219
105217
110214
31021
11399
7004
14268
12121
3697
13297
33087
65541
57494
9014
9994
2123
2341
7446
2613
44231
2362
8745
5008
18762
24296
51356
12996
18019
119814
132012
Total PCB (ppm)
0022
0028
107
163
167
201
2 19
325
377
758
761
762
771
799
872
969
972
127
143
143
154
163
187
191
202
308
329
389
406
426
Projects GE HousgtResl of RuerEPC MemoVEPC Fmraquol 05l600Tiblcslamp2 ill
R U
O C ^gt X R
M _C EL o 2 VI
o O 3)
is R c
J U 1
_
II ^
c R 4)
00 ri ^O
mdash
bdquo ^O IT)
c o o ra
R
B
C s s pound -5
W)e EL c RU en o O a
U U 3 V
bullI
C Ril
O t
p^
Hi
T3
X03 agt E pound u n pound1
pound0
o r-T
II u
O M R
4raquo gt
bullsi
0 0
IT ON
nshy
^
bull3
o ^ S
amp CJ) c
c z o laquo T3
UM laquo
O k
I I C CJC R C
is ltshy R o u
8 lt R 3
B SE c o x
R
3
^ R
2
bull| R
i X
C R
i
U
1
~o R
^laquo
^O
12
(N
O
c-i
H
T3 0gt
ra
Crt
S I 2
III iS mdash shybdquo 2 15 T3 C = u c o ltu ^ ogt C raquoshy r ^
pound 2 lt2 gt o u p o5 2 = pound 3 LO _ shy u ra
bullpound U X)
9 2 -5 = sect I J3 if shy
Q Q ra
V) W
ilaquo S ca ^ laquo u c mdash
a s e ta Q
showing fairly high concentrations and the remainder fairly evenly distributed in between In
addition some of the largest polygon areas are associated with low concentration areas while
others are associated with the highest concentrations There is nothing about this dataset to
indicate that the samples reported are not representative of the range of concentrations that is
likely to be present on the property
Despite this, the H-statistic approach results in a 95% UCL that exceeds the highest measured
concentration by a factor of more than six, so the 95% UCL cannot be used as a reliable estimate
of the EPC in the risk assessment for this property. This result is particularly troubling because
it occurs for a property that has a reasonable number of samples and a fairly even distribution of
sampling results. These problems will be even more pronounced for certain real properties along
the Lower Housatonic River, where the sampling results may be highly skewed, sampling may be
concentrated in areas of high PCB concentrations, and/or sample numbers are more limited. Given
the tendency of the H-statistic to overestimate the 95% UCL, it is likely that maximum
concentration values will need to be used for many of those properties. Such an outcome is
unsupportable: it would overestimate site risks and could lead to the unnecessary remediation of a
number of exposure areas.
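For reference, the H-statistic 95% UCL at issue is computed from the log-transformed data (Land, 1975; EPA, 1992); writing the standard form out shows why a large log-scale standard deviation inflates the estimate multiplicatively:

```latex
\mathrm{UCL}_{95\%} \;=\; \exp\!\left( \bar{y} \;+\; \frac{s_y^{2}}{2} \;+\; \frac{s_y \, H_{0.95}}{\sqrt{n-1}} \right),
\qquad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} \ln x_i ,
\quad
s_y^{2} = \frac{1}{n-1}\sum_{i=1}^{n} \left( \ln x_i - \bar{y} \right)^{2}
```

where H(0.95) is Land's tabulated factor, which itself grows with s(y). Because the log-scale variance enters the exponent both directly and through H, a skewed or sparse dataset can drive the UCL far above every measured value, as occurs for the hypothetical property discussed above.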
Conclusions
As discussed above and demonstrated by EPA contractors and consultants (EPA, 1997a; Schulz
and Griffin, 1999), use of the H-statistic approach to calculate the 95% UCL in environmental
applications can, and frequently does, produce substantial overestimates of the true mean that are
in no way representative of the concentrations actually contacted by people in an exposure area;
hence that approach should be avoided. Moreover, this problem cannot be solved by simply
substituting the maximum detected concentration when the calculated 95% UCL exceeds that
maximum. These problems can be avoided through the use of area-weighted averaging, which
takes into consideration the fraction of the exposure area that is represented by each sample
taken. This approach provides a more realistic and representative estimate of the exposure
likely to be experienced by individuals in the area over time. Thus, GE urges EPA to adopt the
area-weighted averaging approach for the Lower Housatonic River and to collect the data
necessary to support that approach. Use of spatial average concentrations as EPCs is consistent
with EPA policies, procedures, and guidelines (EPA, 1989, 1993) and would be consistent with
the approach approved by EPA and specified in the Consent Decree for the riverbank soils in the
Upper 2 Mile Reach of the River and for the Removal Actions Outside the River, including certain
floodplain properties along the River.
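The area-weighted averaging computation described above can be sketched in a few lines. This is an illustrative sketch only, using a handful of the hypothetical polygon areas and PCB concentrations from Table 1 (with decimal points restored from the OCR'd values); it is not GE's actual implementation:

```python
# Area-weighted average EPC: weight each sample's concentration by the
# fraction of the exposure area represented by its polygon.
# Illustrative subset of the hypothetical Table 1 data
# (polygon area in ft^2, total PCB in ppm).
samples = [
    (138219, 0.022),
    (110214, 1.07),
    (44231, 14.3),
    (51356, 30.8),
    (132012, 42.6),
]

total_area = sum(area for area, _ in samples)
epc = sum(area * conc for area, conc in samples) / total_area
print(f"Area-weighted EPC: {epc:.2f} ppm")
```

Because each concentration is scaled by its polygon's share of the total area, an isolated hot spot covering a small polygon cannot dominate the EPC the way it can dominate an H-statistic UCL.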
However, if EPA rejects this approach (or, for areas where EPA determines that the data are
insufficient to support that approach, decides not to collect additional data) and insists on
estimating a 95% UCL of the mean for use as the EPC, then GE recommends that spatial
bootstrapping be used to calculate the 95% UCL. This approach is more reliable and yields more
representative EPCs than the H-statistic method (EPA, 1997a; Schulz and Griffin, 1999), offsets
many of the limitations associated with the H-statistic, and is consistent with EPA guidance.
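The bootstrap idea can be illustrated with a short sketch. This is a simplified, non-spatial percentile bootstrap (a true spatial bootstrap would also incorporate the polygon weights and sampling locations), applied to hypothetical concentrations drawn from Table 1:

```python
import random

def bootstrap_ucl95(values, n_boot=5000, seed=1):
    """Percentile-bootstrap 95% UCL of the mean: resample the data with
    replacement many times, compute each resample's mean, and take the
    95th percentile of those means."""
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choice(values) for _ in values) / len(values)
        for _ in range(n_boot)
    )
    return means[int(0.95 * n_boot) - 1]

# Hypothetical PCB concentrations (ppm) drawn from Table 1
# (decimal points restored from the OCR'd values).
pcb = [0.022, 0.028, 1.07, 2.19, 7.58, 9.72, 14.3, 20.2, 30.8, 42.6]
ucl = bootstrap_ucl95(pcb)
# Unlike the H-statistic, a bootstrap UCL of the mean can never exceed
# the sample maximum, since every resample mean is bounded by it.
print(f"Bootstrap 95% UCL: {ucl:.1f} ppm")
```

The resample count and percentile cutoff here are arbitrary illustration choices; the key property is the one noted in the comment: the resulting UCL is bounded by the observed maximum, avoiding the pathology described above.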
References

Clifford, P.A., D.E. Barchers, D.F. Ludwig, R.L. Sielken, J.S. Klinginsmith, R.V. Graham, and
M.I. Banton. 1995. Hazard/risk assessments: An approach to quantifying spatial components of
exposure for ecological risk assessment. Environmental Toxicology and Chemistry 14(5):895-906.

EPA. 1989. Risk Assessment Guidance for Superfund, Volume I: Human Health Evaluation
Manual (Part A) - Interim Final. U.S. Environmental Protection Agency, Office of Emergency
and Remedial Response, Washington, DC. EPA/540/1-89/002. July.

EPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. OSWER
Bulletin, Volume 1, Number 1. Publication 9285.7-08. May.

EPA. 1993. An SAB Report: Superfund Site Health Risk Assessment Guidelines: Review of the
Office of Solid Waste and Emergency Response's Draft Risk Assessment Guidance for Superfund
Human Health Evaluation Manual by the Environmental Health Committee. EPA-SAB-EHC-93-007. February.

EPA. 1994. A change in the approach for estimating the average and reasonable maximum
exposure scenario for human health risk assessments. EPA Region I, Risk Updates, Number 2.
August.

EPA. 1995. Final Baseline Risk Assessment for Koppers Company, Inc. (Charleston Plant),
Remedial Investigation/Feasibility Study Oversight. Black and Veatch Waste Science, Inc. EPA
Region IV, Contract 68-W9-0055.

EPA. 1997a. The Lognormal Distribution in Environmental Applications. Prepared for EPA by
A. Singh, A. Singh, and M. Engelhard. U.S. Environmental Protection Agency, Office of
Research and Development. EPA/600/R-97/006. December.

EPA. 1997b. Record of Decision, Fields Brook Superfund Site. U.S. Environmental Protection
Agency, Region V. EPA/541/R-97/070. August 15.

EPA. 1999. Responsiveness Summary for Allendale School Removal Action, Mile Removal
Action and Consolidation. U.S. Environmental Protection Agency, Region I. October.

Isaaks, E.H., and R.M. Srivastava. 1989. Applied Geostatistics. Oxford University Press, New
York.

Land, C.E. 1975. Tables of confidence limits for linear functions of the normal mean and
variance. Selected Tables in Mathematical Statistics 3:365-410.

Schulz, T.W., and S. Griffin. 1999. Estimating risk assessment exposure point concentrations
when the data are not normal or lognormal. Risk Analysis 19:577-584.

Weston. 2000. Supplemental Investigation Work Plan for the Lower Housatonic River - Final.
Roy F. Weston, Inc., Manchester, NH. DCN GEP2-02-000-AAME. February 2000.
Table 1. Surface Soil Sampling Data and Polygon Areas for a Hypothetical Floodplain Property

Sample Number    Polygon Area (ft2)    Total PCB (ppm)
1                138219                0.022
2                105217                0.028
3                110214                1.07
4                31021                 1.63
5                11399                 1.67
6                7004                  2.01
7                14268                 2.19
8                12121                 3.25
9                3697                  3.77
10               13297                 7.58
11               33087                 7.61
12               65541                 7.62
13               57494                 7.71
14               9014                  7.99
15               9994                  8.72
16               2123                  9.69
17               2341                  9.72
18               7446                  12.7
19               2613                  14.3
20               44231                 14.3
21               2362                  15.4
22               8745                  16.3
23               5008                  18.7
24               18762                 19.1
25               24296                 20.2
26               51356                 30.8
27               12996                 32.9
28               18019                 38.9
29               119814                40.6
30               132012                42.6