Bayesian hierarchical models for estimating the health effects of air pollution sources
Roger D. Peng, PhD
Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
@rdpeng, simplystatistics.org
(joint work with Jenna Krall and Amber Hackstadt)
New England Statistics Symposium April 2016
Not So Standard Deviations (with Hilary Parker of Stitch Fix)
Subscribe in iTunes: https://goo.gl/ZhWYbd
https://soundcloud.com/nssd-podcast
Particulate Matter and Health
• PM has been linked with health outcomes: hospitalization, mortality, decreased lung function, cardiac events
• WHO estimates ~800,000 premature deaths per year
• Evidence of both short-term (acute) and long-term (chronic) effects of exposure to ambient PM
• Much recent work has used ambient PM mass (PM10, PM2.5, PM10-2.5) as the exposure indicator
What’s Next?
• There is strong evidence that ambient PM is associated with mortality and morbidity
• What should we do about it? How can we intervene to improve health?
• Target sources of PM that are most harmful to human health
• How do we identify sources of ambient PM?
Pollution Source Apportionment
[Figure: pie chart of particulate matter composition by constituent (elemental carbon, organic carbon, sulfate, nitrate, nickel, vanadium), alongside pie charts for three sources (power plant, car, oil heating) labeled "Source-specific concentrations". Later builds of the slide show only the particulate matter chart, the last one with a "?" in place of the sources.]
Pollution Source Apportionment Methods
$Y = F\Lambda \circ \varepsilon$
• Y: observed chemical constituents (n x p)
• F: source concentrations (n x k)
• Λ: source profiles (k x p)
• ε: error (n x p), applied element-wise (∘)
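To make the dimensions concrete, here is a minimal simulation sketch of the receptor model Y = FΛ ∘ ε. It is not the fitted model from the talk: the exponential draw for source concentrations, the lognormal error, the two example profiles, and the names simulate_receptor_data, sigma, and mean_conc are all illustrative assumptions.

```python
import math
import random

def simulate_receptor_data(n, profiles, sigma=0.1, mean_conc=5.0, seed=0):
    """Simulate Y = (F Lambda) * eps, with the error applied element-wise.

    profiles is Lambda, a k x p list of lists (one row per source).
    Returns (F, Y), where F is n x k and Y is n x p.
    """
    rng = random.Random(seed)
    k = len(profiles)
    p = len(profiles[0])
    # Assumption: nonnegative source concentrations, drawn as exponentials.
    F = [[rng.expovariate(1.0 / mean_conc) for _ in range(k)] for _ in range(n)]
    Y = []
    for t in range(n):
        row = []
        for j in range(p):
            # (F Lambda)[t][j]: constituent j summed over all k sources.
            mean_tj = sum(F[t][s] * profiles[s][j] for s in range(k))
            # Assumption: multiplicative lognormal error with median 1.
            eps = math.exp(rng.gauss(0.0, sigma))
            row.append(mean_tj * eps)
        Y.append(row)
    return F, Y

# Two hypothetical sources over three constituents; each row sums to 1.
Lambda = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
]
F, Y = simulate_receptor_data(n=100, profiles=Lambda)
print(len(Y), len(Y[0]))  # n rows, p columns
```

Source apportionment runs this in reverse: only Y is observed, and both F and Λ have to be estimated.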
Problems
• Current source apportionment methods are applied on an ad hoc, highly tweaked basis and are difficult to scale to a region or nation
• Source apportionment models are typically informed by the investigator’s local knowledge
• No reproducible way to combine information across locations to gain power when estimating health effects (multi-site studies)
Incorporating New Data Sources on Pollution Sources
Component                  Data Source
Particulate Matter         EPA Air Quality System (AQS)
PM Chemical Constituents   EPA Chemical Speciation Network
PM Source Profiles         EPA SPECIATE Database
PM Source Emissions        EPA National Emissions Inventory (NEI)
Model Specification
[Diagram labels: Chemical Speciation Network, SPECIATE, NEI]
[Figure: time series by date, 2002-2005, in four panels: Sulfate, Nitrate, Elemental carbon, Organic carbon]
[Figure: two panels sharing one axis of chemical constituents: aluminum, ammonium ion, antimony, arsenic, barium, bromine, cadmium, calcium, cerium, cesium, chlorine, chromium, cobalt, copper, elemental carbon, europium, gallium, gold, hafnium, indium, iridium, iron, lanthanum, lead, magnesium, manganese, mercury, molybdenum, nickel, niobium, nitrate, OC, phosphorus, potassium, rubidium, samarium, scandium, selenium, silicon, silver, sodium ion, strontium, sulfate, sulfur, tantalum, terbium, tin, titanium, tungsten, vanadium, yttrium, zinc, zirconium]
Informative Prior Distribution
[Figure: Annual Source Emissions, 1999-2008. Panels: Railroad Equipment/Diesel; Agricultural Crop/Livestock Dust]
Model Fitting
• We use Markov chain Monte Carlo to simulate from the posterior distribution of the unknown parameters
• Adaptive MCMC approach of Haario et al. (2001)
• Use data from SPECIATE and NEI to calibrate the prior distributions
• Constraints placed on profile matrix based on what is known about composition of specific sources
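As a rough illustration of the adaptive approach, the sketch below runs an adaptive Metropolis sampler in the spirit of Haario et al. (2001) on a toy target. The target (a correlated bivariate normal), the initial proposal scale, the adaptation start t0, and all function names are assumptions made for illustration. The source apportionment posterior, its constraints, and its calibrated priors are not reproduced here.

```python
import math
import random

def cholesky(A):
    """Lower-triangular Cholesky factor of a small positive definite matrix."""
    d = len(A)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def adaptive_metropolis(log_post, x0, n_iter=5000, t0=500, eps=1e-6, seed=1):
    """Random-walk Metropolis whose proposal covariance adapts to the chain history."""
    rng = random.Random(seed)
    d = len(x0)
    s_d = 2.4 ** 2 / d                  # scaling suggested by Haario et al. (2001)
    x, lp = list(x0), log_post(x0)
    mean = list(x0)                     # running mean of the chain
    m2 = [[0.0] * d for _ in range(d)]  # running sum of outer products of deviations
    L_init = [[0.1 if i == j else 0.0 for j in range(d)] for i in range(d)]
    chain = [list(x)]
    for t in range(1, n_iter + 1):
        if t <= t0:
            L = L_init                  # fixed proposal during the initial period
        else:
            n_hist = len(chain)
            # Proposal covariance: s_d * (empirical covariance + eps * identity).
            C = [[s_d * (m2[i][j] / (n_hist - 1) + (eps if i == j else 0.0))
                  for j in range(d)] for i in range(d)]
            L = cholesky(C)
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        prop = [x[i] + sum(L[i][j] * z[j] for j in range(i + 1)) for i in range(d)]
        lp_prop = log_post(prop)
        # Symmetric proposal, so the acceptance ratio is the posterior ratio.
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        chain.append(list(x))
        # Welford-style update of the running mean and covariance sums.
        n = len(chain)
        delta = [x[i] - mean[i] for i in range(d)]
        mean = [mean[i] + delta[i] / n for i in range(d)]
        for i in range(d):
            for j in range(d):
                m2[i][j] += delta[i] * (x[j] - mean[j])
    return chain

rho = 0.8  # hypothetical correlation of the toy target

def log_post(v):
    """Log density (up to a constant) of a standard bivariate normal with correlation rho."""
    return -0.5 * (v[0] ** 2 - 2 * rho * v[0] * v[1] + v[1] ** 2) / (1 - rho ** 2)

chain = adaptive_metropolis(log_post, [0.0, 0.0])
kept = chain[1000:]  # discard an initial stretch as burn-in
post_mean = [sum(s[i] for s in kept) / len(kept) for i in range(2)]
print(post_mean)
```

The point of adapting is that the proposal learns the posterior's scale and correlation from the chain itself, which removes one round of hand tuning per location.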
Estimating Health Effects
• For individual cities, estimated source time series can be plugged into regression models with health outcomes
• For multi-site studies, source determination cannot be a manual process (not reproducible)
• Need automatic method to combine information across a region
• Current approaches assume pollution sources are the same everywhere
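For the single-city case, "plugging in" an estimated source time series might look like the sketch below: a log-linear Poisson regression of daily counts on one source concentration, fitted by Newton-Raphson. The simulated data, the effect size of 0.05, and the function names are hypothetical. A real analysis would also adjust for confounders such as weather and time trends, which are omitted here.

```python
import math
import random

def rpois(rng, lam):
    """Knuth's Poisson sampler; adequate for the small means used here."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def fit_poisson(x, y, n_iter=25):
    """Newton-Raphson fit of log E[y] = b0 + b1 * x; returns (b0, b1, se_b1)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Entries of the 2 x 2 Fisher information matrix.
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: inverse information times score.
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    # Standard error of b1 from the inverse information at the last iterate.
    se_b1 = math.sqrt(i00 / det)
    return b0, b1, se_b1

rng = random.Random(42)
# Hypothetical daily concentrations for one estimated source.
source = [rng.gammavariate(2.0, 1.0) for _ in range(1000)]
# Hypothetical daily hospitalization counts with a true log relative risk of 0.05.
counts = [rpois(rng, math.exp(3.0 + 0.05 * s)) for s in source]
b0, b1, se_b1 = fit_poisson(source, counts)
print(round(b1, 3), round(se_b1, 3))
```

The pair (b1, se_b1) is the kind of city-level summary that a multi-site analysis would then need to combine.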
US EPA Chemical Speciation Network
• 85 monitors, 24 constituents
Medicare cohort (1999-2010)
• CVD hospitalizations for 63 counties
SHARE
• A method for estimating health effects of sources SHared Across a REgion
• Sources and their health effects are estimated at individual locations
• Sources are matched across locations via population value decomposition (Crainiceanu et al. 2011)
• Health effects combined for common sources via hierarchical modeling
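The last step, combining location-specific effects for a matched source, can be sketched with a simple random-effects pooling. This swaps in the DerSimonian-Laird moment estimator as a stand-in for the hierarchical model and does not attempt the population value decomposition matching. The function name and the example numbers are hypothetical.

```python
import math

def pool_random_effects(estimates, std_errors):
    """DerSimonian-Laird pooling of site-specific estimates.

    Returns (pooled estimate, pooled standard error, between-site variance).
    """
    w = [1.0 / se ** 2 for se in std_errors]
    sw = sum(w)
    # Fixed-effect (inverse-variance weighted) mean.
    fixed = sum(wi * b for wi, b in zip(w, estimates)) / sw
    # Cochran's Q measures heterogeneity across sites.
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, estimates))
    k = len(estimates)
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-site variance, truncated at 0
    # Re-weight using within-site plus between-site variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * b for wi, b in zip(w_star, estimates)) / sum(w_star)
    pooled_se = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled_se, tau2

# Hypothetical log relative risks for one matched source at four locations.
site_estimates = [0.012, 0.004, 0.020, 0.009]
site_std_errors = [0.005, 0.008, 0.010, 0.006]
pooled, pooled_se, tau2 = pool_random_effects(site_estimates, site_std_errors)
print(pooled, pooled_se, tau2)
```

Only locations where a given source was actually identified would contribute to that source's pooled estimate, which is what the matching step makes possible.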
SHARE for Source ID
PM2.5 and CVD Hospitalizations
Summary
• Bayesian source apportionment model can integrate information from 3 national databases
• Data on sources and profiles can be used to constrain the problem and to construct informative prior distributions
• SHARE method can be used to automatically combine health effects of estimated sources across a region