Bayesian hierarchical models for estimating the health effects of air pollution sources Roger D. Peng, PhD Department of Biostatistics Johns Hopkins Bloomberg School of Public Health @rdpeng, simplystatistics.org (joint work with Jenna Krall and Amber Hackstadt) New England Statistics Symposium April 2016

New England Statistics Symposium 2016

Page 1: New England Statistics Symposium 2016

Bayesian hierarchical models for estimating the health effects of air pollution sources

Roger D. Peng, PhD

Department of Biostatistics Johns Hopkins Bloomberg School of Public Health

@rdpeng, simplystatistics.org

(joint work with Jenna Krall and Amber Hackstadt)

New England Statistics Symposium April 2016

Page 2: New England Statistics Symposium 2016

Not So Standard Deviations (with Hilary Parker of Stitch Fix)

Subscribe in iTunes: https://goo.gl/ZhWYbd

https://soundcloud.com/nssd-podcast

Page 3: New England Statistics Symposium 2016

Particulate Matter and Health

• PM has been linked with health outcomes: hospitalization, mortality, decreased lung function, cardiac events

• WHO estimates ~800,000 premature deaths per year

• Evidence of both short-term (acute) and long-term (chronic) effects of exposure to ambient PM

• Much recent work has used ambient PM mass (PM10, PM2.5, PM10-2.5) as the exposure indicator

Page 4: New England Statistics Symposium 2016

What’s Next?

• There is strong evidence that ambient PM is associated with mortality and morbidity

• What should we do about it? How can we intervene to improve health?

• Target sources of PM that are most harmful to human health

• How do we identify sources of ambient PM?

Page 6: New England Statistics Symposium 2016

Pollution Source Apportionment

[Figure: pie charts showing particulate matter apportioned into source-specific concentrations (power plant, car, oil heating), each broken down by chemical constituent: elemental carbon, organic carbon, sulfate, nitrate, nickel, vanadium]

Page 7: New England Statistics Symposium 2016

Pollution Source Apportionment

[Figure: pie chart of particulate matter composition by chemical constituent: elemental carbon, organic carbon, sulfate, nitrate, nickel, vanadium]

Page 8: New England Statistics Symposium 2016

Pollution Source Apportionment

[Figure: pie chart of particulate matter composition by chemical constituent (elemental carbon, organic carbon, sulfate, nitrate, nickel, vanadium), with the contributing sources shown as unknown (?)]

Page 9: New England Statistics Symposium 2016

Pollution Source Apportionment Methods

$Y = F\Lambda \circ \varepsilon$

• $Y$: observed chemical constituents (n x p)

• $F$: source concentrations (n x k)

• $\Lambda$: source profiles (k x p)

• $\varepsilon$: error (n x p)
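
To make the dimensions concrete, here is a minimal simulation sketch of the receptor model above. It assumes the operator between FΛ and ε is an elementwise product (a multiplicative error), that errors and source concentrations are lognormal, and that each source profile sums to 1. None of these distributional choices, and none of the sizes or variable names, come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): n days, k sources, p constituents.
n, k, p = 200, 3, 6

# Source profiles Lambda (k x p): row j is the assumed share of each
# constituent in emissions from source j, so each row sums to 1.
Lambda = rng.dirichlet(np.ones(p), size=k)

# Source concentrations F (n x k): nonnegative daily contributions.
F = rng.lognormal(mean=0.0, sigma=0.5, size=(n, k))

# Multiplicative error eps (n x p), assumed lognormal and centered near 1.
eps = rng.lognormal(mean=0.0, sigma=0.1, size=(n, p))

# Observed constituents Y (n x p): matrix product, then elementwise error.
Y = (F @ Lambda) * eps
```

Source apportionment runs this in reverse: only Y is observed, and both F and Λ must be estimated, which is why extra information about the profiles is needed to constrain the problem.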

Page 10: New England Statistics Symposium 2016

Problems

• Current source apportionment methods are applied on an ad hoc, highly tweaked basis and are difficult to scale to a region or nation

• Source apportionment models are typically informed by investigator’s local knowledge

• No reproducible way to combine information across locations to gain power when estimating health effects (multi-site studies)

Page 11: New England Statistics Symposium 2016

Incorporating New Data Sources on Pollution Sources

Component | Data Source

Particulate Matter | EPA Air Quality System (AQS)

PM Chemical Constituents | EPA Chemical Speciation Network

PM Source Profiles | EPA SPECIATE Database

PM Source Emissions | EPA National Emissions Inventory

Page 12: New England Statistics Symposium 2016

Model Specification

[Diagram labels: Chemical Speciation Network, SPECIATE, NEI]

Page 13: New England Statistics Symposium 2016

[Figure: four time series panels by date, 2002-2005: sulfate, nitrate, elemental carbon, organic carbon]

Page 14: New England Statistics Symposium 2016

[Figure: axis labeled by chemical constituent: aluminum, ammonium ion, antimony, arsenic, barium, bromine, cadmium, calcium, cerium, cesium, chlorine, chromium, cobalt, copper, elemental carbon, europium, gallium, gold, hafnium, indium, iridium, iron, lanthanum, lead, magnesium, manganese, mercury, molybdenum, nickel, niobium, nitrate, OC, phosphorus, potassium, rubidium, samarium, scandium, selenium, silicon, silver, sodium ion, strontium, sulfate, sulfur, tantalum, terbium, tin, titanium, tungsten, vanadium, yttrium, zinc, zirconium]

Page 15: New England Statistics Symposium 2016

[Figure: axis labeled by the same chemical constituents as on Page 14, aluminum through zirconium]

Page 16: New England Statistics Symposium 2016

Informative Prior Distribution

Page 17: New England Statistics Symposium 2016

Railroad Equipment/Diesel (1999-2008)

Page 18: New England Statistics Symposium 2016

Agricultural Crop/Livestock Dust (1999-2008)

Page 19: New England Statistics Symposium 2016

Annual Source Emissions

Page 20: New England Statistics Symposium 2016

Model Fitting

• We use Markov chain Monte Carlo to simulate from the posterior distribution of the unknown parameters

• Adaptive MCMC approach of Haario et al. (2001)

• Use data from SPECIATE and NEI to calibrate the prior distributions

• Constraints placed on profile matrix based on what is known about composition of specific sources
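
As a rough illustration of the adaptive MCMC idea, the sketch below implements an adaptive Metropolis sampler in the spirit of Haario et al. (2001): a random-walk proposal whose covariance is re-estimated from the chain history. The target here is a toy bivariate normal, not the source apportionment posterior, and the function name, tuning values (`t0`, `eps`), and initial proposal covariance are assumptions made for the example.

```python
import numpy as np

def adaptive_metropolis(log_post, x0, n_iter=5000, t0=500, eps=1e-6, seed=0):
    """Random-walk Metropolis with a proposal covariance adapted to the chain."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    s_d = 2.4 ** 2 / d              # scaling factor used by Haario et al. (2001)
    chain = np.empty((n_iter, d))
    lp = log_post(x)
    cov = np.eye(d)                 # fixed proposal covariance before adaptation (assumption)
    for t in range(n_iter):
        if t >= t0:
            # Empirical covariance of the chain so far, plus a small ridge
            # so the proposal covariance stays positive definite.
            emp = np.atleast_2d(np.cov(chain[:t].T))
            cov = s_d * emp + s_d * eps * np.eye(d)
        prop = rng.multivariate_normal(x, cov)
        lp_prop = log_post(prop)
        # Metropolis accept/reject step on the log scale.
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
        chain[t] = x
    return chain

# Toy target (assumption): a correlated bivariate normal "posterior".
target_mean = np.array([1.0, -2.0])
target_cov = np.array([[1.0, 0.8], [0.8, 1.0]])
prec = np.linalg.inv(target_cov)

def log_post(x):
    r = x - target_mean
    return -0.5 * r @ prec @ r

chain = adaptive_metropolis(log_post, x0=[0.0, 0.0])
posterior_mean = chain[1000:].mean(axis=0)   # discard early draws as burn-in
```

Recomputing the covariance from scratch at every step keeps the sketch short; a practical implementation would update it recursively.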

Page 21: New England Statistics Symposium 2016
Page 22: New England Statistics Symposium 2016
Page 23: New England Statistics Symposium 2016

Estimating Health Effects

• For individual cities, estimated source time series can be plugged into regression models with health outcomes

• For multi-site studies, source determination cannot be a manual process (not reproducible)

• Need automatic method to combine information across a region

• Current approaches assume pollution sources are the same everywhere
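
The single-city step can be sketched as follows. The slides do not say which regression model is used; this example assumes a log-linear Poisson regression of daily event counts on one estimated source series, fitted by Newton's method. The data are simulated, and `fit_poisson`, `source`, and the coefficient values are hypothetical names and numbers chosen for illustration.

```python
import numpy as np

def fit_poisson(X, y, n_iter=25):
    """Fit log E[y] = X @ beta by Newton's method; the first column of X is the intercept."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())               # start from the intercept-only fit
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        grad = X.T @ (y - mu)                # score vector
        hess = X.T @ (X * mu[:, None])       # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    mu = np.exp(X @ beta)
    cov = np.linalg.inv(X.T @ (X * mu[:, None]))
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(1)
n = 2000                                      # number of days (assumption)
source = rng.lognormal(0.0, 0.5, size=n)      # stand-in for an estimated source series
X = np.column_stack([np.ones(n), source])
true_beta = np.array([2.0, 0.05])             # simulated truth, not a real effect estimate
y = rng.poisson(np.exp(X @ true_beta))

beta_hat, se = fit_poisson(X, y)
# beta_hat[1] is the estimated log relative rate per unit increase in the source.
```

A real analysis would also adjust for confounders such as weather and time trends; they are left out here to keep the sketch short.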

Page 24: New England Statistics Symposium 2016

US EPA Chemical Speciation Network

• 85 monitors, 24 constituents

Medicare cohort (1999-2010)

• CVD hospitalizations for 63 counties

Page 25: New England Statistics Symposium 2016

SHARE

• A method for estimating health effects of sources SHared Across a REgion

• Sources and their health effects are estimated at individual locations

• Sources are matched across locations via population value decomposition (Crainiceanu et al. 2011)

• Health effects combined for common sources via hierarchical modeling
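
The final combining step can be illustrated with a simple two-level normal model. The sketch below is a stand-in, not the SHARE implementation: it pools location-specific estimates for one matched source using the DerSimonian-Laird method-of-moments random-effects estimator rather than a full Bayesian hierarchical fit. The function name and the input numbers are made up for illustration and are not results from the talk.

```python
import numpy as np

def pool_random_effects(beta, se):
    """Pool site-specific estimates under beta_i ~ N(mu, se_i^2 + tau^2)."""
    beta = np.asarray(beta, dtype=float)
    var = np.asarray(se, dtype=float) ** 2
    w = 1.0 / var
    mu_fixed = np.sum(w * beta) / np.sum(w)          # fixed-effect pooled mean
    q = np.sum(w * (beta - mu_fixed) ** 2)           # heterogeneity statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (beta.size - 1)) / c)       # between-site variance
    w_star = 1.0 / (var + tau2)
    mu = np.sum(w_star * beta) / np.sum(w_star)      # pooled regional effect
    return mu, np.sqrt(1.0 / np.sum(w_star)), tau2

# Hypothetical log relative rates and standard errors from four locations.
site_beta = [0.02, 0.05, 0.01, 0.04]
site_se = [0.01, 0.02, 0.015, 0.01]
mu_hat, mu_se, tau2_hat = pool_random_effects(site_beta, site_se)
```

Sites with smaller standard errors get more weight, and a larger between-site variance pulls the weights toward equality.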

Page 26: New England Statistics Symposium 2016

SHARE for Source ID

Page 27: New England Statistics Symposium 2016

PM2.5 and CVD Hospitalizations

Page 28: New England Statistics Symposium 2016

Summary

• Bayesian source apportionment model can integrate information from 3 national databases

• Data on sources and profiles can be used to constrain the problem and to construct informative prior distributions

• SHARE method can be used to automatically combine health effects of estimated sources across a region