2004 University of Pittsburgh
Bayesian Biosurveillance Using Multiple Data Streams
Weng-Keen Wong, Greg Cooper, Denver Dash*, John Levander, John Dowling, Bill Hogan, Mike Wagner
RODS Laboratory, University of Pittsburgh
* Intel Research
This research was supported in part by grants from the National Science Foundation (IIS-0325581), the Defense Advanced Research Projects Agency (F30602-01-2-0550), and the Pennsylvania Department of Health (ME-01-737).
Slide 2
Over-the-Counter (OTC) Data Being Collected by the National Retail Data Monitor (NRDM)
- 19,000 stores
- 50% market share nationally
- >70% market share in large cities
Slide 3
ED Chief Complaint Data Being Collected by RODS
Chief Complaint ED Records for Allegheny County

Date / Time Admitted | Age   | Gender | Home Zip | Work Zip | Chief Complaint
Nov 1, 2004 3:02     | 20-30 | Male   | 15213    |          | Shortness of breath
Nov 1, 2004 3:09     | 70-80 | Female | 15132    | 15213    | Fever
:                    | :     | :      | :        | :        | :
Slide 4
Objective
Using the ED and OTC data streams, detect a disease outbreak in a given region as quickly and accurately as possible.
Slide 5
Our Approach: Population-wide ANomaly Detection and Assessment (PANDA)
- A unique detection algorithm that models each individual in the population
- Combines ED and OTC data streams
- Focuses on detecting an outdoor aerosolized release of an anthrax-like agent in Allegheny County
Slide 6
PANDA: Population-wide Anomaly Detection and Assessment
- Uses a causal Bayesian network
- Bayesian network: a graphical model representing the joint probability distribution of a set of random variables
[Figure: Bayesian network with nodes Location of Anthrax Release, Home Location of Person, Anthrax Infection of Person, and Visit of Person to ED]
Slide 7
PANDA: Population-wide Anomaly Detection and Assessment
- Uses a causal Bayesian network
- The arrows convey conditional independence relationships among the variables. They also represent causal relationships.
[Figure: Bayesian network with nodes Location of Anthrax Release, Home Location of Person, Anthrax Infection of Person, and Visit of Person to ED]
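
As a minimal sketch of how such a network factorizes a joint distribution, the Python below assumes the edges Location of Anthrax Release -> Anthrax Infection <- Home Location, and Anthrax Infection -> Visit to ED. The slide text does not spell out the arrows, so these edges, the zip codes used as locations, and every probability value are illustrative assumptions, not PANDA parameters.

```python
# Toy causal Bayesian network; all structure and numbers are hypothetical.
# Assumed edges: release -> infected <- home, infected -> ed_visit.

P_RELEASE = {"none": 0.98, "15213": 0.01, "15132": 0.01}  # prior on release location


def p_infected(release: str, home: str) -> float:
    """P(infected = True | release location, home location)."""
    if release == "none":
        return 0.0
    return 0.30 if release == home else 0.02


def p_ed_visit(infected: bool) -> float:
    """P(ED visit = True | infection status)."""
    return 0.60 if infected else 0.001


def posterior_release(home: str, ed_visit: bool) -> dict:
    """P(release location | home, ed_visit), summing out the hidden infection node."""
    scores = {}
    for release, prior in P_RELEASE.items():
        total = 0.0
        for infected in (True, False):
            p_inf = p_infected(release, home)
            p_i = p_inf if infected else 1.0 - p_inf
            p_v = p_ed_visit(infected)
            p_e = p_v if ed_visit else 1.0 - p_v
            # Joint = P(release) * P(infected | release, home) * P(ed_visit | infected)
            total += prior * p_i * p_e
        scores[release] = total
    norm = sum(scores.values())
    return {release: s / norm for release, s in scores.items()}


posterior = posterior_release(home="15213", ed_visit=True)
print(posterior)
```

With these made-up numbers, a single ED visit from a person living in zip 15213 shifts probability mass toward a release in that zip, which is the kind of evidence propagation the network is meant to support.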
Slide 8
Outline
1. Introduction
2. Model
3. Inference
4. Conclusions
Slide 9
The Generic PANDA Model for Non-Contagious Diseases
[Figure: network with nodes Population Risk Factors, Population Disease Exposure (PDE), Population-Wide Evidence, and repeated Person Model boxes]
Slide 10
A Special Case of the Generic Model
Each person in the population is represented as a subnetwork in the overall model.
[Figure: network with nodes Anthrax Release, Location of Release, Time of Release, OTC Sales for Region, and repeated Person Model boxes]
Slide 11
The Person Model
[Figure: Bayesian network for one person. Nodes: Location of Release, Time of Release, Anthrax Infection, Home Zip, Gender, Age Decile, Other ED Disease, Respiratory from Anthrax, Respiratory CC from Other, Respiratory CC, Respiratory CC When Admitted, ED Admit from Anthrax, ED Admit from Other, ED Admission, Acute Respiratory Infection, ED Acute Respiratory Infection, Non-ED Acute Respiratory Infection, Daily OTC Purchase, Last 3 Days OTC Purchase, OTC Sales for Region]
Slide 12
Why Population Based?
1. Representational power
   - Background knowledge about spatial, temporal, demographic, and symptom information can be coherently represented in a single model
   - Spatial, temporal, demographic, and symptom evidence can be combined to derive a posterior probability of a disease outbreak
2. Representational flexibility
   - New types of knowledge and evidence can be readily incorporated into the model
Hypothesis: A population-based approach will achieve better detection performance than non-population-based approaches.
Slide 13
Computational Cost of a Population-Wide Approach?
~1.4 million people in Allegheny County, Pennsylvania
Slide 14
Equivalence Classes
The ~1.4M people in the modeled population can be partitioned into approximately 24,240 equivalence classes.
Slide 15
The Person Model
[Figure: the person-model network shown on Slide 11, repeated]
Slide 16
The Person Model
[Figure: the person-model network]

Equivalence Class Example:
Age Decile | Gender | Home Zip | Respiratory Chief Comp. | Date Admitted
20-30      | Male   | 15213    | Yes                     | Today
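
The grouping of people into equivalence classes can be sketched as follows. This assumes that two people belong to the same class exactly when they share the five attribute values shown in the example; the `Person` type, the attribute codings for people who were not admitted, and the records themselves are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class Person(NamedTuple):
    age_decile: str
    gender: str
    home_zip: str
    respiratory_cc: str  # "Yes", "No", or "NotAdmitted" (hypothetical coding)
    date_admitted: str   # "Today", or "Never" (hypothetical coding)


# Hypothetical records; people with identical attribute values are interchangeable.
people = [
    Person("20-30", "Male", "15213", "Yes", "Today"),
    Person("20-30", "Male", "15213", "Yes", "Today"),
    Person("20-30", "Male", "15213", "NotAdmitted", "Never"),
    Person("70-80", "Female", "15132", "NotAdmitted", "Never"),
    Person("70-80", "Female", "15132", "NotAdmitted", "Never"),
]

# Each distinct attribute tuple is one equivalence class; the count is its size.
class_sizes = Counter(people)

for eq_class, size in class_sizes.items():
    print(size, eq_class)
```

Inference then only needs one computation per class, weighted by the class size, rather than one per person.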
Slide 17
Outline
1. Introduction
2. Model
3. Inference
4. Conclusions
Slide 18
Inference
Derive P(Anthrax Release = true | OTC Sales Data & ED Data)
[Figure: network with nodes Anthrax Release, Location of Release, Time of Release, OTC Sales for Region, and repeated Person Model boxes]
Slide 19
Inference
AR = Anthrax Release
ED = ED Data
PDE = Population Disease Exposure
OTC = OTC Counts

Key term in deriving P(AR | OTC, ED):
P(OTC, ED | PDE) = P(OTC | ED, PDE) P(ED | PDE)
- P(OTC | ED, PDE): contribution of OTC counts
- P(ED | PDE): contribution of ED data

Details in: Cooper GF, Dash DH, Levander J, Wong W-K, Hogan W, Wagner M. Bayesian Biosurveillance of Disease Outbreaks. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2004.
Slide 20
Inference
AR = Anthrax Release
ED = ED Data
PDE = Population Disease Exposure
OTC = OTC Counts

Key term in deriving P(AR | OTC, ED):
P(OTC, ED | PDE) = P(OTC | ED, PDE) P(ED | PDE)
- P(OTC | ED, PDE): the focus of the remainder of this talk
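
One way to see why this term is central is to write out Bayes' rule. The sketch below assumes that the data depend on AR only through PDE and that PDE ranges over a finite set of exposure states. The slides defer the actual derivation to Cooper et al. (2004), so this is an illustration under those assumptions rather than the authors' exact formulation.

```latex
\begin{align*}
P(AR \mid OTC, ED)
  &= \frac{P(OTC, ED \mid AR)\, P(AR)}{\sum_{AR'} P(OTC, ED \mid AR')\, P(AR')} \\
P(OTC, ED \mid AR)
  &= \sum_{PDE} P(OTC, ED \mid PDE)\, P(PDE \mid AR) \\
  &= \sum_{PDE} P(OTC \mid ED, PDE)\, P(ED \mid PDE)\, P(PDE \mid AR)
\end{align*}
```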
Slide 21
The PANDA OTC Model
Model the OTC purchases for each equivalence class $E_i$ as a binomial distribution:
$E_i \sim \mathrm{Binomial}(N_{E_i}, P_{E_i})$
Slide 22
The PANDA OTC Model
Model the OTC purchases for each equivalence class $E_i$ as a binomial distribution:
$E_i \sim \mathrm{Binomial}(N_{E_i}, P_{E_i})$
- $N_{E_i}$: number of people in equivalence class $E_i$
- $P_{E_i}$: probability of an OTC cough medication purchase during the previous 3 days by each person in equivalence class $E_i$
Slide 23
The PANDA OTC Model
Model the OTC purchases for each equivalence class $E_i$ as a binomial distribution. Approximate the binomial distribution as a normal distribution:
$E_i \sim \mathrm{Binomial}(N_{E_i}, P_{E_i}) \approx \mathrm{Normal}(\mu_{E_i}, \sigma^2_{E_i})$
Slide 24
The PANDA OTC Model
Model the OTC purchases for each equivalence class $E_i$ as a binomial distribution. Approximate the binomial distribution as a normal distribution:
$E_i \sim \mathrm{Binomial}(N_{E_i}, P_{E_i}) \approx \mathrm{Normal}(\mu_{E_i}, \sigma^2_{E_i})$
$\mu_{E_i} = N_{E_i} P_{E_i}$
$\sigma^2_{E_i} = N_{E_i} P_{E_i} (1 - P_{E_i})$
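
The normal approximation above can be checked numerically. The sketch below compares the exact binomial probability with the matching normal density for one hypothetical equivalence class; the class size (1,000) and purchase probability (0.10) are made-up values chosen only for illustration.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Exact Binomial(n, p) probability of exactly k purchases."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def normal_pdf(x: float, mean: float, variance: float) -> float:
    """Density of Normal(mean, variance) at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


# Hypothetical equivalence class: 1,000 people, each with a 10% purchase probability.
n_people, p_purchase = 1000, 0.10

mean = n_people * p_purchase                           # mu = N * P
variance = n_people * p_purchase * (1.0 - p_purchase)  # sigma^2 = N * P * (1 - P)

# Print exact and approximate values side by side for a few purchase counts.
for k in (90, 100, 110):
    print(k, binomial_pmf(k, n_people, p_purchase), normal_pdf(k, mean, variance))
```

The approximation is generally closest when the class is large and the purchase probability is not extremely close to 0 or 1; very small classes may be approximated less well.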
Slide 25
The PANDA OTC Model
P(OTC sales = X | ED, PDE)
Recall that: P(OTC, ED | PDE) = P(OTC | ED, PDE) P(ED | PDE)
Slide 26
Example
Age Decile | Gender | Home Zip | Respiratory Chief Comp. | Date Admitted
50-60      | Male   | 15213    | Yes                     | Today
Equivalence Class 1 ~ Normal(100, 100)
Slide 27
Example
Age Decile | Gender | Home Zip | Respiratory Chief Comp. | Date Admitted
50-60      | Male   | 15213    | Yes                     | Today
Equivalence Class 1 ~ Normal(100, 100)

Age Decile | Gender | Home Zip | Respiratory Chief Comp. | Date Admitted
50-60      | Female | 15213    | Yes                     | Today
Equivalence Class 2 ~ Normal(150, 225)
Slide 28
Example
Age Decile | Gender | Home Zip | Respiratory Chief Comp. | Date Admitted
50-60      | Male   | 15213    | Yes                     | Today
Equivalence Class 1 ~ Normal(100, 100)

Age Decile | Gender | Home Zip | Respiratory Chief Comp. | Date Admitted
50-60      | Female | 15213    | Yes                     | Today
Equivalence Class 2 ~ Normal(150, 225)

If these were the only 2 equivalence classes in the county, then
County Cough & Cold OTC ~ Normal(100 + 150, 100 + 225)
Slide 29
Example
Now suppose 260 units are sold in the county:
P(OTC Sales = 260 | ED Data, PDE) = Normal(260; 250, 325) = 0.001231
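
The county-level calculation can be sketched in a few lines. The code assumes that the equivalence classes are independent (so that means and variances add) and that the second parameter of Normal(., .) is a variance, as in the model's $\sigma^2$ notation. Under that reading, the sketch evaluates to roughly 0.019 rather than the 0.001231 shown on the slide, so the slide's figure may rest on a parameterization or scaling that the slide text does not state.

```python
import math


def normal_pdf(x: float, mean: float, variance: float) -> float:
    """Density of Normal(mean, variance) at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


# (mean, variance) for each equivalence class, taken from the example.
class_params = [(100.0, 100.0), (150.0, 225.0)]

# Sum of independent normals: means add and variances add.
county_mean = sum(m for m, _ in class_params)
county_variance = sum(v for _, v in class_params)

observed_sales = 260
likelihood = normal_pdf(observed_sales, county_mean, county_variance)
print(county_mean, county_variance, likelihood)
```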
Slide 30
Inference Timing
Machine: P4 3 Gigahertz, 2 GB RAM

Model            | Initialization Time (seconds) | Each hour of data (seconds)
ED model         | 55                            | 5
ED and OTC model | 229                           | 5
Slide 31
Outline
1. Introduction
2. Model
3. Inference
4. Conclusions
Slide 32
Challenges in Population-Wide Modeling Include
- Obtaining good parameter estimates to use in modeling (e.g., the probability of an OTC cough medication purchase given an acute respiratory illness)
- Modeling time and space in a way that is both useful and computationally tractable
- Modeling contagious diseases
Slide 33
Conclusions
- PANDA is a multivariate algorithm that can combine multiple data streams
- Modeling each individual in the population is computationally feasible
- An evaluation of this approach using simulations is in progress
Slide 34
Thank you
http://www.cbmi.pitt.edu/panda/