2004 University of Pittsburgh Bayesian Biosurveillance Using Multiple Data Streams Weng-Keen Wong, Greg Cooper, Denver Dash * , John Levander, John Dowling, Bill Hogan, Mike Wagner RODS Laboratory, University of Pittsburgh * Intel Research This research was supported in part by grants from the National Science Foundation (IIS-0325581), the Defense Advanced Research Projects Agency (F30602-01-2-0550), and the Pennsylvania Department of Health (ME-01-737).

  • Slide 1
  • 2004 University of Pittsburgh Bayesian Biosurveillance Using Multiple Data Streams Weng-Keen Wong, Greg Cooper, Denver Dash *, John Levander, John Dowling, Bill Hogan, Mike Wagner RODS Laboratory, University of Pittsburgh * Intel Research This research was supported in part by grants from the National Science Foundation (IIS-0325581), the Defense Advanced Research Projects Agency (F30602-01-2-0550), and the Pennsylvania Department of Health (ME-01-737).
  • Slide 2
  • Over-the-Counter (OTC) Data Being Collected by the National Retail Data Monitor (NRDM): 19,000 stores; 50% market share nationally; >70% market share in large cities.
  • Slide 3
  • ED Chief Complaint Data Being Collected by RODS. Chief Complaint ED Records for Allegheny County:
    Date / Time Admitted | Age | Gender | Home Zip | Work Zip | Chief Complaint
    Nov 1, 2004 3:02 | 20-30 | Male | 15213 | | Shortness of breath
    Nov 1, 2004 3:09 | 70-80 | Female | 15132 | 15213 | Fever
    : | : | : | : | : | :
  • Slide 4
  • Objective: Using the ED and OTC data streams, detect a disease outbreak in a given region as quickly and accurately as possible.
  • Slide 5
  • Our Approach: Population-wide ANomaly Detection and Assessment (PANDA). A unique detection algorithm that models each individual in the population; combines ED and OTC data streams; focuses on detecting an outdoor aerosolized release of an anthrax-like agent in Allegheny County.
  • Slide 6
  • PANDA: Population-wide Anomaly Detection and Assessment. Uses a causal Bayesian network. Bayesian network: a graphical model representing the joint probability distribution of a set of random variables. Diagram nodes: Home Location of Person, Location of Anthrax Release, Anthrax Infection of Person, Visit of Person to ED.
  • Slide 7
  • PANDA: Population-wide Anomaly Detection and Assessment. Uses a causal Bayesian network. The arrows convey conditional independence relationships among the variables. They also represent causal relationships. Diagram nodes: Home Location of Person, Location of Anthrax Release, Anthrax Infection of Person, Visit of Person to ED.
  • Slide 8
  • Outline: 1. Introduction 2. Model 3. Inference 4. Conclusions
  • Slide 9
  • The Generic PANDA Model for Non-Contagious Diseases. Diagram nodes: Population Risk Factors, Population Disease Exposure (PDE), Person Model (shown twice), Population-Wide Evidence.
  • Slide 10
  • A Special Case of the Generic Model. Each person in the population is represented as a subnetwork in the overall model. Diagram nodes: Anthrax Release, Time of Release, Location of Release, Person Model (shown twice), OTC Sales for Region.
  • Slide 11
  • The Person Model. Diagram nodes: Location of Release, Time of Release, Anthrax Infection, Home Zip, Gender, Age Decile, Respiratory from Anthrax, Other ED Disease, Respiratory CC From Other, Respiratory CC, Respiratory CC When Admitted, ED Admit from Anthrax, ED Admit from Other, ED Acute Respiratory Infection, Acute Respiratory Infection, Non-ED Acute Respiratory Infection, Daily OTC Purchase, Last 3 Days OTC Purchase, ED Admission, OTC Sales for Region.
  • Slide 12
  • Why Population Based? 1. Representational power: background knowledge about spatial, temporal, demographic, and symptom information can be coherently represented in a single model; spatial, temporal, demographic, and symptom evidence can be combined to derive a posterior probability of a disease outbreak. 2. Representational flexibility: new types of knowledge and evidence can be readily incorporated into the model. Hypothesis: A population-based approach will achieve better detection performance than non-population-based approaches.
  • Slide 13
  • Computational Cost of a Population-Wide Approach? ~1.4 million people in Allegheny County, Pennsylvania.
  • Slide 14
  • Equivalence Classes: The ~1.4M people in the modeled population can be partitioned into approximately 24,240 equivalence classes.
  • Slide 15
  • The Person Model. Diagram nodes: Location of Release, Time of Release, Anthrax Infection, Home Zip, Gender, Age Decile, Respiratory from Anthrax, Other ED Disease, Respiratory CC From Other, Respiratory CC, Respiratory CC When Admitted, ED Admit from Anthrax, ED Admit from Other, ED Acute Respiratory Infection, Acute Respiratory Infection, Non-ED Acute Respiratory Infection, Daily OTC Purchase, Last 3 Days OTC Purchase, ED Admission, OTC Sales for Region.
  • Slide 16
  • The Person Model. Diagram nodes: Location of Release, Time of Release, Anthrax Infection, Home Zip, Gender, Age Decile, Respiratory from Anthrax, Other ED Disease, Respiratory CC From Other, Respiratory CC, Respiratory CC When Admitted, ED Admit from Anthrax, ED Admit from Other, ED Acute Respiratory Infection, Acute Respiratory Infection, Non-ED Acute Respiratory Infection, Daily OTC Purchase, Last 3 Days OTC Purchase, ED Admission. Equivalence Class Example:
    Age Decile | Gender | Home Zip | Respiratory Chief Comp. | Date Admitted
    20-30 | Male | 15213 | Yes | Today
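The partitioning idea can be illustrated with a minimal sketch. It assumes that people who share the same values for the five attributes in the example above are interchangeable for inference, so only a count per class needs to be stored. The names `PersonKey` and `equivalence_class_counts`, the field encodings, and the sample records are hypothetical and are not taken from PANDA itself.

```python
from collections import Counter
from typing import NamedTuple


class PersonKey(NamedTuple):
    """Attributes assumed to define an equivalence class (hypothetical encoding)."""
    age_decile: str
    gender: str
    home_zip: str
    respiratory_cc: bool
    date_admitted: str  # e.g. "today" or "never"; the encoding is an assumption


def equivalence_class_counts(people):
    """Group person records by their attribute tuple and count each group."""
    return Counter(PersonKey(*person) for person in people)


# Made-up records for illustration only.
people = [
    ("20-30", "Male", "15213", True, "today"),
    ("20-30", "Male", "15213", True, "today"),
    ("70-80", "Female", "15132", False, "never"),
]

counts = equivalence_class_counts(people)
for key, n in counts.items():
    print(n, key)
```

With this representation the cost of inference scales with the number of distinct classes rather than with the number of people.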
  • Slide 17
  • Outline: 1. Introduction 2. Model 3. Inference 4. Conclusions
  • Slide 18
  • Inference: Derive P(Anthrax Release = true | OTC Sales Data & ED Data). Diagram nodes: Anthrax Release, Time of Release, Location of Release, Person Model (shown twice), OTC Sales for Region.
  • Slide 19
  • Inference. Notation: AR = Anthrax Release; ED = ED Data; PDE = Population Disease Exposure; OTC = OTC Counts. Key term in deriving P(AR | OTC, ED): P(OTC, ED | PDE) = P(OTC | ED, PDE) P(ED | PDE), where P(OTC | ED, PDE) is the contribution of the OTC counts and P(ED | PDE) is the contribution of the ED data. Details in: Cooper GF, Dash DH, Levander J, Wong W-K, Hogan W, Wagner M. Bayesian Biosurveillance of Disease Outbreaks. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2004.
  • Slide 20
  • Inference. Notation: AR = Anthrax Release; ED = ED Data; PDE = Population Disease Exposure; OTC = OTC Counts. Key term in deriving P(AR | OTC, ED): P(OTC, ED | PDE) = P(OTC | ED, PDE) P(ED | PDE). The factor P(OTC | ED, PDE) is the focus of the remainder of this talk.
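One way the key term could enter the posterior is sketched below. This is an illustration rather than the derivation from the cited paper: it assumes that OTC and ED are conditionally independent of AR given PDE, which the generic model's structure suggests but the slides do not state, and the lowercase summation variables are introduced here only for the sketch.

```latex
\begin{align*}
P(AR \mid OTC, ED)
  &= \frac{P(OTC, ED \mid AR)\, P(AR)}
          {\sum_{ar} P(OTC, ED \mid ar)\, P(ar)} \\
P(OTC, ED \mid AR)
  &= \sum_{pde} P(OTC, ED \mid pde)\, P(pde \mid AR) \\
  &= \sum_{pde} P(OTC \mid ED, pde)\, P(ED \mid pde)\, P(pde \mid AR)
\end{align*}
```

Under this reading, each value of PDE contributes the product of the OTC term and the ED term, weighted by how likely that exposure is given the release hypothesis.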
  • Slide 21
  • The PANDA OTC Model: Model the OTC purchases for each equivalence class $E_i$ as a binomial distribution: $E_i \sim \mathrm{Binomial}(N_{E_i}, P_{E_i})$.
  • Slide 22
  • The PANDA OTC Model: Model the OTC purchases for each equivalence class $E_i$ as a binomial distribution: $E_i \sim \mathrm{Binomial}(N_{E_i}, P_{E_i})$, where $N_{E_i}$ is the number of people in equivalence class $E_i$ and $P_{E_i}$ is the probability of an OTC cough medication purchase during the previous 3 days by each person in equivalence class $E_i$.
  • Slide 23
  • The PANDA OTC Model: Model the OTC purchases for each equivalence class $E_i$ as a binomial distribution. Approximate the binomial distribution as a normal distribution: $E_i \sim \mathrm{Binomial}(N_{E_i}, P_{E_i}) \approx \mathrm{Normal}(\mu_{E_i}, \sigma^2_{E_i})$.
  • Slide 24
  • The PANDA OTC Model: Model the OTC purchases for each equivalence class $E_i$ as a binomial distribution. Approximate the binomial distribution as a normal distribution: $E_i \sim \mathrm{Binomial}(N_{E_i}, P_{E_i}) \approx \mathrm{Normal}(\mu_{E_i}, \sigma^2_{E_i})$, with $\mu_{E_i} = N_{E_i} P_{E_i}$ and $\sigma^2_{E_i} = N_{E_i} P_{E_i} (1 - P_{E_i})$.
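The moment matching on this slide can be checked numerically with a short sketch. The function names and the values N = 1000 and P = 0.1 are illustrative assumptions, not parameters from PANDA.

```python
import math


def normal_approximation(n_people, p_purchase):
    """Return (mean, variance) of the normal matched to Binomial(n_people, p_purchase)."""
    mean = n_people * p_purchase
    variance = n_people * p_purchase * (1.0 - p_purchase)
    return mean, variance


def binomial_pmf(k, n, p):
    """Exact binomial probability of k purchases among n people."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def normal_pdf(x, mean, variance):
    """Normal density at x, parameterized by mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


# Illustrative class: 1000 people, each with a 10% purchase probability.
mean, variance = normal_approximation(1000, 0.1)
exact = binomial_pmf(100, 1000, 0.1)
approx = normal_pdf(100, mean, variance)
print(mean, variance, exact, approx)
```

For a class of this size the exact probability and the normal density at the mean agree closely, which is the usual justification for the approximation when N is large and P is not too close to 0 or 1.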
  • Slide 25
  • The PANDA OTC Model: P(OTC sales = X | ED, PDE). Recall that P(OTC, ED | PDE) = P(OTC | ED, PDE) P(ED | PDE).
  • Slide 26
  • Example:
    Age Decile | Gender | Home Zip | Respiratory Chief Comp. | Date Admitted
    50-60 | Male | 15213 | Yes | Today
    Equivalence Class 1 ~ Normal(100, 100)
  • Slide 27
  • Example:
    Age Decile | Gender | Home Zip | Respiratory Chief Comp. | Date Admitted
    50-60 | Male | 15213 | Yes | Today
    Equivalence Class 1 ~ Normal(100, 100)
    50-60 | Female | 15213 | Yes | Today
    Equivalence Class 2 ~ Normal(150, 225)
  • Slide 28
  • Example:
    Age Decile | Gender | Home Zip | Respiratory Chief Comp. | Date Admitted
    50-60 | Male | 15213 | Yes | Today
    Equivalence Class 1 ~ Normal(100, 100)
    50-60 | Female | 15213 | Yes | Today
    Equivalence Class 2 ~ Normal(150, 225)
    If these were the only 2 equivalence classes in the county, then County Cough & Cold OTC ~ Normal(100+150, 100+225).
  • Slide 29
  • Example: Now suppose 260 units are sold in the county. P(OTC Sales = 260 | ED Data, PDE) = Normal(260; 250, 325) = 0.001231
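The two-class example can be reproduced with a small sketch, assuming the classes are independent so that their means and variances add, and assuming Normal(mean, variance) notation as in the summed expression Normal(100+150, 100+225). The helper `normal_pdf` and the variable names are hypothetical. Because the slide does not state whether the final 325 is used as a variance or as a standard deviation, the sketch evaluates both readings.

```python
import math


def normal_pdf(x, mean, variance):
    """Normal density at x, parameterized by mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


# (mean, variance) for the two equivalence classes in the example.
class_params = [(100.0, 100.0), (150.0, 225.0)]

# Sum of independent normals: means add and variances add.
county_mean = sum(m for m, _ in class_params)
county_variance = sum(v for _, v in class_params)

observed_sales = 260

# Reading 1: 325 is the variance of the county-wide distribution.
density_variance_reading = normal_pdf(observed_sales, county_mean, county_variance)

# Reading 2 (assumption): 325 is treated as a standard deviation instead.
density_sd_reading = normal_pdf(observed_sales, county_mean, 325.0**2)

print(county_mean, county_variance)
print(density_variance_reading, density_sd_reading)
```

Under the variance reading the density at 260 is roughly 0.019; under the standard-deviation reading it is roughly 0.0012, which is closer to the figure quoted on the slide.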
  • Slide 30
  • 2004 University of Pittsburgh Inference Timing Machine: P4 3 Gigahertz, 2 GB RAM Initialization Time (seconds) Each hour of data (seconds) ED model555 ED and OTC model 2295
  • Slide 31
  • Outline: 1. Introduction 2. Model 3. Inference 4. Conclusions
  • Slide 32
  • Challenges in Population-Wide Modeling Include: obtaining good parameter estimates to use in modeling (e.g., the probability of an OTC cough medication purchase given an acute respiratory illness); modeling time and space in a way that is both useful and computationally tractable; modeling contagious diseases.
  • Slide 33
  • Conclusions: PANDA is a multivariate algorithm that can combine multiple data streams. Modeling each individual in the population is computationally feasible. An evaluation of this approach using simulations is in progress.
  • Slide 34
  • Thank you. http://www.cbmi.pitt.edu/panda/