Population-Wide Anomaly Detection Weng-Keen Wong 1 , Gregory Cooper 2 , Denver Dash 3 , John Levander 2 , John Dowling 2 , Bill Hogan 2 , Michael Wagner 2 1 School of Electrical Engineering and Computer Science, Oregon State University, 2 Realtime Outbreak and Disease Surveillance Laboratory, University of Pittsburgh, 3 Intel Research, Santa Clara



Motivation

• Suppose you monitor Emergency Department (ED) data that arrives in real time

• Can you specifically detect a large-scale anthrax attack?

Date / Time Admitted   Age     Gender   Home Zip   Chief Complaint
Aug 1, 2005 3:02       20-30   Male     15213      Shortness of breath
Aug 1, 2005 3:07       40-50   Male     15146      Diarrhea
Aug 1, 2004 3:09       70-80   Female   15132      Fever
:                      :       :        :          :


Model non-outbreak conditions and notice deviations

[Figure: Number of ED Respiratory Cases vs. Time. x-axis: Date, 12/30/2000 to 2/3/2001; y-axis: Number of ED Respiratory Cases, 0 to 70]

Traditional univariate methods, e.g. control chart, CUSUM, EWMA, time series models

Spatial methods, e.g. spatial scan statistic

Multivariate methods, e.g. WSARE

2. Sat 2001-03-13: SCORE = -0.00000464 PVALUE = 0.00000000
   12.42% ( 58/467) of today's cases have 20 ≤ Age < 30 AND Respiratory Syndrome = True
    6.53% (653/10000) of baseline have 20 ≤ Age < 30 AND Respiratory Syndrome = True

These are non-specific methods: they look for anything unusual in the data, but not specifically for the onset of an anthrax attack.
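To make the univariate family concrete, here is a minimal sketch of a one-sided CUSUM detector over daily ED respiratory counts. The baseline mean, the slack `k`, the threshold `h`, and the counts themselves are made-up assumptions for illustration, not values from this talk.

```python
from typing import List, Sequence


def cusum_alarms(counts: Sequence[float], mean: float, k: float, h: float) -> List[int]:
    """Return the indices at which the upper CUSUM statistic exceeds h.

    S_t = max(0, S_{t-1} + (x_t - mean - k)); S is reset after each alarm.
    """
    alarms = []
    s = 0.0
    for t, x in enumerate(counts):
        s = max(0.0, s + (x - mean - k))
        if s > h:
            alarms.append(t)
            s = 0.0  # restart accumulation after signalling
    return alarms


# Hypothetical daily respiratory counts: stable near 30, then a sustained rise.
daily_counts = [29, 31, 30, 28, 32, 30, 36, 38, 41, 44]
print(cusum_alarms(daily_counts, mean=30.0, k=2.0, h=10.0))
```

A detector like this signals on any sustained rise in the count, whatever its cause, which is the sense in which such methods are non-specific.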


Population-wide ANomaly Detection and Assessment (PANDA)

• A detector specifically for a large-scale outdoor release of inhalational anthrax

• Uses a massive causal Bayesian network

• Population-wide approach: each person in the population is represented as a subnetwork in the overall model


Population-Wide Approach

• Note the conditional independence assumptions

• Anthrax is infectious but non-contagious

[Diagram: the nodes Anthrax Release, Time of Release, and Location of Release (labelled as global nodes and interface nodes) connect to one Person Model subnetwork for each person in the population]


• Structure designed by expert judgment

• Parameters obtained from census data, training data, and expert assessments informed by literature and experience

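The conditional independence assumption means that, once the release hypothesis is fixed, the likelihood of the evidence factorizes over people. The sketch below shows that factorization for a simplified binary hypothesis (release vs. no release). The real model also reasons over time and location of release, and the function name and all probabilities here are hypothetical.

```python
import math
from typing import Sequence


def posterior_release(prior: float,
                      lik_given_release: Sequence[float],
                      lik_given_no_release: Sequence[float]) -> float:
    """P(release | evidence) when each person's evidence is conditionally
    independent of everyone else's given the release hypothesis."""
    # Sum logs instead of multiplying, so a large population does not underflow.
    log_r = math.log(prior) + sum(math.log(p) for p in lik_given_release)
    log_n = math.log(1.0 - prior) + sum(math.log(p) for p in lik_given_no_release)
    # Normalize in log space (two-term log-sum-exp).
    m = max(log_r, log_n)
    w_r, w_n = math.exp(log_r - m), math.exp(log_n - m)
    return w_r / (w_r + w_n)


# Hypothetical per-person likelihoods of the observed evidence for three people.
post = posterior_release(
    prior=0.01,
    lik_given_release=[0.02, 0.02, 0.9],
    lik_given_no_release=[0.01, 0.01, 0.9],
)
print(round(post, 4))
```

Because each person contributes one factor, adding or changing a person model changes only that person's term.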


Person Model (Initial Prototype)

[Diagram: the nodes Anthrax Release, Location of Release, and Time of Release connect to a repeated person subnetwork, drawn for two people with "……" standing for the rest of the population. Each person subnetwork contains the nodes Anthrax Infection, Home Zip, Respiratory from Anthrax, Other ED Disease, Gender, Age Decile, Respiratory CC From Other, Respiratory CC, Respiratory CC When Admitted, ED Admit from Anthrax, ED Admit from Other, and ED Admission.]

Evidence values shown on the diagram: one person with Home Zip 15213, Age Decile 20-30, Gender Female; another with Home Zip 15146, Age Decile 50-60, Gender Male. Further node values shown: Yesterday, never, False, Unknown.


Prototype is Computationally Feasible

Aside from caching tricks, there are two main optimizations:

• Incremental Updating

• Equivalence Classes

Performance:

On a P4 3.0 GHz machine with 2 GB RAM: 45 seconds of initialization time, then 3 seconds for each hour’s worth of ED data

See Cooper G.F., Dash D.H., Levander J.D., Wong W-K, Hogan W. R., Wagner M. M. Bayesian Biosurveillance of Disease Outbreaks. In Proceedings of the 20th Conference on UAI. Banff, Canada: AUAI Press; 2004. pp. 94-103.
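One way to read the two optimizations named above: people whose person models are in the same state share the same likelihood, so it is evaluated once per equivalence class, and a new ED record moves one person between classes, so only two terms change. The sketch below illustrates that reading only. The class name, the state tuple, and `toy_lik` are hypothetical, and the cited paper remains the reference for the actual algorithms.

```python
import math
from collections import Counter
from typing import Callable, Dict, Hashable

State = Hashable  # e.g. (age_decile, gender, home_zip, ed_status)


class ClassLikelihood:
    """Population log-likelihood stored per equivalence class."""

    def __init__(self, counts: Dict[State, int], lik: Callable[[State], float]):
        self.counts = Counter(counts)
        self.lik = lik
        # One likelihood evaluation per class, weighted by the class size.
        self.log_l = sum(n * math.log(lik(s)) for s, n in self.counts.items())

    def move(self, old: State, new: State) -> None:
        """Incremental update: one person changes state (e.g. is admitted)."""
        if self.counts[old] <= 0:
            raise ValueError("no person left in the source class")
        self.counts[old] -= 1
        self.counts[new] += 1
        self.log_l += math.log(self.lik(new)) - math.log(self.lik(old))


def toy_lik(state: State) -> float:
    # Hypothetical likelihood: depends only on the ED status in this toy example.
    return 0.05 if state[-1] == "admitted" else 0.95


not_admitted = ("20-30", "F", "15213", "not admitted")
admitted = ("20-30", "F", "15213", "admitted")
model = ClassLikelihood({not_admitted: 1000, admitted: 3}, toy_lik)
model.move(not_admitted, admitted)  # a new ED record arrives
print(model.counts[admitted], model.log_l)
```

The cost of an update is then independent of the population size, which is consistent with the per-hour timing reported above.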


What do you gain with a population-wide approach?

Coherent framework for:

1. Incorporating background knowledge

2. Incorporating different types of evidence

3. Data fusion

4. Explanation


1. Incorporating Background Knowledge

• Limited data from actual anthrax attacks available:
    – Postal attacks 2001 (only 11 people affected, not representative of a large-scale attack)
    – Sverdlovsk 1979

• But literature contains studies on the characteristics of inhalational anthrax


Can coherently incorporate different types of background knowledge, e.g. for inhalational anthrax:

• Progression of symptoms

• Incubation period

• Spatial dispersion pattern

• Progression of symptoms and incubation period: at an individual level

• Spatial dispersion pattern: can represent this by the effects over individuals


2. Incorporating Evidence

• Easily incorporate different types of evidence, e.g. spatial, temporal, demographic, symptomatic

• Easily incorporate new evidence that distinguishes an individual (or individuals) from others
    – Modify the appropriate person model


3. Data Fusion

ED data:

Date / Time Admitted   Age     Gender   Home Zip   Chief Complaint
Aug 1, 2005 3:02       20-30   Male     15213      Shortness of breath
Aug 1, 2005 3:07       40-50   Male     15146      Diarrhea
Aug 1, 2004 3:09       70-80   Female   15132      Fever
:                      :       :        :          :

[Figure: OTC data]

• No data available during an actual anthrax attack that captures the correlation between these two data sources.

• By modeling the actions of individuals, and incorporating background knowledge, we can come up with a plausible model of the effects of an attack on these two data sources.


• OTC data: aggregated over zip code and available daily

• ED data: individual patient records, usually available in real time


By representing the data at the finest granularity (i.e. each individual), we can easily deal with different spatial and temporal granularities in data fusion.

See Wong, W-K, Cooper G.F., Dash D.H., Dowling, J.N., Levander J.D., Hogan W. R., Wagner M. M. Bayesian Biosurveillance Using Multiple Data Streams. In Proceedings of the 3rd National Syndromic Surveillance Conference, 2004.
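A minimal sketch of the granularity point: if every person model yields a daily probability of an OTC purchase, those individual quantities can be rolled up to the zip-code level at which OTC data is reported, while ED records can still be matched to individual person models. The `Person` fields, the probabilities, and the use of an expected count are all illustrative assumptions and are not taken from the cited work.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Person:
    home_zip: str
    p_otc_purchase: float  # hypothetical daily purchase probability from the person model


def expected_otc_by_zip(people: Iterable[Person]) -> Dict[str, float]:
    """Roll individual purchase probabilities up to expected daily counts per zip code."""
    totals: Dict[str, float] = defaultdict(float)
    for person in people:
        totals[person.home_zip] += person.p_otc_purchase
    return dict(totals)


# Hypothetical population: 100 people in 15213 and 50 people in 15146.
population = [Person("15213", 0.02)] * 100 + [Person("15146", 0.01)] * 50
expected = expected_otc_by_zip(population)
print({z: round(v, 3) for z, v in expected.items()})
```

The aggregated values are what would be compared against the zip-level daily OTC counts. The individual-level representation itself does not change between data sources.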


4. Explanation

• Important to know why the model believes an anthrax attack is occurring

• Can find the subset of evidence E* that most influences such a belief

• In PANDA, E* would correspond to a group of individuals

• Identify the individuals that most contribute to the hypothesis of an attack


Can also use the Bayesian network to calculate the most likely location of release and time of release

Currently, we identify the top equivalence classes that contribute the most to the hypothesis that an attack is occurring

Gender   Age Decile   Home Zip   Respiratory Symptoms   Date Admitted
M        20-30        15213      True                   2 days ago
F        20-30        15213      True                   2 days ago
M        30-40        15213      True                   2 days ago
F        40-50        15213      True                   2 days ago
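The slides do not say how an equivalence class's contribution is measured. One plausible choice, assumed here purely for illustration, is the class's share of the log-likelihood ratio between the attack and no-attack hypotheses. The function name and every number below are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# (gender, age decile, home zip, respiratory symptoms, date admitted)
ClassKey = Tuple[str, str, str, bool, str]


def top_contributors(classes: Dict[ClassKey, Tuple[int, float, float]],
                     k: int) -> List[Tuple[ClassKey, float]]:
    """classes maps key -> (count, P(evidence | attack), P(evidence | no attack)).

    Returns the k classes with the largest log-likelihood-ratio contribution.
    """
    scored = [
        (key, n * (math.log(p_attack) - math.log(p_none)))
        for key, (n, p_attack, p_none) in classes.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical class sizes and likelihoods.
classes = {
    ("M", "20-30", "15213", True, "2 days ago"): (12, 0.004, 0.001),
    ("F", "20-30", "15213", True, "2 days ago"): (9, 0.004, 0.001),
    ("M", "50-60", "15146", False, "never"): (5000, 0.989, 0.990),
}
top = top_contributors(classes, k=2)
for key, score in top:
    print(key, round(score, 2))
```

Classes with a positive score support the attack hypothesis and classes with a negative score count against it, so the top of the ranking gives a table of the kind shown above.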


Future Work

• More sophisticated person models

• Improved explanation capabilities

• Validation of data fusion model

• More disease models apart from anthrax

• Contagious disease models

• Combining outputs from multiple Bayesian detectors


Thank You!

RODS Laboratory: http://rods.health.pitt.edu

Bayesian Biosurveillance: http://www.cbmi.pitt.edu/panda/

This research was supported by grants IIS-0325581 from the National Science Foundation, F30602-01-2-0550 from the Department of Homeland Security, and ME-01-737 from the Pennsylvania Department of Health.