Bio Surveillance 2.0 Kass-Hout and Di Tada


  • 8/7/2019 Bio Surveillance 2.0 Kass-Hout and Di Tada

    1/56

    Biosurveillance 2.0: Collaboration for Early Disease Warning and Effective Response

    Taha Kass-Hout

    Nicolás di Tada

    Invited by Dr. Barbara Massoudi, PhD, MPH

    Lecture at Emory University Rollins School of Public Health

    Public Health Informatics, INFO 503

    Atlanta, GA, USA

    2/56

    Background

    3/56

    Late Detection and Response

    [Figure: CASES plotted by DAY, with the "Opportunity for control" marked]

    Background

    4/56

    Early Detection and Response

    [Figure: CASES plotted by DAY, with the "Opportunity for control" marked]

    Background

    5/56

    PUBLIC HEALTH MEASURES

    Representativeness

    Completeness

    Predictive Value

    Timeliness

    Background

    6/56

    PUBLIC HEALTH MEASURES

    1000 malaria infections (100%)

    50 malaria notifications (5%)

    Get as close to the bottom of the pyramid as possible

    Urge frequent reporting: weekly, then daily, then immediately

    Specificity / Reliability

    Sensitivity / Timeliness

    Main attributes
    o Representativeness
    o Completeness
    o Predictive value positive

    Background
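
A minimal sketch of the arithmetic behind the pyramid, using the slide's malaria figures. The function names are hypothetical, and the confirmed-case count used for predictive value positive is an invented number that only illustrates the definition.

```python
def completeness(notified: int, true_cases: int) -> float:
    """Fraction of true cases that reach the surveillance system."""
    if true_cases <= 0:
        raise ValueError("true_cases must be positive")
    return notified / true_cases


def predictive_value_positive(confirmed: int, notified: int) -> float:
    """Fraction of notifications that turn out to be real cases."""
    if notified <= 0:
        raise ValueError("notified must be positive")
    return confirmed / notified


# Figures from the slide: 1000 infections, 50 notifications.
malaria_completeness = completeness(50, 1000)

# Hypothetical: suppose 40 of the 50 notifications are later confirmed.
malaria_pvp = predictive_value_positive(40, 50)

print(f"completeness = {malaria_completeness:.0%}")
print(f"predictive value positive = {malaria_pvp:.0%}")
```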

    7/56

    PUBLIC HEALTH MEASURES

    Analyze and interpret

    Signal as early as possible

    Automated analysis/thresholds

    Health care hotline

    Time

    Main attributes
    o Timeliness

    Background
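
One way automated thresholds might work, as a sketch only: flag a day when its count exceeds the mean of the preceding week by more than `k` standard deviations. The window length, the value of `k`, and the hotline call counts are all assumptions made for illustration; the slides do not specify a rule.

```python
from statistics import mean, stdev


def threshold_alerts(counts, window=7, k=2.0):
    """Return indices of days whose count exceeds mean + k * sd of the
    preceding `window` days (a simple moving-baseline threshold)."""
    alerts = []
    for day in range(window, len(counts)):
        baseline = counts[day - window:day]
        limit = mean(baseline) + k * stdev(baseline)
        if counts[day] > limit:
            alerts.append(day)
    return alerts


# Hypothetical daily hotline call counts; index 9 carries a spike.
calls = [10, 12, 11, 9, 10, 13, 11, 12, 10, 25, 11]
print(threshold_alerts(calls))
```

A shorter window reacts faster (timeliness) but produces more false signals, which is the sensitivity/specificity trade-off named on the previous slide.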

    8/56

    PUBLIC HEALTH - TWO PERSPECTIVES

    Case management
    o Individual cases of notifiable diseases
    o Relationship networks (contact tracing)

    Population surveillance
    o Larger risk patterns

    Background

    9/56

    CASE MANAGEMENT

    Questions/problems:

    Is a case due to recent transmission?

    If so, does the case share any feature with other, recent cases?

    Ways it's being done:

    Investigations/interviews

    Meeting with other investigators

    Background

    10/56

    POPULATION SURVEILLANCE

    Questions/problems:

    Are more cases happening than expected?

    Does an excess suggest ongoing transmission in a specific region?

    Way it's being done:

    Semi-automated routine temporal and space-time statistical analysis

    Background
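
The question "are more cases happening than expected?" can be sketched as a per-region comparison of observed counts against a baseline. The example below assumes counts follow a Poisson distribution around the expected value, which is a common simplification and not something the slides state; the counts, baselines, and `alpha` cutoff are hypothetical.

```python
import math


def poisson_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    # 1 - P(X <= observed - 1), summing the pmf term by term.
    term = math.exp(-expected)
    cdf = term
    for k in range(1, observed):
        term *= expected / k
        cdf += term
    return max(0.0, 1.0 - cdf)


def excess_regions(observed_by_region, expected_by_region, alpha=0.01):
    """Regions whose count is improbably high under the baseline."""
    return sorted(
        region
        for region, obs in observed_by_region.items()
        if poisson_tail(obs, expected_by_region[region]) < alpha
    )


# Hypothetical weekly counts keyed by ZIP code, with baseline expectations.
observed = {"15213": 18, "15207": 6}
expected = {"15213": 8.0, "15207": 5.0}
print(excess_regions(observed, expected))
```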

    11/56

    WHY LOCATION MATTERS - CASE MANAGEMENT

    If you are studying a case of a certain disease that was just declared

    It is harder to picture the situation by looking at something like this...

    Background

    13/56

    WHY LOCATION MATTERS - CASE MANAGEMENT

    ...than by looking at this.

    Background

    15/56

    WHY LOCATION MATTERS - POPULATION SURVEILLANCE

    If you are studying the spatial distribution of a set of disease clusters

    This would seem more difficult...

    Background

    17/56

    WHY LOCATION MATTERS - POPULATION SURVEILLANCE

    ...than this.

    Background

    19/56

    The Problem Space

    Current systems design, analysis and evaluation have been geared towards specific data sources and detection algorithms, not humans

    We have systems in place for those threats we have been faced with before

    The Problem

    20/56

    Traditional DISEASE SURVEILLANCE

    In the past two decades the focus was on automatically detecting anomalous patterns in data (often a single stream)

    Modern methods
    o rely on human input and judgment
    o incorporate temporal, spatial, and multivariate information

    The Problem

    21/56

    Traditional DISEASE SURVEILLANCE

    9/20, 15213, cough/cold,
    9/21, 15207, antifever,
    9/22, 15213, CC = cough, ...
    1,000,000 more records

    Huge mass of data -> Detection algorithm -> Too many alerts

    "What are we supposed to do with this?"

    The Problem

    22/56

    Our Approach

    Human-based

    Collaborative and cross-disciplinary

    Web 2.0/3.0 platform

    Our Approach

    23/56

    Information Sources

    Event-based - ad-hoc unstructured reports issued by formal or informal sources

    Indicator-based - (number of cases, rates, proportion of strains)

    Timeliness, Representativeness, Completeness, Predictive Value, Quality

    Our Approach

    24/56

    MODERN DISEASE SURVEILLANCE

    9/20, 15213, cough/cold,
    9/21, 15207, antifever,
    9/22, 15213, CC = cough, ...
    1,000,000 more records

    Huge mass of data

    Feedback loop

    Our Approach

    25/56

    Main Components

    Feature extraction, reference and baseline information

    Tags

    Multiple Data Streams

    User-Generated and Machine Learning Metadata

    Comments

    Spatio-temporal

    Flags/Alerts/Bookmarks

    Event Classification, Characterization and Detection

    Previous Event Training Data

    Previous Event Control Data

    Metadata extraction

    Machine learning

    Social network

    Professional feedback

    Anomaly detection

    Collaborative Spaces

    Hypotheses generation/testing

    Our Solution

    27/56

    Item

    Hypothesis

    Field Actions and Verifications

    Feedback / Confirmation

    Our Solution

    28/56

    ADVANTAGES OF MACHINE LEARNING

    P(malaria) = 22%

    P(influenza) = 13%

    P(other ILI) = 33%

    Our Solution

    29/56

    MACHINE LEARNING TECHNIQUES

    Classifiers

    Clustering

    Bayesian Statistics

    Neural Networks

    Genetic Algorithms

    Our Solution

    30/56

    HOW TO REPRESENT A DOCUMENT?

    [Figure: axes labeled "cold" and "fever"]

    Our Solution

    31/56

    CLASSIFIERS - PROBLEM DEFINITION

    Map items to vectors (Feature extraction)

    Normalize those vectors

    Train the classifier

    Measure the results with new information

    Feed the results back to the classifier

    Separate classes in feature space

    Our Solution
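
The steps above can be sketched end to end. This is an illustration under stated assumptions, not the authors' implementation: the vocabulary, the training reports, and the class labels are hypothetical, and a nearest-centroid classifier stands in for whichever classifier is actually used.

```python
import math
from collections import Counter

VOCAB = ["cold", "cough", "fever", "rash"]  # hypothetical feature terms


def to_vector(text):
    """Feature extraction: count vocabulary terms in the text."""
    counts = Counter(text.lower().replace("/", " ").split())
    return [float(counts[t]) for t in VOCAB]


def normalize(vec):
    """Scale to unit length so long and short reports are comparable."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def train(examples):
    """Nearest-centroid training: average the vectors of each class."""
    sums, counts = {}, Counter()
    for text, label in examples:
        vec = normalize(to_vector(text))
        acc = sums.setdefault(label, [0.0] * len(VOCAB))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] += 1
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}


def classify(model, text):
    """Pick the class whose centroid has the largest dot product."""
    vec = normalize(to_vector(text))
    return max(model, key=lambda lab: sum(a * b for a, b in zip(model[lab], vec)))


training = [
    ("cough cold", "respiratory"),
    ("fever cough", "respiratory"),
    ("rash", "other"),
    ("rash rash fever", "other"),
]
model = train(training)

# Measure on new information, then feed the verified item back in.
new_item = ("cold cough cough", "respiratory")
prediction = classify(model, new_item[0])
training.append(new_item)
model = train(training)
print(prediction)
```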

    32/56

    CLASSIFIERS - SVM

    Our Solution

    34/56

    SVM - NON-LINEAR?

    $\Phi : x \to \varphi(x)$

    Map to a higher-dimensional space

    Our Solution
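
A small sketch of why mapping to a higher-dimensional space helps. The data are XOR-like labels that no straight line separates in the plane; after adding the product of the two coordinates as a third feature, a linear separator exists. The feature map `phi` is a hypothetical choice, and a perceptron is used here as a simple stand-in for a linear separator (it is not an SVM and does not maximize the margin).

```python
def phi(x):
    """Hypothetical feature map: add the product x1*x2 as a third axis."""
    x1, x2 = x
    return (x1, x2, x1 * x2)


def perceptron(samples, epochs=20):
    """Train a linear separator w.x + b with the perceptron rule."""
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in samples:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b


def accuracy(w, b, samples):
    """Fraction of samples strictly on the correct side of the separator."""
    hits = sum(
        1 for x, y in samples
        if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) > 0
    )
    return hits / len(samples)


# XOR-like labels: no straight line in the plane separates them.
data = [((1, 1), 1), ((-1, -1), 1), ((1, -1), -1), ((-1, 1), -1)]

w2, b2 = perceptron(data)
mapped = [(phi(x), y) for x, y in data]
w3, b3 = perceptron(mapped)

print(accuracy(w2, b2, data), accuracy(w3, b3, mapped))
```

In practice an SVM usually applies such a mapping implicitly through a kernel function instead of computing the mapped vectors.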

    35/56

    SVM - FILTERING OR CLASSIFYING

    Training Documents

    Classifier

    Document 1

    Document 2

    Document 3

    Positives

    Negatives

    Our Solution

    36/56

    CLUSTERING - PROBLEM DEFINITION

    Map items to vectors (Feature extraction)

    Normalization

    Agglomerative and Partitional

    Our Solution

    37/56

    CLUSTERING - AGGLOMERATIVE

    Our Solution
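
A minimal sketch of agglomerative clustering, assuming single linkage (the slides do not say which linkage is used). Every point starts as its own cluster and the two closest clusters are merged until the requested number remains. The case coordinates are hypothetical.

```python
import math


def single_linkage(points, n_clusters):
    """Agglomerative clustering: start with one cluster per point and
    repeatedly merge the two clusters with the closest pair of members."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))
    return [sorted(c) for c in clusters]


# Hypothetical case locations (x, y): two visibly separate groups.
cases = [(0.0, 0.0), (0.5, 0.2), (0.3, 0.6), (5.0, 5.0), (5.4, 4.8)]
print(single_linkage(cases, 2))
```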

    38/56

    CLUSTERING - PARTITIONAL

    Our Solution
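
A minimal sketch of partitional clustering, assuming k-means as the method (the slides do not name one). Points are assigned to the nearest centroid, then each centroid moves to the mean of its points. The case coordinates and starting centroids are hypothetical.

```python
import math


def k_means(points, centroids, iterations=10):
    """Partitional clustering: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    centroids = list(centroids)
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids, groups


cases = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
# Hypothetical starting centroids; real use would typically pick them at random.
centers, groups = k_means(cases, [(0.0, 0.0), (10.0, 10.0)])
print(centers)
```

Unlike the agglomerative approach, the number of clusters has to be chosen up front.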

    39/56

    BAYESIAN STATISTICS

    $$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

    P(A|B): probability of disease A (flu) once symptoms B (fever) are observed

    P(B|A): probability of fever once flu is confirmed

    P(A): probability of flu (prior or marginal)

    P(B): probability of fever (prior or marginal)

    Our Solution
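
A worked instance of the formula above. The three input probabilities are hypothetical values chosen only to show the arithmetic; they are not epidemiological estimates.

```python
def posterior(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    if not 0 < p_b <= 1:
        raise ValueError("P(B) must be in (0, 1]")
    return p_b_given_a * p_a / p_b


# Hypothetical numbers, chosen only to illustrate the formula:
p_fever_given_flu = 0.90   # P(B|A)
p_flu = 0.05               # P(A)
p_fever = 0.15             # P(B)

p_flu_given_fever = posterior(p_fever_given_flu, p_flu, p_fever)
print(f"P(flu | fever) = {p_flu_given_fever:.2f}")
```

With these inputs, observing a fever raises the probability of flu from the 5% prior to 30%.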

    40/56

    NEURAL NETWORKS

    Given a set of stimuli, train a system to produce a given output

    Our Solution

    41/56

    NEURAL NETWORKS - STRUCTURE

    Input Layer: {I0, I1, ..., In}

    Hidden Layer

    Output Layer: {O0, O1, ..., On}

    Weight

    $$H_n = \sum_{i=0}^{I} \left( I_i \cdot w_{in} \right)$$

    Our Solution
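
A sketch of the forward pass through the structure above. Each hidden unit is the weighted sum of the inputs, as in the slide's formula. The weights and inputs are hypothetical, and the sigmoid activation is an assumption: the slide does not name an activation function.

```python
import math


def layer(inputs, weights, activation=None):
    """Compute each unit n as sum_i inputs[i] * weights[i][n],
    optionally passed through an activation function."""
    n_units = len(weights[0])
    sums = [
        sum(inputs[i] * weights[i][n] for i in range(len(inputs)))
        for n in range(n_units)
    ]
    return [activation(s) for s in sums] if activation else sums


def sigmoid(x):
    # Assumption: the slide does not name an activation function.
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical weights: 3 inputs -> 2 hidden units -> 1 output.
w_hidden = [[0.5, -1.0], [1.0, 0.0], [-0.5, 2.0]]
w_output = [[1.0], [-1.0]]

inputs = [1.0, 2.0, 3.0]
hidden_sums = layer(inputs, w_hidden)           # the slide's H_n
hidden = [sigmoid(h) for h in hidden_sums]
output = layer(hidden, w_output, sigmoid)
print(hidden_sums, output)
```

Training then consists of adjusting the weights until the outputs match the desired responses for the training stimuli.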

    42/56

    NEURAL NETWORK - APPLICATION

    Event?

    Our Solution

    43/56

    GENETIC ALGORITHM - BASICS

    Define the model that you want to optimize

    Create the fitness function

    Evolve the gene pool, testing against the fitness function

    Select the best individual

    Our Solution

    45/56

    GENETIC ALGORITHM - MODEL FITNESS

    Fitness = 1/Area

    Our Solution

    46/56

    GENETIC ALGORITHM - PROCESS

    1. Create an initial population of candidates

    2. Use operators to generate new candidates (mating and mutation)

    3. Discard the worst individuals or select the best individuals in the generation

    4. Repeat from 2 until you find a candidate that satisfies the solution criteria

    Our Solution
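
The four steps can be sketched as a short loop. Everything specific here is an assumption for illustration: candidates are 5-tuples of integers from 1 to 6, the fitness function is simply the sum of the tuple, and the population size, mutation rate, and stopping target are arbitrary. The deck's own model and fitness function are not recoverable from the slides.

```python
import random


def evolve(fitness, length=5, values=range(1, 7), pop_size=20,
           generations=50, target=None, seed=0):
    """Minimal genetic algorithm following the four steps above."""
    rng = random.Random(seed)
    values = list(values)
    # 1. Initial population of random candidates.
    population = [tuple(rng.choice(values) for _ in range(length))
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # 3. Keep the best half (this also discards the worst half).
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        if target is not None and history[-1] >= target:
            break  # 4. Stop once a candidate is good enough.
        parents = population[:pop_size // 2]
        # 2. Generate new candidates by mating and mutation.
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, length)
            child = list(mother[:cut] + father[cut:])
            if rng.random() < 0.2:
                child[rng.randrange(length)] = rng.choice(values)
            children.append(tuple(child))
        population = parents + children
    best = max(population, key=fitness)
    return best, history


# Hypothetical fitness: the sum of the tuple (maximum possible is 30).
best, history = evolve(sum, target=30)
print(best, history[-1])
```

Because the best half of each generation is carried over unchanged, the best fitness recorded in `history` never decreases from one generation to the next.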

    47/56

    GENETIC ALGORITHM - PROCESS

    (4,5,6,3,5) (4,3,6,2,5)

    (5,3,4,6,2) (2,4,6,3,5) (4,3,6,5,2)

    (2,3,4,6,5) (3,4,5,2,6)

    (3,5,4,6,2) (4,5,3,6,2) (5,4,2,3,6)

    (4,6,3,2,5) (3,4,2,6,5) (3,6,5,1,4)

    (5,3,2,6,5)

    (3,4,4,6,2)

    (5,3,2,6,5)

    (3,4,4,6,2)

    Our Solution

    48/56

    RESULTS - IMPROVED SURVEILLANCE

    Our Solution

    49/56

    Our Solution

    InSTEDD Evolve

    54/56

    Acknowledgement
