Estimating the detector coverage in a negative selection algorithm

Zhou Ji, St. Jude Children’s Research Hospital

Dipankar Dasgupta, The University of Memphis

GECCO 2005, 7/27/2005



Outline

• Quick review of negative selection algorithms

• Description of the algorithm (mainly the new strategy for dealing with detector coverage)

• Summary


Review of negative selection algorithms

• Biological metaphor: T cells and thymus

• Major steps:
  1. Generate detector candidates randomly.
  2. Eliminate those that recognize self samples.

• Major elements of a negative selection algorithm:
  • Data/detector representation
  • Matching rule ***
  • Detector generation algorithm
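The two major steps can be made concrete with a short sketch. It is an illustration under stated assumptions, not code from the paper: data live in the unit square [0, 1]^2, every detector is a hypersphere of one fixed radius, and the matching rule is Euclidean distance. The names `matches` and `generate_detectors` and their parameters are hypothetical.

```python
import math
import random


def matches(center, point, radius):
    """Assumed matching rule: Euclidean distance no larger than the radius."""
    return math.dist(center, point) <= radius


def generate_detectors(self_samples, n_detectors, radius, dim=2, seed=0,
                       max_tries=100_000):
    """Classic negative selection with a pre-chosen number of detectors.

    Step 1: generate a detector candidate uniformly at random in [0, 1]^dim.
    Step 2: eliminate the candidate if it matches any self sample.
    Gives up after max_tries candidates, so it may return fewer detectors.
    """
    rng = random.Random(seed)
    detectors = []
    for _ in range(max_tries):
        if len(detectors) >= n_detectors:
            break
        candidate = tuple(rng.random() for _ in range(dim))
        if any(matches(candidate, s, radius) for s in self_samples):
            continue  # recognizes self: eliminate it
        detectors.append(candidate)
    return detectors
```

For example, `generate_detectors([(0.5, 0.5)], n_detectors=20, radius=0.1)` returns 20 random centers, none within 0.1 of the self sample. Note that the number of detectors is fixed in advance here, which is exactly the choice this paper questions.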


Illustration of 2-D real-valued detectors


Review of negative selection algorithms

• It is in fact a family of algorithms:
  • Different detector representations
  • Different detector generation mechanisms
  • The original generation mechanism is not always used in algorithms that are claimed (and accepted) as negative selection algorithms.

• Main characteristics:
  • Representing the target concept using detectors in the negative space
  • Learning from one-class training data

• Detector coverage/number of detectors:
  • Coverage is the proportion of the nonself area that is covered by detectors.
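Since coverage is defined as a proportion of the nonself area, it can be approximated by sampling. Below is a minimal Monte Carlo sketch, assuming the unit square as the feature space, detectors given as hypothetical `(center, radius)` pairs, and a point counting as self when it lies within a self threshold `r_self` of some self sample. The helper names are illustrative only.

```python
import math
import random


def is_self(point, self_samples, r_self):
    """Assumed self region: within r_self of some self sample."""
    return any(math.dist(point, s) <= r_self for s in self_samples)


def is_covered(point, detectors):
    """A point is covered if it lies inside at least one (center, radius) detector."""
    return any(math.dist(point, c) <= r for c, r in detectors)


def estimate_coverage(detectors, self_samples, r_self, n_points=10_000, dim=2, seed=0):
    """Point estimate of coverage: covered nonself points / nonself points sampled."""
    rng = random.Random(seed)
    nonself = covered = 0
    for _ in range(n_points):
        p = tuple(rng.random() for _ in range(dim))
        if is_self(p, self_samples, r_self):
            continue  # only the nonself area counts toward coverage
        nonself += 1
        covered += is_covered(p, detectors)  # bool adds as 0 or 1
    return covered / nonself if nonself else float("nan")
```

The result is only a point estimate; its precision depends on how many nonself points were sampled.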


Main idea in this paper

• Stop generating detectors when the coverage is “enough”, instead of generating a pre-chosen number of detectors.

• Some earlier works used similar statistical tools to estimate the necessary number of detectors. That is a totally different approach, although the two appear similar because the underlying mathematics is similar.


How to deal with detector coverage?

Different possible approaches:

• Decide the necessary number of detectors before generation.

• Generate enough detectors *** (The real concern is the coverage, not the number.)

• Estimate afterwards, e.g. from the actual detection rate.


Goal from the statistical point of view

• Estimate the parameter (coverage, viewed as a probability) by a sample statistic (the sample proportion)

• Point estimate versus confidence interval

• Two types of statistical inference: estimation versus hypothesis testing
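To illustrate the difference between the two kinds of estimate, the sketch below turns a count of covered sample points into a point estimate and a normal-approximation (Wald) confidence interval. The Wald interval is one common textbook choice assumed here for simplicity, not necessarily what the paper uses; the function name `coverage_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def coverage_interval(covered, n, confidence=0.95):
    """Point estimate and Wald confidence interval for the coverage.

    covered: number of sampled nonself points that fall inside some detector
    n:       number of sampled nonself points
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = covered / n                                # point estimate
    z = NormalDist().inv_cdf(0.5 + confidence / 2)     # two-sided critical value
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    # Clip to [0, 1] because a proportion cannot leave that range.
    return p_hat, (max(0.0, p_hat - half_width), min(1.0, p_hat + half_width))
```

For example, 90 covered points out of 100 give a point estimate of 0.9 and a 95% interval of roughly (0.841, 0.959). The Wald interval is known to behave poorly when the proportion is close to 1, which is exactly where a good coverage lies; a Wilson interval is a common alternative.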


Statistical basics used in this method

• Central limit theorem:
  • The sample mean approximately follows a normal distribution.

• Hypothesis testing:
  • Testing a hypothesis (e.g., that the coverage is enough) instead of estimating the value of a parameter (e.g., the coverage)

• Null hypothesis: assumed true unless evidence shows otherwise

• Type I and type II errors have different costs:
  • Type I: falsely rejecting the null hypothesis; this is the more costly error.
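The test can be written as a one-sided z-test for a proportion. This sketch assumes that the null hypothesis is "the coverage does not exceed the target" (so a Type I error means stopping too early), which is one natural reading of the slide rather than a confirmed detail. With x covered points among n sampled nonself points and target coverage p0, the statistic is Z = (x - n·p0) / sqrt(n·p0·(1 - p0)), and H0 is rejected when Z exceeds the critical value z_α. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def coverage_is_enough(covered, n, target, alpha=0.05):
    """One-sided z-test for a proportion.

    H0: coverage <= target (not enough yet)   vs   H1: coverage > target.
    Returns (reject_h0, z). Rejecting H0 means there is evidence, at
    significance level alpha, that the coverage exceeds the target.
    """
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    z = (covered - n * target) / math.sqrt(n * target * (1 - target))
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value
    return z > z_alpha, z
```

For instance, 99 covered points out of 100 against a target of 0.9 give Z = 3, which exceeds z_0.05 ≈ 1.645, so the coverage would be judged sufficient; 91 out of 100 would not.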


Algorithm (V-detector)

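The algorithm slides are figures that did not survive extraction. As a hedged stand-in, the sketch below shows how a single random point could be handled by a V-detector-style generator with variable-sized hypersphere detectors, assuming that a new detector's radius is its distance to the nearest self sample minus the self threshold. The helper name, its return values, and the Euclidean distance are assumptions for illustration.

```python
import math


def classify_candidate(candidate, self_samples, r_self, detectors):
    """Process one random point as a V-detector-style candidate (assumed variant).

    self_samples must be non-empty; detectors is a list of (center, radius).
    Returns one of:
      ("self", None)      the point lies in the self region; discard it
      ("covered", None)   the point is already inside an existing detector
      ("new", radius)     the point becomes a new detector with this radius
    """
    d_self = min(math.dist(candidate, s) for s in self_samples)
    if d_self <= r_self:
        return "self", None
    if any(math.dist(candidate, c) <= r for c, r in detectors):
        return "covered", None
    # Variable size: the detector grows until it touches the self region.
    return "new", d_self - r_self
```

A covered point adds no detector, but it is still useful: it is a sample of the nonself area that happens to be covered, which is what a coverage estimate needs.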


Control parameters involved

• Self threshold
  • What, and how much, do we know about the training data?

• Significance level for hypothesis testing

• Target coverage
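One way to see how the control parameters enter the computation is to derive from them the two numbers a coverage test needs: the critical value z_α from the significance level, and a minimum sample size from the target coverage. The class name and the n·p0 ≥ 5, n·(1 - p0) ≥ 5 rule of thumb for the normal approximation are assumptions (some texts use 10 instead of 5).

```python
import math
from dataclasses import dataclass
from statistics import NormalDist


@dataclass(frozen=True)
class ControlParameters:
    """Hypothetical container for the three control parameters on the slide."""
    self_threshold: float          # radius of the self region; used by the generator
    alpha: float = 0.05            # significance level of the one-sided test
    target_coverage: float = 0.99  # p0 in the null hypothesis

    def __post_init__(self):
        if self.self_threshold < 0:
            raise ValueError("self_threshold must be non-negative")
        if not 0 < self.alpha < 1:
            raise ValueError("alpha must be in (0, 1)")
        if not 0 < self.target_coverage < 1:
            raise ValueError("target_coverage must be in (0, 1)")

    @property
    def z_alpha(self) -> float:
        """Critical value: reject H0 when the z statistic exceeds this."""
        return NormalDist().inv_cdf(1 - self.alpha)

    @property
    def min_sample_size(self) -> int:
        """Smallest n with n*p0 >= 5 and n*(1 - p0) >= 5 (assumed rule of thumb)."""
        p0 = self.target_coverage
        # round() absorbs float noise such as 5 / 0.09999999999999998
        return math.ceil(round(max(5 / p0, 5 / (1 - p0)), 9))
```

With a target coverage of 0.99 and α = 0.05, this gives z_α ≈ 1.645 and a minimum of 500 nonself samples per test; the higher the target, the more samples are needed.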


Issue of integration

• Re-use the same random points both as detector candidates and as samples for hypothesis testing.

• The coverage should not change during hypothesis testing.

• Require a minimum sample size to ensure that the hypothesis test is valid.
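The three constraints can be combined in a generation loop. This is one possible arrangement, not necessarily the authors' exact procedure: random points are drawn in batches of the minimum sample size, each nonself point is either counted as covered or queued as a new variable-sized detector, and queued detectors are added only after the batch has been tested, so the coverage stays fixed during each test. All names, the unit-square feature space, and the Euclidean distance are assumptions.

```python
import math
import random
from statistics import NormalDist


def generate_until_covered(self_samples, r_self, target=0.99, alpha=0.05,
                           dim=2, max_batches=1000, seed=0):
    """Hypothetical sketch: detector generation with an integrated coverage test.

    self_samples must be non-empty. Returns (detectors, reached) where
    detectors is a list of (center, radius) and reached says whether the
    test concluded that coverage exceeds the target.
    """
    rng = random.Random(seed)
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    # Assumed validity rule for the normal approximation: n*p0 >= 5, n*(1-p0) >= 5.
    n = math.ceil(round(max(5 / target, 5 / (1 - target)), 9))
    detectors = []
    for _ in range(max_batches):
        covered, pending, sampled = 0, [], 0
        while sampled < n:
            x = tuple(rng.random() for _ in range(dim))
            d_self = min(math.dist(x, s) for s in self_samples)
            if d_self <= r_self:
                continue                    # self point: not a nonself sample
            sampled += 1                    # the candidate doubles as a test sample
            if any(math.dist(x, c) <= r for c, r in detectors):
                covered += 1
            else:
                pending.append((x, d_self - r_self))  # variable-sized detector
        z = (covered - n * target) / math.sqrt(n * target * (1 - target))
        detectors.extend(pending)           # coverage changes only between tests
        if z > z_alpha:
            return detectors, True          # evidence that coverage exceeds target
    return detectors, False
```

Deferring the new detectors keeps each test a fixed-sample test. If instead every uncovered point became a detector immediately and the counters were reset, every sample since the last reset would be covered, and the test would reduce to counting consecutive covered points.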


Experiments


Summary

• A new negative selection algorithm is designed.

• Estimation of detector coverage at a given confidence level is integrated into the detector generation algorithm.

• The same strategy extends to different data representations, distance measures, or detector generation mechanisms.