Estimating the detector coverage in a negative selection algorithm

Zhou Ji, St. Jude Children’s Research Hospital

Dipankar Dasgupta, The University of Memphis

GECCO 2005 7/27/2005

Outline

• Quick review of negative selection algorithm

• Description of the algorithm (mainly the new strategy to deal with detector coverage)

• Summary

Review of negative selection algorithms

• Biological metaphor: T cells and thymus

• Major steps:

  1. Generate detector candidates randomly

  2. Eliminate those that recognize self samples

• Major elements of a negative selection algorithm

  • Data/detector representation

  • Matching rule ***

  • Detector generation algorithm

Figure: illustration of 2-D real-valued detectors
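The two major steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation: it assumes points in the unit square, a Euclidean-distance matching rule, and a single fixed `radius` shared by self samples and detectors. The names `generate_detectors` and `is_nonself` are hypothetical.

```python
import math
import random

def generate_detectors(self_samples, n_detectors, radius, rng, max_tries=100_000):
    """Basic negative selection in the unit square (illustrative assumptions only)."""
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        # Step 1: generate a detector candidate randomly.
        candidate = (rng.random(), rng.random())
        # Step 2: eliminate the candidate if it recognizes any self sample.
        if any(math.dist(candidate, s) <= radius for s in self_samples):
            continue
        detectors.append(candidate)
    return detectors

def is_nonself(point, detectors, radius):
    """A point is flagged as nonself when at least one detector matches it."""
    return any(math.dist(point, d) <= radius for d in detectors)
```

Note that the number of detectors is fixed in advance here; this is the pre-chosen-number strategy that the rest of the talk argues against.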

Review of negative selection algorithms

• It is in fact a family of algorithms

  • Different detector representations

  • Different detector generation mechanisms

  • The original generation mechanism is not always used in algorithms that are claimed (and accepted) to be negative selection algorithms.

• Main characteristics:

  • Representing the target concept using detectors in negative space

  • Learning from one-class training data

• Detector coverage/number of detectors

  • Coverage: the proportion of nonself area that is covered by detectors

Main idea in this paper

• Stop generating detectors when the coverage is “enough” instead of using a pre-chosen number of detectors

• Some earlier works used similar statistical tools to estimate the necessary number of detectors. That is a totally different approach from the one here, though the two appear similar because of the similarity in mathematics.

How to deal with detector coverage?

Different possible approaches:

• Decide necessary number before generation.

• Generate enough detectors *** (The real concern is the coverage, not the number.)

• Estimate afterwards, e.g. from the actual detection rate.

Goal from the statistical point of view

• Estimate the parameter (coverage) from a sample statistic (the proportion of sampled points that are covered, which estimates the probability that a point is covered)

• Point estimate versus confidence interval

• Two types of statistical inference: estimation versus hypothesis testing
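To make the difference between a point estimate and a confidence interval concrete, here is a small Monte Carlo sketch. It is an illustration under stated assumptions, not part of the talk: points are drawn uniformly from the unit square, the whole square is treated as nonself for simplicity, and `is_covered` is a hypothetical predicate that says whether some detector matches a point. The interval is the usual normal-approximation interval for a proportion.

```python
import math
import random
from statistics import NormalDist

def estimate_coverage(is_covered, n_samples, rng, confidence=0.95):
    """Return (point estimate, confidence interval) for the covered proportion."""
    # Sample statistic: fraction of random points that are covered.
    hits = sum(is_covered((rng.random(), rng.random())) for _ in range(n_samples))
    p_hat = hits / n_samples
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n_samples)
    return p_hat, (max(0.0, p_hat - half_width), min(1.0, p_hat + half_width))

# Toy usage: pretend everything left of x = 0.8 is covered (true coverage 0.8).
p_hat, (low, high) = estimate_coverage(lambda pt: pt[0] < 0.8, 2000, random.Random(1))
```

The point estimate is a single number, while the interval also conveys how much the estimate could vary with the sample.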

Statistical basics used in this method

• Central limit theorem

  • The sample mean approximately follows a normal distribution

• Hypothesis testing

  • Testing a hypothesis (e.g. the coverage is enough) instead of estimating the value of a parameter (e.g. coverage)

  • Null hypothesis: assumed true unless evidence shows otherwise

  • Type I error and type II error: different costs

    • Type I: falsely rejecting the null hypothesis; more costly
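A one-sided z-test for a proportion is one way to turn these basics into a decision rule. The sketch below is an assumption-laden illustration rather than the exact test from the talk: it takes the null hypothesis to be "the coverage is not above the target", so that a type I error means declaring the coverage sufficient when it is not. The function name `coverage_is_enough` is hypothetical.

```python
import math
from statistics import NormalDist

def coverage_is_enough(covered, n, target_coverage, alpha):
    """One-sided z-test; H0 (assumed): true coverage <= target_coverage."""
    p0 = target_coverage
    # By the central limit theorem, the count of covered points is
    # approximately normal with mean n*p0 and variance n*p0*(1 - p0) under H0.
    z = (covered - n * p0) / math.sqrt(n * p0 * (1 - p0))
    # Reject H0 only if z exceeds the critical value for significance level alpha.
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return z > z_alpha
```

A smaller `alpha` makes the costly type I error rarer, at the price of generating more detectors before the test passes.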

Algorithm (V-detector)



Control parameters involved

• Self threshold

• What and how much do we know about the training data?

• Significance level for hypothesis testing

• Target coverage

Issue of integration

• Re-use the random points we get when doing hypothesis testing

• The coverage should not change during hypothesis testing

• Require a minimum sample size to ensure that the hypothesis testing is valid.
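The three integration points above can be combined into one generation loop. The sketch below is a rough reading of the slides, not the published V-detector code, and it makes several assumptions: 2-D points in the unit square, Euclidean distance, variable-radius detectors whose radius is the distance to the nearest self sample minus the self threshold, a one-sided z-test with null hypothesis "coverage is not above the target", and the common rule of thumb n·p > 5 and n·(1 − p) > 5 for the minimum sample size. All function and parameter names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def min_sample_size(p0):
    # Assumed rule of thumb for the normal approximation:
    # n * p0 > 5 and n * (1 - p0) > 5.
    return math.floor(5 / min(p0, 1 - p0)) + 1

def generate_until_covered(self_samples, self_radius, target, alpha, rng,
                           max_detectors=1000):
    """Generate detectors until a hypothesis test says the coverage is enough."""
    n = min_sample_size(target)
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    detectors = []  # list of (center, radius) pairs
    while len(detectors) < max_detectors:
        covered, drawn, pending = 0, 0, []
        while drawn < n:
            x = (rng.random(), rng.random())
            d_self = min(math.dist(x, s) for s in self_samples)
            if d_self <= self_radius:
                continue  # point lies in the self region, so it is not counted
            drawn += 1
            if any(math.dist(x, c) <= r for c, r in detectors):
                covered += 1
            else:
                # Re-use the uncovered point as a detector, but hold it back so
                # the coverage does not change while this test is running.
                pending.append((x, d_self - self_radius))
        z = (covered - n * target) / math.sqrt(n * target * (1 - target))
        if z > z_alpha:
            return detectors  # reject H0: coverage judged sufficient
        detectors.extend(pending)
    return detectors  # safety cap reached (may overshoot by up to n detectors)
```

Holding the `pending` detectors back until the round ends is one simple way to keep the coverage fixed during a test; the `max_detectors` cap is only a safeguard added for this sketch.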

Experiments

Summary

• A new negative selection algorithm is designed.

• Estimation of detector coverage with certain confidence is integrated with the detector generation algorithm.

• The same strategy is extensible to different data representations, distance measures, or detector generation mechanisms.
