Estimating the detector coverage in a negative selection algorithm
Zhou Ji, St. Jude Children’s Research Hospital
Dipankar Dasgupta, The University of Memphis
GECCO 2005, 7/27/2005
Outline
• Quick review of negative selection algorithms
• Description of the algorithm (mainly the new strategy for dealing with detector coverage)
• Summary
Review of negative selection algorithms
• Biological metaphor: T cells and the thymus
• Major steps:
  1. Generate detector candidates randomly
  2. Eliminate those that recognize self samples
• Major elements of a negative selection algorithm:
  • Data/detector representation
  • Matching rule ***
  • Detector generation algorithm
[Figure: illustration of 2-D real-valued detectors]
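The two major steps above can be sketched in a few lines. This is a minimal illustration of the classic generate-and-eliminate scheme in the 2-D unit square, not the V-detector algorithm presented later. The function names, the Euclidean matching rule, and the rule that rejects a candidate whose hypersphere would overlap a self sample's hypersphere (distance ≤ self_radius + detector_radius) are all assumptions chosen for illustration.

```python
import math
import random


def generate_detectors(self_samples, num_detectors, self_radius, detector_radius, seed=0):
    """Hypothetical sketch of classic negative selection in [0,1]^2.

    A candidate is eliminated if it lies within self_radius + detector_radius
    of any self sample (assumed matching rule: Euclidean distance).
    """
    rng = random.Random(seed)
    detectors = []
    attempts = 0
    max_attempts = 100 * num_detectors  # guard against an unsatisfiable setup
    while len(detectors) < num_detectors and attempts < max_attempts:
        attempts += 1
        # Step 1: generate a detector candidate randomly.
        candidate = (rng.random(), rng.random())
        # Step 2: keep it only if it recognizes no self sample.
        if all(math.dist(candidate, s) > self_radius + detector_radius
               for s in self_samples):
            detectors.append(candidate)
    return detectors


def is_anomalous(point, detectors, detector_radius):
    """A point is flagged as nonself if any detector covers it."""
    return any(math.dist(point, d) <= detector_radius for d in detectors)
```

Here the number of detectors is fixed in advance, which is exactly the choice the rest of this talk replaces with a coverage-based stopping rule.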
Review of negative selection algorithms
• It is in fact a family of algorithms:
  • Different detector representations
  • Different detector generation mechanisms
  • The original generation mechanism is not always used in algorithms that are presented (and accepted) as negative selection algorithms.
• Main characteristics:
  • Representing the target concept using detectors in the negative (nonself) space
  • Learning from one-class (self-only) training data
• Detector coverage/number of detectors:
  • Coverage is the proportion of the nonself area that is covered by detectors.
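Coverage as defined above can be estimated by Monte Carlo sampling: draw uniform random points, discard those in the self region, and count how many of the rest fall inside some detector. The sketch assumes hypersphere detectors given as hypothetical (center, radius) pairs in the unit square and a self region made of hyperspheres of radius self_radius around the self samples; both are illustrative choices, not details from the slides.

```python
import math
import random


def estimate_coverage(detectors, self_samples, self_radius, n_points=10_000, seed=0):
    """Monte Carlo point estimate of detector coverage in [0,1]^2.

    detectors: list of (center, radius) pairs (assumed representation).
    Points inside the self region are skipped; coverage is the fraction of
    the remaining nonself points that lie inside at least one detector.
    """
    rng = random.Random(seed)
    nonself = covered = 0
    for _ in range(n_points):
        p = (rng.random(), rng.random())
        if any(math.dist(p, s) <= self_radius for s in self_samples):
            continue  # in the self region, so not part of the nonself area
        nonself += 1
        if any(math.dist(p, c) <= r for c, r in detectors):
            covered += 1
    return covered / nonself if nonself else float("nan")
```

For example, a single detector of radius 0.5 centered in the square, with no self samples, should give an estimate close to π/4 ≈ 0.785.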
Main idea in this paper
• Stop generating detectors when the coverage is "enough", instead of using a pre-chosen number of detectors.
• Some earlier works used similar statistical tools to estimate the necessary number of detectors in advance. This is a fundamentally different approach, even though the two look similar because they share the same mathematics.
How to deal with detector coverage?
Different possible approaches:
• Decide the necessary number of detectors before generation.
• Generate detectors until there are enough ***. (The real concern is the coverage, not the number.)
• Estimate the coverage afterwards, e.g. from the actual detection rate.
Goal from the statistical point of view
• Estimate the population parameter (coverage) by a sample statistic (the proportion of sampled nonself points that are covered, i.e. an estimate of the probability of being covered)
• Point estimate versus confidence interval
• Two types of statistical inference: estimation versus hypothesis testing
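The difference between a point estimate and a confidence interval for the sampled proportion can be shown with the normal-approximation (Wald) interval. The function name and the choice of the Wald interval are assumptions for illustration; other interval constructions exist and behave better near 0 or 1.

```python
import math
from statistics import NormalDist


def coverage_interval(covered, n, confidence=0.95):
    """Point estimate and Wald confidence interval for a covered proportion."""
    p_hat = covered / n  # point estimate of the coverage
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    # Clip to [0, 1] because a proportion cannot leave that range.
    return p_hat, (max(0.0, p_hat - half_width), min(1.0, p_hat + half_width))
```

With 950 of 1000 sampled nonself points covered, the point estimate is 0.95 and the 95% interval is roughly (0.936, 0.964).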
Statistical basics used in this method
• Central limit theorem:
  • The sample mean (here, the sample proportion) approximately follows a normal distribution.
• Hypothesis testing:
  • Test a hypothesis (e.g. "the coverage is enough") instead of estimating the value of a parameter (e.g. the coverage).
  • Null hypothesis: assumed true unless the evidence shows otherwise.
  • Type I and type II errors have different costs:
    • Type I: falsely rejecting the null hypothesis (the more costly error)
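A one-sided z-test for a proportion makes these ideas concrete. The sketch assumes the null hypothesis is that the coverage has not yet exceeded the target, so that a type I error means stopping detector generation too early; the function name and the default significance level are illustrative, not taken from the slides.

```python
import math
from statistics import NormalDist


def coverage_is_enough(covered, n, target, alpha=0.05):
    """One-sided z-test for a proportion (illustrative sketch).

    H0: true coverage <= target (keep generating detectors).
    H1: true coverage > target (stop).
    Returns (reject_h0, z). Relies on the normal approximation, so n should
    satisfy the rule of thumb n*target >= 5 and n*(1 - target) >= 5.
    """
    z = (covered - n * target) / math.sqrt(n * target * (1 - target))
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value
    return z > z_alpha, z
```

With a target of 0.9 and 200 sampled points, 190 covered points give z ≈ 2.36, above the critical value 1.645 at alpha = 0.05, so H0 is rejected; 185 covered points give z ≈ 1.18, so generation would continue.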
Algorithm (V-detector)
Control parameters involved
• Self threshold
  • What do we know about the training data, and how much?
• Significance level for hypothesis testing
• Target coverage
Issue of integration
• Re-use the same random points for both detector generation and hypothesis testing.
• The coverage should not change during hypothesis testing.
• A minimum sample size is required to ensure that the hypothesis test is valid.
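The control parameters and integration issues above can be combined into one generation loop. This is a simplified sketch, not the authors' V-detector pseudocode (which appeared only as figures on the algorithm slides). Its assumptions are: detectors are variable-radius hyperspheres whose radius is the distance to the nearest self sample minus the self threshold; nonself points are drawn in batches of the minimum sample size while the detector set is frozen; uncovered points from a batch that fails the test are re-used as new detectors; and the null hypothesis is "coverage ≤ target". All names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def v_detector(self_samples, self_radius, target=0.99, alpha=0.05,
               max_detectors=1000, max_samples=1_000_000, seed=0):
    """Simplified sketch: variable-radius detectors in [0,1]^2 with a
    coverage test. self_samples must be non-empty. Returns (center, radius) pairs."""
    rng = random.Random(seed)
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    # Rule-of-thumb minimum sample size for the normal approximation:
    # m * target >= 5 and m * (1 - target) >= 5.
    m = math.ceil(max(5 / target, 5 / (1 - target)))
    detectors = []
    drawn = 0  # total random points drawn; guards against endless sampling
    while len(detectors) < max_detectors and drawn < max_samples:
        covered, uncovered = 0, []
        # Collect m nonself points; the detector set is frozen meanwhile,
        # so the coverage being tested does not change.
        while covered + len(uncovered) < m and drawn < max_samples:
            drawn += 1
            x = (rng.random(), rng.random())
            d_self = min(math.dist(x, s) for s in self_samples)
            if d_self <= self_radius:
                continue  # inside the self region, so not a nonself sample
            if any(math.dist(x, c) <= r for c, r in detectors):
                covered += 1
            else:
                # Variable radius: distance to nearest self sample minus self threshold.
                uncovered.append((x, d_self - self_radius))
        n = covered + len(uncovered)
        if n < m:
            break  # sampling budget exhausted before a full test sample
        z = (covered - n * target) / math.sqrt(n * target * (1 - target))
        if z > z_alpha:
            break  # reject H0 "coverage <= target": coverage is enough
        # Test failed: re-use the uncovered sample points as new detectors,
        # skipping any that a detector added in this loop already covers.
        for x, radius in uncovered:
            if len(detectors) >= max_detectors:
                break
            if not any(math.dist(x, c) <= r for c, r in detectors):
                detectors.append((x, radius))
    return detectors
```

Freezing the detector set within each batch is one way to keep the coverage fixed during a test while still re-using the sampled points; the number of detectors is an output of the loop rather than an input.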
Experiments
Summary
• A new negative selection algorithm is designed.
• Estimation of detector coverage with a specified confidence level is integrated into the detector generation algorithm.
• The same strategy extends to different data representations, distance measures, and detector generation mechanisms.