Evolutionary Computation
Genetic Algorithms
Genetic Programming
Learning Classifier Systems
Genetic Algorithms
• Population-based technique for discovery of knowledge structures
• Based on idea that evolution represents search for optimum solution set
• Massively parallel
The Vocabulary of GAs
• Population – Set of individuals, each represented by one or more strings of characters
• Chromosome – The string representing an individual
The Vocabulary of GAs, contd.
• Gene – The basic informational unit on a chromosome
• Allele – The value of a specific gene
• Locus – The ordinal place on a chromosome where a specific gene is found
Thus...
Chromosome: 011010
The gene at locus 5 has the allele "0"
Genetic operators
• Reproduction – Increase representation of strong individuals
• Crossover – Explore the search space
• Mutation – Recapture "lost" genes due to crossover
Genetic operators illustrated...

Simple reproduction
Parent 1: 011010 → Offspring 1: 011010
Parent 2: 000110 → Offspring 2: 000110

Reproduction with crossover at locus 3
Parent 1: 011010 → Offspring 1: 011110
Parent 2: 000110 → Offspring 2: 000010

Simple reproduction with mutation at locus 3 for offspring 1
Parent 1: 011010 → Offspring 1: 010010
Parent 2: 000110 → Offspring 2: 000110
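The three operators above can be sketched on bit-string chromosomes. This is a minimal illustration, not the deck's implementation; the locus is 1-indexed, matching the examples.

```python
# Minimal sketch of the three genetic operators on bit strings.

def reproduce(parent):
    """Simple reproduction: copy the chromosome unchanged."""
    return parent

def crossover(p1, p2, locus):
    """Single-point crossover: swap the tails after the given locus."""
    return p1[:locus] + p2[locus:], p2[:locus] + p1[locus:]

def mutate(chromosome, locus):
    """Flip the bit at the given 1-indexed locus."""
    i = locus - 1
    flipped = "1" if chromosome[i] == "0" else "0"
    return chromosome[:i] + flipped + chromosome[i + 1:]

o1, o2 = crossover("011010", "000110", 3)   # -> "011110", "000010"
m1 = mutate("011010", 3)                    # -> "010010"
```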
GAs rely on the concept of “fitness”
• Ability of an individual to survive into the next generation
• “Survival of the fittest”
• Usually calculated in terms of an objective fitness function
 – Maximization
 – Minimization
 – Other functions
Genetic Programming
• Based on adaptation and evolution
• Structures undergoing adaptation are computer programs of varying size and shape
• Computer programs are genetically “bred” over time
The Learning Classifier System
• Rule-based knowledge discovery and concept learning tool
• Operates by means of evaluation, credit assignment, and discovery applied to a population of “chromosomes” (rules) each with a corresponding “phenotype” (outcome)
Components of a Learning Classifier System
• Performance– Provides interaction between environment and rule base
– Performs matching function
• Reinforcement– Rewards accurate classifiers
– Punishes inaccurate classifiers
• Discovery– Uses the genetic algorithm to search for plausible rules
The Learning Classifier System
• Rule-based knowledge discovery and concept learning tool
• EpiCS
 – First Learning Classifier System designed for use in epidemiologic surveillance
 – Supervised learning environment
Knowledge Representation
• Classifiers – IF-THEN rules
 • Condition = "genotype"
 • Action = "phenotype"
 – Strength metric
 – Encoded as bit strings or numerics
• Population – Fixed-size collection of classifiers
Low-level knowledge representation: The Classifier

Taxon: 0111*00011*111
Action Bit: 0
Strength: 34.9
• Taxon is analogous to a condition (LHS) of an IF-THEN rule
• Action bit is analogous to an action (RHS) of an IF-THEN rule
• Strength is an internal fitness function
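Matching a taxon against an input string can be sketched as follows; `matches` is a hypothetical helper, with `*` as the don't-care symbol.

```python
# Minimal sketch of taxon matching; '*' is the don't-care symbol.

def matches(taxon, case):
    """True if every non-* position of the taxon agrees with the case."""
    return len(taxon) == len(case) and all(t in ("*", c) for t, c in zip(taxon, case))

# The classifier above, split into its three parts:
taxon, action_bit, strength = "0111*00011*111", 0, 34.9
```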
High-level knowledge representation: Macrostate Population
Components of a learning classifier system
• Performance – Provides interaction between environment and classifier population
 – Performs matching function
• Reinforcement – Rewards accurate classifiers
 – Punishes inaccurate classifiers
• Discovery – Uses the genetic algorithm to search for plausible knowledge structures
Generic Machine Learning Model
A Generic Learning Classifier System
[Figure: Input and Output connected through a Performance component, a Classifier population, a Reinforcement component, and a Discovery component.]
EpiCS: A Learning Classifier System
[Figure: EpiCS architecture. Detectors encode input from the Environment (e.g., 10010). The Population [P] (e.g., 01001:1, 10010:0, 1*010:1, ..., **110:1, 1*001:0) yields a Match Set [M] (10**0:1, 1**1*:1, 1001*:1, ***10:0, 100*0:0), which splits into a Correct Set [C] (10**0:1, 1**1*:1, 1001*:1, with strengths 0.560, 0.334, 0.871) and a Not-correct Set Not[C] (***10:0, 100*0:0, with strengths 0.560, 0.334). The Effector emits the decision (=1). The Performance Component performs matching; the Reinforcement Component applies the reinforcement/penalty regime; the Discovery Component applies covering and the genetic algorithm.]
EpiCS: Performance Component
Performance component
• Creates a subset (the match set, [M]) of all classifiers in population [P] whose conditions match a string received from the environment
• From [M], a single classifier is selected, based on its strength as a proportion of the sum of all strengths in [M]
• The action of this classifier is then used as the output of the system
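A minimal sketch of this cycle, assuming classifiers are stored as dictionaries (the field names are illustrative, not EpiCS's actual code):

```python
import random

# Sketch of the performance component: build [M], then select one
# classifier with probability proportional to its share of strength.

def build_match_set(population, case):
    """[M]: classifiers whose taxon matches the input string."""
    def matches(taxon):
        return all(t in ("*", c) for t, c in zip(taxon, case))
    return [cl for cl in population if matches(cl["taxon"])]

def select_classifier(match_set, rng=random.random):
    """Roulette-wheel selection: probability proportional to strength."""
    total = sum(cl["strength"] for cl in match_set)
    spin, running = rng() * total, 0.0
    for cl in match_set:
        running += cl["strength"]
        if running >= spin:
            return cl
    return match_set[-1]

population = [
    {"taxon": "10**0", "action": 1, "strength": 0.560},
    {"taxon": "1**1*", "action": 1, "strength": 0.334},
    {"taxon": "1001*", "action": 1, "strength": 0.871},
    {"taxon": "***10", "action": 0, "strength": 0.560},
    {"taxon": "01001", "action": 1, "strength": 0.250},
]
m = build_match_set(population, "10010")   # first four classifiers match
decision = select_classifier(m)["action"]  # the system's output
```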
EpiCS: Reinforcement Component
Reinforcement component
• Correct set [C] is created from classifiers in [M] advocating correct decisions
• Remaining classifiers in [M] form Not[C]
• Tax is deducted from the strengths of all classifiers in [C]
• Reward is added to the strengths of all classifiers in [C], biased for generality
• Penalty is deducted from the strengths of all classifiers in Not[C]
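The tax/reward/penalty step can be sketched as below; TAX, REWARD, PENALTY, and the generality bias are illustrative values, not EpiCS's actual parameters.

```python
# Sketch of the reinforcement step on a match set of classifier dicts.
TAX, REWARD, PENALTY = 0.01, 1.0, 0.5   # illustrative parameters

def reinforce(match_set, correct_action):
    correct = [cl for cl in match_set if cl["action"] == correct_action]
    not_correct = [cl for cl in match_set if cl["action"] != correct_action]
    for cl in correct:
        cl["strength"] *= (1 - TAX)                      # tax [C]
        generality = cl["taxon"].count("*") / len(cl["taxon"])
        cl["strength"] += REWARD * (1 + generality)      # reward [C], biased for generality
    for cl in not_correct:
        cl["strength"] *= (1 - PENALTY)                  # penalize Not[C]
    return correct, not_correct

m = [{"taxon": "1001*", "action": 1, "strength": 0.871},
     {"taxon": "***10", "action": 0, "strength": 0.560}]
c, nc = reinforce(m, correct_action=1)
```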
EpiCS: Discovery Component
Discovery component
• Genetic algorithm invoked once per iteration
• One new offspring is created, from parents deterministically selected based on strength
• The single offspring replaces weakest classifier in the population
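A minimal sketch of this step, under the assumption of single-point crossover between the two strongest classifiers (the crossover choice and data structure are illustrative):

```python
import random

# Sketch of the discovery step: one GA invocation, two parents selected
# deterministically by strength, one offspring replacing the weakest.

def discover(population, rng=random.Random(0)):
    parents = sorted(population, key=lambda cl: cl["strength"], reverse=True)[:2]
    point = rng.randrange(1, len(parents[0]["taxon"]))   # crossover point
    child = {
        "taxon": parents[0]["taxon"][:point] + parents[1]["taxon"][point:],
        "action": parents[0]["action"],
        "strength": (parents[0]["strength"] + parents[1]["strength"]) / 2,
    }
    weakest = min(range(len(population)), key=lambda i: population[i]["strength"])
    population[weakest] = child                          # replace weakest
    return child

pop = [{"taxon": "1001*", "action": 1, "strength": 0.871},
       {"taxon": "10**0", "action": 1, "strength": 0.560},
       {"taxon": "***10", "action": 0, "strength": 0.120}]
child = discover(pop)   # replaces the classifier with strength 0.120
```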
Features of EpiCS
• Object-oriented implementation
• Stimulus-response architecture
• Payoff/Penalty reinforcement regime
• Syntactic control of overgeneralization
• Differential penalty control of undergeneralization
• Ability to compute risk of outcome
Discovering risk with EpiCS
• Output decision of the learning classifier system is probability of disease (CSPD), rather than dichotomous decision
• CSPD determined from proportion of classifiers matching a given input case’s taxon
Discovering risk with EpiCS: The specifics

CSPD = Σ (probabilities of classifiers associated with disease) / Σ (probabilities of all classifiers with matching taxa)
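This proportion can be sketched directly over a match set; the dictionary fields and values are illustrative.

```python
# Sketch of the CSPD calculation: the share of matching-classifier
# probability held by disease-associated (action = 1) classifiers.

def cspd(matching_classifiers):
    total = sum(cl["prob"] for cl in matching_classifiers)
    disease = sum(cl["prob"] for cl in matching_classifiers if cl["action"] == 1)
    return disease / total

match_set = [{"taxon": "1001*", "action": 1, "prob": 0.871},
             {"taxon": "10**0", "action": 1, "prob": 0.560},
             {"taxon": "***10", "action": 0, "prob": 0.560}]
risk = cspd(match_set)   # continuous risk estimate, not a dichotomous decision
```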
Discovery of Predictive Models in an Injury Surveillance Database:
An Application of Data Mining in Clinical Research
Partners for Child Passenger Safety: Information Infrastructure
• State Farm Insurance Companies
• CHOP / University of PA
• Dynamic Science, Inc.
• Response Analysis Corporation
Why data mining is needed for PCPS
• Large number of raw and derived variables renders traditional "manual" methods for discovering patterns in data unwieldy
• Hypothesis-driven (biased) analyses may lead to missed associations
• Constantly changing patterns in prospective data require constantly changing analytic approaches that can be informed by data mining
Candidate Predictors
• Demographics
• Kinematics
• Characteristics of crash
• Restraint use
Outcome: Head Injury
• Major burns involving the head
• Skull fracture
• Evidence of brain injury reported by respondent
 – Excessive sleepiness
 – Difficulty in arousing
 – Unresponsiveness
 – Amnesia after accident
Data Preparation
• Pool of 8,334 records
• 20 separate datasets created
 – All cases of head injury included (N=415)
 – Equal number of non-head injury cases randomly drawn from pool
• Each dataset randomly sampled to create mutually exclusive training and testing sets of equal size
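The sampling scheme above can be sketched as follows, assuming records are (features, head_injury) pairs; the function name and record layout are illustrative.

```python
import random

# Sketch of balanced dataset creation: all positives, an equal-sized
# random draw of negatives, then a random split into equal halves.

def make_balanced_datasets(records, n_datasets=20, seed=0):
    rng = random.Random(seed)
    positives = [r for r in records if r[1] == 1]
    negatives = [r for r in records if r[1] == 0]
    datasets = []
    for _ in range(n_datasets):
        combined = positives + rng.sample(negatives, len(positives))
        rng.shuffle(combined)
        half = len(combined) // 2
        # mutually exclusive training and testing sets of equal size
        datasets.append((combined[:half], combined[half:]))
    return datasets
```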
Comparison Methods: Logistic Regression
• Variables from training sets stepped into model to determine significant terms
• Significant terms used to create new risk model:

P̂(y) = 1 / (1 + e^−(β0 + β1x1 + … + βnxn))

• Risk model applied to cases in testing set
• Risk estimates categorized by deciles and used to construct ROC curves
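Applying such a risk model and bucketing the estimates into deciles can be sketched as below; the coefficients and the `decile` helper are illustrative, not the study's fitted model.

```python
import math

# Sketch: logistic risk estimate and decile bucketing for ROC construction.

def logistic_risk(intercept, coefs, xs):
    """P(y) = 1 / (1 + exp(-(b0 + b1*x1 + ... + bn*xn)))"""
    z = intercept + sum(b * x for b, x in zip(coefs, xs))
    return 1.0 / (1.0 + math.exp(-z))

def decile(p):
    """Bucket a risk estimate in [0, 1] into deciles 1..10."""
    return min(int(p * 10) + 1, 10)
```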
Comparison Methods: Decision Tree Induction
• C4.5 used to create decision trees from training sets
• 10-fold cross-validation used to optimize trees
• Optimized trees used by C4.5RULES to classify cases in testing set
Experimental Procedure
A trial consists of a training epoch followed by a testing epoch:

Training Epoch
    for a = 1 to maximum number of training epochs
        for x = 1 to 100
            present randomly selected training case
        Interim Evaluation Phase:
            for x = 1 to number of training cases
                evaluate training case x

Testing Epoch (genetic algorithm inactive)
    for x = 1 to number of testing cases
        evaluate testing case x
Results: Training
[Figure: AUC and indeterminate rate (y-axis, 0 to 1) plotted against training iterations (0 to 10,000).]
Results: Training
• EpiCS – 5,000 unique classifiers reduced to 2,314 by the end of training
• Logistic regression – Single model with eight significant terms, no significant interactions
• C4.5 – 11 rules created for each training set, most with single conjuncts
Results: Prediction
[Table: Area under the ROC curve obtained on testing, averaged over the 20 separate studies.]
And now for something a little different
The XCS model
XCS: A little history
• Wilson, SW: Evolutionary Computation, 2(1), 1-18 (1994) – ZCS
• Wilson, SW: Evolutionary Computation, 3(2), 149-175 (1995) – The seminal work on XCS
• Many papers by Lanzi, Barry, Butz, and others
• Butz, M and Wilson, SW: Advances in Learning Classifier Systems. Third International Workshop (IWLCS-2000), Lecture Notes in Artificial Intelligence (LNAI 1996). Berlin: Springer-Verlag (2001) – The algorithm paper
What is XCS?
• An LCS that differs from the traditional Holland model
 – Classifier fitness is based on the accuracy of the classifier's payoff prediction, rather than the prediction itself
 – The genetic algorithm is restricted to niches in the action set, rather than applied to the classifier population as a whole
• The major feature is graceful, accurate generalization
XCS in a nutshell
[Figure: XCS schematic; the system prediction for each action (e.g., Action 00, Action 01) is a fitness-weighted average of classifier predictions, e.g., ((43×99) + (27×3))/102. Source: Wilson, XCS tutorial]
EpiXCS: An XCS-Based Learning Classifier System for Epidemiologic Research
Outline
• What is it?
• EpiXCS architecture
 – Data encoding
 – Evaluation metrics
 – Reinforcement
 – Missing values handling
 – Classifier ranking
 – Risk assessment
• Test case: Pima Indians Diabetes Data
What is EpiXCS?
• Learning classifier system based on the XCS paradigm – Uses the Lanzi C++ kernel
• Designed for use in epidemiologic research, specifically mining disease surveillance databases in supervised learning environments– Visualization by non-LCS users– Sensitive to demands of clinical data
Data Encoding in EpiXCS
• All numeric data formats permissible
 – Binary
 – Categorical
 – Ordinal
 – Real
• Non-binary data represented using "center-spread" approach
 – Two genes per feature
• Actions are limited to binary (for now)
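The center-spread representation can be sketched as intervals; here a gene is a (center, spread) pair matching any value in [center − spread, center + spread], with `None` standing in for the "#" wildcard. The helpers are illustrative, not the Lanzi kernel's code.

```python
# Sketch of center-spread interval matching for non-binary features.

def gene_matches(gene, value):
    if gene is None:                      # wildcard '#'
        return True
    center, spread = gene
    return center - spread <= value <= center + spread

def classifier_matches(genes, case):
    return all(gene_matches(g, v) for g, v in zip(genes, case))

# e.g., "number of times pregnant is 7.0 ± 7.0" covers 0..14
preg = (7.0, 7.0)
```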
Sample input data format(Pima Indians Diabetes Database)
ATTRIBUTE 0 <WILD "99"><REAL><STRING "Clump Thickness">ATTRIBUTE 1 <WILD "99"><REAL><STRING "Uniformity of Cell Size">ATTRIBUTE 2 <WILD "99"><REAL><STRING "Uniformity of Cell Shape">ATTRIBUTE 3 <WILD "99"><REAL><STRING "Marginal Adhesion">ATTRIBUTE 4 <WILD "99"><REAL><STRING "Single Epithelial Cell Size">ATTRIBUTE 5 <WILD "99"><REAL><STRING "Bare Nuclei">ATTRIBUTE 6 <WILD "99"><REAL><STRING "Bland Chromatin">ATTRIBUTE 7 <WILD "99"><REAL><STRING "Normal Nucleoli">ATTRIBUTE 8 <WILD "99"><REAL><STRING "Mitoses">ACTION 9 <STRING "Malignant">5 4 4 5 7 10 3 2 1 03 1 1 1 2 2 3 1 1 08 10 10 8 7 10 9 7 1 1…
Classifier Population Initialization
• Minima and maxima for each attribute determined automatically at start of run
• Center values can be initialized by user – Mean– Median– Random value between spread
• Spread values can be initialized by user– Standard deviation– Quantile
Sample Macroclassifiers
/5.5,5.5/107.5,51.5/64.0,21.0/#/316.0,160.0/16.55,16.55/#/#/:1
/3.0,3.0/#/91.5,30.5/#/#/#/#/26.0,5.0/:1
/2.5,2.5/119.5,63.5/#/#/#/#/#/#/:1
/2.5,2.5/107.5,51.5/64.0,21.0/#/#/#/#/49.0,28.0/:1
/2.5,2.5/#/66.0,22.0/#/317.5,301.5/#/1.0040,0.9260/49.0,28.0/:1
/2.5,2.5/107.5,51.5/64.0,21.0/#/#/#/1.0735,0.9955/#/:1
Evaluation Metrics
• Sensitivity
• Specificity
• Area under the ROC curve
• Predictive values
• Accuracy
• Learning rate
A Fast Primer on Test Evaluation
Sensitivity
• Probability of a positive test in a known positive: P(T+ | D+)
• If it's high, then one would want to use the test to diagnose (classify positive)
• If a classifier's Se is high, then that classifier should be more likely to be used in defining a Correct Set when a training case is known positive
Specificity
• Probability of a negative test in a known negative: P(T− | D−)
• If it's high, then one would want to use the test to rule out (classify negative)
• If a classifier's Sp is high, then that classifier should be more likely to be used in defining a Correct Set when a training case is known negative
The Predictive Values
• Posterior probability of a test-positive or test-negative
• If the PPV is high, then once one has the test result in hand, and it predicts positive, it would be considered to be accurate
• If the NPV is high, then once one has the test result in hand, and it predicts negative, it would be considered to be accurate
How these metrics are used in EpiXCS
• To evaluate classification performance
 – Training
  • Se, Sp, AUC, Accuracy, and Indeterminate Rate are plotted every 100th iteration
 – Testing
  • Se, Sp, AUC, Accuracy, and Indeterminate Rate are obtained for the testing set
How these metrics are used in EpiXCS
• To evaluate learning and classification performance
 – Shoulder is the iteration at which 95% of the maximum AUC obtained during training is first attained
 – AUCShoulder is the AUC obtained at the shoulder
 – Learning rate = AUCShoulder / (Shoulder / 1000)
Reinforcement in EpiXCS
• Done the usual way, but…
• User can bias the reward depending on the class distribution
 – Give disproportionately less "negative" reward to False Negative classifiers in data with <50% positives (where the Se is low)
 – Give disproportionately less "negative" reward to False Positive classifiers in data with <50% negatives (where the Sp is low)
Missing Values Handling during Covering
• Four possible ways to cover missing data in a non-matching input σ that needs to be covered
 – Wild-to-wild: missing attributes covered as #s
 – Random within range: random value within the range for the attribute
 – Population average: population average for the attribute
 – Population standard deviation: random value within the standard deviation for the attribute
Classifier Ranking
• After training, classifiers ranked according to their predictive values– Classifiers predicting positive ranked by PPV– Classifiers predicting negative ranked by NPV
• Classifier ranking used for rule visualization
Risk Assessment
• Based on risk assessment module used in EpiCS
• Risk estimates determined on testing based on proportional prevalence in match sets for each testing case
• Provides risk assessment analogous to that obtained by logistic regression
Test case: Pima Indians Diabetes Data
• 768 cases– 268 positive, 500 negative
• 8 attributes
 – Gravidity
 – Plasma glucose
 – Diastolic blood pressure
 – Skin-fold thickness
 – Serum insulin
 – Body mass index
 – Pedigree function
 – Age
 – Class: Diabetes Yes/No
Experimental procedure
• Training and testing sets created
 – 134 positives / 250 negatives in each
• EpiXCS
 – Results averaged over 20 runs, 50,000 iterations each
• See5
 – Boosting at 10 trials
 – 10-fold cross-validation
• Logistic regression
 – Relaxed stepwise model built on training set and evaluated on testing set
Rules: EpiXCS

If number of times pregnant is 7.0 ± 7.0
 and plasma glucose concentration after 2 hours is 67.5 ± 11.5
 and triceps skinfold thickness is 33.0 ± 26.0
 and 2-hour serum insulin is 326.5 ± 66.5
 and age is 48.0 ± 27.0
Then not diabetes

If number of times pregnant is 7.5 ± 7.5
 and triceps skinfold thickness is 35.5 ± 24.5
 and 2-hour serum insulin is 811.5 ± 34.5
 and body mass index is 48.9 ± 6.1
 and pedigree function is 0.97 ± 0.89
Then diabetes
Rules: See5

Rule 9/1: (20.5, lift 1.8)
 pedigree <= 0.179
 age <= 34
 -> class 0 [0.955]

Rule 9/2: (62.1/2.6, lift 1.8)
 plasmaglu <= 103
 pedigree <= 0.787
 -> class 0 [0.944]

Rule 9/3: (9.3, lift 1.7)
 serumins <= 156
 bmi <= 35.3
 age > 34
 age <= 37
 -> class 0 [0.912]

Rule 9/9: (12, lift 2.0)
 plasmaglu > 135
 serumins <= 185
 bmi > 33.7
 pedigree <= 1.096
 age > 37
 -> class 1 [0.928]

Rule 9/10: (37.1/2.5, lift 2.0)
 plasmaglu > 103
 bmi > 35.3
 pedigree <= 1.096
 age > 34
 -> class 1 [0.909]
The logistic model
Risk of diabetes =
 1.34
 + 0.19 × Gravidity
 + 0.04 × Post-prandial glucose
 − 0.01 × Diastolic blood pressure
 + 0.01 × Skinfold thickness
 − 0.01 × Serum insulin
 + 0.05 × Body mass index
 + 0.72 × Pedigree function
Classification accuracy on testing
Conclusions
• EpiXCS incorporates features of EpiCS into the XCS paradigm
• Facilitates analysis of epidemiologic data
• Uses metrics understood by clinical researchers
• Discovers knowledge comparably to See5 and logistic regression