Presentation Outline:
Feature Selection: categorize and describe various algorithms for feature selection
A short view on dimension reduction
My paper
Dimension (Feature or Variable)
Two features of a person: weight and height.
The curse of dimensionality
Observe that the data become more and more sparse in higher dimensions: (a) 12 samples fall inside the unit-sized box; (b) 7 samples in the box; (c) 2 samples in the box.
Dimensionality reduction
An effective solution to the curse of dimensionality is dimensionality reduction.
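The sparsity claim is easy to check numerically. A minimal sketch (not from the slides), assuming uniform samples in the unit cube and counting how many land in a sub-box of side 0.5:

```python
import numpy as np

def fraction_inside(d, n=1000, seed=0):
    """Fraction of n uniform samples from the unit cube [0, 1]^d
    that fall inside the sub-box [0, 0.5]^d (expected: 0.5 ** d)."""
    rng = np.random.default_rng(seed)
    points = rng.random((n, d))
    return np.all(points < 0.5, axis=1).mean()

# The same fixed-size box captures fewer and fewer samples as the
# dimension grows -- the data become sparse.
for d in (1, 2, 10):
    print(d, fraction_inside(d))
```

With d = 10, the expected fraction is already below 0.1%, which is the effect the figure on this slide illustrates.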
Dimension Reduction
General objectives of dimensionality reduction:
I. Improve the quality of the data for efficient data-intensive processing tasks
II. Reduce the computational cost and avoid over-fitting
Dimension Reduction
Feature Extraction: create new features based on transformations or combinations of the original feature set.
N: number of original features
M: number of extracted features
M < N
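A common feature-extraction transform is PCA; a minimal sketch (an illustration, not the paper's method), where the parameter `m` plays the role of M < N:

```python
import numpy as np

def pca_extract(X, m):
    """Feature extraction via PCA: project the N original features
    onto the m (< N) directions of largest variance."""
    Xc = X - X.mean(axis=0)                    # center each feature
    cov = np.cov(Xc, rowvar=False)             # N x N covariance matrix
    vals, vecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    top = vecs[:, np.argsort(vals)[::-1][:m]]  # m leading eigenvectors
    return Xc @ top                            # n_samples x m extracted features

X = np.random.default_rng(0).normal(size=(100, 5))  # toy data: N = 5
Z = pca_extract(X, 2)                               # extract M = 2 features
print(Z.shape)  # (100, 2)
```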
FOCUS
Feature selection method: finds the smallest subset of features consistent with the data; searches the subset tree breadth-first (BFS).
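A minimal sketch of a FOCUS-style breadth-first subset search, assuming "consistency" means no two samples agree on the selected features yet carry different labels (the `consistent` helper and the toy data are illustrative):

```python
from itertools import combinations

def consistent(X, y, subset):
    """A subset is consistent if no two samples agree on all of its
    features but carry different class labels."""
    seen = {}
    for row, label in zip(X, y):
        key = tuple(row[i] for i in subset)
        if seen.setdefault(key, label) != label:
            return False
    return True

def focus(X, y):
    """FOCUS-style search: try all subsets of size 1, 2, ... in
    breadth-first order and return the first (hence smallest)
    consistent one."""
    n = len(X[0])
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            if consistent(X, y, subset):
                return subset
    return tuple(range(n))

# Toy data (illustrative): feature 1 alone determines the label.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 0, 1]
print(focus(X, y))  # (1,)
```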
LVF (Las Vegas Filter)
Feature selection method: searches for a minimal subset of features.
N: number of features (attributes)
M: number of samples (examples)
Evaluation criterion: inconsistency
t_max: predetermined number of iterations
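A minimal sketch of LVF under these definitions; the `inconsistency` helper and the threshold of 0 are illustrative assumptions:

```python
import random

def inconsistency(X, y, subset):
    """Inconsistency count: for each group of samples that agree on
    the subset's features, count the samples outside the majority class."""
    groups = {}
    for row, label in zip(X, y):
        key = tuple(row[i] for i in subset)
        groups.setdefault(key, []).append(label)
    return sum(len(ls) - max(ls.count(c) for c in set(ls))
               for ls in groups.values())

def lvf(X, y, t_max=500, threshold=0, seed=0):
    """Las Vegas Filter: draw random feature subsets for t_max
    iterations, keeping the smallest one whose inconsistency stays
    within the threshold."""
    rng = random.Random(seed)
    n = len(X[0])
    best = list(range(n))  # start from the full feature set
    for _ in range(t_max):
        subset = [i for i in range(n) if rng.random() < 0.5]
        if subset and len(subset) <= len(best) \
                and inconsistency(X, y, subset) <= threshold:
            best = subset
    return best

# Toy data (illustrative): the label is fully determined by feature 1.
X = [(0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0)]
y = [0, 1, 0, 1]
print(lvf(X, y))
```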
GA (Genetic Algorithm)
Feature selection method: crossover and mutation.
SA (Simulated Annealing)
RMHC-PF1 (Random Mutation Hill Climbing - Prototype and Feature selection): finds sets of prototypes for nearest-neighbor classification; a Monte Carlo method that can be converted to a Las Vegas algorithm by running it many times.
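Random mutation hill climbing can be sketched as follows (an illustration; the toy `evaluate` objective stands in for the nearest-neighbor accuracy that RMHC-PF1 would actually use):

```python
import random

def rmhc(evaluate, n_features, iters=200, seed=0):
    """Random-mutation hill climbing over feature masks: flip one
    randomly chosen bit per step and keep the change only if the
    evaluation score does not get worse."""
    rng = random.Random(seed)
    mask = [rng.random() < 0.5 for _ in range(n_features)]
    score = evaluate(mask)
    for _ in range(iters):
        i = rng.randrange(n_features)
        mask[i] = not mask[i]            # random mutation: flip one bit
        new = evaluate(mask)
        if new >= score:
            score = new                  # keep the improvement
        else:
            mask[i] = not mask[i]        # undo the flip
    return mask, score

# Toy objective (illustrative): reward selecting features 0 and 2,
# penalize every extra feature.
target = {0, 2}
def evaluate(mask):
    chosen = {i for i, m in enumerate(mask) if m}
    return len(chosen & target) - len(chosen - target)

best_mask, best_score = rmhc(evaluate, n_features=6)
print(best_mask, best_score)
```

Each run is a Monte Carlo trial; restarting the search many times and keeping the best result is what turns it into a Las Vegas-style procedure, as the slide notes.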
Three models commonly used in feature selection:
Filter model ---> does not consider the interrelationships between features
Wrapper model ---> high complexity
Embedded methods
Remaining problems: feature redundancy; failure to select the appropriate number of features.
Defining the problem as a game
Problem as a One-Player Game
Defining the problem as a Markov Decision Process; the environment is explored with reinforcement learning methods.
A feature selection method that considers the interrelationships between features: the Upper Confidence Graph method.
The main algorithms: dynamic programming, the Monte Carlo method, and temporal difference learning.
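Temporal difference learning can be illustrated with a single TD(0) update (a generic sketch, not tied to the feature-selection setting; the states, reward, and step sizes are illustrative):

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One temporal-difference (TD(0)) update of the value table V:
    move V[s] toward the bootstrapped target r + gamma * V[s_next]."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V

V = {"a": 0.0, "b": 1.0}          # toy value table over two states
td0_update(V, "a", r=0.5, s_next="b")
print(V["a"])  # 0.1 * (0.5 + 0.9 * 1.0 - 0.0) = 0.14
```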
Policy: the best policy possible in the situation
Reward: the rewards that have already been achieved
States: subsets of features, drawn from the whole set of features
Actions: each allowed action
For each feature, track:
the average score collected by this feature
the number of times this feature has been selected
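These two quantities are the ingredients of a UCB1-style score per feature; a sketch assuming the standard UCB1 exploration bonus (the slide does not give the exact formula):

```python
import math

def ucb_score(avg_reward, n_selected, t, c=math.sqrt(2)):
    """UCB1-style score for a feature: its average reward so far plus
    an exploration bonus that shrinks the more often the feature has
    been selected (t = total number of selections so far)."""
    return avg_reward + c * math.sqrt(math.log(t) / n_selected)

# A rarely tried feature gets a larger bonus than a frequently tried
# one with the same average reward, so it is explored first.
print(ucb_score(0.5, n_selected=2, t=100))
print(ucb_score(0.5, n_selected=50, t=100))
```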
Benchmarks:
Information Gain
Chi-squared statistic
Feature Assessment by Sliding Threshold (FAST)
WEKA software
Any Questions? May 2013
Thanks for your attention