Presentation Outline:
Feature Selection: categorize and describe various algorithms for feature selection
A short view on dimension reduction
My paper
Dimension (Feature or Variable)
Two features of a person: weight and height.
The curse of dimensionality
Observe that the data become more and more sparse in higher dimensions: (a) 12 samples fall inside the unit-sized box; (b) 7 samples in the box; (c) 2 samples in the box.
Dimensionality reduction
An effective solution to the curse of dimensionality is dimensionality reduction.
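The sparsity claim is easy to check numerically. A minimal sketch (not from the slides), assuming uniform samples in the unit cube and counting how many land in a sub-box of side 0.5:

```python
import numpy as np

def fraction_inside(d, n=1000, seed=0):
    """Fraction of n uniform samples from the unit cube [0, 1]^d
    that fall inside the sub-box [0, 0.5]^d (expected: 0.5 ** d)."""
    rng = np.random.default_rng(seed)
    points = rng.random((n, d))
    return np.all(points < 0.5, axis=1).mean()

# The same fixed-size box captures fewer and fewer samples as the
# dimension grows -- the data become sparse.
for d in (1, 2, 10):
    print(d, fraction_inside(d))
```

With d = 10, the expected fraction is already below 0.1%, which is the effect the figure on this slide illustrates.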
Dimension Reduction
General objectives of dimensionality reduction:
I. Improve the quality of the data for efficient data-intensive processing tasks
II. Reduce the computational cost and avoid over-fitting
Dimension Reduction
Feature Extraction: create new features based on transformations or combinations of the original feature set.
N: number of original features
M: number of extracted features
M < N
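A common feature-extraction transform is PCA; a minimal sketch (an illustration, not the paper's method), where the parameter `m` plays the role of M < N:

```python
import numpy as np

def pca_extract(X, m):
    """Feature extraction via PCA: project the N original features
    onto the m (< N) directions of largest variance."""
    Xc = X - X.mean(axis=0)                    # center each feature
    cov = np.cov(Xc, rowvar=False)             # N x N covariance matrix
    vals, vecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    top = vecs[:, np.argsort(vals)[::-1][:m]]  # m leading eigenvectors
    return Xc @ top                            # n_samples x m extracted features

X = np.random.default_rng(0).normal(size=(100, 5))  # toy data: N = 5
Z = pca_extract(X, 2)                               # extract M = 2 features
print(Z.shape)  # (100, 2)
```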
FOCUS
Feature selection method: finds the smallest subset of features consistent with the data; searches the subset tree breadth-first (BFS).
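A minimal sketch of a FOCUS-style breadth-first subset search, assuming "consistency" means no two samples agree on the selected features yet carry different labels (the `consistent` helper and the toy data are illustrative):

```python
from itertools import combinations

def consistent(X, y, subset):
    """A subset is consistent if no two samples agree on all of its
    features but carry different class labels."""
    seen = {}
    for row, label in zip(X, y):
        key = tuple(row[i] for i in subset)
        if seen.setdefault(key, label) != label:
            return False
    return True

def focus(X, y):
    """FOCUS-style search: try all subsets of size 1, 2, ... in
    breadth-first order and return the first (hence smallest)
    consistent one."""
    n = len(X[0])
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            if consistent(X, y, subset):
                return subset
    return tuple(range(n))

# Toy data (illustrative): feature 1 alone determines the label.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 0, 1]
print(focus(X, y))  # (1,)
```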
LVF (Las Vegas Filter)
Feature selection method: searches for a minimal subset of features.
N: number of features (attributes)
M: number of samples (examples)
Evaluation criterion: inconsistency
t_max: predetermined number of iterations
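A minimal sketch of LVF under these definitions; the `inconsistency` helper and the threshold of 0 are illustrative assumptions:

```python
import random

def inconsistency(X, y, subset):
    """Inconsistency count: for each group of samples that agree on
    the subset's features, count the samples outside the majority class."""
    groups = {}
    for row, label in zip(X, y):
        key = tuple(row[i] for i in subset)
        groups.setdefault(key, []).append(label)
    return sum(len(ls) - max(ls.count(c) for c in set(ls))
               for ls in groups.values())

def lvf(X, y, t_max=500, threshold=0, seed=0):
    """Las Vegas Filter: draw random feature subsets for t_max
    iterations, keeping the smallest one whose inconsistency stays
    within the threshold."""
    rng = random.Random(seed)
    n = len(X[0])
    best = list(range(n))  # start from the full feature set
    for _ in range(t_max):
        subset = [i for i in range(n) if rng.random() < 0.5]
        if subset and len(subset) <= len(best) \
                and inconsistency(X, y, subset) <= threshold:
            best = subset
    return best

# Toy data (illustrative): the label is fully determined by feature 1.
X = [(0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0)]
y = [0, 1, 0, 1]
print(lvf(X, y))
```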
GA (Genetic Algorithm)
Feature selection method: crossover and mutation.
SA (Simulated Annealing)
RMHC-PF1 (Random Mutation Hill Climbing - Prototype and Feature selection): finds sets of prototypes for nearest-neighbor classification; a Monte Carlo method that can be converted to a Las Vegas algorithm by running it many times.
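Random mutation hill climbing can be sketched as follows (an illustration; the toy `evaluate` objective stands in for the nearest-neighbor accuracy that RMHC-PF1 would actually use):

```python
import random

def rmhc(evaluate, n_features, iters=200, seed=0):
    """Random-mutation hill climbing over feature masks: flip one
    randomly chosen bit per step and keep the change only if the
    evaluation score does not get worse."""
    rng = random.Random(seed)
    mask = [rng.random() < 0.5 for _ in range(n_features)]
    score = evaluate(mask)
    for _ in range(iters):
        i = rng.randrange(n_features)
        mask[i] = not mask[i]            # random mutation: flip one bit
        new = evaluate(mask)
        if new >= score:
            score = new                  # keep the improvement
        else:
            mask[i] = not mask[i]        # undo the flip
    return mask, score

# Toy objective (illustrative): reward selecting features 0 and 2,
# penalize every extra feature.
target = {0, 2}
def evaluate(mask):
    chosen = {i for i, m in enumerate(mask) if m}
    return len(chosen & target) - len(chosen - target)

best_mask, best_score = rmhc(evaluate, n_features=6)
print(best_mask, best_score)
```

Each run is a Monte Carlo trial; restarting the search many times and keeping the best result is what turns it into a Las Vegas-style procedure, as the slide notes.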
Three models commonly used in feature selection:
Filter model ---> does not consider the interrelationships between features
Wrapper model ---> high complexity
Embedded methods
Remaining problems: feature redundancy; failure to select the appropriate number of features.
Defining the problem as a game
Problem as a One-Player Game
Defining the problem as a Markov Decision Process; the environment is explored with reinforcement learning methods.
A feature selection method that considers the interrelationships between features: the Upper Confidence Graph method.
The main algorithms: dynamic programming, the Monte Carlo method, and temporal difference learning.
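Temporal difference learning can be illustrated with a single TD(0) update (a generic sketch, not tied to the feature-selection setting; the states, reward, and step sizes are illustrative):

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One temporal-difference (TD(0)) update of the value table V:
    move V[s] toward the bootstrapped target r + gamma * V[s_next]."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V

V = {"a": 0.0, "b": 1.0}          # toy value table over two states
td0_update(V, "a", r=0.5, s_next="b")
print(V["a"])  # 0.1 * (0.5 + 0.9 * 1.0 - 0.0) = 0.14
```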
Policy: the best policy possible in the situation
Reward: the rewards that have already been achieved
States: subsets of features, drawn from the whole set of features
Actions: each allowed action
For each feature, track:
the average score collected by this feature
the number of times this feature has been selected
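These two quantities are the ingredients of a UCB1-style score per feature; a sketch assuming the standard UCB1 exploration bonus (the slide does not give the exact formula):

```python
import math

def ucb_score(avg_reward, n_selected, t, c=math.sqrt(2)):
    """UCB1-style score for a feature: its average reward so far plus
    an exploration bonus that shrinks the more often the feature has
    been selected (t = total number of selections so far)."""
    return avg_reward + c * math.sqrt(math.log(t) / n_selected)

# A rarely tried feature gets a larger bonus than a frequently tried
# one with the same average reward, so it is explored first.
print(ucb_score(0.5, n_selected=2, t=100))
print(ucb_score(0.5, n_selected=50, t=100))
```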
Benchmarks:
Information Gain
Chi-squared statistic
Feature Assessment by Sliding Threshold (FAST)
WEKA software
Any Questions? May 2013
Thanks for your attention