
An Efficient Greedy Method for Unsupervised Feature Selection

Ahmed Farahat

Joint work with

Ali Ghodsi and

Mohamed Kamel

{afarahat, aghodsib, mkamel}@uwaterloo.ca

ICDM 2011

Outline

• Introduction
  – Dimension Reduction & Feature Selection
  – Previous Work

• Proposed Work
  – Feature Selection Criterion
  – Recursive Formula
  – Greedy Feature Selection

• Experiments and Results

• Conclusion

Dimension Reduction

• In data mining applications, data instances are typically described by a huge number of features.
  – Images (>2 megapixels)
  – Documents (>10K words)

• Most of these features are irrelevant or redundant.

• Goal: Reduce the dimensionality of the data to
  – allow a better understanding of the data
  – improve the performance of other learning tasks

Feature Selection vs. Extraction

• Feature Selection (a.k.a. variable selection)

searches for a relevant subset of existing features

(−) a combinatorial optimization problem

(+) features are easy to interpret

• Feature Extraction (a.k.a. feature transformation)

learns a new set of features

(+) unique solutions in polynomial time

(−) features are difficult to interpret
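The contrast between the two approaches can be sketched on synthetic data. This is only an illustration: the data, the choice of columns in `selected`, and the use of PCA as the extraction method are assumptions made here, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 6))   # 100 instances, 6 features (synthetic)
A -= A.mean(axis=0)                 # center each feature

# Feature selection: keep a subset of the existing columns.
selected = [0, 3]                   # hypothetical choice, for illustration only
A_sel = A[:, selected]              # each column is still an original feature

# Feature extraction (here PCA): learn new features as linear combinations.
_, _, Vt = np.linalg.svd(A, full_matrices=False)
A_ext = A @ Vt[:2].T                # each new feature mixes all 6 originals

print(A_sel.shape, A_ext.shape)
```

Both results have two columns, but only the columns of `A_sel` correspond directly to original features, which is why selected features are easier to interpret.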

Feature Selection

• Wrapper vs. filter methods:
  – Wrapper methods search for features which enhance the performance of the learning task.
    (+) more accurate, (−) more complex

  – Filter methods analyze the intrinsic properties of the data, and select highly-ranked features according to some criterion.
    (+) less complex, (−) less accurate

• Supervised vs. unsupervised methods

• This work: filter and unsupervised methods

Previous Work

• PCA-based: calculate PCA, associate features with principal components based on their coefficients, select features associated with the first principal components (Jolliffe, 2002)

• Sparse PCA-based: calculate sparse PCA (Zou et al., 2006), select for each principal component the subset of features with non-zero coefficients

• Convex Principal Feature Selection (CPFS) (Masaeli et al., SDM’10)

formulates a continuous optimization problem which minimizes the reconstruction error of the data matrix with sparsity constraints
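As a rough illustration of the PCA-based idea in the first bullet above, the sketch below picks, for each of the first k principal components, the not-yet-selected feature with the largest absolute coefficient. This is one simple reading of "associate features with principal components", assumed here for illustration; it is not claimed to be the exact procedure of Jolliffe (2002). The function name `pca_based_selection` is hypothetical.

```python
import numpy as np

def pca_based_selection(A, k):
    """Pick one feature per leading principal component (largest |coefficient|).

    Assumes k <= min(number of instances, number of features).
    """
    A = A - A.mean(axis=0)                      # PCA works on centered data
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    selected = []
    for component in Vt[:k]:
        # Rank features by absolute coefficient; skip ones already taken.
        for idx in np.argsort(-np.abs(component)):
            if int(idx) not in selected:
                selected.append(int(idx))
                break
    return selected

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
X[:, 2] *= 10.0                                 # make feature 2 dominate the variance
print(pca_based_selection(X, 3))
```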

Previous Work (Cont.)

• Feature Selection using Feature Similarity (FSFS) (Mitra et al. TPAMI’02)

groups features into clusters and then selects a representative feature for each cluster

• Laplacian Score (LS) (He et al. NIPS’06)

selects features that preserve similarities between data instances

• Multi-Cluster Feature Selection (MCFS) (Cai et al. KDD’10)

selects features that preserve the multi-cluster structure of the data

This Work

• A criterion for unsupervised feature selection
  – minimizes the reconstruction error of the data matrix based on the selected subset of features

• A recursive formula for calculating the criterion

• An effective greedy algorithm for unsupervised feature selection

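Feature Selection Criterion

[Figure: the data matrix (instances × features) is approximated by a least-squares reconstruction from the selected features; the criterion minimizes the loss between the data matrix and the reconstructed matrix.]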

Problem 1: (Unsupervised Feature Selection) Find a subset of features such that

This is an NP-hard combinatorial optimization problem.
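The formula on this slide did not survive extraction, so the sketch below is an assumption based on the slide's labels (least squares, reconstructed matrix, minimize loss): the criterion is taken to be the squared Frobenius norm of the residual after reconstructing all features from the selected ones by least squares. The names `reconstruction_error` and `best_subset_brute_force` are hypothetical. The brute-force search only illustrates why the exact problem is combinatorial.

```python
from itertools import combinations

import numpy as np

def reconstruction_error(A, S):
    """Squared Frobenius error of the least-squares reconstruction of A from columns S."""
    A_S = A[:, list(S)]
    # Coefficients T minimizing ||A - A_S @ T||_F
    T, *_ = np.linalg.lstsq(A_S, A, rcond=None)
    return float(np.linalg.norm(A - A_S @ T, "fro") ** 2)

def best_subset_brute_force(A, k):
    """Exhaustive search over all size-k subsets (feasible only for tiny n)."""
    n = A.shape[1]
    return min(combinations(range(n), k),
               key=lambda S: reconstruction_error(A, S))

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
print(best_subset_brute_force(A, 2))
```

The number of candidate subsets grows as n choose k, which is what motivates the greedy approach developed in the following slides.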

Feature Selection Criterion (Cont.)

Theorem 1: Given a set of features . For any ,

Recursive Selection Criterion

Lemma 1: Given a set of features . For any ,

Recursive Selection Criterion (Cont.)

Proof of Lemma 1

Proof of Lemma 1 (Cont.)

• Let be the Schur complement of in .

• Use block-wise inversion formula of :
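For reference, the standard block-wise inversion identity for a symmetric block matrix is reproduced below. The symbols M, B, C, D and S are generic placeholders chosen here, not the slide's own notation (which was lost in extraction); S is the Schur complement of B in M, and both B and S are assumed invertible.

```latex
M = \begin{bmatrix} B & C \\ C^{T} & D \end{bmatrix},
\qquad
S = D - C^{T} B^{-1} C,
\qquad
M^{-1} = \begin{bmatrix}
  B^{-1} + B^{-1} C \, S^{-1} C^{T} B^{-1} & -B^{-1} C \, S^{-1} \\
  -S^{-1} C^{T} B^{-1} & S^{-1}
\end{bmatrix}
```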

Recursive Selection Criterion (Cont.)

• Corollary 1: Given a set of features . For any ,

• Proof:

– Using Lemma 1,

Theorem 1: Given a set of features . For any ,
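The theorem's formula is missing from the extracted text. The check below assumes it takes the following form: for S = P ∪ R, the criterion satisfies F(S) = F(P) − ‖Ẽ_R‖², where E is the residual of the data matrix after selecting P and Ẽ_R is the least-squares reconstruction of E from its own columns R. The identity itself follows from orthogonal projections; that it matches the slide's exact statement and notation is an assumption. The helper names `project` and `F` are hypothetical.

```python
import numpy as np

def project(X, Y):
    """Least-squares reconstruction of Y from the columns of X."""
    T, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return X @ T

def F(A, S):
    """Assumed criterion: squared Frobenius reconstruction error using columns S."""
    return float(np.linalg.norm(A - project(A[:, S], A), "fro") ** 2)

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 8))
P, R = [0, 4], [2, 7]
S = P + R                                  # S is the union of P and R

E = A - project(A[:, P], A)                # residual after selecting P
E_R_tilde = project(E[:, R], E)            # reconstruction of E from its columns R

lhs = F(A, S)
rhs = F(A, P) - float(np.linalg.norm(E_R_tilde, "fro") ** 2)
print(abs(lhs - rhs))                      # expected to be near machine precision
```

The practical consequence is that the criterion for a larger subset can be obtained from the residual of a smaller one, without recomputing the full projection.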

Recursive Selection Criterion (Cont.)

Proof of Theorem 1

Greedy Selection Criterion

• Problem 2: (Greedy Feature Selection) At iteration t, find feature l such that,

• Using Theorem 1:

• Problem 2 is equivalent to:

Greedy Selection Criterion (Cont.)

• At iteration t:

• Problems:
  – Memory inefficient:
  – Computationally complex: per iteration
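A minimal sketch of the straightforward greedy procedure follows, assuming the criterion is the least-squares reconstruction error. Under that assumption, adding feature i reduces the error by ‖Eᵀ E_{:i}‖² / (E_{:i}ᵀ E_{:i}), where E is the current residual matrix, so the sketch picks the feature that maximizes this ratio and then removes its direction from E. The function name `greedy_fs_naive` and the tolerance `eps` are hypothetical. Forming the n × n matrix G at every iteration is what makes this version expensive.

```python
import numpy as np

def greedy_fs_naive(A, k, eps=1e-12):
    """Greedy selection with an explicit residual matrix (assumes k <= rank of A)."""
    E = np.asarray(A, dtype=float).copy()    # residual matrix, m x n
    selected = []
    for _ in range(k):
        G = E.T @ E                          # n x n inner products (costly)
        g = np.diag(G).copy()                # g_i = ||E_{:i}||^2
        f = np.sum(G ** 2, axis=0)           # f_i = ||E^T E_{:i}||^2
        # Features with (near-)zero residual are already explained; skip them.
        score = np.where(g > eps, f / np.maximum(g, eps), -np.inf)
        l = int(np.argmax(score))
        selected.append(l)
        e = E[:, [l]]                        # m x 1 residual of the chosen feature
        E = E - e @ (e.T @ E) / g[l]         # remove the component along e
    return selected

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 10))
print(greedy_fs_naive(A, 4))
```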

Greedy Selection Criterion (Cont.)

• At iteration t, define:

• Calculate E and G recursively as:

• Define ,

Update formulas for f and g
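The update formulas themselves were lost in extraction. The sketch below is one way to realize the idea, derived under the assumption that f_i = ‖G_{:i}‖² and g_i = G_ii with G = EᵀE: after selecting feature l, G becomes G − ωωᵀ with ω = G_{:l} / √G_ll, so f and g can be updated from ω alone, and G is never stored. The name `greedy_fs`, the list `omegas`, and the tolerance `eps` are hypothetical.

```python
import numpy as np

def greedy_fs(A, k, eps=1e-12):
    """Greedy selection that keeps only the vectors f, g and one omega per step."""
    A = np.asarray(A, dtype=float)
    n = A.shape[1]
    g = np.sum(A ** 2, axis=0)                          # g_i = ||A_{:i}||^2
    # f_i = ||A^T A_{:i}||^2, computed one column at a time (no n x n matrix)
    f = np.array([np.sum((A.T @ A[:, i]) ** 2) for i in range(n)])
    available = np.ones(n, dtype=bool)
    omegas, selected = [], []
    for _ in range(k):
        score = np.where(available & (g > eps), f / np.maximum(g, eps), -np.inf)
        l = int(np.argmax(score))
        selected.append(l)
        available[l] = False
        # delta = G_{:l}, where G = A^T A - sum_r omega_r omega_r^T
        delta = A.T @ A[:, l]
        for w in omegas:
            delta -= w[l] * w
        omega = delta / np.sqrt(delta[l])
        # G @ omega, again without forming G
        G_omega = A.T @ (A @ omega)
        for w in omegas:
            G_omega -= (w @ omega) * w
        f = f - 2.0 * omega * G_omega + (omega @ omega) * omega ** 2
        g = g - omega ** 2
        omegas.append(omega)
    return selected

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 12))
print(greedy_fs(A, 5))
```

Each iteration costs a few matrix-vector products with A plus work proportional to the number of features already selected, instead of an n × n product.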

Memory-Efficient Selection

Partition-based Selection

• Greedy selection criterion: + per iteration

• At each iteration, n candidate features × n projections

• Solution:
  – Partition features into c << n random groups
  – Select the feature which best represents the centroids of these groups
  – Similar update formulas can be developed for f and g
  – Complexity: + per iteration
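The partition-based idea can be sketched as follows. The details are assumptions made for illustration: groups are formed by a random permutation, each centroid is the mean of its group's feature vectors, and the score of feature i is ‖Fᵀ E_{:i}‖² / ‖E_{:i}‖², where E and F are the residuals of the data matrix and of the centroid matrix. For clarity this version keeps E and F explicitly rather than using update formulas for f and g. The name `part_greedy_fs` and its parameters are hypothetical.

```python
import numpy as np

def part_greedy_fs(A, k, c, seed=0, eps=1e-12):
    """Greedy selection scored against c group centroids instead of all n features."""
    A = np.asarray(A, dtype=float)
    n = A.shape[1]
    rng = np.random.default_rng(seed)
    groups = rng.permutation(n) % c          # random, near-equal groups (needs c <= n)
    # B: m x c matrix whose columns are the centroids (means) of each group
    B = np.stack([A[:, groups == j].mean(axis=1) for j in range(c)], axis=1)
    E, F = A.copy(), B.copy()                # residuals of the data and the centroids
    selected = []
    for _ in range(k):
        g = np.sum(E ** 2, axis=0)           # g_i = ||E_{:i}||^2
        f = np.sum((F.T @ E) ** 2, axis=0)   # c x n projections instead of n x n
        score = np.where(g > eps, f / np.maximum(g, eps), -np.inf)
        l = int(np.argmax(score))
        selected.append(l)
        e = E[:, [l]] / np.sqrt(g[l])        # unit vector along the chosen residual
        E = E - e @ (e.T @ E)                # project the chosen direction out of E
        F = F - e @ (e.T @ F)                # and out of the centroid residuals
    return selected

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 12))
print(part_greedy_fs(A, k=4, c=3))
```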

Experiments

Seven methods were compared:

• PCA-LRG: a PCA-based method that selects features associated with the first k principal components (Masaeli et al., 2010)

• FSFS: the Feature Selection using Feature Similarity method (Mitra et al., 2002)

• LS: is the Laplacian Score (LS) method (He et al. 2006)

• SPEC: is the spectral feature selection method (Zhao et al. 2007)

• MCFS: is the Multi-Cluster Feature Selection method (Cai et al. 2010)

• GreedyFS: is the basic greedy algorithm (using recursive update formulas for f and g but without random partitioning)

• PartGreedyFS: is the partition-based greedy algorithm

Data Sets

• These data sets were recently used by Cai et al. (2010) to evaluate different feature selection methods in comparison to the Multi-Cluster Feature Selection (MCFS) method.

Results – k-means

Results – Affinity Propagation

Results – Run Times

Results – Run Times (Cont.)

Conclusion

• This work presents a novel greedy algorithm for unsupervised feature selection.
  – a feature selection criterion which measures the reconstruction error of the data matrix based on the subset of selected features
  – a recursive formula for calculating the feature selection criterion
  – an efficient greedy algorithm for feature selection, and two memory- and time-efficient variants

• It has been empirically shown that the proposed algorithm
  – achieves better clustering performance
  – is less computationally demanding than methods that give comparable clustering performance

Thank you!

References

• I. Jolliffe, Principal Component Analysis, 2nd ed. Springer, 2002.

• H. Zou, T. Hastie, and R. Tibshirani, “Sparse principal component analysis,” J. Comput. Graph. Stat., 2006.

• M. Masaeli, Y. Yan, Y. Cui, G. Fung, and J. Dy, “Convex principal feature selection,” SIAM SDM 2010.

• X. He, D. Cai, and P. Niyogi, “Laplacian score for feature selection,” NIPS 2006.

• Y. Cui and J. Dy, “Orthogonal principal feature selection,” in the Sparse Optimization and Variable Selection Workshop, ICML 2008.

• Z. Zhao and H. Liu, “Spectral feature selection for supervised and unsupervised learning,” ICML 2007.

• D. Cai, C. Zhang, and X. He, “Unsupervised feature selection for multi-cluster data,” KDD 2010.

• P. Mitra, C. Murthy, and S. Pal, “Unsupervised feature selection using feature similarity,” IEEE Trans. Pattern Anal. Mach. Intell., 2002.
