Feature Generation and Selection in SRL

Alexandrin Popescul & Lyle H. Ungar

Presented By Stef Schoenmackers

Overview

• Structural Generalized Linear Regression (SGLR) Overview

• Design Motivations

• Experiments

• Conclusions

SGLR Overview

• Adds statistical methods to ILP SQL as the logical language Generalized Linear Regression as statistical method

• Uses clustering to generate new relations

• Builds discriminative models Targeted at large problems where generative models

impossible

• Integrates feature generation and problem modeling

SGLR Loop

SGLR Method

• Clusters data and adds clusters as new relations

• Searches the space of SQL query refinements Features are numerical SQL aggregates Test feature with statistical measure (e.g. AIC, BIC) Add only significantly predictive features Examine each feature only once Use current set of features to guide search

Overview

Structural Generalized Linear Regression (SGLR) Overview

• Design Motivations

• Experiments

• Conclusions

SQL Motivation

• Most of the world’s data is in relational databases Can exploit schema and meta-information

• SQL uses a fairly expressive language Non-recursive first-order logic formulas

• Relational DBs have been studied and optimized for decades, so should be more scalable than other alternatives

Clustering Motivation

• Dimensionality reduction• Clusters are added as relations (new first-

class concepts) Increases expressivity of the language

describing patterns in the data Can lead to a more rapid discovery of

predictive features

• Done as a pre-processing step cost(clustering) << cost(feature search)

Aggregation Motivation

• Summarizes the information in a table into scalar values usable by a statistical model average, max, min, count, average, empty/exists (0/1)

• Exploits database work into making them efficient

• Provides a richer space of features to choose from

Dynamic Feature Generation

• Most features do not provide useful information• In large domains, feature generation is

expensive, and precomputing all possible features is far too time consuming

• Solution: Use a smarter search strategy and dynamically generate features. Let the features already selected influence which features are added

• Focuses only on the promising areas in the search space

Feature Streams

• Put features into different evaluation queues

• Choose next feature from the ‘best’ stream

• If feature in multiple streams, only evaluate once

• Stream design can use prior knowledge/bias

Refinement Graphs (in ILP)

• Start with most general rule, and ‘refines’ it to produce more specific clauses Single variable substitution Add predicate involving 1+ existing variables

• Uses top-down breadth-first search to find the most general rule that covers only positive examples

• Performs poorly in noisy domains

Refinement Graphs (in SGLR)

• Adds one relation to a query and expands it into all possible configurations of equality conditions of new attributes with a new or old attribute Contains at least one equality condition between a

new and old attribute Any attribute can be set to a constant High-level variable typing/classes are enforced

• Not all refinements are most general, but simplifies pruning of equivalent subspaces (accounts only for the type and number of relations joined in a query)

Example Refinement Graph

Query(d)

Cites(d,d1) Author_of(d, a) Word_count(d, w, int)

Author_of(d, a=“Smith”)

Cites(d,d1),Cites(d1,d2)

Cites(d,d1), Author_of(d1, a)

Cites(d,d1), Author_of(d1, a=“Domingos”)

DB Tables

Overview

Structural Generalized Linear Regression (SGLR) Overview

Design Motivations

• Experiments

• Conclusions

Experiments

• Used CiteSeer data Citation(doc1, doc2), Author(doc, person),

PublishedIn(doc, venue), HasWord(doc,word) 60k Docs, 131k Authors, 173k Citations, 6.8M

• Two Tasks Predict the publication venue Predict existence of a citation

Experiments

• Cluster all many-to-many relations K-means Added 6 new relations

• Use logistic regression for prediction

• BFS of search space

• 5k+/5k- examples for venue prediction

• 2.5k+/2.5k- examples for citation prediction

Results

Venue (87.2%) Citation (93.1%)

Dynamic Feature Generation

• Query expressions generated Breadth-First• Baseline puts all queries into one queue• Dynamic strategy enqueues queries into

separate streams Stream 1: exists and count over table Stream 2: other aggregates (counts of unique

elements in individual columns) Chooses next feature from stream where

(featuresAdded+1)/(featuresTried+1) is max Stop when a stream is empty

Results

Citation

Clusters No Clusters

Time Results

Citation

Clusters No Clusters

Domain Independent Learning

• Most citation prediction features are research-area generic

• Can we train a model for one area and test on another?

Domain Independent Results

• Used KDD-Cup 2003 data (High Energy Physics papers in arXiv)

Train On Test On Accuracy

CiteSeer arXiv 92.9%

CiteSeer CiteSeer 92.6%

arXiv arXiv 96.0%

Conclusions

• Cluster-based features add expressivity, and apply to any domain or SRL method

• Generating queries dynamically can reduce search time and increase accuracy

Questions?

Feature Generation and Selection in SRL

Documents

3. Feature Selection

Feature Selection in Hierarchical Feature Spaces · Feature Selection in Hierarchical Feature Spaces 3 Fig.1: An example hierarchy of binary features 3 Related Work Feature selection

Feature Selection for Time Series Modeling · feature selection and feature rejection. The rest of this paper is organized as follows. The feature selection methods are presented

Feature Selection

Sequential Feature Selection

Feature Selection/Extraction

Approaches to Feature Identiﬁcation and Feature Selection

Simultaneous Feature Selection and Feature …sreevani_r/Dissertation_260.pdf1 Simultaneous Feature Selection and Feature Extraction for Pattern Classification A dissertation submitted

Materialization Optimizations for Feature Selection Workloadsi.stanford.edu/hazy/papers/mod539-zhang.pdf · cess is feature selection; in feature selection, an analyst se-lects the

Comparative Feature Selection

Lecture08 - Data - Feature Selection & Extraction€¦ · Dr. Patrick Chan @ SCUT Feature Selection & Extraction Feature Selection Selectinga subset of features without a transformation

LNAI 5212 - Robust Feature Selection Using Ensemble ...bioinformatics.psb.ugent.be/pdf/publications/978-3-540-87481-2.pdf · Robust Feature Selection Using Ensemble Feature Selection

STABILITY OF FEATURE SELECTION ALGORITHMS AND ENSEMBLE ... · stability of feature selection algorithms and ensemble feature selection methods in bioinformatics pengyi yang, bing

Feature Selection for Unsupervised Learningjmlr.csail.mit.edu/papers/volume5/dy04a/dy04a.pdf · 2. Feature Subset Selection and EM Clustering (FSSEM) Feature selection algorithms

Dimensionality reduction: feature extraction & feature selection

Feature Selection - Abandah

Feature Selection for Classiﬁcation: A Review · PDF fileFeature Selection for Classiﬁcation: A Review. 2. ... methods can be converted into feature selection methods ... feature

Explicit Max Margin Input Feature Selection for Nonlinear SVM …tfcolema/articles/fspaper.pdf · EXPLICIT MAX-MARGIN FEATURE SELECTION Explicit Max Margin Input Feature Selection

Feature extraction vs. feature selection Search …research.cs.tamu.edu/prism/lectures/pr/pr_l11.pdf · • Feature extraction vs. feature selection • Search strategy and objective

Feature Selection - Feature Selection for Data · PDF file1 Background Introduction Feature relevance ... Fast Correlation Based Filter (FCBF) Scatter Search ... Feature selection