Examples of Dimensionality Reduction

CIS 660 Data Mining

Sunnie Chung

Problem: Curse of Dimensionality

High Dimensionality Is a Problem for Machine Learning Algorithms That Classify Data by the Observed Class Variables (Labels)

How to Deal with High Dimensionality

• Dimensionality Reduction WITHOUT Loss of Information

• Feature Selection

• Feature Extraction

• Identify Highly Positively Correlated Features to Merge/Remove

• Same Information: Correlation of height and urefu (height in Swahili) ~= 1

• Highly Positively Correlated Features X1, X2, X3, X4, X5 to Temperature
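
The correlation check above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the helper name `drop_correlated`, the 0.95 threshold, and the synthetic height/urefu/weight data are hypothetical and do not come from the slides.

```python
import numpy as np

def drop_correlated(X, names, threshold=0.95):
    """Keep one feature from each group of highly positively correlated features.

    `threshold` is an assumed cutoff, not a value given in the slides.
    """
    # Pearson correlation between every pair of feature columns
    corr = np.corrcoef(X, rowvar=False)
    keep = []
    for j in range(X.shape[1]):
        # Keep feature j only if it is not highly correlated with a kept feature
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return X[:, keep], [names[k] for k in keep]

# Synthetic data: "urefu" is the same measurement as "height" under another name
rng = np.random.default_rng(0)
height = rng.normal(170, 10, size=200)
urefu = height.copy()
weight = rng.normal(70, 12, size=200)
X = np.column_stack([height, urefu, weight])

X_reduced, kept = drop_correlated(X, ["height", "urefu", "weight"])
print(kept)
```

Because urefu duplicates height, their correlation is 1 and only one of the two survives, while the independently drawn weight column is kept.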

• Apply Transformation Algorithms to Reduce/Extract True Dimensions

Only to Reduce Dimensions

• Apply Well Known Data Reduction Methods – PCA or SVD

PCA Procedure

1. Find the eigenvectors of the covariance matrix

2. The eigenvectors define the new space

Project the two red points onto the blue eigenvector e, which preserves the greatest variability (range of variance), rather than onto the green eigenvector e, on which the two original red points land on the same position and the distance between them is lost.
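
The two steps above can be sketched with NumPy as follows. This is a minimal illustration, not the full procedure from the linked notes: the function name `pca`, the synthetic two-feature data, and the choice of one retained component are assumptions made for the example.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto the top eigenvectors of its covariance matrix."""
    # Center the data so the covariance describes spread around the mean
    Xc = X - X.mean(axis=0)
    # Step 1: eigenvectors of the covariance matrix (eigh handles symmetric input)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    # Sort so the direction with the greatest variance comes first
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 2: the leading eigenvectors define the new space
    W = eigvecs[:, :n_components]
    return Xc @ W, W, eigvals

# Synthetic data: the second feature is roughly twice the first, plus small noise
rng = np.random.default_rng(0)
t = rng.normal(size=300)
X = np.column_stack([t, 2 * t + rng.normal(scale=0.1, size=300)])

Z, W, eigvals = pca(X, n_components=1)
print(Z.shape, eigvals)
```

The variance of the projected data along the first eigenvector equals the largest eigenvalue, which is the sense in which that direction preserves the greatest variability.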

See the rest of the procedure here.

http://eecs.csuohio.edu/~sschung/CIS660/MahalanobisDistance.pdf

https://plot.ly/ipython-notebooks/principal-component-analysis/

https://www.youtube.com/watch?v=IbE0tbjy6JQ&list=PLBv09BD7ez_5_yapAg86Od6JeeypkS4YM

When to Use PCA for Your Data Analytics