
Lester Mackey, "Deflation Methods for Sparse PCA" (NIPS 2008 poster): web.stanford.edu/~lmackey/papers/deflation-nips08-poster.pdf



Background

Principal Components Analysis (PCA)
• Goal: Extract the r leading eigenvectors of the sample covariance matrix A_0
• Typical solution: Alternate between two tasks
  1. Rank-one variance maximization: x_t = argmax_x x^T A_{t-1} x subject to x^T x = 1
  2. Hotelling's matrix deflation: A_t = A_{t-1} - x_t x_t^T A_{t-1} x_t x_t^T
• Deflation removes the contribution of x_t from A_{t-1}
• Primary drawback: Non-sparse solutions
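As an illustrative sketch (not part of the poster; it assumes NumPy and solves the rank-one subproblem with a full eigendecomposition rather than an iterative maximizer), the alternation above might look like:

```python
import numpy as np

def hotelling_pca(A0, r):
    """Extract the r leading eigenvectors of A0 by alternating
    rank-one variance maximization with Hotelling's deflation."""
    A = A0.copy()
    vecs = []
    for _ in range(r):
        # Task 1: x_t = argmax x^T A x subject to x^T x = 1,
        # i.e. the leading eigenvector of the current matrix.
        _, V = np.linalg.eigh(A)
        x = V[:, -1]
        vecs.append(x)
        # Task 2: Hotelling's deflation, A <- A - x x^T A x x^T.
        A = A - np.outer(x, x) @ A @ np.outer(x, x)
    return np.column_stack(vecs)

# Toy sample covariance matrix built from hypothetical data.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
A0 = X.T @ X / 100
V = hotelling_pca(A0, 3)
```

Because each x_t here is a true eigenvector, successive directions come out orthonormal, which is exactly the property the sparse setting loses.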

Alternative Deflation Methods

Projection Deflation
• A_t = (I - x_t x_t^T) A_{t-1} (I - x_t x_t^T)
• Intuition: Projects the data onto the orthocomplement of the space spanned by x_t

Schur Complement Deflation
• A_t = A_{t-1} - A_{t-1} x_t x_t^T A_{t-1} / (x_t^T A_{t-1} x_t)
• Intuition: Conditional variance of the data variables given the new sparse principal component
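A minimal NumPy sketch of both updates, with the data and the sparse vector x purely hypothetical; the assertions check that each update zeroes A_t x_t and keeps A_t positive semidefinite:

```python
import numpy as np

def projection_deflation(A, x):
    # A_t = (I - x x^T) A_{t-1} (I - x x^T): project the data onto
    # the orthocomplement of span{x} (x assumed unit-norm).
    P = np.eye(len(x)) - np.outer(x, x)
    return P @ A @ P

def schur_deflation(A, x):
    # A_t = A_{t-1} - A_{t-1} x x^T A_{t-1} / (x^T A_{t-1} x):
    # the conditional variance of the data given the component x^T data.
    Ax = A @ x
    return A - np.outer(Ax, Ax) / (x @ Ax)

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 6))
A0 = X.T @ X / 200
x = np.zeros(6); x[[0, 2]] = [0.6, 0.8]   # a sparse unit "pseudo-eigenvector"

for deflate in (projection_deflation, schur_deflation):
    A1 = deflate(A0, x)
    assert np.allclose(A1 @ x, 0)                   # orthogonality: A_t x_t = 0
    assert np.all(np.linalg.eigvalsh(A1) > -1e-10)  # positive semidefiniteness kept
```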

Sparse PCA
• Goal: Extract r sparse "pseudo-eigenvectors": high-variance directions with few non-zero components
• Typical solution: Alternate between two tasks
  1. Constrained rank-one variance maximization: x_t = argmax_x x^T A_{t-1} x subject to x^T x = 1, Card(x) <= k_t
  2. Hotelling's matrix deflation (borrowed from PCA)

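The constrained maximization step can be approached in many ways; the sketch below uses a simple truncated power iteration as a hypothetical stand-in (the poster's experiments use GSLDA and DC-PCA, whose internals are not shown here):

```python
import numpy as np

def truncated_power_method(A, k, iters=200, seed=0):
    """Heuristic for x = argmax x^T A x s.t. ||x|| = 1, Card(x) <= k.
    A plain truncated power iteration: multiply, keep the k
    largest-magnitude entries, renormalize."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        y = A @ x
        small = np.argsort(np.abs(y))[:-k]  # indices of all but the k largest
        y[small] = 0.0
        x = y / np.linalg.norm(y)
    return x

# Hypothetical covariance; the result is unit-norm with at most k non-zeros.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 8))
A0 = X.T @ X / 100
x = truncated_power_method(A0, k=3)
```

Any such solver fits the same slot in the alternation; the poster's point is that the deflation step, not the maximization step, is where the standard recipe breaks.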
Orthogonalized Projection Deflation
• Successive pseudo-eigenvectors are not orthogonal, so annihilating a full vector can reintroduce components deflated in earlier rounds
• Let Q_{t-1} be an orthonormal basis for the extracted pseudo-eigenvectors, and orthogonalize before deflating:
  q_t = (I - Q_{t-1} Q_{t-1}^T) x_t / ||(I - Q_{t-1} Q_{t-1}^T) x_t||
  A_t = (I - q_t q_t^T) A_{t-1} (I - q_t q_t^T)
• Intuition: Eliminates only the additional variance contributed by x_t
• Orthogonalized Hotelling's Deflation is defined similarly
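A sketch of one OPD step under the definitions above, assuming NumPy; the two-step demo uses hypothetical sparse, non-orthogonal vectors:

```python
import numpy as np

def orthogonalized_projection_deflation(A, Q, x):
    """One OPD step: orthogonalize x against the previously extracted
    pseudo-eigenvectors in Q, then projection-deflate along the result."""
    if Q.shape[1] > 0:
        q = x - Q @ (Q.T @ x)          # (I - Q Q^T) x
    else:
        q = x.copy()
    q /= np.linalg.norm(q)             # normalize the residual direction
    P = np.eye(len(q)) - np.outer(q, q)
    return P @ A @ P, np.column_stack([Q, q])

d = 5
rng = np.random.default_rng(2)
X = rng.standard_normal((100, d))
A0 = X.T @ X / 100
x1 = np.array([0.6, 0.8, 0.0, 0.0, 0.0])
x2 = np.array([0.0, 1.0, 0.0, 0.0, 0.0])   # sparse, not orthogonal to x1
Q = np.zeros((d, 0))
A1, Q = orthogonalized_projection_deflation(A0, Q, x1)
A2, Q = orthogonalized_projection_deflation(A1, Q, x2)
```

After both steps, A2 annihilates x1 as well as x2, illustrating the subsequent-orthogonality property that plain projection deflation lacks.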

The Problem
• Hotelling's deflation was designed for eigenvectors, not pseudo-eigenvectors
• Compare:
  Hotelling's deflation for PCA (x_t a true eigenvector):
  - Annihilates the variance of x_t: x_t^T A_t x_t = 0
  - Renders A_t orthogonal to x_t: A_t x_t = 0
  - Preserves positive semidefiniteness: A_t ⪰ 0
  Hotelling's deflation for Sparse PCA (x_t a pseudo-eigenvector):
  - Annihilates the variance of x_t
  - Does not render A_t orthogonal to x_t
  - Does not preserve positive semidefiniteness
• Key properties are lost in the Sparse PCA setting
• Goal: Recover the lost properties with new deflation methods

Reformulating Sparse PCA
• New goal: Explicitly maximize the additional variance criterion
• Solve a sparse generalized eigenvector problem:
  max_x x^T (I - Q_{t-1} Q_{t-1}^T) A_0 (I - Q_{t-1} Q_{t-1}^T) x
  subject to x^T (I - Q_{t-1} Q_{t-1}^T) x = 1, Card(x) <= k_t
• Yields the generalized deflation procedure

Generalized Deflation
• With B_0 = I:
  q_t = B_{t-1} x_t
  A_t = (I - q_t q_t^T) A_{t-1} (I - q_t q_t^T)
  B_t = B_{t-1} (I - q_t q_t^T)

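A sketch of the generalized deflation recursion, assuming NumPy and normalizing q_t explicitly (for a unit x_t satisfying the generalized constraint, B_{t-1} x_t is already unit-length):

```python
import numpy as np

def generalized_deflation_step(A, B, x):
    """One GD step: q = B x, projection-deflate A along q,
    and fold the new projection into the running product B."""
    q = B @ x
    q = q / np.linalg.norm(q)          # explicit normalization for generic x
    P = np.eye(len(q)) - np.outer(q, q)
    return P @ A @ P, B @ P

d = 5
rng = np.random.default_rng(4)
X = rng.standard_normal((100, d))
A0 = X.T @ X / 100
x1 = np.array([0.6, 0.8, 0.0, 0.0, 0.0])
x2 = np.array([0.0, 1.0, 0.0, 0.0, 0.0])   # sparse, not orthogonal to x1
A1, B1 = generalized_deflation_step(A0, np.eye(d), x1)
A2, B2 = generalized_deflation_step(A1, B1, x2)
```

Each deflated matrix stays positive semidefinite and remains orthogonal to every previously extracted vector, matching the properties the reformulation was designed to recover.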
Deflation Properties

Method                    | x_t^T A_t x_t = 0 | A_t x_t = 0 | A_t ⪰ 0 | A_s x_t = 0, ∀ s > t
Hotelling's (HD)          | √                 | X           | X       | X
Projection (PD)           | √                 | √           | √       | X
Schur Complement (SCD)    | √                 | √           | √       | √
Orthog. Hotelling's (OHD) | √                 | X           | X       | X
Orthog. Projection (OPD)  | √                 | √           | √       | √
Generalized (GD)          | √                 | √           | √       | √

Experiments

Set up: Leading deflation-based Sparse PCA algorithms, outfitted with each deflation technique
• GSLDA (Moghaddam et al., ICML '06)
• DC-PCA (Sriperumbudur et al., ICML '07)

Pit props dataset: 13 variables, 180 observations
DC-PCA cumulative % variance, cardinality 4,4,4,4,4,4

       HD      PD      SCD     OHD     OPD     GD
PC 1   22.60%  22.60%  22.60%  22.60%  22.60%  22.60%
PC 2   39.60%  39.60%  38.60%  39.60%  39.60%  39.60%
PC 3   46.80%  50.90%  53.40%  46.80%  50.90%  51.00%
PC 4   56.80%  62.10%  62.30%  52.90%  62.10%  62.20%
PC 5   66.10%  70.20%  73.70%  59.90%  70.20%  71.30%
PC 6   73.40%  77.80%  79.30%  63.20%  77.20%  78.90%

Gene expression dataset: 21 genes, 5759 fly nuclei
GSLDA cumulative % variance, cardinality 9,7,6,5,3,2,2,2

       HD      PD      SCD     OHD     OPD     GD
PC 1   21.00%  21.00%  21.00%  21.00%  21.00%  21.00%
PC 2   38.20%  38.10%  38.10%  38.20%  38.10%  38.20%
PC 3   52.10%  51.90%  52.00%  52.00%  51.90%  52.20%
PC 4   60.50%  60.60%  60.40%  60.40%  60.40%  61.00%
PC 5   65.70%  67.40%  67.10%  65.90%  67.10%  68.20%
PC 6   69.30%  71.00%  70.40%  70.00%  70.00%  72.10%
PC 7   72.50%  74.00%  73.40%  72.80%  73.70%  75.70%
PC 8   75.10%  76.80%  77.00%  75.90%  76.60%  79.60%
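As an illustration of how cumulative % variance figures of this kind can be computed, the sketch below sums each round's additional variance x_t^T A_{t-1} x_t against tr(A_0); the metric and the data are assumptions for illustration, not the poster's exact experimental protocol:

```python
import numpy as np

def cumulative_pct_variance(A0, xs, deflate):
    """Cumulative % variance for unit vectors x_1..x_r: each round's
    additional variance x_t^T A_{t-1} x_t, accumulated and divided
    by the total variance tr(A0). Illustrative metric only."""
    A, captured, out = A0.copy(), 0.0, []
    for x in xs:
        captured += x @ A @ x
        out.append(100.0 * captured / np.trace(A0))
        A = deflate(A, x)
    return out

def projection_deflation(A, x):
    P = np.eye(len(x)) - np.outer(x, x)
    return P @ A @ P

# Hypothetical data; with true eigenvectors the running total is
# non-decreasing and bounded by 100%.
rng = np.random.default_rng(2)
X = rng.standard_normal((60, 6))
A0 = X.T @ X / 60
_, V = np.linalg.eigh(A0)
pcts = cumulative_pct_variance(A0, [V[:, -1], V[:, -2], V[:, -3]],
                               projection_deflation)
```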