Factor Investing using Penalized Principal Components · 08.02.2018 · 2 Uses information from large panel data sets: Many assets with many time observations 3 Searches for factors

Factor Investing using Penalized PrincipalComponents

Markus Pelger 1 Martin Lettau 2

1Stanford University

2UC Berkeley

February 8th 2018

AI in Fintech Forum 2018

Intro Model Illustration Theoretical Results Empirical Results Conclusion Appendix

Motivation

Motivation: Asset Pricing with Risk Factors

The Challenge of Asset Pricing

Most important question in finance: Why are prices different fordifferent assets?

Fundamental insight: Arbitrage Pricing Theory: Prices of financialassets should be explained by systematic risk factors.

Problem: “Chaos” in asset pricing factors: Over 300 potential assetpricing factors published!

Fundamental question: Which factors are really important inexplaining expected returns? Which are subsumed by others?

Goals of this paper:

Bring order into “factor chaos”

⇒ Summarize the pricing information of a large number of assets witha small number of factors

1


Motivation

Why is it important?

Importance of factors for investing

1 Optimal portfolio construction

Only factors are compensated for systematic riskOptimal portfolio with highest Sharpe-ratio must be based onfactor portfolios(Sharpe-ratio=expected excess return/standard deviation)“Smart beta” investments = exposure to risk factors

2 Arbitrage opportunities

Find underpriced assets and earn “alpha”

3 Risk management

Factors explain risk-return trade-offFactors allow to manage systematic risk exposure

2


Motivation

Contribution of this paper

Contribution

This Paper: Estimation approach for finding risk factors andoptimal factor investing

Key elements of estimator:

1 Statistical factors instead of pre-specified (and potentiallymiss-specified) factors

2 Uses information from large panel data sets: Many assets withmany time observations

3 Searches for factors explaining asset prices (explain differencesin expected returns) not only co-movement in the data

4 Allows time-variation in factor structure

3


Motivation

Contribution of this paper

Results

Asymptotic distribution theory for weak and strong factors

⇒ No “blackbox approach”

Estimator discovers “weak” factors with high Sharpe-ratios

⇒ high Sharpe-ratio factors important for asset pricing andinvestment

Estimator strongly dominates conventional approach (PrincipalComponent Analysis (PCA))

⇒ PCA does not find all high Sharpe-ratio factors

Empirical results:

Investment: 3 times higher Sharpe-ratio then benchmarkfactors (PCA)Asset Pricing: Smaller pricing errors in- and out-of samplethan benchmark (PCA, 5 Fama-French factors, etc.) 4


Model

The Model

Approximate Factor Model

Observe excess returns of N assets over T time periods:

Xt,i = Ft1×K

>︸︷︷︸factors

ΛiK×1︸︷︷︸

loadings

+ et,i︸︷︷︸idiosyncratic

i = 1, ...,N t = 1, ...,T

Matrix notation

X︸︷︷︸T×N

= F︸︷︷︸T×K

Λ>︸︷︷︸K×N

+ e︸︷︷︸T×N

N assets (large)T time-series observation (large)K systematic factors (fixed)

F , Λ and e are unknown5


Model

The Model

Approximate Factor Model

Systematic and non-systematic risk (F and e uncorrelated):

Var(X ) = ΛVar(F )Λ>︸︷︷︸systematic

+ Var(e)︸︷︷︸non−systematic

⇒ Systematic factors should explain a large portion of thevariance

⇒ Idiosyncratic risk can be weakly correlated

Arbitrage-Pricing Theory (APT): The expected excess return isexplained by the risk-premium of the factors:

E [Xi ] = E [F ]Λ>i

⇒ Systematic factors should explain the cross-section of expectedreturns

6


Model

The Model: Estimation of Latent Factors

Conventional approach: PCA (Principal component analysis)

Apply PCA to the sample covariance matrix:

1

TX>X − X X>

with X = sample mean of asset excess returns

Eigenvectors of largest eigenvalues estimate loadings Λ.

Much better approach: Risk-Premium PCA (RP-PCA)

Apply PCA to a covariance matrix with overweighted mean

1

TX>X + γX X> γ = risk-premium weight

Eigenvectors of largest eigenvalues estimate loadings Λ.

F estimator for factors: F = 1NX Λ = X (Λ>Λ)−1Λ>. 7


Model

The Model: Objective Function

Conventional PCA: Objective Function

Minimize the unexplained variance:

minΛ,F

1

NT

N∑i=1

T∑t=1

(Xti − FtΛ>i )2

RP-PCA (Risk-Premium PCA): Objective Function

Minimize jointly the unexplained variance and pricing error

minΛ,F

1

NT

N∑i=1

T∑t=1

(Xti − FtΛ>i )2

︸︷︷︸unexplained variation

+γ1

N

N∑i=1

(Xi − FΛ>i

)2

︸︷︷︸pricing error

with Xi = 1T

∑Tt=1 Xt,i and F = 1

T

∑Tt=1 Ft and risk-premium weight γ

8


Model

The Model: Interpretation

Interpretation of Risk-Premium-PCA (RP-PCA):

1 Combines variation and pricing error criterion functions:

Select factors with small cross-sectional pricing errors (alpha’s).Protects against spurious factor with vanishing loadings as itrequires the time-series errors to be small as well.

2 Penalized PCA: Search for factors explaining the time-series butpenalizes low Sharpe-ratios.

3 Information interpretation:

PCA of a covariance matrix uses only the second moment butignores first momentUsing more information leads to more efficient estimates.RP-PCA combines first and second moments efficiently.

9


Model

The Model: Interpretation

Interpretation of Risk-Premium-PCA (RP-PCA): continued

4 Signal-strengthening: Intuitively the matrix 1T X>X + γX X>

converges to

Λ(ΣF + (1 + γ)µFµ

>F

)Λ> + Var(e)

with ΣF = Var(F ) and µF = E [F ]. The signal of weak factors witha small variance can be “pushed up” by their mean with the right γ.

10


Illustration

Illustration (Size and accrual)

Illustration: Anomaly-sorted portfolios (Size and accrual)

Factors

1 PCA: Estimation based on PCA of correlation matrix, K = 32 RP-PCA: K = 3 and γ = 1003 FF-long/short: market, size and accrual (based on extreme

quantiles, same construction as Fama-French factors)

Data

Double-sorted portfolios according to size and accrual (fromKenneth French’s website)Monthly return data from 07/1963 to 05/2017(T = 647) for N = 25 portfoliosOut-of-sample: Rolling window of 20 years (T=240)

Optimal factor investing: maximum Sharpe-ratio portfolio

Ropt = F · Σ−1F µF 11


Illustration

Illustration (Size and accrual)

In-sample Out-of-sampleSR RMS α Idio. Var. SR RMS α Idio. Var.

RP-PCA 0.33 0.06 2.11 0.23 0.08 2.19PCA 0.13 0.14 1.97 0.11 0.14 2.04

FF-long/short 0.21 0.12 2.63 0.11 0.12 2.23

Table: Maximal Sharpe-ratios, root-mean-squared pricing errors andunexplained idiosyncratic variation. K = 3 factors and γ = 100.

SR: Maximum Sharpe-ratio of linear combination of factors

Cross-sectional pricing errors α:

Pricing error αi = E [Xi ]− E [F ]Λ>i

RMS α: Root-mean-squared pricing errors√

1N

∑Ni=1 αi

2

Idiosyncratic Variation: 1NT

∑Ni=1

∑Tt=1(Xt,i − F>t Λi )

2

⇒ RP-PCA significantly better than PCA and quantile-sorted factors. 12


Illustration

Loadings for statistical factors (Size and Accrual)

0 5 10 15 20 25

Portfolio

-0.5

0

0.5

Loadin

gs

1. PCA factor

0 5 10 15 20 25

Portfolio

-0.5

0

0.5

Loadin

gs

2. PCA factor

0 5 10 15 20 25

Portfolio

-0.5

0

0.5

Loadin

gs

3. PCA factor

0 5 10 15 20 25

Portfolio

-0.5

0

0.5

Loadin

gs

1. RP-PCA factor

0 5 10 15 20 25

Portfolio

-0.5

0

0.5

Loadin

gs

2. RP-PCA factor

0 5 10 15 20 25

Portfolio

-0.5

0

0.5

Loadin

gs

3. RP-PCA factor

⇒ RP-PCA detects accrual factor while 3rd PCA factor is noise.

13


Illustration

Maximal Sharpe ratio (Size and accrual)

SR (In-sample)

1 factor 2 factors 3 factors0

0.05

0.1

0.15

0.2

0.25

0.3

0.35 =-1=0=1=10=50=100

SR (Out-of-sample)

1 factor 2 factors 3 factors0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

Figure: Maximal Sharpe-ratio by adding factors incrementally.

⇒ 1st and 2nd PCA and RP-PCA factors the same.

⇒ RP-PCA detects 3rd factor (accrual) for γ > 10.14


Weak vs. Strong Factors

The Model

Strong vs. weak factor models

Strong factor model ( 1N Λ>Λ bounded)

Interpretation: strong factors affect most assets (proportionalto N), e.g. market factorStrong factors lead to exploding eigenvalues

⇒ RP-PCA always more efficient than PCA⇒ optimal γ relatively small

Weak factor model (Λ>Λ bounded)

Interpretation: weak factors affect a smaller fraction of assetsWeak factors lead to large but bounded eigenvalues

⇒ RP-PCA detects weak factors which cannot be detected byPCA

⇒ optimal γ relatively large

In Data: Typically 1-2 strong factors and several weak factors withhigh Sharpe-ratios 15


Weak Factor Model

Weak Factor Model

Weak Factor Model

Weak factors have small variance or affect small fraction of assets

Λ>Λ bounded (after normalizing factor variances)

Statistical model: Spiked covariance from random matrix theory

Eigenvalues of sample covariance matrix separate into two areas:

The bulk, majority of eigenvaluesThe extremes, a few large outliers

Bulk spectrum converges to generalized Marchenko-Pastur distrib.

Large eigenvalues converge either to

A biased value (characterized by the Stieltjes transform)To the bulk of the spectrum if the true eigenvalue is too small

⇒ Phase transition phenomena: estimated eigenvectorsorthogonal to true eigenvectors if eigenvalues too small

16


Weak Factor Model

Weak Factor Model

Intuition: Weak Factor Model

“Signal” matrix for PCA of covariance matrix:

ΛΣFΛ>

K largest eigenvalues θPCA1 , ..., θPCAK measure strength of signal

“Signal” matrix for RP-PCA:


>F

)Λ>

K largest eigenvalues θRP-PCA1 , ..., θRP-PCA

K measure strength of signal

If µF 6= 0 and γ > −1 then RP-PCA signal always larger than PCAsignal:

θRP-PCAi > θPCAi

17


Weak Factor Model

Weak Factor Model

Theorem 1: Risk-Premium PCA under weak factor model

The correlation of the estimated with the true factors is

Corr(F , F )p→ U︸︷︷︸

rotation

ρ1 0 · · · 00 ρ2 · · · 0

0 0. . .

...0 · · · 0 ρK

V︸︷︷︸rotation

with

ρ2i

p→ 1

1+θiB(θi ))if θi > θcrit

0 otherwise

Critical value θcrit and function B(.) depend only on the noise distributionand are known in closed-form

Based on closed-form expression choose optimal RP-weight γ

For θi > θcrit ρ2i is strictly increasing in θi .

⇒ RP-PCA strictly dominates PCA18


Strong Factor Model

Strong Factor Model

Strong Factor Model

Strong factors affect most assets: e.g. market factor

1N Λ>Λ bounded (after normalizing factor variances)

Factors and loadings can be estimated consistently and areasymptotically normal distributed

Assumptions more general than in weak factor model

Asymptotic Efficiency

Choose RP-weight γ to obtain smallest asymptotic variance of estimators

RP-PCA (i.e. γ > −1) always more efficient than PCA

In “most cases” optimal γ = 0 (smaller than in weak factor model)

RP-PCA and PCA are both consistent

19


Time-varying loadings


Model with time-varying loadings

Observe panel of excess returns and L covariates Zi,t−1,l :

Xt,i = Ft1×K

> gK×1

(Zi,t−1,1, ...,Zi,t−1,L) + et,i

Loadings are function of L covariates Zi,t−1,l with l = 1, ..., Le.g. characteristics like size, book-to-market ratio, past returns,...

Factors and loading function are latent

Idea: Approximate non-linear function g by basis functions

⇒ Project return data on appropriate basis functions⇒ Equivalent to apply RP-PCA to portfolio data

Special case: Characteristics sorted portfolios corresponds tokernel basis functionsGeneral case: Obtain arbitrary interactions and break curse ofdimensionality by conditional tree sorting projection 20


Empirical Results

Single-sorted portfolios

Portfolio Data

Monthly return data from 07/1963 to 12/2016 (T = 638) forN = 370 portfolios

Kozak, Nagel and Santosh (2017) data: 370 decile portfolios sortedaccording to 37 anomalies

Factors:

1 RP-PCA: K = 6 and γ = 100.2 PCA: K = 63 Fama-French 5: The five factor model of Fama-French

(market, size, value, investment and operating profitability, allfrom Kenneth French’s website).

4 Proxy factors: RP-PCA and PCA factors approximated with5% of largest position.

21


Empirical Results

Single-sorted portfolios


RP-PCA 0.66 0.15 2.73 0.53 0.11 3.19PCA 0.28 0.15 2.70 0.22 0.14 3.19

Fama-French 5 0.32 0.23 4.97 0.31 0.21 4.62

Table: Deciles of 37 single-sorted portfolios from 07/1963 to 12/2016(N = 370 and T = 638): Maximal Sharpe-ratios, root-mean-squaredpricing errors and unexplained idiosyncratic variation. K = 6 statisticalfactors.

RP-PCA strongly dominates PCA and Fama-French 5 factors

Results hold out-of-sample.

22


Empirical Results

Single-sorted portfolios: Maximal Sharpe-ratio

SR (In-sample)

RP-PCA

RP-PCA ProxyPCA

PCA Proxy0

0.1

0.2

0.3

0.4

0.5

0.6

0.73 factors4 factors5 factors6 factors7 factors

SR (Out-of-sample)

RP-PCA

RP-PCA ProxyPCA

PCA Proxy0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Figure: Maximal Sharpe-ratios.

⇒ Spike in Sharpe-ratio for 6 factors

23


Empirical Results

Single-sorted portfolios: Pricing error

RMS (In-sample)

RP-PCA

RP-PCA ProxyPCA

PCA Proxy0

0.05

0.1

0.15

0.2

0.25

3 factors4 factors5 factors6 factors7 factors

RMS (Out-of-sample)

RP-PCA

RP-PCA ProxyPCA

PCA Proxy0

0.05

0.1

0.15

0.2

0.25

Figure: Root-mean-squared pricing errors.

⇒ RP-PCA has smaller out-of-sample pricing errors

24


Empirical Results

Single-sorted portfolios: Idiosyncratic Variation

Idiosyncratic Variation (In-sample)

RP-PCA

RP-PCA ProxyPCA

PCA Proxy0

1

2

3

4

5


Idiosyncratic Variation (Out-of-sample)

RP-PCA

RP-PCA ProxyPCA

PCA Proxy0

1

2

3

4

5

Figure: Unexplained idiosyncratic variation.

⇒ Unexplained variation similar for RP-PCA and PCA

25


Empirical Results

Single-sorted portfolios: Maximal Sharpe-ratio

SR (In-sample)

1 factor

2 factors

3 factors

4 factors

5 factors

6 factors

7 factors0

0.1

0.2

0.3

0.4

0.5

0.6

0.7 =-1=0=1=10=50=100

SR (Out-of-sample)

1 factor

2 factors

3 factors

4 factors

5 factors

6 factors

7 factors0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Figure: Maximal Sharpe-ratios for different RP-weights γ and number offactors K

⇒ Strong increase in Sharpe-ratios for γ ≥ 10.26


Empirical Results

Optimal Portfolio with RP-PCA

Composition of Stochast Discount Factor (RP-PCA)

Indus

try M

om. R

ev.

Ind. R

el. Rev

. (L.V.)

Value-M

om-Prof

.

Indus

try Rel.

Revers

als

Value-P

rofita

bility

Seaso

nality

Value (

M)

Gross P

rofita

bility

Momen

tum (1

2m)

Idios

yncra

tic Vola

tility

Long

Run Rev

ersals

Asset T

urnov

er

Momen

tum-R

evers

als

Sales/P

rice

Return

on Asse

ts (A)

Inves

tmen

t/Asse

ts

Value (

A)

Return

on Boo

k Equ

ity (A)

Gross M

argins

Compo

site Is

suan

ce

Earning

s/Pric

eSize

Momen

tum (6

m)

Accrua

l

Cash F

lows/P

rice

Short-T

erm Rev

ersals

Value-M

omen

tum

Inves

tmen

t Grow

th

Indus

try M

omen

tum

Asset G

rowth

Dividen

d/Pric

e

Share

Volume

Net Ope

rating

Assets

Price

Inves

tmen

t/Cap

ital

Leve

rage

Sales G

rowth

-0.3

-0.2

-0.1

0

0.1

0.2

0.3High DecileLow Decile

Figure: Portfolio composition of highest Sharpe ratio portfolio(Stochastic Discount Factor) based on 6 RP-PCA factors.

27


Empirical Results

Optimal Portfolio with RP-PCA (largest positions)

Composition of Stochast Discount Factor (RP-PCA)

Indu

stry M

om. R

ev.

Ind.

Rel.

Rev

. (L.

V.)

Value-

Mom

-Pro

f.

Indu

stry R

el. R

ever

sals

Value-

Profita

bility

Seaso

nality

Value

(M)

Gross

Pro

fitabil

ity

Mom

entu

m (1

2m)

Idios

yncr

atic

Volatili

ty

Long

Run

Rev

ersa

ls

Asset

Tur

nove

r

Mom

entu

m-R

ever

sals

Sales/P

rice

Retur

n on

Ass

ets (

A)

-0.3

-0.2

-0.1

0

0.1

0.2

0.3 High DecileLow Decile

Figure: Largest portfolios in highest Sharpe ratio portfolio (StochasticDiscount Factor) based on 6 RP-PCA factors. 28


Empirical Results

Optimal Portfolio with PCA

Composition of Stochast Discount Factor (PCA)

Value-P

rofita

bility

Asset T

urnov

er

Sales/P

rice

Leve

rage

Value-M

om-Prof

.

Gross P

rofita

bility

Indus

try M

om. R

ev.

Earning

s/Pric

eSize

Idios

yncra

tic Vola

tility

Long

Run Rev

ersals

Ind. R

el. Rev

. (L.V.)

Return

on Boo

k Equ

ity (A)

Dividen

d/Pric

e

Cash F

lows/P

rice

Value-M

omen

tum

Value (

A)

Return

on Asse

ts (A)

Indus

try M

omen

tum

Momen

tum (6

m)

Momen

tum (1

2m)

Compo

site Is

suan

ce

Momen

tum-R

evers

als

Inves

tmen

t/Asse

ts

Share

Volume

Short-T

erm Rev

ersals

Asset G

rowthPric

e

Seaso

nality

Gross M

argins

Inves

tmen

t/Cap

ital

Inves

tmen

t Grow

th

Value (

M)

Indus

try Rel.

Revers

als

Sales G

rowth

Accrua

l

Net Ope

rating

Assets

-0.3

-0.2

-0.1

0

0.1

0.2

0.3High DecileLow Decile

Figure: Portfolio composition of highest Sharpe ratio portfolio(Stochastic Discount Factor) based on 6 PCA factors. 29


Empirical Results

Optimal Portfolio with PCA (largest positions)

Composition of Stochast Discount Factor (PCA)

Value-

Profita

bility

Asset

Tur

nove

r

Sales/P

rice

Leve

rage

Value-

Mom

-Pro

f.

Gross

Pro

fitabil

ity

Indu

stry M

om. R

ev.

Earnin

gs/P

rice

Size

Idios

yncr

atic

Volatili

ty

Long

Run

Rev

ersa

ls

Ind.

Rel.

Rev

. (L.

V.)

Retur

n on

Boo

k Equ

ity (A

)

Divide

nd/P

rice

Cash

Flows/P

rice

-0.3

-0.2

-0.1

0

0.1

0.2

0.3 High DecileLow Decile

Figure: Largest portfolios in highest Sharpe ratio portfolio (StochasticDiscount Factor) based on 6 PCA factors. 30


Empirical Results

Interpreting factors: Generalized correlations with proxies

RP-PCA PCA

1. Gen. Corr. 1.00 1.002. Gen. Corr. 1.00 1.003. Gen. Corr. 0.99 0.994. Gen. Corr. 0.98 0.995. Gen. Corr. 0.92 0.946. Gen. Corr. 0.78 0.89

Table: Generalized correlations of statistical factors with proxy factors(portfolios of 5% of assets).

Problem in interpreting factors: Factors only identified up toinvertible linear transformations.

Generalized correlations close to 1 measure of how many factors twosets have in common.

⇒ Proxy factors approximate statistical factors. 31


Empirical Results

Interpreting factors: 6th proxy factor

6. Proxy RP-PCA Weights 6. Proxy PCA Weights

Momentum (6m) 1 0.28 Leverage 10 0.33Momentum (6m) 2 0.25 Asset Turnover 10 0.25Value (M) 10 0.25 Value-Profitability 10 0.25Value-Momentum 1 0.23 Profitability 10 0.22Industry Momentum 1 0.20 Asset Turnover 9 0.22Industry Reversals 9 0.19 Sales/Price 10 0.20Industry Momentum 2 0.19 Sales/Price 9 0.18Momentum (6m) 3 0.18 Size 10 0.17Idiosyncratic Volatility 2 -0.18 Value-Momentum-Profitability 1 -0.19Industry Mom. Reversals -0.18 Profitability 2 -0.19Value-Momentum 8 -0.20 Value-Profitability 1 -0.20Momentum (6m) 10 -0.21 Profitability 4 -0.20Value-Momentum 9 -0.23 Value-Profitability 2 -0.20Value-Momentum 10 -0.23 Profitability 1 -0.23Short-Term Reversals 1 -0.24 Idiosyncratic Volatility 1 -0.24Industry-Momentum 10 -0.24 Profitability 3 -0.25Industry Rel. Reversals 1 -0.28 Asset Turnover 2 -0.28Idiosyncratic Volatility 1 -0.38 Asset Turnover 1 -0.35

32


Empirical Results

Time-stability of loadings

Figure: Time-varying rotated loadings for the first six factors. Loadingsare estimated on a rolling window with 240 months. 33


Empirical Results

Time-stability of loadings of individual stocks

SR (In-sample)

RP-PCA PCA0

0.2

0.4

0.61 factor2 factors3 factors4 factors5 factors6 factors

SR (Out-of-sample)

RP-PCA PCA0

0.2

0.4

0.6

RMS (In-sample)

RP-PCA PCA0

0.1

0.2

0.3

0.4

RMS (Out-of-sample)

RP-PCA PCA0

0.1

0.2

0.3

0.4


RP-PCA PCA0

20

40

60


RP-PCA PCA0

20

40

60

Figure 19: Stock price data from 01/1972 to 12/2016 (N = 270 and T = 500): MaximalSharpe-ratios, root-mean-squared pricing errors and unexplained idiosyncratic variationfor di↵erent number of factors. RP-weight = 10.

25

Figure: Stock price data (N = 270 and T = 500): Maximal Sharpe-ratiosfor different number of factors. RP-weight γ = 10.

Stock price data from 01/1972 to 12/2016 (N = 270 and T = 500)

⇒ Out-of-sample performance collapses

⇒ Constant loading model inappropriate34


Empirical Results


Figure: Stock price data: Generalized correlations between loadingsestimated on the whole time horizon and a rolling window 35


Conclusion

Conclusion

Methodology

Estimator for estimating priced latent factors from large data sets

Combines variation and pricing criterion function

Asymptotic theory under weak and strong factor model assumption

Detects weak factors with high Sharpe-ratio

More efficient than conventional PCA

Empirical Results

Strongly dominates PCA of the covariance matrix.

Potential to provide benchmark factors for horse races.

RP-PCA factor investing outperforms benchmark models

36


Time-stability: Generalized correlations

0 50 100 150 200 250 300 350 400

Year

0

0.2

0.4

0.6

0.8

1G

ener

aliz

ed C

orre

latio

nRP-PCA (total vs. time-varying)

1st GC2nd GC3rd GC4th GC5th GC6th GC

0 50 100 150 200 250 300 350 400

Year

0

0.2

0.4

0.6

0.8

1

Gen

eral

ized

Cor

rela

tion

PCA (total vs. time-varying)


Figure: Generalized correlations between loadings estimated on the wholetime horizon T = 638 and a rolling window with 240 months for K = 6factors.

A 1



0 50 100 150 200 250 300Time

0

0.5

1

Gen

eral

ized

Cor

rela

tion RP-PCA (total vs. time-varying)


0 50 100 150 200 250 300Time

0

0.5

1

Gen

eral

ized

Cor

rela

tion PCA (total vs. time-varying)

Figure: Stock price data (N = 270 and T = 500): Generalizedcorrelations between loadings estimated on the whole time horizon and arolling window with 240 months.

A 2


Effect of Risk-Premium Weight γ

0 5 10 15 200

0.2

0.4SR

SR (In-sample)

1 factor2 factors3 factors

0 5 10 15 200

0.2

0.4

SR

SR (Out-of-sample)

0 5 10 15 200

0.1

0.2

0.3

RMS (In-sample)

0 5 10 15 200

0.1

0.2

0.3

RMS (Out-of-sample)

0 5 10 15 200

2

4

Varia

tion


0 5 10 15 200

2

4

Varia

tion


Figure: Maximal Sharpe-ratios, root-mean-squared pricing errors andunexplained idiosyncratic variation.

⇒ RP-PCA detects 3rd factor (accrual) for γ > 10.A 3


Literature (partial list)

Large-dimensional factor models with strong factors

Bai (2003): Distribution theoryFan et al. (2016): Projected PCA for time-varying loadingsKelly et al. (2017): Instrumented PCA for time-varyingloadingsPelger (2016), Aıt-Sahalia and Xiu (2015): High-frequency

Large-dimensional factor models with weak factors(based on random matrix theory)

Onatski (2012): Phase transition phenomenaBenauch-Georges and Nadakuditi (2011): Perturbation of largerandom matrices

Asset-pricing factors

Harvey and Liu (2015): Lucky factorsClarke (2015): Level, slope and curvature for stocksKozak, Nagel and Santosh (2015): PCA based factorsBryzgalova (2016): Spurious factors A 4


Extreme deciles of single-sorted portfolios

Portfolio Data

Monthly return data from 07/1963 to 12/2016 (T = 638) forN = 64 portfolios

Kozak, Nagel and Santosh (2017) data: 370 decile portfolios sortedaccording to 37 anomalies

⇒ Here we take only the lowest and highest decile portfolio for eachanomaly (N = 64).

Factors:

1 RP-PCA: K = 6 and γ = 100.2 PCA: K = 63 Fama-French 5: The five factor model of Fama-French

(market, size, value, investment and operating profitability, allfrom Kenneth French’s website).

4 Proxy factors: RP-PCA and PCA factors approximated with 8largest positions. A 5


Extreme Deciles

Anomaly Mean SD Sharpe-ratio Anomaly Mean SD Sharpe-ratio

Accruals - accrual 0.37 3.20 0.12 Momentum (12m) - mom12 1.28 6.91 0.19Asset Turnover - aturnover 0.40 3.84 0.10 Momentum-Reversals - momrev 0.47 4.82 0.10Cash Flows/Price - cfp 0.44 4.38 0.10 Net Operating Assets - noa 0.15 5.44 0.03Composite Issuance - ciss 0.46 3.31 0.14 Price - price 0.03 6.82 0.00Dividend/Price - divp 0.2 5.11 0.04 Gross Profitability - prof 0.36 3.41 0.11Earnings/Price - ep 0.57 4.76 0.12 Return on Assets (A) - roaa 0.21 4.07 0.05Gross Margins - gmargins 0.02 3.34 0.01 Return on Book Equity (A) - roea 0.08 4.40 0.02Asset Growth - growth 0.33 3.46 0.10 Seasonality - season 0.81 3.94 0.21Investment Growth - igrowth 0.37 2.69 0.14 Sales Growth - sgrowth 0.05 3.59 0.01Industry Momentum - indmom 0.49 6.17 0.08 Share Volume - shvol 0.00 6.00 0.00Industry Mom. Reversals - indmomrev 1.18 3.48 0.34 Size - size 0.29 4.81 0.06Industry Rel. Reversals - indrrev 1.00 4.11 0.24 Sales/Price sp 0.53 4.26 0.13Industry Rel. Rev. (L.V.) - indrrevlv 1.34 3.01 0.44 Short-Term Reversals - strev 0.36 5.27 0.07Investment/Assets - inv 0.49 3.09 0.16 Value-Momentum - valmom 0.51 5.05 0.10Investment/Capital - invcap 0.13 5.02 0.03 Value-Momentum-Prof. - valmomprof 0.84 4.85 0.17Idiosyncratic Volatility - ivol 0.56 7.22 0.08 Value-Profitability - valprof 0.76 3.84 0.20Leverage - lev 0.24 4.58 0.05 Value (A) - value 0.50 4.57 0.11Long Run Reversals - lrrev 0.46 5.02 0.09 Value (M) - valuem 0.43 5.89 0.07Momentum (6m) - mom 0.35 6.27 0.06

Table: Long-Short Portfolios of extreme deciles of 37 single-sortedportfolios from 07/1963 to 12/2016: Mean, standard deviation andSharpe-ratio.

A 6


Extreme Deciles


RP-PCA 0.64 0.18 3.59 0.53 0.15 4.23PCA 0.35 0.22 3.57 0.28 0.19 4.24

RP-PCA Proxy 0.62 0.19 4.08 0.48 0.17 4.19PCA Proxy 0.37 0.22 3.77 0.315 0.18 4.20

Fama-French 5 0.32 0.30 7.31 0.31 0.262 6.40

Table: First and last decile of 37 single-sorted portfolios from 07/1963 to12/2016 (N = 64 and T = 638): Maximal Sharpe-ratios,root-mean-squared pricing errors and unexplained idiosyncratic variation.K = 6 statistical factors.

RP-PCA strongly dominates PCA and Fama-French 5 factors

Results hold out-of-sample. A 7


Extreme Deciles: Maximal Sharpe-ratio

SR (In-sample)

RP-PCA

RP-PCA ProxyPCA

PCA Proxy0

0.1

0.2

0.3

0.4

0.5

0.6

0.73 factors4 factors5 factors6 factors7 factors

SR (Out-of-sample)

RP-PCA

RP-PCA ProxyPCA

PCA Proxy0

0.1

0.2

0.3

0.4

0.5

0.6

0.7


⇒ Spike in Sharpe-ratio for 6 factors

A 8


Extreme Deciles: Pricing error

RMS (In-sample)

RP-PCA

RP-PCA ProxyPCA

PCA Proxy0

0.05

0.1

0.15

0.2

0.25

0.3

0.35


RMS (Out-of-sample)

RP-PCA

RP-PCA ProxyPCA

PCA Proxy0

0.05

0.1

0.15

0.2

0.25

0.3

0.35


⇒ RP-PCA has smaller out-of-sample pricing errors

A 9


Extreme Deciles: Idiosyncratic Variation


RP-PCA

RP-PCA ProxyPCA

PCA Proxy0

1

2

3

4

5

6

7



RP-PCA

RP-PCA ProxyPCA

PCA Proxy0

1

2

3

4

5

6

7


⇒ Unexplained variation similar for RP-PCA and PCA

A 10


Extreme Deciles: Maximal Sharpe-ratio

SR (In-sample)

1 factor

2 factors

3 factors

4 factors

5 factors

6 factors

7 factors0

0.1

0.2

0.3

0.4

0.5

0.6

0.7=-1=0=1=10=50=100

SR (Out-of-sample)

1 factor

2 factors

3 factors

4 factors

5 factors

6 factors

7 factors0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Figure: First and last decile of 37 single-sorted portfolios (N = 64 andT = 638): Maximal Sharpe-ratios for different RP-weights γ and numberof factors K .

A 11


Interpreting factors: Generalized correlations with proxies

RP-PCA PCA

1. Gen. Corr. 1.00 1.002. Gen. Corr. 1.00 1.003. Gen. Corr. 0.98 0.994. Gen. Corr. 0.96 0.975. Gen. Corr. 0.88 0.956. Gen. Corr. 0.72 0.89

Table: Generalized correlations of statistical factors with proxy factors(portfolios of 8 assets).

Problem in interpreting factors: Factors only identified up toinvertible linear transformations.

Generalized correlations close to 1 measure of how many factors twosets have in common.

⇒ Proxy factors approximate statistical factors. A 12


Interpreting factors: Cumulative absolute proxy weights

RP-PCA Proxy PCA Proxy

Momentum (12m) 1.70 Idiosyncratic Volatility 1.28Idiosyncratic Volatility 1.65 Momentum (12m) 1.14Industry Rel. Rev. (L.V.) 1.21 Momentum (6m) 1.04Momentum (6m) 1.21 Price 0.95Price 1.21 Asset Turnover 0.83Industry Mom. Reversals 1.11 Industry Rel. Rev. (L.V.) 0.79Value (M) 1.03 Value (M) 0.75Industry Rel. Reversals 0.84 Industry Momentum 0.73Share Volume 0.55 Industry Mom. Reversals 0.73Net Operating Assets 0.47 Dividend/Price 0.67Value-Momentum-Prof. 0.46 Gross Profitability 0.64Size 0.42 Sales/Price 0.58Return on Book Equity (A) 0.32 Value-Profitability 0.58Industry Momentum 0.29 Net Operating Assets 0.37Value-Momentum 0.27 Value (A) 0.36Long Run Reversals 0.24 Value-Momentum-Prof. 0.33Earnings/Price 0.22 Cash Flows/Price 0.31

Table: Composition of proxy factors: Anomaly categories and the sum ofabsolute values of the portfolio weights of the K = 6 proxy factors.

A 13


Interpreting factors: Composition of proxies

2. Proxy (RP-PCA) 3. Proxy (RP-PCA) 4. Proxy (RP-PCA) 5. Proxy (RP-PCA) 6. Proxy (RP-PCA)

indrrevlv10 0.54 valmomprof10 0.17 mom1210 0.28 mom1210 -0.28 price1 0.38indmomrev10 0.52 indmomrev10 -0.20 mom10 0.26 mom10 -0.28 mom1 0.36

ivol10 0.24 ivol10 -0.21 valuem1 0.25 valmomprof10 -0.29 valuem10 0.34Accrual1 -0.21 mom121 -0.23 lrrev10 -0.24 roea1 -0.32 indrrev10 0.32

shvol1 -0.22 indrrevlv10 -0.26 mom1 -0.30 shvol1 -0.33 indrrev1 -0.26ep1 -0.22 indmomrev1 -0.40 valuem10 -0.44 price1 -0.37 valmom10 -0.27

indrrev1 -0.25 indrrevlv1 -0.41 price1 -0.45 size10 -0.42 indmom10 -0.29mom121 -0.42 ivol1 -0.67 mom121 -0.49 noa10 -0.47 ivol1 -0.53

2. Proxy (PCA) 3. Proxy (PCA) 4. Proxy (PCA) 5. Proxy (PCA) 6. Proxy (PCA)

ivol1 0.59 valuem10 0.46 mom10 0.36 divp10 0.30 valprof10 0.33indrrevlv10 0.43 price1 0.38 indmom10 0.35 roea1 -0.25 Aturnover10 0.32

indmomrev10 0.37 divp10 0.37 mom1210 0.34 shvol1 -0.27 sp10 0.27indrrevlv1 0.36 value10 0.36 valmomprof10 0.33 size10 -0.28 prof10 0.24

indmomrev1 0.36 sp10 0.32 valmom1 -0.31 mom1 -0.29 valprof1 -0.25ivol10 0.27 lrrev10 0.31 indmom1 -0.35 noa10 -0.37 prof1 -0.40

mom121 0.03 cfp10 0.31 mom121 -0.39 mom121 -0.38 ivol1 -0.42indmom1 0.03 valuem1 -0.29 mom1 -0.39 price1 -0.57 Aturnover1 -0.51

Table: Portfolio-composition of proxy factors for first and last decile of 37single-sorted portfolios: First proxy factors is an equally-weightedportfolio.

A 14


Extreme Deciles: Time-Stability

Figure: Time-varying rotated loadings for the first six factors. Loadingsare estimated on a rolling window with 240 months. A 15


Extreme Deciles: Time-Stability

0 50 100 150 200 250 300 350 400

Time

0

0.2

0.4

0.6

0.8

1

Gen

eral

ized

Cor

rela

tion

RP-PCA (total vs. time-varying, K=4)

1st GC2nd GC3rd GC4th GC

0 50 100 150 200 250 300 350 400

Time

0

0.2

0.4

0.6

0.8

1

Gen

eral

ized

Cor

rela

tion

PCA (total vs. time-varying, K=4)

1st GC2nd GC3rd GC4th GC

0 50 100 150 200 250 300 350 400

Time

0

0.2

0.4

0.6

0.8

1

Gen

eral

ized

Cor

rela

tion

RP-PCA (total vs. time-varying, K=6)


0 50 100 150 200 250 300 350 400

Time

0

0.2

0.4

0.6

0.8

1

Gen

eral

ized

Cor

rela

tion

PCA (total vs. time-varying, K=6)


Figure: Generalized correlations between loadings estimated on the wholetime horizon T = 638 and a rolling window. A 16


Double-sorted portfolios

Double-sorted portfolios

Data

Monthly return data from 07/1963 to 05/2017 (T = 647)13 double sorted portfolios (consisting of 25 portfolios) fromKenneth French’s website

Factors

1 PCA: K = 32 RP-PCA: K = 3 and γ = 1003 FF-Long/Short factors: market + two specific anomaly

long-short factors

A 17


Sharpe-ratios and pricing errors (in-sample)

Sharpe-Ratio αRPCA PCA FF-L/S RPCA PCA FF-L/S

Size and BM 0.25 0.22 0.21 0.13 0.13 0.14BM and Investment 0.24 0.17 0.24 0.08 0.11 0.12

BM and Profits 0.23 0.20 0.24 0.10 0.12 0.16Size and Accrual 0.33 0.13 0.21 0.06 0.14 0.12

Size and Beta 0.25 0.24 0.23 0.06 0.07 0.10Size and Investment 0.32 0.26 0.21 0.10 0.11 0.22

Size and Profits 0.22 0.21 0.25 0.06 0.06 0.16Size and Momentum 0.25 0.19 0.18 0.15 0.16 0.17Size and ST-Reversal 0.28 0.25 0.24 0.16 0.17 0.35

Size and Idio. Vol. 0.35 0.31 0.32 0.15 0.16 0.16Size and Total Vol. 0.34 0.30 0.31 0.16 0.16 0.16Profits and Invest. 0.28 0.24 0.30 0.10 0.12 0.11

Size and LT-Reversal 0.22 0.18 0.16 0.11 0.13 0.16

A 18


Sharpe-ratios and pricing errors (out-of-sample)

Sharpe-Ratio αRPCA PCA FF-L/S RPCA PCA FF-L/S

Size and BM 0.23 0.18 0.16 0.18 0.19 0.19BM and Investment 0.21 0.15 0.24 0.13 0.17 0.17

BM and Profits 0.19 0.17 0.23 0.16 0.17 0.19Size and Accrual 0.24 0.09 0.11 0.08 0.14 0.12

Size and Beta 0.22 0.20 0.16 0.08 0.09 0.09Size and Investment 0.30 0.23 0.17 0.13 0.14 0.16

Size and Profits 0.22 0.20 0.20 0.10 0.1 0.17Size and Momentum 0.18 0.12 0.08 0.17 0.18 0.19Size and ST-Reversal 0.22 0.17 0.23 0.22 0.23 0.25

Size and Idio. Vol. 0.37 0.30 0.28 0.17 0.18 0.18Size and Total Vol. 0.35 0.28 0.27 0.18 0.20 0.19Profits and Invest. 0.31 0.25 0.29 0.12 0.15 0.14

Size and LT-Reversal 0.16 0.10 0.04 0.13 0.14 0.14

A 19


The Model: Objective function

Time-series objective function:

Minimize the unexplained variance:

minΛ,F

1

NT

N∑i=1

T∑t=1

(Xti − FtΛ>i )2

= minΛ

1

NTtrace

((XMΛ)>(XMΛ)

)s.t. F = X (Λ>Λ)−1Λ>

Projection matrix MΛ = IN − Λ(Λ>Λ)−1Λ>

Error (non-systematic risk): e = X − FΛ> = XMΛ

Λ proportional to eigenvectors of the first K largest eigenvalues of1

NT X>X minimizes time-series objective function

⇒ Motivation for PCA

A 20



Cross-sectional objective function:

Minimize cross-sectional expected pricing error:

1

N

N∑i=1

(E [Xi ]− E [F ]Λ>i

)2

=1

N

N∑i=1

(1

TX>i 1−

1

T1>FΛ>i

)2

=1

Ntrace

((1

T1>XMΛ

)(1

T1>XMΛ

)>)s.t. F = X (Λ>Λ)−1Λ>

1 is vector T × 1 of 1’s and thus F>1T estimates factor mean

Why not estimate factors with cross-sectional objective function?

Factors not identifiedSpurious factor detection (Bryzgalova (2016))

A 21



Combined objective function: Risk-Premium-PCA

minΛ,F

1

NTtrace

(((XMΛ)>(XMΛ

))+ γ

1

Ntrace

((1

T1>XMΛ

)(1

T1>XMΛ

)>)

= minΛ

1

NTtrace

(MΛX

>(I +

γ

T11>)XMΛ

)s.t. F = X (Λ>Λ)−1Λ>

The objective function is minimized by the eigenvectors of thelargest eigenvalues of 1

NT X>(IT + γ

T 11>)X .

Λ estimator for loadings: proportional to eigenvectors of the first Keigenvalues of 1

NT X>(IT + γ

T 11>)X

F estimator for factors: 1NX Λ = X (Λ>Λ)−1Λ>.

Estimator for the common component C = FΛ is C = F Λ>

A 22



Weighted Combined objective function:

Straightforward extension to weighted objective function:

minΛ,F

1

NTtrace(Q>(X − FΛ>)>(X − FΛ>)Q)

+ γ1

Ntrace

(1>(X − FΛ>)QQ>(X − FΛ>)>1

)= min

Λtrace

(MΛQ

>X>(I +

γ

T11>)XQMΛ

)s.t. F = X (Λ>Λ)−1Λ>

Cross-sectional weighting matrix Q

Factors and loadings can be estimated by applying PCA toQ>X>

(I + γ

T 11>)XQ.

Today: Only Q equal to inverse of a diagonal matrix of standarddeviations. For γ = −1 corresponds to PCA of a correlation matrix.

Optimal choice of Q: GLS type argument A 23


Simulation

Simulation parameters

N = 250 and T = 350.

Factors: K = 4

1. Factor represent the market with N(1.2, 9): Sharpe-ratio of0.42. Factor represents an industry factors following N(0.1, 1):Sharpe-ratio of 0.1.3. Factor follows N(0.4, 1): Sharpe-ratio of 0.4.4. Factor has a small variance but high Sharpe-ratio. It followsN(0.4, 0.16): Sharpe-ratio of 1.

Loadings normalized such that 1N Λ>Λ = IK .

Λi,1 = 1 and Λi,k ∼ N(0, 1) for k = 2, 3, 4.

Errors: Cross-sectional and time-series correlation andheteroskedasticity in the residuals. 1/3 of variation due tosystematic risk. A 24



Errors

Residuals are modeled as e = σeDTAT εANDN :

ε is a T × N matrix and follows a multivariate standard normaldistribution

Time-series correlation in errors: AT creates an AR(1) model withparameter ρ = 0.1

Cross-sectional correlation in errors: AN is a Toeplitz-matrix with(β, β, β) on the right three off-diagonals with β = 0.7

Cross-sectional heteroskedasticity: DN is a diagonal matrix withindependent elements following N(1, 0.2)

Time-series heteroskedasticity: DT is a diagonal matrix withindependent elements following N(1, 0.2)

Signal-to-noise ratio: σ2e = 9

Parameters produce eigenvalues that are consistent with the data.

Around 1/3 of the variation is due to systematic risk and 2/3 isnon-systematic.

A 25


Simulation

0 100 200 300 400

-1000

0

1000

2000

3000

4000

5000

60001. Factor

0 100 200 300 400

0

200

400

600

800

1000

12002. Factor

0 100 200 300 400

-500

0

500

1000

1500

2000

25003. Factor

0 100 200 300 400

0

500

1000

1500

2000

2500

30004. Factor

True factor

RP-PCA =0

RP-PCA =10

RP-PCA =50

RP-PCA =100

PCA

Figure: Sample paths of the cumulative returns of the first four factorsand the estimated factor processes. N = 250 and T = 350.

A 26


Simulation

True Factor γ = −1 γ = 0 γ = 10 γ = 50 γ = 100

In-sampleSR 1.17 0.56 0.80 1.07 1.10 1.10α 0.42 0.57 0.51 0.42 0.41 0.41

Idio. Var. 23.77 22.728 22.784 22.912 22.923 22.925Ex. Var. 0.33 0.36 0.36 0.35 0.35 0.35

Out-of-sampleSR 1.17 0.60 0.79 0.92 0.92 0.93α 0.42 0.58 0.52 0.47 0.47 0.47

Idio. Var. 23.84 23.15 23.13 23.15 23.15 23.15Ex. Var. 0.33 0.35 0.35 0.35 0.35 0.35

Table: (i) Maximal Sharpe-ratio (SR), (ii) root-mean-squared pricingerror (α), unexplained variation and proportion of variation explained byK = 4 factors. N = 250 and T = 350.

A 27


Simulation

γ = −1 γ = 0 γ = 10 γ = 50 γ = 100

In-sample1. Factor 0.98 0.98 0.98 0.98 0.982. Factor 0.92 0.92 0.92 0.92 0.923. Factor 0.92 0.92 0.92 0.92 0.92

4. Factor 0.15 0.45 0.68 0.69 0.69

Out-of-sample1. Factor 0.94 0.97 0.98 0.98 0.982. Factor 0.79 0.88 0.92 0.92 0.923. Factor 0.77 0.86 0.91 0.92 0.92

4. Factor 0.17 0.46 0.68 0.69 0.69

Table: Correlation between estimated and true factors. The risk-premiumweight γ = −1 corresponds to PCA. N = 250 and T = 350.

A 28


Simulation


Consider only K = 1 factor

Sharpe-ratio =1, i.e. µF = σF

Different signal strength σ2F

Λi ∼ N(0, 1), i.e. normalized variance σ2F · N in weak factor model.

Same residuals as before

A 29


Simulation: Correlation, N=150, T=150

Correlation with true factor (In-sample)

2=0.3 2=0.5 2=0.8 2=1.00

0.2

0.4

0.6

0.8

1

=-1=0=10=50=100

True factor

Correlation with true factor (Out-of-sample)

2=0.3 2=0.5 2=0.8 2=1.00

0.2

0.4

0.6

0.8

1

Figure: Correlation of estimated with true factor.

⇒ For signal σ2 < 0.8 a value of γ ≥ 10 is optimal.

A 30


Simulation: Sharpe-ratio, N=150, T=150

SR (In-sample)

2=0.3 2=0.5 2=0.8 2=1.00

0.2

0.4

0.6

0.8

1

1.2

=-1=0=10=50=100

True factor

SR (Out-of-sample)

2=0.3 2=0.5 2=0.8 2=1.00

0.2

0.4

0.6

0.8

1

1.2



A 31


Simulation: Pricing Error, N=150, T=150

RMS (In-sample)

2=0.3 2=0.5 2=0.8 2=1.00

0.2

0.4

0.6

0.8

1

=-1=0=10=50=100

True factor

RMS (Out-of-sample)

2=0.3 2=0.5 2=0.8 2=1.00

0.2

0.4

0.6

0.8

1



A 32


Simulation: Idiosyncratic Variation, N=150, T=150


2=0.3 2=0.5 2=0.8 2=1.00

5

10

15

20

25

=-1=0=10=50=100

True factor


2=0.3 2=0.5 2=0.8 2=1.00

5

10

15

20

25


⇒ Unexplained variation similar for RP-PCA and PCA.

A 33




2=0.3 2=0.5 2=0.8 2=1.00

0.2

0.4

0.6

0.8

1

=-1=0=10=50=100

True factor


2=0.3 2=0.5 2=0.8 2=1.00

0.2

0.4

0.6

0.8

1



A 34




2=0.3 2=0.5 2=0.8 2=1.00

0.2

0.4

0.6

0.8

1

=-1=0=10=50=100

True factor


2=0.3 2=0.5 2=0.8 2=1.00

0.2

0.4

0.6

0.8

1



A 35




2=0.3 2=0.5 2=0.8 2=1.00

0.2

0.4

0.6

0.8

1

=-1=0=10=50=100

True factor


2=0.3 2=0.5 2=0.8 2=1.00

0.2

0.4

0.6

0.8

1



A 36




2=0.3 2=0.5 2=0.8 2=1.00

0.2

0.4

0.6

0.8

1

=-1=0=10=50=100

True factor


2=0.3 2=0.5 2=0.8 2=1.00

0.2

0.4

0.6

0.8

1



A 37


Simulation: Weak factor model prediction

0 0.5 1 1.5 20

0.5

1Statistical Model

PCARP-PCA ( =0)RP-PCA ( =10)RP-PCA ( =50)

0 0.5 1 1.5 20

0.5

1Monte-Carlo Simulation


Correlations between estimated and true factor based on the weak factor model

prediction and Monte-Carlo simulations for different variances of the factor.

The residuals have cross-sectional and time-series correlation and

heteroskedasticity. The Sharpe-ratio of the factor is 1. N = 250,T = 350. A 38


Simulation: Weak factor model prediction

0 0.5 1 1.5 20

0.5

1Statistical Model


0 0.5 1 1.5 20

0.5

1Monte-Carlo Simulation


Correlations between estimated and true factor based on the weak factor model

prediction and Monte-Carlo simulations for different variances of the factor.

The residuals have only cross-sectional correlation. The Sharpe-ratio of the

factor is 1. N = 250,T = 350. A 39


Weak Factor Model: Dependent residuals

0 5 10 15 20 25 30 350

0.2

0.4

0.6

0.8

1

dependent residualsi.i.d residuals

Figure: Values of ρ2i ( 1

1+θiB(θi ))if θi > σ2

crit and 0 otherwise) for different

signals θi . The average noise level is normalized in both cases to σ2e = 1

and c = 1. For the correlated residuals we assume that Σ1/2 is a Toeplitzmatrix with β, β, β on the right three off-diagonals with β = 0.7.N = 250,T = 350.

A 40


Weak Factor Model

Weak Factor Model

Weak factors either have a small variance or affect a smallerfraction of assets:


Statistical model: Spiked covariance models from random matrixtheory

Eigenvalues of sample covariance matrix separate into two areas:

The bulk, majority of eigenvaluesThe extremes, a few large outliers

Bulk spectrum converges to generalized Marchenko-Pasturdistribution (under certain conditions)

A 41


Weak Factor Model

Weak Factor Model

Large eigenvalues converge either to

A biased value characterized by the Stieltjes transform of thebulk spectrumTo the bulk of the spectrum if the true eigenvalue is belowsome critical threshold

⇒ Phase transition phenomena: estimated eigenvectorsorthogonal to true eigenvectors if eigenvalues too small

Onatski (2012): Weak factor model with phase transitionphenomena

Problem: All models in the literature assume that random processeshave mean zero

⇒ RP-PCA implicitly uses non-zero means of random variables

⇒ New tools necessary!A 42


Weak Factor Model

Assumption 1: Weak Factor Model

1 Rate: Assume that NT→ c with 0 < c <∞.

2 Factors: F are uncorrelated among each other and are independent of eand Λ and have bounded first two moments.

µF :=1

T

T∑t=1

Ftp→ µF ΣF :=

1

TFtF

>t

p→ ΣF =

σ2F1· · · 0

.... . .

...0 · · · σ2

FK

3 Loadings: The column vectors of the loadings Λ are orthogonally

invariant and independent of ε and F (e.g. Λi,k ∼ N(0, 1N

) and

Λ>Λ = IK

4 Residuals: e = εΣ with εt,i ∼ N(0, 1). The empirical eigenvaluedistribution function of Σ converges to a non-random spectral distributionfunction with compact support and supremum of support b. Largesteigenvalues of Σ converge to b. A 43


Weak Factor Model

Intuition: Weak Factor Model

“Signal” matrix for PCA of covariance matrix:

ΛΣFΛ>

K largest eigenvalues θPCA1 , ..., θPCAK measure strength of signal

“Signal” matrix for RP-PCA:


>F

)Λ>

K largest eigenvalues θRP-PCA1 , ..., θRP-PCA

K measure strength of signal

If µF 6= 0 and γ > −1 then RP-PCA signal always larger than PCAsignal:

θRP-PCAi > θPCAi

A 44


Weak Factor Model


Under Assumption 1 the correlation of the estimated with the true factors is

Corr(F , F )p→ U︸︷︷︸

rotation

ρ1 0 · · · 00 ρ2 · · · 0

0 0. . .

...0 · · · 0 ρK

V︸︷︷︸rotation

with

ρ2i

p→ 1

1+θiB(θi ))if θi > θcrit

0 otherwise

Critical value θcrit and function B(.) depend only on the noise distributionand are known in closed-form

Based on closed-form expression choose optimal RP-weight γ

For θi > θcrit ρ2i is strictly increasing in θi .

⇒ RP-PCA strictly dominates PCA A 45


Strong Factor Model

Strong Factor Model

Strong factors affect most assets: e.g. market factor

1N Λ>Λ bounded (after normalizing factor variances)

Statistical model: Bai and Ng (2002) and Bai (2003) framework

Factors and loadings can be estimated consistently and areasymptotically normal distributed

RP-PCA provides a more efficient estimator of the loadings

Assumptions essentially identical to Bai (2003)

A 46


Strong Factor Model

Asymptotic Distribution (up to rotation)

PCA under assumptions of Bai (2003): (up to rotation)

Asymptotically Λ behaves like OLS regression of F on X .Asymptotically F behaves like OLS regression of Λ on X>.

RP-PCA under slightly stronger assumptions as in Bai (2003):

Asymptotically Λ behaves like OLS regression of FW on XW

with W 2 =(IT + γ 11

>

T

)and 1 is a T × 1 vector of 1’s .

Asymptotically F behaves like OLS regression of Λ on X .

Asymptotic Efficiency

Choose RP-weight γ to obtain smallest asymptotic variance of estimators

RP-PCA (i.e. γ > −1) always more efficient than PCA

Optimal γ typically smaller than optimal value from weak factor model

RP-PCA and PCA are both consistentA 47


Simplified Strong Factor Model

Assumption 2: Simplified Strong Factor Model

1 Rate: Same as in Assumption 1

2 Factors: Same as in Assumption 1

3 Loadings: Λ>Λ/Np→ IK and all loadings are bounded.

4 Residuals: e = εΣ with εt,i ∼ N(0, 1). All elements and all row sums ofΣ are bounded.

A 48


Simplified Strong Factor Model

Proposition: Simplified Strong Factor Model

Assumption 2 holds. Then:

1 The factors and loadings can be estimated consistently.

2 The asymptotic distribution of the factors is not affected by γ.

3 The asymptotic distribution of the loadings is given by

√T(

Λi − Λi

)D→ N(0,Ωi )

Ωi =σ2ei

(ΣF + (1 + γ)µFµ

>F

)−1 (ΣF + (1 + γ)2µFµ

>F

)(ΣF + (1 + γ)µFµ

>F

)−1, E [e2

t,i ] = σ2ei

4 γ = 0 is optimal choice for smallest asymptotic variance.γ = −1, i.e. the covariance matrix, is not efficient.

A 49



Model with time-varying loadings

Observe panel of excess returns and L covariates Zi,t−1,l :

Xt,i = Ft1×K

> gK×1

(Zi,t−1,1, ...,Zi,t−1,L) + et,i

Loadings are function of L covariates Zi,t−1,l with l = 1, ..., Le.g. characteristics like size, book-to-market ratio, past returns, ...

Factors and loading function are latent

Idea: Similar to Projected PCA (Fan, Liao and Wang (2016)) andInstrumented PCA (Kelly, Pruitt, Su (2017)), but

we include the pricing error penaltyallow for general interactions between covariates

A 50



Projected RP-PCA (work in progress)

Approximate nonlinear function gk(.) by basis functions φm(.):

gk(Zi ,t−1) =M∑

m=1

bm,kφm(Zi,t−1) g(Zt−1)︸︷︷︸K×N

= B>︸︷︷︸K×M

Φ(Zt−1)︸︷︷︸M×N

Apply RP-PCA to projected data Xt = XtΦ(Zt−1)>

Xt = FtB>Φ(Zt−1)Φ(Zt−1)> + etΦ(Zt−1)> = FtB + et

Special case: φm = 1Zt−1∈Im ⇒ X characteristics sorted portfolios

Obtain arbitrary interactions and break curse of dimensionality byconditional tree sorting projection

Intuition: Projection creates M portfolios sorted on any functionalform and interaction of covariates Zt−1. A 51


Weak Factor Model

Assumption 1: Weak Factor Model

1 Residual matrix can be represented as e = εΣ with εt,i ∼ N(0, 1). Theempirical eigenvalue distribution function of Σ converges to anon-random spectral distribution function with compact support. Thesupremum of the support is b.

2 The factors F are uncorrelated among each other and are independent ofe and Λ and have bounded first two moments.

µF :=1

T

T∑t=1

Ftp→ µF ΣF :=

1

TFtF

>t

p→ ΣF =

σ2F1· · · 0

.... . .

...0 · · · σ2

FK

3 The column vectors of the loadings Λ are orthogonally invariant and

independent of ε and F (e.g. Λi,k ∼ N(0, 1N

) and

Λ>Λ = IK

4 Assume that NT→ c with 0 < c <∞. A 52


Weak Factor Model

Definition: Weak Factor Model

Average idiosyncratic noise σ2e := trace(Σ)/N

Denote by λ1 ≥ λ2 ≥ ... ≥ λN the ordered eigenvalues of 1Te>e. The

Cauchy transform (also called Stieltjes transform) of the eigenvalues isthe almost sure limit:

G(z) := a.s. limT→∞

1

N

N∑i=1

1

z − λi= a.s. lim

T→∞

1

Ntrace

((zIN −

1

Te>e)

)−1

B-function

B(z) :=a.s. limT→∞

c

N

N∑i=1

λi

(z − λi )2

=a.s. limT→∞

c

Ntrace

(((zIN −

1

Te>e)

)−2(1

Te>e

))

A 53


Weak Factor Model

Estimator

Risk-premium PCA (RP-PCA): Apply PCA estimation to

Sγ := 1TX>

(IT + γ 11

>

T

)X

PCA : Apply PCA to estimated covariance matrix

S−1 := 1TX>

(IT − 11

>

T

)X , i.e. γ = −1.

⇒ PCA special case of RP-PCA

“Signal” Matrix for Covariance PCA

MVar = ΣF + cσ2e IK =

σ2F1

+ cσ2e · · · 0

.... . .

...0 · · · σ2

FK+ cσ2

e

⇒ Intuition: Largest K “true” eigenvalues of S−1.

A 54


Weak Factor Model

Lemma: Covariance PCA

Assumption 1 holds. Define the critical value σ2crit = limz↓b

1G(z)

. The first K

largest eigenvalues λi of S−1 satisfy for i = 1, ...,K

λip→

G−1

(1

σ2Fi

+cσ2e

)if σ2

Fi+ cσ2

e > σ2crit

b otherwise

The correlation between the estimated and true factors converges to

Corr(F , F )p→

%1 · · · 0...

. . ....

0 · · · %K

with

%2i

p→

1

1+(σ2Fi

+cσ2e )B(λi ))

if σ2Fi

+ cσ2e > σ2

crit

0 otherwise

A 55


Weak Factor Model

Corollary: Covariance PCA for i.i.d. errors

Assumption 1 holds, c ≥ 1 and et,i i.i.d. N(0, σ2e ). The largest K eigenvalues

of S−1 have the following limiting values:

λip→

σ2Fi

+σ2e

σ2Fi

(c + 1 + σ2e ) if σ2

Fi+ cσ2

e > σ2crit ⇔ σ2

F >√cσ2

e

σ2e (1 +

√c)2 otherwise

The correlation between the estimated and true factors converges to

Corr(F , F )p→

%1 · · · 0...

. . ....

0 · · · %K

with

%2i

p→

1− cσ4

eσ4Fi

1+cσ2

eσ2Fi

+σ4e

σ4Fi

(c2−c)if σ2

Fi+ cσ2

e > σ2crit

0 otherwise A 56


Weak Factor Model

“Signal” Matrix for RP-PCA

“Signal” Matrix for RP-PCA

MRP =

(ΣF + cσ2

e Σ1/2F µF (1 + γ)

µ>F Σ1/2F (1 + γ) (1 + γ)(µ>F µF + cσ2

2)

)

Define γ =√γ + 1− 1 and note that (1 + γ)2 = 1 + γ.

⇒ Projection on K demeaned factors and on mean operator.

Denote by θ1 ≥ ... ≥ θK+1 the eigenvalues of the “signal matrix” MRP

and by U the corresponding orthonormal eigenvectors :

U>MRP U =

θ1 · · · 0...

. . ....

0 · · · θK+1

⇒ Intuition: θ1, ..., θK+1 largest K + 1 “true” eigenvalues of Sγ .

A 57


Weak Factor Model


Assumption 1 holds. The first K largest eigenvalues θi i = 1, ...,K of Sγ satisfy

θip→

G−1

(1θi

)if θi > σ2

crit = limz↓b1

G(z)

b otherwise

The correlation of the estimated with the true factors converges to

Corr(F , F )p→(IK 0

)U︸︷︷︸

rotation

ρ1 0 · · · 00 ρ2 · · · 0

0 0. . .

...0 · · · 0 ρK0 · · · 0

D1/2K Σ

−1/2

F︸︷︷︸rotation

with

ρ2i

p→

1

1+θiB(θi ))if θi > σ2

crit

0 otherwiseA 58


Weak Factor Model

Theorem 1: continued

ΣF =D1/2K

(ρ1 · · · 0...

. . ....

0 · · · ρK0 · · · 0

>

U>(IK 00 0

)U

ρ1 · · · 0...

. . ....

0 · · · ρK0 · · · 0

+

1− ρ21 · · · 0

.... . .

...0 · · · 1− ρ2

K

)D1/2K

DK =diag((θ1 · · · θK

))

A 59


Weak Factor Model

Lemma: Detection of weak factors

If γ > −1 and µF 6= 0, then the first K eigenvalues of MRP are strictly largerthan the first K eigenvalues of MVar , i.e.

θi > σ2Fi + cσ2

e

For θi > σ2crit it holds that

∂θi∂θi

> 0∂ρi∂θi

> 0 i = 1, ...,K

Thus, if γ > −1 and µF 6= 0, then ρi > %i .

⇒ For µF 6= 0 RP-PCA always better than PCA.

A 60


Weak Factor Model

Example: One-factor model

Assume that there is only one factor, i.e. K = 1. The “signal matrix” MRP

simplifies to

MRP =

(σ2F + cσ2

e σFµ(1 + γ)µσF (1 + γ) (µ2 + cσ2

e )(1 + γ)

)and has the eigenvalues:

θ1,2 =1

2σ2F + cσ2

e + (µ2 + cσ2e )(1 + γ)

± 1

2

√(σ2

F + cσ2e + (µ2 + cσ2

e )(1 + γ))2 − 4(1 + γ)cσ2e (σ2

F + µ2 + cσ2e )

The eigenvector of first eigenvalue θ1 has the components

U1,1 =µσF (1 + γ)√

(θ1 − (σ2F + cσ2

e ))2 + µ2σ2F (1 + γ)

U1,2 =θ1 − σ2

F + cσ2e√

(θ1 − (σ2F + cσ2

e ))2 + µ2σ2F (1 + γ) A 61


Weak Factor Model

Corollary: One-factor model

The correlation between the estimated and true factor has the following limit:

Corr(F , F )p→ ρ1√

ρ21 + (1− ρ2

1)(θ1−(σ2

F+cσ2

e ))2+1

µ2σ2F

(1+γ)

A 62


Strong Factor Model

Strong Factor Model

Strong factors affect most assets: e.g. market factor1N


Statistical model: Bai and Ng (2002) and Bai (2003) framework

Factors and loadings can be estimated consistently and are asymptoticallynormal distributed

RP-PCA provides a more efficient estimator of the loadings

Assumptions essentially identical to Bai (2003)

A 63


Strong Factor Model

Asymptotic Distribution (up to rotation)

PCA under assumptions of Bai (2003):

Asymptotically Λ behaves like OLS regression of F on X .Asymptotically F behaves like OLS regression of Λ on X .

RP-PCA under slightly stronger assumptions as in Bai (2003):

Asymptotically Λ behaves like OLS regression of FW on XW

with W 2 =(IT + γ 11

>

T

).

Asymptotically F behaves like OLS regression of Λ on X .

Asymptotic Expansion

Asymptotic expansions (under slightly stronger assumptions as in Bai (2003)):

1√T(H>Λi − Λi

)=(

1TF>W 2F

)−1 1√TF>W 2ei + Op

(√TN

)+ op(1)

2√N(H>−1

Ft − Ft

)=(

1N

Λ>Λ)−1 1√

NΛ>e>t + Op

(√N

T

)+ op(1)

with known rotation matrix H. A 64


Strong Factor Model

Assumption 2: Strong Factor Model

Assume the same assumptions as in Bai (2003) (Assumption A-G) hold and inaddition (

1√T

∑Tt=1 Ftet,i

1√T

∑Tt=1 et,i

)D→ N(0,Ω) Ω =

(Ω1,1 Ω1,2

Ω2,1 Ω2,2

)

A 65


Strong Factor Model

Theorem 2: Strong Factor Model

Assumption 2 holds and γ ∈ [−1,∞). Then:

For any choice of γ the factors, loadings and common components can beestimated consistently pointwise.

If√

TN→ 0 then

√T(H>Λi − Λi

)D→ N(0,Φ)

Φ =(

ΣF + (γ + 1)µFµ>F

)−1 (Ω1,1 + γµFΩ2,1 + γΩ1,2µF + γ2µFΩ2,2µF

)·(

ΣF + (γ + 1)µFµ>F

)−1

For γ = −1 this simplifies to the conventional case Σ−1F Ω1,1Σ−1

F .

If√

NT→ 0 then the asymptotic distribution of the factors is not affected

by the choice of γ.

The asymptotic distribution of the common component depends on γ ifand only if N

Tdoes not go to zero. For T

N→ 0

√T(Ct,i − Ct,i

)D→ N

(0,F>t ΦFt

)A 66

Documents

Factor Investing using Penalized Principal Components · 08.02.2018 · 2 Uses information from large panel data sets: Many assets with many time observations 3 Searches for factors