Cleaning correlation matrices, Random Matrix Theory & HCIZ integrals J.P Bouchaud with: M. Potters, L. Laloux, R. Allez, J. Bun, S. Majumdar http://www.cfm.fr



  • Portfolio theory: Basics

    • Portfolio weights $w_i$, asset returns $X_i^t$

    • If the expected (predicted) gains are $g_i$, then the expected gain of the portfolio is

    $$G = \sum_i w_i g_i$$

    • Let risk be defined as the variance of the portfolio returns (maybe not a good definition!):

    $$R^2 = \sum_{ij} w_i \sigma_i C_{ij} \sigma_j w_j$$

    where $\sigma_i^2$ is the variance of asset $i$, and $C_{ij}$ is the correlation matrix.

    J.-P. Bouchaud

  • Markowitz Optimization

    • Find the portfolio with maximum expected return for a given risk or, equivalently, minimum risk for a given return ($G$)

    • In matrix notation:

    $$w_C = G \, \frac{C^{-1} g}{g^T C^{-1} g}$$

    where all gains are measured with respect to the risk-free rate and $\sigma_i = 1$ (absorbed in $g_i$).

    • Note: in the presence of non-linear constraints, e.g. $\sum_i |w_i| \le A$, this becomes a “spin-glass” problem! (see [JPB, Galluccio, Potters])
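
The matrix formula for $w_C$ can be checked numerically. The following is a minimal sketch, assuming NumPy; the function name `markowitz_weights` and the three-asset numbers are made up for illustration and are not from the talk.

```python
import numpy as np

def markowitz_weights(C, g, G=1.0):
    """Return w_C = G * C^{-1} g / (g^T C^{-1} g) for correlation matrix C and gains g."""
    C = np.asarray(C, dtype=float)
    g = np.asarray(g, dtype=float)
    Cinv_g = np.linalg.solve(C, g)      # C^{-1} g without forming the inverse explicitly
    return G * Cinv_g / (g @ Cinv_g)

# Toy inputs (hypothetical): three assets with unit volatility.
C = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
g = np.array([0.10, 0.05, 0.08])

w = markowitz_weights(C, g, G=1.0)
expected_gain = w @ g                   # equals the target G by construction
risk2 = w @ C @ w                       # equals G^2 / (g^T C^{-1} g)
```

Using `np.linalg.solve` rather than an explicit inverse is a numerical convenience only; it does not change the formula.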

  • Markowitz Optimization

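    • More explicitly, with $\lambda_\alpha$ and $\Psi_\alpha$ the eigenvalues and eigenvectors of $C$:

    $$w \propto \sum_\alpha \lambda_\alpha^{-1} (\Psi_\alpha \cdot g)\, \Psi_\alpha = g + \sum_\alpha (\lambda_\alpha^{-1} - 1)(\Psi_\alpha \cdot g)\, \Psi_\alpha$$

    • Compared to the naive allocation $w \propto g$:

    – Eigenvectors with $\lambda \gg 1$ are projected out

    – Eigenvectors with $\lambda \ll 1$ are overallocated

    • Very important for “stat. arb.” strategies (for example)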
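
The eigen-expansion of the optimal weights can be verified against a direct solve of $C w = g$. This is an illustrative sketch, assuming NumPy; the helper names and the toy matrix are hypothetical.

```python
import numpy as np

def markowitz_direction_eig(C, g):
    """Sum over eigenmodes: sum_alpha lambda_alpha^{-1} (Psi_alpha . g) Psi_alpha."""
    lam, Psi = np.linalg.eigh(C)        # columns of Psi are the eigenvectors
    proj = Psi.T @ g                    # projections (Psi_alpha . g)
    return Psi @ (proj / lam)

def correction_to_naive(C, g):
    """Second form: the term added to the naive allocation g."""
    lam, Psi = np.linalg.eigh(C)
    proj = Psi.T @ g
    return Psi @ ((1.0 / lam - 1.0) * proj)

# Toy inputs (hypothetical numbers).
C = np.array([[1.0, 0.6, 0.6],
              [0.6, 1.0, 0.6],
              [0.6, 0.6, 1.0]])
g = np.array([0.3, 0.1, 0.2])

direct = np.linalg.solve(C, g)          # C^{-1} g
via_modes = markowitz_direction_eig(C, g)
via_correction = g + correction_to_naive(C, g)
```

In this toy matrix the large eigenvalue belongs to the uniform mode, so the component of `g` along $(1,1,1)$ is strongly reduced, which is the "projected out" effect in the bullet list.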

  • Empirical Correlation Matrix

    • Before inverting them, how should one estimate/clean correlation matrices?

    • Empirical equal-time correlation matrix $E$:

    $$E_{ij} = \frac{1}{T} \sum_t \frac{X_i^t X_j^t}{\sigma_i \sigma_j}$$

    Order $N^2$ quantities estimated with $NT$ data points.

    When $T < N$, $E$ is not even invertible.

    Typically: $N = 500$–$2000$; $T = 500$–$2500$ days (10 years; beware of high frequencies) $\longrightarrow q := N/T = O(1)$
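
A short sketch of the estimator, and of why $T < N$ is a problem, assuming NumPy and zero-mean returns as in the formula above. The function name `empirical_correlation` and the simulated sizes are illustrative choices.

```python
import numpy as np

def empirical_correlation(X):
    """X has shape (T, N): T observations of N zero-mean returns.

    Returns E_ij = (1/T) sum_t X_ti X_tj / (sigma_i sigma_j),
    with sigma_i the sample root-mean-square of asset i.
    """
    X = np.asarray(X, dtype=float)
    T = X.shape[0]
    sigma = np.sqrt((X ** 2).mean(axis=0))
    Z = X / sigma
    return Z.T @ Z / T

rng = np.random.default_rng(0)
N = 50
E_long = empirical_correlation(rng.standard_normal((200, N)))   # T > N
E_short = empirical_correlation(rng.standard_normal((30, N)))   # T < N

rank_long = np.linalg.matrix_rank(E_long)     # full rank, E is invertible
rank_short = np.linalg.matrix_rank(E_short)   # at most T, so E is singular
```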

  • Risk of Optimized Portfolios

    • “In-sample” risk (for $G = 1$):

    $$R_{\rm in}^2 = w_E^T E w_E = \frac{1}{g^T E^{-1} g}$$

    • True minimal risk:

    $$R_{\rm true}^2 = w_C^T C w_C = \frac{1}{g^T C^{-1} g}$$

    • “Out-of-sample” risk:

    $$R_{\rm out}^2 = w_E^T C w_E = \frac{g^T E^{-1} C E^{-1} g}{(g^T E^{-1} g)^2}$$

  • Risk of Optimized Portfolios

    • Let $E$ be a noisy, unbiased estimator of $C$. Using convexity arguments, and for large matrices:

    $$R_{\rm in}^2 \le R_{\rm true}^2 \le R_{\rm out}^2$$

    • In fact, using RMT: $R_{\rm out}^2 = R_{\rm true}^2 (1-q)^{-1} = R_{\rm in}^2 (1-q)^{-2}$, independent of $C$! (For large $N$)

    • If $C$ has some time dependence (beyond observation noise), one expects an even worse underestimation
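
The $(1-q)$ factors can be seen in a small simulation. The sketch below assumes NumPy, takes $C = I$ and equal gains $g_i = 1$ purely for convenience, and uses Gaussian returns; none of these choices come from the talk, and at finite $N$ the ratios only approximate the large-$N$ prediction.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 200, 800
q = N / T                                # q = 0.25

C = np.eye(N)                            # assumption: true correlation matrix is the identity
g = np.ones(N)                           # assumption: equal predicted gains

X = rng.standard_normal((T, N))          # Gaussian returns with covariance C
E = X.T @ X / T                          # noisy, unbiased estimator of C

Einv_g = np.linalg.solve(E, g)
Cinv_g = np.linalg.solve(C, g)

R2_in = 1.0 / (g @ Einv_g)                               # in-sample risk
R2_true = 1.0 / (g @ Cinv_g)                             # true minimal risk
R2_out = (Einv_g @ C @ Einv_g) / (g @ Einv_g) ** 2       # out-of-sample risk

ratio_in = R2_in / R2_true               # RMT prediction: 1 - q
ratio_out = R2_out / R2_true             # RMT prediction: 1 / (1 - q)
```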

  • In Sample vs. Out of Sample

    [Figure: return vs. risk. Curves: raw in-sample, cleaned in-sample, cleaned out-of-sample, raw out-of-sample.]

  • Rotational invariance hypothesis (RIH)

    • In the absence of any cogent prior on the eigenvectors, one can assume that $C$ is a member of a Rotationally Invariant Ensemble (“RIH”)

    • Surely not true for the “market mode” $\vec{v}_1 \approx (1, 1, \ldots, 1)/\sqrt{N}$, with $\lambda_1 \approx N\rho$, but OK in the bulk (see below)

    A more plausible assumption: factor model → hierarchical, block-diagonal $C$’s (“Parisi matrices”)

    • “Cleaning” $E$ within RIH: keep the eigenvectors, play with eigenvalues

    → The simplest, classical scheme, shrinkage:

    $$C = (1-\alpha) E + \alpha I \;\to\; \hat\lambda_C = (1-\alpha)\lambda_E + \alpha, \quad \alpha \in [0, 1]$$
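
Linear shrinkage leaves the eigenvectors of $E$ untouched and moves every eigenvalue toward 1. A minimal sketch, assuming NumPy; the function name and test matrix are illustrative.

```python
import numpy as np

def linear_shrinkage(E, alpha):
    """Return (1 - alpha) * E + alpha * I for alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    E = np.asarray(E, dtype=float)
    return (1.0 - alpha) * E + alpha * np.eye(E.shape[0])

# Illustrative input: an empirical covariance built from random data.
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 20))
E = X.T @ X / 60

alpha = 0.4
lam_E = np.linalg.eigvalsh(E)                               # ascending order
lam_shrunk = np.linalg.eigvalsh(linear_shrinkage(E, alpha))
lam_expected = (1.0 - alpha) * lam_E + alpha                # eigenvalue map from the slide
```

Choosing `alpha` is the whole difficulty in practice; this sketch treats it as a given input.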

  • RMT: from ρC(λ) to ρE(λ)

    • Solution using different techniques (replicas, diagrams, free matrices) gives the resolvent $G_E(z) = N^{-1}\, \mathrm{Tr}\, (zI - E)^{-1}$ as:

    $$G_E(z) = \int d\lambda\, \rho_C(\lambda)\, \frac{1}{z - \lambda(1 - q + q z G_E(z))}$$

    Note: One should work from $\rho_C \longrightarrow G_E$

    • Example 1: $C = I$ (null hypothesis) → Marcenko-Pastur [67]

    $$\rho_E(\lambda) = \frac{\sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)}}{2\pi q \lambda}, \quad \lambda \in [(1-\sqrt{q})^2, (1+\sqrt{q})^2]$$

    • Suggests a second cleaning scheme (eigenvalue clipping, [Laloux et al. 1997]): any eigenvalue beyond the Marcenko-Pastur edge can be trusted, the rest is noise.
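
The Marcenko-Pastur density and its edges are easy to evaluate and compare with a simulated null case. The sketch below assumes NumPy and $q < 1$; the function names `mp_edges` and `mp_density` and the matrix sizes are illustrative.

```python
import numpy as np

def mp_edges(q):
    """Return (lambda_minus, lambda_plus) for aspect ratio q = N/T < 1."""
    return (1.0 - np.sqrt(q)) ** 2, (1.0 + np.sqrt(q)) ** 2

def mp_density(lam, q):
    """Marcenko-Pastur density for C = I, evaluated at lam > 0 (zero outside the edges)."""
    lam = np.asarray(lam, dtype=float)
    lam_minus, lam_plus = mp_edges(q)
    inside = np.clip((lam_plus - lam) * (lam - lam_minus), 0.0, None)
    return np.sqrt(inside) / (2.0 * np.pi * q * lam)

# Null hypothesis: uncorrelated Gaussian returns.
rng = np.random.default_rng(0)
N, T = 300, 1200
q = N / T
X = rng.standard_normal((T, N))
eigs = np.linalg.eigvalsh(X.T @ X / T)

lam_minus, lam_plus = mp_edges(q)
grid = np.linspace(lam_minus, lam_plus, 20001)
rho = mp_density(grid, q)
total_mass = np.sum(0.5 * (rho[1:] + rho[:-1]) * np.diff(grid))   # trapezoid rule
```

At finite $N$ the extreme empirical eigenvalues sit close to, but not exactly on, the theoretical edges.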

  • Eigenvalue clipping

    Eigenvalues $\lambda < \lambda_+$ are replaced by a single common value, chosen so as to preserve $\mathrm{Tr}\, C = N$.
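
A possible implementation of clipping, assuming NumPy: eigenvalues below the Marcenko-Pastur edge are replaced by their mean, which keeps the trace fixed. The function name `clip_eigenvalues` and the one-factor test data are hypothetical, and the diagonal of the result is not renormalized to 1 here.

```python
import numpy as np

def clip_eigenvalues(E, q):
    """Replace all eigenvalues of E below lambda_plus = (1 + sqrt(q))^2 by their average."""
    lam, V = np.linalg.eigh(E)
    lam_plus = (1.0 + np.sqrt(q)) ** 2
    noise = lam < lam_plus
    cleaned = lam.copy()
    if noise.any():
        cleaned[noise] = lam[noise].mean()   # a single value that preserves the trace
    return (V * cleaned) @ V.T               # V diag(cleaned) V^T

# Illustrative data: one common "market" factor plus idiosyncratic noise.
rng = np.random.default_rng(2)
N, T = 100, 400
rho = 0.3
factor = rng.standard_normal((T, 1))
X = np.sqrt(rho) * factor + np.sqrt(1.0 - rho) * rng.standard_normal((T, N))
E = np.corrcoef(X, rowvar=False)

E_clipped = clip_eigenvalues(E, q=N / T)
lam_raw = np.linalg.eigvalsh(E)
lam_clipped = np.linalg.eigvalsh(E_clipped)
```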

  • RMT: from ρC(λ) to ρE(λ)

    • Example 2: Power-law spectrum (motivated by data)

    $$\rho_C(\lambda) = \frac{\mu A}{(\lambda - \lambda_0)^{1+\mu}}\, \Theta(\lambda - \lambda_{\min})$$

    • Suggests a third cleaning scheme (eigenvalue substitution, Potters et al. 2009, El Karoui 2010): $\lambda_E$ is replaced by the theoretical $\lambda_C$ with the same rank $k$

  • Empirical Correlation Matrix

    [Figure: eigenvalue density $\rho(\lambda)$ vs. $\lambda$. Curves: data, dressed power law ($\mu = 2$), raw power law ($\mu = 2$), Marcenko-Pastur. Inset: $\kappa$ vs. rank.]

    MP and generalized MP fits of the spectrum

  • Eigenvalue cleaning

    [Figure: $R^2$ vs. $\alpha$. Curves: classical shrinkage, Ledoit-Wolf shrinkage, power-law substitution, eigenvalue clipping.]

    Out-of-sample risk for different 1-parameter cleaning schemes

  • A RIH Bayesian approach

    • All the above schemes lack a rigorous framework and are at best ad-hoc recipes

    • A Bayesian framework: suppose $C$ belongs to a RIE, with $P(C)$, and assume Gaussian returns. Then one needs:

    $$\langle C \rangle|_{X_i^t} = \int \mathcal{D}C \; C \, P(C|\{X_i^t\})$$

    with

    $$P(C|\{X_i^t\}) = Z^{-1} \exp\left[-N\, \mathrm{Tr}\, V(C, \{X_i^t\})\right];$$

    where (Bayes):

    $$V(C, \{X_i^t\}) = \frac{1}{2q}\left[\log C + E C^{-1}\right] + V_0(C)$$

  • A Bayesian approach: a fully soluble case

    • $V_0(C) = (1 + b) \ln C + b\, C^{-1}$, $b > 0$: “Inverse Wishart”

    • $\rho_C(\lambda) \propto \dfrac{\sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)}}{\lambda^2}$; $\lambda_\pm = \left(1 + b \pm \sqrt{(1 + b)^2 - b^2/4}\right)/b$

    • In this case, the matrix integral can be done, leading exactly to the “Shrinkage” recipe, with $\alpha = f(b, q)$

    • Note that $b$ can be determined from the empirical spectrum of $E$, using the generalized MP formula

  • The general case: HCIZ integrals

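    • A Coulomb gas approach: integrate over the orthogonal group, $C = O \Lambda O^\dagger$, where $\Lambda$ is diagonal.

    $$\int \mathcal{D}O \, \exp\left[-\frac{N}{2q}\, \mathrm{Tr}\left[\log \Lambda + E O^\dagger \Lambda^{-1} O + 2q V_0(\Lambda)\right]\right]$$

    • Can one obtain a large $N$ estimate of the HCIZ integral

    $$F(\rho_A, \rho_B) = \lim_{N \to \infty} N^{-2} \ln \int \mathcal{D}O \, \exp\left[\frac{N}{2q}\, \mathrm{Tr}\, A O^\dagger B O\right]$$

    in terms of the spectrum of $A$ and $B$?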

  • The general case: HCIZ integrals

    • When $A$ (or $B$) is of finite rank, such a formula exists in terms of the R-transform of $B$ [Marinari, Parisi & Ritort, 1995].

    • When the ranks of $A$ and $B$ are of order $N$, there is a formula due to Matytsin [94] (in the unitary case), later shown rigorously by Zeitouni & Guionnet, but its derivation is quite obscure...

  • An instanton approach to large N HCIZ

    • Consider Dyson’s Brownian motion matrices. The eigenvalues obey:

    $$dx_i = \sqrt{\frac{2}{\beta N}}\, dW + \frac{1}{N}\, dt \sum_{j \ne i} \frac{1}{x_i - x_j},$$

    • Constrain $x_i(t = 0) = \lambda_i^A$ and $x_i(t = 1) = \lambda_i^B$. The probability of such a path is given by a large deviation/instanton formula, with:

    $$\frac{d^2 x_i}{dt^2} = -\frac{2}{N^2} \sum_{\ell \ne i} \frac{1}{(x_i - x_\ell)^3}.$$

  • An instanton approach to large N HCIZ

    • This can be interpreted as the motion of particles interacting through an attractive two-body potential $\phi(r) = -(Nr)^{-2}$. Using the virial formula, one finally gets Matytsin’s equations:

    $$\partial_t \rho + \partial_x[\rho v] = 0, \qquad \partial_t v + v\, \partial_x v = \pi^2 \rho\, \partial_x \rho.$$

  • An instanton approach to large N HCIZ

    • Finally, the “action” associated to these trajectories is:

    $$S \approx \frac{1}{2} \int dx\, \rho \left[v^2 + \frac{\pi^2}{3} \rho^2\right] - \frac{1}{2} \left[\int dx\, dy\, \rho_Z(x) \rho_Z(y) \ln|x - y|\right]_{Z=A}^{Z=B}$$

    • Now, the link with HCIZ comes from noticing that the propagator of the Brownian motion in matrix space is:

    $$P(B|A) \propto \exp\left[-\frac{N}{2}\, \mathrm{Tr}(A - B)^2\right] = \exp\left[-\frac{N}{2}\left(\mathrm{Tr} A^2 + \mathrm{Tr} B^2 - 2\, \mathrm{Tr}\, A O B O^\dagger\right)\right]$$

    Disregarding the eigenvectors of $B$ (i.e. integrating over $O$) leads to another expression for $P(\lambda_i^B | \lambda_j^A)$ in terms of HCIZ that can be compared to the one using instantons

    • The final result for $F(\rho_A, \rho_B)$ is exactly Matytsin’s expression, up to details (!)

  • Back to eigenvalue cleaning...

    • Estimating HCIZ at large $N$ is only the first step, but...

    • ...one still needs to apply it to $B = C^{-1}$, $A = E = X^\dagger C X$ and to compute also correlation functions such as $\langle O_{ij}^2 \rangle_{E \to C^{-1}}$ with the HCIZ weight

    • As we were working on this we discovered the work of Ledoit-Péché that solves the problem exactly using tools from RMT...

  • The Ledoit-Péché “magic formula”

    • The Ledoit-Péché [2011] formula is a non-linear shrinkage, given by:

    $$\hat\lambda_C = \frac{\lambda_E}{\left|1 - q + q \lambda_E \lim_{\epsilon \to 0} G_E(\lambda_E - i\epsilon)\right|^2}.$$

    • Note 1: Independent of $C$: only $G_E$ is needed (and is observable)!

    • Note 2: When applied to the case where $C$ is inverse Wishart, this gives again the linear shrinkage

    • Note 3: Still to be done: reobtain these results using the HCIZ route (many interesting intermediate results to hope for!)
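
The formula above can be tried on simulated data. This is a rough sketch, assuming NumPy and the convention $G_E(z) = N^{-1} \sum_k 1/(z - \lambda_k)$. The limit $\epsilon \to 0$ is replaced by a small finite width `eta` (default $N^{-1/2}$), which is an assumption of this sketch rather than part of the formula, and it biases the result near the spectrum edges. The function name is hypothetical.

```python
import numpy as np

def ledoit_peche_eigenvalues(lam_E, q, eta=None):
    """Non-linear shrinkage: lam / |1 - q + q * lam * G_E(lam - i*eta)|^2."""
    lam_E = np.asarray(lam_E, dtype=float)
    N = lam_E.size
    if eta is None:
        eta = N ** -0.5                     # assumption: finite smoothing width
    z = lam_E - 1j * eta
    # Discretized resolvent G_E(z_i) = (1/N) sum_k 1 / (z_i - lambda_k)
    G = np.mean(1.0 / (z[:, None] - lam_E[None, :]), axis=1)
    return lam_E / np.abs(1.0 - q + q * lam_E * G) ** 2

# Null case C = I: the cleaned eigenvalues should all be pulled back toward 1.
rng = np.random.default_rng(0)
N, T = 400, 1600
q = N / T
X = rng.standard_normal((T, N))
lam_E = np.linalg.eigvalsh(X.T @ X / T)
lam_clean = ledoit_peche_eigenvalues(lam_E, q)
```

For $C = I$ the exact formula returns 1 for every eigenvalue; the residual spread seen here comes from the finite `eta` and finite $N$.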

  • Eigenvalue cleaning: Ledoit-Péché

    Fit of the empirical distribution with $V_0'(z) = a/z + b/z^2 + c/z^3$.

  • What about eigenvectors?

    • Up to now, most results using RMT focus on eigenvalues

    • What about eigenvectors? What natural null hypothesis beyond RIH?

    • Are eigenvalues/eigen-directions stable in time?

    • Important source of risk for market/sector neutral portfolios: a sudden/gradual rotation of the top eigenvectors!

    • ...a little movie...

  • What about eigenvectors?

    • Correlation matrices need a certain time $T$ to be measured

    • Even if the “true” $C$ is fixed, its empirical determination fluctuates:

    $$E_t = C + \text{noise}$$

    • What is the dynamics of the empirical eigenvectors induced by measurement noise?

    • Can one detect a genuine evolution of these eigenvectors beyond noise effects?

  • What about eigenvectors?

    • More generally, can one say something about the eigenvectors of randomly perturbed matrices:

    $$H = H_0 + \epsilon H_1$$

    where $H_0$ is deterministic or random (e.g. GOE) and $H_1$ random.

  • Eigenvectors exchange

    • An issue: upon pseudo-collisions of eigenvalues, eigenvectors exchange

    • Example: $2 \times 2$ matrices

    $$H_{11} = a, \quad H_{22} = a + \epsilon, \quad H_{21} = H_{12} = c \;\longrightarrow\; \lambda_\pm \approx_{\epsilon \to 0} a + \frac{\epsilon}{2} \pm \sqrt{c^2 + \frac{\epsilon^2}{4}}$$

    • Let $c$ vary: quasi-crossing for $c \to 0$, with an exchange of the top eigenvector: $(1, -1) \to (1, 1)$

    • For large matrices, these exchanges are extremely numerous → labelling problem
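
The $2 \times 2$ example can be reproduced directly. A minimal sketch, assuming NumPy; the function name and the values of `a`, `eps` and `c` are arbitrary illustrative choices.

```python
import numpy as np

def spectrum_2x2(a, eps, c):
    """Eigenvalues (ascending) and top eigenvector of [[a, c], [c, a + eps]]."""
    H = np.array([[a, c], [c, a + eps]], dtype=float)
    lam, V = np.linalg.eigh(H)
    top = V[:, -1]
    if top[0] < 0:                      # fix the overall sign for readability
        top = -top
    return lam, top

a, eps = 1.0, 1e-3
lam_neg, top_neg = spectrum_2x2(a, eps, c=-0.1)   # before the quasi-crossing
lam_pos, top_pos = spectrum_2x2(a, eps, c=+0.1)   # after the quasi-crossing

# Closed form from the slide, evaluated at c = 0.1
lam_formula = a + eps / 2 + np.array([-1.0, 1.0]) * np.sqrt(0.1 ** 2 + eps ** 2 / 4)

overlap_neg = abs(top_neg @ np.array([1.0, -1.0]) / np.sqrt(2))   # close to 1
overlap_pos = abs(top_pos @ np.array([1.0, 1.0]) / np.sqrt(2))    # close to 1
```

The eigenvalues are nearly identical for `c = -0.1` and `c = +0.1`, while the top eigenvector has switched from the $(1,-1)$ to the $(1,1)$ direction.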

  • Subspace stability

    • An idea: follow the subspace spanned by $P$ eigenvectors:

    $$|\psi_{k+1}\rangle, |\psi_{k+2}\rangle, \ldots, |\psi_{k+P}\rangle \longrightarrow |\psi'_{k+1}\rangle, |\psi'_{k+2}\rangle, \ldots, |\psi'_{k+P}\rangle$$

    • Form the $P \times P$ matrix of scalar products:

    $$G_{ij} = \langle \psi_{k+i} | \psi'_{k+j} \rangle$$

    • The determinant of this matrix is insensitive to label permutations and is a measure of the overlap between the two $P$-dimensional subspaces

    – $D = -\frac{1}{P} \ln |\det G|$ is a measure of how well the first subspace can be approximated by the second
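
The determinant-based distance is straightforward to compute. The sketch below assumes NumPy and uses a random symmetric matrix with a random perturbation as stand-in data; the function name `subspace_distance` and the value $\epsilon = 0.1$ are illustrative.

```python
import numpy as np

def subspace_distance(Psi, Psi_prime):
    """D = -(1/P) ln |det G| with G = Psi^T Psi_prime; both inputs are (N, P), orthonormal columns."""
    G = Psi.T @ Psi_prime
    P = G.shape[0]
    _, logabsdet = np.linalg.slogdet(G)     # numerically stable log |det G|
    return -logabsdet / P

rng = np.random.default_rng(4)
N, P = 100, 5
A = rng.standard_normal((N, N))
B = rng.standard_normal((N, N))
H0 = (A + A.T) / np.sqrt(2 * N)             # random symmetric matrix
H1 = (B + B.T) / np.sqrt(2 * N)             # random symmetric perturbation

_, V0 = np.linalg.eigh(H0)
_, V1 = np.linalg.eigh(H0 + 0.1 * H1)
Psi = V0[:, -P:]                            # top P eigenvectors before perturbation
Psi_prime = V1[:, -P:]                      # top P eigenvectors after perturbation

D_noise = subspace_distance(Psi, Psi_prime)
D_relabel = subspace_distance(Psi, -Psi[:, ::-1])   # same subspace, permuted and sign-flipped
```

`D_relabel` vanishes up to rounding, which illustrates the insensitivity to relabelling; `D_noise` is strictly positive.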

  • Intermezzo

    • Non-equal-time correlation matrices

    $$E_{ij}^\tau = \frac{1}{T} \sum_t \frac{X_i^t X_j^{t+\tau}}{\sigma_i \sigma_j}$$

    $N \times N$ but not symmetrical: ‘leader-lagger’ relations

    • General rectangular correlation matrices

    $$G_{\alpha i} = \frac{1}{T} \sum_{t=1}^{T} Y_\alpha^t X_i^t$$

    $N$ ‘input’ factors $X$; $M$ ‘output’ factors $Y$

    – Example: $Y_\alpha^t = X_j^{t+\tau}$, $N = M$

  • Intermezzo: Singular values

    • Singular values: square roots of the non-zero eigenvalues of $G G^T$ or $G^T G$, with associated eigenvectors $u_\alpha^k$ and $v_i^k$ → $1 \ge s_1 > s_2 > \ldots > s_{(M,N)^-} \ge 0$, where $(M,N)^- = \min(M, N)$

    • Interpretation: $k = 1$: best linear combination of input variables with weights $v_i^1$, to optimally predict the linear combination of output variables with weights $u_\alpha^1$, with a cross-correlation $= s_1$.

    • $s_1$: measure of the predictive power of the set of $X$s with respect to the $Y$s

    • Other singular values: orthogonal, less predictive, linear combinations

  • Intermezzo: Benchmark

    • Null hypothesis: no correlations between $X$s and $Y$s:

    $$G_{\rm true} \equiv 0$$

    • But arbitrary correlations among $X$s, $C_X$, and $Y$s, $C_Y$, are possible

    • Consider exact normalized principal components for the sample variables $X$s and $Y$s:

    $$\hat X_i^t = \frac{1}{\sqrt{\lambda_i}} \sum_j U_{ij} X_j^t; \quad \hat Y_\alpha^t = \ldots$$

    and define $\hat G = \hat Y \hat X^T$.

  • Intermezzo: Random SVD

    • Final result ([Wachter] (1980); [Laloux, Miceli, Potters, JPB]):

    $$\rho(s) = (m + n - 1)^+ \delta(s - 1) + \frac{\sqrt{(s^2 - \gamma_-)(\gamma_+ - s^2)}}{\pi s (1 - s^2)}$$

    with

    $$\gamma_\pm = n + m - 2mn \pm 2\sqrt{mn(1-n)(1-m)}, \quad 0 \le \gamma_\pm \le 1$$

    • Analogue of the Marcenko-Pastur result for rectangular correlation matrices

    • Many applications: finance, econometrics (‘large’ models), genomics, etc., and subspace stability!
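
The edges $\gamma_\pm$ can be compared with a simulated null case. The sketch below assumes NumPy, takes $n = N/T$ and $m = M/T$, and normalizes $\hat G$ by $1/T$; these conventions, the helper names and the sizes are assumptions of the sketch rather than statements from the slides.

```python
import numpy as np

def normalized_principal_components(X):
    """X: (N, T) with N <= T. Returns X_hat (N, T) such that X_hat @ X_hat.T / T = identity."""
    T = X.shape[1]
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return np.sqrt(T) * Vt                   # rows are the normalized sample components

def random_svd_edges(n, m):
    """Return (gamma_minus, gamma_plus), the edges of the squared singular-value spectrum."""
    mid = n + m - 2.0 * n * m
    half = 2.0 * np.sqrt(m * n * (1.0 - n) * (1.0 - m))
    return mid - half, mid + half

# Null hypothesis: X and Y are independent, so G_true = 0.
rng = np.random.default_rng(3)
T, N, M = 1000, 200, 300
X_hat = normalized_principal_components(rng.standard_normal((N, T)))
Y_hat = normalized_principal_components(rng.standard_normal((M, T)))

G_hat = Y_hat @ X_hat.T / T                  # M x N cross-correlation matrix
s = np.linalg.svd(G_hat, compute_uv=False)   # min(M, N) singular values
gamma_minus, gamma_plus = random_svd_edges(N / T, M / T)
```

Even though the true cross-correlation is zero, the largest singular value is far from zero: it sits near $\sqrt{\gamma_+}$, which is the benchmark any measured $s_1$ has to beat.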

  • Back to eigenvectors

    • Extend the target subspace to avoid edge effects:

    $$|\psi_{k+1}\rangle, |\psi_{k+2}\rangle, \ldots, |\psi_{k+P}\rangle \longrightarrow |\psi'_{k-Q+1}\rangle, |\psi'_{k-Q+2}\rangle, \ldots, |\psi'_{k+Q}\rangle$$

    • Form the $P \times Q$ matrix of scalar products:

    $$G_{ij} = \langle \psi_{k+i} | \psi'_{k+j} \rangle$$

    • The singular values of $G$ indicate how well the $Q$ perturbed vectors approximate the initial ones:

    $$D = -\frac{1}{P} \sum_i \ln s_i$$
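
With a rectangular overlap matrix the determinant is replaced by singular values. A minimal sketch, assuming NumPy, with the target taken as the top $Q$ perturbed eigenvectors of a random symmetric matrix; the function name, the sizes and $\epsilon = 0.1$ are illustrative choices.

```python
import numpy as np

def overlap_distance(Psi, Psi_prime):
    """D = -(1/P) sum_i ln s_i, with s_i the singular values of G = Psi^T Psi_prime.

    Psi: (N, P) initial orthonormal vectors; Psi_prime: (N, Q) target vectors, Q >= P.
    """
    G = Psi.T @ Psi_prime
    s = np.linalg.svd(G, compute_uv=False)   # P singular values when Q >= P
    return -np.mean(np.log(s))

rng = np.random.default_rng(5)
N, P = 120, 5
A = rng.standard_normal((N, N))
B = rng.standard_normal((N, N))
H0 = (A + A.T) / np.sqrt(2 * N)
H1 = (B + B.T) / np.sqrt(2 * N)

_, V0 = np.linalg.eigh(H0)
_, V1 = np.linalg.eigh(H0 + 0.1 * H1)
Psi = V0[:, -P:]                             # initial top-P subspace

D_same_size = overlap_distance(Psi, V1[:, -P:])    # target of dimension Q = P
D_extended = overlap_distance(Psi, V1[:, -10:])    # target of dimension Q = 10
```

Enlarging the target subspace can only increase the singular values, so `D_extended` is never larger than `D_same_size`.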

  • Null hypothesis

    • Note: if $P$ and $Q$ are large, $D$ can be “accidentally” small

    • One can compute $D$ exactly in the limit $P, Q \to \infty$, $N \to \infty$, with fixed $p = P/N$, $q = Q/N$

    • Final result (same problem as above!):

    $$D = -\int_0^1 ds\, \ln s\, \rho(s)$$

    with:

    $$\rho(s) = \frac{\sqrt{(s^2 - \gamma_-)(\gamma_+ - s^2)}}{\pi s (1 - s^2)}$$

    and

    $$\gamma_\pm = p + q - 2pq \pm 2\sqrt{pq(1-p)(1-q)}, \quad 0 \le \gamma_\pm \le 1$$

  • Back to eigenvectors: perturbation theory

    • Consider a randomly perturbed matrix:

    $$H = H_0 + \epsilon H_1$$

    • Perturbation theory to second order in $\epsilon$ yields:

    $$D \approx \frac{\epsilon^2}{2P} \sum_{i \in \{k+1, \ldots, k+P\}} \; \sum_{j \notin \{k-Q+1, \ldots, k+Q\}} \left(\frac{\langle \psi_i | H_1 | \psi_j \rangle}{\lambda_i - \lambda_j}\right)^2.$$

    • The full distribution of $s$ can again be computed exactly (in some limits) using free random matrix tools.

  • GOE: the full SV spectrum

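    • Initial eigenspace: spanned by $[a, b] \subset [-2, 2]$, $b - a = \Delta$

    Target eigenspace: spanned by $[a - \delta, b + \delta] \subset [-2, 2]$

    • Two cases (set $s = \epsilon^2 \hat s$):

    – Weak fluctuations, $\Delta \ll \delta \ll 1$ → $\rho(\hat s)$ is a semicircle centered around $\sim \delta^{-1}$, of width $\sim \sqrt{\Delta}$

    – Strong fluctuations, $\delta \ll \Delta \ll 1$ → $\rho(\hat s) \sim \hat s_{\min}/\hat s^2$, with $\hat s_{\min} \sim \Delta^{-1}$ and $\hat s_{\max} \sim \delta^{-1}$.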

  • The case of correlation matrices

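    • Consider the empirical correlation matrix:

    $$E = C + \eta, \qquad \eta = \frac{1}{T} \sum_{t=1}^{T} \left(X^t (X^t)^T - C\right)$$

    • The noise $\eta$ is correlated as:

    $$\langle \eta_{ij} \eta_{kl} \rangle = \frac{1}{T} (C_{ik} C_{jl} + C_{il} C_{jk})$$

    • from which one derives:

    $$D \approx \frac{1}{2TP} \sum_{i=1}^{P} \sum_{j=Q+1}^{N} \frac{\lambda_i \lambda_j}{(\lambda_i - \lambda_j)^2}.$$

    (and a similar equation for eigenvalues)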
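
The double sum for $D$ is simple to evaluate once the eigenvalues of $C$ are known. A minimal sketch, assuming NumPy, eigenvalues sorted in decreasing order and $Q \ge P$ so that no term has $i = j$; the function name and the sample spectrum are hypothetical.

```python
import numpy as np

def noise_induced_distance(lam, P, Q, T):
    """D ~ 1/(2 T P) * sum_{i<=P} sum_{j>Q} lam_i lam_j / (lam_i - lam_j)^2.

    lam: eigenvalues of C sorted in decreasing order (index 0 is the largest).
    """
    lam = np.asarray(lam, dtype=float)
    top = lam[:P][:, None]                  # i = 1..P
    rest = lam[Q:][None, :]                 # j = Q+1..N
    return np.sum(top * rest / (top - rest) ** 2) / (2.0 * T * P)

# Two-eigenvalue check by hand: (4 * 1) / (4 - 1)^2 / (2 * 100 * 1) = 4 / 1800
D_two = noise_induced_distance([4.0, 1.0], P=1, Q=1, T=100)

# Illustrative spectrum: a large "market" eigenvalue above a flat bulk.
lam = np.concatenate(([30.0, 5.0, 3.0], np.full(97, 0.64)))
D_short = noise_induced_distance(lam, P=3, Q=3, T=250)
D_long = noise_induced_distance(lam, P=3, Q=3, T=1000)
```

The $1/T$ prefactor means that a measurement window four times longer divides the noise-induced distance by four.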

  • Stability of eigenvalues: Correlations

    Eigenvalues clearly change: well known correlation crises


  • Stability of eigenspaces: Correlations

    $D(\tau)$ for a given $T$, $P = 5$, $Q = 10$

  • Stability of eigenspaces: Correlations

    $D(\tau = T)$ for $P = 5$, $Q = 10$

  • Conclusion

    • Many RMT tools available to understand the eigenvalue spectrum and suggest cleaning schemes

    • The understanding of eigenvectors is comparatively poorer

    • The dynamics of the top eigenvector (aka market mode) is relatively well understood

    • A plausible, realistic model for the “true” evolution of $C$ is still lacking (many crazy attempts: Multivariate GARCH, BEKK, etc., but “second generation models” are on their way)

  • Bibliography

    • J.-P. Bouchaud, M. Potters, Financial Applications of Random Matrix Theory: a short review, in “The Oxford Handbook of Random Matrix Theory” (2011)

    • R. Allez and J.-P. Bouchaud, Eigenvectors dynamics: general theory & some applications, arXiv 1108.4258

    • P.-A. Reigneron, R. Allez and J.-P. Bouchaud, Principal regression analysis and the index leverage effect, Physica A, Volume 390 (2011) 3026-3035.