Random Matrices and Multivariate Statistical Analysis
Iain Johnstone, Statistics, [email protected]
SEA'06@MIT – p.1
Agenda
• Classical multivariate techniques
  • Principal Component Analysis
  • Canonical Correlations
  • Multivariate Regression
• Hypothesis Testing: Single and Double Wishart
• Eigenvalue densities
• Linear Statistics
  • Single Wishart
  • Double Wishart
• Largest Eigenvalue
  • Single Wishart
  • Double Wishart
• Concluding Remarks
Classical Multivariate Statistics
Canonical methods are based on spectral decompositions:
One matrix (Wishart)
⢠Principal Component analysis
⢠Factor analysis
⢠Multidimensional scaling
Two matrices (independent Wisharts)
⢠Multivariate Analysis of Variance (MANOVA)
• Multivariate regression analysis
⢠Discriminant analysis
⢠Canonical correlation analysis
⢠Tests of equality of covariance matrices
Gaussian data matrices
X = [cases × variables] data matrix.

Independent rows: x_i ~ N_p(0, Σ), i = 1, …, n, or: X ~ N(0, I_n ⊗ Σ_p).

Zero mean ⇒ no centering in the sample covariance matrix:

S = (S_{kk′}),   S = (1/n) Xᵀ X,   S_{kk′} = (1/n) Σ_{i=1}^n x_{ik} x_{ik′}

nS ~ W_p(n, Σ)
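As a small numerical sketch of the setup above (the sizes n = 200, p = 5 and the null choice Σ = I are illustrative, my own choice): simulate independent rows x_i ~ N_p(0, Σ) and form the uncentered sample covariance S = XᵀX/n, so that nS ~ W_p(n, Σ).

```python
# Sketch: n iid rows x_i ~ N_p(0, Sigma); sample covariance without centering.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5
Sigma = np.eye(p)                                        # null case: Sigma = I
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)  # rows x_i ~ N_p(0, Sigma)
S = X.T @ X / n                    # no centering: the mean is known to be zero
print(np.round(np.diag(S), 2))    # E[S] = Sigma, so the diagonal should be near 1
```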
Principal Components Analysis
Hotelling, 1933:  X_1, …, X_n ~ N_p(μ, Σ).

Low-dim. subspace "explaining most variance":

l_i = max{u′Su : u′u = 1, u′u_j = 0, j < i}

Eigenvalues of the Wishart matrix A = nS ~ W_p(n, Σ):

A u_i = l_i u_i,   l_1 ≥ … ≥ l_p ≥ 0.

Key question: how many l_i are "significant"?
[Figure: "scree" plot of singular values of phoneme data]
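A minimal sketch of the PCA computation (sizes illustrative, null data): the sorted eigenvalues l_1 ≥ … ≥ l_p of A = nS are the scree-plot heights, and their cumulative sum gives the fraction of variance "explained" by the leading components.

```python
# Sketch: eigenvalues of A = nS and cumulative fraction of variance explained.
import numpy as np

rng = np.random.default_rng(1)
n, p = 300, 10
X = rng.standard_normal((n, p))
A = X.T @ X                             # A = nS ~ W_p(n, I) under the null
l = np.sort(np.linalg.eigvalsh(A))[::-1]   # l_1 >= ... >= l_p: scree heights
frac = np.cumsum(l) / l.sum()           # cumulative fraction of variance
print(np.round(frac[:3], 2))
```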
Canonical Correlations

(X_1, Y_1), …, (X_n, Y_n) jointly (p + q)-variate normal.

"Most predictable criterion" (Hotelling, 1935, 1936):

max_{u_i, v_i} Corr(u_i′ X, v_i′ Y)

⇒ A v_i = r_i² (A + B) v_i,   r_1² ≥ … ≥ r_p².

Two independent Wishart distributions:

A ~ W_p(q, Σ),   B ~ W_p(n − q, Σ).
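One way to compute the squared canonical correlations numerically is to solve the generalized eigenproblem A v = r² (A + B) v directly with scipy's symmetric-pencil solver; here A = S_xy S_yy⁻¹ S_yx and A + B = S_xx (sizes and the null independent-blocks data are illustrative assumptions).

```python
# Sketch: squared canonical correlations via the pencil A v = r^2 (A + B) v.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
p, q, n = 4, 6, 100
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))
Sxx, Syy = X.T @ X, Y.T @ Y
Sxy = X.T @ Y
A = Sxy @ np.linalg.solve(Syy, Sxy.T)   # plays the role of A above
B = Sxx - A                              # so that A + B = Sxx
r2 = np.sort(eigh(A, A + B, eigvals_only=True))[::-1]
print(np.round(r2, 3))                   # all in [0, 1]; small under independence
```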
Multivariate Multiple Regression
Y = X B + U,   Y: n×p,  X: n×q,  B: q×p,  U: n×p,   U ~ N_p(0, I ⊗ Σ)

n = # observations; p = # response variables; q = # predictor variables

P = X(XᵀX)⁻¹Xᵀ, the projection on span{cols(X)}:

Yᵀ Y  =  Yᵀ P Y  +  Yᵀ (I − P) Y
      =  H : hypothesis SSP  +  E : error SSP

H ~ W_p(q, Σ) indep of E ~ W_p(n − q, Σ)
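The MANOVA decomposition above is a one-line projection in code; this sketch (sizes illustrative, null model B = 0) forms P, splits YᵀY into the hypothesis and error sums of squares and products, and checks the split is exact.

```python
# Sketch: H = Y'PY (hypothesis SSP) and E = Y'(I-P)Y (error SSP), H + E = Y'Y.
import numpy as np

rng = np.random.default_rng(3)
n, p, q = 50, 3, 4
X = rng.standard_normal((n, q))
Y = rng.standard_normal((n, p))           # B = 0: null model
P = X @ np.linalg.solve(X.T @ X, X.T)     # projection onto span{cols(X)}
H = Y.T @ P @ Y                           # ~ W_p(q, Sigma) under H0
E = Y.T @ (np.eye(n) - P) @ Y             # ~ W_p(n - q, Sigma), indep of H
assert np.allclose(H + E, Y.T @ Y)        # the decomposition is exact
```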
Agenda
• Classical multivariate techniques
• Hypothesis Testing: Single and Double Wishart
• Eigenvalue densities
• Linear Statistics
• Largest Eigenvalue
• Concluding Remarks
Hypothesis Testing
Null hypothesis H_0, nested within alternative hypothesis H_A.

Test statistics: functions of eigenvalues, T = T(l_1, …, l_p).

Null hypothesis distribution: P(T > t | H_0 true).

RMT offers tools for evaluation, and approximation based on p → ∞.

Single Wishart: A ~ W_p(n, I), e-vals det(A − l_i I) = 0.
Test H_0: Σ = I (or λI) versus H_A: Σ unrestricted.

Double Wishart: H ~ W_p(q, Σ), E ~ W_p(n − q, Σ) independently.
Eigenvalues det(H − l_i(E + H)) = 0.
Typical hypothesis test (e.g. from Y = XB + U): H_0: B = 0 versus H_A: B unrestricted.
Likelihood Ratio Test
If X ~ N_p(0, I_p ⊗ Σ), the density is

f_Σ(X) = det(2πΣ)^{−n/2} exp{−(n/2) tr Σ⁻¹S}.

Log likelihood: Σ → ℓ(Σ|X) = log f_Σ(X) = c_{np} − (n/2) log det Σ − (n/2) tr Σ⁻¹S.

Maximum likelihood occurs at Σ = S:

max_Σ ℓ(Σ|X) = c_{np} − (n/2) log det S.

Likelihood ratio test of H_0: Σ = I vs. H_A: Σ unrestricted:

log LR = max_{Σ ∈ H_0} ℓ(Σ|X) − max_{Σ ∈ H_A} ℓ(Σ|X)
       = c_{np} + (n/2) (Σ_i log l_i − Σ_i l_i)

Linear statistics in the eigenvalues of S:  Σ_i log l_i,  Σ_i l_i.
(Union-) Intersection Principle
Combine univariate test statistics:
H_0: Σ = I  ⟺  ∩_{|a|=1} H_{0a}: aᵀΣa = 1.

Var(aᵀX) = aᵀΣa, so reject H_{0a} if the sample variance aᵀSa > c_a.

Reject H_0  ⟺  reject some H_{0a}
           ⟺  max_a aᵀSa > c_max
           ⟺  l_max(S) > c_max

Summary:
Likelihood ratio principle → linear statistics in eigenvalues
Intersection principle → extreme eigenvalues
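The step max_a aᵀSa = l_max(S) is the variational characterization of the top eigenvalue; a quick numerical sketch (matrix and sampling sizes are illustrative) checks that no random unit vector beats it.

```python
# Sketch: max over unit vectors a of a'Sa equals the largest eigenvalue of S.
import numpy as np

rng = np.random.default_rng(5)
p = 6
M = rng.standard_normal((p, p))
S = M @ M.T / p                          # a symmetric PSD "sample covariance"
lmax = np.linalg.eigvalsh(S)[-1]
a = rng.standard_normal((p, 1000))
a /= np.linalg.norm(a, axis=0)           # 1000 random unit vectors (columns)
quad = np.einsum('ip,ij,jp->p', a, S, a) # a'Sa for each column a
assert np.all(quad <= lmax + 1e-10)      # none exceeds l_max(S)
```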
Agenda
• Classical multivariate techniques
• Hypothesis Testing: Single and Double Wishart
• Eigenvalue densities
• Linear Statistics
• Largest Eigenvalue
• Concluding Remarks
Eigenvalue densities - single Wishart
Statistics (n, p):    c ∏_{i=1}^p l_i^{(n−p−1)/2} e^{−l_i/2} ∏_{j<k} |l_j − l_k|

Laguerre OE (N, α):   c ∏_{i=1}^N x_i^{α/2} e^{−x_i/2} ∏_{j<k} |x_j − x_k|

(N, α)  ←  (p, n − p − 1),   where p = # variables, n = sample size.

Notation change has significance!

Statistics: no necessary relation between p and n; the traditional approximation uses p fixed, n → ∞.

RMT: N → ∞ with α fixed is most natural (in statistics, fixing n − p would be less natural).
Eigenvalue densities - double Wishart
Statistics: if H ~ W_p(q, I) and E ~ W_p(n − q, I) are independent,
then the joint density of the eigenvalues {u_i} of H(H + E)⁻¹ is

f(u) = c ∏_{i=1}^p u_i^{(q−p−1)/2} (1 − u_i)^{(n−q−p−1)/2} ∏_{i<j} |u_i − u_j|.

With

(p, n − q − p, q − p)  →  (N + 1, α, β),   u = (1 + x)/2,

we recover the Jacobi orthogonal ensemble

f(x) = c ∏_{i=1}^{N+1} (1 − x_i)^{(α−1)/2} (1 + x_i)^{(β−1)/2} ∏_{i<j} |x_i − x_j|.
Convergence of Empirical Spectra
For e-values {l_i}_{i=1}^p:  G_p(t) = p⁻¹ #{l_i ≤ t} → G(t), with density g(t).

Single Wishart (Marcenko-Pastur, 67): A ~ W_p(n, I). If p/n → c > 0,

g_MP(t) = √((b₊ − t)(t − b₋)) / (2πct),   b_± = (1 ± √c)².

Double Wishart (Wachter, 80): det(H − l_i(H + E)) = 0.
If p ≤ q, p/n → c = sin²(γ/2) > 0, q/n → sin²(φ/2),

g_W(t) = √((b₊ − t)(t − b₋)) / (2πct(1 − t)),   b_± = sin²((φ ± γ)/2).
Agenda
• Classical multivariate techniques
• Hypothesis Testing: Single and Double Wishart
• Eigenvalue densities
• Linear Statistics
  • Single Wishart
  • Double Wishart
• Largest Eigenvalue
• Concluding Remarks
Linear Statistics: Single Wishart
Approximate distributions:

Statistics: • typically p fixed; standard χ² approximation,
• improvements by "Bartlett correction".

RMT: • Central Limit Theorems (p large) for linear statistics of eigenvalues; large literature.

Jonsson (1982): S ~ W_p(n, I), p/n → c > 0. With

d(c) = (1 − c⁻¹) log(1 − c) − 1,

log det S − p d(c)  →_D  N(½ log(1 − c), −2 log(1 − c))    (1)

tr S − p  →_D  N(0, 2c)

Surprise: quality of the approximation in (1) even for p small (e.g. 2!).
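A Monte Carlo sketch of Jonsson's CLT for log det S (replication count and n = 100, p = 20 are illustrative): standardizing by the centering p·d(c) + ½ log(1 − c) and scale √(−2 log(1 − c)) should give roughly standard normal values.

```python
# Sketch: Monte Carlo check of the CLT for log det S, c = p/n = 0.2.
import numpy as np

rng = np.random.default_rng(7)
n, p = 100, 20
c = p / n
d = (1 - 1 / c) * np.log(1 - c) - 1          # Jonsson's centering constant d(c)
reps = 200
z = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, p))
    _, logdet = np.linalg.slogdet(X.T @ X / n)
    z[r] = (logdet - p * d - 0.5 * np.log(1 - c)) / np.sqrt(-2 * np.log(1 - c))
print(round(z.mean(), 2), round(z.std(), 2))  # should be near 0 and 1
```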
Small p asymptotics
[Figure: QQ plot of sample data versus standard normal]
n     p    qtile   pFix    pBaiS
100    2   0.90    0.923   0.899
100   20   0.90    1.000   0.900
100   60   0.90    1.000   0.902
1000  20   0.90    0.990   0.900
100    2   0.95    0.965   0.951
100   20   0.95    1.000   0.951
100   60   0.95    1.000   0.949
1000  20   0.95    0.997   0.950
100    2   0.99    0.995   0.992
100   20   0.99    1.000   0.990
100   60   0.99    1.000   0.990
1000  20   0.99    1.000   0.990
CLT for Likelihood Ratio distribution
Bai-Silverstein (2004):

Σ_{i=1}^p f(l_i) − p ∫ f(x) g_MP(x) dx  →_D  X_f ~ N(EX_f, Cov(X_f)),

Cov(X_f, X_g) = −(1/2π²) ∮_{C₁} ∮_{C₂} f(z(m₁)) g(z(m₂)) / (m₁ − m₂)² dm₁ dm₂

⇒ CLT for the null distribution of the LR test of H_0: Σ = I:

Σ_{i=1}^p (log l_i − l_i + 1)  →_D  N(p d(c) + ½ log(1 − c), 2[log(1 − c)⁻¹ − c]).
Linear Statistics: Double Wishart
Hypothesis tests based on the e-vals u_i of H(H + E)⁻¹, i.e. the e-vals w_i = u_i/(1 − u_i) of HE⁻¹.

Many standard tests are linear statistics S_N(g) = Σ_1^p g(u_i):

• Wilks Λ: log Λ = Σ_1^p log(1 − u_i)  [likelihood ratio test]
• Pillai's trace = Σ_1^p u_i
• Hotelling-Lawley trace = Σ u_i/(1 − u_i) = Σ_1^p w_i
• Roy's largest root = u_(1).

Basor-Chen (05): unitary case, formal; N → ∞, α, β fixed:

S_N(g) − (2N + α + β) a_g  →_D  N(0, b_{g·g}),

a_g = (1/2π) ∫_{−1}^1 g(x)/√(1 − x²) dx,
b_{g·g} = (1/2π²) ∫_{−1}^1 [g(x)/√(1 − x²)] P∫_{−1}^1 [√(1 − y²)/(y − x)] g(y) dy dx.
Agenda
• Classical multivariate techniques
• Hypothesis Testing: Single and Double Wishart
• Eigenvalue densities
• Linear Statistics
• Largest Eigenvalue
  • Single Wishart
  • Double Wishart
• Concluding Remarks
Largest Eigenvalue - Single Wishart
"Usual" approach to maxima is (classically) infeasible:

{l_(1) ≤ x} = ∩_{i=1}^p {l_i ≤ x}

Key role: determinants, not independence:

∏_{i<j} (l_i − l_j) = det[ l_i^{k−1} ]_{1≤i,k≤p}

∏_{i=1}^p I{l_i ≤ x} = Σ_{k=0}^p (−1)^k (p choose k) ∏_{i=1}^k I{l_i > x}

⋯ ⇒ P{ max_{1≤i≤p} l_i ≤ t } = √( det(I − K_p χ_[t,∞)) )

K_p(x, y) is a (2 × 2 matrix) kernel built from {Laguerre, Jacobi} orthogonal polynomials via Christoffel-Darboux summation.
Tracy-Widom Limit
For real (β = 1, IMJ) or complex (β = 2, Johansson) data, if n/p → c ∈ (0, ∞):

F_p(s) = P{l_1 ≤ μ_np + σ_np s} → F_β(s),

with

μ_np = (√n + √p)²,   σ_np = (√n + √p)(1/√n + 1/√p)^{1/3}.

El Karoui (2004): in the complex case, for refined μ′_np, σ′_np,

|F_p(s) − F_2(s)| ≤ C e^{−s} p^{−2/3}.

Also, results for • n → ∞, p → ∞ separately, and • under alternative hypotheses.
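The centering and scaling are explicit; a sketch (sizes n = 400, p = 100 illustrative) standardizes the largest eigenvalue of A = nS ~ W_p(n, I) with μ_np and σ_np, landing the value in the bulk of the Tracy-Widom F₁ law.

```python
# Sketch: Tracy-Widom centering/scaling for the largest Wishart eigenvalue.
import numpy as np

rng = np.random.default_rng(9)
n, p = 400, 100
X = rng.standard_normal((n, p))
l1 = np.linalg.eigvalsh(X.T @ X).max()     # largest e-val of A = nS ~ W_p(n, I)
mu = (np.sqrt(n) + np.sqrt(p)) ** 2
sigma = (np.sqrt(n) + np.sqrt(p)) * (1 / np.sqrt(n) + 1 / np.sqrt(p)) ** (1 / 3)
s = (l1 - mu) / sigma                      # approximately F_1-distributed
print(round(s, 2))
```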
Painleve II and Tracy-Widom
Painleve II:

q″ = xq + 2q³,   q(x) ~ Ai(x) as x → ∞.

[Figure: the Painleve II solution q(x) and the Tracy-Widom densities]

Tracy-Widom distributions:

F_2(s) = exp{−∫_s^∞ (x − s) q²(x) dx}

F_1(s) = (F_2(s))^{1/2} exp{−½ ∫_s^∞ q(x) dx}.
Largest Root - Double Wishart
Assume p, q(p), n(p) → ∞. Set

γ_p/2 = sin⁻¹ √( (p − .5)/(n − 1) ),   φ_p/2 = sin⁻¹ √( (q − .5)/(n − 1) ),

μ_± = sin²( (φ_p ± γ_p)/2 ),   σ³_{p+} = sin⁴(φ_p + γ_p) / ( (2n − 2)² sin φ_p sin γ_p ).

Simply,

(u_1 − μ_+) / σ_+  →_D  W_1 ~ F_1.

More precisely, with the logit transform ℓ(u) = log(u/(1 − u)) (IMJ, PJF):

|P{ℓ(u_1) ≤ ℓ(μ_+) + s σ_+ ℓ′(μ_+)} − F_1(s)| ≤ C e^{−s/4} p^{−2/3}

• the corrections (.5, 1, 2) improve the approximation for p, q small,
• the error is O(p^{−2/3}) [instead of O(p^{−1/3})].
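The centering constants are elementary to compute; a sketch (the helper name `tw_params` and the values p = 5, q = 10, n = 100 are mine, for illustration):

```python
# Sketch: mu_+ and sigma_+ for the largest root of the double Wishart problem.
import numpy as np

def tw_params(p, q, n):
    """Centering mu_+ and scale sigma_+ from the angle parametrization."""
    gamma = 2 * np.arcsin(np.sqrt((p - 0.5) / (n - 1)))
    phi = 2 * np.arcsin(np.sqrt((q - 0.5) / (n - 1)))
    mu = np.sin((phi + gamma) / 2) ** 2
    sigma3 = np.sin(phi + gamma) ** 4 / ((2 * n - 2) ** 2
                                         * np.sin(phi) * np.sin(gamma))
    return mu, sigma3 ** (1 / 3)

mu, sigma = tw_params(p=5, q=10, n=100)
print(round(mu, 3), round(sigma, 4))
```

Standardizing an observed u₁ as (u₁ − mu)/sigma then refers it to the Tracy-Widom F₁ tables.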
Approximation vs. Tables for p = 5
Tables: William Chen, IRS (2002):

m_c = (q − p − 1)/2 ∈ [0, 15],   n_c = (n − q − p − 1)/2 ∈ [1, 1000]
[Figure: Table vs. Approx at 95th %tile; squared correlation vs. m_c = (q−p−1)/2 for n_c = 2, 5, 10, 20, 40, 100, 500; Tracy-Widom approximation vs. Chen table, p = 5]
Remarks
• p^{−2/3} scale of variability for u_1

• 95th %tile ≈ μ_{p+} + σ_{p+},   99th %tile ≈ μ_{p+} + 2σ_{p+}

• if μ_{p+} > .7, the logit scale v_i = log(u_i/(1 − u_i)) is better.

• Smallest eigenvalue: with the previous assumptions and γ_0 < φ_0,

σ³_{p−} = sin⁴(φ_p − γ_p) / ( (2n − 2)² sin φ_p sin γ_p ),

then (μ_{p−} − u_p)/σ_{p−}  →_D  W_1 (W_2).

• Corresponding limit distributions for u_2 ≥ ⋯ ≥ u_k, u_{p−k} ≥ ⋯ ≥ u_{p−1}, k fixed.
Agenda
• Classical multivariate techniques
• Hypothesis Testing: Single and Double Wishart
• Eigenvalue densities
• Linear Statistics
• Largest Eigenvalue
• Concluding Remarks
Concluding Remarks
Numerous other topics deserve attention:
• distributions under alternative hypotheses: integral representations, matrix hypergeometric functions
• empirical distributions and graphical display (Wachter)
• computational advances (Dumitriu, Edelman, Koev, Rao):
  • operations on random matrices
  • multivariate orthogonal polynomials
  • matrix hypergeometric functions
• estimation and testing for eigenvectors (Paul)
• technical role for RMT in other statistical areas: e.g. via large deviations results
Back-Up Slides
Upper Bound in SAS
Approximate ((n − q)/q) · u_1/(1 − u_1) by F_{q, n−q}.
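The SAS-style F approximation above is one line with scipy; a sketch (the helper name `roy_f_pvalue` and the numbers plugged in are hypothetical, purely to exercise the formula):

```python
# Sketch: F_{q, n-q} approximation to the distribution of Roy's largest root.
import numpy as np
from scipy.stats import f as f_dist

def roy_f_pvalue(u1, q, n):
    """Upper-tail p-value for Roy's largest root via the F approximation."""
    stat = (n - q) / q * u1 / (1 - u1)
    return f_dist.sf(stat, q, n - q)

# hypothetical numbers, just to exercise the formula
print(round(roy_f_pvalue(u1=0.3, q=5, n=50), 4))
```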
[Figure: Table vs. Approx at 95th %tile; squared correlation vs. m_c for n_c = 2, 5, 10, 20, 40, 100, 500; Tracy-Widom, Chen table (p = 5), and SAS F-approximation]
Testing Subsequent Correlations
Suppose the cross-covariance is

Σ_XY = [ diag(ρ_1, …, ρ_p) | 0 ]   (p × q, p ≤ q).

If the largest r correlations are large, test

H_r : ρ_{r+1} = ρ_{r+2} = ⋯ = ρ_p = 0?

Comparison Lemma (from SVD interlacing):

L(u_{r+1} | p, q, n; Σ_XY ∈ H_r)  <_st  L(u_1 | p, q − r, n; I)

⇒ conservative P-values for H_r via the TW(p, q − r, n) approximation to the RHS.

[Aside: L(u_1 | p − r, q − r, n; I) may be better, but no bounds.]