
Signal Processing in Large Systems: A New Paradigm

Romain Couillet¹ and Mérouane Debbah²
¹ Telecommunication Department, Supélec, Gif sur Yvette, France.
² Alcatel Lucent Chair on Flexible Radio, Supélec, Gif sur Yvette, France.

arXiv:1105.0060v3 [cs.IT] 19 Jun 2012

Abstract—For a long time, detection and parameter estimation methods for signal processing have relied on asymptotic statistics as the number n of observations of a population grows large comparatively to the population size N, i.e. n/N → ∞. Modern technological and societal advances now demand the study of sometimes extremely large populations and simultaneously require fast signal processing due to accelerated system dynamics. This results in not-so-large practical ratios n/N, sometimes even smaller than one. A disruptive change in classical signal processing methods has therefore been initiated in the past ten years, mostly spurred by the field of large dimensional random matrix theory. The early works in random matrix theory for signal processing applications are however scarce and highly technical. This tutorial provides an accessible methodological introduction to the modern tools of random matrix theory and to the signal processing methods derived from them, with an emphasis on simple illustrative examples.

I. INTRODUCTION

Most signal processing methods, e.g. statistical tests or parameter estimators, are based on asymptotic statistics of n observed random signal vectors [1]. This is because a deterministic behavior often arises when n → ∞, which simplifies the problem, as exemplified by the celebrated law of large numbers and the central limit theorem. With the increase of the system dimension, denoted by N, and the need for ever faster dynamics, the scenario n ≫ N becomes less and less likely to occur in practice. As a consequence, the large n properties are no longer valid, as will be discussed in Section II. This is the case for instance in array processing, where a large number of antennas is used to detect incoming signals with short time stationarity properties (e.g. radar detection of fast moving objects) [2]. Other examples are found for instance in mathematical finance, where numerous stocks show price index correlation over short observable time periods [3], or in evolutionary biology, where the joint presence of multiple genes in the genotype of a given species is analyzed from a few DNA samples [4].

This tutorial introduces recent tools to cope with the n ≃ N limitation of classical approaches. These tools are based on the study of large dimensional random matrices, which originates from the works of Wigner [5] in 1955 and, more importantly here, of Marčenko and Pastur [6] in 1967. Section III presents a brief introduction to random matrix theory. More precisely, we will explain the basic tools used in this field, which differ greatly from classical approaches, and based on these we will introduce statistical inference tools for the n ≃ N regime. These tools use random matrix theory jointly with complex analysis methods [7] and are often known as G-estimation, named after the G-estimator formulas from Girko [8]. Since the techniques presented in Section III are expected to be rather new to the non-expert reader, we will elaborate on toy examples to introduce the methods rather than on a list of theoretical results. More realistic application examples, taken from the still limited literature, are then presented in Section IV. These will further familiarize the reader with the introduced concepts. These examples span from signal detection in array processing to failure localisation in large dimensional networks. Finally, in Section V, we will draw some conclusions and provide future prospects of this emerging field.

Notations: Boldface lowercase (uppercase) characters stand for vectors (matrices), with I_N the N × N identity matrix. The notations (·)ᵀ and (·)ᴴ denote transpose and Hermitian transpose, respectively. The value √−1 is denoted by ı. The notation E is the expectation operator. The notations →a.s. and ⇒ denote almost sure and weak convergence of random variables, respectively. The symbol N(μ, σ²) indicates a real Gaussian distribution with mean μ and variance σ². The norm ‖·‖ will be understood as the Euclidean norm for vectors and the spectral norm for Hermitian matrices (‖X‖ = max_i |λ_i| for X with eigenvalues λ_1, ..., λ_N). The Dirac delta is denoted δ(x). Finally, (x)⁺ = max(x, 0) for real x, and 1_{x≥y} is the function of x equal to zero for x < y and equal to one for x ≥ y.

II. LIMITATIONS OF CLASSICAL SIGNAL PROCESSING

We start our exposition with a simple example explaining the compelling need for the study of large dimensional random matrices.

A. The Marčenko-Pastur law

Consider y = T^{1/2} x ∈ C^N a random variable, with x having complex independent and identically distributed (i.i.d.) entries with zero mean and unit variance. If y_1, ..., y_n are n independent realizations of y, then, from the strong law of large numbers,

    ‖ (1/n) Σ_{i=1}^n y_i y_iᴴ − T ‖ →a.s. 0    (1)

as n → ∞. For further use, we will denote Y = [y_1, ..., y_n], with (i, j) entry Y_ij, and realize that with this notation,

    T̂ ≜ (1/n) YYᴴ = (1/n) Σ_{i=1}^n y_i y_iᴴ.



Here T̂ is often called the sample covariance matrix, obtained from the observations y_1, ..., y_n of the random variable y with population covariance matrix E[yyᴴ] = T. The convergence (1) suggests that, for n sufficiently large, T̂ is an accurate estimate of T. The natural reason why T can be asymptotically well approximated is that T̂ originates from a total of Nn ≫ N² observations, while the parameter T to be estimated is of size N².

Suppose now for simplicity that T = I_N. If the relation Nn ≫ N² is not met in practice, e.g. if n cannot be taken very large compared to the system size N, then a peculiar behaviour of T̂ arises, as both N, n → ∞ while N/n does not tend to zero. Observe that the entry (i, j) of T̂ is given by

    [T̂]_ij = (1/n) Σ_{k=1}^n Y_ik Y*_jk.

Since T = I_N, E[Y_ik Y*_jk] = δ_ij. Therefore, the law of large numbers ensures that [T̂]_ij → 0 if i ≠ j, while [T̂]_ii → 1, for all i, j. Since N grows along with n, we might then say that T̂ converges point-wise to an "infinitely large identity matrix". This holds, irrespective of N, as long as n → ∞, so in particular if N = 2n → ∞. However, under this hypothesis, T̂ is of maximum rank N/2 and is therefore rank-deficient.¹

Altogether, we may then conclude that T̂ converges point-wise to an "infinite-size identity matrix" which has the property that half of its eigenvalues equal zero. The convergence (1) therefore clearly does not hold here, as the spectral norm (or largest absolute eigenvalue) of T̂ − I_N is greater than or equal to 1 for all N.
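This paradox is easy to observe numerically. A minimal numpy sketch (Gaussian entries and n = 500 are illustrative choices, not part of the argument above):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
N = 2 * n                        # deliberately more variables than observations

Y = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
T_hat = Y @ Y.conj().T / n

# entry-wise, T_hat looks like the identity matrix...
print(np.abs(T_hat - np.eye(N)).max())        # small, and vanishing as n grows
# ...but not in spectral norm: half of its eigenvalues are exactly 0
print(np.linalg.norm(T_hat - np.eye(N), 2))   # >= 1 for every N
```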

This convergence paradox (convergence of T̂ to an identity matrix with zero eigenvalues) obviously arises from an incorrect reasoning when taking the limits N, n → ∞. Indeed, the convergence of matrices of increasing sizes only makes sense if a proper measure in the space of such matrices, here the spectral norm, is considered. The outcome of our previous observation is then that the random (or empirical) eigenvalue distribution F^T̂ of T̂, defined as

    F^T̂(x) = (1/N) Σ_{i=1}^N 1_{x≥λ_i}

with λ_1, ..., λ_N the eigenvalues of T̂, does not converge (weakly) to a Dirac mass at 1, and this is all that can be said at this point. This remark has fundamental consequences for basic signal processing detection and estimation procedures. In fact, although the eigenvalue distribution of T̂ does not converge to a Dirac at 1, it turns out that in many situations of practical interest, it does converge towards a limiting distribution, as N, n → ∞, N/n → c ≥ 0. In the particular case where T = I_N, the limiting distribution is known as the Marčenko-Pastur law, initially studied by Marčenko and Pastur in [6]. This distribution is depicted in Figure 1 and compared to the empirical eigenvalue distribution of a matrix T̂ of large dimensions. From this figure, we observe that the empirical distribution of the eigenvalues is a good match to

¹ Indeed, T̂ = (2/N) Σ_{i=1}^{N/2} y_i y_iᴴ, which is the sum of N/2 rank-one matrices.

Fig. 1. Histogram of the eigenvalues of a single realization of T̂ = (1/n) Σ_{k=1}^n y_k y_kᴴ, where y_k has proper complex Gaussian entries, for n = 2000, N = 500, against the Marčenko-Pastur density.

the limiting distribution for N, n large. It is also observed that no eigenvalue seems to lie outside the support of the limiting distribution, even for finite N, n. This fundamental property can be proved to hold for any matrix Y with independent zero mean and unit variance entries, along with some mild assumptions on the higher order moments. The shape of the Marčenko-Pastur law for different limiting ratios c is depicted in Figure 2. This figure suggests that, as c → 0, the support of the Marčenko-Pastur law tends to concentrate into a single mass at 1. This is in line with our expectations from the discussion above for finite N, while the support tends to spread, its left edge eventually reaching 0, as c grows, which is compliant with the existence of a mass of eigenvalues at 0 when N > n, i.e. c > 1. This will be confirmed by the explicit expression of the limiting density given later in (7).
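The experiment of Figure 1 can be reproduced in a few lines. A minimal sketch, assuming Gaussian entries and the figure's dimensions, with the Marčenko-Pastur density taken from the closed form given later in (7):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
N, n = 500, 2000
c = N / n

Y = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
eigs = np.linalg.eigvalsh(Y @ Y.conj().T / n)

# Marcenko-Pastur density on its support [a, b], cf. (7)
a, b = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
x = np.linspace(a, b, 200)
f = np.sqrt((x - a) * (b - x)) / (2 * np.pi * c * x)

plt.hist(eigs, bins=50, density=True, label="Empirical eigenvalues")
plt.plot(x, f, label="Marcenko-Pastur density")
plt.legend(); plt.show()
```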

The general tools employed to derive the Marčenko-Pastur law and its generalizations to more advanced random matrix models are briefly introduced in Section III.

The main implication of the above observations on signal processing methods is that, in general, the tools developed in view of applications for small N and large n are no longer adequate if either N is taken much larger, or if n cannot be afforded to be large. Observe for instance in Figure 2 that, even for n ten times larger than N, the spread of the Marčenko-Pastur law around 1 is significant enough for the classical large n assumption to be quite disputable. In the remainder of this section, we introduce a practical inference problem which assumes the setup n ≫ N and which can be proved asymptotically biased if N/n does not converge to zero.

B. An eigenvalue inference problem

The following example deals with the statistical inference of the eigenvalues of a large population covariance matrix. This problem is very generic and finds applications in array processing, such as radar detection, where the objective is to enumerate the sources and to infer their distances to the radar, or in cognitive radios, where the objective is to detect and estimate the power of concurrent signal transmissions.


Fig. 2. Marčenko-Pastur density f_c for different limit ratios c = lim N/n (c = 0.1, 0.2, 0.5).

This example will be used throughout the article to demonstrate the transition from the classical signal processing considerations towards the advanced random matrix approaches.

We assume that one has access to successive independent realizations y_1, ..., y_n of a stationary process y_t = U x_t ∈ C^N, with U ∈ C^{N×N} an unknown unitary matrix and x_t ∈ C^N a complex circularly symmetric Gaussian vector with zero mean and covariance matrix P. We further assume that P is diagonal and composed of N_i eigenvalues equal to P_i, for 1 ≤ i ≤ K, with P_1 < ... < P_K, and where N = Σ_{i=1}^K N_i.

The presence of the unknown matrix U reflects the fact that, in many applications, the eigenvector structure of the population covariance matrix T = UPUᴴ of y_t is not necessarily known. Hence, the diagonal entries of P are not directly accessible. Note that this is in sharp contrast to covariance matrix estimation of structured signals, as in the recent works [9], [10], where population covariance matrices have a Toeplitz-like structure. Here, no such assumption on U is made, so that alternative techniques need to be used in order to estimate the entries of P.

Our objective is to infer the values of P_1, ..., P_K in the setting where both n and N are of similar order of magnitude or, stated in signal processing terms, when the number of observations of the process y_t is not large compared to the system dimension N. As before, for readability, we denote Y = [y_1, ..., y_n] and recall that (1/n)YYᴴ is the sample covariance matrix of these data.

Let us first consider the classical approach, which assumes that n ≫ N. We then have, from a direct application of the law of large numbers, that

    ‖ (1/n) YYᴴ − UPUᴴ ‖ →a.s. 0    (2)

as n → ∞. Since the convergence is in spectral norm and since the eigenvalues of UPUᴴ are the same as those of P, an (asymptotically) n-consistent estimate P̂_k^∞ for P_k reads

    P̂_k^∞ = (1/N_k) Σ_{i∈N_k} λ_i    (3)

where λ_1 ≤ ... ≤ λ_N are the ordered eigenvalues of (1/n)YYᴴ, gathered in the successive clusters N_k = {N − Σ_{i=k}^K N_i + 1, ..., N − Σ_{i=k+1}^K N_i}. Indeed, the eigenvalues of (1/n)YYᴴ indexed by N_k can be directly mapped to P_k (their limiting value in the large n regime).

However, for N and n of comparable sizes, the convergence (2) no longer holds. In this scenario, the spectrum of (1/n)YYᴴ converges weakly to the union of a maximum of K compact clusters of eigenvalues concentrated somewhat around P_1, ..., P_K, but whose centers of mass are not located at P_1, ..., P_K. This is depicted in Figure 3, where it is clearly seen that the eigenvalues gather in clusters somewhat around the empirical values of P_1, P_2, P_3. Therefore, as N, n → ∞, the estimator P̂_k^∞ is no longer (asymptotically) consistent. Note also that, depending on the values of P_1, ..., P_K (and in fact also on the ratio N/n), the number of clusters varies. From the second plot of Figure 3, we already anticipate the well-known problem of order selection, i.e. determining the number K of distinct input sources from an observation Y. For these reasons, more advanced analyses of the spectrum of (1/n)YYᴴ must be carried out, from which (N,n)-consistent alternatives to (3) can be derived, i.e. estimators which are consistent as both N, n grow large. The introduction of a methodological approach to (N,n)-consistent estimators, called G-estimators, is one of the objectives of this article.
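The inconsistency of (3) is already visible in simulation. A minimal sketch (illustrative setting matching the top plot of Figure 3: powers 1, 3, 7 with equal multiplicities, N = 300, n = 3000; U is omitted since it does not affect the eigenvalues):

```python
import numpy as np

rng = np.random.default_rng(2)
N, n = 300, 3000
P_true = np.repeat([1.0, 3.0, 7.0], N // 3)      # population eigenvalues of P

X = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
Y = np.sqrt(P_true)[:, None] * X                 # realizations of P^(1/2) x_t
lam = np.sort(np.linalg.eigvalsh(Y @ Y.conj().T / n))

# naive n-consistent estimator (3): average each cluster of N/3 eigenvalues;
# the cluster means deviate from P_k when N/n is not small
for k, Pk in enumerate([1.0, 3.0, 7.0]):
    print(Pk, lam[k * N // 3:(k + 1) * N // 3].mean().real)
```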

III. LARGE DIMENSIONAL RANDOM MATRIX THEORY

In this section, we introduce the basic notions of large dimensional random matrix theory and describe two elementary tools: the resolvent and the Stieltjes transform. These tools are necessary to understand the stark difference between the large random matrix techniques and the conventional statistical methods.

Random matrix theory studies the properties of matrices whose entries follow a joint probability distribution. The two main axes deal with the exact statistical characterization of small-size matrices and with the asymptotic behaviour of large dimensional matrices. For the analysis of small-size matrices, one usually seeks closed-form expressions of the exact statistics. However, these expressions quickly become intractable for practical analysis as one departs from white Gaussian matrices [11]–[13]. The mathematical methods used are usually very standard, mostly based on standard tools from probability theory. In the context of large dimensional random matrices, the tools are very different: asymptotic statistics and probability are obviously fundamental, but so are linear algebra, real and complex analysis, and combinatorics. These techniques rely either on invariance properties of the underlying matrices (free probability theory [14], [15], combinatorics [16], [17], Gaussian integration by parts methods [18], [19]) or on independence properties of their entries [6], [20], [21]. In the following, we will mostly concentrate on tools for large dimensional random matrices


Fig. 3. Histogram of the eigenvalues of (1/n)YYᴴ for N = 300, n = 3000, with P diagonal composed of three evenly weighted masses at (i) 1, 3 and 7 (top), (ii) 1, 3 and 4 (bottom), against the limit law from Theorem 1.

with independent entries, although some of the results we present originate from other techniques. The main tool of interest in the spectral study of these matrices is the Stieltjes transform, which we introduce hereafter.

A. The Stieltjes transform

Similar to the celebrated Fourier transform in classical probability theory and signal processing, which allows one to perform simpler analysis in the Fourier (or frequency) domain than in the initial (or time) domain, the spectral analysis of large dimensional random matrices² is often carried out with the help of the Stieltjes transform.

Definition 1 (Stieltjes transform): Let F be a real probability distribution function and let z ∈ C be taken outside the support S of F. Then the (Cauchy-)Stieltjes transform m_F(z) of F at point z is defined as

    m_F(z) ≜ ∫_S 1/(t − z) dF(t).

² What we refer to here as the spectrum is the eigenvalue distribution of the underlying matrix.

Note importantly that, for X ∈ C^{N×N} Hermitian with eigenvalues λ_1, ..., λ_N and eigenvalue distribution F^X, the Stieltjes transform m_X of X is, for z ∈ C ∖ {λ_1, ..., λ_N},

    m_X(z) = ∫ dF^X(t)/(t − z) = (1/N) Σ_{k=1}^N 1/(λ_k − z) = (1/N) tr(X − zI_N)^{−1}.

The Stieltjes transform, in the same way as the Fourier transform, has an inverse formula, given by

    F(b) − F(a) = lim_{y↓0} (1/π) ∫_a^b ℑ[m_F(x + ıy)] dx.    (4)

This formula is however rarely used in practice. What it says for random matrices is that, if the Stieltjes transform of a matrix X is known, then one can retrieve its eigenvalue distribution, which is often more difficult to obtain in the spectral domain than in the Stieltjes transform domain.
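Both the trace identity and the inversion formula (4) are easy to verify numerically. A minimal sketch on a Wigner matrix (whose limiting spectrum, the semicircle law on [−2, 2], follows from [5]; the dimension and the smoothing parameter ε are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1000
A = rng.standard_normal((N, N))
X = (A + A.T) / np.sqrt(2 * N)        # Wigner matrix, semicircle support [-2, 2]
lam = np.linalg.eigvalsh(X)

def m_X(z):
    # Stieltjes transform: (1/N) tr (X - z I_N)^(-1) = (1/N) sum_k 1/(lam_k - z)
    return np.mean(1.0 / (lam - z))

# inversion (4): the density at x is approximately (1/pi) Im m_X(x + i*eps)
eps = 0.05
for x in (-1.0, 0.0, 1.0):
    est = np.imag(m_X(x + 1j * eps)) / np.pi
    true = np.sqrt(4.0 - x * x) / (2.0 * np.pi)   # semicircle density
    print(x, est, true)
```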

Since we will constantly deal with matrices of the form (X − zI_N)^{−1}, called the resolvent of X, the analysis of random matrices is fundamentally based on the exploitation of classical matrix inversion formulas. In order to better capture the essence of the Stieltjes transform approach, we hereafter outline the core arguments of the proof of the Marčenko-Pastur law, as performed in the original article [6] back in 1967.

B. The Marčenko-Pastur law

We recall that we wish to determine the expression of the limit of the eigenvalue distribution of YYᴴ, given in Figure 2, as N, n → ∞, N/n → c, where Y = [y_1, ..., y_n] ∈ C^{N×n} has i.i.d. entries of zero mean and variance 1/n. In the Stieltjes transform domain, we therefore need to evaluate m_{YYᴴ}(z) = (1/N) tr(YYᴴ − zI_N)^{−1}.

The idea is to concentrate on the value of each individual diagonal entry of the resolvent (YYᴴ − zI_N)^{−1}. Because of its obvious symmetrical structure, it suffices to study the asymptotic behaviour of the first diagonal entry. Writing (YYᴴ − zI_N)^{−1} as a block matrix with upper-left corner composed of the unique (1, 1) entry and applying the classical Schur complement formula [22], it is easy to see that

    [(YYᴴ − zI_N)^{−1}]_{11} = 1/( −z − z ỹ_1ᴴ (ỸᴴỸ − zI_n)^{−1} ỹ_1 )    (5)

where we have defined Ỹ and ỹ_1 such that Yᴴ = [ỹ_1 Ỹᴴ], i.e. ỹ_1 ∈ C^n is the first column of Yᴴ and Ỹ ∈ C^{(N−1)×n} gathers the remaining rows of Y. Due to the independence between ỹ_1 and Ỹ and the fact that ỹ_1 has i.i.d. entries, the quantity ỹ_1ᴴ(ỸᴴỸ − zI_n)^{−1}ỹ_1 approaches (1/n) tr(ỸᴴỸ − zI_n)^{−1} as the system sizes N, n grow large. Indeed, this is clearly true in the expectation over ỹ_1 (with equality) and, conditioned on Ỹ, the convergence somewhat recalls a law of large numbers. The interesting part is to notice that the latter trace expression is tightly connected to (1/N) tr(YYᴴ − zI_N)^{−1}, as the only major difference between the two lies in a single column change (a rank-one perturbation), which asymptotically does not alter the limiting normalized trace. A complete exposition of this derivation can be found in many random matrix books, e.g. [23]–[25].

Since this remark is valid for all diagonal entries of the resolvent (YYᴴ − zI_N)^{−1}, it holds also true for their average (1/N) tr(YYᴴ − zI_N)^{−1}, i.e. for the Stieltjes transform m_{YYᴴ}(z) of YYᴴ.


Recalling (5), the Stieltjes transform can therefore be asymptotically well approximated as a function of itself. This naturally leads m_{YYᴴ}(z) to be approximately the solution of a fixed-point (or implicit) equation. For many random matrix models, this is as far as the Stieltjes transform approach would go, and we would conclude that the limiting spectrum of the matrix under study is defined through its Stieltjes transform, whose expression is only known through a fixed-point equation (with usually a unique solution³).

In the case of the Marčenko-Pastur law, this fixed-point equation reads precisely

    m_{YYᴴ}(z) ≃ 1/( 1 − c − z − z c m_{YYᴴ}(z) )    (6)

which turns out to be equivalent to a second order polynomial in m_{YYᴴ}(z) that can be solved explicitly. Using the inverse Stieltjes transform formula (4), we then obtain the limiting spectrum of YYᴴ with density f(x) given by

    f(x) = (1 − c^{−1})⁺ δ(x) + (1/(2πcx)) √((x − a)(b − x))    (7)

with a = (1 − √c)², b = (1 + √c)², and we recall that c is the limiting ratio lim N/n; the square root appearing in the expression originates from the roots of the second order polynomial defining m_{YYᴴ}(z).

It is interesting to see that this expression is only defined for a ≤ x ≤ b and possibly x = 0, which defines the support of the distribution. In particular, we confirm, as already discussed, that, as c → 0, [a, b] tends to a singleton at 1. The convergence to 1 is however quite slow, as it is of order √c for c small (since (1 + √c)² ≃ 1 + 2√c in this regime). This suggests that n must be much larger than N for the classical large-n approximation to be acceptable. Typically, in array processing with 100 antennas, no less than 10,000 signal observations must be available for subspace methods relying on the large-n approximation to be valid with an error of order 1/10 on the assumed eigenvalues of YYᴴ. The resulting error may already lead to quite large fluctuations of the classical estimators.

This observation strongly suggests that the classical techniques which assume n ≫ N must be revisited, as will be detailed in Section III-D. In the next section, we go a step further and introduce a generalization of the Marčenko-Pastur law for more structured random matrices.

C. Spectrum of large matrices

As should have become clear from the discussion above, a central quantity of interest in many signal processing applications is the sample covariance matrix of n realizations Y = [y_1, ..., y_n] of a process y_t = T^{1/2} x_t, with x_t having i.i.d. zero mean and unit variance entries. The realizations may be independent, in which case Y = T^{1/2}X, X = [x_1, ..., x_n] with i.i.d. entries, or have a (linear) time dependence structure, in which case y_t is often modelled as an autoregressive moving-average (ARMA) process; if so, we can write Y = T^{1/2} Z R^{1/2}, with Z a random matrix with i.i.d. zero mean entries and R a deterministic time correlation matrix (generally Toeplitz).

³ For z ∈ R⁻, the fixed-point map is usually a standard interference function, as described in [26], allowing for simple algorithmic methods to solve the fixed point in practice.

The limiting eigenvalue distributions of many such random matrix structures have been studied in the past ten years and have led to many fundamental results which only recently found their way to signal processing applications.

In the following, we treat the simplest case of a sample covariance matrix based on n independent observations of the process y_t described above, when x_t has i.i.d. entries with zero mean. The outline of the techniques below is in general the same when it comes to more complicated models.

We start with the important generalization of the Marčenko-Pastur law to matrices with left-sided correlation, whose general proof is due to Bai and Silverstein [20].

Theorem 1: Consider the matrix

    T̂ = T^{1/2} X Xᴴ T^{1/2} = Σ_{i=1}^n T^{1/2} x_i x_iᴴ T^{1/2}

where X = [x_1, ..., x_n] ∈ C^{N×n} has i.i.d. entries with zero mean and variance 1/n, and T ∈ C^{N×N} is nonnegative Hermitian, whose eigenvalue distribution F^T converges weakly to F_T as N → ∞. Then, the eigenvalue distribution F^T̂ of T̂ converges weakly and almost surely to a distribution F, whose Stieltjes transform m_F(z) is obtained from m_F̲(z) = c m_F(z) + (c − 1)(1/z), where m_F̲(z), the Stieltjes transform of the limiting eigenvalue distribution of XᴴTX, is a solution to

    m_F̲(z) = −( z − c ∫ t/(1 + t m_F̲(z)) dF_T(t) )^{−1}    (8)

for all z ∈ C⁺ ≜ {z ∈ C, ℑ[z] > 0}.

Note here that, contrary to the case treated previously where T = I_N, the limiting eigenvalue distribution of T̂ does not take an explicit form, but is only defined implicitly through its Stieltjes transform, which satisfies a fixed-point equation. This result is nonetheless usually sufficient to derive explicit detection tests and estimators for our application needs. Although not mentioned for readability in the statement of the theorem, it is usually true that the fixed-point equation has a unique solution on some restricted set, and that classical fixed-point algorithms converge surely to the proper solution; this is particularly convenient for numerical evaluations.

An important remark for the subsequent discussion is that the integral in (8) can be easily related to the Stieltjes transform m_T of F_T to give the equivalent representation

    m_F̲(z) = −( z − (c/m_F̲(z)) [ 1 − (1/m_F̲(z)) m_T( −1/m_F̲(z) ) ] )^{−1}.    (9)

Although the original formula of [20] is given by (8), Equation (9) is more interesting in several respects. The important observation in (9) is that the Stieltjes transform m_T can be related to the limiting Stieltjes transform m_F of T̂ through this simple equation. This is the fundamental anchor for inference techniques, where the observable data (T̂) are connected to the hidden parameters (information on T) to be estimated.

Indeed, in the sample covariance matrix model, eigen-inference methods used in subspace estimators consist in retrieving information about the population covariance matrix


T from the sample covariance matrix T̂. Therefore, if one can relate the information to be estimated in T to m_T (which asymptotically approaches m_{F_T}), then (9) provides the fundamental link between T and T̂ through their respective limiting Stieltjes transforms, from which estimators can be obtained. This procedure is detailed in the next section.

Note that Theorem 1 provides the value of m_F(z) for all z ∈ C⁺. Therefore, if one desires to numerically evaluate the limiting eigenvalue distribution of T̂, it suffices to evaluate m_F(z) for z = x + ıε, with ε > 0 small and for all x > 0, and then to use the inverse Stieltjes transform formula (4) to describe the limiting spectrum density f by

    f(x) ≃ (1/π) ℑ[m_F(x + ıε)].

We used this technique to produce Figure 3, in which T = UPUᴴ was assumed composed of three evenly weighted masses at 1, 3, 7 (top) or 1, 3, 4 (bottom).
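A minimal sketch of this numerical recipe for the top setting of Figure 3 (three evenly weighted masses at 1, 3, 7 and c = 0.1): it solves (8) by plain fixed-point iteration, a scheme that is assumed here to converge for ℑ[z] > 0, and recovers m_F from the relation of Theorem 1; ε and the iteration count are illustrative choices:

```python
import numpy as np
import matplotlib.pyplot as plt

c = 0.1                         # N/n, as in the top plot of Figure 3
t = np.array([1.0, 3.0, 7.0])   # masses of F_T, with equal weights
w = np.ones(3) / 3

def m_under(z, iters=2000):
    # plain fixed-point iteration for (8), assumed convergent for Im[z] > 0
    m = -1.0 / z
    for _ in range(iters):
        m = -1.0 / (z - c * np.sum(w * t / (1.0 + t * m)))
    return m

eps = 1e-3
xs = np.linspace(0.05, 10.0, 800)
f = np.empty_like(xs)
for i, x in enumerate(xs):
    z = x + 1j * eps
    m_F = (m_under(z) - (c - 1.0) / z) / c   # invert m_under = c m_F + (c-1)/z
    f[i] = np.imag(m_F) / np.pi              # approximate inversion formula above

plt.plot(xs, f); plt.xlabel("x"); plt.ylabel("limiting density"); plt.show()
```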

As mentioned above, it is possible to derive the limiting eigenvalue distribution of a wide range of random matrix models or, if a limit does not exist for the model (this may arise for instance if the eigenvalues of T are not assumed to converge in law), to derive deterministic equivalents for the large dimensional matrices. For the latter, the eigenvalue distribution F^T̂ of T̂ can be approximated for each N by a deterministic distribution F_N such that F^T̂ − F_N ⇒ 0 almost surely. These distribution functions F_N usually have a very practical form for analysis; see e.g. [27]–[29] for examples of random matrix models that do not admit limits but only deterministic equivalents.

The knowledge of the limiting eigenvalue distribution of such matrices is not of particular interest to signal processing in itself, but it is a required prior step before performing statistical inference and detection methods. In wireless communication problems however, such as that of evaluating the capacity

    C = (1/N) log det( I_N + (1/σ²) HHᴴ )
      = ∫_{σ²}^∞ ( 1/t − (1/N) tr(HHᴴ + tI_N)^{−1} ) dt

of a multi-antenna Gaussian channel H with noise variance σ², these tools alone are fundamental, see e.g. [28]–[32]. In particular, we recognize in the above formula that the Stieltjes transform (1/N) tr(HHᴴ − zI_N)^{−1} of HHᴴ, and not its eigenvalue distribution, is the quantity of central interest. This is probably the main explanation why wireless communication applications of random matrix results appeared as early as 1999 [33], [34], while applications to signal processing methods emerged only very recently.

In the following, we pursue our study of the spectral properties of large dimensional random matrices towards applications in statistical inference, and introduce the eigen-inference (or G-estimation) method.

D. G-estimation

As introduced in the previous section, one of the major objectives of random matrix theory for signal processing is to improve the statistical tests and inference techniques derived under the assumption of infinitely many observations, for scenarios where the system population size N and the number of observations n are of the same order of magnitude. For this, we will rely on the connections between the Stieltjes transforms of the population and sample covariance matrices.

The G-estimation method, originally due to Girko [8], intends to provide such (N,n)-consistent estimators. The general technique consists in a three-step approach along the following lines. First, the parameter to be estimated, call it θ, is written as a functional of the Stieltjes transform of the deterministic matrices of the model, say here θ = f(m_T) for a population matrix T with Stieltjes transform m_T. Then, m_T is connected to the Stieltjes transform m_T̂ of an observed matrix T̂ through a formula similar to that described earlier in (9), e.g. m_T = g(m_F) for a certain function g. Finally, connecting the pieces together, we have a link between θ and the observations T̂, this link being only exact asymptotically. For finite N, n dimensions, this naturally produces an (asymptotically) (N,n)-consistent estimator θ̂ = f(g(m_T̂)).

While the fundamental tool for connecting the population and observation spaces is the Stieltjes transform, the second important tool, which will help connect the parameter θ to the Stieltjes transform m_T̂, is the Cauchy integral formula, and complex analysis in general.

For better understanding, we hereafter elaborate on a concrete example to describe the eigen-inference framework within the context of the estimation of the source powers discussed in Section II-B.

We recall that one observes a matrix of the type Y = U P^{1/2} X, where U ∈ C^{N×N} is unitary, X ∈ C^{N×n} is filled with i.i.d. entries with zero mean and unit variance, and P is diagonal with N_1 entries equal to P_1, ..., and N_K entries equal to P_K, all values of P_i being distinct.

While the objective in the previous section was to characterize the asymptotic eigenvalue distribution of (1/n)YYᴴ when P is known, the target of this section is instead to infer P_1, ..., P_K from the observation Y. For this, we make the following fundamental observation. From Cauchy's complex integration formula [7],

    P_k = −(1/2πı) ∮_{C_k} z/(P_k − z) dz

for a complex positively oriented contour C_k circling once around P_k. If C_k does not enclose any of the other P_i, i ≠ k, then it is also true, again from Cauchy's integration formula, that

    P_k = −(1/2πı) (N/N_k) ∮_{C_k} z (1/N) Σ_{i=1}^K N_i/(P_i − z) dz
        = −(1/2πı) (N/N_k) ∮_{C_k} z m_P(z) dz.    (10)
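The identity (10) is deterministic and can be verified by direct numerical quadrature, independently of any random matrix argument. A minimal sketch (the powers 1, 3, 7 with equal multiplicities and the contour radius are illustrative choices):

```python
import numpy as np

P = np.repeat([1.0, 3.0, 7.0], 100)     # population eigenvalues, N = 300
N, Nk, Pk = len(P), 100, 3.0            # target the middle power P_k = 3

def m_P(z):
    # Stieltjes transform of P: (1/N) sum_i 1/(P_i - z), vectorized in z
    return (1.0 / (P[:, None] - z[None, :])).mean(axis=0)

# positively oriented circle C_k around P_k = 3, enclosing no other P_i
theta = np.linspace(0.0, 2.0 * np.pi, 2001)
z = Pk + 0.8 * np.exp(1j * theta)
dz_dtheta = 1j * 0.8 * np.exp(1j * theta)

integral = np.trapz(z * m_P(z) * dz_dtheta, theta)
print(-(N / Nk) * integral / (2j * np.pi))   # ~ 3, cf. (10)
```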

Hence, we are able to write P_k as a function of the Stieltjes transform of the non-observable matrix P. The next step is to link m_P to the Stieltjes transform of (1/n)YYᴴ. For this, we use (9) with T = UPUᴴ (which has the same eigenvalue spectrum as P) and F the limiting spectrum of (1/N)YᴴY, which


after the variable change z = m_F(u) leads to the elementary expression

    P_k = (1/c) (1/2πı) ∮_{C'_k} u m'_F(u)/m_F(u) du

for some contour C'_k which happens to circle exactly around the cluster of eigenvalues of T̂ associated with P_k only, and with m'_F the complex derivative of m_F(z) with respect to z. It is therefore fundamental that there exists a contour that circles around the cluster associated with P_k only. It may happen, depending on the ratio c and the values of the P_k, that the limiting spectrum of T̂ generates fewer than K clusters, some clusters being associated with multiple power values, as depicted in the second graph of Figure 3. In this scenario, the above result does not hold and the technique collapses. In signal processing applications, this effect is linked to the problem of source separation. From now on, we therefore assume that the cluster of eigenvalues associated with P_k is perfectly isolated. Necessary and sufficient conditions for this to hold are clearly established [57].

Under the cluster separability assumption, an estimator P̂_k for P_k is now straightforward to obtain. Indeed, replacing m_F by its empirical estimate m_{(1/N)YᴴY}, P̂_k is approximately equal to a complex integral whose integrand is made of rational functions. A final step of residue calculus [7] completes the calculation and we obtain explicitly the estimator

    P̂_k = (n/N_k) Σ_{m∈N_k} (λ_m − μ_m)    (11)

where N_k is defined as in (3), λ_1 ≤ ... ≤ λ_N are the eigenvalues of (1/n)YYᴴ, and μ_1 ≤ ... ≤ μ_N are the ordered eigenvalues of diag(λ) − (1/n)√λ√λᵀ, with λ = (λ_1, ..., λ_N)ᵀ. This estimator therefore improves the n-consistent estimator (3) to an (N,n)-consistent estimator which surprisingly turns out to have much better performance statistics for all couples (N,n). Performance figures will be provided in Section IV for signal processing scenarios of more practical interest.
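The estimator (11) is straightforward to implement. A minimal numpy transcription (illustrative setting: powers 1, 3, 7 with equal multiplicities, N = 300, n = 3000; U is omitted since it does not affect the spectrum, and cluster separability is taken for granted):

```python
import numpy as np

rng = np.random.default_rng(5)
N, n = 300, 3000
P_true = np.repeat([1.0, 3.0, 7.0], N // 3)

X = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
Y = np.sqrt(P_true)[:, None] * X
lam = np.sort(np.linalg.eigvalsh(Y @ Y.conj().T / n))

# mu: ordered eigenvalues of diag(lambda) - (1/n) sqrt(lam) sqrt(lam)^T
s = np.sqrt(lam)
mu = np.sort(np.linalg.eigvalsh(np.diag(lam) - np.outer(s, s) / n))

for k, Pk in enumerate([1.0, 3.0, 7.0]):
    idx = slice(k * N // 3, (k + 1) * N // 3)    # cluster N_k (separability assumed)
    P_hat = (n / (N // 3)) * np.sum(lam[idx] - mu[idx])
    print(Pk, P_hat)                             # (11): much closer than (3)
```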

Multiple estimators can be derived similarly for different types of models and problems, a large number of which have been investigated by Girko and named the G-estimators. A list of more than fifty estimators for various applications can be found in [35]. In Section IV, some examples in the context of path loss estimation in wireless communications and angle of arrival estimation for array processing will be discussed.

We now turn to a finer analysis of random matrices, which is no longer concerned with the weak limit of eigenvalue distributions but rather with the limiting fluctuations of individual eigenvalues (in particular the extreme eigenvalues). This problem is at the core of some standard statistical tests for signal detection, such as the generalized likelihood ratio test (GLRT).

E. Extreme eigenvalues

It is important to note at this point that Theorem 1 and all theorems deriving from the Stieltjes transform method only provide the weak convergence of the empirical eigenvalue distribution to a limit F, but do not say anything about the behaviour of the individual eigenvalues. In particular, the fact that the support of F is compact does not mean that the empirical eigenvalues will asymptotically all lie in this compact support. Indeed, consider for instance the distribution F_N(x) = ((N−1)/N) 1_{x≥1} + (1/N) 1_{x≥2}. Then the largest value taken by a random variable distributed as F_N equals 2 for all N, although it is clear that F_N converges weakly to F = 1_{x≥1} (and then the support of F is the singleton {1}). In the context of Theorem 1, this reasoning indicates that, although the weak limit of F^T̂ is compactly supported, some or even many eigenvalues of T̂ may still be found outside the support.

As a matter of fact, it is proved in [36] that, if the entries of X in Theorem 1 have infinite fourth order moment, then the largest eigenvalue of XXᴴ tends to infinity. For non-degenerate scenarios though, i.e. when the fourth moment of the entries of X is finite, we have the following much expected result [37], [38].

Theorem 2 (No eigenvalue outside the support): Let T̂ be defined as in Theorem 1, with X having entries with finite fourth order moment and with F^T ⇒ F_T, such that T has no eigenvalue outside the support of F_T. We recall that F^T̂ ⇒ F for F defined in Theorem 1. Take now [a, b] ⊂ R ∪ {±∞} strictly away from the support of F. Then, almost surely, there is no eigenvalue of T̂ found in [a, b] for all large N.

This result says in particular that, for T = I_N and for all large N, there is no eigenvalue of T̂ found away from the support of the Marčenko-Pastur law. This further generalizes to the scenario where T has a few distinct eigenvalues, each with large multiplicity, in which case F is formed of multiple compact clusters, as depicted in Figure 3. In this context, it is proved in [39] that the number of eigenvalues asymptotically found in each cluster matches exactly the multiplicity of the corresponding mass in T (assuming cluster separability). This is often referred to as the property of exact spectrum separation. Note that this point fully justifies the last step in the derivation of the estimator (11), where we somewhat hid the fact that the residue calculus naturally requires knowing where the eigenvalues of T̂ are precisely located. As we will see in Section IV, these results on the asymptotic location of the eigenvalues allow us to derive new hypothesis tests for the detection of signals embedded in white noise. In particular, we will present tests on the received sample covariance matrix T̂ that confront a pure white noise hypothesis against a signal-plus-noise hypothesis, i.e. T = I_N against T ≠ I_N.

Based on the earlier reasoning on the weak limit interpretations, we insist that, while Theorem 1 does state that F_T ⇒ 1_{x≥1} implies that F is the Marčenko-Pastur law, it does not state that F_T ⇒ 1_{x≥1} implies that no eigenvalue is asymptotically found away from the support of the Marčenko-Pastur law. Indeed, if T = diag(1, ..., 1, a), with a ≠ 1, Theorem 2 cannot be applied. Conversely, for the T described above, it is not clear whether an eigenvalue of T̂ will be seen outside the limiting support. This question is of fundamental importance for the performance evaluation of asymptotic detection tests and has triggered a recent interest in this particular model for T, called the spike model, which will be discussed next.


F. Spike models

It is of particular interest for signal processing applications to model low rank signal spaces embedded in white noise. This is the case in array processing, where a single signal source with a unique propagation path is received during several consecutive time instants, hence producing a rank-one signal matrix added to the ambient full rank white noise. The objective of this particular model is to derive simple detection and identification procedures that do not fall within the rather involved scheme of Section III-D.

Under the generic denomination of "spike models" we understand all random matrix models featuring small rank perturbations of some classical random matrix models. For instance, for X ∈ C^{N×n} with i.i.d. entries with zero mean and variance 1/n, the matrices T̂ = (X + E)(X + E)ᴴ, E ∈ C^{N×n} such that EEᴴ is of small rank K, or T̂ = (I_N + E)^{1/2} XXᴴ (I_N + E)^{1/2}, E ∈ C^{N×N} of small rank K, fall into the generic scheme of spike models. For all of these models, Theorem 1 and similar results, see e.g. [40], claim that F^T̂ converges weakly to the Marčenko-Pastur law, which is the major motivation for these models, as they are rather simple to analyze. The interest here though is to study the behaviour of the extreme eigenvalues of T̂.

To carry on with the same models as before, we concentrate here on T̂ modeled as in Theorem 1, with T taken to be a small rank perturbation of the identity matrix. Similar properties can be found for other models, e.g. in [41], [42]. The first result deals with first order limits of eigenvalues and eigenvector projections of the spike model [41], [43], [44].

Theorem 3: Let T̂ be defined as in Theorem 1, let X have i.i.d. entries with zero mean and variance 1/n, and let T = I_N + E, with E = Σ_{i=1}^K ω_i u_i u_iᴴ its spectral decomposition (i.e. u_iᴴ u_j = δ_ij), where K is fixed and ω_1 > ... > ω_K > 0. Denote λ_1 ≥ ... ≥ λ_N the ordered eigenvalues of T̂ and û_i ∈ C^N the eigenvector associated with the eigenvalue λ_i. Then, for 1 ≤ i ≤ K, as N, n → ∞ with N/n → c,

    λ_i →a.s. (1 + √c)²,  if ω_i ≤ √c
    λ_i →a.s. ρ_i ≜ 1 + ω_i + c (1 + ω_i)/ω_i,  if ω_i > √c

and

    |û_iᴴ u_i|² →a.s. 0,  if ω_i ≤ √c
    |û_iᴴ u_i|² →a.s. ξ_i ≜ (1 − c ω_i⁻²)/(1 + c ω_i⁻¹),  if ω_i > √c.

The main observation of this result is that, if ω_1 ≤ √c, then the largest eigenvalue of T̂ asymptotically converges to the right edge of the Marčenko-Pastur support and is therefore hidden in the main bulk of the eigenvalues, while if ω_1 > √c, the largest eigenvalue of T̂ is found outside the main cluster. This has fundamental consequences. When T̂ is the sample covariance matrix of signal-plus-noise data with signal strength ω_1, then, if the signal is strong enough or, conversely, if N/n is small enough, it is possible to detect this signal based on the presence of an eigenvalue exterior to the main eigenvalue cluster of T̂. Otherwise, neither the eigenvalues nor the eigenvectors of T̂ provide any information on the presence of a signal, at least asymptotically.

Fig. 4. Eigenvalues of T̂ = T^{1/2}XXᴴT^{1/2}, where T is diagonal with all ones but for the first four entries set to 3, 3, 2, 2, N = 500, n = 400; Marčenko-Pastur law with c = 5/4. The theoretical limiting eigenvalue ρ_1 of T̂ is emphasized.

As a conclusion, in the latter scenario, there is no way to decide whether a signal is indeed present or not. This provides fundamental limits for signal detection tests. This subject is further discussed in the application Section IV.

In Figure 4, we depict the eigenvalues of a spike model as in Theorem 3, with four population eigenvalues greater than one. Two of them exceed the detectability threshold ω_i > √c, while the other two do not. As expected, only two eigenvalues are visible outside the bulk, with values close to their theoretical limit.
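The limits ρ_i and ξ_i of Theorem 3 can be checked quickly. A minimal sketch with a single spike (K = 1, ω = 2, N = 500, n = 1000 are illustrative choices; the spike direction is taken canonical without loss of generality):

```python
import numpy as np

rng = np.random.default_rng(6)
N, n = 500, 1000
c = N / n
omega = 2.0                          # single spike, omega > sqrt(c)

T_sqrt = np.eye(N)
T_sqrt[0, 0] = np.sqrt(1 + omega)    # T = I_N + omega * u u^H with u = e_1

X = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2 * n)
lam, vec = np.linalg.eigh(T_sqrt @ X @ X.conj().T @ T_sqrt)

rho1 = 1 + omega + c * (1 + omega) / omega
xi1 = (1 - c / omega**2) / (1 + c / omega)
print(lam[-1], rho1)                     # largest eigenvalue vs its limit rho_1
print(np.abs(vec[0, -1]) ** 2, xi1)      # |u^H u_hat_1|^2 vs its limit xi_1
```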

Although this was not exactly the approach followed initially in [43], the same Stieltjes transform and complex integration framework can be used to prove this result. We hereafter give a short description of this proof in the simpler case where E = ω u uᴴ and ω > √c.

1) Extreme eigenvalues: By definition of an eigenvalue, det(T̂ − zI_N) = 0 for z an eigenvalue of T̂. Some algebraic manipulation based on determinant product formulas and Woodbury's identity [22] leads to

    det(T̂ − zI_N) = f_N(z) det(I_N + E) det(XXᴴ − zI_N)

with

    f_N(z) = 1 + z (ω/(1 + ω)) uᴴ(XXᴴ − zI_N)^{−1} u.

An eigenvalue of T̂ not inside the main cluster cannot cancel the right-hand determinant in the first line and must therefore cancel f_N(z). Standard random matrix lemmas then show that, since uᴴ(XXᴴ − zI_N)^{−1}u is asymptotically close to m_F(z), the Stieltjes transform of the Marčenko-Pastur law,

    f_N(z) →a.s. f(z) ≜ 1 + z (ω/(1 + ω)) m_F(z).

Substituting m_F(z) by its exact formulation (that is, the Stieltjes transform of the Marčenko-Pastur law), we then obtain that f(z) = 0 is equivalent to z = 1 + ω + c(1 + ω)/ω if ω > √c and has no solution otherwise, which is the expected result.


2) Eigenvector projections: For the result on eigenvector projections, we use again the Cauchy integration scheme, which states that, for any deterministic vectors a, b ∈ C^N and for all large N,

    aᴴ û_1 û_1ᴴ b = −(1/2πı) ∮_C aᴴ(T̂ − zI_N)^{−1} b dz    (12)

where C is a positively oriented contour circling around ρ_1 only, away from the main bulk, and where we again recognize a quadratic form involving the resolvent of T̂. This is an immediate consequence of the fact that (T̂ − zI_N)^{−1} has a pole at λ_1, with residue −û_1û_1ᴴ. We then use the same matrix manipulations as in the previous section (by isolating the term XXᴴ) to finally obtain the result of Theorem 3 after residue calculus.

These first order results are important since they suggest detection and identification tests in signal processing scenarios with extremely large arrays. However, for realistic small system sizes, these results are not sufficient to provide efficient statistical tests. To overcome this limitation, we need to go beyond the first order limits and characterize the fluctuations of the extreme eigenvalues and eigenvector projections. Recent works have provided such statistics for the results of Theorem 3 as well as for the fluctuations of the estimator (11), which we introduce presently.

G. Fluctuations

We first mention that the fluctuations of functionals of F^T̂ − F in Theorem 1 have been derived in [45]. More precisely, for any well-behaved function f (at least holomorphic on the support of F) and under some mild technical assumptions,

    N ∫ f(t) ( dF^T̂(t) − dF(t) ) ⇒ N(0, σ²)

with σ² known. This in particular applies to f(t) = (t − z)^{−1} for z ∈ C ∖ R⁺, which gives the asymptotic fluctuations of the Stieltjes transform of T̂. From there, classical asymptotic statistics tools, such as the delta-method [1], allow one to transfer the Gaussian fluctuations of the Stieltjes transform m_T̂ to the fluctuations of any function of it, e.g. estimators based on m_T̂. The following result on the fluctuations of the estimator (11) therefore comes as no surprise [46].

Theorem 4: Consider the estimator P̂_k in (11). Assuming the entries of X have finite fourth order moment (to ensure Theorem 2) and that P_k generates an isolated cluster,

    N (P̂_k − P_k) ⇒ N(0, σ²)

where σ² is evaluated precisely in [46] as a complex integral form involving derivatives of the limiting Stieltjes transform of T̂.

In the case of the spike model, it is of interest to derive the fluctuations of the extreme eigenvalues and eigenvector projections, whose limits were given in Theorem 3. Surprisingly, the fluctuations of these variables are not always Gaussian [44], [47], [48].

Theorem 5: Let T̂ be defined as in Theorem 3. Then, for 1 ≤ k ≤ K, if ω_k < √c,

    N^{2/3} ( λ_k − (1 + √c)² ) / ( (1 + √c)^{4/3} √c ) ⇒ T₂

where T₂ is known as the complex Tracy-Widom distribution, described in [49] as the solution of a Painlevé differential equation. If, on the other hand, ω_k > √c, then

    √N ( |û_kᴴ u_k|² − ξ_k , λ_k − ρ_k )ᵀ ⇒ N(0, Σ_k)

where the entries of the 2 × 2 matrix Σ_k can be expressed in a simple closed form as a function of ω_k only. Moreover, for k ≠ k′ such that ω_k, ω_{k′} > √c and distinct, the centered and scaled eigenvalues and eigenvector projections are asymptotically independent.

The second result of Theorem 5 can be obtained using the same Stieltjes transform and complex integration framework as described in the sketch of proof of Theorem 3, see [44] for details. Again in that case, it is mainly the fluctuations of the Stieltjes transform of XXᴴ and an application of the delta-method which lead to the result. The first result of Theorem 5 is proved with very different methods which we do not introduce here (since the extreme eigenvalues stick to the support of the bulk, complex contour integration approaches cannot be performed). These methods involve techniques such as orthogonal polynomials and Fredholm determinants, which rely on the study of the finite dimensional distribution of the eigenvalues of XXᴴ before ultimately providing asymptotic results, see [24], [50] for details.

The Tracy-Widom distribution is depicted in Figure 5. It is interesting to see that the Tracy-Widom law is centered on a negative value and that the probability of positive values is low. This suggests that, if the largest eigenvalue of T̂ is greater than (1 + √c)², then it is very likely that T is not the identity matrix. A similar remark can be made for the smallest eigenvalue of T̂, which has a mirrored Tracy-Widom fluctuation [51], when c < 1.

Theorem 5 allows for asymptotically accurate test statistics for the decision on signal-plus-noise against pure noise models, thanks to the result on the extreme eigenvalues. Moreover, the results on the eigenvector projections provide even more information to the observer. In the specific context of failure diagnosis [44], introduced in Section IV, this can be used for the identification of the (asymptotically) most likely hypothesis from a set of M failure models of the type (I_N + E_k)^{1/2} XXᴴ (I_N + E_k)^{1/2}, for k ∈ {1, ..., M}.

H. Practical limitations

For all the techniques derived above, it is important to keep in mind that the results are only valid under the assumption of large matrix dimensions. As it turns out from convergence speed considerations, estimators based on weak convergence properties of the eigenvalue distribution are accurate even for very small system sizes (N = 8 is often sufficient, if not less), while second order statistics of functionals of the eigenvalue distribution roughly require N to be at least of order 64.


Fig. 5. Distribution of N^{2/3} c^{−1/2} (1 + √c)^{−4/3} [λ_1 − (1 + √c)²] against the Tracy-Widom law for N = 500, n = 1500, c = 1/3, for the covariance matrix model XXᴴ, X ∈ C^{N×n} with Gaussian i.i.d. entries. Empirical distribution from 10,000 Monte Carlo simulations.

When it comes to asymptotic properties of the extreme eigenvalues, due to the loss of the collective averaging effect of all the eigenvalues, the convergence speed is much slower, so that N = 16 is often a minimum for first order convergence, while N of order 128 is required for asymptotic second order statistics to become accurate. Note also that these empirical values assume "non-degenerate" conditions; for instance, for N = 128, the second order statistics of the largest eigenvalue in a spike model with population eigenvalue ω = √c + ε, for a small ε > 0, are often far from Gaussian with zero mean (since they should simultaneously be close to the very negatively-centered Tracy-Widom distribution), so that, depending on how small ε is chosen, much larger N is needed for an accurate Gaussian approximation to arise. This suggests that all the above methods have to be manipulated with extreme care depending on the application at hand.

The results introduced so far are only a small subset of the important contributions resulting from more than ten years of applied random matrix theory. A large exposition of sketches of proofs, main strategies to address applied random matrix problems, as well as a large number of applications, are analyzed in the book [25] in the joint context of wireless communications and signal processing. An extension of some notions introduced in the present tutorial can also be found in [52]. Deeper random matrix considerations about the Stieltjes transform approach can be found in [23] and references therein, while discussions on extreme eigenvalues and the tools needed to derive the Tracy-Widom fluctuations, based on the theory of orthogonal polynomials and Fredholm determinants, can be found in e.g. [24], [50], [53] and references therein. In the next section, we discuss several application examples, already partly introduced above, which apply Theorems 1–5 in various contexts of hypothesis testing and subspace statistical inference.

IV. CASE STUDIES

In this section, several classical detection and estimation schemes are revisited for the regime N ≃ n studied in Section III.

A. Multi-source power inference in i.i.d. channels

Our first case study originates from the field of wireless communications and is closely related to the previous example of eigenvalue inference. We consider a cognitive sensing scenario [54] in which an N-antenna sensing device (a mobile terminal for instance) collects data from K multi-antenna transmitters. Transmitter k is equipped with M_k antennas and sends the signals x_k(t) ∈ C^{M_k} to the receiver at time t through the channel H_k ∈ C^{N×M_k}, containing i.i.d. Gaussian entries with zero mean and variance 1/N.⁴ We assume M_1 + ... + M_K ≜ M ≤ N and that H ≜ [H_1, ..., H_K] remains static during at least n symbol periods. The receiver also captures white Gaussian noise σw(t) with zero mean and variance σ². The received signal y(t) ∈ C^N at time t is then modelled as

$$\mathbf y(t) = \sum_{k=1}^{K} \sqrt{P_k}\,\mathbf H_k \mathbf x_k(t) + \sigma \mathbf w(t).$$

Gathering $n$ realizations of $\mathbf y(t) \in \mathbb C^N$ in the matrix $\mathbf Y = [\mathbf y(1),\ldots,\mathbf y(n)]$, $\mathbf Y$ takes the form

$$\mathbf Y = \begin{pmatrix} \mathbf H \mathbf P^{\frac12} & \sigma \mathbf I_N \end{pmatrix} \begin{pmatrix} \mathbf X \\ \mathbf W \end{pmatrix}$$

where $\mathbf X = [\mathbf x(1),\ldots,\mathbf x(n)]$, $\mathbf x(t) = [\mathbf x_1(t)^T,\ldots,\mathbf x_K(t)^T]^T$, $\mathbf P = \mathrm{diag}(P_1 \mathbf I_{M_1},\ldots,P_K \mathbf I_{M_K})$, and $\mathbf W = [\mathbf w(1),\ldots,\mathbf w(n)]$.
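For concreteness, the following numpy lines generate an observation matrix $\mathbf Y$ according to this model; the numerical values mimic the setting of Figure 6 below, and all names are ours.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, Mk, P_true = 24, 128, [4, 4, 4], [1 / 16, 1 / 4, 1.0]
M, sigma = sum(Mk), 0.5

# channels with i.i.d. CN(0, 1/N) entries, static over the n symbol periods
H = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2 * N)
P_sqrt = np.sqrt(np.repeat(P_true, Mk))            # diagonal of P^{1/2}
X = (rng.standard_normal((M, n)) + 1j * rng.standard_normal((M, n))) / np.sqrt(2)
W = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
Y = H @ (P_sqrt[:, None] * X) + sigma * W          # Y = (H P^{1/2} | sigma I)(X; W)
```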

We wish to estimate the power values $P_1,\ldots,P_K$ from the observation $\mathbf Y$. We see that the sample covariance matrix approach developed in Section III-D is no longer valid as the population matrix $\mathbf T \triangleq \mathbf H\mathbf P\mathbf H^H + \sigma^2 \mathbf I_N$ is now random.

Following the general methodology developed in Section III-D, assuming the asymptotic cluster separation property for a certain $P_k$, we first use the connection between $P_k$ and $m_P(z)$, the limiting Stieltjes transform of $\mathbf P$, given by (10). The innovation from the study in Section III-D is that the link between $m_P$ and $m_F$, the limiting Stieltjes transform of $\frac1N \mathbf Y^H\mathbf Y$, is more involved and requires some more random matrix arguments than Theorem 1 alone. This link, along with a generalization of Theorem 2 to account for the randomness in $\mathbf T$, induces the following $(N,n)$-consistent estimator $\hat P_k$ of $P_k$ [44]

$$\hat P_k = \frac{Nn}{M_k(n-N)} \sum_{i \in \mathcal N_k} (\eta_i - \mu_i)$$

where $\mathcal N_k$ is the set of indexes of the eigenvalues of $\frac1n \mathbf Y\mathbf Y^H$ associated to $P_k$, $\eta_1 < \ldots < \eta_N$ are the eigenvalues of $\mathrm{diag}(\boldsymbol\lambda) - \frac1N \sqrt{\boldsymbol\lambda}\sqrt{\boldsymbol\lambda}^T$ and $\mu_1 < \ldots < \mu_N$ are the eigenvalues of $\mathrm{diag}(\boldsymbol\lambda) - \frac1n \sqrt{\boldsymbol\lambda}\sqrt{\boldsymbol\lambda}^T$, where $\boldsymbol\lambda = (\lambda_1,\ldots,\lambda_N)^T$ with $\lambda_1 < \ldots < \lambda_N$ the ordered eigenvalues of $\frac1n \mathbf Y\mathbf Y^H$.

⁴The normalization of the channel entries by $1/N$ along with $\|\mathbf x_k(t)\|$ bounded allows one to keep the transmit signal power bounded as $N$ grows.
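For illustration, the estimator can be implemented in a few lines of numpy, assuming the cluster index sets $\mathcal N_k$ are known (the order selection question is discussed at the end of Section IV-B); function and variable names below are ours.

```python
import numpy as np

def estimate_powers(Y, clusters, M):
    """(N,n)-consistent power estimation following the estimator above [44].

    Y        : N x n observation matrix.
    clusters : dict mapping k to the index set N_k of eigenvalues of
               (1/n) Y Y^H associated with P_k (assumed known).
    M        : dict mapping k to the number of antennas M_k of transmitter k.
    Returns a dict of estimated powers P_k.
    """
    N, n = Y.shape
    lam = np.linalg.eigvalsh(Y @ Y.conj().T / n)    # ascending eigenvalues
    s = np.sqrt(lam)
    eta = np.linalg.eigvalsh(np.diag(lam) - np.outer(s, s) / N)
    mu = np.linalg.eigvalsh(np.diag(lam) - np.outer(s, s) / n)
    return {k: N * n / (M[k] * (n - N)) * float(np.sum(eta[idx] - mu[idx]))
            for k, idx in clusters.items()}
```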

Fig. 6. Normalized mean square error of the largest estimated power $\hat P_3$ for $P_1 = 1/16$, $P_2 = 1/4$, $P_3 = 1$, $M_1 = M_2 = M_3 = 4$, $M = 12$, $N = 24$, $n = 128$. Comparison between the conventional ($\hat P_3^\infty$) and Stieltjes transform ($\hat P_3$) approaches.

In Figure 6, the performance comparison between the classical large $n$ approach and the Stieltjes transform approach for power estimation is depicted, based on 10,000 Monte Carlo simulations. Similar to the estimator (3), the classical estimator of $P_k$, denoted $\hat P_k^\infty$, consists in assuming $n \gg N$ and $N \gg M_i$ for each $i$ (recall that $M_i$ is the multiplicity of $P_i$). A clear performance gain is observed for the whole range of signal-to-noise ratios (SNR), defined as $\sigma^{-2}$, although a saturation level is observed which translates the approximation error due to the asymptotic analysis. For low SNR, an avalanche effect is observed, which is caused by the inaccuracy of the Stieltjes transform approach when clusters tend to merge (here the cluster associated to the value $\sigma^2$ grows large as $\sigma^2$ increases and covers the clusters associated to $P_1$, then $P_2$, and finally $P_3$). Nonetheless, this apparently strong cluster separability limitation generates an avalanche below the SNR level of the well-known avalanche effect produced by the classical large $n$ estimator. The random matrix method therefore yields a twofold performance gain compared to the classical method, as it is both more accurate (which was expected) and more robust to noise, which is an interesting, not necessarily anticipated, outcome.

Our next example is based on a contribution from Mestre [2] which initiated a series of statistical inference methods exploiting complex integration, among which the source detection methods described above.

B. G-MUSIC

The MUSIC method, originally due to Schmidt and later extended by McCloud and Scharf [55], [56], is a direction of arrival detection technique based on a subspace approach. The scenario consists of an array of $N$ sensors (e.g. a radar antenna) receiving the signals originating from $K$ sources at angles $\theta_1,\ldots,\theta_K$ with respect to the array. The objective is to estimate those angles. The data vector $\mathbf y(t)$ received at time $t$ by the array can be written as

$$\mathbf y(t) = \sum_{k=1}^{K} \mathbf s(\theta_k)\, x_k(t) + \sigma \mathbf w(t)$$

where $\mathbf s(\theta) \in \mathbb C^N$ is a deterministic vector-valued function of the angle $\theta \in [0, 2\pi)$, $x_k(t) \in \mathbb C$ is the proper complex Gaussian i.i.d. data sent by source $k$ at time $t$ and $\mathbf w(t) \in \mathbb C^N$ is the additive proper complex Gaussian noise. The vector $\mathbf y(t)$ is Gaussian with covariance

$$\mathbf T = \mathbf S(\Theta)\mathbf S(\Theta)^H + \sigma^2 \mathbf I_N$$

where $\mathbf S(\Theta) = [\mathbf s(\theta_1),\ldots,\mathbf s(\theta_K)] \in \mathbb C^{N\times K}$. We denote as usual $\mathbf Y = [\mathbf y(1),\ldots,\mathbf y(n)] \in \mathbb C^{N\times n}$ the concatenated matrix of $n$ independent observations.

We call $\omega_1 \le \ldots \le \omega_N$ the eigenvalues of $\mathbf T$ and $\mathbf u_1,\ldots,\mathbf u_N$ their associated eigenvectors. Similarly, we will denote $\hat\lambda_1 \le \ldots \le \hat\lambda_N$ the eigenvalues of $\hat{\mathbf T} \triangleq \frac1n \mathbf Y\mathbf Y^H$, with respective eigenvectors $\hat{\mathbf u}_1,\ldots,\hat{\mathbf u}_N$. Assuming $N \ge K$, the smallest $N-K$ eigenvalues of $\mathbf T$ equal $\sigma^2$ and we can represent $\mathbf T$ under the form

$$\mathbf T = \begin{pmatrix} \mathbf U_W & \mathbf U_S \end{pmatrix} \begin{pmatrix} \sigma^2 \mathbf I_{N-K} & \mathbf 0 \\ \mathbf 0 & \boldsymbol\Omega_S \end{pmatrix} \begin{pmatrix} \mathbf U_W^H \\ \mathbf U_S^H \end{pmatrix}$$

with $\boldsymbol\Omega_S = \mathrm{diag}(\omega_{N-K+1},\ldots,\omega_N)$, $\mathbf U_S = [\mathbf u_{N-K+1},\ldots,\mathbf u_N]$ the so-called signal space and $\mathbf U_W = [\mathbf u_1,\ldots,\mathbf u_{N-K}]$ the so-called noise space.

The basic idea of the MUSIC method is to observe that any vector lying in the signal space is orthogonal to the noise space. This leads in particular to

$$\eta(\theta_k) \triangleq \mathbf s(\theta_k)^H \mathbf U_W \mathbf U_W^H \mathbf s(\theta_k) = 0$$

for $k \in \{1,\ldots,K\}$.

A natural estimator $\hat\theta_k$ of $\theta_k$ in the neighborhood of $\theta_k$ consists in the minimal argument of

$$\hat\eta(\theta) \triangleq \mathbf s(\theta)^H \hat{\mathbf U}_W \hat{\mathbf U}_W^H \mathbf s(\theta)$$

where $\hat{\mathbf U}_W = [\hat{\mathbf u}_1,\ldots,\hat{\mathbf u}_{N-K}]$ is the eigenvector space corresponding to the smallest $N-K$ eigenvalues of $\hat{\mathbf T}$.
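For later comparison with G-MUSIC, a minimal numpy sketch of this classical estimator reads as follows; the steering function depends on the array geometry and is left as an input, and all names are ours.

```python
import numpy as np

def music_spectrum(Y, steer, thetas, K):
    """Classical MUSIC cost eta_hat(theta) = s(theta)^H U_W U_W^H s(theta).

    Y      : N x n observation matrix.
    steer  : callable mapping an angle theta to the steering vector s(theta).
    thetas : grid of candidate angles; the K deepest minima of the returned
             cost are the estimated directions of arrival.
    K      : number of sources (assumed known).
    """
    N, n = Y.shape
    _, U = np.linalg.eigh(Y @ Y.conj().T / n)   # eigenvectors, ascending order
    U_W = U[:, : N - K]                         # estimated noise subspace
    P = U_W @ U_W.conj().T
    return np.array([np.real(steer(t).conj() @ P @ steer(t)) for t in thetas])
```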

This estimator is however proven inconsistent for $N, n$ growing large simultaneously in [57], which is now not surprising. To produce an $(N,n)$-consistent estimator, recall (12). Similar to the derivation of the eigenvector projection in Section III-F, we can write

$$\mathbf s(\theta)^H \mathbf U_W \mathbf U_W^H \mathbf s(\theta) = \frac{1}{2\pi\imath} \oint_{\mathcal C_k} \mathbf s(\theta)^H \left(\mathbf T - z\mathbf I_N\right)^{-1} \mathbf s(\theta)\, dz$$

for a contour $\mathcal C_k$ circling around $\sigma^2$ only. Connecting $\hat{\mathbf T}$ to $\mathbf T$ from a theorem similar to Theorem 1 and performing residue calculus, we finally obtain a good approximation of $\eta(\theta)$, and then an $(N,n)$-consistent estimator for the direction of arrival. This estimator consists precisely in determining the $K$ deepest minima (zeros may not exist) of the function [2]

$$\mathbf s(\theta)^H \left( \sum_{i=1}^{N} \phi(i)\, \hat{\mathbf u}_i \hat{\mathbf u}_i^H \right) \mathbf s(\theta)$$

with $\phi(i)$ defined as

$$\phi(i) = \begin{cases} \displaystyle 1 + \sum_{k=N-K+1}^{N} \left[ \frac{\hat\lambda_k}{\hat\lambda_i - \hat\lambda_k} - \frac{\mu_k}{\hat\lambda_i - \mu_k} \right], & i \le N-K \\[3mm] \displaystyle -\sum_{k=1}^{N-K} \left[ \frac{\hat\lambda_k}{\hat\lambda_i - \hat\lambda_k} - \frac{\mu_k}{\hat\lambda_i - \mu_k} \right], & i > N-K \end{cases}$$

and where, similar to above, $\mu_1 \le \ldots \le \mu_N$ are the eigenvalues of $\mathrm{diag}(\hat{\boldsymbol\lambda}) - \frac1n \sqrt{\hat{\boldsymbol\lambda}}\sqrt{\hat{\boldsymbol\lambda}}^T$, with $\hat{\boldsymbol\lambda} = (\hat\lambda_1,\ldots,\hat\lambda_N)^T$ and $\hat\lambda_1 \le \ldots \le \hat\lambda_N$ the ordered eigenvalues of $\hat{\mathbf T}$.

Fig. 7. MUSIC against G-MUSIC for DoA detection of K = 2 signal sources, N = 20 sensors, n = 150 samples, SNR of 10 dB. Angles of arrival of 35° and 37°.
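All quantities entering this cost function are computable from the sample covariance matrix alone; a numpy sketch under the same conventions as the MUSIC sketch above (again an illustration of ours, not a reference implementation) is:

```python
import numpy as np

def g_music_spectrum(Y, steer, thetas, K):
    """G-MUSIC cost function with the weights phi(i) defined above (cf. [2])."""
    N, n = Y.shape
    lam, U = np.linalg.eigh(Y @ Y.conj().T / n)  # ascending sample eigenvalues
    s = np.sqrt(lam)
    mu = np.linalg.eigvalsh(np.diag(lam) - np.outer(s, s) / n)
    phi = np.empty(N)
    for i in range(N):
        if i < N - K:        # noise eigenvalues (i <= N - K in 1-based indexing)
            ks = range(N - K, N)
            phi[i] = 1 + sum(lam[k] / (lam[i] - lam[k]) - mu[k] / (lam[i] - mu[k])
                             for k in ks)
        else:                # signal eigenvalues
            ks = range(N - K)
            phi[i] = -sum(lam[k] / (lam[i] - lam[k]) - mu[k] / (lam[i] - mu[k])
                          for k in ks)
    P = (U * phi) @ U.conj().T                   # sum_i phi(i) u_i u_i^H
    return np.array([np.real(steer(t).conj() @ P @ steer(t)) for t in thetas])
```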

Observe that, contrary to the classical MUSIC method, not only the noise subspace but all the eigenvectors of $\hat{\mathbf T}$ are used, although they are divided in two subsets with different weight policies applied to the eigenvectors in each set. This result is generalized to the scenario where $x_k(t)$ is non-random. In this case, the technique involves different reference results than the sample covariance matrix theorems required here, but the final estimator is obtained similarly [58].

In Figure 7, the performance comparison between the traditional MUSIC and the G-MUSIC algorithms is depicted, when two sources have close angles which the traditional MUSIC algorithm is not able to distinguish. It is clear on this single-shot realization that the G-MUSIC algorithm shows deeper minima and allows for a better source resolution.

Note that both inference methods described above, be it the power or angle of arrival estimation schemes, assume a priori knowledge of the cluster indexes to be able to implement the estimators. The proposed random matrix method therefore does not solve the order selection problem. New techniques are therefore being investigated that deal with the generalization of the Akaike principle [59] and the minimum description length technique [60] for improved order selection using random matrix theory, see e.g. [61].

Remark also that, so far, only regular sample covariance matrix models have been analyzed to address questions of array processing. In reality, due (in particular) to the non-Gaussian noise structure in radar detection, more advanced estimators of the population covariance matrix are used, based on the celebrated Huber and Maronna robust M-estimators [62], [63]. We are currently investigating these M-estimators within the random matrix framework. Further intricate models involving multi-path propagation and the technique consisting in stacking successive vector observations are also under study.

This completes the set of examples using results based on weak limits of large dimensional random matrices and G-estimation. We now turn to examples involving the results for the spike models, starting with simple signal detection procedures.

C. Detection

Consider the hypothesis test which consists in deciding whether a received signal $\mathbf y(t) \in \mathbb C^N$ consists of pure noise, $\mathbf y(t) = \sigma\mathbf w(t)$, with $\mathbf w(t) \in \mathbb C^N$ having Gaussian entries with zero mean and unit variance, for some unknown $\sigma$ (hypothesis $\mathcal H_0$), or contains a signal plus noise, $\mathbf y(t) = \mathbf h x(t) + \sigma\mathbf w(t)$, for a time-independent vector channel $\mathbf h \in \mathbb C^N$, a scalar proper complex Gaussian signal $x(t)$ (hypothesis $\mathcal H_1$) and a noise variance $\sigma^2$, which are all unknown. As usual, we gather the $n$ i.i.d. received data in $\mathbf Y = [\mathbf y(1),\ldots,\mathbf y(n)]$.

Since $\mathbf h$ is unknown, we cannot compare directly $\mathcal H_0$ and $\mathcal H_1$. Instead, we will accept or reject $\mathcal H_0$ based on how well $\mathbf Y$ fits the pure noise hypothesis. It is a known result that the generalized likelihood ratio test (GLRT) for the decision between $\mathcal H_0$ and $\mathcal H_1$ boils down to the test [64]

$$\lambda_1' \triangleq \frac{\lambda_1}{\frac{1}{Nn}\operatorname{tr}\mathbf Y\mathbf Y^H} \;\underset{\mathcal H_1}{\overset{\mathcal H_0}{\lessgtr}}\; f(\varepsilon)$$

with $\lambda_1$ the largest eigenvalue of $\frac1n\mathbf Y\mathbf Y^H$, for $f$ a given monotonic function and $\varepsilon$ the maximally acceptable false alarm rate. Evaluating the statistical properties of $\lambda_1'$ for finite $N$ is however rather involved and leads to impractical tests, see e.g. [13]. Instead, we consider here a very elementary statistical test based on Theorem 5.

It is clear that $\frac{1}{Nn}\operatorname{tr}\mathbf Y\mathbf Y^H \xrightarrow{\text{a.s.}} \sigma^2$ under both $\mathcal H_0$ and $\mathcal H_1$. Therefore, an application of Slutsky's lemma [1] ensures that the asymptotic fluctuations of $\lambda_1'$ follow a Tracy-Widom distribution around $(1+\sqrt{N/n})^2$. The GLRT method therefore leads asymptotically to testing the Tracy-Widom statistics of the appropriately centered and scaled version of $\lambda_1'$. More precisely, for a false alarm rate $\varepsilon$, this is

$$N^{\frac23}\, \frac{\lambda_1' - (1+\sqrt c)^2}{(1+\sqrt c)^{\frac43}\sqrt c} \;\underset{\mathcal H_1}{\overset{\mathcal H_0}{\lessgtr}}\; T_2^{-1}(1-\varepsilon)$$

with $c = N/n$ and $T_2$ the Tracy-Widom distribution. Further properties of the above statistical test, and in particular theoretical expressions of false alarm rates and test powers, are derived in [65] using the theory of large deviations. Due to the relatively slow convergence speed of the largest eigenvalue distribution to the Tracy-Widom law, some recent works have proposed refinements, see e.g. [66].
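The resulting test is straightforward to implement. In the sketch below (ours), the threshold $T_2^{-1}(1-\varepsilon)$ is assumed precomputed from tabulated Tracy-Widom ($\beta = 2$) values, since no simple closed form is available.

```python
import numpy as np

def glrt_spike_test(Y, tw2_quantile):
    """GLRT detection via the centered-scaled largest eigenvalue (sketch).

    Y            : N x n matrix of received samples.
    tw2_quantile : threshold T2^{-1}(1 - eps) for the target false alarm
                   rate eps, taken from tabulated Tracy-Widom values.
    Returns True when H1 (signal present) is declared.
    """
    N, n = Y.shape
    c = N / n
    lam = np.linalg.eigvalsh(Y @ Y.conj().T / n)
    # lambda'_1 = lambda_1 / ((1/(Nn)) tr YY^H); the denominator equals the
    # mean of the eigenvalues of (1/n) YY^H
    lam1_prime = lam[-1] / lam.mean()
    stat = N ** (2 / 3) * (lam1_prime - (1 + np.sqrt(c)) ** 2) \
           / ((1 + np.sqrt(c)) ** (4 / 3) * np.sqrt(c))
    return stat > tw2_quantile
```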

In Figure 8, the performance of the GLRT detector, given by the receiver operating curve for different false alarm rate levels, is depicted for small system sizes. We present a comparison between the GLRT approach, the empirical technique of [67] which consists in using the condition number of $\frac1n\mathbf Y\mathbf Y^H$ (the ratio of the largest to the smallest eigenvalue) as a signal detector, and the optimal Neyman-Pearson detector (with knowledge of the channel statistics) derived in [68]. It turns out that the suboptimal asymptotic GLRT approach is quite close in performance to the finite dimensional exact optimum, the latter being however quite complex to implement.

Fig. 8. Receiver operating curve for a priori unknown σ² of the Neyman-Pearson test (N-P), condition number method and GLRT, N = 4, n = 8, SNR = 0 dB; h has Gaussian entries of zero mean and unit variance. For the Neyman-Pearson test, both uniform and Jeffreys priors, with exponent β = 1, are provided.

Our last application uses the second order statistics of eigenvalues and eigenvectors for a spike model in the context of failure localization in large dimensional networks.

D. Failure detection in large networks

The method presented here allows for fast detection and localization of local failures (few links or few nodes) in a large sensor network of $N$ sensors gathering data about $M$ system parameters $\theta_1,\ldots,\theta_M$. Assume the linear scenario

$$\mathbf x(t) = \mathbf H\boldsymbol\theta(t) + \sigma\mathbf w(t)$$

for $\boldsymbol\theta(t) = [\theta_1(t),\ldots,\theta_M(t)]^T \in \mathbb C^M$ and $\mathbf w(t) \in \mathbb C^N$ two complex Gaussian signals with independent entries of zero mean and unit variance. The objective here is to detect a small rank perturbation of $\mathbb E[\mathbf x(t)\mathbf x(t)^H] \triangleq \mathbf T = \mathbf H\mathbf H^H + \sigma^2\mathbf I_N$, originating from a local failure or change in the parameters, assuming $\mathbf T$ known. In particular, when the entry $k$ of $\boldsymbol\theta(t)$ shows a sudden change in variance, the model changes as

$$\mathbf x'(t) = \mathbf H(\mathbf I_M + \alpha_k \mathbf e_k\mathbf e_k^H)\boldsymbol\theta(t) + \sigma\mathbf w(t)$$

for a given $\alpha_k \ge -1$ and with $\mathbf e_k \in \mathbb C^M$ the vector of all zeros but for $e_k(k) = 1$. Taking for instance $\alpha_k = -1$ turns the entry $k$ of $\boldsymbol\theta(t)$ into zero, corresponding to a complete collapse of the parameter under control. Denoting $\mathbf y(t) = \mathbf T^{-\frac12}\mathbf x(t)$ and $\mathbf y'(t) = \mathbf T^{-\frac12}\mathbf x'(t)$, we have $\mathbb E[\mathbf y(t)\mathbf y(t)^H] = \mathbf I_N$, while

$$\mathbb E[\mathbf y'(t)\mathbf y'(t)^H] = \mathbf I_N + \left[(1+\alpha_k)^2 - 1\right] \mathbf T^{-\frac12}\mathbf H\mathbf e_k\mathbf e_k^H\mathbf H^H\mathbf T^{-\frac12}$$

which is a rank-1 perturbation of $\mathbf I_N$.

Now assume that we do not know whether the model follows the expression of $\mathbf y(t)$ or $\mathbf y'(t)$, and let us generically denote both by $\mathbf y(t)$. A simple off-line failure (or change) detection test consists, as in Section IV-C, in deciding whether the largest eigenvalue of $\frac1n\mathbf Y\mathbf Y^H$, with $\mathbf Y = [\mathbf y(1),\ldots,\mathbf y(n)]$, has a Tracy-Widom distribution. However, we wish now to go further and be able to decide, upon failure detection, which entry $\theta_k$ of $\boldsymbol\theta$ was altered. For this, we need to provide a statistical test for the hypothesis $\mathcal H_k$ = "parameter $\theta_k$ failed". A mere maximum likelihood procedure consisting in testing the Gaussian distribution of $\mathbf Y$, assuming $\alpha_k$ known for each $k$, is however costly for $N$ large and becomes impractical if $\alpha_k$ is unknown. Instead, one can perform a statistical test on the extreme eigenvalues and eigenvector projections of $\frac1n\mathbf Y\mathbf Y^H$, which mainly costs the computational time of an eigenvalue decomposition (already performed in the detection test). For this, we need to assume that the number $n$ of observations is sufficiently large for a failure of any parameter of $\boldsymbol\theta$ to be detectable. As an immediate application of Theorem 5, denoting $\hat\lambda$ the largest eigenvalue of $\frac1n\mathbf Y\mathbf Y^H$ and $\hat{\mathbf u}$ its corresponding eigenvector, the estimator $\hat k$ of $k$ is then given by [44]

$$\hat k = \arg\max_{1\le i\le M} \left\{ -N \begin{pmatrix} |\mathbf u_i^H \hat{\mathbf u}|^2 - \xi_i \\ \hat\lambda - \rho_i \end{pmatrix}^{\!T} \boldsymbol\Sigma_i^{-1} \begin{pmatrix} |\mathbf u_i^H \hat{\mathbf u}|^2 - \xi_i \\ \hat\lambda - \rho_i \end{pmatrix} - \log\det \boldsymbol\Sigma_i \right\}$$

with $\mathbf u_i$ such that $\|\mathbf u_i\| = 1$, $\omega_i \mathbf u_i\mathbf u_i^H = [(1+\alpha_i)^2 - 1]\, \mathbf T^{-\frac12}\mathbf H\mathbf e_i\mathbf e_i^H\mathbf H^H\mathbf T^{-\frac12}$, and $\xi_i$, $\rho_i$, $\boldsymbol\Sigma_i$ defined as functions of $\omega_i$ in Theorem 5 (the index $i$ refers here to a change in parameter $\theta_i$ and not to the $i$-th largest eigenvalue of the small rank perturbation matrix).
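Assuming the deterministic quantities $\xi_i$, $\rho_i$ and $\boldsymbol\Sigma_i$ of Theorem 5 (not restated here) have been precomputed for each candidate parameter, the localization step reduces to a short score maximization; the sketch below is ours and takes these quantities as inputs.

```python
import numpy as np

def localize_failure(lam_hat, u_hat, u_list, xi, rho, Sigma):
    """Failure localization by the score above (cf. [44]).

    lam_hat : largest eigenvalue of (1/n) Y Y^H.
    u_hat   : its unit-norm eigenvector.
    u_list  : the M unit-norm vectors u_i derived from T, H and alpha_i.
    xi, rho : length-M arrays of the limiting values of Theorem 5 (precomputed).
    Sigma   : list of M 2x2 covariance matrices of Theorem 5 (precomputed).
    Returns the index of the most likely failed parameter.
    """
    N = u_hat.shape[0]
    scores = []
    for i, u_i in enumerate(u_list):
        d = np.array([abs(u_i.conj() @ u_hat) ** 2 - xi[i], lam_hat - rho[i]])
        scores.append(-N * d @ np.linalg.solve(Sigma[i], d)
                      - np.log(np.linalg.det(Sigma[i])))
    return int(np.argmax(scores))
```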

This procedure still assumes that $\alpha_i$ is known for each $i$. If not, an estimator of $\alpha_i$ can be derived and Theorem 5 adapted accordingly to account for the fluctuations of the estimator.

In Figure 9, we depict the detection and localization performance for a sudden drop to zero of the parameter $\theta_1(t)$ in a scenario with $M = N = 10$. We see in Figure 9 that, compared to the detection performance, the localization performance is rather poor for small $n$ but reaches the same level as the detection performance for larger $n$, which is explained by the inaccuracy of the Gaussian approximation for $\omega_i$ close to $\sqrt c$.

Fig. 9. Correct detection (CDR) and localization (CLR) rates for different levels of false alarm rates (FAR) and different values of n, for a drop of variance of θ1. The minimal theoretical n for observability is n = 8.

V. CONCLUSIONS

In this short tutorial about statistical inference using large dimensional random matrix theory, we have argued that many traditional signal processing methods are inconsistent when both the population and system dimensions are large. We then introduced notions of random matrix theory which provide results on the spectrum of large random matrices. These results were then used to adjust some of these inconsistent signal processing methods into new consistent estimators. To this end, we presented a recent method based (i) on the Stieltjes transform to derive weak convergence properties of the spectrum of large matrices and (ii) on complex integration to derive estimators. This somewhat parallels the Fourier transform and M-estimator framework usually met in classical asymptotic signal processing [1].

Nonetheless, while classical signal processing tools are very mature, statistical inference based on random matrix analysis still lacks many fundamental mathematical results that only experts can provide to this day for advanced system models. The main consequence is the slow appearance of methods to derive various estimators for more involved signal processing problems. The emergence of the aforementioned Stieltjes transform and complex integration framework is a rare exception to this rule which, we believe, is the foundation for many future breakthroughs in this area.

REFERENCES

[1] A. W. Van der Vaart, Asymptotic Statistics. New York: Cambridge University Press, 2000.
[2] X. Mestre and M. Lagunas, "Modified subspace algorithms for DoA estimation with large arrays," IEEE Transactions on Signal Processing, vol. 56, no. 2, pp. 598–614, Feb. 2008.
[3] L. Laloux, P. Cizeau, M. Potters, and J. P. Bouchaud, "Random matrix theory and financial correlations," International Journal of Theoretical and Applied Finance, vol. 3, no. 3, pp. 391–397, Jul. 2000.
[4] N. Hansen and A. Ostermeier, "Adapting arbitrary normal mutation distributions in evolution strategies: the covariance matrix adaptation," in Evolutionary Computation, Proceedings of IEEE International Conference on. IEEE, 1996, pp. 312–317.
[5] E. Wigner, "Characteristic vectors of bordered matrices with infinite dimensions," The Annals of Mathematics, vol. 62, no. 3, pp. 548–564, Nov. 1955.
[6] V. A. Marcenko and L. A. Pastur, "Distributions of eigenvalues for some sets of random matrices," Math USSR-Sbornik, vol. 1, no. 4, pp. 457–483, Apr. 1967.
[7] W. Rudin, Real and Complex Analysis, 3rd ed. McGraw-Hill Series in Higher Mathematics, May 1986.
[8] V. L. Girko, Theory of Random Determinants. Dordrecht, The Netherlands: Kluwer Academic Publishers, 1990.
[9] P. J. Bickel and E. Levina, "Regularized estimation of large covariance matrices," The Annals of Statistics, vol. 36, no. 1, pp. 199–227, 2008.
[10] W. B. Wu and M. Pourahmadi, "Banding sample autocovariance matrices of stationary processes," Statistica Sinica, vol. 19, no. 4, pp. 1755–1768, 2009.
[11] J. Wishart, "The generalized product moment distribution in samples from a normal multivariate population," Biometrika, vol. 20, no. 1-2, pp. 32–52, Dec. 1928.
[12] A. T. James, "Distributions of matrix variates and latent roots derived from normal samples," The Annals of Mathematical Statistics, vol. 35, no. 2, pp. 475–501, 1964.
[13] T. Ratnarajah, R. Vaillancourt, and M. Alvo, "Eigenvalues and condition numbers of complex random matrices," SIAM Journal on Matrix Analysis and Applications, vol. 26, no. 2, pp. 441–456, 2005.
[14] F. Hiai and D. Petz, The Semicircle Law, Free Random Variables and Entropy - Mathematical Surveys and Monographs No. 77. Providence, RI, USA: American Mathematical Society, 2006.
[15] P. Biane, "Free probability for probabilists," Quantum Probability Communications, vol. 11, pp. 55–71, 2003.
[16] A. Masucci, Ø. Ryan, S. Yang, and M. Debbah, "Finite dimensional statistical inference," IEEE Transactions on Information Theory, vol. 57, no. 4, pp. 2457–2473, 2011.
[17] N. R. Rao and A. Edelman, "The polynomial method for random matrices," Foundations of Computational Mathematics, vol. 8, no. 6, pp. 649–702, Dec. 2008.
[18] L. A. Pastur and V. Vasilchuk, "On the law of addition of random matrices," Communications in Mathematical Physics, vol. 214, no. 2, pp. 249–286, 2000.
[19] W. Hachem, O. Khorunzhy, P. Loubaton, J. Najim, and L. A. Pastur, "A new approach for capacity analysis of large dimensional multi-antenna channels," IEEE Transactions on Information Theory, vol. 54, no. 9, pp. 3987–4004, 2008.
[20] J. W. Silverstein and Z. D. Bai, "On the empirical distribution of eigenvalues of a class of large dimensional random matrices," Journal of Multivariate Analysis, vol. 54, no. 2, pp. 175–192, 1995.
[21] X. Mestre, "Improved estimation of eigenvalues of covariance matrices and their associated subspaces using their sample estimates," IEEE Transactions on Information Theory, vol. 54, no. 11, pp. 5113–5129, Nov. 2008.
[22] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge University Press, 1985.
[23] Z. D. Bai and J. W. Silverstein, Spectral Analysis of Large Dimensional Random Matrices, 2nd ed. New York, NY, USA: Springer Series in Statistics, 2009.
[24] G. W. Anderson, A. Guionnet, and O. Zeitouni, An Introduction to Random Matrices. Cambridge University Press, 2010.
[25] R. Couillet and M. Debbah, Random Matrix Methods for Wireless Communications, 1st ed. New York, NY, USA: Cambridge University Press, 2011.
[26] R. D. Yates, "A framework for uplink power control in cellular radio systems," IEEE Journal on Selected Areas in Communications, vol. 13, no. 7, pp. 1341–1347, 1995.
[27] F. Dupuy and P. Loubaton, "Mutual information of frequency selective MIMO systems: an asymptotic approach," 2009. [Online]. Available: http://www-syscom.univ-mlv.fr/~fdupuy/publications.php
[28] R. Couillet, M. Debbah, and J. W. Silverstein, "A deterministic equivalent for the analysis of correlated MIMO multiple access channels," IEEE Transactions on Information Theory, vol. 57, no. 6, pp. 3493–3514, Jun. 2011.
[29] W. Hachem, P. Loubaton, and J. Najim, "Deterministic equivalents for certain functionals of large random matrices," Annals of Applied Probability, vol. 17, no. 3, pp. 875–930, 2007.
[30] R. Couillet, J. Hoydis, and M. Debbah, "Deterministic equivalents for the analysis of unitary precoded systems," IEEE Transactions on Information Theory, 2011, submitted for publication. [Online]. Available: http://arxiv.org/abs/1011.3717
[31] J. Hoydis, R. Couillet, and M. Debbah, "Random beamforming over correlated fading channels," IEEE Transactions on Information Theory, 2011, submitted for publication.
[32] F. Dupuy and P. Loubaton, "On the capacity achieving covariance matrix for frequency selective MIMO channels using the asymptotic approach," IEEE Transactions on Information Theory, 2010, to appear. [Online]. Available: http://arxiv.org/abs/1001.3102
[33] D. N. C. Tse and S. V. Hanly, "Linear multiuser receivers: effective interference, effective bandwidth and user capacity," IEEE Transactions on Information Theory, vol. 45, no. 2, pp. 641–657, Feb. 1999.
[34] S. Verdu and S. Shamai, "Spectral efficiency of CDMA with random spreading," IEEE Transactions on Information Theory, vol. 45, no. 2, pp. 622–640, Feb. 1999.
[35] V. L. Girko, "Ten years of general statistical analysis." [Online]. Available: www.general-statistical-analysis.girko.freewebspace.com/chapter14.pdf
[36] J. W. Silverstein, Z. D. Bai, and Y. Q. Yin, "A note on the largest eigenvalue of a large dimensional sample covariance matrix," Journal of Multivariate Analysis, vol. 26, no. 2, pp. 166–168, 1988.
[37] Y. Q. Yin, Z. D. Bai, and P. R. Krishnaiah, "On the limit of the largest eigenvalue of the large dimensional sample covariance matrix," Probability Theory and Related Fields, vol. 78, no. 4, pp. 509–521, 1988.
[38] Z. D. Bai and J. W. Silverstein, "No eigenvalues outside the support of the limiting spectral distribution of large dimensional sample covariance matrices," The Annals of Probability, vol. 26, no. 1, pp. 316–345, Jan. 1998.
[39] ——, "Exact separation of eigenvalues of large dimensional sample covariance matrices," The Annals of Probability, vol. 27, no. 3, pp. 1536–1555, 1999.
[40] B. Dozier and J. W. Silverstein, "On the empirical distribution of eigenvalues of large dimensional information plus noise-type matrices," Journal of Multivariate Analysis, vol. 98, no. 4, pp. 678–694, 2007.
[41] F. Benaych-Georges and R. Rao, "The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices," Advances in Mathematics, vol. 227, no. 1, pp. 494–521, 2011.
[42] Z. D. Bai and J. F. Yao, "Limit theorems for sample eigenvalues in a generalized spiked population model," 2008. [Online]. Available: http://arxiv.org/abs/0806.1141
[43] J. Baik and J. W. Silverstein, "Eigenvalues of large sample covariance matrices of spiked population models," Journal of Multivariate Analysis, vol. 97, no. 6, pp. 1382–1408, 2006.
[44] R. Couillet and W. Hachem, "Local failure detection and diagnosis in large sensor networks," IEEE Transactions on Information Theory, 2011, submitted for publication.
[45] Z. D. Bai and J. W. Silverstein, "CLT of linear spectral statistics of large dimensional sample covariance matrices," The Annals of Probability, vol. 32, no. 1A, pp. 553–605, 2004.
[46] J. Yao, R. Couillet, J. Najim, and M. Debbah, "Fluctuations of an improved population eigenvalue estimator in sample covariance matrix models," IEEE Transactions on Information Theory, 2011, submitted for publication.
[47] I. M. Johnstone, "On the distribution of the largest eigenvalue in principal components analysis," Annals of Statistics, vol. 29, no. 2, pp. 295–327, 2001.
[48] J. Baik, G. Ben Arous, and S. Peche, "Phase transition of the largest eigenvalue for non-null complex sample covariance matrices," The Annals of Probability, vol. 33, no. 5, pp. 1643–1697, 2005.
[49] C. A. Tracy and H. Widom, "On orthogonal and symplectic matrix ensembles," Communications in Mathematical Physics, vol. 177, no. 3, pp. 727–754, 1996.
[50] J. Faraut, "Random matrices and orthogonal polynomials," Lecture Notes, CIMPA School of Merida, 2006. [Online]. Available: www.math.jussieu.fr/~faraut/Merida.Notes.pdf
[51] O. N. Feldheim and S. Sodin, "A universality result for the smallest eigenvalues of certain sample covariance matrices," Geometric And Functional Analysis, vol. 20, no. 1, pp. 88–123, 2010.
[52] T. Chen, D. Rajan, and E. Serpedin, Mathematical Foundations for Signal Processing, Communications and Networking. Cambridge University Press, 2011.
[53] Y. V. Fyodorov, "Introduction to the random matrix theory: Gaussian unitary ensemble and beyond," Recent Perspectives in Random Matrix Theory and Number Theory, vol. 322, pp. 31–78, 2005.
[54] J. Mitola III and G. Q. Maguire Jr, "Cognitive radio: making software radios more personal," IEEE Personal Communication Magazine, vol. 6, no. 4, pp. 13–18, 1999.
[55] R. Schmidt, "Multiple emitter location and signal parameter estimation," IEEE Transactions on Antennas and Propagation, vol. 34, no. 3, pp. 276–280, 1986.
[56] M. L. McCloud and L. L. Scharf, "A new subspace identification algorithm for high-resolution DOA estimation," IEEE Transactions on Antennas and Propagation, vol. 50, no. 10, pp. 1382–1390, 2002.
[57] X. Mestre, "On the asymptotic behavior of the sample estimates of eigenvalues and eigenvectors of covariance matrices," IEEE Transactions on Signal Processing, vol. 56, no. 11, pp. 5353–5368, Nov. 2008.
[58] P. Vallet, P. Loubaton, and X. Mestre, "Improved subspace estimation for multivariate observations of high dimension: the deterministic signals case," IEEE Transactions on Information Theory, 2010, submitted for publication. [Online]. Available: http://arxiv.org/abs/1002.3234
[59] H. Akaike, "A new look at the statistical model identification," IEEE Transactions on Automatic Control, vol. 19, no. 6, pp. 716–723, 1974.
[60] J. Rissanen, "A universal prior for integers and estimation by minimum description length," The Annals of Statistics, vol. 11, no. 2, pp. 416–431, 1983.
[61] B. Nadler, "Nonparametric detection of signals by information theoretic criteria: performance analysis and an improved estimator," IEEE Transactions on Signal Processing, vol. 58, no. 5, pp. 2746–2756, 2010.
[62] R. A. Maronna, "Robust M-estimators of multivariate location and scatter," The Annals of Statistics, pp. 51–67, 1976.
[63] J. T. Kent and D. E. Tyler, "Redescending M-estimates of multivariate location and scatter," The Annals of Statistics, pp. 2102–2119, 1991.
[64] M. Wax and T. Kailath, "Detection of signals by information theoretic criteria," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 33, no. 2, pp. 387–392, 1985.
[65] P. Bianchi, J. Najim, M. Maida, and M. Debbah, "Performance of some eigen-based hypothesis tests for collaborative sensing," IEEE Transactions on Information Theory, vol. 57, no. 4, pp. 2400–2419, 2011.
[66] B. Nadler, "On the distribution of the ratio of the largest eigenvalue to the trace of a Wishart matrix," Journal of Multivariate Analysis, vol. 102, no. 2, pp. 363–371, 2011.
[67] L. S. Cardoso, M. Debbah, P. Bianchi, and J. Najim, "Cooperative spectrum sensing using random matrix theory," in IEEE Pervasive Computing (ISWPC'08), Santorini, Greece, May 2008, pp. 334–338.
[68] R. Couillet and M. Debbah, "A Bayesian framework for collaborative multi-source signal detection," IEEE Transactions on Signal Processing, vol. 58, no. 10, pp. 5186–5195, Oct. 2010.

Romain Couillet received his MSc in Mobile Communications at the Eurecom Institute and his MSc in Communication Systems at Telecom ParisTech, France, in 2007. From 2007 to 2010, he worked with ST-Ericsson as an Algorithm Development Engineer on the Long Term Evolution Advanced project, while preparing his PhD with Supelec, France, which he obtained in November 2010. He is currently an assistant professor in the Telecommunication department of Supelec. His research topics are in information theory, signal processing, and random matrix theory. He is the recipient of the Valuetools 2008 best student paper award and of the 2011 EEA/GdR ISIS/GRETSI best PhD thesis award.

Merouane Debbah was born in Madrid, Spain. He entered the Ecole Normale Superieure de Cachan (France) in 1996, where he received his M.Sc and Ph.D. degrees respectively in 1999 and 2002. From 1999 to 2002, he worked for Motorola Labs on Wireless Local Area Networks and prospective fourth generation systems. From 2002 until 2003, he was appointed Senior Researcher at the Vienna Research Center for Telecommunications (FTW) (Vienna, Austria). From 2003 until 2007, he joined the Mobile Communications department of the Institut Eurecom (Sophia Antipolis, France) as an Assistant Professor. He is presently a Professor at Supelec (Gif-sur-Yvette, France), holder of the Alcatel-Lucent Chair on Flexible Radio. His research interests are in information theory, signal processing and wireless communications. Merouane Debbah is the recipient of the "Mario Boella" prize award in 2005, the 2007 General Symposium IEEE GLOBECOM best paper award, the Wi-Opt 2009 best paper award, the 2010 Newcom++ best paper award, as well as the Valuetools 2007, Valuetools 2008 and CrownCom 2009 best student paper awards. He is a WWRF fellow.